Opened 13 years ago
Last modified 17 months ago
#7859 reopened bug
poll() on fd of an exited process returns wrong revents
Reported by: | scgtrp | Owned by: | phoudoin |
---|---|---|---|
Priority: | high | Milestone: | Unscheduled |
Component: | System/POSIX | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
The following test program runs properly on Linux, but does not exit on Haiku due to poll() setting revents to indicate that the fd is still valid and has readable data, even after the process exits:
#include <stdio.h> #include <poll.h> int main() { FILE* f = popen("/bin/bash -c 'for i in 1 2 3; do { echo $i; sleep 1; }; done'", "r"); printf("f=%p\n", f); int fd = fileno(f); printf("fd=%d\n", fd); struct pollfd pfd; pfd.fd = fd; pfd.events = POLLIN | POLLRDBAND; char buffer[80]; while (1) { int rv = poll(&pfd, 1, 500); printf("rv=%d\n", rv); if (rv == 0) continue; if (rv < 0) break; printf("events=%08x revents=%08x\n", pfd.events, pfd.revents); if ((pfd.events & pfd.revents) == 0) break; fgets(buffer, 79, f); printf("output: %s", buffer); } return 0; }
Attachments (1)
Change History (35)
comment:1 by , 13 years ago
Version: | R1/alpha3 → R1/Development |
---|
comment:2 by , 10 years ago
Milestone: | R1 → Unscheduled |
---|
Move POSIX compatibility related tickets out of R1 milestone (FutureHaiku/Features).
comment:3 by , 7 years ago
Milestone: | Unscheduled → R1/beta1 |
---|
comment:4 by , 7 years ago
Priority: | normal → high |
---|
comment:5 by , 7 years ago
Owner: | changed from | to
---|---|
Status: | new → in-progress |
by , 7 years ago
Attachment: | 0001-fifo-fix-7859.patch added |
---|
comment:6 by , 7 years ago
patch: | 0 → 1 |
---|
comment:7 by , 7 years ago
With the patch above, the test program now behave as expected. I check POLLHUP is returned not only on first poll() after the writer is gone and no more data but all others poll() after that, too.
follow-up: 9 comment:8 by , 7 years ago
Do we have some test suites that could be usefull to run to check my fix don't introduce unexpected regression? POSIX test suite?
comment:9 by , 7 years ago
Replying to phoudoin:
Do we have some test suites that could be usefull to run to check my fix don't introduce unexpected regression? POSIX test suite?
Nothing for poll() in https://github.com/haiku/open_posix_testsuite.
I suppose you could include the code from this bug report as a test in src/tests/system..
comment:12 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | in-progress → closed |
comment:13 by , 7 years ago
This revision may have introduced a regression of some form. Attempting to build webkit now results in perl hanging with the following backtrace:
kdebug> threads 21966 thread id state wait for object cpu pri stack team name 0x907cf600 21966 waiting other - 10 0x822df00021966 perl kdebug> bt 21966 stack trace for thread 21966 "perl" kernel stack: 0x822df000 to 0x822e3000 user stack: 0x710f1000 to 0x720f1000 frame caller <image>:function + offset 0 822e2d94 (+ 80) 8007d789 <kernel_x86> thread_block + 0xf9 1 822e2de4 (+ 48) 800d710a <kernel_x86> fifo::Inode<0xec2cf478>::WaitForReadRequest(fifo::ReadRequest&: 0x822e2e94) + 0x8a 2 822e2e14 (+ 64) 800d7302 <kernel_x86> fifo::Inode<0xec2cf478>::ReadDataFromBuffer(void*: 0x194c12a8, unsigned long*: 0x822e2e90, false, true, fifo::ReadR equest&: 0x822e2e94) + 0x72 3 822e2e54 (+ 112) 800d752b <kernel_x86> _ZN4fifoL9fifo_readEP9fs_volumeP8fs_vnodePvxS4_Pm + 0xcb 4 822e2ec4 (+ 48) 800e04cd <kernel_x86> file_read(file_descriptor*: 0xf4f42440, int64: 0, void*: 0x194c12a8, unsigned long*: 0x822e2f50) + 0x3d 5 822e2ef4 (+ 80) 800d5122 <kernel_x86> common_user_io(int32: 424415912, int64: 8192, void*: 0x194c12a8, uint32: 0x2000 (8192), false) + 0x102 6 822e2f44 (+ 100) 80129c0f <kernel_x86> handle_syscall + 0xdc user iframe at 0x822e2fa8 (end = 0x822e3000) eax 0x8e ebx 0x9d910c ecx 0x720ee7ac edx 0x6110c114 esi 0x19419ba8 edi 0x4 ebp 0x720ee7d8 esp 0x822e2fdc eip 0x6110c114 eflags 0x3202 user esp 0x720ee7ac vector: 0x63, error code: 0x0 7 822e2fa8 (+ 0) 6110c114 <commpage> commpage_syscall + 0x04 8 720ee7d8 (+ 48) 00c31fae <libperl.so> PerlIOUnix_read + 0x55 9 720ee808 (+ 32) 00c3534a <libperl.so> Perl_PerlIO_read + 0x39 10 720ee828 (+ 64) 00c354bd <libperl.so> PerlIOBuf_fill + 0x145 11 720ee868 (+ 32) 00c33d56 <libperl.so> Perl_PerlIO_fill + 0x33 12 720ee888 (+ 64) 00c352a6 <libperl.so> PerlIOBase_read + 0xe8 13 720ee8c8 (+ 32) 00c3555e <libperl.so> PerlIOBuf_read + 0x3e 14 720ee8e8 (+ 32) 00c3534a <libperl.so> Perl_PerlIO_read + 0x39 15 720ee908 (+ 48) 00c36c91 <libperl.so> PerlIO_getc + 0x20 16 720ee938 (+8384) 00be2565 <libperl.so> Perl_sv_gets + 0x8f7 17 720f09f8 (+ 112) 00bd0654 <libperl.so> Perl_do_readline + 0x576 18 720f0a68 (+ 48) 00bd0dbe <libperl.so> Perl_pp_readline + 0x27e 19 720f0a98 (+ 16) 00bcc3f5 <libperl.so> Perl_runops_standard + 0x21 20 720f0aa8 (+ 160) 00b6b632 <libperl.so> Perl_call_sv + 0x302 21 720f0b48 (+ 128) 00b6d0d4 <libperl.so> Perl_call_list + 0x169 22 720f0bc8 (+ 48) 00b57dbd <libperl.so> _init (nearest) + 0x4ee5 23 720f0bf8 (+ 112) 00b66b85 <libperl.so> Perl_newATTRSUB_flags + 0xc80 24 720f0c68 (+ 48) 00b66d0c <libperl.so> Perl_newATTRSUB + 0x28 25 720f0c98 (+ 80) 00b68bb9 <libperl.so> Perl_utilize + 0x5a4 26 720f0ce8 (+ 112) 00b92df8 <libperl.so> Perl_yyparse + 0x855 27 720f0d58 (+ 272) 00b70f3a <libperl.so> perl_parse + 0x1af0 28 720f0e68 (+ 64) 015f9bf3 <perl> main + 0x99 29 720f0ea8 (+ 64) 015f9a1b <perl> _start + 0x50 30 720f0ee8 (+ 80) 02224fc1 </boot/system/runtime_loader@0x02214000> <unknown> + 0x10fc1 31 720f0f38 (+ 0) 6110c250 <commpage> commpage_thread_exit + 0x00
comment:14 by , 7 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Yes, I guess I'll have to notify B_SELECT_ERROR too on pipe close. Software exiting on select() error are broken by my change currently.
comment:16 by , 7 years ago
Looking for a less complex reproduction testcase than "building webkit" :-\
comment:18 by , 7 years ago
Updated to hrev51343. cmake seems to suffer from this regression.
To reproduce: hp marble
comment:19 by , 7 years ago
Looking at the patch it seems you are missing brackets on the first if where you added more notifications, that may be triggering the new error.
comment:20 by , 7 years ago
Thanks, I'll debug the root cause deeper as soon as time permits. Meanwhile, I've reverted my changes so that regression don't make the situation worse than before to more people than before.
comment:21 by , 7 years ago
patch: | 1 → 0 |
---|
comment:23 by , 6 years ago
Milestone: | R1/beta1 → Unscheduled |
---|
comment:25 by , 5 years ago
Just to be clear, the missing brackets I was referring to is in commit https://git.haiku-os.org/haiku/commit/?id=hrev51343
@@ -616,14 +616,17 @@ Inode::NotifyEndClosed(bool writer) if (fReadSelectSyncPool) notify_select_event_pool(fReadSelectSyncPool, B_SELECT_DISCONNECTED); + notify_select_event_pool(fReadSelectSyncPool, B_SELECT_ERROR); + notify_select_event_pool(fReadSelectSyncPool, B_SELECT_READ); }
comment:26 by , 5 years ago
Well, sure, the bracket is clearly missing, but it only skip the check on fReadSelectPool before calling notify_select_event_pool(). I can't explain hanging on fifo.
Anyway, as @waddlesplash was working on a refactoring of select() implementation (https://review.haiku-os.org/c/haiku/+/1742), which will allow to raise multiple events at once, maybe it will be easier to try to fix this issue after it's merged.
comment:27 by , 19 months ago
Could someone check the behavior of the provided test on FreeBSD or MacOSX? It looks to me like it could be a linux specific behavior.
comment:28 by , 17 months ago
The test program is not showing the problem anymore, it exits cleanly.
Was it fixed in hrev57013 maybe?
comment:29 by , 17 months ago
Last time I checked, the SIGCHILD signal was delivered before the select read event (maybe one out ten times)
follow-up: 31 comment:30 by , 17 months ago
Should the test be updated to retry the poll on EINTR then?
comment:31 by , 17 months ago
Replying to pulkomandy:
Should the test be updated to retry the poll on EINTR then?
I would say yes, but nonetheless I feel it's not logical to deliver the signal before other conditions which should have led to a reschedule.
comment:32 by , 17 months ago
I have updated the test (the version in the git repo) to do this, and print a warning if the SIGCHLD comes first. I'm still unable to reproduce the problem initially reported in this ticket, poll returns an "error" pollable FD as expected, I think.
comment:33 by , 17 months ago
If it fix the initial issue, fine let's close this old ticket then. For context, I discovered this issue while rewriting TextSearch engine by pipping the input to 'xargs grep'. xargs process never exited. That why instead I used xargs EOF markup option.
comment:34 by , 17 months ago
The reason I'm researching this is because I tried to use expect
(initially in trying to run the binutils testsuite) and I'm facing similar problems (it fails to notice the process it runs have terminated). But expect
code relies on tcl
, both are quite complicated, and I have not found yet what exactly is broken, but it seems similar to this.
The immediate problem is the FIFO code notifying a read event when there aren't any more writers. It should notify a
B_SELECT_DISCONNECTED
instead. It looks like the code is in need of a general review with respect toB_SELECT_ERROR
andB_SELECT_DISCONNECTED
notifications.