Opened 13 years ago

Last modified 17 months ago

#7859 reopened bug

poll() on fd of an exited process returns wrong revents

Reported by: scgtrp Owned by: phoudoin
Priority: high Milestone: Unscheduled
Component: System/POSIX Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

The following test program runs properly on Linux, but does not exit on Haiku due to poll() setting revents to indicate that the fd is still valid and has readable data, even after the process exits:

#include <stdio.h>
#include <poll.h>

int main() {
	FILE* f = popen("/bin/bash -c 'for i in 1 2 3; do { echo $i; sleep 1; }; done'", "r");
	printf("f=%p\n", f);
	int fd = fileno(f);
	printf("fd=%d\n", fd);
	
	struct pollfd pfd;
	pfd.fd = fd;
	pfd.events = POLLIN | POLLRDBAND;
	
	char buffer[80];
	
	while (1) {
		int rv = poll(&pfd, 1, 500);
		printf("rv=%d\n", rv);
		if (rv == 0)
			continue;
		if (rv < 0)
			break;
		printf("events=%08x revents=%08x\n", pfd.events, pfd.revents);
		if ((pfd.events & pfd.revents) == 0)
			break;
		
		fgets(buffer, 79, f);
		printf("output: %s", buffer);
	}
	
	return 0;
}

Attachments (1)

0001-fifo-fix-7859.patch (1.9 KB ) - added by phoudoin 7 years ago.

Download all attachments as: .zip

Change History (35)

comment:1 by bonefish, 13 years ago

Version: R1/alpha3R1/Development

The immediate problem is the FIFO code notifying a read event when there aren't any more writers. It should notify a B_SELECT_DISCONNECTED instead. It looks like the code is in need of a general review with respect to B_SELECT_ERROR and B_SELECT_DISCONNECTED notifications.

comment:2 by luroh, 10 years ago

Milestone: R1Unscheduled

Move POSIX compatibility related tickets out of R1 milestone (FutureHaiku/Features).

comment:3 by phoudoin, 7 years ago

Milestone: UnscheduledR1/beta1

comment:4 by phoudoin, 7 years ago

Priority: normalhigh

comment:5 by phoudoin, 7 years ago

Owner: changed from nobody to phoudoin
Status: newin-progress

by phoudoin, 7 years ago

Attachment: 0001-fifo-fix-7859.patch added

comment:6 by phoudoin, 7 years ago

patch: 01

comment:7 by phoudoin, 7 years ago

With the patch above, the test program now behave as expected. I check POLLHUP is returned not only on first poll() after the writer is gone and no more data but all others poll() after that, too.

comment:8 by phoudoin, 7 years ago

Do we have some test suites that could be usefull to run to check my fix don't introduce unexpected regression? POSIX test suite?

in reply to:  8 comment:9 by korli, 7 years ago

Replying to phoudoin:

Do we have some test suites that could be usefull to run to check my fix don't introduce unexpected regression? POSIX test suite?

Nothing for poll() in https://github.com/haiku/open_posix_testsuite.

I suppose you could include the code from this bug report as a test in src/tests/system..

comment:10 by phoudoin, 7 years ago

I was considering it, indeed. Will do that.

comment:11 by phoudoin, 7 years ago

Fixed in hrev51332.

Last edited 7 years ago by phoudoin (previous) (diff)

comment:12 by phoudoin, 7 years ago

Resolution: fixed
Status: in-progressclosed

comment:13 by anevilyak, 7 years ago

This revision may have introduced a regression of some form. Attempting to build webkit now results in perl hanging with the following backtrace:

kdebug> threads 21966
thread         id  state     wait for  object      cpu pri  stack      team  name
0x907cf600  21966  waiting   other                   -  10  0x822df00021966  perl

kdebug> bt 21966
stack trace for thread 21966 "perl"
    kernel stack: 0x822df000 to 0x822e3000
      user stack: 0x710f1000 to 0x720f1000
frame               caller     <image>:function + offset
 0 822e2d94 (+  80) 8007d789   <kernel_x86> thread_block + 0xf9
 1 822e2de4 (+  48) 800d710a   <kernel_x86> fifo::Inode<0xec2cf478>::WaitForReadRequest(fifo::ReadRequest&: 0x822e2e94) + 0x8a
 2 822e2e14 (+  64) 800d7302   <kernel_x86> fifo::Inode<0xec2cf478>::ReadDataFromBuffer(void*: 0x194c12a8, unsigned long*: 0x822e2e90, false, true, fifo::ReadR
equest&: 0x822e2e94) + 0x72
 3 822e2e54 (+ 112) 800d752b   <kernel_x86> _ZN4fifoL9fifo_readEP9fs_volumeP8fs_vnodePvxS4_Pm + 0xcb
 4 822e2ec4 (+  48) 800e04cd   <kernel_x86> file_read(file_descriptor*: 0xf4f42440, int64: 0, void*: 0x194c12a8, unsigned long*: 0x822e2f50) + 0x3d
 5 822e2ef4 (+  80) 800d5122   <kernel_x86> common_user_io(int32: 424415912, int64: 8192, void*: 0x194c12a8, uint32: 0x2000 (8192), false) + 0x102
 6 822e2f44 (+ 100) 80129c0f   <kernel_x86> handle_syscall + 0xdc
user iframe at 0x822e2fa8 (end = 0x822e3000)
 eax 0x8e          ebx 0x9d910c       ecx 0x720ee7ac  edx 0x6110c114
 esi 0x19419ba8    edi 0x4            ebp 0x720ee7d8  esp 0x822e2fdc
 eip 0x6110c114 eflags 0x3202    user esp 0x720ee7ac
 vector: 0x63, error code: 0x0
 7 822e2fa8 (+   0) 6110c114   <commpage> commpage_syscall + 0x04
 8 720ee7d8 (+  48) 00c31fae   <libperl.so> PerlIOUnix_read + 0x55
 9 720ee808 (+  32) 00c3534a   <libperl.so> Perl_PerlIO_read + 0x39
10 720ee828 (+  64) 00c354bd   <libperl.so> PerlIOBuf_fill + 0x145
11 720ee868 (+  32) 00c33d56   <libperl.so> Perl_PerlIO_fill + 0x33
12 720ee888 (+  64) 00c352a6   <libperl.so> PerlIOBase_read + 0xe8
13 720ee8c8 (+  32) 00c3555e   <libperl.so> PerlIOBuf_read + 0x3e
14 720ee8e8 (+  32) 00c3534a   <libperl.so> Perl_PerlIO_read + 0x39
15 720ee908 (+  48) 00c36c91   <libperl.so> PerlIO_getc + 0x20
16 720ee938 (+8384) 00be2565   <libperl.so> Perl_sv_gets + 0x8f7
17 720f09f8 (+ 112) 00bd0654   <libperl.so> Perl_do_readline + 0x576
18 720f0a68 (+  48) 00bd0dbe   <libperl.so> Perl_pp_readline + 0x27e
19 720f0a98 (+  16) 00bcc3f5   <libperl.so> Perl_runops_standard + 0x21
20 720f0aa8 (+ 160) 00b6b632   <libperl.so> Perl_call_sv + 0x302
21 720f0b48 (+ 128) 00b6d0d4   <libperl.so> Perl_call_list + 0x169
22 720f0bc8 (+  48) 00b57dbd   <libperl.so> _init (nearest) + 0x4ee5
23 720f0bf8 (+ 112) 00b66b85   <libperl.so> Perl_newATTRSUB_flags + 0xc80
24 720f0c68 (+  48) 00b66d0c   <libperl.so> Perl_newATTRSUB + 0x28
25 720f0c98 (+  80) 00b68bb9   <libperl.so> Perl_utilize + 0x5a4
26 720f0ce8 (+ 112) 00b92df8   <libperl.so> Perl_yyparse + 0x855
27 720f0d58 (+ 272) 00b70f3a   <libperl.so> perl_parse + 0x1af0
28 720f0e68 (+  64) 015f9bf3   <perl> main + 0x99
29 720f0ea8 (+  64) 015f9a1b   <perl> _start + 0x50
30 720f0ee8 (+  80) 02224fc1   </boot/system/runtime_loader@0x02214000> <unknown> + 0x10fc1
31 720f0f38 (+   0) 6110c250   <commpage> commpage_thread_exit + 0x00

comment:14 by phoudoin, 7 years ago

Resolution: fixed
Status: closedreopened

Yes, I guess I'll have to notify B_SELECT_ERROR too on pipe close. Software exiting on select() error are broken by my change currently.

comment:15 by phoudoin, 7 years ago

Working on it.

comment:16 by phoudoin, 7 years ago

Looking for a less complex reproduction testcase than "building webkit" :-\

comment:17 by phoudoin, 7 years ago

Pushed a potential regression fix in hrev51343

comment:18 by diver, 7 years ago

Updated to hrev51343. cmake seems to suffer from this regression.

To reproduce: hp marble

comment:19 by gbonvehi, 7 years ago

Looking at the patch it seems you are missing brackets on the first if where you added more notifications, that may be triggering the new error.

Last edited 7 years ago by gbonvehi (previous) (diff)

comment:20 by phoudoin, 7 years ago

Thanks, I'll debug the root cause deeper as soon as time permits. Meanwhile, I've reverted my changes so that regression don't make the situation worse than before to more people than before.

comment:21 by pulkomandy, 7 years ago

patch: 10

comment:22 by nzimmermann, 7 years ago

Any news here Philippe?

comment:23 by waddlesplash, 6 years ago

Milestone: R1/beta1Unscheduled

comment:24 by phoudoin, 5 years ago

Sorry, no news, I was away from Haiku since that long time alas.

comment:25 by gbonvehi, 5 years ago

Just to be clear, the missing brackets I was referring to is in commit https://git.haiku-os.org/haiku/commit/?id=hrev51343

@@ -616,14 +616,17 @@ Inode::NotifyEndClosed(bool writer)
 
 			if (fReadSelectSyncPool)
 				notify_select_event_pool(fReadSelectSyncPool, B_SELECT_DISCONNECTED);
+				notify_select_event_pool(fReadSelectSyncPool, B_SELECT_ERROR);
+				notify_select_event_pool(fReadSelectSyncPool, B_SELECT_READ);
 		}

comment:26 by phoudoin, 5 years ago

Well, sure, the bracket is clearly missing, but it only skip the check on fReadSelectPool before calling notify_select_event_pool(). I fail to see how it can explain the pipe hanging regression.

Anyway, as @waddlesplash was working on a refactoring of select() implementation (https://review.haiku-os.org/c/haiku/+/1742), which will allow to raise multiple events at once, maybe it will be easier to try to fix this issue after it's merged.

Last edited 5 years ago by phoudoin (previous) (diff)

comment:27 by korli, 19 months ago

Could someone check the behavior of the provided test on FreeBSD or MacOSX? It looks to me like it could be a linux specific behavior.

comment:28 by pulkomandy, 17 months ago

The test program is not showing the problem anymore, it exits cleanly.

Was it fixed in hrev57013 maybe?

comment:29 by korli, 17 months ago

Last time I checked, the SIGCHILD signal was delivered before the select read event (maybe one out ten times)

comment:30 by pulkomandy, 17 months ago

Should the test be updated to retry the poll on EINTR then?

in reply to:  30 comment:31 by korli, 17 months ago

Replying to pulkomandy:

Should the test be updated to retry the poll on EINTR then?

I would say yes, but nonetheless I feel it's not logical to deliver the signal before other conditions which should have led to a reschedule.

comment:32 by pulkomandy, 17 months ago

I have updated the test (the version in the git repo) to do this, and print a warning if the SIGCHLD comes first. I'm still unable to reproduce the problem initially reported in this ticket, poll returns an "error" pollable FD as expected, I think.

comment:33 by phoudoin, 17 months ago

If it fix the initial issue, fine let's close this old ticket then. For context, I discovered this issue while rewriting TextSearch engine by pipping the input to 'xargs grep'. xargs process never exited. That why instead I used xargs EOF markup option.

comment:34 by pulkomandy, 17 months ago

The reason I'm researching this is because I tried to use expect (initially in trying to run the binutils testsuite) and I'm facing similar problems (it fails to notice the process it runs have terminated). But expect code relies on tcl, both are quite complicated, and I have not found yet what exactly is broken, but it seems similar to this.

Note: See TracTickets for help on using tickets.