Opened 20 months ago

Closed 9 months ago

Last modified 8 months ago

#18327 closed bug (fixed)

select() does not report disconnect events

Reported by: LekKit Owned by: axeld
Priority: normal Milestone: R1/beta5
Component: Network & Internet/TCP Version: R1/beta4
Keywords: select, tcp, network Cc:
Blocked By: Blocking:
Platform: All

Description (last modified by LekKit)

When another end of the TCP socket is closed, select() should report that in rfds set. It doesn't as of hrev56862, and doesn't wake from blocking either. This prevents any software using select() from tracking disconnect events, and freeing clients appropriately.

Attached a tiny program that reproduces this. On systems susceptible to the bug it'd hang and never exit.

Update: The poll() facility is subject to this bug as well, i.e. listening for POLLIN | POLLERR | POLLHUP never notices a HUP event.

Update (2): Haiku-specific API wait_for_objects() from <OS.h> also hangs.

Attachments (2)

haiku_select_repro.c (1.5 KB ) - added by LekKit 20 months ago.
Reproduces the issue
haiku_poll_repro.c (1.7 KB ) - added by LekKit 20 months ago.
Reproduction, poll() variant

Download all attachments as: .zip

Change History (22)

by LekKit, 20 months ago

Attachment: haiku_select_repro.c added

Reproduces the issue

comment:1 by LekKit, 20 months ago

Description: modified (diff)

by LekKit, 20 months ago

Attachment: haiku_poll_repro.c added

Reproduction, poll() variant

comment:2 by LekKit, 20 months ago

Confirmed to hang on slightly older hrev56578 as well.

Version 0, edited 20 months ago by LekKit (next)

comment:3 by LekKit, 20 months ago

Description: modified (diff)

comment:4 by LekKit, 20 months ago

Both tests pass on hrev45141, so it's a regression at some point between those. POLLIN is received in case of the poll() implementation, same as Linux does.

Bisected the bug was introduced between hrev55733 and hrev55747. No other builds in between that I could find, someone might know better.

Last edited 20 months ago by LekKit (previous) (diff)

comment:5 by LekKit, 20 months ago

Non-TCP pipe() descriptors are properly notifying poll() upon close of another end of the pipe, so it is really a TCP-specific issue.

P.S. Using pipes with select() just doesn't work at all, perhaps that should be an unrelated issue (Or lacking implementation?), but let's ignore that for now.

in reply to:  5 ; comment:7 by X512, 20 months ago

Replying to LekKit:

P.S. Using pipes with select() just doesn't work at all, perhaps that should be an unrelated issue (Or lacking implementation?), but let's ignore that for now.

Please create another ticket if you can.

in reply to:  7 comment:8 by LekKit, 20 months ago

Replying to X512:

Replying to LekKit:

P.S. Using pipes with select() just doesn't work at all, perhaps that should be an unrelated issue (Or lacking implementation?), but let's ignore that for now.

Please create another ticket if you can.

Sure. Will come up with a detailed reproduction and do so.

in reply to:  7 comment:9 by LekKit, 20 months ago

Replying to X512:

Replying to LekKit:

P.S. Using pipes with select() just doesn't work at all, perhaps that should be an unrelated issue (Or lacking implementation?), but let's ignore that for now.

Please create another ticket if you can.

Okay sorry, I must have mixed things up. It works now? Either my older test wasn't correct, or it was some older Haiku revision in a VM and it is now fixed.

Back to this issue topic then.

comment:10 by korli, 18 months ago

Milestone: UnscheduledR1/beta5
Version: R1/beta5R1/beta4

comment:11 by waddlesplash, 18 months ago

Resolution: fixed
Status: newclosed

Fixes merged in hrev57013.

comment:12 by LekKit, 9 months ago

Regressed again between hrev57545 - hrev57554: Reproduction tests fail on the latter and up to the newest hrev57617.

Might have been introduced by https://git.haiku-os.org/haiku/commit/src/add-ons/kernel/network/protocols/tcp?id=4fec81750d7b9fabdb3230ed02a71d6065efa00b

This issue happens to affect my piece of software, which is what brought me to report this originally. Today I got a similar bug report so here we are again.

Link to upstream RVVM issue if anyone is interested in details: https://github.com/LekKit/RVVM/issues/127

Last edited 9 months ago by LekKit (previous) (diff)

comment:13 by LekKit, 9 months ago

Resolution: fixed
Status: closedreopened

comment:14 by waddlesplash, 9 months ago

I guess it must be that commit because there isn't anything else in that range. But that's very odd, because that commit only changes the send path; it doesn't affect the receive path or any of the logic leading to notify(B_SELECT_ERROR) being called, except for one refactor change to _Receive().

in reply to:  14 comment:15 by LekKit, 9 months ago

Replying to waddlesplash:

I guess it must be that commit because there isn't anything else in that range. But that's very odd, because that commit only changes the send path; it doesn't affect the receive path or any of the logic leading to notify(B_SELECT_ERROR) being called, except for one refactor change to _Receive().

It is possible that the tables have turned and now disconnection isn't properly handled by the sending side (I.e. we close() the socket, but FIN never actually reaches the other side). I am not very familiar with Haiku networking stack tho to say for sure.

I have added simple blocking recv() to haiku_select_repro.c right after close(conn) line:

    close(conn); // Simulate a disconnect, implicitly does shutdown()

    char buffer[16];
    recv(client, buffer, sizeof(buffer), 0);

It should return 0 (EOF) immediately, but actually blocks forever. So this new bug is not only related to select().

comment:16 by LekKit, 9 months ago

I am now almost sure this is related to sending side rather than receiving side or select() syscall, even tho it is reproducible with same tests as the older bug.

I have split my test into a client & server, ran a client on Haiku and server on Linux. After Haiku client closes it's connection, the server isn't notified of the disconnect - FIN segment is never sent by Haiku.

comment:18 by LekKit, 9 months ago

Perhaps a unit test is needed to prevent similar breakage in future? Something based on my reproduction tests, but with more testing like accept() inheriting nonblocking flag, checking return value of recv() on disconnecting socket, etc

comment:19 by waddlesplash, 9 months ago

Resolution: fixed
Status: reopenedclosed

Fixed in hrev57626.

Yes, we could use more unit tests here, I will have to look at that. (But we don't run the ones we have very consistently...)

comment:20 by waddlesplash, 8 months ago

It looks like the in-tree tcp_connection_test already would've caught this problem if it'd been run.

Note: See TracTickets for help on using tickets.