#18327 closed bug (fixed)
select() does not report disconnect events
Reported by: | LekKit | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta5 |
Component: | Network & Internet/TCP | Version: | R1/beta4 |
Keywords: | select, tcp, network | Cc: | |
Blocked By: | Blocking: | ||
Platform: | All |
Description (last modified by )
When another end of the TCP socket is closed, select() should report that in rfds set. It doesn't as of hrev56862, and doesn't wake from blocking either. This prevents any software using select() from tracking disconnect events, and freeing clients appropriately.
Attached a tiny program that reproduces this. On systems susceptible to the bug it'd hang and never exit.
Update: The poll() facility is subject to this bug as well, i.e. listening for POLLIN | POLLERR | POLLHUP never notices a HUP event.
Update (2): Haiku-specific API wait_for_objects() from <OS.h> also hangs.
Attachments (2)
Change History (22)
by , 20 months ago
Attachment: | haiku_select_repro.c added |
---|
comment:1 by , 20 months ago
Description: | modified (diff) |
---|
comment:2 by , 20 months ago
Confirmed to hang on slightly older hrev56578 as well.
comment:3 by , 20 months ago
Description: | modified (diff) |
---|
comment:4 by , 20 months ago
follow-up: 7 comment:5 by , 20 months ago
Non-TCP pipe() descriptors are properly notifying poll() upon close of another end of the pipe, so it is really a TCP-specific issue.
P.S. Using pipes with select() just doesn't work at all, perhaps that should be an unrelated issue (Or lacking implementation?), but let's ignore that for now.
comment:6 by , 20 months ago
follow-ups: 8 9 comment:7 by , 20 months ago
Replying to LekKit:
P.S. Using pipes with select() just doesn't work at all, perhaps that should be an unrelated issue (Or lacking implementation?), but let's ignore that for now.
Please create another ticket if you can.
comment:8 by , 20 months ago
comment:9 by , 20 months ago
Replying to X512:
Replying to LekKit:
P.S. Using pipes with select() just doesn't work at all, perhaps that should be an unrelated issue (Or lacking implementation?), but let's ignore that for now.
Please create another ticket if you can.
Okay sorry, I must have mixed things up. It works now? Either my older test wasn't correct, or it was some older Haiku revision in a VM and it is now fixed.
Back to this issue topic then.
comment:10 by , 18 months ago
Milestone: | Unscheduled → R1/beta5 |
---|---|
Version: | R1/beta5 → R1/beta4 |
comment:12 by , 9 months ago
Regressed again between hrev57545 - hrev57554: Reproduction tests fail on the latter and up to the newest hrev57617.
Might have been introduced by https://git.haiku-os.org/haiku/commit/src/add-ons/kernel/network/protocols/tcp?id=4fec81750d7b9fabdb3230ed02a71d6065efa00b
This issue happens to affect my piece of software, which is what brought me to report this originally. Today I got a similar bug report so here we are again.
Link to upstream RVVM issue if anyone is interested in details: https://github.com/LekKit/RVVM/issues/127
comment:13 by , 9 months ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
follow-up: 15 comment:14 by , 9 months ago
I guess it must be that commit because there isn't anything else in that range. But that's very odd, because that commit only changes the send path; it doesn't affect the receive path or any of the logic leading to notify(B_SELECT_ERROR) being called, except for one refactor change to _Receive().
comment:15 by , 9 months ago
Replying to waddlesplash:
I guess it must be that commit because there isn't anything else in that range. But that's very odd, because that commit only changes the send path; it doesn't affect the receive path or any of the logic leading to notify(B_SELECT_ERROR) being called, except for one refactor change to _Receive().
It is possible that the tables have turned and now disconnection isn't properly handled by the sending side (I.e. we close()
the socket, but FIN
never actually reaches the other side). I am not very familiar with Haiku networking stack tho to say for sure.
I have added simple blocking recv() to haiku_select_repro.c
right after close(conn)
line:
close(conn); // Simulate a disconnect, implicitly does shutdown() char buffer[16]; recv(client, buffer, sizeof(buffer), 0);
It should return 0 (EOF) immediately, but actually blocks forever. So this new bug is not only related to select()
.
comment:16 by , 9 months ago
I am now almost sure this is related to sending side rather than receiving side or select()
syscall, even tho it is reproducible with same tests as the older bug.
I have split my test into a client & server, ran a client on Haiku and server on Linux. After Haiku client closes it's connection, the server isn't notified of the disconnect - FIN segment is never sent by Haiku.
comment:18 by , 9 months ago
Perhaps a unit test is needed to prevent similar breakage in future? Something based on my reproduction tests, but with more testing like accept()
inheriting nonblocking flag, checking return value of recv()
on disconnecting socket, etc
comment:19 by , 9 months ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Fixed in hrev57626.
Yes, we could use more unit tests here, I will have to look at that. (But we don't run the ones we have very consistently...)
comment:20 by , 8 months ago
It looks like the in-tree tcp_connection_test already would've caught this problem if it'd been run.
Reproduces the issue