Opened 16 years ago
Closed 10 years ago
#2963 closed bug (fixed)
select() doesn't wait after non-blocking connect()
Reported by: | bonefish | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | Network & Internet/TCP | Version: | R1/Development |
Keywords: | Cc: | haiku@… | |
Blocked By: | Blocking: | ||
Platform: | All |
Description
select() returns immediately after connect()ing a non-blocking TCP socked. Expected behavior is that it should wait until the connection has been established.
The reason is apparently that after finishing the TCPEndpoint::Connect() the endpoint's state is SYNCHRONIZE_SENT, which is_writable() classifies as writable.
Attached test program connects to www.google.com:80. On my machine it reports usually 20 - 30 ms to establish the connection in Linux, but only a few microseconds in Haiku.
Attachments (3)
Change History (12)
by , 16 years ago
Attachment: | nonblock-connect-test.cpp added |
---|
comment:1 by , 16 years ago
Cc: | added |
---|
comment:2 by , 11 years ago
comment:3 by , 11 years ago
I had a look at improving this today.
I modified TCPEndPoint::SendAvailable() to return 0 when the fState is SYNCHRONIZE_SENT. This gets select() to properly lock, waiting for the connection to establish.
However, the next problem is it doesn't get unlocked when the connection actually happens. I also tried closing the socket from another thread (this is what I'm after), and this also didn't work. So, the select stays locked forever (or until timeout).
I tried adding my socket to the read set for the select call, but this didn't help. The "error" case is not implemented, as far as I could tell.
Also, it's easier to test this by using a non-existing IP to connect to. You can easily tell apart the cases where it works as expected (long timeout for the select) from those where it doesn't (selectreturs very fast).
I also tried making the socket blocking again after calling connect, but select still doesn't block with the current code. And still blocks infinitely with my modified one.
Any hint as to what to do next?
comment:4 by , 11 years ago
Version: | R1/pre-alpha1 → R1/Development |
---|
by , 10 years ago
Attachment: | 0001-net_socket-notify-of-socket-errors-immediately-when-.patch added |
---|
comment:5 by , 10 years ago
patch: | 0 → 1 |
---|
by , 10 years ago
Attachment: | 0002-tcp-wait-for-connection-to-complete-before-notifying.patch added |
---|
comment:6 by , 10 years ago
The second patch fixes the issue. Attaching here in case anyone wants to review. The first patch just fixes a small TODO for B_SELECT_ERROR
.
I have half a mind to also change TCPEndpoint::SendData
so it waits for the connection to establish (or returns B_WOULD_BLOCK) before putting any data into the queue, as this would make it consistent with the B_SELECT_WRITE
notification. This is how Linux behaves. FreeBSD will happily queue data for sending before the connection is established though. Thoughts?
comment:7 by , 10 years ago
Patch looks good AFAICT. If POSIX doesn't have a say on the matter, I'd follow the BSDs, as those pretty much invented the thing :-)
From a programmer's perspective, I think the FreeBSD approach is nicer for lazy developers like me; in practice, one can rarely depend on this, anyway, if other systems don't follow the same behavior, so I wouldn't mind either way.
comment:8 by , 10 years ago
Following the BSDs seems like a pretty safe option indeed. Can't find any regressions with blocking or non-blocking sockets, so pushing now.
ping?
Apparently the lockup problems we have in Web+ are because things are waiting for too long on a connect(). Trying to close the socket while it is connect()ing doesn't unlock the connect call, so we wait for the whole 75 seconds timeout on that (things works better for unlocking read() and write()).
I wanted to try using non-blocking mode and waiting on a select() for the connection to actually happen, hoping that closing the socket would unlock select(). But select doesn't block at all, so that solution can't work.