Opened 16 years ago

Closed 9 years ago

#2963 closed bug (fixed)

select() doesn't wait after non-blocking connect()

Reported by: bonefish Owned by: axeld
Priority: normal Milestone: R1
Component: Network & Internet/TCP Version: R1/Development
Keywords: Cc: haiku@…
Blocked By: Blocking:
Platform: All

Description

hrev28311

select() returns immediately after connect()ing a non-blocking TCP socked. Expected behavior is that it should wait until the connection has been established.

The reason is apparently that after finishing the TCPEndpoint::Connect() the endpoint's state is SYNCHRONIZE_SENT, which is_writable() classifies as writable.

Attached test program connects to www.google.com:80. On my machine it reports usually 20 - 30 ms to establish the connection in Linux, but only a few microseconds in Haiku.

Attachments (3)

Change History (12)

by bonefish, 16 years ago

Attachment: nonblock-connect-test.cpp added

comment:1 by scottmc, 16 years ago

Cc: haiku@… added

comment:2 by pulkomandy, 11 years ago

ping?

Apparently the lockup problems we have in Web+ are because things are waiting for too long on a connect(). Trying to close the socket while it is connect()ing doesn't unlock the connect call, so we wait for the whole 75 seconds timeout on that (things works better for unlocking read() and write()).

I wanted to try using non-blocking mode and waiting on a select() for the connection to actually happen, hoping that closing the socket would unlock select(). But select doesn't block at all, so that solution can't work.

comment:3 by pulkomandy, 11 years ago

I had a look at improving this today.

I modified TCPEndPoint::SendAvailable() to return 0 when the fState is SYNCHRONIZE_SENT. This gets select() to properly lock, waiting for the connection to establish.

However, the next problem is it doesn't get unlocked when the connection actually happens. I also tried closing the socket from another thread (this is what I'm after), and this also didn't work. So, the select stays locked forever (or until timeout).

I tried adding my socket to the read set for the select call, but this didn't help. The "error" case is not implemented, as far as I could tell.

Also, it's easier to test this by using a non-existing IP to connect to. You can easily tell apart the cases where it works as expected (long timeout for the select) from those where it doesn't (selectreturs very fast).

I also tried making the socket blocking again after calling connect, but select still doesn't block with the current code. And still blocks infinitely with my modified one.

Any hint as to what to do next?

comment:4 by anevilyak, 11 years ago

Version: R1/pre-alpha1R1/Development

comment:5 by hamish, 9 years ago

patch: 01

comment:6 by hamish, 9 years ago

The second patch fixes the issue. Attaching here in case anyone wants to review. The first patch just fixes a small TODO for B_SELECT_ERROR.

I have half a mind to also change TCPEndpoint::SendData so it waits for the connection to establish (or returns B_WOULD_BLOCK) before putting any data into the queue, as this would make it consistent with the B_SELECT_WRITE notification. This is how Linux behaves. FreeBSD will happily queue data for sending before the connection is established though. Thoughts?

comment:7 by axeld, 9 years ago

Patch looks good AFAICT. If POSIX doesn't have a say on the matter, I'd follow the BSDs, as those pretty much invented the thing :-)

From a programmer's perspective, I think the FreeBSD approach is nicer for lazy developers like me; in practice, one can rarely depend on this, anyway, if other systems don't follow the same behavior, so I wouldn't mind either way.

comment:8 by hamish, 9 years ago

Following the BSDs seems like a pretty safe option indeed. Can't find any regressions with blocking or non-blocking sockets, so pushing now.

comment:9 by hamish, 9 years ago

Resolution: fixed
Status: newclosed

Fixed in hrev49217.

Note: See TracTickets for help on using tickets.