#18759 closed bug (fixed)
TCP crash, possible regression
Reported by: | axeld | Owned by: | axeld |
---|---|---|---|
Priority: | blocker | Milestone: | R1/beta5 |
Component: | Network & Internet/TCP | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
I just booted into my main Haiku VM after an update to todays revision, and was greeted by this KDL.
Attachments (2)
Change History (17)
by , 11 months ago
Attachment: | HaikuMainScreenshot-2024-01-19.png added |
---|
comment:1 by , 11 months ago
Milestone: | Unscheduled → R1/beta5 |
---|---|
Priority: | normal → blocker |
comment:2 by , 11 months ago
In net_socket.cpp, line 792 is an unchecked access to net_socket::parent, which seems to be NULL in this case.
comment:3 by , 11 months ago
But "parent" is a weak pointer, and we then check whether we actually got a reference on the next line. So that does not look like it can be the problem?
It almost looks like the socket itself might be NULL.
comment:5 by , 11 months ago
Nothing in _MarkEstablished() invokes socket_aborted directly. This looks like a tail call from something else.
The only thing in the whole tree that calls set_aborted() is TCPEndpoint::_Close. But that invokes has_parent first, so I'd expect that to have crashed instead. So this is strange.
comment:8 by , 9 months ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I guess you only fixed a symptom, although it doesn't happen directly on boot anymore. Soon after it still crashes again. Stacktrace attached.
comment:9 by , 9 months ago
This looks like a completely different problem; or maybe the two are related but the relation isn't clear. This looks to be a use-after-free.
comment:11 by , 9 months ago
I guess it doesn't make much difference so we might as well stick with this one.
comment:12 by , 9 months ago
How have we gotten to socket_dequeue_connected from tcp_receive_data? The usual path for accept (from a debug build, so no tail calls):
7 ffffffff81240cc0 (+ 48) ffffffff81ee304b </boot/system/add-ons/kernel/network/stack> socket_dequeue_connected(net_socket*, net_socket**) + 0xb9 8 ffffffff81240cf0 (+ 80) ffffffff81cef9b3 </boot/system/add-ons/kernel/network/protocols/tcp> TCPEndpoint::Accept(net_socket**) + 0x125 9 ffffffff81240d40 (+ 32) ffffffff81ced70d </boot/system/add-ons/kernel/network/protocols/tcp> tcp_accept(net_protocol*, net_socket**) + 0x23 10 ffffffff81240d60 (+ 64) ffffffff81ee38f2 </boot/system/add-ons/kernel/network/stack> socket_accept[clone .localalias] (net_socket*, sockaddr*, unsigned int*, net_socket**) + 0x52 11 ffffffff81240da0 (+ 48) ffffffff81eed127 </boot/system/add-ons/kernel/network/stack> stack_interface_accept(net_socket*, sockaddr*, unsigned int*, net_socket**) + 0x3c 12 ffffffff81240dd0 (+ 112) ffffffff801f8bb4 <kernel_x86_64> common_accept(int, sockaddr*, unsigned int*, bool) + 0x76
comment:13 by , 9 months ago
Actually, an even better question is: how have we gotten to socket_dequeue_connected without a socket_accept appearing anywhere in the stack trace? tcp_accept/TCPEndpoint::Accept isn't called anywhere within the module, and the only thing that calls it in the network module (as far as I can tell) is socket_accept, which has more logic after it calls this hook and couldn't be tail-call-optimized away. So what's going on here?
comment:14 by , 9 months ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Ah, I see the problem: your TCP addon is from "non-packaged". In hrev57408 the size of net_socket_module_info changed as some methods were removed. So that's the issue.
comment:15 by , 7 months ago
Duh, I feel stupid now! Nice catch! I had forgotten about this; I tested turning off SACK. This also fixes my 'git fetch' issues that I did not look into yet, thanks a lot!
Seems to be reproducible 100% on this VM; I got it three times in a row now.