Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

#2594 closed bug (fixed)

PANIC: ASSERT FAILED (src/add-ons/kernel/network/protocols/tcp/BufferQueue.cpp:304): buffer != __null

Reported by: stippi
Owned by: axeld
Priority: high
Milestone: R1/alpha1
Component: Network & Internet/TCP
Version: R1/pre-alpha1
Keywords:
Cc: mattmadia@…
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

Revision is hrev26680, with no patches from newer revisions applied to kernel stuff. I left the system running overnight with a Transmission download (0.70 for BONE version). Returning to it in the morning, it showed the above panic. Here is the backtrace:

stack trace for thread 2681 "torrent 0x18084e10"
...
<kernel>:panic
</boot/beos/.../protocols/tcp>:Get__11BufferQueueUlbPP10net_buffer + 0x0069
</boot/beos/.../protocols/tcp>:ReadData__11TCPEndpointUlUlPP10net_buffer + 0x0307
</boot/beos/.../protocols/tcp>:tcp_read_data__FP12net_protocolUlUlPP10net_buffer + 0x0029
</boot/beos/.../network/stack>:socket_receive__FP10net_socketP6msghdrPvUli + 0x0087
</boot/beos/.../network/stack>:stack_interface_recvfrom__FP10net_socketPvUliP8sockaddrPUi + 0x008c
<kernel>:common_recvfrom__FiPvUliP8sockaddrPUib + 0x0055
<kernel>:_user_recvfrom + 0x0091
syscall stuff
...

Attachments (7)

kdl_BufferQueue.txt (2.2 KB) - added by mmadia 10 years ago.
BufferQueue.cpp.1.diff (4.6 KB) - added by Adek336 10 years ago.
img_1568.jpg (85.8 KB) - added by Adek336 10 years ago.
smp.jpg (124.8 KB) - added by mmadia 10 years ago.
photograph of bt in kdl, amd x2 cpu. both cores enabled.
smp-disabled.jpg (106.6 KB) - added by mmadia 10 years ago.
photograph of bt in kdl, amd x2 cpu. smp disabled via boot options menu
img_1845.jpg (170.3 KB) - added by Adek336 10 years ago.
img_1846.jpg (181.6 KB) - added by Adek336 10 years ago.


Change History (34)

comment:1 Changed 11 years ago by axeld

I guess you didn't use the debugger to investigate the issue a bit? The tcp module has some useful KDL commands for problems like that :-)

In any case, it's an interesting bug. It seems to be an internal bug in the BufferQueue, maybe caused by lack of memory, but it could have any other reason, too. So I hope this happens again, eventually to me this time.

How large was the downloaded file, and how fast was it?

comment:2 Changed 11 years ago by axeld

Priority: normal → high
Summary: [TPC] PANIC: ASSERT FAILED (src/add-ons/kernel/network/protocols/tcp/BufferQueue.cpp:304): buffer != __null → PANIC: ASSERT FAILED (src/add-ons/kernel/network/protocols/tcp/BufferQueue.cpp:304): buffer != __null

comment:3 Changed 10 years ago by Adek336

Note #2706.

comment:4 Changed 10 years ago by mmadia

Cc: mattmadia@… added

I too can reproduce this KDL reliably with the same version of Transmission on hrev28822. Adding almost complete output of bt.

What other commands should I run while in KDL?

Changed 10 years ago by mmadia

Attachment: kdl_BufferQueue.txt added

comment:5 Changed 10 years ago by emitrax

I don't know about the kernel debugger command for the network stack, Axel will probably tell you more, but it might be useful to enable the debug output. That's done by uncommenting line 15 in src/add-ons/kernel/network/protocols/tcp/BufferQueue.cpp.
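For readers unfamiliar with this kind of switch, here is a minimal sketch of a compile-time debug toggle. The actual contents of line 15 are not shown in this ticket, so the macro names `TRACE_QUEUE` and `TRACE` and the helper `AddSegment` are hypothetical and only illustrate the general pattern of a commented-out `#define` that gates trace output.

```cpp
#include <cstdio>

// Hypothetical switch: uncomment the next line to compile trace output in.
//#define TRACE_QUEUE

#ifdef TRACE_QUEUE
#	define TRACE(...) fprintf(stderr, __VA_ARGS__)
#else
#	define TRACE(...) do { } while (false)
#endif

// Hypothetical helper: returns the new number of queued bytes and, when
// tracing is enabled, logs the operation.
int
AddSegment(int queuedBytes, int segmentSize)
{
	TRACE("adding %d bytes to %d queued\n", segmentSize, queuedBytes);
	return queuedBytes + segmentSize;
}
```

With the `#define` commented out, the `TRACE` calls compile to nothing, so the module behaves the same but stays silent.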

comment:6 Changed 10 years ago by axeld

Status: new → assigned

comment:7 Changed 10 years ago by axeld

This should be fixed with hrev28859 - can you please retest to check whether that was all? Thanks to Adrian, btw, who pointed me to the problem (in bug #2706).

comment:8 Changed 10 years ago by Adek336

The problem is still with us.

Changed 10 years ago by Adek336

Attachment: BufferQueue.cpp.1.diff added

comment:9 Changed 10 years ago by Adek336

seems to fix the problem

comment:10 Changed 10 years ago by Adek336

I checked with the old code + Verify() (but with trace messages instead of panics), that the buffer queue gets broken many minutes earlier than the buffer != null panic happens.

Also, the bug would probably break the data sent through tcp, because data from some segments would be duplicated in the buffer queue.
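A consistency check of the kind described above could look roughly like the following sketch. It is not the real `Verify()` from BufferQueue.cpp; `Segment` and `VerifyQueue` are hypothetical names, and the sketch assumes sequence numbers do not wrap around within the queue. It reports problems with trace messages instead of panicking, and it flags both empty segments and overlapping segments (an overlap means some data would be duplicated in the queue).

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a queued TCP segment: the sequence number of its
// first byte and its length in bytes. This is not the real net_buffer.
struct Segment {
	uint32_t sequence;
	uint32_t size;
};

// Returns the index of the first segment that violates an invariant, or -1
// if the queue is consistent. Problems are traced rather than panicked on.
// Assumption: sequence numbers do not wrap around within the queue.
int
VerifyQueue(const std::vector<Segment>& queue)
{
	uint64_t previousEnd = 0;
	for (size_t i = 0; i < queue.size(); i++) {
		const Segment& segment = queue[i];
		if (segment.size == 0) {
			fprintf(stderr, "segment %zu is empty\n", i);
			return (int)i;
		}
		// A segment starting before the previous one ends is either out of
		// order or duplicates data that is already queued.
		if (i > 0 && segment.sequence < previousEnd) {
			fprintf(stderr, "segment %zu overlaps its predecessor\n", i);
			return (int)i;
		}
		previousEnd = (uint64_t)segment.sequence + segment.size;
	}
	return -1;
}
```

Running such a check after every insertion, rather than only when data is read, is what makes it possible to see the queue break well before the eventual panic.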

comment:11 Changed 10 years ago by axeld

Thanks Adrian! I've cleaned up your patch, and fixed a few more problems in hrev28878. I will look into writing a test for BufferQueue next week, though, to make sure it's really okay now (seeing how many bugs proof-reading revealed already).

I will close this bug once the test app is in place.

Changed 10 years ago by Adek336

Attachment: img_1568.jpg added

comment:12 Changed 10 years ago by Adek336

hrev28878, BufferQueue.cpp:454: assert failed buffer->size > 0

comment:13 Changed 10 years ago by axeld

Resolution: fixed
Status: assigned → closed

Everything should work fine now, with hrev28883.

comment:14 Changed 10 years ago by Adek336

assert failed buffer->size > 0 still happens.

comment:15 Changed 10 years ago by axeld

Resolution: fixed
Status: closed → reopened

How exactly can you reproduce this so fast? :-)

Thanks for the note, it seems to be really hard to get that right. I fixed one occurrence of that assert with the test app, so I thought I got it. At least that's the least important assert :-)

comment:16 Changed 10 years ago by Adek336

I can reproduce it in just a few minutes with transmission + 30 torrents :-)

comment:17 Changed 10 years ago by mmadia

Would a Haiku binary of v0.7x help with testing? Adek336, does this still occur with the v1.42 at http://www.haikuware.com/view-details/development/app-installation/transmission-142?

comment:18 in reply to:  17 Changed 10 years ago by Adek336

Replying to mmadia:

Would a Haiku binary of v0.7x help with testing? Adek336, does this still occur with the v1.42 at http://www.haikuware.com/view-details/development/app-installation/transmission-142?

Indeed, it happens with both transmission 0.4 and 1.42, and the bug is clearly in the kernel side of things. (Btw, transmission 1.42 daemon crashes to userland debugging very quickly, does it crash so quickly for you as well?)

comment:19 Changed 10 years ago by anevilyak

Can you try again with hrev28937 or newer?

comment:20 Changed 10 years ago by Adek336

In hrev28945 it is assert failed, BufferQueue.cpp:166, "next == null
buffer == null next->seque".

Changed 10 years ago by mmadia

Attachment: smp.jpg added

photograph of bt in kdl, amd x2 cpu. both cores enabled.

Changed 10 years ago by mmadia

Attachment: smp-disabled.jpg added

photograph of bt in kdl, amd x2 cpu. smp disabled via boot options menu

comment:21 Changed 10 years ago by mmadia

Added two photographs of the resulting KDL. Tested on revisions hrev28947~49 using Transmission 0.70-bone.

comment:22 Changed 10 years ago by axeld

Milestone: R1 → R1/alpha1

Maybe I should just stop adding new assertions to the code ;-)

Next time that happens, what's more interesting than a stack crawl is a dump of the buffer passed in and the buffer queue itself (via dumping the TCP connection).

Changed 10 years ago by Adek336

Attachment: img_1845.jpg added

comment:23 Changed 10 years ago by Adek336

Here you go!

comment:24 Changed 10 years ago by axeld

Resolution: fixed
Status: reopened → closed

Thanks, Adrian! Turns out I tried to reproduce the bug with a version of the module that didn't have the assert activated...

Anyway, it's fixed now, since hrev28958 - the assert was just wrong.

Changed 10 years ago by Adek336

Attachment: img_1846.jpg added

comment:25 Changed 10 years ago by Adek336

Resolution: fixed
Status: closed → reopened

There's still an empty buffer there.

comment:26 Changed 10 years ago by axeld

Resolution: fixed
Status: reopened → closed

Thanks, fixed in hrev28967, finally.

comment:27 Changed 10 years ago by Adek336

Works well, great job, thanks!
