Opened 16 years ago

Closed 16 years ago

Last modified 16 years ago

#2594 closed bug (fixed)

PANIC: ASSERT FAILED (src/add-ons/kernel/network/protocols/tcp/BufferQueue.cpp:304): buffer != __null

Reported by: stippi
Owned by: axeld
Priority: high
Milestone: R1/alpha1
Component: Network & Internet/TCP
Version: R1/pre-alpha1
Keywords:
Cc: mattmadia@…
Blocked By:
Blocking:
Platform: All

Description

Revision is hrev26680, with no patches from newer revisions applied to the kernel. I left the system running overnight with a Transmission download (version 0.70 for BONE). When I returned to it in the morning, it showed the above panic. Here is the backtrace:

stack trace for thread 2681 "torrent 0x18084e10"
...
<kernel>:panic
</boot/beos/.../protocols/tcp>:Get__11BufferQueueUlbPP10net_buffer + 0x0069
</boot/beos/.../protocols/tcp>:ReadData__11TCPEndpointUlUlPP10net_buffer + 0x0307
</boot/beos/.../protocols/tcp>:tcp_read_data__FP12net_protocolUlUlPP10net_buffer + 0x0029
</boot/beos/.../network/stack>:socket_receive__FP10net_socketP6msghdrPvUli + 0x0087
</boot/beos/.../network/stack>:stack_interface_recvfrom__FP10net_socketPvUliP8sockaddrPUi + 0x008c
<kernel>:common_recvfrom__FiPvUliP8sockaddrPUib + 0x0055
<kernel>:_user_recvfrom + 0x0091
syscall stuff
...

Attachments (7)

kdl_BufferQueue.txt (2.2 KB ) - added by mmadia 16 years ago.
BufferQueue.cpp.1.diff (4.6 KB ) - added by Adek336 16 years ago.
img_1568.jpg (85.8 KB ) - added by Adek336 16 years ago.
smp.jpg (124.8 KB ) - added by mmadia 16 years ago.
photograph of bt in kdl, amd x2 cpu. both cores enabled.
smp-disabled.jpg (106.6 KB ) - added by mmadia 16 years ago.
photograph of bt in kdl, amd x2 cpu. smp disabled via boot options menu
img_1845.jpg (170.3 KB ) - added by Adek336 16 years ago.
img_1846.jpg (181.6 KB ) - added by Adek336 16 years ago.


Change History (34)

comment:1 by axeld, 16 years ago

I guess you didn't use the debugger to investigate the issue a bit? The tcp module has some useful KDL commands for problems like that :-)

In any case, it's an interesting bug. It seems to be an internal bug in the BufferQueue, maybe caused by lack of memory, but it could have any other cause, too. So I hope this happens again, possibly to me this time.

How large was the downloaded file, and how fast was it?

comment:2 by axeld, 16 years ago

Priority: normal → high
Summary: [TPC] PANIC: ASSERT FAILED (src/add-ons/kernel/network/protocols/tcp/BufferQueue.cpp:304): buffer != __null → PANIC: ASSERT FAILED (src/add-ons/kernel/network/protocols/tcp/BufferQueue.cpp:304): buffer != __null

comment:3 by Adek336, 16 years ago

Note #2706.

comment:4 by mmadia, 16 years ago

Cc: mattmadia@… added

I too can reproduce this KDL reliably with the same version of Transmission on hrev28822. I am attaching the almost complete output of bt.

What other commands should I run while in KDL?

by mmadia, 16 years ago

Attachment: kdl_BufferQueue.txt added

comment:5 by emitrax, 16 years ago

I don't know about the kernel debugger command for the network stack, Axel will probably tell you more, but it might be useful to enable the debug output. That's done by uncommenting line 15 in src/add-ons/kernel/network/protocols/tcp/BufferQueue.cpp.
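The ticket does not quote line 15 itself, so the following is only a sketch of the usual compile-time trace switch that such a line tends to control. The macro name TRACE_BUFFER_QUEUE, the helper AddBytes(), and the use of fprintf() as a userland stand-in for kernel debug output are all assumptions made for illustration.

```cpp
#include <cstdio>

// Hypothetical switch, named for illustration only: leave it commented out
// for a quiet build, uncomment it to compile the trace output in.
//#define TRACE_BUFFER_QUEUE

#ifdef TRACE_BUFFER_QUEUE
#	define TRACE(...) std::fprintf(stderr, __VA_ARGS__)
#else
#	define TRACE(...) do { } while (false)
#endif

// Hypothetical call site: returns the new number of queued bytes and logs
// the change only when tracing is compiled in.
inline unsigned AddBytes(unsigned queued, unsigned added)
{
	TRACE("queue: %u + %u bytes\n", queued, added);
	return queued + added;
}
```

With the define commented out, the TRACE() calls compile to nothing, so the module has to be rebuilt after uncommenting it for any output to appear.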

comment:6 by axeld, 16 years ago

Status: new → assigned

comment:7 by axeld, 16 years ago

This should be fixed with hrev28859 - can you please retest to check whether that was all? Thanks to Adrian, btw, who pointed me to the problem (in bug #2706).

comment:8 by Adek336, 16 years ago

The problem is still with us.

by Adek336, 16 years ago

Attachment: BufferQueue.cpp.1.diff added

comment:9 by Adek336, 16 years ago

The attached BufferQueue.cpp.1.diff seems to fix the problem.

comment:10 by Adek336, 16 years ago

I checked with the old code + Verify() (but with trace messages instead of panics) that the buffer queue gets broken many minutes before the buffer != null panic happens.

Also, the bug would probably break the data sent through tcp, because data from some segments would be duplicated in the buffer queue.
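The kind of consistency check described above can be sketched roughly as follows. This is not the Verify() from BufferQueue.cpp, whose code is not shown in this ticket. Segment, SequenceLessEqual() and FindBrokenSegment() are hypothetical names, and the two invariants (no empty segment, no overlap between neighbours) are assumptions drawn from the asserts and the duplicated-data symptom mentioned in the comments.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a queued TCP segment; not the real net_buffer.
struct Segment {
	uint32_t sequence;	// sequence number of the first byte
	size_t size;		// number of payload bytes
};

// Wraparound-safe "a is at or before b" for 32-bit sequence numbers.
bool SequenceLessEqual(uint32_t a, uint32_t b)
{
	return static_cast<int32_t>(a - b) <= 0;
}

// Returns the index of the first segment that breaks an invariant,
// or -1 if the queue looks consistent.
int FindBrokenSegment(const std::vector<Segment>& queue)
{
	for (size_t i = 0; i < queue.size(); i++) {
		// Invariant 1: the queue never holds an empty segment.
		if (queue[i].size == 0)
			return static_cast<int>(i);

		// Invariant 2: a segment ends at or before the start of the next
		// one, otherwise some bytes would be delivered twice.
		if (i + 1 < queue.size()) {
			uint32_t end = queue[i].sequence
				+ static_cast<uint32_t>(queue[i].size);
			if (!SequenceLessEqual(end, queue[i + 1].sequence))
				return static_cast<int>(i + 1);
		}
	}
	return -1;
}
```

Running a check like this after every queue modification and logging the result, rather than panicking, is one way to find the operation that first corrupts the queue instead of the much later read that trips over it.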

comment:11 by axeld, 16 years ago

Thanks Adrian! I've cleaned up your patch, and fixed a few more problems in hrev28878. I will look into writing a test for BufferQueue next week, though, to make sure it's really okay now (seeing how many bugs proof-reading revealed already).

I will close this bug once the test app is in place.

by Adek336, 16 years ago

Attachment: img_1568.jpg added

comment:12 by Adek336, 16 years ago

hrev28878, BufferQueue.cpp:454: assert failed buffer->size > 0

comment:13 by axeld, 16 years ago

Resolution: → fixed
Status: assigned → closed

Everything should work fine now, with hrev28883.

comment:14 by Adek336, 16 years ago

assert failed buffer->size > 0 still happens.

comment:15 by axeld, 16 years ago

Resolution: fixed → (none)
Status: closed → reopened

How exactly can you reproduce this so fast? :-)

Thanks for the note, it seems to be really hard to get that right. I fixed one occurrence of that assert with the test app, so I thought I got it. At least that's the least important assert :-)

comment:16 by Adek336, 16 years ago

I can reproduce it in just a few minutes with transmission + 30 torrents :-)

comment:17 by mmadia, 16 years ago

Would a Haiku binary of v0.7x help with testing? Adek336, does this still occur with the v1.42 at http://www.haikuware.com/view-details/development/app-installation/transmission-142

comment:18 by Adek336, 16 years ago (in reply to: comment:17)

Replying to mmadia:

> Would a Haiku binary of v0.7x help with testing? Adek336, does this still occur with the v1.42 at http://www.haikuware.com/view-details/development/app-installation/transmission-142

Indeed, it happens with both transmission 0.4 and 1.42, and the bug is clearly in the kernel side of things. (Btw, transmission 1.42 daemon crashes to userland debugging very quickly, does it crash so quickly for you as well?)

comment:19 by anevilyak, 16 years ago

Can you try again with hrev28937 or newer?

comment:20 by Adek336, 16 years ago

In hrev28945 it is assert failed, BufferQueue.cpp:166, "next == null
buffer == null next->seque".

by mmadia, 16 years ago

Attachment: smp.jpg added

photograph of bt in kdl, amd x2 cpu. both cores enabled.

by mmadia, 16 years ago

Attachment: smp-disabled.jpg added

photograph of bt in kdl, amd x2 cpu. smp disabled via boot options menu

comment:21 by mmadia, 16 years ago

Added two photographs of the resulting KDL, tested on revisions hrev28947~49 with Transmission 0.70-bone.

comment:22 by axeld, 16 years ago

Milestone: R1 → R1/alpha1

Maybe I should just stop adding new assertions to the code ;-)

Next time that happens, what's more interesting than a stack crawl is a dump of the buffer passed in and the buffer queue itself (via dumping the TCP connection).

by Adek336, 16 years ago

Attachment: img_1845.jpg added

comment:23 by Adek336, 16 years ago

Here you go!

comment:24 by axeld, 16 years ago

Resolution: → fixed
Status: reopened → closed

Thanks, Adrian! Turns out I tried to reproduce the bug with a version of the module that didn't have the assert activated...

Anyway, it's fixed now, since hrev28958 - the assert was just wrong.

by Adek336, 16 years ago

Attachment: img_1846.jpg added

comment:25 by Adek336, 16 years ago

Resolution: fixed → (none)
Status: closed → reopened

There's still an empty buffer there.

comment:26 by axeld, 16 years ago

Resolution: → fixed
Status: reopened → closed

Thanks, fixed in hrev28967, finally.

comment:27 by Adek336, 16 years ago

Works well, great job, thanks!
