Opened 14 years ago
Closed 12 years ago
#6736 closed bug (fixed)
[Network stack] crashes while trying to quit Fuppes
Reported by: | diver | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | Network & Internet/IPv4 | Version: | R1/Development |
Keywords: | multicast ipv4 | Cc: | |
Blocked By: | | Blocking: | |
Platform: | All |
Description (last modified by )
Attachments (6)
Change History (22)
by , 14 years ago
Attachment: | fuppes_kdl.png added |
---|
by , 14 years ago
Attachment: | fuppes_kdl2.png added |
---|
by , 14 years ago
Attachment: | fuppes_kdl3.png added |
---|
by , 14 years ago
Attachment: | fuppes_kdl4.png added |
---|
comment:2 by , 14 years ago
Keywords: | multicast added |
---|
comment:4 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Should be fixed in hrev42899. The stack traces do not point at the actual problem, because the heap gets corrupted and the crashes only happen on later use of it. Please reopen if you encounter it again.
comment:6 by , 13 years ago
Looks like it is only partially fixed. Unfortunately, I still see it with my sis19x network driver on revisions after hrev42899. A typical stack crawl looks like the one in the attached fuppes_kdl2.png (JoinGroup + 0x00f6). Once it crashed in the same place as shown in fuppes_kdl3.png (Clear() + 0x0060). Note that sis19x is a native driver, so the FreeBSD compatibility layer is not involved, and the hrev42899 changes are unrelated in this particular case.
I tried to catch the case about a week ago but failed. The suspect seems to be the "MultiHashTable<MulticastStateHash>* sMulticastState" object; at least, commenting it out makes the KDLs disappear.
Note that the typical reproduction sequence is:
a) start fuppes;
b) press Ctrl-C to quit fuppes;
c) start fuppes again -> drops into KDL.
Sometimes b) and c) have to be repeated several times before the KDL occurs.
Are there any suggestions on how to trace or debug this?
PS: The test was performed with a version of sis19x that simply returns B_OK for B_ETHER_ADDMULTI / B_ETHER_REMMULTI ioctl requests.
follow-ups: 9 10 comment:7 by , 13 years ago
Multicast has been pretty much broken in the ipv4/ipv6 modules for quite some time.
comment:8 by , 13 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
At least screenshots 1 and 4 look exactly like the ones that happened here due to the FreeBSD compatibility layer issue fixed in hrev42904. So there seem to be two separate bugs here, one of which was fixed, the other still open. Even though it'd be nicer to have separate bug reports for each, let's just reopen this one.
Replying to siarzhuk:
> Looks like it is only partially fixed. Unfortunately, I still see it with my sis19x network driver on revisions after hrev42899. A typical stack crawl looks like the one in the attached fuppes_kdl2.png (JoinGroup + 0x00f6). Once it crashed in the same place as shown in fuppes_kdl3.png (Clear() + 0x0060). Note that sis19x is a native driver, so the FreeBSD compatibility layer is not involved, and the hrev42899 changes are unrelated in this particular case.
I see. The FreeBSD ones still use the same upper layers though, so it's entirely possible to run into that issue with a FreeBSD driver as well. In my limited test case I was doing something unrelated, so I didn't stress test the multicast mechanism after the fix.
> Are there any suggestions on how to trace or debug this?
Adding or enabling tracing to see what's really going on would make sense. I can of course take another look to see if I can spot anything when using the software mentioned. I used my own SSDP implementation and always killed the app, so if the software mentioned does a proper cleanup on receiving the signal, different code paths may have been taken.
comment:9 by , 13 years ago
Replying to axeld:
> Multicast has been pretty much broken in the ipv4/ipv6 modules for quite some time.
> ...
Fixing at least the KDLs would be my immediate goal anyway.
comment:10 by , 13 years ago
Replying to axeld:
> Multicast has been pretty much broken in the ipv4/ipv6 modules for quite some time.
Is there a ticket for that?
comment:11 by , 13 years ago
Well, I have caught the case: LeaveGroup is not called when a multicast group membership is dropped. After that, the groups hash map apparently still contains an invalid pointer to the deleted object, so the next attempt to add the same group fails while iterating over the map. I don't understand all that template kung fu, but the attached patch solves at least this problem with the Fuppes application. :-)
comment:12 by , 13 years ago
patch: | 0 → 1 |
---|
comment:13 by , 12 years ago
Tried again with hrev44559.
vm_soft_fault: va 0x0 not covered by area in address space
vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x24, ip 0xcd7ee73d, write 0, user 0, thread 0x2a1
PANIC: vm_page_fault: unhandled page fault in kernel space at 0x24, ip 0xcd7ee73d
Welcome to Kernel Debugging Land...
Thread 673 "fuppes" running on CPU 0
stack trace for thread 673 "fuppes"
kernel stack: 0x806cd000 to 0x806d1000
user stack: 0x7efef000 to 0x7ffef000
frame caller <image>:function + offset
0 806d0794 (+ 32) 801241e2 <kernel_x86>:arch_debug_stack_trace + 0x0012
1 806d07b4 (+ 16) 800910cf <kernel_x86> stack_trace_trampoline(NULL) + 0x000b
2 806d07c4 (+ 12) 8012964e <kernel_x86>:arch_debug_call_with_fault_handler + 0x001b
3 806d07d0 (+ 48) 80092b5e <kernel_x86>:debug_call_with_fault_handler + 0x005e
4 806d0800 (+ 64) 800912ef <kernel_x86> kernel_debugger_loop(0x8016d6f7 "PANIC: ", 0x80182ae0 "vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx ", 0x806d08ac "$", int32: 0) + 0x021b
5 806d0840 (+ 48) 80091653 <kernel_x86> kernel_debugger_internal(0x8016d6f7 "PANIC: ", 0x80182ae0 "vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx ", 0x806d08ac "$", int32: 0) + 0x0053
6 806d0870 (+ 48) 80092ed8 <kernel_x86>:panic + 0x0024
7 806d08a0 (+ 144) 80106f8d <kernel_x86>:vm_page_fault + 0x0129
8 806d0930 (+ 80) 8012583e <kernel_x86> page_fault_exception(iframe*: 0x806d098c) + 0x017e
9 806d0980 (+ 12) 8012a5fd <kernel_x86>:int_bottom + 0x003d
kernel iframe at 0x806d098c (end = 0x806d09dc)
eax 0x0 ebx 0xcd7f43e4 ecx 0xd2870620 edx 0xd2870600
esi 0x82009488 edi 0x5 ebp 0x806d0a54 esp 0x806d09c0
eip 0xcd7ee73d eflags 0x13282
vector: 0xe, error code: 0x0
10 806d098c (+ 200) cd7ee73d </boot/system/add-ons/kernel/network/protocols/ipv4> IPv4Multicast<0xd2870600>::JoinGroup() + 0x0249
11 806d0a54 (+ 160) cd7f2332
by , 12 years ago
Attachment: | Fuppes_kdl.zip added |
---|
comment:14 by , 12 years ago
Component: | Network & Internet/Stack → Network & Internet/IPv4 |
---|---|
Description: | modified (diff) |
Keywords: | ipv4 added |
Actually, I'm not sure about the 1st and 4th screenshots, but they were observed while trying to quit Fuppes.