Opened 11 years ago

Closed 10 years ago

#2143 closed bug (fixed)

KDL in net timer.

Reported by: bga
Owned by: axeld
Priority: high
Milestone: R1/alpha1
Component: Network & Internet/Stack
Version: R1/pre-alpha1
Keywords:
Cc: tqh
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

This is with the latest revision, hrev25113. I got a KDL in the net timer thread when starting Firefox. I would guess this is related to Ingo's recent changes, but I am not sure.

Attachments (1)

KDL.png (375.1 KB) - added by bga 11 years ago.
KDL image.


Change History (13)

by bga, 11 years ago

Attachment: KDL.png added

KDL image.

comment:1 by bonefish, 11 years ago

Component: changed from System/Kernel to Network & Internet/Stack
Milestone: changed from R1 to R1/alpha1
Owner: changed from bonefish to axeld

Can you reproduce this? When I tried, Firefox started without problem and I could visit Haiku's website.

I wouldn't rule out that my changes are to blame, but until that is proven I'm moving this bug to the net stack component.

After a quick glance at the code, I saw that uninit_timers() doesn't look good, BTW. Destroying a benaphore won't really be noticed by anyone unless one locks it before doing that. Even then I'm not sure that there are no races with the timer thread.
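
To make the teardown concern concrete, here is a minimal user-space sketch of one ordering that avoids destroying a lock out from under a worker thread. It uses POSIX threads as a stand-in for the kernel's benaphore and thread API, and every name in it (demo_timer_service, demo_timers_init, demo_timers_uninit) is hypothetical. It is not the net stack's code, only an illustration of "ask the thread to quit, wait for it, then destroy the lock".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the timer subsystem state. */
typedef struct {
    pthread_mutex_t lock;   /* plays the role of the benaphore */
    pthread_cond_t wake;    /* used to wake the timer thread */
    pthread_t thread;
    bool quit;              /* protected by lock */
} demo_timer_service;

/* Worker loop: sleeps until woken and re-checks the quit flag under the lock. */
static void *demo_timer_thread(void *arg)
{
    demo_timer_service *s = arg;
    pthread_mutex_lock(&s->lock);
    while (!s->quit)
        pthread_cond_wait(&s->wake, &s->lock);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

int demo_timers_init(demo_timer_service *s)
{
    assert(s != NULL);
    s->quit = false;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->wake, NULL);
    return pthread_create(&s->thread, NULL, demo_timer_thread, s);
}

/* Teardown order: signal quit under the lock, join the thread,
 * and only then destroy the synchronization objects. */
int demo_timers_uninit(demo_timer_service *s)
{
    assert(s != NULL);
    pthread_mutex_lock(&s->lock);
    s->quit = true;
    pthread_cond_signal(&s->wake);
    pthread_mutex_unlock(&s->lock);

    int status = pthread_join(s->thread, NULL);

    pthread_cond_destroy(&s->wake);
    pthread_mutex_destroy(&s->lock);
    return status;
}
```

Because the join completes before either destroy call, the worker can never be blocked on, or about to acquire, a lock that no longer exists.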

comment:2 by axeld, 11 years ago

uninit_timers() is only called when the stack is being unloaded - at that time, no protocol should be active anymore (and therefore, no timers). So that's at least unrelated to the problem we're seeing here :-)

comment:3 by tqh, 11 years ago

Cc: tqh added

comment:4 by bga, 11 years ago

Unfortunately I could not duplicate this bug but I will keep trying.

comment:5 by bonefish, 11 years ago (in reply to: comment:2)

Replying to axeld:

uninit_timers() is only called when the stack is being unloaded - at that time, no protocol should be active anymore (and therefore, no timers). So that's at least unrelated to the problem we're seeing here :-)

I didn't think it was related. I just found it weird that the timer thread checks the acquisition of the benaphore at all, while uninit_timers() doesn't do anything to ever trigger this check in the first place.

comment:6 by bonefish, 11 years ago

Probably related (hrev25537):

PANIC: vm_page_fault: unhandled page fault in kernel space at 0xdeadbeef, ip 0xdeadbeef

Welcome to Kernel Debugging Land...
Running on CPU 0
kdebug> sc
stack trace for thread 66 "net timer"
    kernel stack: 0x807ca000 to 0x807ce000
frame            caller     <image>:function + offset
807cdc20 (+  48) 8004e69b   <kernel>:invoke_debugger_command + 0x00cf
807cdc50 (+  64) 8004f444   <kernel>:_ParseCommand__16ExpressionParserRi + 0x01f8
807cdc90 (+  48) 8004ee36   <kernel>:EvaluateCommand__16ExpressionParserPCcRi + 0x01de
807cdcc0 (+ 224) 80050558   <kernel>:evaluate_debug_command + 0x0088
807cdda0 (+  64) 8004d1d6   <kernel>:kernel_debugger_loop__Fv + 0x017a
807cdde0 (+  48) 8004de89   <kernel>:kernel_debugger + 0x010d
807cde10 (+ 192) 8004dd71   <kernel>:panic + 0x0029
807cded0 (+  64) 8009acef   <kernel>:vm_page_fault + 0x00ab
807cdf10 (+  64) 800a4de1   <kernel>:page_fault_exception + 0x00b1
807cdf50 (+  12) 800a857d   <kernel>:int_bottom + 0x001d (nearest)
iframe at 0x807cdf5c (end = 0x807cdfb4)
 eax 0xdeadbeef     ebx 0x807c9498      ecx 0x9115f000   edx 0x200246
 esi 0x91be9e6c     edi 0x0             ebp 0x807cdfd8   esp 0x807cdf90
 eip 0xdeadbeef  eflags 0x210287
 vector: 0xe, error code: 0x0
807cdf5c (+ 124) deadbeef
807cdfd8 (+  32) 800446ef   <kernel>:_create_kernel_thread_kentry__Fv + 0x001b
807cdff8 (+2139299848) 80044684   <kernel>:thread_kthread_exit__Fv + 0x0000

If TCP does indeed use the timers as explained on the commit mailing list recently, it might be a good idea to remove the race condition, as it perfectly explains these kinds of problems.
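
One general way to close this kind of race, sketched here in user space under stated assumptions: every change to the pending-timer list happens under a single lock, and cancelling a timer reports whether it was still queued. A POSIX mutex stands in for the kernel lock, and all names (demo_timer, demo_timer_add, demo_timer_cancel, demo_timer_run_pending, demo_count_hook) are hypothetical. This is not the change that went into the net stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical timer record, linked into a singly linked pending list. */
typedef struct demo_timer {
    struct demo_timer *next;
    void (*hook)(struct demo_timer *timer, void *data);
    void *data;
} demo_timer;

static pthread_mutex_t sTimerLock = PTHREAD_MUTEX_INITIALIZER;
static demo_timer *sPending = NULL;   /* protected by sTimerLock */

/* Example hook: counts invocations through the data pointer. */
void demo_count_hook(demo_timer *timer, void *data)
{
    (void)timer;
    ++*(int *)data;
}

void demo_timer_add(demo_timer *timer)
{
    assert(timer != NULL && timer->hook != NULL);
    pthread_mutex_lock(&sTimerLock);
    timer->next = sPending;
    sPending = timer;
    pthread_mutex_unlock(&sTimerLock);
}

/* Unlinks the timer if it is still queued. Returns true only in that case. */
bool demo_timer_cancel(demo_timer *timer)
{
    bool removed = false;
    pthread_mutex_lock(&sTimerLock);
    for (demo_timer **p = &sPending; *p != NULL; p = &(*p)->next) {
        if (*p == timer) {
            *p = timer->next;
            timer->next = NULL;
            removed = true;
            break;
        }
    }
    pthread_mutex_unlock(&sTimerLock);
    return removed;
}

/* One pass of the timer thread: pop under the lock, call the hook without it.
 * Returns the number of hooks that were called. */
int demo_timer_run_pending(void)
{
    int fired = 0;
    for (;;) {
        pthread_mutex_lock(&sTimerLock);
        demo_timer *timer = sPending;
        if (timer != NULL) {
            sPending = timer->next;
            timer->next = NULL;
        }
        pthread_mutex_unlock(&sTimerLock);
        if (timer == NULL)
            break;
        timer->hook(timer, timer->data);
        fired++;
    }
    return fired;
}
```

A false return from demo_timer_cancel() means the timer was no longer queued, so its hook may be running or about to run. An owner that wants to free the timer in that case would still need to wait for the hook to finish, which this sketch leaves out.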

comment:7 by bga, 11 years ago

I never got this again. Is this still an issue for anyone? If not, maybe it can be closed.

comment:8 by tqh, 11 years ago

I'm the only cc, and I've never seen it. I just watch for Firefox related problems. So I think it can be closed.

comment:9 by axeld, 11 years ago

The original problem still persists and is waiting for a fix; I just haven't gotten around to it.

comment:10 by axeld, 11 years ago

Resolution: fixed
Status: changed from new to closed

The issue should be fixed with hrev26981 (the uninit_timers() problem in hrev26980).

comment:11 by stpere, 10 years ago

Resolution: fixed
Status: changed from closed to reopened

Hi,

I got two KDLs on hrev30140, both very similar (I would say the same). So I think I'm getting close to a way to reproduce it. It involves the network preflet and the ftp command line client.

I get this error:

ARP host 0294a8c0 updated with different hardware address 00:0c:29:4a:0e:7d.
ARP host 0294a8c0 updated with different hardware address 00:0c:29:4a:0e:7d.
vm_soft_fault: va 0xdeadb000 not covered by area in address space
vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0xdeadbef3, ip 0x800bac37, write 1, user 0, thread 0x38
PANIC: vm_page_fault: unhandled page fault in kernel space at 0xdeadbef3, ip 0x800bac37

Welcome to Kernel Debugging Land...

Thread 56 "net timer" running on CPU 0
kdebug> bt
stack trace for thread 56 "net timer"
    kernel stack: 0x8023d000 to 0x80241000
frame               caller     <image>:function + offset

 0 80240b6c (+  48) 80060271   <kernel_x86>:invoke_debugger_command + 0x00f5
 1 80240b9c (+  64) 80060061   <kernel_x86> invoke_pipe_segment(debugger_command_pipe*: 0x8012a220, int32: 0, 0x0 "<NULL>") + 0x0079
 2 80240bdc (+  64) 800603e8   <kernel_x86>:invoke_debugger_command_pipe + 0x009c
 3 80240c1c (+  48) 80061998   <kernel_x86> ExpressionParser<0x80240cd0>::_ParseCommandPipe(0x80240ccc) + 0x0234
 4 80240c4c (+  64) 80060dd2   <kernel_x86> ExpressionParser<0x80240cd0>::EvaluateCommand(0x8011ab60 "bt", 0x80240ccc) + 0x02ba
 5 80240c8c (+ 224) 80062dc0   <kernel_x86>:evaluate_debug_command + 0x0088
 6 80240d6c (+  64) 8005e162   <kernel_x86> kernel_debugger_loop() + 0x01ae
 7 80240dac (+  32) 8005eff1   <kernel_x86>:kernel_debugger + 0x004d
 8 80240dcc (+ 192) 8005ef99   <kernel_x86>:panic + 0x0029
 9 80240e8c (+  80) 800c1c31   <kernel_x86>:vm_page_fault + 0x0139
10 80240edc (+  64) 800d1c1d   <kernel_x86>:page_fault_exception + 0x00d9
11 80240f1c (+  12) 800d5316   <kernel_x86>:int_bottom + 0x0036
kernel iframe at 0x80240f28 (end = 0x80240f78)
 eax 0x81025e40     ebx 0x80567568      ecx 0xdeadbeef   edx 0xdeadbeef
 esi 0x80567c94     edi 0x81025e40      ebp 0x80240f78   esp 0x80240f5c
 eip 0x800bac37  eflags 0x10282    
 vector: 0xe, error code: 0x2

12 80240f28 (+  80) 800bac37   <kernel_x86>:list_remove_link + 0x000b
13 80240f78 (+  32) 800bad1c   <kernel_x86>:list_remove_item + 0x0018
14 80240f98 (+  64) 8056457a   </boot/system/add-ons/kernel/network/stack> timer_thread(NULL) + 0x009a
15 80240fd8 (+  32) 800548ff   <kernel_x86> _create_kernel_thread_kentry() + 0x001b
16 80240ff8 (+2145120264) 8005489c   <kernel_x86> thread_kthread_exit() + 0x0000

Basically, the connection between my Haiku guest and my Ubuntu host gets lost while I'm in FTP; I open the network preflet to change the IP address (probably useless), and tada, KDL.

I'm not sure if it's really related to this ticket, but I think it's pretty close (net timer thread, etc..)
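
The fault address in the trace above, 0xdeadbef3, can be explained by simple pointer arithmetic. The sketch below assumes that a list link is two 4-byte pointers on x86 (next first, then prev) and that unlinking first stores through link->next->prev. It also assumes that 0xdeadbeef is a fill pattern for freed memory. These are assumptions for illustration, not something the ticket states, and the names (demo_link32, demo_unlink_store_address) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit model of a doubly linked list link: two 4-byte pointers. */
typedef struct {
    uint32_t next;  /* address of the next link */
    uint32_t prev;  /* address of the previous link */
} demo_link32;

/* In this model the prev field sits 4 bytes into the link. */
static_assert(offsetof(demo_link32, prev) == 4, "prev must follow next");

/* Address written by the store "link->next->prev = link->prev". */
uint32_t demo_unlink_store_address(const demo_link32 *link)
{
    return link->next + (uint32_t)offsetof(demo_link32, prev);
}
```

With next and prev both set to 0xdeadbeef (the values in ecx and edx in the dump), the store targets 0xdeadbeef + 4 = 0xdeadbef3. That matches the reported write fault and suggests the link being removed had already been freed or overwritten.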

comment:12 by axeld, 10 years ago

Resolution: fixed
Status: changed from reopened to closed

Even though it's the same stack trace, this is a whole different bug. If you're not sure, making a comment without reopening the bug would be preferred.

I'm currently looking into that bug in particular, btw, so there is no need to open another ticket for this.
