Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#5229 closed bug (fixed)

PANIC: acquire_spinlock(): Failed to acquire spinlock 0xcdc5eb68 for a long time!

Reported by: anevilyak Owned by: bonefish
Priority: normal Milestone: R1
Component: System/Kernel Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

Ran into this again while doing a full build inside the scheduling_recorder:

Welcome to Kernel Debugging Land...
Thread 31237 "cc1" running on CPU 1
kdebug> bt
stack trace for thread 31237 "cc1"
    kernel stack: 0xdeb40000 to 0xdeb44000
      user stack: 0x7efed000 to 0x7ffed000
frame               caller     <image>:function + offset
 0 deb43a30 (+  32) 8006f8c1   <kernel_x86> invoke_command_trampoline(void*: 0xdeb43ab0) + 0x0015
 1 deb43a50 (+  12) 800df998   <kernel_x86>:arch_debug_call_with_fault_handler + 0x001b
 2 deb43a5c (+  48) 8006d7a2   <kernel_x86>:debug_call_with_fault_handler + 0x0051
 3 deb43a8c (+  64) 8006fc6a   <kernel_x86>:invoke_debugger_command + 0x00bb
 4 deb43acc (+  48) 8006fd87   <kernel_x86> invoke_pipe_segment(debugger_command_pipe*: 0x80135622, int32: 0, char*: NULL) + 0x0083
 5 deb43afc (+  32) 8006fe4f   <kernel_x86>:invoke_debugger_command_pipe + 0x008b
 6 deb43b1c (+ 128) 80073bee   <kernel_x86> ExpressionParser<0xdeb43bec>::_ParseCommandPipe(int&: 0xdeb43be8) + 0x0aae
 7 deb43b9c (+  48) 800763b7   <kernel_x86> ExpressionParser<0xdeb43bec>::EvaluateCommand(char const*: 0x80135620 "bt", int&: 0xdeb43be8) + 0x06df
 8 deb43bcc (+ 192) 80076530   <kernel_x86>:evaluate_debug_command + 0x0084
 9 deb43c8c (+  96) 8006e6f2   <kernel_x86> kernel_debugger_internal(char const*: 0x1 "<???>", int32: -558613224) + 0x03a7
10 deb43cec (+  16) 8006e853   <kernel_x86>:kernel_debugger + 0x0019
11 deb43cfc (+ 160) 8006e92d   <kernel_x86>:panic + 0x002a
12 deb43d9c (+  48) 8005a8bd   <kernel_x86>:acquire_spinlock + 0x0052
13 deb43dcc (+  48) 80078ae7   <kernel_x86> SystemProfiler<0xcdc5eb40>::ThreadEnqueuedInRunQueue(thread*: 0xcd31b000) + 0x002d
14 deb43dfc (+  48) 80068f07   <kernel_x86> affine_enqueue_in_run_queue(thread*: 0xcd31b000) + 0x011f
15 deb43e2c (+  48) 80043feb   <kernel_x86> ConditionVariable<0xccd364c4>::_NotifyLocked(true, int32: 0) + 0x00bf
16 deb43e5c (+  64) 80044089   <kernel_x86> ConditionVariable<0xccd364c4>::_Notify(true, false, int32: 0) + 0x0083
17 deb43e9c (+  48) 807dd270   <ata> ATAChannel<0xccd364b0>::Interrupt(uint8: 0x50 (80)) + 0x0056
18 deb43ecc (+  32) 807e0557   <ata> ata_interrupt_handler(void*: 0xccd364b0, uint8: 0x50 (80)) + 0x0024
19 deb43eec (+  80) 807eca4c   <ata_adapter>:ata_adapter_inthand + 0x00c0
20 deb43f3c (+  48) 8004b774   <kernel_x86>:int_io_interrupt_handler + 0x0052
21 deb43f6c (+  48) 800dc20f   <kernel_x86> hardware_interrupt(iframe*: 0xdeb43fa8) + 0x0076
22 deb43f9c (+  12) 800dfda6   <kernel_x86>:int_bottom_user + 0x005a
user iframe at 0xdeb43fa8 (end = 0xdeb44000)
 eax 0x0            ebx 0x8b0678        ecx 0x6fff0000   edx 0x4
 esi 0x1802db40     edi 0x0             ebp 0x0          esp 0xdeb43fdc
 eip 0x500846    eflags 0x10206    user esp 0x7ffecd00
 vector: 0x2b, error code: 0x0
23 deb43fa8 (+558612568) 00500846   <cc1>:discard_stmt_changes + 0x0066
kdebug> spinlock 0xcdc5eb68
spinlock 0xcdc5eb68:
  locked from 0x80079a10

kdebug> calling 0x80079a10 yields no result though. Any ideas how to resolve where it was locked from? symbols 0x80079a10 does not appear to be helpful. Leaving in KDL pending further input.

Change History (6)

comment:1 Changed 10 years ago by anevilyak

If of interest, CPU 0's state was:

Thread 8651 "scheduling_recorder" running on CPU 0
kdebug> bt
stack trace for thread 8651 "scheduling_recorder"
    kernel stack: 0x8118f000 to 0x81193000
      user stack: 0x7efef000 to 0x7ffef000
frame               caller     <image>:function + offset
 0 81192b58 (+  32) 8006f8c1   <kernel_x86> invoke_command_trampoline(void*: 0x81192bd8) + 0x0015
 1 81192b78 (+  12) 800df998   <kernel_x86>:arch_debug_call_with_fault_handler + 0x001b
 2 81192b84 (+  48) 8006d7a2   <kernel_x86>:debug_call_with_fault_handler + 0x0051
 3 81192bb4 (+  64) 8006fc6a   <kernel_x86>:invoke_debugger_command + 0x00bb
 4 81192bf4 (+  48) 8006fd87   <kernel_x86> invoke_pipe_segment(debugger_command_pipe*: 0x80135e22, int32: 0, char*: NULL) + 0x0083
 5 81192c24 (+  32) 8006fe4f   <kernel_x86>:invoke_debugger_command_pipe + 0x008b
 6 81192c44 (+ 128) 80073bee   <kernel_x86> ExpressionParser<0x81192d14>::_ParseCommandPipe(int&: 0x81192d10) + 0x0aae
 7 81192cc4 (+  48) 800763b7   <kernel_x86> ExpressionParser<0x81192d14>::EvaluateCommand(char const*: 0x80135e20 "bt", int&: 0x81192d10) + 0x06df
 8 81192cf4 (+ 192) 80076530   <kernel_x86>:evaluate_debug_command + 0x0084
 9 81192db4 (+  96) 8006e6f2   <kernel_x86> kernel_debugger_internal(char const*: 0x13ef80 "<???>", int32: 0) + 0x03a7
10 81192e14 (+  32) 8006ea0e   <kernel_x86>:debug_trap_cpu_in_kdl + 0x005b
11 81192e34 (+  48) 8005a4cc   <kernel_x86> process_pending_ici(int32: -2129056084) + 0x027e
12 81192e64 (+  32) 8005a8cc   <kernel_x86>:acquire_spinlock + 0x0061
13 81192e84 (+  48) 80044912   <kernel_x86> ConditionVariable<0xcdc5ebb4>::Add(ConditionVariableEntry*: 0x81192ed8) + 0x0022
14 81192eb4 (+  80) 80079a93   <kernel_x86> SystemProfiler<0xcdc5eb40>::NextBuffer(uint32: 0x44d5c (281948), unsigned long long*: 0x81192f2c) + 0x00bb
15 81192f04 (+  64) 8007ab38   <kernel_x86>:_user_system_profiler_next_buffer + 0x00a2
16 81192f44 (+ 100) 800dff81   <kernel_x86>:handle_syscall + 0x00be
user iframe at 0x81192fa8 (end = 0x81193000)
 eax 0xd8           ebx 0x2039fc        ecx 0x7ffeee4c   edx 0xffff0114
 esi 0x44d5c        edi 0x7ffeef48      ebp 0x7ffeef68   esp 0x81192fdc
 eip 0xffff0114  eflags 0x246      user esp 0x7ffeee4c
 vector: 0x63, error code: 0x0
17 81192fa8 (+   0) ffff0114   <commpage>:commpage_syscall + 0x0004
18 7ffeef68 (+  52) 00201363   </boot/system/bin/scheduling_recorder@0x00200000>:unknown + 0x1363
19 7ffeef9c (+  64) 001052c3   </boot/system/runtime_loader@0x00100000>:unknown + 0x52c3
20 7ffeefdc (+   0) 7ffeefec   309477:scheduling_recorder_main_stack@0x7efef000 + 0xffffec
kdebug>

Running calling 0x80079a10 yields a read fault here though.

comment:2 Changed 10 years ago by anevilyak

Also on CPU 0 the spinlock being acquired is 0x8013b678, locked from 0x8004404e.

comment:3 Changed 10 years ago by anevilyak

In any case, I only ever see this problem with the scheduling_recorder, and the backtraces consistently point to the guts of the profiler so I'd unfortunately have to say this appears unrelated to your and Matt's problems, and instead is just a simple lock order reversal or similar issue in the kernel profiler :/

comment:4 in reply to:  3 ; Changed 10 years ago by bonefish

Status: newin-progress

Replying to anevilyak:

In any case, I only ever see this problem with the scheduling_recorder, and the backtraces consistently point to the guts of the profiler so I'd unfortunately have to say this appears unrelated to your and Matt's problems, and instead is just a simple lock order reversal or similar issue in the kernel profiler :/

Yep, that's indeed a locking order reversal. SystemProfiler::NextBuffer() uses a condition variable while holding the object's lock, and when waking up a waiting thread the condition variable code will enter the system profiler while holding the condition variable lock.

Regarding "calling" (and "bt"), ATM there's the somewhat annoying limitation on SMP machines that the stack trace code doesn't handle threads currently running on other CPUs specially. Which it should, though, since the current stack pointer in the thread's data is obsolete at this point. The command you're looking for is "ls" (lookup symbol), though. "help | grep symbol" FTW. :-)

comment:5 Changed 10 years ago by bonefish

Resolution: fixed
Status: in-progressclosed

Fixed in hrev35003.

comment:6 in reply to:  4 Changed 10 years ago by anevilyak

Replying to bonefish:

Regarding "calling" (and "bt"), ATM there's the somewhat annoying limitation on SMP machines that the stack trace code doesn't handle threads currently running on other CPUs specially. Which it should, though, since the current stack pointer in the thread's data is obsolete at this point. The command you're looking for is "ls" (lookup symbol), though. "help | grep symbol" FTW. :-)

Thanks for the tip, I overlooked 'lookup' while browsing through the available commands, and wasn't aware the kernel debugger supported either pipes or grep, very cool :) And thanks for fixing that :)

Note: See TracTickets for help on using tickets.