Opened 20 months ago
Last modified 15 months ago
#18430 new bug
[kernel] System hangs or KDLs disabling CPUs
Reported by: | waddlesplash | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | Unscheduled |
Component: | System/Kernel | Version: | R1/beta4 |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
This one is pretty easy for me to trigger in my VMware setup, and also on at least one of my bare-metal machines. We used to shut down on the boot CPU but disabled that because it hangs on some machines, I suspect this is the same bug.
Change History (6)
comment:1 by , 20 months ago
comment:2 by , 20 months ago
Note that smp_send_broadcast_ici
works in interrupts-disabled until it finishes, when it reenables interrupts. So on re-enabling interrupts, it simply never got rescheduled again?
comment:3 by , 20 months ago
Summary: | [kernel] System hangs disabling CPUs → [kernel] System hangs or KDLs disabling CPUs |
---|
comment:4 by , 20 months ago
Here's a test program I made:
#include <OS.h> extern status_t _kern_set_cpu_enabled(int32 cpu, bool enabled); int main() { for (int32 i = 0; i < (INT32_MAX-1); i++) { _kern_set_cpu_enabled(1, i%2); } }
Running this with the system under load produces KDLs. Example:
PANIC: ASSERT FAILED (../haiku-git/src/system/kernel/scheduler/scheduler.cpp:425): nextThreadData->Core() == core Welcome to Kernel Debugging Land... Thread 1726 "daemon" running on CPU 1 stack trace for thread 1726 "daemon" kernel stack: 0xffffffff82087000 to 0xffffffff8208c000 user stack: 0x00007f9764bca000 to 0x00007f9764c0a000 frame caller <image>:function + offset 0 ffffffff8208bb08 (+ 24) ffffffff80146eac <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16 1 ffffffff8208bb20 (+ 80) ffffffff800b0518 <kernel_x86_64> debug_call_with_fault_handler + 0x78 2 ffffffff8208bb70 (+ 96) ffffffff800b1b83 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf3 3 ffffffff8208bbd0 (+ 80) ffffffff800b1f1e <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e 4 ffffffff8208bc20 (+ 240) ffffffff800b2277 <kernel_x86_64> panic + 0xb7 5 ffffffff8208bd10 (+ 112) ffffffff8009baa6 <kernel_x86_64> reschedule(int) + 0x296 6 ffffffff8208bd80 (+ 48) ffffffff8008b566 <kernel_x86_64> thread_block + 0xc6 7 ffffffff8208bdb0 (+ 112) ffffffff80057be5 <kernel_x86_64> ConditionVariableEntry::Wait(unsigned int, long) + 0x155 8 ffffffff8208be20 (+ 160) ffffffff8006f4b1 <kernel_x86_64> read_port_etc + 0x111 9 ffffffff8208bec0 (+ 96) ffffffff80070401 <kernel_x86_64> _user_read_port_etc + 0x91 10 ffffffff8208bf20 (+ 16) ffffffff8014899f <kernel_x86_64> x86_64_syscall_entry + 0xfb user iframe at 0xffffffff8208bf30 (end = 0xffffffff8208bff8) rax 0xde rbx 0x7f9764c07c50 rcx 0x15748583d6c rdx 0x7f9764c07c50 rsi 0x7f9764c07c4c rdi 0x17 rbp 0x7f9764c09c90 r8 0x0 r9 0x0 r10 0x2001 r11 0x246 r12 0x7f15f5b7e290 r13 0x7f9764c07c4c r14 0xffffffff r15 0x7f15f5b7e430 rip 0x15748583d6c rsp 0x7f9764c07c28 rflags 0x246 vector: 0x63, error code: 0x0 11 ffffffff8208bf30 (+140290320489824) 0000015748583d6c <libroot.so> _kern_read_port_etc + 0x0c 12 00007f9764c09c90 (+ 16) 000001d3cbb59b29 <_APP_> SyslogDaemon::_DaemonThread(void*) + 0x09 13 00007f9764c09ca0 (+ 32) 0000015748582b49 <libroot.so> _thread_do_exit_work (nearest) + 0x89 14 00007f9764c09cc0 (+ 0) 00007fab621e1258 <commpage> commpage_thread_exit + 0x00
continuing from this just ends in a NULL dereference.
comment:6 by , 15 months ago
Nope, retried the test program on hrev57316 and got this under load:
PANIC: ASSERT FAILED (../haiku-git/src/system/kernel/scheduler/scheduler_cpu.cpp:566): newKey >= 0 Welcome to Kernel Debugging Land... Thread 25971 "sed" running on CPU 1 stack trace for thread 25971 "sed" kernel stack: 0xffffffff8073c000 to 0xffffffff80741000 user stack: 0x00007fc0b26b7000 to 0x00007fc0b36b7000 frame caller <image>:function + offset 0 ffffffff80740470 (+ 32) ffffffff8014ade0 <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a 1 ffffffff80740490 (+ 80) ffffffff800b3a98 <kernel_x86_64> debug_call_with_fault_handler + 0x78 2 ffffffff807404e0 (+ 96) ffffffff800b5144 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf4 3 ffffffff80740540 (+ 80) ffffffff800b54de <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e 4 ffffffff80740590 (+ 240) ffffffff800b5877 <kernel_x86_64> panic + 0xb7 5 ffffffff80740680 (+ 64) ffffffff800a35b1 <kernel_x86_64> Scheduler::CoreEntry::_UpdateLoad(bool) + 0x6c1 6 ffffffff807406c0 (+ 112) ffffffff8009f727 <kernel_x86_64> reschedule(int) + 0x967 7 ffffffff80740730 (+ 544) ffffffff8008bf2d <kernel_x86_64> thread_exit + 0x84d 8 ffffffff80740950 (+1440) ffffffff800752af <kernel_x86_64> handle_signals + 0x99f 9 ffffffff80740ef0 (+ 48) ffffffff8008899e <kernel_x86_64> thread_at_kernel_exit + 0x1e 10 ffffffff80740f20 (+ 16) ffffffff8014cbe2 <kernel_x86_64> x86_64_syscall_entry + 0x31e user iframe at 0xffffffff80740f30 (end = 0xffffffff80740ff8) rax 0x0 rbx 0x0 rcx 0x1373c0b8459 rdx 0x198ba2a1b28 rsi 0x7fc0b36b6cb0 rdi 0x0 rbp 0x7fc0b36b6db0 r8 0x0 r9 0x1 r10 0xf r11 0x202 r12 0x4 r13 0x7fc0b36b7670 r14 0x0 r15 0x0 rip 0x1373c0b8459 rsp 0x7fc0b36b6d98 rflags 0x202 vector: 0x63, error code: 0x0
The mouse cursor moves initially, but eventually that too stops working.
The system is deadlocked on the kernel address space:
And the holding thread is the undertaker, which is stopped for no apparent reason:
Trying to print the thread information ends in a read fault:
The next thing to be dumped in the scheduler data is the time-used and quantum. Computing the quantum tries to access the thread's
fCore
, which I guess must be NULL. So, somehow this thread got de-scheduled from the disabled CPU and never rescheduled anywhere else.