#14530 closed bug (fixed)
KDL when running profiler
Reported by: | luroh | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta2 |
Component: | System/Kernel | Version: | R1/Development |
Keywords: | | Cc: | korli, mmlr |
Blocked By: | | Blocking: | #15108 |
Platform: | x86-64 | | |
Description
64-bit r1beta1 hrev52295+91
Fairly easy to reproduce: it happens ~20% of the time when running 'profile HaikuDepot > output.txt'.
PANIC: Unexpected exception "General Protection Exception" occurred in kernel mode! Error code: 0x0
Welcome to Kernel Debugging Land...
Thread 503 "BUrlProtocol.HTTP" running on CPU 0
stack trace for thread 503 "BUrlProtocol.HTTP"
kernel stack: 0xffffffff88901000 to 0xffffffff88906000
user stack: 0x00007fdb34a0d000 to 0x00007fdb34a4d000
frame caller <image>:function + offset
0 ffffffff88905438 (+ 24) ffffffff8013fccc <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16
1 ffffffff88905450 (+ 80) ffffffff800a7978 <kernel_x86_64> debug_call_with_fault_handler + 0x68
2 ffffffff889054a0 (+ 96) ffffffff800a9321 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int
kdebug>
Change History (10)
comment:1 by , 6 years ago
Component: | - General → System/Kernel |
---|
comment:2 by , 5 years ago
Blocking: | 15108 added |
---|
comment:3 by , 5 years ago
Platform: | All → x86-64 |
---|
comment:4 by , 5 years ago
It also happens when running Web+, also in a BUrlProtocol thread. But I actually get another few lines of the backtrace:
PANIC: Unexpected exception "General Protection Exception" occurred in kernel mode! Error code: 0x0
Welcome to Kernel Debugging Land...
Thread 521 "BUrlProtocol.HTTP" running on CPU 0
stack trace for thread 521 "BUrlProtocol.HTTP"
kernel stack: 0xffffffff81427000 to 0xffffffff8142c000
user stack: 0x00007f51b4c79000 to 0x00007f51b4cb9000
frame caller <image>:function + offset
0 ffffffff8142b468 (+ 24) ffffffff8014df7c <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16
1 ffffffff8142b480 (+ 80) ffffffff800ad928 <kernel_x86_64> debug_call_with_fault_handler + 0x88
2 ffffffff8142b4d0 (+ 96) ffffffff800af2b1 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf1
3 ffffffff8142b530 (+ 80) ffffffff800af5ae <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
4 ffffffff8142b580 (+ 240) ffffffff800af917 <kernel_x86_64> panic + 0xb7
5 ffffffff8142b670 (+ 224) ffffffff801586c8 <kernel_x86_64> x86_unexpected_exception + 0x168
6 ffffffff8142b750 (+ 536) ffffffff8014f822 <kernel_x86_64> int_bottom + 0x56
kernel iframe at 0xffffffff8142b968 (end = 0xffffffff8142ba30)
rax 0xffffffff8142bb20 rbx 0xe3e53d9b9ff0025d rcx 0x0
rdx 0x10 rsi 0xe3e53d9b9ff0025d rdi 0xffffffff8142bb20
rbp 0xffffffff8142ba50 r8 0xffffffff87250b20 r9 0x8f76a49097a6ffc3
r10 0x784b8d21b5b4a9e2 r11 0x84ae96afe2bba r12 0xffffffff8142bb88
r13 0xffffffff8142bb80 r14 0xe3e53d9b9ff0025d r15 0x0
rip 0xffffffff8016d800 rsp 0xffffffff8142ba30 rflags 0x13016
vector: 0xd, error code: 0x0
7 ffffffff8142b968 (+ 232) ffffffff8016d800 <kernel_x86_64> memcpy + 0x50
8 ffffffff8142ba50 (+ 112) ffffffff8012c30c
kdebug>
And then the return address that it somehow failed to lookup a symbol for is...:
kdebug> ls 0xffffffff8012c30c
0xffffffff8012c30c = _ZN12_GLOBAL__N_111user_accessIZNS_20arch_cpu_user_memcpyEPvPKvmEUlvE_EEbT_ + 0xac (kernel_x86_64)
It's odd that the stack trace couldn't get that; is the symbol name too long or something?
The address it's trying to copy from (0xe3e53d9b9ff0025d, in rsi) is clearly junk, and since it isn't in canonical form, the access raises a general protection fault rather than a page fault, so the fault handler can't catch it.
Is the profiler somehow not restoring state properly on x86_64? But then why does BUrlProtocol.HTTP seem to be the only thing that can trigger this?
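For reference, the canonical-form rule can be sketched as follows. This is a minimal illustration, assuming 48-bit virtual addresses (4-level paging), where bits 63 through 47 must all be equal. `is_canonical` is a hypothetical helper name, not a function from the Haiku kernel. The addresses checked are taken from the kernel iframe register dump above.

```cpp
#include <cstdint>

// Hypothetical helper (not a Haiku kernel function).
// Assumes 48-bit virtual addresses: bits 63..47 must be all zeros or all ones.
constexpr bool is_canonical(uint64_t address)
{
    const uint64_t top = address >> 47;  // the upper 17 bits
    return top == 0 || top == 0x1ffff;
}

// The junk pointer from the register dump (rbx, rsi and r14).
static_assert(!is_canonical(0xe3e53d9b9ff0025dULL),
    "junk pointer is non-canonical");
// rdi from the same iframe: an ordinary kernel stack address.
static_assert(is_canonical(0xffffffff8142bb20ULL),
    "kernel stack address is canonical");
```

The checks are written as `static_assert` so the snippet verifies itself at compile time and never dereferences the addresses.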
comment:5 by , 5 years ago
Aha! The problem is that the demanglers cause a page fault when trying to demangle it. With "sc -d" to disable demangling:
stack trace for thread 521 "BUrlProtocol.HTTP"
kernel stack: 0xffffffff81427000 to 0xffffffff8142c000
user stack: 0x00007f51b4c79000 to 0x00007f51b4cb9000
frame caller <image>:function + offset
0 ffffffff8142b1a8 (+ 32) ffffffff800b0859 <kernel_x86_64> _ZL25invoke_command_trampolinePv + 0x19
1 ffffffff8142b1c8 (+ 24) ffffffff8014df7c <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16
2 ffffffff8142b1e0 (+ 80) ffffffff800ad928 <kernel_x86_64> debug_call_with_fault_handler + 0x88
3 ffffffff8142b230 (+ 96) ffffffff800b0adf <kernel_x86_64> invoke_debugger_command + 0xef
4 ffffffff8142b290 (+ 64) ffffffff800b0c59 <kernel_x86_64> _ZL19invoke_pipe_segmentP21debugger_command_pipeiPc + 0xf9
5 ffffffff8142b2d0 (+ 80) ffffffff800b0d6c <kernel_x86_64> invoke_debugger_command_pipe + 0xac
6 ffffffff8142b320 (+ 96) ffffffff800b59f8 <kernel_x86_64> _ZN16ExpressionParser17_ParseCommandPipeERi + 0x118
7 ffffffff8142b380 (+ 96) ffffffff800bc6b3 <kernel_x86_64> _ZN16ExpressionParser15EvaluateCommandEPKcRi + 0xd83
8 ffffffff8142b3e0 (+ 240) ffffffff800bec5c <kernel_x86_64> evaluate_debug_command + 0x11c
9 ffffffff8142b4d0 (+ 96) ffffffff800af370 <kernel_x86_64> _ZL20kernel_debugger_loopPKcS0_P13__va_list_tagi + 0x1b0
10 ffffffff8142b530 (+ 80) ffffffff800af5ae <kernel_x86_64> _ZL24kernel_debugger_internalPKcS0_P13__va_list_tagi + 0x6e
11 ffffffff8142b580 (+ 240) ffffffff800af917 <kernel_x86_64> panic + 0xb7
12 ffffffff8142b670 (+ 224) ffffffff801586c8 <kernel_x86_64> x86_unexpected_exception + 0x168
13 ffffffff8142b750 (+ 536) ffffffff8014f822 <kernel_x86_64> int_bottom + 0x56
kernel iframe at 0xffffffff8142b968 (end = 0xffffffff8142ba30)
rax 0xffffffff8142bb20 rbx 0xe3e53d9b9ff0025d rcx 0x0
rdx 0x10 rsi 0xe3e53d9b9ff0025d rdi 0xffffffff8142bb20
rbp 0xffffffff8142ba50 r8 0xffffffff87250b20 r9 0x8f76a49097a6ffc3
r10 0x784b8d21b5b4a9e2 r11 0x84ae96afe2bba r12 0xffffffff8142bb88
r13 0xffffffff8142bb80 r14 0xe3e53d9b9ff0025d r15 0x0
rip 0xffffffff8016d800 rsp 0xffffffff8142ba30 rflags 0x13016
vector: 0xd, error code: 0x0
14 ffffffff8142b968 (+ 232) ffffffff8016d800 <kernel_x86_64> memcpy + 0x50
15 ffffffff8142ba50 (+ 112) ffffffff8012c30c <kernel_x86_64> _ZN12_GLOBAL__N_111user_accessIZNS_20arch_cpu_user_memcpyEPvPKvmEUlvE_EEbT_ + 0xac
16 ffffffff8142bac0 (+ 80) ffffffff8013480c <kernel_x86_64> user_memcpy + 0x2c
17 ffffffff8142bb10 (+ 64) ffffffff80155f0b <kernel_x86_64> _ZL26get_next_frame_no_debuggermPmS_bPN7BKernel6ThreadE + 0x3b
18 ffffffff8142bb50 (+ 112) ffffffff80157022 <kernel_x86_64> arch_debug_get_stack_trace + 0x92
19 ffffffff8142bbc0 (+ 96) ffffffff800c458b <kernel_x86_64> _ZN14SystemProfiler9_DoSampleEv + 0x5b
20 ffffffff8142bc20 (+ 32) ffffffff800c46a6 <kernel_x86_64> _ZN14SystemProfiler15_ProfilingEventEP5timer + 0x16
21 ffffffff8142bc40 (+ 96) ffffffff8008bab4 <kernel_x86_64> timer_interrupt + 0xd4
22 ffffffff8142bca0 (+ 96) ffffffff8005fbf9 <kernel_x86_64> int_io_interrupt_handler + 0xb9
23 ffffffff8142bd00 (+ 32) ffffffff801580d9 <kernel_x86_64> x86_hardware_interrupt + 0xd9
24 ffffffff8142bd20 (+ 536) ffffffff8014f8fd <kernel_x86_64> int_bottom_user + 0xb2
user iframe at 0xffffffff8142bf38 (end = 0xffffffff8142c000)
rax 0xda25b9c6d8dfc115 rbx 0x3b853972a5ae287 rcx 0x7
rdx 0x84ae96afe2bba rsi 0x6468b782aa1f92a8 rdi 0x7f51b4cb6330
rbp 0x1cc4f60 r8 0xd30562cc1de268f8 r9 0x8f76a49097a6ffc3
r10 0x784b8d21b5b4a9e2 r11 0x84ae96afe2bba r12 0x322d75af7942e4c7
r13 0x3356323bbbc60762 r14 0x4eae0517e1a116b8 r15 0x80b9c88ad2c78c4
rip 0xf2b6db7ea7 rsp 0x7f51b4cb61f8 rflags 0x13202
vector: 0xfb, error code: 0x0
25 ffffffff8142bf38 (+2156498984) 000000f2b6db7ea7 <libcrypto.so.1.0.0> bn_power5 (nearest) + 0x7a7
26 0000000001cc4f60 (+ 0) e4109b192ab62957
e3e53d9b9ff0025d -- read fault
comment:6 by , 5 years ago
Cc: | added |
---|
So, it appears libcrypto has hand-written assembly that does all kinds of fun stuff to the registers and makes them invalid, which is why we are trying to read a garbage pointer.
CC'ing korli and mmlr. The address is clearly in non-canonical form, I guess we should check for this in user_memcpy and just bail immediately if it is?
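The "bail immediately" idea could look roughly like the sketch below. It is not Haiku's `user_memcpy`: `guarded_memcpy`, `range_is_canonical`, `upper_bits` and the status constants are hypothetical names for illustration, 48-bit virtual addresses are assumed, and plain `std::memcpy` stands in for the kernel's fault-protected copy.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in status codes for this sketch; the kernel has its own status values.
constexpr int kStatusOk = 0;
constexpr int kStatusBadAddress = -1;

// Assumes 48-bit virtual addresses: yields 0 for the lower half, 0x1ffff for
// the upper half, and any other value for a non-canonical address.
constexpr uint64_t upper_bits(uint64_t address)
{
    return address >> 47;
}

// True if [start, start + size) lies entirely within one canonical half.
constexpr bool range_is_canonical(uint64_t start, uint64_t size)
{
    const uint64_t half = upper_bits(start);
    if (half != 0 && half != 0x1ffff)
        return false;
    if (size == 0)
        return true;
    const uint64_t last = start + size - 1;
    if (last < start)
        return false;  // the range wraps around the address space
    return upper_bits(last) == half;
}

// Hypothetical wrapper: rejects non-canonical ranges before touching memory,
// so a junk pointer is reported as an error instead of raising a GPF.
int guarded_memcpy(void* to, const void* from, size_t size)
{
    if (!range_is_canonical(reinterpret_cast<uintptr_t>(from), size)
        || !range_is_canonical(reinterpret_cast<uintptr_t>(to), size)) {
        return kStatusBadAddress;
    }
    // In the kernel, the real fault-protected copy would go here.
    std::memcpy(to, from, size);
    return kStatusOk;
}
```

Checking the last byte as well as the first matters because a range can start at a valid address and still run into the non-canonical hole.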
comment:7 by , 5 years ago
Alternatively, we could modify this code to call the fault handler even under GPFs: https://xref.plausible.coop/source/xref/haiku/src/system/kernel/arch/x86/64/descriptors.cpp#349 I don't know the implications of that, however, or whether we should take that route.
comment:8 by , 5 years ago
Actually, IS_USER_ADDRESS already checks for canonical form because it looks for things < USER_TOP. Then what we really should do is verify here that the address specified is really a user one.
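A rough sketch of that kind of check, applied to a frame pointer before the stack walker reads through it, might look like this. The bounds `kUserBase` and `kUserTop` are placeholder values assumed for illustration (the kernel's actual USER_BASE and USER_TOP may differ), and `is_user_range` and `may_read_user_frame` are hypothetical names, not the kernel's IS_USER_ADDRESS macro. The two addresses tested come from the user iframe and the final "read fault" line of the trace above.

```cpp
#include <cstdint>

// Placeholder bounds assumed for this sketch; the kernel's real USER_BASE and
// USER_TOP definitions may differ.
constexpr uint64_t kUserBase = 0x100000;
constexpr uint64_t kUserTop = 0x00007fffffffffffULL;

// A frame-pointer-linked stack frame on x86-64: saved rbp, then return address.
struct StackFrame {
    uint64_t previous;
    uint64_t return_address;
};

// True if [address, address + size) lies entirely within the assumed user range.
constexpr bool is_user_range(uint64_t address, uint64_t size)
{
    return address >= kUserBase && address <= kUserTop
        && size <= kUserTop - address + 1;
}

// Hypothetical check a stack walker could run before reading a user frame.
constexpr bool may_read_user_frame(uint64_t framePointer)
{
    return is_user_range(framePointer, sizeof(StackFrame));
}

// rbp from the user iframe: not a real frame pointer, but still a user address.
static_assert(may_read_user_frame(0x1cc4f60), "user rbp passes the range check");
// The junk value the walker tried to follow next: rejected without any access.
static_assert(!may_read_user_frame(0xe3e53d9b9ff0025dULL),
    "junk frame pointer is rejected");
```

Because everything below the assumed `kUserTop` is canonical, the range check also rules out non-canonical addresses, so no separate canonical-form test is needed.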
comment:10 by , 5 years ago
Milestone: | Unscheduled → R1/beta2 |
---|
Assign tickets with status=closed and resolution=fixed within the R1/beta2 development window to the R1/beta2 Milestone
Same KDL here.