Opened 2 years ago

Closed 2 years ago

#17531 closed bug (fixed)

KDL (GPE) in syscall entry (thread_hit_debug_event_internal)

Reported by: waddlesplash Owned by: nobody
Priority: normal Milestone: R1/beta4
Component: System/Kernel Version: R1/Development
Keywords: Cc: korli
Blocked By: Blocking:
Platform: All

Description

I was running jam, and hit Ctrl+C at the prompt while it was still in the startup phase and got a KDL.

PANIC: Unexpected exception "General Protection Exception" occurred in kernel mode! Error code: 0x0

Welcome to Kernel Debugging Land...
Thread 34781 "jam" running on CPU 1
stack trace for thread 34781 "jam"
    kernel stack: 0xffffffff8153b000 to 0xffffffff81540000
      user stack: 0x00007fc8194c8000 to 0x00007fc81a4c8000
frame                       caller             <image>:function + offset
 0 ffffffff8153eee8 (+  24) ffffffff801446dc   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16
 1 ffffffff8153ef00 (+  80) ffffffff800adbe8   <kernel_x86_64> debug_call_with_fault_handler + 0x78
 2 ffffffff8153ef50 (+  96) ffffffff800af203   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf3
 3 ffffffff8153efb0 (+  80) ffffffff800af59e   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
 4 ffffffff8153f000 (+ 240) ffffffff800af8f7   <kernel_x86_64> panic + 0xb7
 5 ffffffff8153f0f0 (+ 856) ffffffff80145ecc   <kernel_x86_64> int_bottom + 0x80
kernel iframe at 0xffffffff8153f448 (end = 0xffffffff8153f510)
 rax 0x0                   rbx 0x0                   rcx 0xa0
 rdx 0xffffffff96355cd8    rsi 0xffffffff96355cd8    rdi 0xffffffffa1d78150
 rbp 0xffffffff8153fdb8     r8 0x0                    r9 0x0
 r10 0x1                   r11 0x7e                  r12 0x0
 r13 0x200                 r14 0xffffffffa1d77f40    r15 0x0
 rip 0xffffffff800bd7d9    rsp 0xffffffff8153f518 rflags 0x10046
 vector: 0xd, error code: 0x0
 6 ffffffff8153f448 (+2416) ffffffff800bd7d9   <kernel_x86_64> thread_hit_debug_event_internal[clone .constprop.0] (debug_debugger_message, void const*, int, bool, bool&) + 0x549
 7 ffffffff8153fdb8 (+ 144) ffffffff800bd92e   <kernel_x86_64> thread_hit_debug_event(debug_debugger_message, void const*, int, bool) + 0x3e
 8 ffffffff8153fe48 (+ 160) ffffffff800bddda   <kernel_x86_64> user_debug_pre_syscall + 0x7a
 9 ffffffff8153fee8 (+  72) ffffffff801462ea   <kernel_x86_64> x86_64_syscall_entry + 0x216
user iframe at 0xffffffff8153ff30 (end = 0xffffffff8153fff8)
 rax 0x95                  rbx 0x7fc81a4c8648        rcx 0x11ef70bc13c
 rdx 0x1                   rsi 0x7fc81a4bf720        rdi 0xffffffff
 rbp 0x7fc81a4bf640         r8 0x80                   r9 0x13
 r10 0x7fc81a4bf660        r11 0x206                 r12 0x1
 r13 0x7fc81a4c8658        r14 0x0                   r15 0x0
 rip 0x11ef70bc13c         rsp 0x7fc81a4bf628     rflags 0x206
 vector: 0x63, error code: 0x0
10 ffffffff8153ff30 (+140499536574224) 0000011ef70bc13c   <libroot.so> _kern_read_stat + 0x0c
11 00007fc81a4bf640 (+ 160) 0000018d6b963c8b   <jam> file_time + 0x2f
12 00007fc81a4bf6e0 (+1136) 0000018d6b965582   <jam> jcache + 0x7b
13 00007fc81a4bfb50 (+  32) 0000018d6b96a370   <jam> yyline + 0x107
14 00007fc81a4bfb70 (+10304) 0000018d6b96a52c   <jam> yylex + 0x5d
15 00007fc81a4c23b0 (+8592) 0000018d6b96c267   <jam> yyparse + 0x28b
16 00007fc81a4c4540 (+ 128) 0000018d6b96831d   <jam> parse_file + 0x40
17 00007fc81a4c45c0 (+  64) 0000018d6b96fd9f   <jam> compile_include + 0x101
18 00007fc81a4c4600 (+  80) 0000018d6b97005e   <jam> compile_on + 0xd1
19 00007fc81a4c4650 (+  64) 0000018d6b97053e   <jam> compile_rules + 0x4a
20 00007fc81a4c4690 (+  96) 0000018d6b96ff54   <jam> compile_local + 0x148
21 00007fc81a4c46f0 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
22 00007fc81a4c4730 (+  96) 0000018d6b96ff54   <jam> compile_local + 0x148
23 00007fc81a4c4790 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
24 00007fc81a4c47d0 (+ 144) 0000018d6b970491   <jam> evaluate_rule + 0x2cd
25 00007fc81a4c4860 (+ 176) 0000018d6b970179   <jam> compile_rule + 0xed
26 00007fc81a4c4910 (+  64) 0000018d6b97053e   <jam> compile_rules + 0x4a
27 00007fc81a4c4950 (+ 128) 0000018d6b96834b   <jam> parse_file + 0x6e
28 00007fc81a4c49d0 (+  64) 0000018d6b96fd9f   <jam> compile_include + 0x101
29 00007fc81a4c4a10 (+  80) 0000018d6b97005e   <jam> compile_on + 0xd1
30 00007fc81a4c4a60 (+  64) 0000018d6b97053e   <jam> compile_rules + 0x4a
31 00007fc81a4c4aa0 (+  96) 0000018d6b96ff54   <jam> compile_local + 0x148
32 00007fc81a4c4b00 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
33 00007fc81a4c4b40 (+  96) 0000018d6b96ff54   <jam> compile_local + 0x148
34 00007fc81a4c4ba0 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
35 00007fc81a4c4be0 (+ 144) 0000018d6b970491   <jam> evaluate_rule + 0x2cd
36 00007fc81a4c4c70 (+ 176) 0000018d6b970179   <jam> compile_rule + 0xed
37 00007fc81a4c4d20 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
38 00007fc81a4c4d60 (+ 128) 0000018d6b96834b   <jam> parse_file + 0x6e
39 00007fc81a4c4de0 (+  64) 0000018d6b96fd9f   <jam> compile_include + 0x101
40 00007fc81a4c4e20 (+  80) 0000018d6b97005e   <jam> compile_on + 0xd1
41 00007fc81a4c4e70 (+  64) 0000018d6b97053e   <jam> compile_rules + 0x4a
42 00007fc81a4c4eb0 (+  96) 0000018d6b96ff54   <jam> compile_local + 0x148
43 00007fc81a4c4f10 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
44 00007fc81a4c4f50 (+  96) 0000018d6b96ff54   <jam> compile_local + 0x148
45 00007fc81a4c4fb0 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
46 00007fc81a4c4ff0 (+ 144) 0000018d6b970491   <jam> evaluate_rule + 0x2cd
47 00007fc81a4c5080 (+ 176) 0000018d6b970179   <jam> compile_rule + 0xed
48 00007fc81a4c5130 (+  64) 0000018d6b97053e   <jam> compile_rules + 0x4a
49 00007fc81a4c5170 (+ 128) 0000018d6b96834b   <jam> parse_file + 0x6e
50 00007fc81a4c51f0 (+  64) 0000018d6b96fd9f   <jam> compile_include + 0x101
51 00007fc81a4c5230 (+  80) 0000018d6b97005e   <jam> compile_on + 0xd1
52 00007fc81a4c5280 (+  64) 0000018d6b97053e   <jam> compile_rules + 0x4a
53 00007fc81a4c52c0 (+  96) 0000018d6b96ff54   <jam> compile_local + 0x148
54 00007fc81a4c5320 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
55 00007fc81a4c5360 (+  96) 0000018d6b96ff54   <jam> compile_local + 0x148
56 00007fc81a4c53c0 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
57 00007fc81a4c5400 (+ 144) 0000018d6b970491   <jam> evaluate_rule + 0x2cd
58 00007fc81a4c5490 (+ 176) 0000018d6b970179   <jam> compile_rule + 0xed
59 00007fc81a4c5540 (+  64) 0000018d6b97053e   <jam> compile_rules + 0x4a
60 00007fc81a4c5580 (+ 128) 0000018d6b96834b   <jam> parse_file + 0x6e
61 00007fc81a4c5600 (+  64) 0000018d6b96fd9f   <jam> compile_include + 0x101
62 00007fc81a4c5640 (+  80) 0000018d6b97005e   <jam> compile_on + 0xd1
63 00007fc81a4c5690 (+  64) 0000018d6b97053e   <jam> compile_rules + 0x4a
64 00007fc81a4c56d0 (+  96) 0000018d6b96ff54   <jam> compile_local + 0x148
65 00007fc81a4c5730 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
66 00007fc81a4c5770 (+  96) 0000018d6b96ff54   <jam> compile_local + 0x148
67 00007fc81a4c57d0 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
68 00007fc81a4c5810 (+ 144) 0000018d6b970491   <jam> evaluate_rule + 0x2cd
69 00007fc81a4c58a0 (+ 176) 0000018d6b970179   <jam> compile_rule + 0xed
70 00007fc81a4c5950 (+  64) 0000018d6b97053e   <jam> compile_rules + 0x4a
71 00007fc81a4c5990 (+ 128) 0000018d6b96834b   <jam> parse_file + 0x6e
72 00007fc81a4c5a10 (+  64) 0000018d6b96fd9f   <jam> compile_include + 0x101
73 00007fc81a4c5a50 (+  80) 0000018d6b97005e   <jam> compile_on + 0xd1
74 00007fc81a4c5aa0 (+  64) 0000018d6b97053e   <jam> compile_rules + 0x4a
75 00007fc81a4c5ae0 (+  96) 0000018d6b96ff54   <jam> compile_local + 0x148
76 00007fc81a4c5b40 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
77 00007fc81a4c5b80 (+  96) 0000018d6b96ff54   <jam> compile_local + 0x148
78 00007fc81a4c5be0 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
79 00007fc81a4c5c20 (+ 144) 0000018d6b970491   <jam> evaluate_rule + 0x2cd
80 00007fc81a4c5cb0 (+ 176) 0000018d6b970179   <jam> compile_rule + 0xed
81 00007fc81a4c5d60 (+  64) 0000018d6b96fc9c   <jam> compile_if + 0x79
82 00007fc81a4c5da0 (+  64) 0000018d6b97053e   <jam> compile_rules + 0x4a
83 00007fc81a4c5de0 (+  96) 0000018d6b96ff54   <jam> compile_local + 0x148
84 00007fc81a4c5e40 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
85 00007fc81a4c5e80 (+ 128) 0000018d6b96834b   <jam> parse_file + 0x6e
86 00007fc81a4c5f00 (+  64) 0000018d6b96fd9f   <jam> compile_include + 0x101
87 00007fc81a4c5f40 (+  64) 0000018d6b970599   <jam> compile_rules + 0xa5
88 00007fc81a4c5f80 (+ 128) 0000018d6b96834b   <jam> parse_file + 0x6e
89 00007fc81a4c6000 (+4448) 0000018d6b962f2c   <jam> main + 0x767
90 00007fc81a4c7160 (+  48) 0000018d6b9627be   <jam> _start + 0x3e
91 00007fc81a4c7190 (+  48) 0000008252a78f55   </boot/system/runtime_loader@0x0000008252a68000> <unknown> + 0x10f55
92 00007fc81a4c71c0 (+   0) 00007ff4a7649260   <commpage> commpage_thread_exit + 0x00
kdebug> 

Attachments (1)

IMG_0368.JPG (4.9 MB ) - added by avanspector 2 years ago.

Change History (14)

comment:1 by waddlesplash, 2 years ago

Version: R1/beta3R1/Development

comment:2 by waddlesplash, 2 years ago

I was running jam in strace at the time, probably relevant.

comment:3 by waddlesplash, 2 years ago

Cc: korli added

Yeah, this is reproducible for me. Ctrl+C during strace of jam seems to generally trigger a KDL.

CC korli: perhaps your recent changes to strace may have "uncovered" this? (Probably not as they were all in userland?) I'll look into fixes later.

comment:4 by korli, 2 years ago

Indeed, now stace enables pre syscall debug traces.

comment:5 by waddlesplash, 2 years ago

Might you have some idea of where to start looking on this (or how to get a more useful message than GPE?)

comment:6 by waddlesplash, 2 years ago

The problem seems to be that the stack pointer is not aligned to 16 (but only to 8) and GCC generated aligned-SSE instructions for a copy. The question then is how we wound up with a misaligned stack pointer in syscall entry; or why this only manifests in user_debug_pre_syscall...

comment:7 by waddlesplash, 2 years ago

So! I added some alignment checks:

diff --git a/src/system/kernel/debug/user_debugger.cpp b/src/system/kernel/debug/user_debugger.cpp
index aa1f760759..8d840d382a 100644
--- a/src/system/kernel/debug/user_debugger.cpp
+++ b/src/system/kernel/debug/user_debugger.cpp
@@ -65,6 +65,7 @@ static void schedule_profiling_timer(Thread* thread, bigtime_t interval);
 static int32 profiling_event(timer* unused);
 static status_t ensure_debugger_installed();
 static void get_team_debug_info(team_debug_info &teamDebugInfo);
+extern void alignment_check(void* ptr);


 static inline status_t
@@ -734,6 +735,7 @@ thread_hit_debug_event_internal(debug_debugger_message event,
        // update the thread debug info
        bool destroyThreadInfo = false;
        thread_debug_info threadDebugInfo;
+       alignment_check(&threadDebugInfo);

        state = disable_interrupts();
        threadDebugInfoLocker.Lock();
@@ -832,6 +834,7 @@ user_debug_pre_syscall(uint32 syscall, void *args)
 {
        // check whether a debugger is installed
        Thread *thread = thread_get_current_thread();
+       alignment_check(&thread);
        int32 teamDebugFlags = atomic_get(&thread->team->debug_info.flags);
        if (!(teamDebugFlags & B_TEAM_DEBUG_DEBUGGER_INSTALLED))
                return;

and in another file (as putting this inline just gets optimized out):

void alignment_check(void* ptr)
{
	if ((intptr_t(ptr) % 16) != 0)
		panic("BAD ALIGNMENT!");
}

And indeed:

PANIC: BAD ALIGNMENT!
Welcome to Kernel Debugging Land...
Thread 453 "true" running on CPU 0
stack trace for thread 453 "true"
    kernel stack: 0xffffffff81aba000 to 0xffffffff81abf000
      user stack: 0x00007fee9bacb000 to 0x00007fee9cacb000
frame                       caller             <image>:function + offset
 0 ffffffff81abe2e0 (+  24) ffffffff80144c3c   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16
 1 ffffffff81abe2f8 (+  80) ffffffff800ae258   <kernel_x86_64> debug_call_with_fault_handler + 0x78
 2 ffffffff81abe348 (+  96) ffffffff800af873   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_
tag*, int) + 0xf3
 3 ffffffff81abe3a8 (+  80) ffffffff800afc0e   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_l
ist_tag*, int) + 0x6e
 4 ffffffff81abe3f8 (+ 240) ffffffff800aff67   <kernel_x86_64> panic + 0xb7
 5 ffffffff81abe4e8 (+2224) ffffffff800bdbf8   <kernel_x86_64> thread_hit_debug_event_internal[clone .constprop.0] (debu
g_debugger_message, void const*, int, bool, bool&) + 0x2d8
 6 ffffffff81abed98 (+ 144) ffffffff800bdfde   <kernel_x86_64> thread_hit_debug_event(debug_debugger_message, void const
*, int, bool) + 0x3e
 7 ffffffff81abee28 (+ 192) ffffffff800be4bc   <kernel_x86_64> user_debug_pre_syscall + 0xac
 8 ffffffff81abeee8 (+  72) ffffffff8014684a   <kernel_x86_64> x86_64_syscall_entry + 0x216
user iframe at 0xffffffff81abef30 (end = 0xffffffff81abeff8)
 rax 0xc9                  rbx 0x0                   rcx 0x1abfa603224
 rdx 0x7e9ef2b010          rsi 0x5                   rdi 0x224e
 rbp 0x7fee9cacad80         r8 0x0                    r9 0x1
 r10 0x7fee9cacacec        r11 0x246                 r12 0x0
 r13 0x7e9ef2b010          r14 0x7e9ef2b010          r15 0x1abfa80bac0
 rip 0x1abfa603224         rsp 0x7fee9cacad58     rflags 0x246
 vector: 0x63, error code: 0x0
 9 ffffffff81abef30 (+140664926944848) 000001abfa603224   </boot/system/runtime_loader@0x000001abfa5ee000> <unknown> + 0
x15224
10 00007fee9cacad80 (+ 144) 000001abfa5f7be0   </boot/system/runtime_loader@0x000001abfa5ee000> <unknown> + 0x9be0
11 00007fee9cacae10 (+  48) 000001abfa5fefe3   </boot/system/runtime_loader@0x000001abfa5ee000> <unknown> + 0x10fe3
12 00007fee9cacae40 (+   0) 00007fd3e58d2260   <commpage> commpage_thread_exit + 0x00
kdebug>

comment:8 by waddlesplash, 2 years ago

Adding another check in thread_hit_debug_event also fires. I guess this must be another case of RDP/RSP being incorrectly matched...?

comment:9 by waddlesplash, 2 years ago

The prologues of both functions do

	pushq	%rbp
	movq	%rsp, %rbp

as expected, so I would expect that if RSP was aligned in the first one (which apparently it is) it would also be aligned in all subsequent ones, right?

comment:10 by korli, 2 years ago

indeed aligning the stack looks healthy. can't reproduce with this change: https://review.haiku-os.org/c/haiku/+/5113

comment:11 by waddlesplash, 2 years ago

Milestone: UnscheduledR1/beta4
Resolution: fixed
Status: newclosed

Indeed fixed in hrev55957.

comment:12 by avanspector, 2 years ago

Resolution: fixed
Status: closedreopened

message reoccured because of "page scrubber" while using bepdf

by avanspector, 2 years ago

Attachment: IMG_0368.JPG added

comment:13 by waddlesplash, 2 years ago

Resolution: fixed
Status: reopenedclosed

This is a totally unrelated issue which is probably just another symptom of #13205. Please attach the KDL to that ticket instead.

Note: See TracTickets for help on using tickets.