#13927 closed bug (fixed)
gpgme KDL's when test are enabled
Reported by: | Begasus | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta2 |
Component: | Network & Internet/TCP | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
When trying to update to the latest version for gpgme I ran into KDL's (reproducible), I initially filed an issue it on haikuports and it seems legit at that time, updating the system with full-sync removed the KDL from the build, but then the problem was still there when after the build the filesystem couldn't be unmounted, I could kill the instance with PC and the got back out of the unmounting problem. After that I try'd to build with the option "--disable-gpg-test" and then the build was ok, but I couldn't run the test with "hp --test *" Then I retry'd the build with the test enabled, knowing the problem with unmount. But when trying to run the test again inside haikuporter it send me back into KDL, more info and links at: https://github.com/haikuports/haikuports/issues/2040
Attachments (1)
Change History (23)
by , 7 years ago
Attachment: | gpg_kdl.txt added |
---|
comment:1 by , 7 years ago
patch: | 0 → 1 |
---|
comment:3 by , 7 years ago
PANIC: vm_page_fault: unhandled page fault in kernel space at 0xdeadbefb, ip 0x81251e16 Welcome to Kernel Debugging Land... Thread 249 "loop consumer" running on CPU 0 stack trace for thread 249 "loop consumer" kernel stack: 0x81174000 to 0x81178000 frame caller <image>:function + offset 0 81177a34 (+ 32) 8014f416 <kernel_x86> arch_debug_stack_trace + 0x12 1 81177a54 (+ 16) 800a9fbf <kernel_x86> stack_trace_trampoline(NULL) + 0x0b 2 81177a64 (+ 12) 801411ce <kernel_x86> arch_debug_call_with_fault_handler + 0x1b 3 81177a70 (+ 48) 800abae6 <kernel_x86> debug_call_with_fault_handler + 0x5a 4 81177aa0 (+ 64) 800aa1db <kernel_x86> kernel_debugger_loop([34m0x80193777[0m [36m"PANIC: "[0m, [34m0x801aa440[0m [36m"vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx "[0m, [34m0x81177b4c[0m [36m"\FB\BE\AD\DE %\81\93[\80@V\9F\82` G\DD,|\81D\AA\8D\82\B2\80D\AA\8D\82"[0m, int32: [34m0[0m) + 0x217 5 81177ae0 (+ 48) 800aa557 <kernel_x86> kernel_debugger_internal([34m0x80193777[0m [36m"PANIC: "[0m, [34m0x801aa440[0m [36m"vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx "[0m, [34m0x81177b4c[0m [36m"\FB\BE\AD\DE %\81\93[\80@V\9F\82` G\DD,|\81D\AA\8D\82\B2\80D\AA\8D\82"[0m, int32: [34m0[0m) + 0x53 6 81177b10 (+ 48) 800abe72 <kernel_x86> panic + 0x3a 7 81177b40 (+ 144) 80125c9d <kernel_x86> vm_page_fault + 0x145 8 81177bd0 (+ 80) 80150c7f <kernel_x86> x86_page_fault_exception + 0x17f 9 81177c20 (+ 12) 80143b8c <kernel_x86> int_bottom + 0x3c kernel iframe at 0x81177c2c (end = 0x81177c7c) eax 0xdeadbeef ebx 0x81256178 ecx 0xdd458354 edx 0x4 esi 0xdd471c60 edi 0xdd471c6c ebp 0x81177cb4 esp 0x81177c60 eip 0x81251e16 eflags 0x210286 vector: 0xe, error code: 0x0 10 81177c2c (+ 136) 81251e16 </boot/system/add-ons/kernel/network/protocols/tcp> ConnectionHashDefinition<[32m0xd259674c[0m>::Compare(ConnectionHashDefinition: [34m0x81177cec[0m) + 0x22 11 81177cb4 (+ 64) 812522da </boot/system/add-ons/kernel/network/protocols/tcp> EndpointManager<[32m0xd2596720[0m>::_LookupConnection(sockaddr*: [34m0xdd458354[0m) + 0x66 12 81177cf4 (+ 336) 81252d23 </boot/system/add-ons/kernel/network/protocols/tcp> EndpointManager<[32m0xd2596720[0m>::FindConnection(sockaddr*: [34m0xdd458354[0m) + 0xc7 13 81177e44 (+ 160) 8124c9e7 </boot/system/add-ons/kernel/network/protocols/tcp> tcp_receive_data(net_buffer*: [34m0xdd458298[0m) + 0x31b 14 81177ee4 (+ 128) 8115b66a </boot/system/add-ons/kernel/network/protocols/ipv4> ipv4_receive_data(net_buffer*: [34m0xdd458298[0m) + 0x36a 15 81177f64 (+ 80) 8113c9ce </boot/system/add-ons/kernel/network/stack> device_consumer_thread([34m0xd203d000[0m) + 0x6e 16 81177fb4 (+ 48) 80089207 <kernel_x86> common_thread_entry([34m0x81177ff0[0m) + 0x3b kdebug> co read access attempted on read-protected area 0x57 at 0xdeadb000 vm_page_fault: vm_soft_fault returned error 'Permission denied' on fault at 0xdeadbefb, ip 0x81251e16, write 0, user 0, thread 0xf9 PANIC: vm_page_fault: unhandled page fault in kernel space at 0xdeadbefb, ip 0x81251e16 Welcome to Kernel Debugging Land... Thread 249 "loop consumer" running on CPU 0 stack trace for thread 249 "loop consumer" kernel stack: 0x81174000 to 0x81178000 frame caller <image>:function + offset 0 81177a34 (+ 32) 8014f416 <kernel_x86> arch_debug_stack_trace + 0x12 1 81177a54 (+ 16) 800a9fbf <kernel_x86> stack_trace_trampoline(NULL) + 0x0b 2 81177a64 (+ 12) 801411ce <kernel_x86> arch_debug_call_with_fault_handler + 0x1b 3 81177a70 (+ 48) 800abae6 <kernel_x86> debug_call_with_fault_handler + 0x5a 4 81177aa0 (+ 64) 800aa1db <kernel_x86> kernel_debugger_loop([34m0x80193777[0m [36m"PANIC: "[0m, [34m0x801aa440[0m [36m"vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx "[0m, [34m0x81177b4c[0m [36m"\FB\BE\AD\DE %\81\93[\80@V\9F\82` G\DD,|\81D\AA\8D\82\B2\80D\AA\8D\82"[0m, int32: [34m0[0m) + 0x217 5 81177ae0 (+ 48) 800aa557 <kernel_x86> kernel_debugger_internal([34m0x80193777[0m [36m"PANIC: "[0m, [34m0x801aa440[0m [36m"vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx "[0m, [34m0x81177b4c[0m [36m"\FB\BE\AD\DE %\81\93[\80@V\9F\82` G\DD,|\81D\AA\8D\82\B2\80D\AA\8D\82"[0m, int32: [34m0[0m) + 0x53 6 81177b10 (+ 48) 800abe72 <kernel_x86> panic + 0x3a 7 81177b40 (+ 144) 80125c9d <kernel_x86> vm_page_fault + 0x145 8 81177bd0 (+ 80) 80150c7f <kernel_x86> x86_page_fault_exception + 0x17f 9 81177c20 (+ 12) 80143b8c <kernel_x86> int_bottom + 0x3c kernel iframe at 0x81177c2c (end = 0x81177c7c) eax 0xdeadbeef ebx 0x81256178 ecx 0xdd458354 edx 0x4 esi 0xdd471c60 edi 0xdd471c6c ebp 0x81177cb4 esp 0x81177c60 eip 0x81251e16 eflags 0x210286 vector: 0xe, error code: 0x0 10 81177c2c (+ 136) 81251e16 </boot/system/add-ons/kernel/network/protocols/tcp> ConnectionHashDefinition<[32m0xd259674c[0m>::Compare(ConnectionHashDefinition: [34m0x81177cec[0m) + 0x22 11 81177cb4 (+ 64) 812522da </boot/system/add-ons/kernel/network/protocols/tcp> EndpointManager<[32m0xd2596720[0m>::_LookupConnection(sockaddr*: [34m0xdd458354[0m) + 0x66 12 81177cf4 (+ 336) 81252d23 </boot/system/add-ons/kernel/network/protocols/tcp> EndpointManager<[32m0xd2596720[0m>::FindConnection(sockaddr*: [34m0xdd458354[0m) + 0xc7 13 81177e44 (+ 160) 8124c9e7 </boot/system/add-ons/kernel/network/protocols/tcp> tcp_receive_data(net_buffer*: [34m0xdd458298[0m) + 0x31b 14 81177ee4 (+ 128) 8115b66a </boot/system/add-ons/kernel/network/protocols/ipv4> ipv4_receive_data(net_buffer*: [34m0xdd458298[0m) + 0x36a 15 81177f64 (+ 80) 8113c9ce </boot/system/add-ons/kernel/network/stack> device_consumer_thread([34m0xd203d000[0m) + 0x6e 16 81177fb4 (+ 48) 80089207 <kernel_x86> common_thread_entry([34m0x81177ff0[0m) + 0x3b kdebug> sc stack trace for thread 249 "loop consumer" kernel stack: 0x81174000 to 0x81178000 frame caller <image>:function + offset 0 81177824 (+ 48) 800acc30 <kernel_x86> invoke_command_trampoline([34m0x811778bc[0m) + 0x1c 1 81177854 (+ 12) 801411ce <kernel_x86> arch_debug_call_with_fault_handler + 0x1b 2 81177860 (+ 48) 800abae6 <kernel_x86> debug_call_with_fault_handler + 0x5a 3 81177890 (+ 64) 800ace89 <kernel_x86> invoke_debugger_command + 0xb9 4 811778d0 (+ 64) 800accb5 <kernel_x86> invoke_pipe_segment(debugger_command_pipe*: [34m0xcce38030[0m, int32: [34m0[0m, [34m0x0[0m [31m"<NULL>"[0m) + 0x79 5 81177910 (+ 64) 800acff4 <kernel_x86> invoke_debugger_command_pipe + 0x9c 6 81177950 (+ 48) 800aef34 <kernel_x86> ExpressionParser<[32m0x81177a00[0m>::_ParseCommandPipe([34m0x811779fc[0m) + 0x234 7 81177980 (+ 64) 800ae154 <kernel_x86> ExpressionParser<[32m0x81177a00[0m>::EvaluateCommand([34m0x801c52c0[0m [36m"sc"[0m, [34m0x811779fc[0m) + 0x2d0 8 811779c0 (+ 224) 800b0580 <kernel_x86> evaluate_debug_command + 0x80 9 81177aa0 (+ 64) 800aa2ea <kernel_x86> kernel_debugger_loop([34m0x80193777[0m [36m"PANIC: "[0m, [34m0x801aa440[0m [36m"vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx "[0m, [34m0x81177b4c[0m [36m"\FB\BE\AD\DE %\81\93[\80@V\9F\82` G\DD,|\81D\AA\8D\82\B2\80D\AA\8D\82"[0m, int32: [34m0[0m) + 0x326 10 81177ae0 (+ 48) 800aa557 <kernel_x86> kernel_debugger_internal([34m0x80193777[0m [36m"PANIC: "[0m, [34m0x801aa440[0m [36m"vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx "[0m, [34m0x81177b4c[0m [36m"\FB\BE\AD\DE %\81\93[\80@V\9F\82` G\DD,|\81D\AA\8D\82\B2\80D\AA\8D\82"[0m, int32: [34m0[0m) + 0x53 11 81177b10 (+ 48) 800abe72 <kernel_x86> panic + 0x3a 12 81177b40 (+ 144) 80125c9d <kernel_x86> vm_page_fault + 0x145 13 81177bd0 (+ 80) 80150c7f <kernel_x86> x86_page_fault_exception + 0x17f 14 81177c20 (+ 12) 80143b8c <kernel_x86> int_bottom + 0x3c kernel iframe at 0x81177c2c (end = 0x81177c7c) eax 0xdeadbeef ebx 0x81256178 ecx 0xdd458354 edx 0x4 esi 0xdd471c60 edi 0xdd471c6c ebp 0x81177cb4 esp 0x81177c60 eip 0x81251e16 eflags 0x210286 vector: 0xe, error code: 0x0 15 81177c2c (+ 136) 81251e16 </boot/system/add-ons/kernel/network/protocols/tcp> ConnectionHashDefinition<[32m0xd259674c[0m>::Compare(ConnectionHashDefinition: [34m0x81177cec[0m) + 0x22 16 81177cb4 (+ 64) 812522da </boot/system/add-ons/kernel/network/protocols/tcp> EndpointManager<[32m0xd2596720[0m>::_LookupConnection(sockaddr*: [34m0xdd458354[0m) + 0x66 17 81177cf4 (+ 336) 81252d23 </boot/system/add-ons/kernel/network/protocols/tcp> EndpointManager<[32m0xd2596720[0m>::FindConnection(sockaddr*: [34m0xdd458354[0m) + 0xc7 18 81177e44 (+ 160) 8124c9e7 </boot/system/add-ons/kernel/network/protocols/tcp> tcp_receive_data(net_buffer*: [34m0xdd458298[0m) + 0x31b 19 81177ee4 (+ 128) 8115b66a </boot/system/add-ons/kernel/network/protocols/ipv4> ipv4_receive_data(net_buffer*: [34m0xdd458298[0m) + 0x36a 20 81177f64 (+ 80) 8113c9ce </boot/system/add-ons/kernel/network/stack> device_consumer_thread([34m0xd203d000[0m) + 0x6e 21 81177fb4 (+ 48) 80089207 <kernel_x86> common_thread_entry([34m0x81177ff0[0m) + 0x3b kdebug> co read access attempted on read-protected area 0x57 at 0xdeadb000 vm_page_fault: vm_soft_fault returned error 'Permission denied' on fault at 0xdeadbefb, ip 0x81251e16, write 0, user 0, thread 0xf9 PANIC: vm_page_fault: unhandled page fault in kernel space at 0xdeadbefb, ip 0x81251e16 Welcome to Kernel Debugging Land... Thread 249 "loop consumer" running on CPU 1 stack trace for thread 249 "loop consumer" kernel stack: 0x81174000 to 0x81178000 frame caller <image>:function + offset 0 81177a34 (+ 32) 8014f416 <kernel_x86> arch_debug_stack_trace + 0x12 1 81177a54 (+ 16) 800a9fbf <kernel_x86> stack_trace_trampoline(NULL) + 0x0b 2 81177a64 (+ 12) 801411ce <kernel_x86> arch_debug_call_with_fault_handler + 0x1b 3 81177a70 (+ 48) 800abae6 <kernel_x86> debug_call_with_fault_handler + 0x5a 4 81177aa0 (+ 64) 800aa1db <kernel_x86> kernel_debugger_loop([34m0x80193777[0m [36m"PANIC: "[0m, [34m0x801aa440[0m [36m"vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx "[0m, [34m0x81177b4c[0m [36m"\FB\BE\AD\DE %\81\93[\80@V\9F\82` G\DD,|\81D\AA\8D\82\B2\80D\AA\8D\82"[0m, int32: [34m1[0m) + 0x217 5 81177ae0 (+ 48) 800aa557 <kernel_x86> kernel_debugger_internal([34m0x80193777[0m [36m"PANIC: "[0m, [34m0x801aa440[0m [36m"vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx "[0m, [34m0x81177b4c[0m [36m"\FB\BE\AD\DE %\81\93[\80@V\9F\82` G\DD,|\81D\AA\8D\82\B2\80D\AA\8D\82"[0m, int32: [34m1[0m) + 0x53 6 81177b10 (+ 48) 800abe72 <kernel_x86> panic + 0x3a 7 81177b40 (+ 144) 80125c9d <kernel_x86> vm_page_fault + 0x145 8 81177bd0 (+ 80) 80150c7f <kernel_x86> x86_page_fault_exception + 0x17f 9 81177c20 (+ 12) 80143b8c <kernel_x86> int_bottom + 0x3c kernel iframe at 0x81177c2c (end = 0x81177c7c) eax 0xdeadbeef ebx 0x81256178 ecx 0xdd458354 edx 0x4 esi 0xdd471c60 edi 0xdd471c6c ebp 0x81177cb4 esp 0x81177c60 eip 0x81251e16 eflags 0x210286 vector: 0xe, error code: 0x0 10 81177c2c (+ 136) 81251e16 </boot/system/add-ons/kernel/network/protocols/tcp> ConnectionHashDefinition<[32m0xd259674c[0m>::Compare(ConnectionHashDefinition: [34m0x81177cec[0m) + 0x22 11 81177cb4 (+ 64) 812522da </boot/system/add-ons/kernel/network/protocols/tcp> EndpointManager<[32m0xd2596720[0m>::_LookupConnection(sockaddr*: [34m0xdd458354[0m) + 0x66 12 81177cf4 (+ 336) 81252d23 </boot/system/add-ons/kernel/network/protocols/tcp> EndpointManager<[32m0xd2596720[0m>::FindConnection(sockaddr*: [34m0xdd458354[0m) + 0xc7 13 81177e44 (+ 160) 8124c9e7 </boot/system/add-ons/kernel/network/protocols/tcp> tcp_receive_data(net_buffer*: [34m0xdd458298[0m) + 0x31b 14 81177ee4 (+ 128) 8115b66a </boot/system/add-ons/kernel/network/protocols/ipv4> ipv4_receive_data(net_buffer*: [34m0xdd458298[0m) + 0x36a 15 81177f64 (+ 80) 8113c9ce </boot/system/add-ons/kernel/network/stack> device_consumer_thread([34m0xd203d000[0m) + 0x6e 16 81177fb4 (+ 48) 80089207 <kernel_x86> common_thread_entry([34m0x81177ff0[0m) + 0x3b kdebug> shutdown kdebug> co read access attempted on read-protected area 0x57 at 0xdeadb000 vm_page_fault: vm_soft_fault returned error 'Permission denied' on fault at 0xdeadbefb, ip 0x81251e16, write 0, user 0, thread 0xf9 PANIC: vm_page_fault: unhandled page fault in kernel space at 0xdeadbefb, ip 0x81251e16 Welcome to Kernel Debugging Land... Thread 249 "loop consumer" running on CPU 1 stack trace for thread 249 "loop consumer" kernel stack: 0x81174000 to 0x81178000 frame caller <image>:function + offset 0 81177a34 (+ 32) 8014f416 <kernel_x86> arch_debug_stack_trace + 0x12 1 81177a54 (+ 16) 800a9fbf <kernel_x86> stack_trace_trampoline(NULL) + 0x0b 2 81177a64 (+ 12) 801411ce <kernel_x86> arch_debug_call_with_fault_handler + 0x1b 3 81177a70 (+ 48) 800abae6 <kernel_x86> debug_call_with_fault_handler + 0x5a 4 81177aa0 (+ 64) 800aa1db <kernel_x86> kernel_debugger_loop([34m0x80193777[0m [36m"PANIC: "[0m, [34m0x801aa440[0m [36m"vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx "[0m, [34m0x81177b4c[0m [36m"\FB\BE\AD\DE %\81\93[\80@V\9F\82` G\DD,|\81D\AA\8D\82\B2\80D\AA\8D\82"[0m, int32: [34m1[0m) + 0x217 5 81177ae0 (+ 48) 800aa557 <kernel_x86> kernel_debugger_internal([34m0x80193777[0m [36m"PANIC: "[0m, [34m0x801aa440[0m [36m"vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx "[0m, [34m0x81177b4c[0m [36m"\FB\BE\AD\DE %\81\93[\80@V\9F\82` G\DD,|\81D\AA\8D\82\B2\80D\AA\8D\82"[0m, int32: [34m1[0m) + 0x53 6 81177b10 (+ 48) 800abe72 <kernel_x86> panic + 0x3a 7 81177b40 (+ 144) 80125c9d <kernel_x86> vm_page_fault + 0x145 8 81177bd0 (+ 80) 80150c7f <kernel_x86> x86_page_fault_exception + 0x17f 9 81177c20 (+ 12) 80143b8c <kernel_x86> int_bottom + 0x3c kernel iframe at 0x81177c2c (end = 0x81177c7c) eax 0xdeadbeef ebx 0x81256178 ecx 0xdd458354 edx 0x4 esi 0xdd471c60 edi 0xdd471c6c ebp 0x81177cb4 esp 0x81177c60 eip 0x81251e16 eflags 0x210286 vector: 0xe, error code: 0x0 10 81177c2c (+ 136) 81251e16 </boot/system/add-ons/kernel/network/protocols/tcp> ConnectionHashDefinition<[32m0xd259674c[0m>::Compare(ConnectionHashDefinition: [34m0x81177cec[0m) + 0x22 11 81177cb4 (+ 64) 812522da </boot/system/add-ons/kernel/network/protocols/tcp> EndpointManager<[32m0xd2596720[0m>::_LookupConnection(sockaddr*: [34m0xdd458354[0m) + 0x66 12 81177cf4 (+ 336) 81252d23 </boot/system/add-ons/kernel/network/protocols/tcp> EndpointManager<[32m0xd2596720[0m>::FindConnection(sockaddr*: [34m0xdd458354[0m) + 0xc7 13 81177e44 (+ 160) 8124c9e7 </boot/system/add-ons/kernel/network/protocols/tcp> tcp_receive_data(net_buffer*: [34m0xdd458298[0m) + 0x31b 14 81177ee4 (+ 128) 8115b66a </boot/system/add-ons/kernel/network/protocols/ipv4> ipv4_receive_data(net_buffer*: [34m0xdd458298[0m) + 0x36a 15 81177f64 (+ 80) 8113c9ce </boot/system/add-ons/kernel/network/stack> device_consumer_thread([34m0xd203d000[0m) + 0x6e 16 81177fb4 (+ 48) 80089207 <kernel_x86> common_thread_entry([34m0x81177ff0[0m) + 0x3b kdebug> shutdown =[1D [1D=n[1D [1D[1D [1D+[1D [1D/[1D [1D>[1D [1D.[1D [1D,<MN[1D [1D[1D [1D[1D [1D[1D [1D0-[1D [1D[1D [1D-n kdebug>
comment:5 by , 7 years ago
Component: | - General → Network & Internet/TCP |
---|---|
Owner: | changed from | to
comment:6 by , 7 years ago
FWIW I am unable to reproduce this. I also don't really see the link to #12567. From a quick glance at the change to resolve that one and the corresponding TCP code I don't really see how one would apply the logic there. I have not actually investigated it in depth though.
comment:7 by , 7 years ago
patch: | 1 → 0 |
---|
comment:8 by , 5 years ago
I was using gnupg a couple of days ago and happened to randomly reproduce this exact problem. After reading through the code and adding lots of extra tracing to understand what was going on I have an explanation for what is happening.
The cool thing is that I can reproduce this 100% of the time with this single command:
> gpg --recv-keys 066DAFCB81E42C40
Which results in the following KDL
PANIC: Unexpected exception "General Protection Exception" occurred in kernel mode! Error code: 0x0 Welcome to Kernel Debugging Land... Thread 217 "loop consumer" running on CPU 0 stack trace for thread 217 "loop consumer" kernel stack: 0xffffffff81c5a000 to 0xffffffff81c5f000 frame caller <image>:function + offset 0 ffffffff81c5e608 (+ 24) ffffffff801567dc <kernel_x86_64> arch_debug_call_with_fault_handler + 0x16 1 ffffffff81c5e620 (+ 80) ffffffff800b0218 <kernel_x86_64> debug_call_with_fault_handler + 0x88 2 ffffffff81c5e670 (+ 96) ffffffff800b1ba1 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf1 3 ffffffff81c5e6d0 (+ 80) ffffffff800b1e9e <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e 4 ffffffff81c5e720 (+ 240) ffffffff800b2207 <kernel_x86_64> panic + 0xb7 5 ffffffff81c5e810 (+ 224) ffffffff80160f98 <kernel_x86_64> x86_unexpected_exception + 0x168 6 ffffffff81c5e8f0 (+ 536) ffffffff80158082 <kernel_x86_64> int_bottom + 0x56 kernel iframe at 0xffffffff81c5eb08 (end = 0xffffffff81c5ebd0) rax 0xdeadbeefdeadbeef rbx 0xffffffff82ade400 rcx 0x8 rdx 0xffffffff82ade400 rsi 0xffffffff81c5ebf0 rdi 0xdeadbeefdeadbf0f rbp 0xffffffff81c5ebe0 r8 0xffffffff801f9860 r9 0x5f r10 0xf r11 0xf r12 0xffffffff81c5ebf0 r13 0xffffffff81c5ebf0 r14 0x1 r15 0xffffffff928e01f0 rip 0xffffffff8e978446 rsp 0xffffffff81c5ebd0 rflags 0x13286 vector: 0xd, error code: 0x0 7 ffffffff81c5eb08 (+ 216) ffffffff8e978446 </boot/system/add-ons/kernel/network/protocols/tcp> ConnectionHashDefinition::Compare const(std::pair<sockaddr const*, sockaddr const*> const&, TCPEndpoint*) + 0x16 8 ffffffff81c5ebe0 (+ 64) ffffffff8e978871 </boot/system/add-ons/kernel/network/protocols/tcp> EndpointManager::_LookupConnection(sockaddr const*, sockaddr const*) + 0x81 9 ffffffff81c5ec20 (+ 352) ffffffff8e978e10 </boot/system/add-ons/kernel/network/protocols/tcp> EndpointManager::FindConnection(sockaddr*, sockaddr*) + 0x60 10 ffffffff81c5ed80 (+ 240) ffffffff8e971723 </boot/system/add-ons/kernel/network/protocols/tcp> tcp_receive_data(net_buffer*) + 0x333 11 ffffffff81c5ee70 (+ 208) ffffffff81c4a327 </boot/system/add-ons/kernel/network/protocols/ipv4> ipv4_receive_data(net_buffer*) + 0x5e7 12 ffffffff81c5ef40 (+ 112) ffffffff8e70f152 </boot/system/add-ons/kernel/network/stack> device_consumer_thread(void*) + 0x82 13 ffffffff81c5efb0 (+ 32) ffffffff8008d1f7 <kernel_x86_64> common_thread_entry(void*) + 0x37 14 ffffffff81c5efd0 (+2117734448) ffffffff81c5efe0 kdebug> qemu-system-x86_64: terminating on signal 2
The chain of events leading up to this looks something like:
- A socket is opened and connect is called. This calls EndpointManager::SetConnection(), which adds the TCPEndpoint for the connection to the EndpointManager::fConnectionHash BOpenHashTable here.
KWA tcp_init_protocol() Creating new TCPEndpoint: 0xffffffff82ade400 KWA tcp_open(0xffffffff82ade400) KWA tcp_get_domain(0xffffffff82ade400) KWA tcp_getsockopt(0xffffffff82ade400, -1, 1073741833, value, length) KWA tcp_bind(0xffffffff82ade400, 0xffffffff80592dc0) EndpointManager::Bind(0xffffffff82ade400, 0xffffffff80592dc0) EndpointManager::BindToEphemeral(0xffffffff82ade400) EndpointManager::BindToEphemeral(0xffffffff82ade400) -> 0.0.0.0:40373 EndpointManager::_Bind(0xffffffff82ade400, 0xffffffff80592c60) EndpointManager::_BindToEphemeral(0xffffffff82ade400, 0xffffffff80592dc0) bound port 40373 and is returning status No error KWA tcp_connect(0xffffffff82ade400, 1) EndpointManager::SetConnection(0xffffffff82ade400) EndpointManager::_LookupConnection(0xffffffff80592c80, 0xffffffff80592e80)
- In the kernel debugger, the tcp_endpoints command will show the inserted endpoint.
kdebug> tcp_endpoints -------- TCP Domain 0xffffffff928e01e0 --------- address local peer recv-q send-q state CONNECTION_HASH (1 entries) 0xffffffff82ade400 127.0.0.1:40373 127.0.0.1:9050 0 0 closed kdebug> es
- A packet is received on this endpoint
TCP: Received packet corresponds to explicit endpoint 0xffffffff82ade400 KWA finishing tcp_receive_data: endpoint=0xffffffff82ade400, segmentAction=1, buffer=0xffffffff92c622d0
- For some reason, tcp_connect() is called again on the same endpoint
KWA tcp_connect(0xffffffff82ade400, 1)
This of course leads to EndpointManager::SetConnection() being called again. Just before the endpoint is inserted into fConnectionHash again, the tcp_endpoints command still returns the same output as above
kdebug> tcp_endpoints -------- TCP Domain 0xffffffff928e01e0 --------- address local peer recv-q send-q state CONNECTION_HASH (1 entries) 0xffffffff82ade400 127.0.0.1:40373 127.0.0.1:9050 0 0 closed kdebug> es
But then SetConnection() calls fConnectionHash.Insert(endpoint) for an endpoint that is already in the table
EndpointManager::SetConnection(0xffffffff82ade400)
And now there is a cycle in the linked list for the hash of that endpoint in fConnectionHash**
kdebug> tcp_endpoints -------- TCP Domain 0xffffffff928e01e0 --------- address local peer recv-q send-q state CONNECTION_HASH (2 entries) 0xffffffff82ade400 127.0.0.1:40373 127.0.0.1:9150 0 0 closed 0xffffffff82ade400 127.0.0.1:40373 127.0.0.1:9150 0 0 closed 0xffffffff82ade400 127.0.0.1:40373 127.0.0.1:9150 0 0 closed 0xffffffff82ade400 127.0.0.1:40373 127.0.0.1:9150 0 0 closed 0xffffffff82ade400 127.0.0.1:40373 127.0.0.1:9150 0 0 closed 0xffffffff82ade400 127.0.0.1:40373 127.0.0.1:9150 0 0 closed 0xffffffff82ade400 127.0.0.1:40373 127.0.0.1:9150 0 0 closed 0xffffffff82ade400 127.0.0.1:40373 127.0.0.1:9150 0 0 closed 0xffffffff82ade400 127.0.0.1:40373 127.0.0.1:9150 0 0 closed ... 0xffffffff82ade400 127.0.0.1:40373 127.0.0.1:9150 0 0 closed q kdebug> es
- The socket is closed, calling the destructor of the TCPSocket object fConnectionHash contains. This calls fConnectionHash.Remove(), but there are still entries in the table that point to that object. Since this is a debug build, after the destructor is called, 0xdeadbeef is memset in that location (which is why we see that in the original crash report).
KWA tcp_close(0xffffffff82ade400) KWA tcp_receive_data(0xffffffff92c622d0) KWA tcp_free(0xffffffff82ade400) Looking for: peer 127.0.0.1:40373, local 127.0.0.1:9150 KWA tcp_uninit_protocol(0xffffffff82ade400) EndpointManager::FindConnection(0xffffffff92c623b8, 0xffffffff92c62338) Deleting TCPEndpoint: 0xffffffff82ade400 EndpointManager::Unbind(0xffffffff82ade400)
- tcp_receive_data() is called because a packet is received from that same peer, 127.0.0.1:9150. This leads to EndpointManager::_LookupConnection() being called, which calls fConnectionHash.Lookup(local_socket, remote_socket). This hashes to the same position as our linked list cycle.
tcp_receive_data(): EndpointManager::FindConnection() returned NULL for endpoint 0x0000000000000000, replying with RST KWA TCP: Sending RST... add_tcp_header(): buffer 0xffffffff92d6a188, flags 0x14, seq 0, ack 20779736, up 0, win 0 TCP: RST sent with status No error... KWA tcp_receive_data(0xffffffff92d6a188) Looking for: peer 127.0.0.1:9150, local 127.0.0.1:40373 EndpointManager::FindConnection(0xffffffff92d6a270, 0xffffffff92d6a1f0) EndpointManager::_LookupConnection(0xffffffff92d6a270, 0xffffffff92d6a1f0)
- ConnectionHashDefinition::Compare is called to compare the params of fConnectionHash.Lookup to an item in the linked list at that position, and *kaboom*, the already freed TCPEndpoint is dereferenced.
PANIC: Unexpected exception "General Protection Exception" occurred in kernel mode! Error code: 0x0
So there are potentially a few bugs here. Reguardless of why tcp_connect() ends up getting called multiple times, EndpointManager::fConnectionHash ends up getting corrupted, which I think is the main bug here. The complete log is here.
Right now I'm working on a fix and a minimal repro for a regression test.
comment:9 by , 5 years ago
By the way it seems this behavior of gpg that causes it to call connect() twice seems to be a known issue on other platforms.
https://alfredmyers.com/2019/07/23/gpg-keyserver-receive-failed-no-dirmngr/
From that link:
- gpg spawns dirmngr
- dirmngr attempt to connect to localhost port 9050 (this is attempted twice).
- gpg attempts to connect to a unix socket that dirmngr creates after the localhost socket connection fails.
comment:10 by , 5 years ago
I have added a new test suite for BOpenHashTable<T> in a branch. I found an additional memory-corruption bug while doing so.
BOpenHashTable has an AutoExpand non-type template parameter. If set to false, which is not the default, then upon insertion it will:
- Check to see if the current buffer size is > 0. If not, it will resize the buffer to the minimum size, which is 8 elements. If the user calls Init(size_t) before inserting they can set this size to a greater value.
- If AutoExpand is true, then the buffer will be resized if more room is needed to handle the insertion. When resized to add more capacity, the size is doubled.
- Insert with no bounds check.
So currently if someone uses BOpenHashTable<T, false>, and then inserts more than the fixed number of elements defined (either by the default or call to Init(size_t)), then it will write beyond the bounds of the fTable array, smashing some random memory.
BOpenHashTable<T, false> is used in quite a few places, including the ipv4, ipv6, udp and unix add-ons, IOSchedulerSimple and HashedObjectCache. Even if those aren't broken because of this they easily could be.
comment:11 by , 5 years ago
Unfortunately fixing this properly may result in a pretty large set of changes.
Insert returns status_t, but most callers don't check the result. That's kinda ok though cause we can just focus our effort in code that disabled AutoExpand, which is only a handful.
There are many code paths which call BOpenHashTable<T>::InsertUnchecked, which currently does not return an error at all, but I think that's broken. The API conflates error checking with resizing here. There could be a failure even if resizing is disabled (i.e., there is no more room in the buffer).
I'm planning to submit a patch series that looks something like:
- Add test suite for BOpenHashTable template with two failing tests (auto expand disabled test and duplicate insertion test)
- Fix the interface of BOpenHashTable so that Insert and InsertUnchecked will return B_NO_MEMORY if auto expansion is disabled and the fTable buffer is full.
- Add error handling for all calls to Insert or InsertUnchecked where auto expansion is disabled.
The remaining part is to address the duplicate entries issue that caused the crash in this bug report. The main problem with this API is that it implies that the same item can be inserted multiple times, since the CheckForDuplicates non-type template parameter allows you to specify whether to panic if a duplicate entry is inserted. This is flawed though. While inserting multiple distinct object which could be looked up with the same key is totally possible, inserting the same object multiple times is not because it will create a reference cycle.
- Would be changing the duplicate handling logic entirely.
comment:12 by , 5 years ago
Oops, nevermind about the InsertUnchecked thing. After reading more carefully I remembered that fTable size has nothing to do with the number of elements in the container since each entry is a linked list. So scratch what I said about InsertUnchecked needing to be changed to return B_NO_MEMORY.
The duplicate handling is still real though. Whew that's good, I was worried this was going to end up being a huge change.
comment:13 by , 5 years ago
I spent a bunch of time bringing BOpenHashTable under test because I was convinced that changing it to handle duplicates was the right thing to do. But now that I've read through that code carefully I don't think that's the right thing to do, since it adds some non-trivial cost for other uses of that container where it isn't necessary.
Looking back at EndpointManager::SetConnection(), I think that the main problem is here:
if (_LookupConnection(*local, peer) != NULL) return EADDRINUSE; endpoint->LocalAddress().SetTo(*local); endpoint->PeerAddress().SetTo(peer); T(Connect(endpoint)); fConnectionHash.Insert(endpoint); return B_OK;
_LookupConnection should return non-null here, but it isn't for some reason. I think there may be a mistake in this method that's causing it to query for an endpoint that is not quite equal to the one being inserted just below that check.
comment:14 by , 5 years ago
So this is interesting. I added some extra tracing to EndpointManager::SetConnection() and added a panic just as the duplicate TCPEndpoint is inserted.
@@ -259,31 +259,66 @@ status_t EndpointManager::SetConnection(TCPEndpoint* endpoint, const sockaddr* _local, const sockaddr* peer, const sockaddr* interfaceLocal) { - TRACE(("EndpointManager::SetConnection(%p)\n", endpoint)); + TRACE("EndpointManager::SetConnection(%p)\n", endpoint); WriteLocker _(fLock); SocketAddressStorage local(AddressModule()); local.SetTo(_local); + SocketAddressStorage remote(AddressModule()); + remote.SetTo(peer); + char remote_addr[256]; + remote.AsString(remote_addr, 256, true); + + char local_addr[256]; + local.AsString(local_addr, 256, true); + + TRACE("KWA: SetConnection(%p, ...) BEFORE with local=%s and remote=%s\n", + endpoint, local_addr, remote_addr); + if (local.IsEmpty(false)) { uint16 port = local.Port(); local.SetTo(interfaceLocal); local.SetPort(port); } - if (_LookupConnection(*local, peer) != NULL) + local.AsString(local_addr, 256, true); + + TRACE("KWA: SetConnection(%p, ...) AFTER with local=%s and remote=%s\n", + endpoint, + local_addr, + remote_addr); + + TCPEndpoint* existing_entry = _LookupConnection(*local, peer); + TRACE("KWA: SetConnection(%p) existing entry? %s\n", + endpoint, (existing_entry == NULL) ? "NO" : "YES"); + if (existing_entry != NULL) return EADDRINUSE; + existing_entry = _LookupConnection(_local, peer); + TRACE("KWA: SetConnection(%p) what about this one? %s\n", + endpoint, (existing_entry == NULL) ? "NO" : "YES"); + endpoint->LocalAddress().SetTo(*local); endpoint->PeerAddress().SetTo(peer); T(Connect(endpoint)); + existing_entry = _LookupConnection(*local, peer); + TRACE("KWA: SetConnection(%p) one more check? %s\n", endpoint, + (existing_entry == NULL) ? "NO" : "YES"); + + endpoint->LocalAddress().AsString(local_addr, 256, true); + endpoint->PeerAddress().AsString(remote_addr, 256, true); + TRACE("KWA: SetConnection(%p, ...) ONEMORE with local=%s and remote=%s\n", + endpoint, + local_addr, + remote_addr); + fConnectionHash.Insert(endpoint); return B_OK;
And the output is interesting
EndpointManager::BindToEphemeral(0xffffffff82ade700) EndpointManager::BindToEphemeral(0xffffffff82ade700) -> 0.0.0.0:40287 EndpointManager::SetConnection(0xffffffff82ade700) KWA: SetConnection(0xffffffff82ade700, ...) BEFORE with local=0.0.0.0:40287 and remote=127.0.0.1:9050 KWA: SetConnection(0xffffffff82ade700, ...) AFTER with local=127.0.0.1:40287 and remote=127.0.0.1:9050 KWA: SetConnection(0xffffffff82ade700) existing entry? NO KWA: SetConnection(0xffffffff82ade700) what about this one? NO KWA: SetConnection(0xffffffff82ade700) one more check? NO KWA: SetConnection(0xffffffff82ade700, ...) ONEMORE with local=127.0.0.1:40287 and remote=127.0.0.1:9050 TCP: no matching endpoint! TCP: Sending RST... TCP: Received packet corresponds to explicit endpoint 0xffffffff82ade700 EndpointManager::SetConnection(0xffffffff82ade700) KWA: SetConnection(0xffffffff82ade700, ...) BEFORE with local=127.0.0.1:40287 and remote=127.0.0.1:9150 KWA: SetConnection(0xffffffff82ade700, ...) AFTER with local=127.0.0.1:40287 and remote=127.0.0.1:9150 KWA: SetConnection(0xffffffff82ade700) existing entry? NO KWA: SetConnection(0xffffffff82ade700) what about this one? NO KWA: SetConnection(0xffffffff82ade700) one more check? YES KWA: SetConnection(0xffffffff82ade700, ...) ONEMORE with local=127.0.0.1:40287 and remote=127.0.0.1:9150 PANIC: A duplicate has been inserted!
So I get it now. An attempt is made to connect to 127.0.0.1:9050. Nothing is bound to that port so it fails, and the socket ends up in closed state.
kdebug> tcp_endpoints -------- TCP Domain 0xffffffff86713478 --------- address local peer recv-q send-q state 0xffffffff82ade700 127.0.0.1:40287 127.0.0.1:9150 0 0 closed 0xffffffff82adde00 10.0.2.15:40055 104.248.198.131:443 31 0 time-wait 0xffffffff82add500 10.0.2.15:40017 104.248.198.131:443 31 0 time-wait 0xffffffff82adcc00 10.0.2.15:40002 104.248.198.131:443 31 0 time-wait
Then the socket is reused to connect to another port, 127.0.0.1:9150. Now EndpointManager::SetConnection() performs a check to see if an existing endpoint exists with the given (local, peer) addresses, and won't find one. But that doesn't mean that the TCPEndpoint isn't already in fConnectionHash.
So the _LookupConnection() will not actually find a match until LocalAddress() and PeerAddress() are updated, which happens *after* the _LookupConnection() but before the fConnectionHash.Insert().
comment:15 by , 5 years ago
But now that I've read through that code carefully I don't think that's the right thing to do, since it adds some non-trivial cost for other uses of that container where it isn't necessary.
If the hash table doesn't support duplicate insertions properly in any case at all, then we probably should just always panic on this case so we detect it rather than corrupting memory, right?
Thanks for spending the time investigating this!!
follow-up: 19 comment:16 by , 5 years ago
Thanks for the reply waddlesplash,
If the hash table doesn't support duplicate insertions properly in any case at all, then we probably should just always panic on this case so we detect it rather than corrupting memory, right?
Yeah that's what I wanted to do at first. BOpenHashTable actually has a non-type template parameter "CheckDuplicates" which does exactly that. It defaults to false, but if you set it to true then duplicates are checked upon insertion or removal, and the presence of a duplicate will trap in the debugger (or panic in kernel mode).
So on the surface it seems like the right default is to just remove the CheckDuplicates option and always perform this duplicate check, since allowing duplicates is not possible.
The big problem with this is the design of the BOpenHashTable template, in that it isn't an open hash table at all. It's a chained hash table using the classic linked-list for each bucket approach. That means that if you modify BOpenHashTable to always perform this duplicate check, the cost of insertion goes from O(1) to O(N).
Now this duplicate problem isn't really even that big of an issue for a simpler chained hash table implementation. The problem is that this implementation uses an intrusive linked list, which is the real source of the problem here.
So when I said,
But now that I've read through that code carefully I don't think that's the right thing to do, since it adds some non-trivial cost for other uses of that container where it isn't necessary.
What I meant was that if a user of BOpenHashTable as it is implemented today knows for certain that a duplicate insertion is not possible, then the default of CheckDuplicates=false is advantageous, because it cuts down on the cost of insertion.
It is pretty sketchy though, and I think this data structure is pretty difficult to use correctly.
So there are two issues in play at this point:
As a bandaid for the tcp add-on, I think it makes sense to add a fConnectionHash.RemoveUnchecked(endpoint) right before insertion in SetConnection() to handle the case where sockets are being reused. I'm about to submit a patch that does this.
As a followup I think BOpenHashTable should have some work done, and that should probably be a separate ticket that I can follow. There are a bunch of possible approaches to making this container easier to user.
- Make BOpenHashTable a true open hash table. This means potentially more copying as the buffer resizes, but may perform better for some workloads, and removes the need for the intrusive linked list.
- OR Maintain the chained hash table design but don't use an intrusive linked list, which would mean more allocations but would also mean more safety.
- OR Replace the intrusive linked list with a random access collection like a vector or a search tree. This means that you could maintain some invariants for each bucket, for example that the items are sorted so that you can always find duplicates in < O(N) time.
Thanks for spending the time investigating this!!
Sure! Thanks for all of the hard work you guys have done over the years.
comment:17 by , 5 years ago
I submitted https://review.haiku-os.org/c/haiku/+/2173 for this issue. I'll create a followup ticket for addressing BOpenHashTable cause I think it's worth fixing to make it easier to use correctly.
comment:19 by , 5 years ago
Replying to ambroff:
So on the surface it seems like the right default is to just remove the CheckDuplicates option and always perform this duplicate check, since allowing duplicates is not possible.
If enabling CheckDuplicates would have triggered a panic for this bug, then this might be a good idea to enable CheckDuplicates for debug builds.
comment:20 by , 5 years ago
This is fixed in hrev53800. This can be closed.
If enabling CheckDuplicates would have triggered a panic for this bug, then this might be a good idea to enable CheckDuplicates for debug builds.
Yeah that's a pretty good idea. I'll push a patch.
comment:22 by , 5 years ago
Milestone: | Unscheduled → R1/beta2 |
---|
Assign tickets with status=closed and resolution=fixed within the R1/beta2 development window to the R1/beta2 Milestone
serial output from VirtualBox on the last attempt ...