#19360 closed bug (fixed)
VM page fault related to AVLTreeBase RAMFS
Reported by: | dovsienko | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta6 |
Component: | File Systems/RAMFS | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description (last modified by )
The problem reproduced two times shortly after booting hrev58538. The first KDL had AVLTreeBase::Remove()
in the backtrace, the second had AVLTreeBase::Insert()
, both have RAMFS. The factor that triggered the problem is a Perl script, which runs 4 busy workers, each of which uses a few temporary files (TMPDIR=/boot/system/var/shared_memory
). System memory usage stays around 450MB, the VM (VirtualBox) has 2GB RAM and 4 CPU cores.
For the second case a complete output of the serial console follows.
vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x0, ip 0xffffffff8012985d, write 1, kernel, exec 0, thread 0x185 PANIC: vm_page_fault: unhandled page fault in kernel space at 0x0, ip 0xffffffff8012985d Welcome to Kernel Debugging Land... Thread 389 "pthread func" running on CPU 2 stack trace for thread 389 "pthread func" kernel stack: 0xffffffff87007000 to 0xffffffff8700c000 user stack: 0x00007fa7eeda9000 to 0x00007fa7eede9000 frame caller <image>:function + offset 0 ffffffff8700b0d0 (+ 32) ffffffff80154b30 <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a 1 ffffffff8700b120 (+ 80) ffffffff800b8848 <kernel_x86_64> debug_call_with_fault_handler + 0x78 2 ffffffff8700b180 (+ 96) ffffffff800b9f34 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf4 3 ffffffff8700b1d0 (+ 80) ffffffff800ba2ce <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e 4 ffffffff8700b2c0 (+ 240) ffffffff800ba667 <kernel_x86_64> panic + 0xb7 5 ffffffff8700b3b0 (+ 240) ffffffff8013b248 <kernel_x86_64> vm_page_fault + 0x298 6 ffffffff8700b3f0 (+ 64) ffffffff80160c6e <kernel_x86_64> x86_page_fault_exception + 0x15e 7 ffffffff8700b768 (+ 888) ffffffff8015640c <kernel_x86_64> int_bottom + 0x80 kernel iframe at 0xffffffff8700b768 (end = 0xffffffff8700b830) rax 0x0 rbx 0xffffffff8700b8a0 rcx 0xffffffff8a97e608 rdx 0xffffffff8a97e818 rsi 0xffffffff8a97e610 rdi 0xffffffff82076700 rbp 0xffffffff8700b850 r8 0xffffffff8a97ecd0 r9 0x7e r10 0x0 r11 0x0 r12 0xffffffff82076700 r13 0xffffffff8a97e818 r14 0xffffffff8700b860 r15 0xffffffff8a97ecd0 rip 0xffffffff8012985d rsp 0xffffffff8700b838 rflags 0x10292 vector: 0xe, error code: 0x2 8 ffffffff8700b850 (+ 232) ffffffff8012985d <kernel_x86_64> AVLTreeBase::_RotateLeft(AVLTreeNode**) + 0x0d 9 ffffffff8700ba90 (+ 576) ffffffff80129a49 <kernel_x86_64> AVLTreeBase::_Insert(AVLTreeNode*) + 0xd9 10 ffffffff8700baa0 (+ 16) ffffffff80129ab9 <kernel_x86_64> AVLTreeBase::Insert(AVLTreeNode*) + 0x09 11 ffffffff8700bb10 (+ 112) ffffffff8112d170 </boot/system/add-ons/kernel/file_systems/ramfs> SizeIndex::Changed[clone .localalias] (Node*, long) + 0x110 12 ffffffff8700bb40 (+ 48) ffffffff81121f53 </boot/system/add-ons/kernel/file_systems/ramfs> File::SetSize[clone .localalias] (long) + 0x63 13 ffffffff8700bb90 (+ 80) ffffffff81124d16 </boot/system/add-ons/kernel/file_systems/ramfs> ramfs_open(fs_volume*, fs_vnode*, int, void**) + 0x146 14 ffffffff8700bbd0 (+ 64) ffffffff8010141f <kernel_x86_64> open_vnode(vnode*, int, bool) + 0x3f 15 ffffffff8700bd70 (+ 416) ffffffff80107f68 <kernel_x86_64> create_vnode(vnode*, char const*, int, int, bool) + 0x4b8 16 ffffffff8700bec0 (+ 336) ffffffff8010ad42 <kernel_x86_64> file_create(int, char*, int, int, bool) + 0x62 17 ffffffff8700bf20 (+ 96) ffffffff8011123b <kernel_x86_64> _user_open + 0xcb 18 ffffffff8700bf30 (+ 16) ffffffff8015670f <kernel_x86_64> x86_64_syscall_entry + 0xfb user iframe at 0xffffffff8700bf30 (end = 0xffffffff8700bff8) rax 0x71 rbx 0x7fa7eede8a73 rcx 0x21bf2624c7c rdx 0x641 rsi 0x11f15e1c38f0 rdi 0xffffff9c rbp 0x7fa7eede8820 r8 0x7fa7eede8a73 r9 0xffffffff r10 0x1a4 r11 0x202 r12 0x601 r13 0x11f15e1c38f0 r14 0x1b6 r15 0x11f15d8f0740 rip 0x21bf2624c7c rsp 0x7fa7eede87b8 rflags 0x202 vector: 0x63, error code: 0x0 19 00007fa7eede8820 (+ 0) 0000021bf2624c7c </boot/system/lib/libroot.so> _kern_open + 0x0c 20 00007fa7eede8850 (+ 48) 000000c9e58e450b </boot/system/lib/perl5/5.40.0/x> Perl_debstack (nearest) + 0xdd9 21 00007fa7eede88c0 (+ 112) 000000c9e592f688 </boot/system/lib/perl5/5.40.0/x> PerlIOUnix_open + 0xfe 22 00007fa7eede8950 (+ 144) 000000c9e592e6e1 </boot/system/lib/perl5/5.40.0/x> PerlIOBuf_open + 0x150 23 00007fa7eede89f0 (+ 160) 000000c9e5930ea2 </boot/system/lib/perl5/5.40.0/x> PerlIO_openn + 0x1a8 24 00007fa7eede8ac0 (+ 208) 000000c9e58e50e6 </boot/system/lib/perl5/5.40.0/x> Perl_debstack (nearest) + 0x19b4 25 00007fa7eede8b40 (+ 128) 000000c9e5985a66 </boot/system/lib/perl5/5.40.0/x> Perl_sortsv (nearest) + 0x263a 26 00007fa7eede8b60 (+ 32) 000000c9e59ecd26 </boot/system/lib/perl5/5.40.0/x> Perl_runops_standard + 0x16 27 00007fa7eede8cc0 (+ 352) 000000c9e58ce378 </boot/system/lib/perl5/5.40.0/x> Perl_call_sv + 0x432 28 00007fa7eede8d80 (+ 192) 000000e859615de2 </boot/system/lib/perl5/5.40.0/x> _init (nearest) + 0xe5a 29 00007fa7eede8df0 (+ 112) 000000e859617c6f </boot/system/lib/perl5/5.40.0/x> _init (nearest) + 0x2ce7 30 00007fa7eede8e10 (+ 32) 0000021bf26338d8 </boot/system/lib/libroot.so> pthread_exit (nearest) + 0x38 31 0000000000000000 (+ 0) 00007fffea272258 10712:commpage@0x00007fffea272000 + 0x258 kdebug>
Attachments (1)
Change History (15)
by , 34 hours ago
Attachment: | Screenshot_2025-01-16_22-07-31.png added |
---|
comment:1 by , 34 hours ago
For the fist KDL the diagnostics were as follows:
kdebug> bt bt stack trace for thread 35462 "pthread func" kernel stack: 0xffffffff8647d000 to 0xffffffff86482000 user stack: 0x00007fe7d2a8f000 to 0x00007fe7d2acf000 frame caller <image>:function + offset 0 ffffffff86481090 (+ 32) ffffffff800bb5f9 <kernel_x86_64> invoke_command_trampoline(void*) + 0x19 1 ffffffff864810b0 (+ 32) ffffffff80154b30 <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a 2 ffffffff86481100 (+ 80) ffffffff800b8848 <kernel_x86_64> debug_call_with_fault_handler + 0x78 3 ffffffff86481160 (+ 96) ffffffff800bb84f <kernel_x86_64> invoke_debugger_command + 0xef 4 ffffffff864811a0 (+ 64) ffffffff800bb9e6 <kernel_x86_64> invoke_pipe_segment(debugger_command_pipe*, int, char*) + 0x116 5 ffffffff864811f0 (+ 80) ffffffff800bbb0c <kernel_x86_64> invoke_debugger_command_pipe + 0x9c 6 ffffffff86481230 (+ 64) ffffffff800bdd66 <kernel_x86_64> ExpressionParser::_ParseCommandPipe(int&) + 0x1e6 7 ffffffff86481270 (+ 64) ffffffff800bf2cd <kernel_x86_64> ExpressionParser::EvaluateCommand(char const*, int&) + 0x43d 8 ffffffff86481360 (+ 240) ffffffff800bf8ff <kernel_x86_64> evaluate_debug_command + 0x12f 9 ffffffff864813c0 (+ 96) ffffffff800b9fd8 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0x198 10 ffffffff86481410 (+ 80) ffffffff800ba2ce <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e 11 ffffffff86481500 (+ 240) ffffffff800ba667 <kernel_x86_64> panic + 0xb7 12 ffffffff864815f0 (+ 240) ffffffff8013b248 <kernel_x86_64> vm_page_fault + 0x298 13 ffffffff86481630 (+ 64) ffffffff80160c6e <kernel_x86_64> x86_page_fault_exception + 0x15e 14 ffffffff86481988 (+ 856) ffffffff8015640c <kernel_x86_64> int_bottom + 0x80 kernel iframe at 0xffffffff86481988 (end = 0xffffffff86481a50) rax 0x1 rbx 0xffffffffa3e96d20 rcx 0x0 rdx 0xffffffffa3e96db0 rsi 0xffffffffa3758c68 rdi 0xffffffff8207c4c0 rbp 0xffffffff86481a90 r8 0xffffffff86481930 r9 0xffffffff864819e0 r10 0xffffffff8207c4c0 r11 0x1 r12 0xffffffff8207c4c0 r13 0x0 r14 0xffffffffa3758c60 r15 0xffffffffa3e96d30 rip 0xffffffff80129da2 rsp 0xffffffff86481a50 rflags 0x10246 vector: 0xe, error code: 0x2 15 ffffffff86481a90 (+ 264) ffffffff80129da2 <kernel_x86_64> AVLTreeBase::_Remove(AVLTreeNode*) + 0x92 16 ffffffff86481aa0 (+ 16) ffffffff80129f79 <kernel_x86_64> AVLTreeBase::Remove(AVLTreeNode*) + 0x09 17 ffffffff86481b10 (+ 112) ffffffff8112d103 </boot/system/add-ons/kernel/file_systems/ramfs> SizeIndex::Changed[clone .localalias] (Node*, long) + 0xa3 18 ffffffff86481b40 (+ 48) ffffffff81121f53 </boot/system/add-ons/kernel/file_systems/ramfs> File::SetSize[clone .localalias] (long) + 0x63 19 ffffffff86481b90 (+ 80) ffffffff81124d16 </boot/system/add-ons/kernel/file_systems/ramfs> ramfs_open(fs_volume*, fs_vnode*, int, void**) + 0x146 20 ffffffff86481bd0 (+ 64) ffffffff8010141f <kernel_x86_64> open_vnode(vnode*, int, bool) + 0x3f 21 ffffffff86481d70 (+ 416) ffffffff80107f68 <kernel_x86_64> create_vnode(vnode*, char const*, int, int, bool) + 0x4b8 22 ffffffff86481ec0 (+ 336) ffffffff8010ad42 <kernel_x86_64> file_create(int, char*, int, int, bool) + 0x62 23 ffffffff86481f20 (+ 96) ffffffff8011123b <kernel_x86_64> _user_open + 0xcb 24 ffffffff86481f30 (+ 16) ffffffff8015670f <kernel_x86_64> x86_64_syscall_entry + 0xfb user iframe at 0xffffffff86481f30 (end = 0xffffffff86481ff8) rax 0x71 rbx 0x7fe7d2ace5f3 rcx 0xb3b6652c7c rdx 0x641 rsi 0x11103bcd2a80 rdi 0xffffff9c rbp 0x7fe7d2ace3a0 r8 0x7fe7d2ace5f3 r9 0xffffffff r10 0x1a4 r11 0x202 r12 0x601 r13 0x11103bcd2a80 r14 0x1b6 r15 0x11103b3fd900 rip 0xb3b6652c7c rsp 0x7fe7d2ace338 rflags 0x202 vector: 0x63, error code: 0x0 25 00007fe7d2ace3a0 (+ 0) 000000b3b6652c7c </boot/system/lib/libroot.so> _kern_open + 0x0c 26 00007fe7d2ace3d0 (+ 48) 00000143f429150b </boot/system/lib/perl5/5.40.0/x> Perl_debstack (nearest) + 0xdd9 27 00007fe7d2ace440 (+ 112) 00000143f42dc688 </boot/system/lib/perl5/5.40.0/x> PerlIOUnix_open + 0xfe 28 00007fe7d2ace4d0 (+ 144) 00000143f42db6e1 </boot/system/lib/perl5/5.40.0/x> PerlIOBuf_open + 0x150 29 00007fe7d2ace570 (+ 160) 00000143f42ddea2 </boot/system/lib/perl5/5.40.0/x> PerlIO_openn + 0x1a8 30 00007fe7d2ace640 (+ 208) 00000143f42920e6 </boot/system/lib/perl5/5.40.0/x> Perl_debstack (nearest) + 0x19b4 31 00007fe7d2ace6c0 (+ 128) 00000143f4332a66 </boot/system/lib/perl5/5.40.0/x> Perl_sortsv (nearest) + 0x263a 32 00007fe7d2ace6e0 (+ 32) 00000143f4399d26 </boot/system/lib/perl5/5.40.0/x> Perl_runops_standard + 0x16 33 00007fe7d2ace840 (+ 352) 00000143f427b378 </boot/system/lib/perl5/5.40.0/x> Perl_call_sv + 0x432 34 00007fe7d2ace900 (+ 192) 00000189a884fde2 </boot/system/lib/perl5/5.40.0/x> _init (nearest) + 0xe5a 35 00007fe7d2ace970 (+ 112) 00000189a8851c6f </boot/system/lib/perl5/5.40.0/x> _init (nearest) + 0x2ce7 36 00007fe7d2ace990 (+ 32) 000000b3b66618d8 </boot/system/lib/libroot.so> pthread_exit (nearest) + 0x38 37 0000000000000000 (+ 0) 00007fffffca9258 2773733:commpage@0x00007fffffca9000 + 0x258 kdebug> message message PANIC: vm_page_fault: unhandled page fault in kernel space at 0x0, ip 0xffffffff80129da2 kdebug>
comment:2 by , 34 hours ago
Description: | modified (diff) |
---|---|
Platform: | x86-64 → All |
comment:3 by , 34 hours ago
Forgot to add: the first time I was doing a benchmark of the script using a different number of threads:
- 1 worker thread: 192 seconds (ran this a few times without a problem)
- 2 worker threads: 79 seconds
- 3 worker threads: 68 seconds
- 4 worker threads: KDL before the completion
The second time I ran the script using 4 worker threads and the problem reproduced almost immediately. So this looks like a race condition that is much more likely to manifest when there is a busy thread for every CPU core.
comment:4 by , 33 hours ago
Here is another stack trace, this time from the compiled single-threaded C program (filtertest), which the multi-threaded Perl script runs for every test. Looks like it is not specific to Perl or to a race condition between different threads of the same process.
PANIC: Unexpected exception "General Protection Exception" occurred in kernel mode! Error code: 0x0 Welcome to Kernel Debugging Land... Thread 17794 "filtertest" running on CPU 3 stack trace for thread 17794 "filtertest" kernel stack: 0xffffffff8022f000 to 0xffffffff80234000 user stack: 0x00007f7623bfc000 to 0x00007f7624bfc000 frame caller <image>:function + offset 0 ffffffff80233540 (+ 32) ffffffff80154b30 <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a 1 ffffffff80233590 (+ 80) ffffffff800b8848 <kernel_x86_64> debug_call_with_fault_handler + 0x78 2 ffffffff802335f0 (+ 96) ffffffff800b9f34 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf4 3 ffffffff80233640 (+ 80) ffffffff800ba2ce <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e 4 ffffffff80233730 (+ 240) ffffffff800ba667 <kernel_x86_64> panic + 0xb7 5 ffffffff80233a88 (+ 856) ffffffff8015640c <kernel_x86_64> int_bottom + 0x80 kernel iframe at 0xffffffff80233a88 (end = 0xffffffff80233b50) rax 0xffffffff8a43eae0 rbx 0xffffffff8a43eae0 rcx 0xffffffff81139e28 rdx 0xffffffff8a43eae0 rsi 0xffffffff8abc3390 rdi 0xffffffff820746f8 rbp 0xffffffff80233b80 r8 0x7 r9 0x43 r10 0x1001 r11 0xffffffff80219200 r12 0xffffffff8abc3390 r13 0xffffffff8abc3390 r14 0xffffffff80233b90 r15 0xcccccccc00000000 rip 0xffffffff8112e1eb rsp 0xffffffff80233b50 rflags 0x10286 vector: 0xd, error code: 0x0 6 ffffffff80233b80 (+ 248) ffffffff8112e1eb </boot/system/add-ons/kernel/file_systems/ramfs> _ZN10AVLTreeMapI16TwoKeyAVLTreeKeyI19SizeIndexPrimaryKeyP4NodeES3_25TwoKeyAVLTreeNodeStrategyIS1_S3_S3_26SizeIndexPrimaryKeyCompare28TwoKeyAVLTreeStandardCompareIS3_E22SizeIndexGetPrimaryKey27TwoKeyAVLTreeStandardGetKeyIS3_S3_EEE12CompareNodesEPK11AVLTreeNodeSG_ + 0x1b 7 ffffffff80233dc0 (+ 576) ffffffff801299e5 <kernel_x86_64> AVLTreeBase::_Insert(AVLTreeNode*) + 0x75 8 ffffffff80233dd0 (+ 16) ffffffff80129ab9 <kernel_x86_64> AVLTreeBase::Insert(AVLTreeNode*) + 0x09 9 ffffffff80233e40 (+ 112) ffffffff8112d170 </boot/system/add-ons/kernel/file_systems/ramfs> SizeIndex::Changed[clone .localalias] (Node*, long) + 0x110 10 ffffffff80233e70 (+ 48) ffffffff8112217b </boot/system/add-ons/kernel/file_systems/ramfs> File::WriteAt(long, void const*, unsigned long, unsigned long*) + 0x6b 11 ffffffff80233ec0 (+ 80) ffffffff811269e2 </boot/system/add-ons/kernel/file_systems/ramfs> ramfs_write(fs_volume*, fs_vnode*, void*, long, void const*, unsigned long*) + 0x102 12 ffffffff80233f20 (+ 96) ffffffff800f350f <kernel_x86_64> common_user_io(int, long, void*, unsigned long, bool) + 0x14f 13 ffffffff80233f30 (+ 16) ffffffff8015670f <kernel_x86_64> x86_64_syscall_entry + 0xfb user iframe at 0xffffffff80233f30 (end = 0xffffffff80233ff8) rax 0x96 rbx 0x7b rcx 0x63ab41cecc rdx 0x10c3c3ad9340 rsi 0xffffffffffffffff rdi 0x1 rbp 0x7f7624bfb4a0 r8 0x0 r9 0x24bf7156c0 r10 0x7b r11 0x206 r12 0x10c3c3ad9340 r13 0x7b r14 0x63ab4f7680 r15 0x40000 rip 0x63ab41cecc rsp 0x7f7624bfb488 rflags 0x206 vector: 0x63, error code: 0x0 14 00007f7624bfb4a0 (+ 0) 00000063ab41cecc </boot/system/lib/libroot.so> _kern_write + 0x0c 15 00007f7624bfb4d0 (+ 48) 00000063ab45141a </boot/system/lib/libroot.so> _IO_new_file_write + 0x3a 16 00007f7624bfb500 (+ 48) 00000063ab450de1 </boot/system/lib/libroot.so> _IO_file_setbuf (nearest) + 0x81 17 00007f7624bfb520 (+ 32) 00000063ab451b91 </boot/system/lib/libroot.so> _IO_do_write + 0x21 18 00007f7624bfb540 (+ 32) 00000063ab451ff0 </boot/system/lib/libroot.so> _IO_file_sync + 0x90 19 00007f7624bfb560 (+ 32) 00000063ab454107 </boot/system/lib/libroot.so> fflush_unlocked + 0x27 20 00007f7624bfb590 (+ 48) 000001597e86cffa </boot/system/lib/libstdc++.so.6> std::basic_ostream<char, std::char_traits<char> >::flush() + 0x4a 21 00007f7624bfb5a0 (+ 16) 000001597e7f9e9e </boot/system/lib/libstdc++.so.6> std::ios_base::Init::~Init() + 0x3e 22 00007f7624bfb5e0 (+ 64) 00000063ab494e1c </boot/system/lib/libroot.so> __cxa_finalize + 0x15c 23 00007f7624bfb600 (+ 32) 00000063ab494fd7 </boot/system/lib/libroot.so> exit + 0x17 24 00007f7624bfb6f0 (+ 240) 000000a702bc9531 </boot/home/gitrepos/libpcap/tes> _init (nearest) + 0xb69 25 00007f7624bfb720 (+ 48) 000000a702bc982e </boot/home/gitrepos/libpcap/tes> _start + 0x3e 26 00007f7624bfb750 (+ 48) 000000c1a018de05 </boot/system/runtime_loader> runtime_loader + 0x115 27 0000000000000000 (+ 0) 00007fffffc15258 1464226:commpage@0x00007fffffc15000 + 0x258
comment:5 by , 32 hours ago
Here is another variety of the stack trace. This time the first run of the script was successful, but the second run triggered the problem as early as creating a temporary directory.
PANIC: vm_page_fault: unhandled page fault in kernel space at 0x0, ip 0xffffffff801297ed Welcome to Kernel Debugging Land... Thread 9473 "perl" running on CPU 3 stack trace for thread 9473 "perl" kernel stack: 0xffffffff8183d000 to 0xffffffff81842000 user stack: 0x00007fd39d969000 to 0x00007fd39e969000 frame caller <image>:function + offset 0 ffffffff818411d0 (+ 32) ffffffff80154b30 <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a 1 ffffffff81841220 (+ 80) ffffffff800b8848 <kernel_x86_64> debug_call_with_fault_handler + 0x78 2 ffffffff81841280 (+ 96) ffffffff800b9f34 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf4 3 ffffffff818412d0 (+ 80) ffffffff800ba2ce <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e 4 ffffffff818413c0 (+ 240) ffffffff800ba667 <kernel_x86_64> panic + 0xb7 5 ffffffff818414b0 (+ 240) ffffffff8013b248 <kernel_x86_64> vm_page_fault + 0x298 6 ffffffff818414f0 (+ 64) ffffffff80160c6e <kernel_x86_64> x86_page_fault_exception + 0x15e 7 ffffffff81841848 (+ 856) ffffffff8015640c <kernel_x86_64> int_bottom + 0x80 kernel iframe at 0xffffffff81841848 (end = 0xffffffff81841910) rax 0x0 rbx 0xffffffff81841930 rcx 0xffffffff820796f8 rdx 0xffffffff8a94a278 rsi 0xffffffff82079708 rdi 0xffffffff82079760 rbp 0xffffffff81841920 r8 0xffffffff82079760 r9 0xffffffff82079760 r10 0x0 r11 0x0 r12 0xffffffff82079760 r13 0xffffffff8a94a278 r14 0xffffffff81841930 r15 0xffffffff82079760 rip 0xffffffff801297ed rsp 0xffffffff81841918 rflags 0x10282 vector: 0xe, error code: 0x2 8 ffffffff81841920 (+ 216) ffffffff801297ed <kernel_x86_64> AVLTreeBase::_RotateRight(AVLTreeNode**) + 0x0d 9 ffffffff81841b60 (+ 576) ffffffff80129a78 <kernel_x86_64> AVLTreeBase::_Insert(AVLTreeNode*) + 0x108 10 ffffffff81841b70 (+ 16) ffffffff80129ab9 <kernel_x86_64> AVLTreeBase::Insert(AVLTreeNode*) + 0x09 11 ffffffff81841ba0 (+ 48) ffffffff8112cf50 </boot/system/add-ons/kernel/file_systems/ramfs> non-virtual thunk to SizeIndex::NodeAdded(Node*)[clone .localalias] () + 0x70 12 ffffffff81841c10 (+ 112) ffffffff8112f664 </boot/system/add-ons/kernel/file_systems/ramfs> Volume::NodeAdded[clone .localalias] (Node*) + 0xc4 13 ffffffff81841c40 (+ 48) ffffffff8112f888 </boot/system/add-ons/kernel/file_systems/ramfs> Volume::PublishVNode[clone .localalias] (Node*) + 0x18 14 ffffffff81841c70 (+ 48) ffffffff8112966c </boot/system/add-ons/kernel/file_systems/ramfs> Node::Link[clone .localalias] (Entry*) + 0xbc 15 ffffffff81841ca0 (+ 48) ffffffff811217dd </boot/system/add-ons/kernel/file_systems/ramfs> Entry::Link[clone .localalias] (Node*) + 0x2d 16 ffffffff81841ce0 (+ 64) ffffffff81120f2b </boot/system/add-ons/kernel/file_systems/ramfs> Directory::CreateEntry[clone .localalias] (Node*, char const*, Entry**) + 0x9b 17 ffffffff81841d20 (+ 64) ffffffff811212f8 </boot/system/add-ons/kernel/file_systems/ramfs> Directory::CreateDirectory(char const*, Directory**) + 0xc8 18 ffffffff81841d90 (+ 112) ffffffff811260f7 </boot/system/add-ons/kernel/file_systems/ramfs> ramfs_create_dir(fs_volume*, fs_vnode*, char const*, int) + 0x1a7 19 ffffffff81841ed0 (+ 320) ffffffff8010b398 <kernel_x86_64> dir_create(int, char*, int, bool) + 0x68 20 ffffffff81841f20 (+ 80) ffffffff80111964 <kernel_x86_64> _user_create_dir + 0x84 21 ffffffff81841f30 (+ 16) ffffffff8015670f <kernel_x86_64> x86_64_syscall_entry + 0xfb user iframe at 0xffffffff81841f30 (end = 0xffffffff81841ff8) rax 0x7a rbx 0x11ceb24f3340 rcx 0xc072cded0c rdx 0x1c0 rsi 0x11ceb2bc45a0 rdi 0xffffff9c rbp 0x7fd39e9687d0 r8 0x0 r9 0x0 r10 0x6d r11 0x206 r12 0x0 r13 0x7fd39e969660 r14 0x11ceb2bec198 r15 0x11ceb2912b38 rip 0xc072cded0c rsp 0x7fd39e9687b8 rflags 0x206 vector: 0x63, error code: 0x0 22 00007fd39e9687d0 (+ 0) 000000c072cded0c </boot/system/lib/libroot.so> _kern_create_dir + 0x0c 23 00007fd39e968830 (+ 96) 0000018ab3a19447 </boot/system/lib/perl5/5.40.0/x> Perl_setdefout (nearest) + 0x7aca 24 00007fd39e968850 (+ 32) 0000018ab3a76d26 </boot/system/lib/perl5/5.40.0/x> Perl_runops_standard + 0x16 25 00007fd39e9688e0 (+ 144) 0000018ab3961196 </boot/system/lib/perl5/5.40.0/x> perl_run + 0x282 26 00007fd39e968930 (+ 80) 000001448a514db8 </boot/system/bin/perl> main + 0xd3 27 00007fd39e968960 (+ 48) 000001448a514cbe </boot/system/bin/perl> _start + 0x3e 28 00007fd39e968990 (+ 48) 000001b09aed3e05 </boot/system/runtime_loader> runtime_loader + 0x115 29 0000000000000000 (+ 0) 00007ffffeff9258 769484:commpage@0x00007ffffeff9000 + 0x258 kdebug>
comment:6 by , 28 hours ago
Milestone: | Unscheduled → R1/beta6 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
Fixed in hrev58540.
comment:7 by , 21 hours ago
Thank you for looking into this. hrev58541 still experiences a KDL under the same conditions, although it looks different:
kdebug> message message PANIC: remove page 0xffffffff83654070 from cache 0xffffffff8af5c128: page still has mappings! kdebug> kdebug> bt bt stack trace for thread 71237 "pthread func" kernel stack: 0xffffffff85a80000 to 0xffffffff85a85000 frame caller <image>:function + offset 0 ffffffff85a84510 (+ 32) ffffffff800bb609 <kernel_x86_64> invoke_command_trampoline(void*) + 0x19 1 ffffffff85a84530 (+ 32) ffffffff80154b70 <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a 2 ffffffff85a84580 (+ 80) ffffffff800b8858 <kernel_x86_64> debug_call_with_fault_handler + 0x78 3 ffffffff85a845e0 (+ 96) ffffffff800bb85f <kernel_x86_64> invoke_debugger_command + 0xef 4 ffffffff85a84620 (+ 64) ffffffff800bb9f6 <kernel_x86_64> invoke_pipe_segment(debugger_command_pipe*, int, char*) + 0x116 5 ffffffff85a84670 (+ 80) ffffffff800bbb1c <kernel_x86_64> invoke_debugger_command_pipe + 0x9c 6 ffffffff85a846b0 (+ 64) ffffffff800bdd76 <kernel_x86_64> ExpressionParser::_ParseCommandPipe(int&) + 0x1e6 7 ffffffff85a846f0 (+ 64) ffffffff800bf2dd <kernel_x86_64> ExpressionParser::EvaluateCommand(char const*, int&) + 0x43d 8 ffffffff85a847e0 (+ 240) ffffffff800bf90f <kernel_x86_64> evaluate_debug_command + 0x12f 9 ffffffff85a84840 (+ 96) ffffffff800b9fe8 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0x198 10 ffffffff85a84890 (+ 80) ffffffff800ba2de <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e 11 ffffffff85a84980 (+ 240) ffffffff800ba677 <kernel_x86_64> panic + 0xb7 12 ffffffff85a849c0 (+ 64) ffffffff8014eb02 <kernel_x86_64> VMCache::Delete() + 0x82 13 ffffffff85a84a00 (+ 64) ffffffff8014ec3f <kernel_x86_64> VMCache::Delete() + 0x1bf 14 ffffffff85a84a50 (+ 80) ffffffff8012e65d <kernel_x86_64> delete_area(VMAddressSpace*, VMArea*, bool, bool) + 0x17d 15 ffffffff85a84a90 (+ 64) ffffffff80130e1e <kernel_x86_64> vm_delete_areas + 0xce 16 ffffffff85a84f20 (+1168) ffffffff800868ce <kernel_x86_64> _user_exec + 0x31e 17 ffffffff85a84f30 (+ 16) ffffffff8015674f <kernel_x86_64> x86_64_syscall_entry + 0xfb user iframe at 0xffffffff85a84f30 (end = 0xffffffff85a84ff8) rax 0x2e rbx 0x0 rcx 0xc1dacab84c rdx 0x12af rsi 0x10617f2e83d0 rdi 0xdb4e042a68 rbp 0x7fb4547fa6e0 r8 0x29 r9 0x12 r10 0x3 r11 0x246 r12 0xdb4e042a68 r13 0x0 r14 0x7fb4547fa5b0 r15 0x0 rip 0xc1dacab84c rsp 0x7fb4547fa578 rflags 0x246 vector: 0x63, error code: 0x0 18 00007fb4547fa6e0 (+ 0) 000000c1dacab84c 19 00007fb4547fa770 (+ 144) 000000c1dad2ef3e 20 00007fb4547fa7e0 (+ 112) 000000db4decb5c2 21 00007fb4547fa8a0 (+ 192) 000000db4df7387b 22 00007fb4547fa8c0 (+ 32) 000000db4dfcfd26 23 00007fb4547faa20 (+ 352) 000000db4deb1378 24 00007fb4547faae0 (+ 192) 00000003313bfde2 25 00007fb4547fab50 (+ 112) 00000003313c1c6f 26 00007fb4547fab70 (+ 32) 000000c1dacba8d8 27 0000000000000000 (+ 0) 00007ffffc5b9258 6006539:commpage@0x00007ffffc5b9000 + 0x258 kdebug>
comment:8 by , 20 hours ago
The previous stack trace reproduces using the following:
while TESTRUN_JOBS=16 ./testprogs/TESTrun; do date; done
After a few normal rounds the script starts and spawns 16 threads, and the KDL pops up before it gets the first result from the first thread:
Skipped tests: invalid_ip6_host_disabled : IPv6 syntax enabled invalid_ip6_net_disabled : IPv6 syntax enabled ------------------------------------------------ 2 tests skipped 0 tests failed 924 tests passed Fri Jan 17 12:56:47 GMT 2025 INFO: Using a test timeout of 1. (*** KDL ***)
comment:9 by , 14 hours ago
Thus far, I haven't managed to reproduce that KDL here, in a VM with 4 cores at least. The script has run for a while and no KDL to be seen. It may be better to open a new ticket for that, as it's likely unrelated to ramfs even.
comment:10 by , 14 hours ago
OK, it took a while but it finally reproduced here.
initial commands: page 0xffffffff840562b0; cache 0xffffffff9081d2b8 PAGE: 0xffffffff840562b0 queue_next,prev: 0xffffffff83f52710, 0xffffffff84145270 physical_number: 0x3456f cache: 0xffffffff9081d2b8 cache_offset: 7727 cache_next: 0xffffffff840ee8d0 state: active wired_count: 0 usage_count: 0 busy: 0 busy_writing: 0 accessed: 0 modified: 0 accessor: -1 area mappings: 0xffffffff908382a8 (4359836) CACHE 0xffffffff9081d2b8: ref_count: 0 source: 0x0000000000000000 type: RAM virtual_base: 0x0 virtual_end: 0x1930000 temporary: 1 lock: 0xffffffff9081d340 lock.holder: 50404 areas: consumers: pages: 6 in cache AREA: 0xffffffff908382a8 name: 'heap' owner: 0xc4d4 id: 0x42869c base: 0x11873179c000 size: 0x2070000 protection: 0x33 page_protection:0x0000000000000000 wiring: 0x0 memory_type: 0x0 cache: 0xffffffff8ff14828 cache_type: RAM cache_offset: 0x0 cache_next: 0x0000000000000000 cache_prev: 0x0000000000000000 page mappings: 8186 kdebug>
comment:11 by , 13 hours ago
CACHE 0xffffffff8ff14828: ref_count: 2 source: 0xffffffff9086adf0 type: RAM virtual_base: 0x0 virtual_end: 0x2070000 temporary: 1 lock: 0xffffffff8ff148b0 lock.holder: 50395 areas: area 0x42869c, heap base_addr: 0x11873179c000, size: 0x2070000 protection: 0x33 owner: 0xc4d4 consumers: pages: 1718 in cache
comment:12 by , 13 hours ago
CACHE 0xffffffff9086adf0: ref_count: 2 source: 0x0000000000000000 type: RAM virtual_base: 0x0 virtual_end: 0x1d10000 temporary: 1 lock: 0xffffffff9086ae78 lock.holder: 50408 areas: consumers: 0xffffffff9085bcc8 0xffffffff8ff14828 pages: 7355 in cache
comment:13 by , 13 hours ago
Seems another thread is in the process of removing a consumer:
PANIC: remove page 0xffffffff840562b0 from cache 0xffffffff9081d2b8: page still has mappings! Welcome to Kernel Debugging Land... Thread 50404 "pthread func" running on CPU 0 stack trace for thread 50404 "pthread func" kernel stack: 0xffffffff827bd000 to 0xffffffff827c2000 frame caller <image>:function + offset 0 ffffffff827c1790 (+ 32) ffffffff80154c50 <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a 1 ffffffff827c17e0 (+ 80) ffffffff800b8858 <kernel_x86_64> debug_call_with_fault_handler + 0x78 2 ffffffff827c1840 (+ 96) ffffffff800b9f44 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf4 3 ffffffff827c1890 (+ 80) ffffffff800ba2de <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e 4 ffffffff827c1980 (+ 240) ffffffff800ba677 <kernel_x86_64> panic + 0xb7 5 ffffffff827c19c0 (+ 64) ffffffff8014eb92 <kernel_x86_64> VMCache::Delete() + 0x82 6 ffffffff827c1a00 (+ 64) ffffffff8014eccf <kernel_x86_64> VMCache::Delete() + 0x1bf 7 ffffffff827c1a50 (+ 80) ffffffff8012e6ed <kernel_x86_64> delete_area(VMAddressSpace*, VMArea*, bool, bool) + 0x17d 8 ffffffff827c1a90 (+ 64) ffffffff80130eae <kernel_x86_64> vm_delete_areas + 0xce 9 ffffffff827c1f20 (+1168) ffffffff800868ce <kernel_x86_64> _user_exec + 0x31e
vs.
stack trace for thread 50408 "pthread func" kernel stack: 0xffffffff827d1000 to 0xffffffff827d6000 frame caller <image>:function + offset 0 ffffffff827d5910 (+ 144) ffffffff800a3a7e <kernel_x86_64> reschedule(int) + 0x69e 1 ffffffff827d5940 (+ 48) ffffffff8008de06 <kernel_x86_64> thread_block + 0xc6 2 ffffffff827d5990 (+ 80) ffffffff8009daa1 <kernel_x86_64> _mutex_lock(mutex*, void*) + 0xd1 3 ffffffff827d59c0 (+ 48) ffffffff8014ea24 <kernel_x86_64> VMCache::_RemoveConsumer(VMCache*) + 0x54 4 ffffffff827d5a00 (+ 64) ffffffff8014eccf <kernel_x86_64> VMCache::Delete() + 0x1bf 5 ffffffff827d5a50 (+ 80) ffffffff8012e6ed <kernel_x86_64> delete_area(VMAddressSpace*, VMArea*, bool, bool) + 0x17d 6 ffffffff827d5a90 (+ 64) ffffffff80130eae <kernel_x86_64> vm_delete_areas + 0xce 7 ffffffff827d5f20 (+1168) ffffffff800868ce <kernel_x86_64> _user_exec + 0x31e 8 ffffffff827d5f30 (+ 16) ffffffff8015682f <kernel_x86_64> x86_64_syscall_entry + 0xfb
comment:14 by , 12 hours ago
I guess the question here is how cache 0xffffffff9086adf0 wound up with no source. Presumably its source should be 0xffffffff9081d2b8, so how did it lose that?
the first KDL