Opened 34 hours ago

Closed 28 hours ago

Last modified 12 hours ago

#19360 closed bug (fixed)

VM page fault related to AVLTreeBase RAMFS

Reported by: dovsienko Owned by: nobody
Priority: normal Milestone: R1/beta6
Component: File Systems/RAMFS Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description (last modified by waddlesplash)

The problem reproduced two times shortly after booting hrev58538. The first KDL had AVLTreeBase::Remove() in the backtrace, the second had AVLTreeBase::Insert(), both have RAMFS. The factor that triggered the problem is a Perl script, which runs 4 busy workers, each of which uses a few temporary files (TMPDIR=/boot/system/var/shared_memory). System memory usage stays around 450MB, the VM (VirtualBox) has 2GB RAM and 4 CPU cores.

For the second case a complete output of the serial console follows.

vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x0, ip 0xffffffff8012985d, write 1, kernel, exec 0, thread 0x185
PANIC: vm_page_fault: unhandled page fault in kernel space at 0x0, ip 0xffffffff8012985d

Welcome to Kernel Debugging Land...
Thread 389 "pthread func" running on CPU 2
stack trace for thread 389 "pthread func"
    kernel stack: 0xffffffff87007000 to 0xffffffff8700c000
      user stack: 0x00007fa7eeda9000 to 0x00007fa7eede9000
frame                       caller             <image>:function + offset
 0 ffffffff8700b0d0 (+  32) ffffffff80154b30   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a
 1 ffffffff8700b120 (+  80) ffffffff800b8848   <kernel_x86_64> debug_call_with_fault_handler + 0x78
 2 ffffffff8700b180 (+  96) ffffffff800b9f34   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf4
 3 ffffffff8700b1d0 (+  80) ffffffff800ba2ce   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
 4 ffffffff8700b2c0 (+ 240) ffffffff800ba667   <kernel_x86_64> panic + 0xb7
 5 ffffffff8700b3b0 (+ 240) ffffffff8013b248   <kernel_x86_64> vm_page_fault + 0x298
 6 ffffffff8700b3f0 (+  64) ffffffff80160c6e   <kernel_x86_64> x86_page_fault_exception + 0x15e
 7 ffffffff8700b768 (+ 888) ffffffff8015640c   <kernel_x86_64> int_bottom + 0x80
kernel iframe at 0xffffffff8700b768 (end = 0xffffffff8700b830)
 rax 0x0                   rbx 0xffffffff8700b8a0    rcx 0xffffffff8a97e608
 rdx 0xffffffff8a97e818    rsi 0xffffffff8a97e610    rdi 0xffffffff82076700
 rbp 0xffffffff8700b850     r8 0xffffffff8a97ecd0     r9 0x7e
 r10 0x0                   r11 0x0                   r12 0xffffffff82076700
 r13 0xffffffff8a97e818    r14 0xffffffff8700b860    r15 0xffffffff8a97ecd0
 rip 0xffffffff8012985d    rsp 0xffffffff8700b838 rflags 0x10292
 vector: 0xe, error code: 0x2
 8 ffffffff8700b850 (+ 232) ffffffff8012985d   <kernel_x86_64> AVLTreeBase::_RotateLeft(AVLTreeNode**) + 0x0d
 9 ffffffff8700ba90 (+ 576) ffffffff80129a49   <kernel_x86_64> AVLTreeBase::_Insert(AVLTreeNode*) + 0xd9
10 ffffffff8700baa0 (+  16) ffffffff80129ab9   <kernel_x86_64> AVLTreeBase::Insert(AVLTreeNode*) + 0x09
11 ffffffff8700bb10 (+ 112) ffffffff8112d170   </boot/system/add-ons/kernel/file_systems/ramfs> SizeIndex::Changed[clone .localalias] (Node*, long) + 0x110
12 ffffffff8700bb40 (+  48) ffffffff81121f53   </boot/system/add-ons/kernel/file_systems/ramfs> File::SetSize[clone .localalias] (long) + 0x63
13 ffffffff8700bb90 (+  80) ffffffff81124d16   </boot/system/add-ons/kernel/file_systems/ramfs> ramfs_open(fs_volume*, fs_vnode*, int, void**) + 0x146
14 ffffffff8700bbd0 (+  64) ffffffff8010141f   <kernel_x86_64> open_vnode(vnode*, int, bool) + 0x3f
15 ffffffff8700bd70 (+ 416) ffffffff80107f68   <kernel_x86_64> create_vnode(vnode*, char const*, int, int, bool) + 0x4b8
16 ffffffff8700bec0 (+ 336) ffffffff8010ad42   <kernel_x86_64> file_create(int, char*, int, int, bool) + 0x62
17 ffffffff8700bf20 (+  96) ffffffff8011123b   <kernel_x86_64> _user_open + 0xcb
18 ffffffff8700bf30 (+  16) ffffffff8015670f   <kernel_x86_64> x86_64_syscall_entry + 0xfb
user iframe at 0xffffffff8700bf30 (end = 0xffffffff8700bff8)
 rax 0x71                  rbx 0x7fa7eede8a73        rcx 0x21bf2624c7c
 rdx 0x641                 rsi 0x11f15e1c38f0        rdi 0xffffff9c
 rbp 0x7fa7eede8820         r8 0x7fa7eede8a73         r9 0xffffffff
 r10 0x1a4                 r11 0x202                 r12 0x601
 r13 0x11f15e1c38f0        r14 0x1b6                 r15 0x11f15d8f0740
 rip 0x21bf2624c7c         rsp 0x7fa7eede87b8     rflags 0x202
 vector: 0x63, error code: 0x0
19 00007fa7eede8820 (+   0) 0000021bf2624c7c   </boot/system/lib/libroot.so> _kern_open + 0x0c
20 00007fa7eede8850 (+  48) 000000c9e58e450b   </boot/system/lib/perl5/5.40.0/x> Perl_debstack (nearest) + 0xdd9
21 00007fa7eede88c0 (+ 112) 000000c9e592f688   </boot/system/lib/perl5/5.40.0/x> PerlIOUnix_open + 0xfe
22 00007fa7eede8950 (+ 144) 000000c9e592e6e1   </boot/system/lib/perl5/5.40.0/x> PerlIOBuf_open + 0x150
23 00007fa7eede89f0 (+ 160) 000000c9e5930ea2   </boot/system/lib/perl5/5.40.0/x> PerlIO_openn + 0x1a8
24 00007fa7eede8ac0 (+ 208) 000000c9e58e50e6   </boot/system/lib/perl5/5.40.0/x> Perl_debstack (nearest) + 0x19b4
25 00007fa7eede8b40 (+ 128) 000000c9e5985a66   </boot/system/lib/perl5/5.40.0/x> Perl_sortsv (nearest) + 0x263a
26 00007fa7eede8b60 (+  32) 000000c9e59ecd26   </boot/system/lib/perl5/5.40.0/x> Perl_runops_standard + 0x16
27 00007fa7eede8cc0 (+ 352) 000000c9e58ce378   </boot/system/lib/perl5/5.40.0/x> Perl_call_sv + 0x432
28 00007fa7eede8d80 (+ 192) 000000e859615de2   </boot/system/lib/perl5/5.40.0/x> _init (nearest) + 0xe5a
29 00007fa7eede8df0 (+ 112) 000000e859617c6f   </boot/system/lib/perl5/5.40.0/x> _init (nearest) + 0x2ce7
30 00007fa7eede8e10 (+  32) 0000021bf26338d8   </boot/system/lib/libroot.so> pthread_exit (nearest) + 0x38
31 0000000000000000 (+   0) 00007fffea272258   10712:commpage@0x00007fffea272000 + 0x258
kdebug> 

Attachments (1)

Screenshot_2025-01-16_22-07-31.png (32.6 KB ) - added by dovsienko 34 hours ago.
the first KDL

Download all attachments as: .zip

Change History (15)

by dovsienko, 34 hours ago

the first KDL

comment:1 by dovsienko, 34 hours ago

For the fist KDL the diagnostics were as follows:

kdebug> bt
bt
stack trace for thread 35462 "pthread func"
    kernel stack: 0xffffffff8647d000 to 0xffffffff86482000
      user stack: 0x00007fe7d2a8f000 to 0x00007fe7d2acf000
frame                       caller             <image>:function + offset
 0 ffffffff86481090 (+  32) ffffffff800bb5f9   <kernel_x86_64> invoke_command_trampoline(void*) + 0x19
 1 ffffffff864810b0 (+  32) ffffffff80154b30   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a
 2 ffffffff86481100 (+  80) ffffffff800b8848   <kernel_x86_64> debug_call_with_fault_handler + 0x78
 3 ffffffff86481160 (+  96) ffffffff800bb84f   <kernel_x86_64> invoke_debugger_command + 0xef
 4 ffffffff864811a0 (+  64) ffffffff800bb9e6   <kernel_x86_64> invoke_pipe_segment(debugger_command_pipe*, int, char*) + 0x116
 5 ffffffff864811f0 (+  80) ffffffff800bbb0c   <kernel_x86_64> invoke_debugger_command_pipe + 0x9c
 6 ffffffff86481230 (+  64) ffffffff800bdd66   <kernel_x86_64> ExpressionParser::_ParseCommandPipe(int&) + 0x1e6
 7 ffffffff86481270 (+  64) ffffffff800bf2cd   <kernel_x86_64> ExpressionParser::EvaluateCommand(char const*, int&) + 0x43d
 8 ffffffff86481360 (+ 240) ffffffff800bf8ff   <kernel_x86_64> evaluate_debug_command + 0x12f
 9 ffffffff864813c0 (+  96) ffffffff800b9fd8   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0x198
10 ffffffff86481410 (+  80) ffffffff800ba2ce   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
11 ffffffff86481500 (+ 240) ffffffff800ba667   <kernel_x86_64> panic + 0xb7
12 ffffffff864815f0 (+ 240) ffffffff8013b248   <kernel_x86_64> vm_page_fault + 0x298
13 ffffffff86481630 (+  64) ffffffff80160c6e   <kernel_x86_64> x86_page_fault_exception + 0x15e
14 ffffffff86481988 (+ 856) ffffffff8015640c   <kernel_x86_64> int_bottom + 0x80
kernel iframe at 0xffffffff86481988 (end = 0xffffffff86481a50)
 rax 0x1                   rbx 0xffffffffa3e96d20    rcx 0x0
 rdx 0xffffffffa3e96db0    rsi 0xffffffffa3758c68    rdi 0xffffffff8207c4c0
 rbp 0xffffffff86481a90     r8 0xffffffff86481930     r9 0xffffffff864819e0
 r10 0xffffffff8207c4c0    r11 0x1                   r12 0xffffffff8207c4c0
 r13 0x0                   r14 0xffffffffa3758c60    r15 0xffffffffa3e96d30
 rip 0xffffffff80129da2    rsp 0xffffffff86481a50 rflags 0x10246
 vector: 0xe, error code: 0x2
15 ffffffff86481a90 (+ 264) ffffffff80129da2   <kernel_x86_64> AVLTreeBase::_Remove(AVLTreeNode*) + 0x92
16 ffffffff86481aa0 (+  16) ffffffff80129f79   <kernel_x86_64> AVLTreeBase::Remove(AVLTreeNode*) + 0x09
17 ffffffff86481b10 (+ 112) ffffffff8112d103   </boot/system/add-ons/kernel/file_systems/ramfs> SizeIndex::Changed[clone .localalias] (Node*, long) + 0xa3
18 ffffffff86481b40 (+  48) ffffffff81121f53   </boot/system/add-ons/kernel/file_systems/ramfs> File::SetSize[clone .localalias] (long) + 0x63
19 ffffffff86481b90 (+  80) ffffffff81124d16   </boot/system/add-ons/kernel/file_systems/ramfs> ramfs_open(fs_volume*, fs_vnode*, int, void**) + 0x146
20 ffffffff86481bd0 (+  64) ffffffff8010141f   <kernel_x86_64> open_vnode(vnode*, int, bool) + 0x3f
21 ffffffff86481d70 (+ 416) ffffffff80107f68   <kernel_x86_64> create_vnode(vnode*, char const*, int, int, bool) + 0x4b8
22 ffffffff86481ec0 (+ 336) ffffffff8010ad42   <kernel_x86_64> file_create(int, char*, int, int, bool) + 0x62
23 ffffffff86481f20 (+  96) ffffffff8011123b   <kernel_x86_64> _user_open + 0xcb
24 ffffffff86481f30 (+  16) ffffffff8015670f   <kernel_x86_64> x86_64_syscall_entry + 0xfb
user iframe at 0xffffffff86481f30 (end = 0xffffffff86481ff8)
 rax 0x71                  rbx 0x7fe7d2ace5f3        rcx 0xb3b6652c7c
 rdx 0x641                 rsi 0x11103bcd2a80        rdi 0xffffff9c
 rbp 0x7fe7d2ace3a0         r8 0x7fe7d2ace5f3         r9 0xffffffff
 r10 0x1a4                 r11 0x202                 r12 0x601
 r13 0x11103bcd2a80        r14 0x1b6                 r15 0x11103b3fd900
 rip 0xb3b6652c7c          rsp 0x7fe7d2ace338     rflags 0x202
 vector: 0x63, error code: 0x0
25 00007fe7d2ace3a0 (+   0) 000000b3b6652c7c   </boot/system/lib/libroot.so> _kern_open + 0x0c
26 00007fe7d2ace3d0 (+  48) 00000143f429150b   </boot/system/lib/perl5/5.40.0/x> Perl_debstack (nearest) + 0xdd9
27 00007fe7d2ace440 (+ 112) 00000143f42dc688   </boot/system/lib/perl5/5.40.0/x> PerlIOUnix_open + 0xfe
28 00007fe7d2ace4d0 (+ 144) 00000143f42db6e1   </boot/system/lib/perl5/5.40.0/x> PerlIOBuf_open + 0x150
29 00007fe7d2ace570 (+ 160) 00000143f42ddea2   </boot/system/lib/perl5/5.40.0/x> PerlIO_openn + 0x1a8
30 00007fe7d2ace640 (+ 208) 00000143f42920e6   </boot/system/lib/perl5/5.40.0/x> Perl_debstack (nearest) + 0x19b4
31 00007fe7d2ace6c0 (+ 128) 00000143f4332a66   </boot/system/lib/perl5/5.40.0/x> Perl_sortsv (nearest) + 0x263a
32 00007fe7d2ace6e0 (+  32) 00000143f4399d26   </boot/system/lib/perl5/5.40.0/x> Perl_runops_standard + 0x16
33 00007fe7d2ace840 (+ 352) 00000143f427b378   </boot/system/lib/perl5/5.40.0/x> Perl_call_sv + 0x432
34 00007fe7d2ace900 (+ 192) 00000189a884fde2   </boot/system/lib/perl5/5.40.0/x> _init (nearest) + 0xe5a
35 00007fe7d2ace970 (+ 112) 00000189a8851c6f   </boot/system/lib/perl5/5.40.0/x> _init (nearest) + 0x2ce7
36 00007fe7d2ace990 (+  32) 000000b3b66618d8   </boot/system/lib/libroot.so> pthread_exit (nearest) + 0x38
37 0000000000000000 (+   0) 00007fffffca9258   2773733:commpage@0x00007fffffca9000 + 0x258
kdebug> message
message
PANIC: vm_page_fault: unhandled page fault in kernel space at 0x0, ip 0xffffffff80129da2

kdebug> 

comment:2 by waddlesplash, 34 hours ago

Description: modified (diff)
Platform: x86-64All

comment:3 by dovsienko, 34 hours ago

Forgot to add: the first time I was doing a benchmark of the script using a different number of threads:

  • 1 worker thread: 192 seconds (ran this a few times without a problem)
  • 2 worker threads: 79 seconds
  • 3 worker threads: 68 seconds
  • 4 worker threads: KDL before the completion

The second time I ran the script using 4 worker threads and the problem reproduced almost immediately. So this looks like a race condition that is much more likely to manifest when there is a busy thread for every CPU core.

comment:4 by dovsienko, 33 hours ago

Here is another stack trace, this time from the compiled single-threaded C program (filtertest), which the multi-threaded Perl script runs for every test. Looks like it is not specific to Perl or to a race condition between different threads of the same process.

PANIC: Unexpected exception "General Protection Exception" occurred in kernel mode! Error code: 0x0

Welcome to Kernel Debugging Land...
Thread 17794 "filtertest" running on CPU 3
stack trace for thread 17794 "filtertest"
    kernel stack: 0xffffffff8022f000 to 0xffffffff80234000
      user stack: 0x00007f7623bfc000 to 0x00007f7624bfc000
frame                       caller             <image>:function + offset
 0 ffffffff80233540 (+  32) ffffffff80154b30   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a
 1 ffffffff80233590 (+  80) ffffffff800b8848   <kernel_x86_64> debug_call_with_fault_handler + 0x78
 2 ffffffff802335f0 (+  96) ffffffff800b9f34   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf4
 3 ffffffff80233640 (+  80) ffffffff800ba2ce   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
 4 ffffffff80233730 (+ 240) ffffffff800ba667   <kernel_x86_64> panic + 0xb7
 5 ffffffff80233a88 (+ 856) ffffffff8015640c   <kernel_x86_64> int_bottom + 0x80
kernel iframe at 0xffffffff80233a88 (end = 0xffffffff80233b50)
 rax 0xffffffff8a43eae0    rbx 0xffffffff8a43eae0    rcx 0xffffffff81139e28
 rdx 0xffffffff8a43eae0    rsi 0xffffffff8abc3390    rdi 0xffffffff820746f8
 rbp 0xffffffff80233b80     r8 0x7                    r9 0x43
 r10 0x1001                r11 0xffffffff80219200    r12 0xffffffff8abc3390
 r13 0xffffffff8abc3390    r14 0xffffffff80233b90    r15 0xcccccccc00000000
 rip 0xffffffff8112e1eb    rsp 0xffffffff80233b50 rflags 0x10286
 vector: 0xd, error code: 0x0
 6 ffffffff80233b80 (+ 248) ffffffff8112e1eb   </boot/system/add-ons/kernel/file_systems/ramfs> _ZN10AVLTreeMapI16TwoKeyAVLTreeKeyI19SizeIndexPrimaryKeyP4NodeES3_25TwoKeyAVLTreeNodeStrategyIS1_S3_S3_26SizeIndexPrimaryKeyCompare28TwoKeyAVLTreeStandardCompareIS3_E22SizeIndexGetPrimaryKey27TwoKeyAVLTreeStandardGetKeyIS3_S3_EEE12CompareNodesEPK11AVLTreeNodeSG_ + 0x1b
 7 ffffffff80233dc0 (+ 576) ffffffff801299e5   <kernel_x86_64> AVLTreeBase::_Insert(AVLTreeNode*) + 0x75
 8 ffffffff80233dd0 (+  16) ffffffff80129ab9   <kernel_x86_64> AVLTreeBase::Insert(AVLTreeNode*) + 0x09
 9 ffffffff80233e40 (+ 112) ffffffff8112d170   </boot/system/add-ons/kernel/file_systems/ramfs> SizeIndex::Changed[clone .localalias] (Node*, long) + 0x110
10 ffffffff80233e70 (+  48) ffffffff8112217b   </boot/system/add-ons/kernel/file_systems/ramfs> File::WriteAt(long, void const*, unsigned long, unsigned long*) + 0x6b
11 ffffffff80233ec0 (+  80) ffffffff811269e2   </boot/system/add-ons/kernel/file_systems/ramfs> ramfs_write(fs_volume*, fs_vnode*, void*, long, void const*, unsigned long*) + 0x102
12 ffffffff80233f20 (+  96) ffffffff800f350f   <kernel_x86_64> common_user_io(int, long, void*, unsigned long, bool) + 0x14f
13 ffffffff80233f30 (+  16) ffffffff8015670f   <kernel_x86_64> x86_64_syscall_entry + 0xfb
user iframe at 0xffffffff80233f30 (end = 0xffffffff80233ff8)
 rax 0x96                  rbx 0x7b                  rcx 0x63ab41cecc
 rdx 0x10c3c3ad9340        rsi 0xffffffffffffffff    rdi 0x1
 rbp 0x7f7624bfb4a0         r8 0x0                    r9 0x24bf7156c0
 r10 0x7b                  r11 0x206                 r12 0x10c3c3ad9340
 r13 0x7b                  r14 0x63ab4f7680          r15 0x40000
 rip 0x63ab41cecc          rsp 0x7f7624bfb488     rflags 0x206
 vector: 0x63, error code: 0x0
14 00007f7624bfb4a0 (+   0) 00000063ab41cecc   </boot/system/lib/libroot.so> _kern_write + 0x0c
15 00007f7624bfb4d0 (+  48) 00000063ab45141a   </boot/system/lib/libroot.so> _IO_new_file_write + 0x3a
16 00007f7624bfb500 (+  48) 00000063ab450de1   </boot/system/lib/libroot.so> _IO_file_setbuf (nearest) + 0x81
17 00007f7624bfb520 (+  32) 00000063ab451b91   </boot/system/lib/libroot.so> _IO_do_write + 0x21
18 00007f7624bfb540 (+  32) 00000063ab451ff0   </boot/system/lib/libroot.so> _IO_file_sync + 0x90
19 00007f7624bfb560 (+  32) 00000063ab454107   </boot/system/lib/libroot.so> fflush_unlocked + 0x27
20 00007f7624bfb590 (+  48) 000001597e86cffa   </boot/system/lib/libstdc++.so.6> std::basic_ostream<char, std::char_traits<char> >::flush() + 0x4a
21 00007f7624bfb5a0 (+  16) 000001597e7f9e9e   </boot/system/lib/libstdc++.so.6> std::ios_base::Init::~Init() + 0x3e
22 00007f7624bfb5e0 (+  64) 00000063ab494e1c   </boot/system/lib/libroot.so> __cxa_finalize + 0x15c
23 00007f7624bfb600 (+  32) 00000063ab494fd7   </boot/system/lib/libroot.so> exit + 0x17
24 00007f7624bfb6f0 (+ 240) 000000a702bc9531   </boot/home/gitrepos/libpcap/tes> _init (nearest) + 0xb69
25 00007f7624bfb720 (+  48) 000000a702bc982e   </boot/home/gitrepos/libpcap/tes> _start + 0x3e
26 00007f7624bfb750 (+  48) 000000c1a018de05   </boot/system/runtime_loader> runtime_loader + 0x115
27 0000000000000000 (+   0) 00007fffffc15258   1464226:commpage@0x00007fffffc15000 + 0x258

comment:5 by dovsienko, 32 hours ago

Here is another variety of the stack trace. This time the first run of the script was successful, but the second run triggered the problem as early as creating a temporary directory.

PANIC: vm_page_fault: unhandled page fault in kernel space at 0x0, ip 0xffffffff801297ed

Welcome to Kernel Debugging Land...
Thread 9473 "perl" running on CPU 3
stack trace for thread 9473 "perl"
    kernel stack: 0xffffffff8183d000 to 0xffffffff81842000
      user stack: 0x00007fd39d969000 to 0x00007fd39e969000
frame                       caller             <image>:function + offset
 0 ffffffff818411d0 (+  32) ffffffff80154b30   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a
 1 ffffffff81841220 (+  80) ffffffff800b8848   <kernel_x86_64> debug_call_with_fault_handler + 0x78
 2 ffffffff81841280 (+  96) ffffffff800b9f34   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf4
 3 ffffffff818412d0 (+  80) ffffffff800ba2ce   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
 4 ffffffff818413c0 (+ 240) ffffffff800ba667   <kernel_x86_64> panic + 0xb7
 5 ffffffff818414b0 (+ 240) ffffffff8013b248   <kernel_x86_64> vm_page_fault + 0x298
 6 ffffffff818414f0 (+  64) ffffffff80160c6e   <kernel_x86_64> x86_page_fault_exception + 0x15e
 7 ffffffff81841848 (+ 856) ffffffff8015640c   <kernel_x86_64> int_bottom + 0x80
kernel iframe at 0xffffffff81841848 (end = 0xffffffff81841910)
 rax 0x0                   rbx 0xffffffff81841930    rcx 0xffffffff820796f8
 rdx 0xffffffff8a94a278    rsi 0xffffffff82079708    rdi 0xffffffff82079760
 rbp 0xffffffff81841920     r8 0xffffffff82079760     r9 0xffffffff82079760
 r10 0x0                   r11 0x0                   r12 0xffffffff82079760
 r13 0xffffffff8a94a278    r14 0xffffffff81841930    r15 0xffffffff82079760
 rip 0xffffffff801297ed    rsp 0xffffffff81841918 rflags 0x10282
 vector: 0xe, error code: 0x2
 8 ffffffff81841920 (+ 216) ffffffff801297ed   <kernel_x86_64> AVLTreeBase::_RotateRight(AVLTreeNode**) + 0x0d
 9 ffffffff81841b60 (+ 576) ffffffff80129a78   <kernel_x86_64> AVLTreeBase::_Insert(AVLTreeNode*) + 0x108
10 ffffffff81841b70 (+  16) ffffffff80129ab9   <kernel_x86_64> AVLTreeBase::Insert(AVLTreeNode*) + 0x09
11 ffffffff81841ba0 (+  48) ffffffff8112cf50   </boot/system/add-ons/kernel/file_systems/ramfs> non-virtual thunk to SizeIndex::NodeAdded(Node*)[clone .localalias] () + 0x70
12 ffffffff81841c10 (+ 112) ffffffff8112f664   </boot/system/add-ons/kernel/file_systems/ramfs> Volume::NodeAdded[clone .localalias] (Node*) + 0xc4
13 ffffffff81841c40 (+  48) ffffffff8112f888   </boot/system/add-ons/kernel/file_systems/ramfs> Volume::PublishVNode[clone .localalias] (Node*) + 0x18
14 ffffffff81841c70 (+  48) ffffffff8112966c   </boot/system/add-ons/kernel/file_systems/ramfs> Node::Link[clone .localalias] (Entry*) + 0xbc
15 ffffffff81841ca0 (+  48) ffffffff811217dd   </boot/system/add-ons/kernel/file_systems/ramfs> Entry::Link[clone .localalias] (Node*) + 0x2d
16 ffffffff81841ce0 (+  64) ffffffff81120f2b   </boot/system/add-ons/kernel/file_systems/ramfs> Directory::CreateEntry[clone .localalias] (Node*, char const*, Entry**) + 0x9b
17 ffffffff81841d20 (+  64) ffffffff811212f8   </boot/system/add-ons/kernel/file_systems/ramfs> Directory::CreateDirectory(char const*, Directory**) + 0xc8
18 ffffffff81841d90 (+ 112) ffffffff811260f7   </boot/system/add-ons/kernel/file_systems/ramfs> ramfs_create_dir(fs_volume*, fs_vnode*, char const*, int) + 0x1a7
19 ffffffff81841ed0 (+ 320) ffffffff8010b398   <kernel_x86_64> dir_create(int, char*, int, bool) + 0x68
20 ffffffff81841f20 (+  80) ffffffff80111964   <kernel_x86_64> _user_create_dir + 0x84
21 ffffffff81841f30 (+  16) ffffffff8015670f   <kernel_x86_64> x86_64_syscall_entry + 0xfb
user iframe at 0xffffffff81841f30 (end = 0xffffffff81841ff8)
 rax 0x7a                  rbx 0x11ceb24f3340        rcx 0xc072cded0c
 rdx 0x1c0                 rsi 0x11ceb2bc45a0        rdi 0xffffff9c
 rbp 0x7fd39e9687d0         r8 0x0                    r9 0x0
 r10 0x6d                  r11 0x206                 r12 0x0
 r13 0x7fd39e969660        r14 0x11ceb2bec198        r15 0x11ceb2912b38
 rip 0xc072cded0c          rsp 0x7fd39e9687b8     rflags 0x206
 vector: 0x63, error code: 0x0
22 00007fd39e9687d0 (+   0) 000000c072cded0c   </boot/system/lib/libroot.so> _kern_create_dir + 0x0c
23 00007fd39e968830 (+  96) 0000018ab3a19447   </boot/system/lib/perl5/5.40.0/x> Perl_setdefout (nearest) + 0x7aca
24 00007fd39e968850 (+  32) 0000018ab3a76d26   </boot/system/lib/perl5/5.40.0/x> Perl_runops_standard + 0x16
25 00007fd39e9688e0 (+ 144) 0000018ab3961196   </boot/system/lib/perl5/5.40.0/x> perl_run + 0x282
26 00007fd39e968930 (+  80) 000001448a514db8   </boot/system/bin/perl> main + 0xd3
27 00007fd39e968960 (+  48) 000001448a514cbe   </boot/system/bin/perl> _start + 0x3e
28 00007fd39e968990 (+  48) 000001b09aed3e05   </boot/system/runtime_loader> runtime_loader + 0x115
29 0000000000000000 (+   0) 00007ffffeff9258   769484:commpage@0x00007ffffeff9000 + 0x258
kdebug> 

comment:6 by waddlesplash, 28 hours ago

Milestone: UnscheduledR1/beta6
Resolution: fixed
Status: newclosed

Fixed in hrev58540.

comment:7 by dovsienko, 21 hours ago

Thank you for looking into this. hrev58541 still experiences a KDL under the same conditions, although it looks different:

kdebug> message
message
PANIC: remove page 0xffffffff83654070 from cache 0xffffffff8af5c128: page still has mappings!

kdebug> 
kdebug> bt
bt
stack trace for thread 71237 "pthread func"
    kernel stack: 0xffffffff85a80000 to 0xffffffff85a85000
frame                       caller             <image>:function + offset
 0 ffffffff85a84510 (+  32) ffffffff800bb609   <kernel_x86_64> invoke_command_trampoline(void*) + 0x19
 1 ffffffff85a84530 (+  32) ffffffff80154b70   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a
 2 ffffffff85a84580 (+  80) ffffffff800b8858   <kernel_x86_64> debug_call_with_fault_handler + 0x78
 3 ffffffff85a845e0 (+  96) ffffffff800bb85f   <kernel_x86_64> invoke_debugger_command + 0xef
 4 ffffffff85a84620 (+  64) ffffffff800bb9f6   <kernel_x86_64> invoke_pipe_segment(debugger_command_pipe*, int, char*) + 0x116
 5 ffffffff85a84670 (+  80) ffffffff800bbb1c   <kernel_x86_64> invoke_debugger_command_pipe + 0x9c
 6 ffffffff85a846b0 (+  64) ffffffff800bdd76   <kernel_x86_64> ExpressionParser::_ParseCommandPipe(int&) + 0x1e6
 7 ffffffff85a846f0 (+  64) ffffffff800bf2dd   <kernel_x86_64> ExpressionParser::EvaluateCommand(char const*, int&) + 0x43d
 8 ffffffff85a847e0 (+ 240) ffffffff800bf90f   <kernel_x86_64> evaluate_debug_command + 0x12f
 9 ffffffff85a84840 (+  96) ffffffff800b9fe8   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0x198
10 ffffffff85a84890 (+  80) ffffffff800ba2de   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
11 ffffffff85a84980 (+ 240) ffffffff800ba677   <kernel_x86_64> panic + 0xb7
12 ffffffff85a849c0 (+  64) ffffffff8014eb02   <kernel_x86_64> VMCache::Delete() + 0x82
13 ffffffff85a84a00 (+  64) ffffffff8014ec3f   <kernel_x86_64> VMCache::Delete() + 0x1bf
14 ffffffff85a84a50 (+  80) ffffffff8012e65d   <kernel_x86_64> delete_area(VMAddressSpace*, VMArea*, bool, bool) + 0x17d
15 ffffffff85a84a90 (+  64) ffffffff80130e1e   <kernel_x86_64> vm_delete_areas + 0xce
16 ffffffff85a84f20 (+1168) ffffffff800868ce   <kernel_x86_64> _user_exec + 0x31e
17 ffffffff85a84f30 (+  16) ffffffff8015674f   <kernel_x86_64> x86_64_syscall_entry + 0xfb
user iframe at 0xffffffff85a84f30 (end = 0xffffffff85a84ff8)
 rax 0x2e                  rbx 0x0                   rcx 0xc1dacab84c
 rdx 0x12af                rsi 0x10617f2e83d0        rdi 0xdb4e042a68
 rbp 0x7fb4547fa6e0         r8 0x29                   r9 0x12
 r10 0x3                   r11 0x246                 r12 0xdb4e042a68
 r13 0x0                   r14 0x7fb4547fa5b0        r15 0x0
 rip 0xc1dacab84c          rsp 0x7fb4547fa578     rflags 0x246
 vector: 0x63, error code: 0x0
18 00007fb4547fa6e0 (+   0) 000000c1dacab84c   
19 00007fb4547fa770 (+ 144) 000000c1dad2ef3e   
20 00007fb4547fa7e0 (+ 112) 000000db4decb5c2   
21 00007fb4547fa8a0 (+ 192) 000000db4df7387b   
22 00007fb4547fa8c0 (+  32) 000000db4dfcfd26   
23 00007fb4547faa20 (+ 352) 000000db4deb1378   
24 00007fb4547faae0 (+ 192) 00000003313bfde2   
25 00007fb4547fab50 (+ 112) 00000003313c1c6f   
26 00007fb4547fab70 (+  32) 000000c1dacba8d8   
27 0000000000000000 (+   0) 00007ffffc5b9258   6006539:commpage@0x00007ffffc5b9000 + 0x258
kdebug> 

comment:8 by dovsienko, 20 hours ago

The previous stack trace reproduces using the following:

while TESTRUN_JOBS=16 ./testprogs/TESTrun; do date; done

After a few normal rounds the script starts and spawns 16 threads, and the KDL pops up before it gets the first result from the first thread:

Skipped tests:
    invalid_ip6_host_disabled               : IPv6 syntax enabled
    invalid_ip6_net_disabled                : IPv6 syntax enabled

------------------------------------------------
   2 tests skipped
   0 tests failed
 924 tests passed
Fri Jan 17 12:56:47 GMT 2025
INFO: Using a test timeout of 1.
(*** KDL ***)

comment:9 by waddlesplash, 14 hours ago

Thus far, I haven't managed to reproduce that KDL here, in a VM with 4 cores at least. The script has run for a while and no KDL to be seen. It may be better to open a new ticket for that, as it's likely unrelated to ramfs even.

comment:10 by waddlesplash, 14 hours ago

OK, it took a while but it finally reproduced here.

initial commands: page 0xffffffff840562b0; cache 0xffffffff9081d2b8
PAGE: 0xffffffff840562b0
queue_next,prev: 0xffffffff83f52710, 0xffffffff84145270
physical_number: 0x3456f
cache:           0xffffffff9081d2b8
cache_offset:    7727
cache_next:      0xffffffff840ee8d0
state:           active
wired_count:     0
usage_count:     0
busy:            0
busy_writing:    0
accessed:        0
modified:        0
accessor:        -1
area mappings:
  0xffffffff908382a8 (4359836)
CACHE 0xffffffff9081d2b8:
  ref_count:    0
  source:       0x0000000000000000
  type:         RAM
  virtual_base: 0x0
  virtual_end:  0x1930000
  temporary:    1
  lock:         0xffffffff9081d340
  lock.holder:  50404
  areas:
  consumers:
  pages:
	6 in cache
AREA: 0xffffffff908382a8
name:		'heap'
owner:		0xc4d4
id:		0x42869c
base:		0x11873179c000
size:		0x2070000
protection:	0x33
page_protection:0x0000000000000000
wiring:		0x0
memory_type:	0x0
cache:		0xffffffff8ff14828
cache_type:	RAM
cache_offset:	0x0
cache_next:	0x0000000000000000
cache_prev:	0x0000000000000000
page mappings:	8186
kdebug> 

comment:11 by waddlesplash, 13 hours ago

CACHE 0xffffffff8ff14828:
  ref_count:    2
  source:       0xffffffff9086adf0
  type:         RAM
  virtual_base: 0x0
  virtual_end:  0x2070000
  temporary:    1
  lock:         0xffffffff8ff148b0
  lock.holder:  50395
  areas:
    area 0x42869c, heap
	base_addr:  0x11873179c000, size: 0x2070000
	protection: 0x33
	owner:      0xc4d4
  consumers:
  pages:
	1718 in cache

comment:12 by waddlesplash, 13 hours ago

CACHE 0xffffffff9086adf0:
  ref_count:    2
  source:       0x0000000000000000
  type:         RAM
  virtual_base: 0x0
  virtual_end:  0x1d10000
  temporary:    1
  lock:         0xffffffff9086ae78
  lock.holder:  50408
  areas:
  consumers:
	0xffffffff9085bcc8
	0xffffffff8ff14828
  pages:
	7355 in cache

comment:13 by waddlesplash, 13 hours ago

Seems another thread is in the process of removing a consumer:

PANIC: remove page 0xffffffff840562b0 from cache 0xffffffff9081d2b8: page still has mappings!

Welcome to Kernel Debugging Land...
Thread 50404 "pthread func" running on CPU 0
stack trace for thread 50404 "pthread func"
    kernel stack: 0xffffffff827bd000 to 0xffffffff827c2000
frame                       caller             <image>:function + offset
 0 ffffffff827c1790 (+  32) ffffffff80154c50   <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a
 1 ffffffff827c17e0 (+  80) ffffffff800b8858   <kernel_x86_64> debug_call_with_fault_handler + 0x78
 2 ffffffff827c1840 (+  96) ffffffff800b9f44   <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf4
 3 ffffffff827c1890 (+  80) ffffffff800ba2de   <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
 4 ffffffff827c1980 (+ 240) ffffffff800ba677   <kernel_x86_64> panic + 0xb7
 5 ffffffff827c19c0 (+  64) ffffffff8014eb92   <kernel_x86_64> VMCache::Delete() + 0x82
 6 ffffffff827c1a00 (+  64) ffffffff8014eccf   <kernel_x86_64> VMCache::Delete() + 0x1bf
 7 ffffffff827c1a50 (+  80) ffffffff8012e6ed   <kernel_x86_64> delete_area(VMAddressSpace*, VMArea*, bool, bool) + 0x17d
 8 ffffffff827c1a90 (+  64) ffffffff80130eae   <kernel_x86_64> vm_delete_areas + 0xce
 9 ffffffff827c1f20 (+1168) ffffffff800868ce   <kernel_x86_64> _user_exec + 0x31e

vs.

stack trace for thread 50408 "pthread func"
    kernel stack: 0xffffffff827d1000 to 0xffffffff827d6000
frame                       caller             <image>:function + offset
 0 ffffffff827d5910 (+ 144) ffffffff800a3a7e   <kernel_x86_64> reschedule(int) + 0x69e
 1 ffffffff827d5940 (+  48) ffffffff8008de06   <kernel_x86_64> thread_block + 0xc6
 2 ffffffff827d5990 (+  80) ffffffff8009daa1   <kernel_x86_64> _mutex_lock(mutex*, void*) + 0xd1
 3 ffffffff827d59c0 (+  48) ffffffff8014ea24   <kernel_x86_64> VMCache::_RemoveConsumer(VMCache*) + 0x54
 4 ffffffff827d5a00 (+  64) ffffffff8014eccf   <kernel_x86_64> VMCache::Delete() + 0x1bf
 5 ffffffff827d5a50 (+  80) ffffffff8012e6ed   <kernel_x86_64> delete_area(VMAddressSpace*, VMArea*, bool, bool) + 0x17d
 6 ffffffff827d5a90 (+  64) ffffffff80130eae   <kernel_x86_64> vm_delete_areas + 0xce
 7 ffffffff827d5f20 (+1168) ffffffff800868ce   <kernel_x86_64> _user_exec + 0x31e
 8 ffffffff827d5f30 (+  16) ffffffff8015682f   <kernel_x86_64> x86_64_syscall_entry + 0xfb

comment:14 by waddlesplash, 12 hours ago

I guess the question here is how cache 0xffffffff9086adf0 wound up with no source. Presumably its source should be 0xffffffff9081d2b8, so how did it lose that?

Note: See TracTickets for help on using tickets.