Opened 2 months ago
Last modified 45 hours ago
#19252 reopened bug
[ramfs] PANIC: vm_page_fault: unhandled page fault in kernel space
Reported by: | bipolar | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta6 |
Component: | File Systems/RAMFS | Version: | R1/beta5 |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
While using haikuporter to build the mozc recipe (with OUTPUT_DIRECTORY pointing to a RAMFS mount) on beta5+125, 64-bit, bare metal, I got the attached KDL (the syslog didn't retain the info after reboot).
The build had almost finished, but haikuporter was unable to unmount the system volume from the chroot, so I unmounted it manually, removed a boot/system dir that remained in the chroot, and tried to run hp -F mozc; that is where I got the KDL.
Attachments (1)
Change History (9)
by , 2 months ago
Attachment: | kdl-ramfs-vm_page_fault.jpg added |
---|
comment:1 by , 2 months ago
comment:2 by , 2 months ago
FWIW, I have 8 GB of RAM on that machine, and while building the recipe, RAM usage topped out at 3.2 GB. The largest I've seen that particular work dir get during a build was about 1 GB. Even doubling that should still leave a couple of GB free. But I have no idea how accurate those numbers are (or if there are other "hidden" uses of RAM).
comment:3 by , 2 months ago
Hmm, then maybe not.
Did "hp -F mozc" on a RAMFS twice now, didn't get this KDL so far. I looked at the codepath here and it acquires a write lock, as it should, and does NULL checks in the relevant places. So I'm not sure how this could happen.
I have a few refactors I'll push, though.
comment:4 by , 2 days ago
Milestone: | Unscheduled → R1/beta6 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
Should be fixed in hrev58540. If it's not, hopefully the new assertions added in that commit will catch the real problem.
comment:5 by , 2 days ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Hmm, or maybe not. I can actually reproduce this reliably now, if I try to delete the haikuports directory in ramfs after the build finishes, even if it finishes successfully; only it appears to be a use-after-free.
comment:6 by , 45 hours ago
It might be a race condition of some kind: I rebuilt ramfs in debug mode and now it doesn't seem to happen. Or at least, it didn't the one time I ran a full mozc build and then rm -rf'd again.
A smaller reproducer (if possible) would make this much easier to debug...
comment:7 by , 45 hours ago
For reference, the stack trace of the problem I was seeing is this:
PANIC: Unexpected exception "General Protection Exception" occurred in kernel mode! Error code: 0x0
Welcome to Kernel Debugging Land...
Thread 40876 "rm" running on CPU 2
stack trace for thread 40876 "rm"
kernel stack: 0xffffffff82013000 to 0xffffffff82018000
user stack: 0x00007f958ac84000 to 0x00007f958bc84000
frame caller <image>:function + offset
0 ffffffff82016f80 (+ 32) ffffffff80154c50 <kernel_x86_64> arch_debug_call_with_fault_handler + 0x1a
1 ffffffff82016fd0 (+ 80) ffffffff800b8858 <kernel_x86_64> debug_call_with_fault_handler + 0x78
2 ffffffff82017030 (+ 96) ffffffff800b9f44 <kernel_x86_64> kernel_debugger_loop(char const*, char const*, __va_list_tag*, int) + 0xf4
3 ffffffff82017080 (+ 80) ffffffff800ba2de <kernel_x86_64> kernel_debugger_internal(char const*, char const*, __va_list_tag*, int) + 0x6e
4 ffffffff82017170 (+ 240) ffffffff800ba677 <kernel_x86_64> panic + 0xb7
5 ffffffff820174f8 (+ 904) ffffffff8015652c <kernel_x86_64> int_bottom + 0x80
kernel iframe at 0xffffffff820174f8 (end = 0xffffffff820175c0)
rax 0xdeadbeefdeadbeef rbx 0xffffffff82017970 rcx 0x3
rdx 0xffffffff820175d8 rsi 0x0 rdi 0xffffffffa017b780
rbp 0xffffffff82017820 r8 0xffffffff820176d8 r9 0xffffffff8a771760
r10 0xffffffff820175d8 r11 0x0 r12 0xffffffffa34ddba0
r13 0xffffffffa25e1f50 r14 0xffffffff820175d0 r15 0xffffffff8286ad40
rip 0xffffffff8a7547ed rsp 0xffffffff820175c8 rflags 0x10246
vector: 0xd, error code: 0x0
6 ffffffff82017820 (+ 808) ffffffff8a7547ed </boot/system/add-ons/kernel/file_systems/ramfs> Attribute::GetKey(unsigned char*, unsigned long*) + 0x1d
7 ffffffff82017850 (+ 48) ffffffff8012a001 <kernel_x86_64> AVLTreeBase::Remove(void const*) + 0x41
8 ffffffff82017ab0 (+ 608) ffffffff8a755164 </boot/system/add-ons/kernel/file_systems/ramfs> AttributeIndexImpl::Removed[clone .localalias] (Attribute*) + 0xe4
9 ffffffff82017bf0 (+ 320) ffffffff8a7679f3 </boot/system/add-ons/kernel/file_systems/ramfs> Volume::NodeAttributeRemoved(long, Attribute*) + 0x53
10 ffffffff82017c30 (+ 64) ffffffff8a760ca4 </boot/system/add-ons/kernel/file_systems/ramfs> Node::RemoveAttribute[clone .localalias] (Attribute*) + 0xd4
11 ffffffff82017c50 (+ 32) ffffffff8a760e1b </boot/system/add-ons/kernel/file_systems/ramfs> Node::~Node[clone .localalias] () + 0x3b
12 ffffffff82017c70 (+ 32) ffffffff8a758f79 </boot/system/add-ons/kernel/file_systems/ramfs> File::~File[clone .localalias] () + 0x39
13 ffffffff82017c90 (+ 32) ffffffff8a75a989 </boot/system/add-ons/kernel/file_systems/ramfs> ramfs_remove_vnode(fs_volume*, fs_vnode*, bool) + 0x39
14 ffffffff82017cd0 (+ 64) ffffffff80102734 <kernel_x86_64> free_vnode(vnode*, bool) + 0xb4
15 ffffffff82017d20 (+ 80) ffffffff8010424b <kernel_x86_64> dec_vnode_ref_count[clone .isra.0] (vnode*, bool, bool) + 0x33b
16 ffffffff82017d40 (+ 32) ffffffff8010bf37 <kernel_x86_64> put_vnode + 0x97
17 ffffffff82017d90 (+ 80) ffffffff8a75c92b </boot/system/add-ons/kernel/file_systems/ramfs> ramfs_unlink(fs_volume*, fs_vnode*, char const*) + 0x1bb
18 ffffffff82017ed0 (+ 320) ffffffff8010b160 <kernel_x86_64> common_unlink(int, char*, bool) + 0x60
19 ffffffff82017f20 (+ 80) ffffffff8011211c <kernel_x86_64> _user_unlink + 0x7c
20 ffffffff82017f30 (+ 16) ffffffff8015682f <kernel_x86_64> x86_64_syscall_entry + 0xfb
user iframe at 0xffffffff82017f30 (end = 0xffffffff82017ff8)
rax 0x7f rbx 0x10a4d4e30cd0 rcx 0x172a8a44d5c
rdx 0x0 rsi 0x10a4d4e30dc0 rdi 0x7
rbp 0x7f958bc83650 r8 0x3 r9 0x7f958bc8372c
r10 0x61ff7278 r11 0x246 r12 0x0
r13 0x10a4d4b14300 r14 0x7f958bc83800 r15 0x10a4d4e30cd0
rip 0x172a8a44d5c rsp 0x7f958bc83638 rflags 0x246
vector: 0x63, error code: 0x0
21 00007f958bc83650 (+ 0) 00000172a8a44d5c </boot/system/lib/libroot.so> _kern_unlink + 0x0c
22 00007f958bc83700 (+ 176) 0000012909c64b2d </boot/system/bin/rm> usage (nearest) + 0x44d
23 00007f958bc837e0 (+ 224) 0000012909c65276 </boot/system/bin/rm> rm + 0x106
24 00007f958bc838d0 (+ 240) 0000012909c64457 </boot/system/bin/rm> main + 0x3e7
25 00007f958bc83900 (+ 48) 0000012909c646cf </boot/system/bin/rm> _start + 0x3f
26 00007f958bc83930 (+ 48) 000000622d621e05 </boot/system/runtime_loader> runtime_loader + 0x115
27 0000000000000000 (+ 0) 00007ffd83c02258 2124931:commpage@0x00007ffd83c02000 + 0x258
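Frames 6 and 7 show AVLTreeBase::Remove(void const*) calling Attribute::GetKey. One possible reading, offered only as an illustration, is that a key-based removal first has to locate the entry, and in a search tree that means reading the keys of other entries along the search path. The sketch below uses a toy unbalanced tree to show that effect; `Item`, `TreeNode`, `ToyIndex` and `TouchedWhenFinding` are hypothetical names and none of this is the ramfs or kernel AVL code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an indexed attribute; not the ramfs Attribute class.
struct Item {
    int key;
    std::string name;
};

struct TreeNode {
    explicit TreeNode(Item* i) : item(i) {}
    Item* item;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
};

// Toy unbalanced search tree that stores pointers and reads keys on demand.
class ToyIndex {
public:
    ToyIndex() = default;
    ToyIndex(const ToyIndex&) = delete;
    ToyIndex& operator=(const ToyIndex&) = delete;
    ~ToyIndex() { Free(fRoot); }

    void Insert(Item* item) {
        TreeNode** link = &fRoot;
        while (*link != nullptr) {
            link = item->key < GetKey((*link)->item)
                ? &(*link)->left : &(*link)->right;
        }
        *link = new TreeNode(item);
    }

    // Looks up a key and records the name of every item whose key was read.
    bool Find(int key, std::vector<std::string>& touched) const {
        const TreeNode* node = fRoot;
        while (node != nullptr) {
            touched.push_back(node->item->name);
            int nodeKey = GetKey(node->item);  // dereferences the stored pointer
            if (key == nodeKey)
                return true;
            node = key < nodeKey ? node->left : node->right;
        }
        return false;
    }

private:
    static int GetKey(const Item* item) { return item->key; }

    static void Free(TreeNode* node) {
        if (node == nullptr)
            return;
        Free(node->left);
        Free(node->right);
        delete node;
    }

    TreeNode* fRoot = nullptr;
};

// Builds a four-entry tree and returns the items visited while finding `key`.
std::vector<std::string> TouchedWhenFinding(int key) {
    Item a{10, "a"}, b{20, "b"}, c{30, "c"}, d{25, "d"};
    ToyIndex index;
    index.Insert(&b);
    index.Insert(&a);
    index.Insert(&c);
    index.Insert(&d);
    std::vector<std::string> touched;
    index.Find(key, touched);
    return touched;
}
```

In this toy, looking up `d` reads the keys of `b` and `c` before reaching `d`. If one of those had been freed while still linked into the tree, the fault would occur on that stale entry and not on the one being removed.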
comment:8 by , 45 hours ago
I did have an AssertWriteLocked() in Remove, and we check there if the attribute's index == this. I am not sure which attribute is the one GetKey is being invoked on: this one, or some other in the tree? I also added an ASSERT(fIndex == NULL); inside ~Attribute but this also did not fire.
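The invariant those two checks guard can be modeled in a few lines: an attribute records which index holds it, removal verifies that the back-pointer matches, and an attribute is only safe to destroy once the back-pointer is cleared. This is a minimal sketch under assumed names (`ToyAttribute`, `ToyAttrIndex`, `SafeToDestroy`); it is not the ramfs implementation, and it orders entries by pointer, so it cannot reproduce the key-lookup path seen in the trace.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

class ToyAttrIndex;

// Hypothetical model of an attribute that remembers which index holds it.
struct ToyAttribute {
    int key = 0;
    ToyAttrIndex* index = nullptr;  // plays the role of fIndex

    // True when destroying the attribute would leave no dangling index entry.
    bool SafeToDestroy() const { return index == nullptr; }
};

class ToyAttrIndex {
public:
    void Added(ToyAttribute* attr) {
        assert(attr->index == nullptr);  // must not already be indexed
        fEntries.insert(attr);
        attr->index = this;
    }

    void Removed(ToyAttribute* attr) {
        assert(attr->index == this);     // mirrors the "index == this" check
        fEntries.erase(attr);
        attr->index = nullptr;
    }

    std::size_t CountEntries() const { return fEntries.size(); }

private:
    std::set<ToyAttribute*> fEntries;    // ordered by pointer value, not by key
};
```

Since neither assertion fired in the real code, a model like this suggests the attribute being destroyed was unlinked correctly, which leaves open the possibility that a different entry in the tree is the stale one.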
Looks like a NULL dereference after OOM.