Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#9715 closed bug (fixed)

bfs crash while running checkfs

Reported by: bonefish Owned by: axeld
Priority: normal Milestone: R1
Component: File Systems/BFS Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

vm_soft_fault: va 0x0 not covered by area in address space
vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x8, ip 0x8203ee91, write 1, user 0, thread 0x2d7
PANIC: vm_page_fault: unhandled page fault in kernel space at 0x8, ip 0x8203ee91

Welcome to Kernel Debugging Land...
Thread 727 "checkfs" running on CPU 4
stack trace for thread 727 "checkfs"
    kernel stack: 0xcce54000 to 0xcce58000
      user stack: 0x62737000 to 0x63737000
frame               caller     <image>:function + offset
 0 cce57738 (+  32) 801328d2   <kernel_x86> arch_debug_stack_trace + 0x12
 1 cce57758 (+  16) 80092c03   <kernel_x86> stack_trace_trampoline(NULL) + 0x0b
 2 cce57768 (+  12) 80125372   <kernel_x86> arch_debug_call_with_fault_handler + 0x1b
 3 cce57774 (+  48) 800946a6   <kernel_x86> debug_call_with_fault_handler + 0x5e
 4 cce577a4 (+  64) 80092e23   <kernel_x86> kernel_debugger_loop(0x801716f7 "PANIC: ", 0x80187560 "vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx
", 0xcce57850 ", int32: 4) + 0x21b
 5 cce577e4 (+  48) 80093187   <kernel_x86> kernel_debugger_internal(0x801716f7 "PANIC: ", 0x80187560 "vm_page_fault: unhandled page fault in kernel space at 0x%lx, ip 0x%lx
", 0xcce57850 ", int32: 4) + 0x53
 6 cce57814 (+  48) 80094a32   <kernel_x86> panic + 0x36
 7 cce57844 (+ 144) 80109e95   <kernel_x86> vm_page_fault + 0x145
 8 cce578d4 (+  80) 8013410f   <kernel_x86> x86_page_fault_exception + 0x18b
 9 cce57924 (+  12) 80127e20   <kernel_x86> int_bottom + 0x30
kernel iframe at 0xcce57930 (end = 0xcce57980)
 eax 0x8204fe80    ebx 0x8204fb34     ecx 0x0         edx 0x0
 esi 0xcc7ff048    edi 0x0            ebp 0xcce57984  esp 0xcce57964
 eip 0x8203ee91 eflags 0x10282   
 vector: 0xe, error code: 0x2
10 cce57930 (+  84) 8203ee91   <bfs> __19TransactionListener + 0x19
11 cce57984 (+  48) 8202b764   <bfs> __9BPlusTreeP5Inode + 0x24
12 cce579b4 (+  48) 82035059   <bfs> __5InodeP6Volumex + 0xf5
13 cce579e4 (+  96) 820443f8   <bfs> bfs_get_vnode(fs_volume*: 0xcd5bdd20, int64: 4246495, fs_vnode*: 0xcc7fb4e0, 0xcce57a8c, 0xcce57a90, true) + 0x208
14 cce57a44 (+  96) 800d88de   <kernel_x86> get_vnode(int32: 4, int64: 4246495, vnode*: 0xcce57af0, true, int32: 1) + 0x356
15 cce57aa4 (+  80) 800dd4e7   <kernel_x86> get_vnode + 0x3f
16 cce57af4 (+  48) 8202aad1   <bfs> Vnode<0xcce57cc4>::SetTo(Volume*: 0x8322e100, int64: 4246495) + 0x5d
17 cce57b24 (+ 432) 820285f6   <bfs> BlockAllocator<0x8322e318>::CheckNextNode(check_control*: 0xcce57d10) + 0x626
18 cce57cd4 (+ 448) 82045272   <bfs> bfs_ioctl(fs_volume*: 0xcd5bdd20, fs_vnode*: 0xd33c5ce8, 0xcfad8410, uint32: 0x377b (14203), 0x6373655c, uint32: 0x184 (388)) + 0x116
19 cce57e94 (+  64) 800e13b4   <kernel_x86> common_ioctl(file_descriptor*: 0xd3770c20, uint32: 0x377b (14203), 0x6373655c, uint32: 0x184 (388)) + 0x38
20 cce57ed4 (+  48) 800cbf97   <kernel_x86> fd_ioctl(false, int32: 5, uint32: 0x377b (14203), 0x6373655c, uint32: 0x184 (388)) + 0x5b
21 cce57f04 (+  64) 800ccd7c   <kernel_x86> _user_ioctl + 0x58
22 cce57f44 (+ 100) 80128010   <kernel_x86> handle_syscall + 0xcd
user iframe at 0xcce57fa8 (end = 0xcce58000)
 eax 0x8e          ebx 0x152f048      ecx 0x63736484  edx 0x60f7e114
 esi 0x5           edi 0x0            ebp 0x637364b0  esp 0xcce57fdc
 eip 0x60f7e114 eflags 0x3202    user esp 0x63736484
 vector: 0x63, error code: 0x0
23 cce57fa8 (+   0) 60f7e114   <commpage> commpage_syscall + 0x04
24 637364b0 (+ 720) 02b7be97   <bfs> BFSPartitionHandle<0x2d79030>::Repair(false) + 0x453
25 63736780 (+  48) 00999b09   <libbe.so> BPartition::Delegate<0x2d79510>::Repair(false) + 0x39
26 637367b0 (+  48) 00997fdc   <libbe.so> BPartition<0x2d9dca8>::Repair(BPartition: 0x63730000, true) + 0x30
27 637367e0 (+ 192) 0157c57e   <_APP_> main + 0x3ee
28 637368a0 (+  48) 0157c037   <_APP_> _start + 0x5b
29 637368d0 (+  48) 0124c506   </boot/system/runtime_loader@0x0123d000> <unknown> + 0xf506
30 63736900 (+   0) 60f7e250   <commpage> commpage_thread_exit + 0x00

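The culprit seems to be the new operator involved here: the trace faults inside the TransactionListener constructor on a write to address 0x8. Apparently the non-nothrow new, once hacked so that it no longer throws, simply calls the constructor on a NULL pointer when memory runs out.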

This is gcc 2 at hrev45561.
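The failure mode described above can be illustrated with a minimal sketch in modern C++. It is not the BFS or kernel_cpp.h code: `Listener`, `simulateOutOfMemory`, `create_listener` and `demo` are hypothetical names made up for this illustration, and the sketch assumes a C++11 compiler rather than gcc 2. The point it shows is that an allocation function declared non-throwing makes the compiler test the result for NULL before running the constructor, whereas a plain new that has merely been patched to return NULL gives the compiler no reason to insert that test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a kernel object; not the real TransactionListener.
struct Listener {
    static bool simulateOutOfMemory;  // test knob, assumption for this sketch
    static int constructed;           // counts constructor runs
    int fValue;

    Listener() : fValue(42) { ++constructed; }

    // Non-throwing allocation function. Because it is declared noexcept,
    // the new-expression checks the result and skips the constructor
    // when NULL is returned.
    static void* operator new(std::size_t size, const std::nothrow_t&) noexcept
    {
        if (simulateOutOfMemory)
            return nullptr;
        return std::malloc(size);
    }

    // Used by an ordinary delete-expression.
    static void operator delete(void* pointer) noexcept
    {
        std::free(pointer);
    }

    // Matching placement delete, called if the constructor throws.
    static void operator delete(void* pointer, const std::nothrow_t&) noexcept
    {
        std::free(pointer);
    }
};

bool Listener::simulateOutOfMemory = false;
int Listener::constructed = 0;

// Allocation that can fail without touching a NULL object.
Listener* create_listener()
{
    return new(std::nothrow) Listener;
}

// Walks through the out-of-memory case and the normal case.
void demo()
{
    Listener::simulateOutOfMemory = true;
    Listener* failed = create_listener();
    assert(failed == nullptr);  // constructor was never run

    Listener::simulateOutOfMemory = false;
    Listener* listener = create_listener();
    assert(listener != nullptr);
    delete listener;
}
```

Callers still have to handle the NULL result themselves, typically by returning an out-of-memory error, which is the trade-off of not using exceptions in kernel code.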

Change History (4)

comment:1 by anevilyak, 6 years ago

Is there any possibility this was a side effect of #9714 ?

comment:2 by bonefish, 6 years ago

I haven't checked the disassembly to verify, but as written in the description, I think the cause is simply the code gcc 2 generates for the non-nothrow new we use in the kernel. The kernel really ran out of heap (due to running out of address space) in my test.

comment:3 by axeld, 6 years ago

Resolution: fixed
Status: new → closed

I've removed the dependency on kernel_cpp.h, along with all remaining new operators of this kind, in hrev45634. I've noticed that there are other users of this header, though, and some even copy it. Should we create a new ticket to track its removal, or use this one?

comment:4 by bonefish, 6 years ago (in reply to comment:3)

Replying to axeld:

I've noticed that there are other users of this header, though, and some even copy it. Should we create a new ticket to track its removal, or use this one?

Since you don't seem to have come to a decision yet: a new ticket would be nice. :-)
