Opened 6 years ago

Closed 2 years ago

#9839 closed bug (fixed)

VFS: race condition between get_vnode() and new_vnode()/publish_vnode()

Reported by: bonefish Owned by: axeld
Priority: normal Milestone: R1
Component: System/Kernel Version: R1/Development
Keywords: vfs Cc:
Blocked By: Blocking: #13390
Has a Patch: no Platform: All

Description (last modified by bonefish)

There is a race condition between VFS's get_vnode() (e.g. in response to _user_open_dir_entry_ref()) and new_vnode() or publish_vnode(): Since get_vnode() first calls create_new_vnode_and_lock() and afterwards the FS's get_vnode() hook, there's a time window where the vnode exists, but the respective node may not exist in the FS. If the node doesn't actually exist and concurrently another thread creates a new file or directory with that node ID, the new_vnode() or publish_vnode() it calls will encounter the existing vnode and fail (the former will panic()).

Not sure what a good solution would be. new_vnode()/publish_vnode() could wait until the vnode becomes unbusy or disappears (a new marker flag might be needed to distinguish the case from others where the node is busy). That might be problematic, however, since the caller may hold a FS specific lock, so that we'd risk a deadlock with the concurrent get_vnode() hook which may also want to acquire that lock.

An IMO cleaner alternative would be to change the interface and semantics of the get_vnode() hook. It would be required to create the vnode (via publish_vnode() or a new interface). We would have to push the burden of dealing with concurrent invocations to the hook, though.

This issue is somewhat related to #5262 in that there the problem is also that create_new_vnode_and_lock() reports a pre-existing vnode, although in that case the vnode is about to be deleted and already no longer known to the FS (cf. ticket:5262#comment:24).

Change History (6)

comment:1 Changed 6 years ago by bonefish

Description: modified (diff)

comment:2 Changed 6 years ago by kallisti5

Backtrace of situation

KERN: PANIC: vnode 3:2110854 already exists (node = 0xff3dfe50, vnode->node = 0x00000000)!
KERN: Welcome to Kernel Debugging Land...
KERN: Thread 49860 "gcc" running on CPU 3
KERN: stack trace for thread 49860 "gcc"
KERN:     kernel stack: 0x823a5000 to 0x823a9000
KERN:       user stack: 0x71fc9000 to 0x72fc9000
KERN: frame               caller     <image>:function + offset
KERN:  0 823a8988 (+  32) 801368de   <kernel_x86> arch_debug_stack_trace + 0x12
KERN:  1 823a89a8 (+  16) 80094ea7   <kernel_x86> stack_trace_trampoline(NULL) + 0x0b
KERN:  2 823a89b8 (+  12) 80128e72   <kernel_x86> arch_debug_call_with_fault_handler + 0x1b
KERN:  3 823a89c4 (+  48) 8009694a   <kernel_x86> debug_call_with_fault_handler + 0x5e
KERN:  4 823a89f4 (+  64) 800950c7   <kernel_x86> kernel_debugger_loop(0x80176057 "PANIC: ", 0x801839e0 "vnode %ld:%lld already exists (node = %p, vnode->node = %p)!", 0x823a8aa0 "", int32: 3) + 0x21b
KERN:  5 823a8a34 (+  48) 8009542b   <kernel_x86> kernel_debugger_internal(0x80176057 "PANIC: ", 0x801839e0 "vnode %ld:%lld already exists (node = %p, vnode->node = %p)!", 0x823a8aa0 "", int32: 3) + 0x53
KERN:  6 823a8a64 (+  48) 80096cd6   <kernel_x86> panic + 0x36
KERN:  7 823a8a94 (+  80) 800e0166   <kernel_x86> new_vnode + 0x7e
KERN:  8 823a8ae4 (+  80) 81c12003   <bfs> InodeAllocator<0x823a8bcc>::New(block_run*: 0x823a8b98, uint32: 0x8180 (33152), block_run&: 0x823a8ba0, fs_vnode_ops*: NULL, Inode*: 0x823a8ba8) + 0x173
KERN:  9 823a8b34 (+ 176) 81c19325   <bfs> Inode<0x823a8c44>::Create(Transaction&: 0xe0df7218, Inode*: 0x823a8df4, 0x8180 "<???>", int32: 770, int32: 0, uint32: 0x823a8c3f, 0x823a8ca0, 0x823a8c40, Inode*: NULL, fs_vnode_ops*: NULL, uint32: 0x0 (0)) + 0x421
KERN: 10 823a8be4 (+ 112) 81c23326   <bfs> bfs_create(fs_volume*: 0x82b7e870, fs_vnode*: 0xe0dcabf0, 0x823a8df4 "ccWjQ9qR.ii", int32: 770, int32: 384, 0x823a8c9c, 0x823a8ca0) + 0x13a
KERN: 11 823a8c54 (+ 352) 800e32a8   <kernel_x86> create_vnode(vnode*: 0xe0dcabf0, 0x823a8df4 "ccWjQ9qR.ii", int32: 770, int32: 384, false) + 0x1b8
KERN: 12 823a8db4 (+ 320) 800e377e   <kernel_x86> file_create(int32: -1, 0xf43d4680 "/tmp", int32: 770, int32: 384, false) + 0x5a
KERN: 13 823a8ef4 (+  80) 800e98a0   <kernel_x86> _user_open + 0xc0
KERN: 14 823a8f44 (+ 100) 8012bb10   <kernel_x86> handle_syscall + 0xcd
KERN: user iframe at 0x823a8fa8 (end = 0x823a9000)
KERN:  eax 0x67          ebx 0xaa6530       ecx 0x72fc85e4  edx 0x60f9b114
KERN:  esi 0xa7a         edi 0x0            ebp 0x72fc8610  esp 0x823a8fdc
KERN:  eip 0x60f9b114 eflags 0x3202    user esp 0x72fc85e4
KERN:  vector: 0x63, error code: 0x0
KERN: 15 823a8fa8 (+   0) 60f9b114   <commpage> commpage_syscall + 0x04
KERN: 16 72fc8610 (+  80) 021be40f   <gcc> mkstemps + 0x207
KERN: 17 72fc8660 (+  64) 021bdf74   <gcc> make_temp_file + 0x1d4
KERN: 18 72fc86a0 (+ 128) 021b90dd   <gcc> do_spec (nearest) + 0xf8d
KERN: 19 72fc8720 (+  96) 021ba324   <gcc> do_spec (nearest) + 0x21d4
KERN: 20 72fc8780 (+ 128) 021b9a4c   <gcc> do_spec (nearest) + 0x18fc
KERN: 21 72fc8800 (+  96) 021ba324   <gcc> do_spec (nearest) + 0x21d4
KERN: 22 72fc8860 (+ 128) 021b9a4c   <gcc> do_spec (nearest) + 0x18fc
KERN: 23 72fc88e0 (+  96) 021ba324   <gcc> do_spec (nearest) + 0x21d4
KERN: 24 72fc8940 (+ 128) 021b9a4c   <gcc> do_spec (nearest) + 0x18fc
KERN: 25 72fc89c0 (+  96) 021ba324   <gcc> do_spec (nearest) + 0x21d4
KERN: 26 72fc8a20 (+ 128) 021b9a4c   <gcc> do_spec (nearest) + 0x18fc
KERN: 27 72fc8aa0 (+  48) 021b81ac   <gcc> do_spec + 0x5c
KERN: 28 72fc8ad0 (+ 352) 021bc1c4   <gcc> main + 0x17f4
KERN: 29 72fc8c30 (+  48) 021b31db   <gcc> _start + 0x5b
KERN: 30 72fc8c60 (+  48) 00f57d82   </boot/system/runtime_loader@0x00f48000> <unknown> + 0xfd82
KERN: 31 72fc8c90 (+   0) 60f9b250   <commpage> commpage_thread_exit + 0x00
KERN: kdebugcontinueKERN: bfs: inode at 2110854 could not be read: Bad data!
KERN: bfs: New:220: General system error
KERN: bfs: InitCheck:323: Bad data
KERN: bfs: inode at 2110854 is already deleted!
KERN: bfs: bfs_lookup:581: Operation not allowed
ad_exit + 0x00
KERN: kdebugcontinueKERN: bfs: inode at 2110854 could not be read: Bad data!
KERN: bfs: New:220: General system error
KERN: bfs: InitCheck:323: Bad data
KERN: bfs: inode at 2110854 is already deleted!
KERN: bfs: bfs_lookup:581: Operation not allowed
data sourcetext/html}}}

comment:3 Changed 2 years ago by axeld

Blocking: 13390 added

(In #13390) Duplicate of #9839. I hope to be able to look into that one in the next few days, though.

comment:4 Changed 2 years ago by axeld

Status: newin-progress

comment:5 Changed 2 years ago by axeld

Please try again with hrev51061.

comment:6 Changed 2 years ago by axeld

Resolution: fixed
Status: in-progressclosed
Note: See TracTickets for help on using tickets.