#2710 closed bug (fixed)
Deadlock between clone_area(), Kernel Area Operation, I/O, and Page Fault
| Reported by: | bonefish | Owned by: | bonefish |
|---|---|---|---|
| Priority: | high | Milestone: | R1/alpha1 |
| Component: | System/Kernel | Version: | R1/pre-alpha1 |
| Keywords: | address_space detailed | Cc: | |
| Blocked By: | | Blocking: | |
| Platform: | All | | |
Description
When cloning a kernel area into userland (as done e.g. by the message deliverer in the registrar) the following deadlock can occur:
- thread 1: clone_area() read-locks the kernel address space.
- thread 2: Some thread wants to create/delete a kernel area. It blocks trying to write-lock the kernel address space.
- thread 3 (I/O scheduler notifier): Some sub-I/O-request (e.g. from the block cache) goes through the I/O scheduler and is finished. The notifier thread calls the iteration callback, which creates more subrequests and tries to schedule them. lock_memory() is invoked, which blocks on the kernel address space R/W lock.
- thread 4 (team mate of thread 1): Page faults on a mapped file. The page fault handler read-locks the team's address space and tries to read in the page in question. Since the I/O scheduler notifier thread is blocked, this thread blocks too, waiting for the I/O request to finish.
- thread 1: clone_area() tries to write-lock the team's address space and blocks, since thread 4 has it read-locked.
To sum it up:
- thread 1: blocks trying to write-lock a team's address space (read-locked by thread 4)
- thread 2: blocks trying to write-lock the kernel's address space (read-locked by thread 1)
- thread 3: blocks trying to read-lock the kernel's address space (waiting writer thread 2)
- thread 4: waits for I/O (to be finished by thread 3)
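The summary above is a wait-for graph with a single cycle. As a minimal sketch (the names WAITS_FOR and find_cycle are made up for illustration and are not part of the kernel), the four edges can be written down and the cycle found mechanically:

```python
# Wait-for graph from the summary: each thread maps to the thread it waits on.
WAITS_FOR = {
    "thread 1": "thread 4",  # wants team address space write lock, read-locked by thread 4
    "thread 2": "thread 1",  # wants kernel address space write lock, read-locked by thread 1
    "thread 3": "thread 2",  # wants kernel read lock, queued behind waiting writer thread 2
    "thread 4": "thread 3",  # waits for I/O that thread 3 has to finish
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle as a list, or None."""
    path = []
    position = {}  # thread -> index in path
    node = start
    while node is not None:
        if node in position:
            # We came back to a thread already on the path: that is the cycle.
            return path[position[node]:]
        position[node] = len(path)
        path.append(node)
        node = waits_for.get(node)  # None if this thread is not blocked
    return None


cycle = find_cycle(WAITS_FOR, "thread 1")
print(cycle)  # ['thread 1', 'thread 4', 'thread 3', 'thread 2']
```

Removing any one of the four edges leaves a chain that ends in a runnable thread, so the deadlock needs all four conditions at once.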
I've already seen this twice while booting (out of maybe 20 boots). It seems more likely to happen with my soon-to-be-committed optimization to pre-map pages of mapped files.
A solution would be to drop a team's address space lock while handling a page fault. There's already a TODO to that effect in vm_soft_fault(), though it mentions performance reasons only.
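Dropping the lock around the I/O could follow a pattern roughly like the sketch below. This is a toy model under stated assumptions, not vm_soft_fault(): AddressSpace, generation, read_page_from_disk() and handle_page_fault() are hypothetical names, and a plain mutex stands in for the address space R/W lock. The point is only the shape: look up under the lock, unlock, do the I/O, relock and re-validate before mapping.

```python
import threading


class AddressSpace:
    """Toy stand-in for a team's address space (hypothetical, not Haiku's API)."""

    def __init__(self):
        self.lock = threading.Lock()  # simplification: mutex instead of an R/W lock
        self.generation = 0           # assumed to be bumped on every layout change
        self.areas = {"mapped_file": "cache_A"}


def read_page_from_disk(cache):
    # Placeholder for the blocking I/O request.
    return f"page data from {cache}"


def handle_page_fault(space, area_name, max_retries=3):
    for _ in range(max_retries):
        # Step 1: look up the area while holding the lock.
        with space.lock:
            cache = space.areas.get(area_name)
            if cache is None:
                return None  # no such area: a real handler would report a fault
            seen_generation = space.generation

        # Step 2: the lock is NOT held here, so a writer such as
        # clone_area() is not stuck behind this thread's I/O.
        data = read_page_from_disk(cache)

        # Step 3: relock and check that the layout did not change meanwhile.
        with space.lock:
            unchanged = (space.generation == seen_generation
                         and space.areas.get(area_name) == cache)
            if unchanged:
                return data  # a real handler would map the page here
        # Layout changed while unlocked: retry with a fresh lookup.
    return None


space = AddressSpace()
print(handle_page_fault(space, "mapped_file"))
```

The cost of this shape is the re-validation step: anything looked up before unlocking may be stale afterwards, so the handler has to be prepared to retry.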
Attachments (3)
Change History (9)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
Happened a few times in relation to crash #3289.
Also, an easy way to reproduce it:
- Open Firefox and click Restart in Deskbar. You will get either a message that Firefox crashed or one asking you to kill Firefox or cancel the shutdown. We need the latter case.
- Press Cancel shutdown. If Firefox's window hasn't closed yet, close it. You will either get a message that Firefox crashed, or Firefox's window will disappear while Firefox is still listed in Deskbar. We need the latter case.
- Now run threads|grep rwlock and sys|ta 30 to see that a page fault happened and that Firefox is waiting on an address space rwlock.
comment:3 by , 16 years ago
If your deadlock happened when something crashed, it is not related to this ticket. Those deadlocks could happen due to the stack trace the kernel printed on userland crashes. I've disabled those stack traces in hrev30877.
comment:6 by , 7 years ago
Keywords: address_space detailed added
I hope the photo quality isn't too bad.
Team Tracker deadlocked: vm_delete_area and vm_page_fault are both waiting for a rwlock.