Opened 11 years ago

Closed 10 years ago

Last modified 21 months ago

#2710 closed bug (fixed)

Deadlock between clone_area(), Kernel Area Operation, I/O, and Page Fault

Reported by: bonefish
Owned by: bonefish
Priority: high
Milestone: R1/alpha1
Component: System/Kernel
Version: R1/pre-alpha1
Keywords: address_space detailed
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

hrev27377

When cloning a kernel area into userland (as done e.g. by the message deliverer in the registrar) the following deadlock can occur:

  • thread 1: clone_area() read-locks the kernel address space.
  • thread 2: Some thread wants to create/delete a kernel area. It blocks trying to write-lock the kernel address space.
  • thread 3 (I/O scheduler notifier): Some sub-I/O-request (e.g. from the block cache) goes through the I/O scheduler and is finished. The notifier thread calls the iteration callback, which creates more subrequests and tries to schedule them. lock_memory() is invoked, which blocks on the kernel address space R/W lock.
  • thread 4 (team mate of thread 1): Page faults on a mapped file. The page fault handler read-locks the team's address space and tries to read in the page in question. Since the I/O scheduler notifier thread is blocked, this thread blocks too, waiting for the I/O request to finish.
  • thread 1: clone_area() tries to write-lock the team's address space and blocks, since thread 4 has it read-locked.
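The sequence above depends on a waiting writer blocking later readers, which is why thread 3 stalls even though the kernel address space is only read-locked. Here is a minimal sketch of that behaviour in Python. ToyRWLock and its methods are made up for illustration and are not the kernel's rw_lock API. Requests never block; they only report whether the caller would have been granted the lock.

```python
class ToyRWLock:
    """Hypothetical, non-blocking model of a reader/writer lock in which a
    queued writer keeps later readers out. Each request returns True if the
    lock is granted and False if the caller would have to block."""

    def __init__(self):
        self.readers = set()        # threads currently holding a read lock
        self.writer = None          # thread currently holding the write lock
        self.waiting_writers = []   # writers queued behind the current holders

    def request_read(self, thread):
        # A reader is admitted only if nobody writes AND no writer is queued.
        if self.writer is None and not self.waiting_writers:
            self.readers.add(thread)
            return True
        return False

    def request_write(self, thread):
        # A writer needs the lock entirely to itself.
        if self.writer is None and not self.readers:
            self.writer = thread
            return True
        if thread not in self.waiting_writers:
            self.waiting_writers.append(thread)
        return False


kernel_space = ToyRWLock()   # stands in for the kernel address space lock
team_space = ToyRWLock()     # stands in for the team's address space lock

# Replay the lock requests in the order given in the description.
results = [
    kernel_space.request_read("thread 1"),   # clone_area() read-locks kernel space
    kernel_space.request_write("thread 2"),  # area create/delete wants the write lock
    kernel_space.request_read("thread 3"),   # lock_memory() in the I/O notifier
    team_space.request_read("thread 4"),     # page fault handler
    team_space.request_write("thread 1"),    # clone_area() wants the team space
]
```

In this model only the first and fourth requests are granted. Thread 3's read request fails purely because thread 2 is queued, which is the link that closes the loop.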

To sum it up:

  • thread 1: blocks trying to write-lock a team's address space (read-locked by thread 4)
  • thread 2: blocks trying to write-lock the kernel's address space (read-locked by thread 1)
  • thread 3: blocks trying to read-lock the kernel's address space (waiting writer thread 2)
  • thread 4: waits for I/O (to be finished by thread 3)
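The summary can be read as a wait-for graph with one edge per thread. Here is a small sketch that walks those edges and reports the cycle. The dictionary simply restates the four bullets; find_cycle is a helper written for this illustration, not part of any kernel tooling.

```python
# Wait-for edges taken from the summary: each thread maps to the thread it
# is (directly or indirectly) waiting on.
WAITS_FOR = {
    "thread 1": "thread 4",  # team address space is read-locked by thread 4
    "thread 2": "thread 1",  # kernel address space is read-locked by thread 1
    "thread 3": "thread 2",  # queued writer thread 2 keeps new readers out
    "thread 4": "thread 3",  # I/O request is finished by thread 3
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from start. Return the threads forming a cycle,
    in the order visited, or None if the chain ends without looping."""
    position = {}   # thread -> index in path
    path = []
    current = start
    while current is not None and current not in position:
        position[current] = len(path)
        path.append(current)
        current = waits_for.get(current)
    if current is None:
        return None
    return path[position[current]:]


cycle = find_cycle(WAITS_FOR, "thread 1")

# If thread 4 did not hold the team's address space lock while waiting for
# I/O, thread 1 would have nothing to wait for and the chain would end.
without_edge = {t: w for t, w in WAITS_FOR.items() if t != "thread 1"}
no_cycle = find_cycle(without_edge, "thread 4")
```

Starting from thread 1 the walk visits all four threads and returns to thread 1. Once the thread 1 edge is removed, no cycle remains.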

I've seen this while booting twice already (out of maybe 20 boots). It seems more likely to happen with my soon-to-be-committed optimization to pre-map pages of mapped files.

A solution would be to drop a team's address space lock while handling a page fault. There's already a TODO to that effect in vm_soft_fault(), though it mentions performance reasons only.
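Here is a rough sketch of the general "drop the lock around blocking I/O, then revalidate" pattern, in Python. It is not the code of vm_soft_fault(), and the ticket does not say whether the eventual fix took this form. ToyAddressSpace, handle_fault, the generation counter, and read_page are all hypothetical names, and a plain mutex stands in for the address space R/W lock.

```python
import threading


class ToyAddressSpace:
    """Hypothetical stand-in for an address space guarded by a lock."""

    def __init__(self):
        self.lock = threading.Lock()  # plain mutex standing in for the R/W lock
        self.generation = 0           # assumed to be bumped on every area change
        self.areas = {"mapped_file": "file.dat"}


def handle_fault(space, area_name, read_page):
    """Resolve a fault without holding space.lock across read_page()."""
    while True:
        # Step 1: look up what is needed while holding the lock.
        with space.lock:
            backing = space.areas.get(area_name)
            generation = space.generation
        if backing is None:
            return None  # the area is gone; nothing to map

        # Step 2: potentially blocking I/O, done with the lock released so
        # that a writer waiting on this address space can make progress.
        page = read_page(backing)

        # Step 3: retake the lock and check that nothing changed meanwhile.
        with space.lock:
            if space.generation == generation:
                return page
        # The layout changed while unlocked; retry with fresh state.
```

The cost of this approach is the revalidation step: anything looked up before the lock was dropped may be stale afterwards, so the handler has to be prepared to retry.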

Attachments (3)

a.tar.gz (384.3 KB) - added by Adek336 11 years ago.
b.tar.gz (192.3 KB) - added by Adek336 11 years ago.
c.tar.gz (407.3 KB) - added by Adek336 11 years ago.


Change History (9)

by Adek336, 11 years ago

Attachment: a.tar.gz added

by Adek336, 11 years ago

Attachment: b.tar.gz added

by Adek336, 11 years ago

Attachment: c.tar.gz added

comment:1 by Adek336, 11 years ago

I hope the photo quality isn't too bad.

The Tracker team deadlocked: vm_delete_area and vm_page_fault are both waiting for a rwlock.

comment:2 by Adek336, 11 years ago

Happened a few times in relation to crash #3289.

Also, an easy way to reproduce it: open Firefox, then click Restart in Deskbar. You will get either a message that Firefox crashed or one asking you to either kill Firefox or cancel the shutdown. We need the latter case. Press Cancel shutdown. If Firefox's window hasn't closed yet, close it. You will either get a message that Firefox crashed, or you will see Firefox's window disappear with Firefox still listed in the Deskbar. Again, we need the latter case. You may now run threads|grep rwlock and sys|ta 30 to see that a page fault happened and that Firefox is waiting on an address space rwlock.

comment:3 by bonefish, 10 years ago

If your deadlock happened when something crashed, it is not related to this ticket. Those deadlocks could be caused by the stack trace the kernel printed on userland crashes. I've disabled those stack traces in hrev30877.

comment:4 by bonefish, 10 years ago

Owner: changed from axeld to bonefish
Status: new → assigned

Working on it.

comment:5 by bonefish, 10 years ago

Resolution: fixed
Status: assigned → closed

Fixed in hrev30911.

comment:6 by tqh, 21 months ago

Keywords: address_space detailed added