Ticket #1929 (closed bug: fixed)

Opened 2 months ago

Last modified 2 months ago

steal_pages() Livelock

Reported by: bonefish Assigned to: axeld
Priority: critical Milestone: R1/alpha1
Component: System/Kernel Version: R1 development
Cc: Platform: All

Description

r24417, VMware

See the attached KDL session. While running a Perl test that caused heavy CPU load and some memory pressure, the system got into a (quasi-)livelock: two high-priority threads (the PS/2 mouse watcher and the input server event loop) are in steal_pages(), and only one inactive page is left. That page's cache is currently locked by a lower-priority thread, so steal_page() fails for it.

In each high-priority thread, steal_pages() obtains the page via find_page_candidate(), but steal_page() fails. It then calls sFreePageCondition.NotifyOne(), which wakes up the other high-priority thread, and waits on sFreePageCondition itself. Since both threads have high priority, the lower-priority thread that owns the cache lock is starved.
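The ping-pong can be reproduced in a small deterministic model. This is a hypothetical sketch, not kernel code: it assumes a single CPU, a strict-priority scheduler, two high-priority threads (ids 0 and 1) whose steal attempt always fails, and a low-priority lock holder that only gets the CPU when no high-priority thread is runnable. The function name and the `notifyOnFailure` switch are made up for illustration.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical model of the scenario in this ticket (not Haiku code).
// Two high-priority threads loop in a steal_pages()-like routine; the only
// inactive page's cache is locked by a low-priority thread, which runs only
// when no high-priority thread is runnable.
//
// Returns the step at which the lock holder first gets the CPU, or -1 if it
// is starved for all `maxSteps` steps.
int StepsUntilLockHolderRuns(int maxSteps, bool notifyOnFailure) {
    assert(maxSteps >= 0);
    std::deque<int> runnable = {0, 1};  // runnable high-priority threads
    std::vector<int> waiters;           // condition waiters, used as a stack
    for (int step = 0; step < maxSteps; ++step) {
        if (runnable.empty()) {
            return step;  // no high-priority work left: the lock holder runs
        }
        int self = runnable.front();
        runnable.pop_front();
        // steal_page() fails: the page's cache is still locked.
        if (notifyOnFailure && !waiters.empty()) {
            // NotifyOne() wakes the most recent waiter (stack order).
            runnable.push_back(waiters.back());
            waiters.pop_back();
        }
        // The failing thread then blocks on the condition variable.
        waiters.push_back(self);
    }
    return -1;
}
```

With `notifyOnFailure` set to true, each failing thread wakes the other before blocking, so one high-priority thread is always runnable and the call returns -1. With it set to false, both threads block and the lock holder runs at step 2. The model deliberately ignores everything else the real kernel does (other wakeups, timeouts, time slices).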

Several factors contribute to this situation: ConditionVariable doesn't use a thread queue but a stack, so NotifyOne() always wakes up the thread that most recently started waiting. Furthermore, our scheduler is not fair enough: the lower-priority thread shouldn't be starved.
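The difference between a stack and a queue of waiters only shows up in which thread NotifyOne() picks. The following is a hypothetical wait list (the class and enum are made up, not Haiku's ConditionVariable) that supports both orders, assuming waiters are identified by plain integer thread ids.

```cpp
#include <cassert>
#include <deque>

// Hypothetical wait list illustrating wake-up order. Per the ticket,
// ConditionVariable keeps waiters in a stack (LIFO); a thread queue would
// wake the oldest waiter first (FIFO).
enum class WakeOrder { kLifo, kFifo };

class WaitList {
public:
    explicit WaitList(WakeOrder order) : fOrder(order) {}

    // Registers a thread as waiting.
    void Add(int threadId) { fWaiters.push_back(threadId); }

    bool Empty() const { return fWaiters.empty(); }

    // Removes and returns the thread that NotifyOne() would wake.
    int NotifyOne() {
        assert(!fWaiters.empty());
        int woken;
        if (fOrder == WakeOrder::kLifo) {
            woken = fWaiters.back();   // most recent waiter
            fWaiters.pop_back();
        } else {
            woken = fWaiters.front();  // oldest waiter
            fWaiters.pop_front();
        }
        return woken;
    }

private:
    WakeOrder fOrder;
    std::deque<int> fWaiters;
};
```

After adding threads 1, 2 and 3, a LIFO list wakes 3 first while a FIFO list wakes 1. In the two-thread case from this ticket, a queue alone would still let the two high-priority threads wake each other alternately, since only one of them is ever waiting; the order mainly matters once several threads are waiting.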

Anyway, I don't quite see the reason for the NotifyOne() call. It is only done when steal_page() fails, in which case other threads won't be any luckier. A snooze() might be more appropriate.

Attachments

kdl-session-steal-pages-livelock (13.2 kB) - added by bonefish on 03/16/08 21:31:35.

Change History

03/16/08 21:31:35 changed by bonefish

  • attachment kdl-session-steal-pages-livelock added.

03/27/08 00:44:10 changed by bonefish

  • status changed from new to closed.
  • resolution set to fixed.

I replaced the NotifyOne() call with snooze(10000) (10 ms) in r24605. This should at least fix the problem, even if there might be better solutions.
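The shape of the fix can be sketched as a retry loop that sleeps after a failed steal instead of waking another waiter. This is a hypothetical illustration, not the r24605 code: `StealWithBackoff` and the `tryStealPage` callback are made-up names, and `std::this_thread::sleep_for` stands in for Haiku's snooze(), which takes a duration in microseconds.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical sketch of "back off instead of NotifyOne()". The callback
// `tryStealPage` is a made-up stand-in for steal_page(); it returns false
// when the page's cache is locked by another thread.
//
// Returns the 1-based attempt on which the steal succeeded, or -1 if all
// `maxAttempts` attempts failed.
int StealWithBackoff(const std::function<bool()>& tryStealPage,
                     int maxAttempts, std::chrono::microseconds delay) {
    assert(maxAttempts > 0);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (tryStealPage()) {
            return attempt;
        }
        // Sleeping takes this thread off the CPU for `delay`, giving a
        // lower-priority lock holder a chance to run and release the lock.
        // Waking another stealer would not help: it would fail the same way.
        if (attempt < maxAttempts) {
            std::this_thread::sleep_for(delay);
        }
    }
    return -1;
}
```

Calling it with `std::chrono::microseconds(10000)` corresponds to the snooze(10000) used here. The delay is a trade-off: a shorter one retries sooner but gives the lock holder less CPU time, while a longer one adds latency for the (high-priority) callers.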