Opened 17 years ago
Closed 17 years ago
#1929 closed bug (fixed)
steal_pages() Livelock
| Reported by: | bonefish | Owned by: | axeld |
|---|---|---|---|
| Priority: | critical | Milestone: | R1/alpha1 |
| Component: | System/Kernel | Version: | R1/pre-alpha1 |
| Keywords: | | Cc: | |
| Blocked By: | | Blocking: | |
| Platform: | All | | |
Description
hrev24417, VMware
Cf. the attached KDL session. While a Perl test that caused heavy CPU load and some memory pressure was running, the system got into a (quasi) livelock: two high-priority threads (the PS/2 mouse watcher and the input server event loop) are in steal_pages(), and only one inactive page is left. That page's cache is currently locked by a lower-priority thread, so steal_page() fails for it.
In each high-priority thread, steal_pages() obtains the page via find_page_candidate(), but steal_page() fails. The thread then calls sFreePageCondition.NotifyOne(), which wakes the other high-priority thread, and waits on sFreePageCondition itself. Since both threads have high priority, they keep waking each other and the lower-priority thread holding the cache lock is starved.
Several factors contribute to this situation. ConditionVariable keeps its waiting threads in a stack rather than a queue, so NotifyOne() always wakes the thread that started waiting most recently. Furthermore, our scheduler is not fair enough: the lower-priority thread shouldn't be starved.
Anyway, I don't quite see the reason for the NotifyOne() call. It is only done when steal_page() fails, in which case other threads won't be any luckier. A snooze() might be more appropriate.
Attachments (1)
Change History (2)
by , 17 years ago
Attachment: | kdl-session-steal-pages-livelock added |
---|---|
comment:1 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
I replaced the NotifyOne() call with snooze(10000), i.e. a 10 ms sleep, in hrev24605. This should at least fix the problem, though there may be better solutions.