Opened 11 years ago

Closed 11 years ago

#1929 closed bug (fixed)

steal_pages() Livelock

Reported by: bonefish
Owned by: axeld
Priority: critical
Milestone: R1/alpha1
Component: System/Kernel
Version: R1/pre-alpha1
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

hrev24417, VMware

Cf. attached KDL session. While running a Perl test that caused heavy CPU load and some memory pressure, the system got into a (quasi) livelock: two high-priority threads (the PS/2 mouse watcher and the input server event loop) are in steal_pages(), and only one inactive page is left. The cache of that page is currently locked by a lower-priority thread, hence steal_page() fails for the page.

In each high-priority thread, steal_pages() gets the page via find_page_candidate(), but steal_page() fails. The thread then calls sFreePageCondition.NotifyOne(), which wakes up the other high-priority thread, and goes on to wait on sFreePageCondition itself. The woken thread repeats exactly the same sequence. Since both threads are high priority and keep waking each other, the thread owning the cache lock is starved and never gets to release it.

Several factors contribute to this situation: ConditionVariable uses a stack rather than a thread queue, so NotifyOne() always wakes up the thread that most recently started waiting. Furthermore, our scheduler is not fair enough; the lower-priority thread shouldn't be starved.
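To illustrate the stack-versus-queue point, here is a toy model of the waiter list. It is not the actual ConditionVariable code: the function `wake_order` and the thread names A, B and C are made up for illustration, and each round simply assumes that the running thread fails to steal the page, notifies one waiter and then starts waiting itself.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Toy model of a condition variable's waiter list (hypothetical, not Haiku code).
// lifo == true models a stack of waiters, lifo == false a FIFO queue.
// Returns the names of the threads woken by NotifyOne(), in order.
std::vector<std::string> wake_order(bool lifo, int rounds) {
    assert(rounds >= 0);
    // C started waiting first, then B; A is currently running.
    std::deque<std::string> waiters = {"C", "B"};
    std::string running = "A";
    std::vector<std::string> woken;
    for (int i = 0; i < rounds; ++i) {
        // The running thread fails to steal the page and notifies one waiter...
        std::string next;
        if (lifo) {
            next = waiters.back();   // stack: most recent waiter
            waiters.pop_back();
        } else {
            next = waiters.front();  // queue: longest waiter
            waiters.pop_front();
        }
        // ...and then starts waiting on the condition itself.
        waiters.push_back(running);
        woken.push_back(next);
        running = next;
    }
    return woken;
}
```

In this model, `wake_order(true, 4)` yields B, A, B, A, so C is never woken, while `wake_order(false, 4)` yields C, B, A, C. Note that a queue alone only makes wake-ups fair among the waiters; the lock holder is not a waiter and would still depend on the scheduler to get CPU time.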

Anyway, I don't quite see the reason for the NotifyOne() call. It is only done when steal_page() fails, in which case other threads won't be any luckier. A snooze() might be more appropriate.
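The difference between the two retry strategies can be sketched with a small tick-based simulation. This is only a model under stated assumptions, not the kernel code: it assumes a strict-priority scheduler, two high-priority threads and one low-priority thread that needs `lock_hold_ticks` ticks of CPU time before it releases the cache lock. All names (`Retry`, `Result`, `simulate`) are hypothetical.

```cpp
#include <cassert>

// What a high-priority thread does after steal_page() fails (hypothetical model).
enum class Retry { NotifyOne, Snooze };

struct Result {
    bool page_stolen;  // did a high-priority thread eventually get the page?
    int low_ticks;     // CPU ticks the low-priority lock holder received
};

Result simulate(Retry strategy, int max_ticks, int lock_hold_ticks,
                int snooze_ticks) {
    assert(max_ticks >= 0 && lock_hold_ticks > 0 && snooze_ticks > 0);
    int low_remaining = lock_hold_ticks;  // work left until the lock is released
    // With NotifyOne, thread 1 starts out waiting on the condition.
    bool blocked[2] = {false, strategy == Retry::NotifyOne};
    int wake_at[2] = {0, 0};  // tick at which a snoozing thread is ready again
    Result result{false, 0};
    for (int tick = 0; tick < max_ticks; ++tick) {
        // Strict priority: any ready high-priority thread runs first.
        int runner = -1;
        for (int i = 0; i < 2 && runner < 0; ++i) {
            if (!blocked[i] && tick >= wake_at[i]) runner = i;
        }
        if (runner < 0) {
            // Only now does the low-priority lock holder get the CPU.
            if (low_remaining > 0) { --low_remaining; ++result.low_ticks; }
            continue;
        }
        if (low_remaining == 0) {  // lock released: stealing succeeds
            result.page_stolen = true;
            return result;
        }
        if (strategy == Retry::NotifyOne) {
            blocked[1 - runner] = false;  // wake the other waiter...
            blocked[runner] = true;       // ...and wait ourselves
        } else {
            wake_at[runner] = tick + 1 + snooze_ticks;  // give up the CPU
        }
    }
    return result;
}
```

With `simulate(Retry::NotifyOne, 1000, 3, 5)` the two high-priority threads keep waking each other and the lock holder gets no ticks at all; with `simulate(Retry::Snooze, 1000, 3, 5)` the lock holder runs while both are asleep and the page is stolen afterwards. In this model the snooze has to be long enough to outlast the other thread's retry, otherwise the two threads alternate and the starvation persists.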

Attachments (1)

kdl-session-steal-pages-livelock (13.2 KB) - added by bonefish 11 years ago.

Change History (2)

Changed 11 years ago by bonefish

comment:1 Changed 11 years ago by bonefish

Resolution: fixed
Status: new → closed

I replaced the NotifyOne() call with a snooze(10000) in hrev24605. This should at least fix the problem, even if there might be better solutions.
