Attachments (1)
Change History (6)
comment:1 by , 5 years ago
by , 5 years ago
Attachment: | syslog_hrev53346.txt added |
---|
comment:2 by , 5 years ago
Component: | Servers/app_server → System |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Summary: | Random system freeze → userland rw_lock races into a deadlock |
comment:3 by , 5 years ago
So, the cause of the KDL in #15211 was indeed a race between the userland rw_lock's wait() calling _user_block_thread and a separate thread calling _user_unblock_thread, which unblocked it while it was blocked on something else (in all recorded cases in that ticket, the thread mutex), with general mayhem ensuing.
In hrev53345~1, I fixed the KDLs by having _user_unblock_thread actually unblock a thread only when it was blocked by an equivalent _user_block_thread, and otherwise just set the user_thread->wait_status. Before touching that, it of course acquires the thread's lock. Similarly, _user_block_thread acquires the thread's mutex before checking wait_status; it returns immediately if wait_status has already been set, and otherwise it blocks.
As we only unlock the thread inside _user_block_thread after calling thread_prepare_to_block, this should (as I wrote in that commit) prevent the exact race seen here. But apparently it doesn't, so there is somehow a way we are entering thread_block() (or, more particularly, thread_prepare_to_block()) with the wait_status already changed, and so we of course never get a matching thread_unblock().
I've gone over the code multiple times and as far as I can see, all bases should be covered by locking the thread.
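For readers following along, the intended protocol described in this comment can be modeled in plain userland C++. This is only a rough sketch and not the Haiku kernel code: `ModelThread`, `model_block`, `model_unblock`, and the `blocked_by_user` flag are hypothetical names, a `std::condition_variable` stands in for the kernel's thread_prepare_to_block()/thread_block() pair, and the convention that a positive `wait_status` means "still waiting" is an assumption of the model.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the per-thread state; not Haiku's real Thread struct.
struct ModelThread {
    std::mutex lock;               // models the thread's lock
    std::condition_variable wake;  // models the kernel block/unblock machinery
    int wait_status = 1;           // assumption: > 0 means "not yet unblocked"
    bool blocked_by_user = false;  // models "blocked via _user_block_thread"
};

// Models _user_block_thread: check wait_status under the lock and return
// immediately if it was already set; otherwise block.
int model_block(ModelThread& t) {
    std::unique_lock<std::mutex> guard(t.lock);
    if (t.wait_status <= 0)
        return t.wait_status;
    t.blocked_by_user = true;  // models thread_prepare_to_block()
    // The lock is only released inside wait(), i.e. after "preparing to block".
    t.wake.wait(guard, [&t] { return t.wait_status <= 0; });
    t.blocked_by_user = false;
    return t.wait_status;
}

// Models _user_unblock_thread: always record the status, but only wake the
// thread if it is blocked by the matching model_block().
void model_unblock(ModelThread& t, int status) {
    std::lock_guard<std::mutex> guard(t.lock);
    t.wait_status = status;
    if (t.blocked_by_user)
        t.wake.notify_one();
}

// Usage: the waker may run before or after the waiter reaches model_block();
// in both orders the waiter returns with the recorded status.
int demo_unblock_from_other_thread() {
    ModelThread t;
    std::thread waker([&t] { model_unblock(t, 0); });
    int status = model_block(t);
    waker.join();
    assert(status == 0);
    return status;
}
```

In this model the waiter cannot miss a wakeup, because the status check and the transition to "blocked" happen under the same lock. The deadlock reported here implies that the real code has some path where that invariant does not hold, which this sketch does not attempt to reproduce.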
comment:5 by , 5 years ago
Milestone: | Unscheduled → R1/beta2 |
---|
Assign tickets with status=closed and resolution=fixed within the R1/beta2 development window to the R1/beta2 Milestone
Seeing the same with hrev53346, 32bit. Attached is the syslog. The end shows me invoking KDL, but I didn't know what to do there, so I just ran 'reboot'... :)