Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#15234 closed bug (fixed)

userland rw_lock races into a deadlock

Reported by: X512 Owned by: waddlesplash
Priority: normal Milestone: R1/beta2
Component: System Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

This is hrev53348 32 bit gcc2hybrid.

Tested on VirtualBox and real hardware.

When the system freezes, no KDL or debugger appears; the display does not update, but the cursor still moves. Sound continues playing.

Possibly an app_server locking problem caused by hrev53305.

Attachments (1)

syslog_hrev53346.txt (210.1 KB) - added by humdinger 5 years ago.


Change History (6)

comment:1 by humdinger, 5 years ago

Seeing the same with hrev53346, 32bit. Attached is the syslog. The end shows me invoking KDL, but I didn't know what to do, so I just ran 'reboot'... :)

by humdinger, 5 years ago

Attachment: syslog_hrev53346.txt added

comment:2 by waddlesplash, 5 years ago

Component: Servers/app_server → System
Owner: changed from axeld to waddlesplash
Status: new → assigned
Summary: Random system freeze → userland rw_lock races into a deadlock

comment:3 by waddlesplash, 5 years ago

So, the cause of the KDL in #15211 was indeed a race between the userland rw_lock's wait() calling _user_block_thread and a separate thread calling _user_unblock_thread, which unblocked it while it was waiting on something else (in all recorded cases in that ticket, the thread mutex), with general mayhem ensuing.

In hrev53345~1, I fixed the KDLs by having _user_unblock_thread actually unblock a thread only when it was blocked by an equivalent _user_block_thread; otherwise it just sets user_thread->wait_status. Before touching that, it of course acquires the thread's lock. Similarly, _user_block_thread acquires the thread's mutex before checking wait_status: it returns immediately if wait_status has already been set, and otherwise it blocks.

As we only unlock the thread inside _user_block_thread after calling thread_prepare_to_block, this should (as I wrote in that commit) prevent the exact race seen here. But apparently it doesn't, and so there is somehow a way we are entering thread_block() (or, more particularly, thread_prepare_to_block()) with the wait_status already changed, and so we of course never get thread_unblocked.

I've gone over the code multiple times and as far as I can see, all bases should be covered by locking the thread.
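For readers following along, the intended protocol from the paragraphs above can be sketched roughly as below. This is only an illustrative userland model, not the kernel code: `SketchThread`, `sketch_block_thread`, `sketch_unblock_thread`, and `run_sketch` are hypothetical names, a `std::mutex` plus `std::condition_variable` stand in for the thread's lock and for thread_prepare_to_block()/thread_block(), and the convention that a positive `wait_status` means "still waiting" is an assumption. It shows the ordering that is supposed to close the race; it does not reproduce the deadlock from this ticket.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the per-thread state; not Haiku's real structures.
struct SketchThread {
    std::mutex lock;               // plays the role of "the thread's lock"
    std::condition_variable wake;  // models being blocked in the kernel
    int wait_status = 1;           // assumption: > 0 means "still waiting"
    bool blocked_by_user = false;  // true only while inside sketch_block_thread
};

// Models _user_block_thread: check wait_status under the lock, then block.
int sketch_block_thread(SketchThread& t) {
    std::unique_lock<std::mutex> guard(t.lock);
    assert(!t.blocked_by_user);
    if (t.wait_status <= 0)
        return t.wait_status;  // unblock already happened: return immediately
    t.blocked_by_user = true;
    // wait() releases the lock and sleeps atomically, which mirrors only
    // unlocking the thread after the "prepare to block" step.
    t.wake.wait(guard, [&] { return t.wait_status <= 0; });
    t.blocked_by_user = false;
    return t.wait_status;
}

// Models _user_unblock_thread: always record the status under the lock, but
// only wake the thread if it is blocked by the matching block call.
void sketch_unblock_thread(SketchThread& t, int status) {
    std::lock_guard<std::mutex> guard(t.lock);
    t.wait_status = status;
    if (t.blocked_by_user)
        t.wake.notify_one();
}

// Runs one block/unblock pair and returns the status seen by the blocker.
int run_sketch(bool unblock_first) {
    SketchThread t;
    if (unblock_first) {
        sketch_unblock_thread(t, 0);
        return sketch_block_thread(t);
    }
    int result = 1;
    std::thread waiter([&] { result = sketch_block_thread(t); });
    sketch_unblock_thread(t, 0);  // may run before or after the waiter blocks
    waiter.join();
    return result;
}
```

In this model both orderings end with the blocker seeing the status passed to the unblock call, because the check of `wait_status` and the transition to sleeping happen under the same lock. The open question in this comment is how the real code can still reach the blocking step with `wait_status` already changed.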

comment:4 by waddlesplash, 5 years ago

Resolution: fixed
Status: assigned → closed

Fixed in hrev53353.

comment:5 by nielx, 4 years ago

Milestone: Unscheduled → R1/beta2

Assign tickets with status=closed and resolution=fixed within the R1/beta2 development window to the R1/beta2 Milestone
