Opened 10 years ago
Closed 10 years ago
#11482 closed bug (fixed)
pthreads: possible race condition leading to deadlock
Reported by: | jessicah | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | System/libroot.so | Version: | R1/Development |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | |
Platform: | All | | |
Description
I'm working on upstreaming Haiku support for Boost, and am running into a reproducible deadlock for the Boost.Interprocess module.
My current work can be found at https://github.com/jessicah/boost
I think git clone --recursive https://github.com/jessicah/boost.git should do the right thing. Otherwise, you'll also need to grab the build, config, predef, thread, filesystem, and interprocess submodules from my GitHub.
Steps to reproduce:
./bootstrap.sh
./b2 --without-mpi --enable-parallel-mark inlining=on threading=multi variant=debug link=static,shared runtime-link=shared --without-python -j<N>
cd libs/interprocess
../../b2 --without-mpi --enable-parallel-mark inlining=on threading=multi variant=debug link=static,shared runtime-link=shared --without-python -j<N> -a -q test
Eventually, several tests end up deadlocked: condition_test, condition_any_test, named_condition_test, and named_condition_any_test.
If I attach Debugger to any of these tests, I can break the deadlock by debugging all currently running threads, then resuming the test thread (the one with the pthread_join call in its stack trace), then resuming the other threads. If I instead resume the other threads first, the deadlock remains.
The named tests sometimes require repeating the process, but will eventually resume.
Attachments (2)
Change History (8)
by , 10 years ago
Attachment: | syslog.txt added |
---|
comment:1 by , 10 years ago
Mm, it might not even be in pthread_join, but in the condition variables themselves. Hitting debug/run in Debugger for each pthread_func thread releases them.
Attached some KDL output in case this might help. I can run further tests if required.
Also, shouldn't LD_PRELOAD=/system/lib/x86/libroot_debug.so <command to run> give me debug symbols in Debugger for functions like pthread_join, etc? Or do I need to rebuild Haiku with extra debugging options enabled?
comment:2 by , 10 years ago
Mh, github with submodules makes it hard to track the changes :(
libroot_debug does not come with debugging information. It's a version of libroot with extra debug checks (guarded memory allocator, etc).
To compile libroot with debug information, you need to add this to build/jam/UserBuildConfig:
SetConfigVar DEBUG : HAIKU_TOP src system libroot : 1 : global ;
Then recompile it. The library is output in generated/objects/haiku/x86_gcc2/debug_1/ (and is also made part of the built Haiku image).
comment:3 by , 10 years ago
The only thing I can see is that all three threads are blocked in thread_block. Since none of them use thread_block_with_timeout, I don't see how any of the threads could make progress, which would explain the deadlock. That's about all the sense I can make of the situation.
The same tests under Linux don't exhibit this behaviour, FYI. The tests there all succeed without issue.
by , 10 years ago
Attachment: | 0001-user-mutex-dequeue-waiters-when-waking-them-up.patch added |
---|
comment:4 by , 10 years ago
patch: | 0 → 1 |
---|
comment:5 by , 10 years ago
This seems to be a missed wake-up problem. In the current user_mutex implementation, the waking thread wakes the waiting one, but leaves it on the queue. The woken-up thread eventually runs and dequeues itself. For condition variables, this means multiple signals can just keep on waking up the same thread, until that thread eventually runs and dequeues itself.
Attached is a patch to address the issue: it dequeues at the waker side instead, and a waiter only dequeues itself if it was interrupted or timed out. This fixes the Boost condvar tests, and the POSIX test suite tests still pass. I would appreciate it if someone else could take a look and tell me if I'm doing something wrong here though :)
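To make the missed wake-up easier to see, here is a small deterministic model of the two strategies. It is only a sketch, not the actual user_mutex code: the Waiter class, the two signal functions, and the run helper are hypothetical names invented for illustration, and real scheduling is replaced by the assumption that all signals arrive before any woken waiter gets to run.

```python
from collections import deque


class Waiter:
    """Hypothetical stand-in for a thread blocked on a condition variable."""

    def __init__(self, name):
        self.name = name
        self.woken = False  # set by the waker; the waiter may not have run yet


def signal_leave_on_queue(queue):
    # Behaviour described in the comment above: wake the head waiter,
    # but leave it on the queue until it runs and dequeues itself.
    if queue:
        queue[0].woken = True


def signal_dequeue_at_waker(queue):
    # Behaviour described for the patch: the waker removes the waiter it wakes.
    if queue:
        queue.popleft().woken = True


def run(signal, n_waiters=3, n_signals=3):
    queue = deque(Waiter(f"t{i}") for i in range(n_waiters))
    waiters = list(queue)
    # Assumption: every signal is issued before any woken waiter is scheduled.
    for _ in range(n_signals):
        signal(queue)
    # Woken waiters now run; in the first scheme they dequeue themselves here.
    for w in waiters:
        if w.woken and w in queue:
            queue.remove(w)
    # Anyone who was never woken stays blocked with no signal left to wake them.
    return [w.name for w in waiters if not w.woken]


stuck_old = run(signal_leave_on_queue)
stuck_new = run(signal_dequeue_at_waker)
print("still blocked (leave on queue):", stuck_old)
print("still blocked (dequeue at waker):", stuck_new)
```

In this model, three signals with the waiter left on the queue all land on the same thread, so the other two never wake up. Dequeuing at the waker side hands each signal to a different waiter.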
comment:6 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Applied by hamishm in hrev49149.
kernel debug output during deadlock