#6400 closed bug (fixed)
BLocker won't unlock from a different thread
Reported by: | darkwyrm | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | Kits/Support Kit | Version: | R1/alpha2 |
Keywords: | Cc: | ||
Blocked By: | Blocking: | #11655, #11860, #11984 | |
Platform: | All |
Description
Once again my Paladin development efforts have uncovered another system-level bug.
It seems that if a BLocker is locked from one thread and unlocked from another, the Unlock fails. I've written a quick test app which locks a BLocker from within the main window's constructor and unlocks it in QuitRequested. The BLocker's status is then tested in the App class' destructor. When built and run under Zeta, the lock is unlocked. Under Haiku, it is still locked. I am attaching the test program momentarily.
Attachments (2)
Change History (27)
by , 14 years ago
Attachment: | HaikuThreadBugTest.zip added |
---|
comment:1 by , 14 years ago
That's actually very bad practice generally, usually the thread acquiring a lock should also be the one that releases it when done, mix and matching is almost invariably asking for trouble. BLocker guards against that situation intentionally for that reason, c.f. http://dev.haiku-os.org/browser/haiku/trunk/src/kits/support/Locker.cpp#L152
comment:2 by , 14 years ago
Component: | System/Kernel → Kits/Support Kit |
---|
comment:3 by , 14 years ago
That having been said, http://www.haiku-os.org/legacy-docs/bebook/BLocker.html#BLocker_Lock seems to indicate that Be considered that allowed behavior so we may have to change that for compat reasons.
comment:4 by , 10 years ago
Completely ignoring the unlock without even a warning is not really helpful either. I'm changing that to a debugger() call so we can identify apps trying to rely on this. Then we can make a decision on allowing it or not, depending on the issues we find (how many apps are affected, can they be fixed, etc). We could also turn this into a warning message on the console or syslogs (like the "resizing mode and flags swapped" in BWindow).
In hrev48542 I added a debugger() call in that case. Let's see if we can trigger it with some apps and make a decision from that.
comment:6 by , 10 years ago
Homeworld_SDL crashing at start, maybe hrev48542 made this:
Active Threads: thread 2620: SDL thread 2625: SDL thread 2626: _BMediaRoster_ thread 2627: SDL Audio control thread 2628: SDL thread 2629: w>Untitled thread 2630: team 2619 debug task thread 2619: homeworld_sdl (main) state: Call (Trying to unlock from the wrong thread (#6400))
Whole crashlog here: http://vault.extrowerk.com/homeworld-sdl-haiku/issue/1/crash-at-startup
comment:7 by , 10 years ago
Yes, that's the debugger call that Adrien added in that commit. Does Homeworld use more than one thread? If not, then this is a bug in our SDL port (it may also be a be a bug in our SDL port anyway that we don't handle the application calling SDL from different threads...)
follow-up: 10 comment:8 by , 10 years ago
@waddlesplash: if you had read the linked bugreport on homeworld side, you would have noticed that Philippe Houdoin already tracked where the problem is. The issue is in Mesa (the legacy version).
It uses a BLock in an unusual way, which worked on BeOS but resulted on the lock being always held in Haiku (since before I adde dthis check, the Unlock call would simply have been ignored). In the Mesa code there is a mention of adding this specifically to get Quake 2 working. So now we have to find/build a Quake 2 port and make sure it doesn't need this anymore. Or, we can comment the debugger() call or convert it to an stderr warning.
comment:9 by , 10 years ago
Ah, OK.
I recall @kallisti5 saying he'd gotten a Quake (2?) port working just this past weekend I think, so we might want to contact him to see if that was GCC2 or GCC4. And what kind of patches it needs.
comment:10 by , 10 years ago
Replying to pulkomandy:
@waddlesplash: if you had read the linked bugreport on homeworld side, you would have noticed that Philippe Houdoin already tracked where the problem is. The issue is in Mesa (the legacy version).
Homeworld depend on mesa_x86_swrast. Is it the legacy version? (I thought, the gcc2 version the legacy, or am I wrong?)
comment:11 by , 10 years ago
Sorry, I'm wrong here. It is indeed in the gcc4 version, which means we don't need to care about BeOS compatibility. So the hack in Mesa needs to be removed (and Quake 2 fixed or ported again using an SDL version).
comment:12 by , 10 years ago
Blocking: | 11860 added |
---|
(In #11860) It's actually a problem with the use of BLocker in our mesa port, which uses a quirk that worked in BeOS and never worked in Haiku (previously it was ignored, now it's entering debugger, see #6400).
The crash is continuable: click the "debug" button, then quit debugger and click "resume". The app should then run (unless there are other bugs). Anything using SDL and OpenGL will hit this problem (some other games like f1spirit, for example).
comment:13 by , 10 years ago
Can we delete the problematic section from our Mesa port? If no, why not? It seems, that it makes more trouble as good.
http://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/haiku/swrast/SoftwareRast.cpp#n180
// some stupid applications (Quake2) don't even think about calling LockGL() // before using glGetString and its glGet*() friends... // so make sure there is at least a valid context. if (!_mesa_get_current_context()) { LockGL(); // not needed, we don't have a looper yet: UnlockLooper(); }
comment:14 by , 10 years ago
I think removing this should be ok (it's a no-op anyway in Haiku) but I'd prefer if someone more knowledgeable about Mesa would have a look; and if we could make sure there is still a way to get the Quake 2 engine to run (not necessarily the original one, but a "modern" open source version).
comment:16 by , 10 years ago
Yes, all x86 OpenGL apps will have this problem in the current situation.
comment:17 by , 10 years ago
Blocking: | 11984 added |
---|
comment:18 by , 10 years ago
comment:19 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
We are now behaving the same as BeOS (with an extra warning message), which fixes the initial problem, but that makes some other apps (such as OpenGL) unhappy.
So yes, let's say this is fixed, OpenGL problems are tracked in #11860.
comment:20 by , 3 years ago
I just got this error running qemu on the latest hrev55760:
Unlocking BLocker with sem 1526838 from wrong thread 233694, current holder 233691 (see issue #6400).
I feel like the message might need updated :-)
comment:21 by , 2 months ago
Is this bug meant to be solved? I can produce it reliably on my Beta 5 x86_64 install, today I built a Haiku version of ioQuake3 and it spams the terminal with the message described here, then it closes its window and hangs.
In order to run the application the base Quake 3 (or OpenArena) files have to be put in "/boot/home/config/settings/ioquake3/baseq3"
Link to hpkg and recipe: https://www.dropbox.com/scl/fi/fcx46zauur8nt2w97wdhb/ioquake3-1.36-git-x86_64.zip?rlkey=7h6yhrdn5lxhzujxhsllgmo3n&st=uyw7rpt8&dl=0
by , 8 weeks ago
Attachment: | ioquake3_debuginfo-1.36~git-1-x86_64.hpkg added |
---|
Debug package for ioquake3, as I forgot to add it before
comment:22 by , 8 weeks ago
This bug is solved.
BeOS allows a BLocker to be locked in one thread and then unlocked in another. This makes the API too easy to use incorrectly, because anyone can call Unlock() and force the locker to be unlocked.
As much as we would like to fix this, it would break binary compatibility, and so we have settled on a warning message telling developers that they are likely misusing the API.
If you need something that can be passed from one thread to another, use a different locking primitive where this can be done explicitly (maybe a semaphore, for example). Otherwise, use a pthread mutex.
If you get this error, it does not mean that Haiku has a bug, but that your code has a bug and is not using BLocker in the expected way for Haiku (even if it may have worked on BeOS). In this case, "your" code may be Mesa/OpenGL, but that is still not in Haiku codebase.
comment:23 by , 8 weeks ago
That's fair and all but then it'd be best to remove any reference to this ticket in the error message, because people will inevitably post here if/when it happens.
comment:24 by , 8 weeks ago
They can read the history of the ticket to see the explanation. There is currently no better place to link to to explain this.
So if we want to change the link, we should first update https://www.haiku-os.org/docs/api/classBLocker.html to include a note about this (I would say in the documentation of the Unlock method, mentionning that unlike BeOS, unlocking should be done only by the thread holding the lock).
Test app