Opened 14 years ago

Closed 10 years ago

Last modified 3 months ago

#6400 closed bug (fixed)

BLocker won't unlock from a different thread

Reported by: darkwyrm Owned by: axeld
Priority: normal Milestone: R1
Component: Kits/Support Kit Version: R1/alpha2
Keywords: Cc:
Blocked By: Blocking: #11655, #11860, #11984
Platform: All

Description

Once again my Paladin development efforts have uncovered another system-level bug.

It seems that if a BLocker is locked from one thread and unlocked from another, the Unlock fails. I've written a quick test app which locks a BLocker from within the main window's constructor and unlocks it in QuitRequested. The BLocker's status is then tested in the App class' destructor. When built and run under Zeta, the lock is unlocked. Under Haiku, it is still locked. I am attaching the test program momentarily.

Attachments (2)

HaikuThreadBugTest.zip (4.0 KB ) - added by darkwyrm 14 years ago.
Test app
ioquake3_debuginfo-1.36~git-1-x86_64.hpkg (1.6 KB ) - added by MichaelPeppers 3 months ago.
Debug package for ioquake3, as I forgot to add it before

Download all attachments as: .zip

Change History (27)

by darkwyrm, 14 years ago

Attachment: HaikuThreadBugTest.zip added

Test app

comment:1 by anevilyak, 14 years ago

That's actually very bad practice generally, usually the thread acquiring a lock should also be the one that releases it when done, mix and matching is almost invariably asking for trouble. BLocker guards against that situation intentionally for that reason, c.f. http://dev.haiku-os.org/browser/haiku/trunk/src/kits/support/Locker.cpp#L152

comment:2 by anevilyak, 14 years ago

Component: System/KernelKits/Support Kit

comment:3 by anevilyak, 14 years ago

That having been said, http://www.haiku-os.org/legacy-docs/bebook/BLocker.html#BLocker_Lock seems to indicate that Be considered that allowed behavior so we may have to change that for compat reasons.

comment:4 by pulkomandy, 10 years ago

Completely ignoring the unlock without even a warning is not really helpful either. I'm changing that to a debugger() call so we can identify apps trying to rely on this. Then we can make a decision on allowing it or not, depending on the issues we find (how many apps are affected, can they be fixed, etc). We could also turn this into a warning message on the console or syslogs (like the "resizing mode and flags swapped" in BWindow).

In hrev48542 I added a debugger() call in that case. Let's see if we can trigger it with some apps and make a decision from that.

comment:5 by anevilyak, 10 years ago

Blocking: 11655 added

(In #11655) Duplicate of #6400.

comment:6 by miqlas, 10 years ago

Homeworld_SDL crashing at start, maybe hrev48542 made this:

Active Threads:
	thread 2620: SDL 
	thread 2625: SDL 
	thread 2626: _BMediaRoster_ 
	thread 2627: SDL Audio control 
	thread 2628: SDL 
	thread 2629: w>Untitled 
	thread 2630: team 2619 debug task 
	thread 2619: homeworld_sdl (main)
		state: Call (Trying to unlock from the wrong thread (#6400))

Whole crashlog here: http://vault.extrowerk.com/homeworld-sdl-haiku/issue/1/crash-at-startup

comment:7 by waddlesplash, 10 years ago

Yes, that's the debugger call that Adrien added in that commit. Does Homeworld use more than one thread? If not, then this is a bug in our SDL port (it may also be a be a bug in our SDL port anyway that we don't handle the application calling SDL from different threads...)

comment:8 by pulkomandy, 10 years ago

@waddlesplash: if you had read the linked bugreport on homeworld side, you would have noticed that Philippe Houdoin already tracked where the problem is. The issue is in Mesa (the legacy version).

It uses a BLock in an unusual way, which worked on BeOS but resulted on the lock being always held in Haiku (since before I adde dthis check, the Unlock call would simply have been ignored). In the Mesa code there is a mention of adding this specifically to get Quake 2 working. So now we have to find/build a Quake 2 port and make sure it doesn't need this anymore. Or, we can comment the debugger() call or convert it to an stderr warning.

comment:9 by waddlesplash, 10 years ago

Ah, OK.

I recall @kallisti5 saying he'd gotten a Quake (2?) port working just this past weekend I think, so we might want to contact him to see if that was GCC2 or GCC4. And what kind of patches it needs.

in reply to:  8 comment:10 by miqlas, 10 years ago

Replying to pulkomandy:

@waddlesplash: if you had read the linked bugreport on homeworld side, you would have noticed that Philippe Houdoin already tracked where the problem is. The issue is in Mesa (the legacy version).

Homeworld depend on mesa_x86_swrast. Is it the legacy version? (I thought, the gcc2 version the legacy, or am I wrong?)

comment:11 by pulkomandy, 10 years ago

Sorry, I'm wrong here. It is indeed in the gcc4 version, which means we don't need to care about BeOS compatibility. So the hack in Mesa needs to be removed (and Quake 2 fixed or ported again using an SDL version).

comment:12 by pulkomandy, 10 years ago

Blocking: 11860 added

(In #11860) It's actually a problem with the use of BLocker in our mesa port, which uses a quirk that worked in BeOS and never worked in Haiku (previously it was ignored, now it's entering debugger, see #6400).

The crash is continuable: click the "debug" button, then quit debugger and click "resume". The app should then run (unless there are other bugs). Anything using SDL and OpenGL will hit this problem (some other games like f1spirit, for example).

comment:13 by miqlas, 10 years ago

Can we delete the problematic section from our Mesa port? If no, why not? It seems, that it makes more trouble as good.

http://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/haiku/swrast/SoftwareRast.cpp#n180

	// some stupid applications (Quake2) don't even think about calling LockGL()
	// before using glGetString and its glGet*() friends...
	// so make sure there is at least a valid context.

	if (!_mesa_get_current_context()) {
		LockGL();
		// not needed, we don't have a looper yet: UnlockLooper();
	}

comment:14 by pulkomandy, 10 years ago

I think removing this should be ok (it's a no-op anyway in Haiku) but I'd prefer if someone more knowledgeable about Mesa would have a look; and if we could make sure there is still a way to get the Quake 2 engine to run (not necessarily the original one, but a "modern" open source version).

comment:15 by diver, 10 years ago

GLTeapot seems to be affected by it too.

comment:16 by pulkomandy, 10 years ago

Yes, all x86 OpenGL apps will have this problem in the current situation.

comment:17 by luroh, 10 years ago

Blocking: 11984 added

comment:19 by pulkomandy, 10 years ago

Resolution: fixed
Status: newclosed

We are now behaving the same as BeOS (with an extra warning message), which fixes the initial problem, but that makes some other apps (such as OpenGL) unhappy.

So yes, let's say this is fixed, OpenGL problems are tracked in #11860.

comment:20 by kallisti5, 3 years ago

I just got this error running qemu on the latest hrev55760:

Unlocking BLocker with sem 1526838 from wrong thread 233694, current holder 233691 (see issue #6400).

I feel like the message might need updated :-)

Last edited 3 years ago by kallisti5 (previous) (diff)

comment:21 by MichaelPeppers, 3 months ago

Is this bug meant to be solved? I can produce it reliably on my Beta 5 x86_64 install, today I built a Haiku version of ioQuake3 and it spams the terminal with the message described here, referencing this ticket number, then it closes its window and hangs.

In order to run the application (and see the error) the base Quake 3 (or OpenArena I think) files have to be put in "/boot/home/config/settings/ioquake3/baseq3"

Link to hpkg and recipe in case somebody wants to test it: https://www.dropbox.com/scl/fi/fcx46zauur8nt2w97wdhb/ioquake3-1.36-git-x86_64.zip?rlkey=7h6yhrdn5lxhzujxhsllgmo3n&st=uyw7rpt8&dl=0

Repro steps:

  1. Install the package
  2. Run "/boot/system/bin/ioquake3/ioquake3" in a Terminal
  3. Profit
Last edited 3 months ago by MichaelPeppers (previous) (diff)

by MichaelPeppers, 3 months ago

Debug package for ioquake3, as I forgot to add it before

comment:22 by pulkomandy, 3 months ago

This bug is solved.

BeOS allows a BLocker to be locked in one thread and then unlocked in another. This makes the API too easy to use incorrectly, because anyone can call Unlock() and force the locker to be unlocked.

As much as we would like to fix this, it would break binary compatibility, and so we have settled on a warning message telling developers that they are likely misusing the API.

If you need something that can be passed from one thread to another, use a different locking primitive where this can be done explicitly (maybe a semaphore, for example). Otherwise, use a pthread mutex.

If you get this error, it does not mean that Haiku has a bug, but that your code has a bug and is not using BLocker in the expected way for Haiku (even if it may have worked on BeOS). In this case, "your" code may be Mesa/OpenGL, but that is still not in Haiku codebase.

comment:23 by MichaelPeppers, 3 months ago

That's fair and all but then it'd be best to remove any reference to this ticket in the error message, because people will inevitably post here if/when it happens.

comment:24 by pulkomandy, 3 months ago

They can read the history of the ticket to see the explanation. There is currently no better place to link to to explain this.

So if we want to change the link, we should first update https://www.haiku-os.org/docs/api/classBLocker.html to include a note about this (I would say in the documentation of the Unlock method, mentionning that unlike BeOS, unlocking should be done only by the thread holding the lock).

comment:25 by X512, 3 months ago

This bug will be fixed after SDL will move to EGL.

Note: See TracTickets for help on using tickets.