Opened 4 days ago

Last modified 45 hours ago

#18451 new bug

Assertion failure (mutex was not actually locked) in libroot hoard malloc

Reported by: waddlesplash Owned by: nobody
Priority: normal Milestone: Unscheduled
Component: System/libroot.so Version: R1/beta4
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

This triggers intermittently (roughly 1 run in 5 to 10) with certain applications, following the addition of those assertions (the new mutex code is not needed to trigger it).

The stack trace after the assert is always just this:

		0x7f6d30f69b60	0xce64b42d98	BPrivate::processHeap::free(void*) + 0x138 
		0x7f6d30f69b90	0xce64b43f84	free + 0x44 
...

Change History (6)

comment:1 by waddlesplash, 3 days ago

Things I have tried so far:

  1. Using pthread_mutex instead of our own built-in mutex, and changing its error checks to assert()s as well. This only changes the assertion failure so that it is always:
    		state: Call (mutex->owner == find_thread(NULL))

  2. Using the "unused" area as a flags field, and trying to catch any unlock() calls made while a flag is set. No results.
  3. Adding assertions to ~superblock() (but is this ever called?). No results.
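The first experiment relies on error-checking mutex semantics: the mutex records its owner when locked and rejects an unlock from any thread that does not hold it. A minimal sketch of that idea, assuming portable C++ where std::this_thread::get_id() stands in for Haiku's find_thread(NULL); the class name CheckedMutex is hypothetical, and unlock() returns false where libroot would assert.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical error-checking mutex, in the spirit of PTHREAD_MUTEX_ERRORCHECK.
// std::this_thread::get_id() stands in for Haiku's find_thread(NULL).
class CheckedMutex {
public:
	void lock()
	{
		fMutex.lock();
		// Record the owner only after the lock is actually held.
		fOwner.store(std::this_thread::get_id());
	}

	// Returns false instead of asserting when the calling thread does not
	// hold the lock, i.e. the "mutex was not actually locked" condition.
	bool unlock()
	{
		if (fOwner.load() != std::this_thread::get_id())
			return false;
		fOwner.store(std::thread::id());	// back to "no owner"
		fMutex.unlock();
		return true;
	}

private:
	std::mutex fMutex;
	// A default-constructed std::thread::id means "not locked by anyone".
	std::atomic<std::thread::id> fOwner{std::thread::id()};
};
```

With this check, calling unlock() on a mutex that was never locked, or from a thread other than the one that locked it, is reported instead of silently releasing the lock.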

comment:2 by waddlesplash, 3 days ago

The most reliable ways to trigger this problem are with QtWebEngine and GTKWebKit. With QtWebEngine, CanvasMark is a reliable reproducer; with GTKWebKit, Haiku's own forums seem to trigger it.

comment:3 by waddlesplash, 3 days ago

Oho, it can also be triggered by accessing the Haiku forums with WebPositive, not just with GTKWebKit.

comment:4 by pulkomandy, 3 days ago

The full stacktrace would be useful. This may just be some kind of heap corruption.

in reply to: 4 comment:5 by korli, 3 days ago

Replying to pulkomandy:

> The full stacktrace would be useful. This may just be some kind of heap corruption.

Example here.

	thread 3597: pthread func 
		state: Call (mutex was not actually locked!)

		Frame		IP			Function Name
		-----------------------------------------------
		00000000	0xce64ab8da7	_kern_debugger + 0x7 
			Disassembly:
				_kern_debugger:
				0x000000ce64ab8da0:   48c7c0e5000000  mov $0xe5, %rax
				0x000000ce64ab8da7:             0f05  syscall  <--

		0x7f6d30f69b60	0xce64b42d98	BPrivate::processHeap::free(void*) + 0x138 
		0x7f6d30f69b90	0xce64b43f84	free + 0x44 
		0x7f6d30f69bf0	0x14afeb6e206	_ZNSt6vectorISt4pairIt13scoped_refptrIN2cc4TaskEEESaIS5_EE17_M_realloc_insertIJS5_EEEvN9__gnu_cxx17(, ) + 0x116 
		0x7f6d30f69c60	0x14afeb703c9	cc::TaskGraphWorkQueue::GetNextTaskToRun(unsigned short) + 0x269 
		0x7f6d30f69d30	0x14b0059dd0b	content::CategorizedWorkerPool::RunTaskInCategoryWithLockAcquired(cc::TaskCategory) + 0x2b 
		0x7f6d30f69d80	0x14b0059dfea	content::CategorizedWorkerPool::Run(std::vector<cc::TaskCategory, std::allocator<cc::TaskCategory> > const&, base::ConditionVariable*) + 0x6a 
		0x7f6d30f69db0	0x14afe2e5b48	base::_GLOBAL__N_1::ThreadFunc(void*) + 0x48 
		0x7f6d30f69dd0	0xce64ac7105	pthread_thread_entry(void*, void*) + 0x15 
		00000000	0x7faeead5b258	commpage_thread_exit + 0 

comment:6 by pulkomandy, 45 hours ago

So we're looking at this lock, I guess:

https://cgit.haiku-os.org/haiku/tree/src/system/libroot/posix/malloc_hoard2/processheap.cpp#n203

Notice how the superblock is retrieved from the memory block being freed:

superblock *sb = b->getSuperblock();

This means that a corrupt block (either because some data was overwritten, or because the software is trying to free memory that was not allocated by malloc) will yield a pointer that does not point to a superblock at all.
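To illustrate why this lookup is fragile, here is a simplified sketch of a header-before-block scheme, assuming (as b->getSuperblock() suggests) that the owning superblock pointer is stored in a header just in front of the user pointer. All names here (Superblock, BlockHeader, sketch_malloc, owner_of, simulate_underflow) are hypothetical and do not reflect the real malloc_hoard2 layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Simplified, hypothetical stand-ins for hoard's superblock and block classes.
struct Superblock {
	int numBlocks;
	int numAvailable;
	int sizeClass;
};

struct BlockHeader {
	Superblock* owner;	// set by the allocator when the block is handed out
};

// Allocation: place a header directly in front of the pointer given to the user.
void* sketch_malloc(Superblock* sb, std::size_t size)
{
	void* raw = ::operator new(sizeof(BlockHeader) + size);
	BlockHeader* header = new (raw) BlockHeader{sb};
	return header + 1;
}

// Free path: like b->getSuperblock(), the owner is recovered purely from the
// bytes in front of ptr. Nothing checks that they still hold a valid pointer.
Superblock* owner_of(void* ptr)
{
	BlockHeader* b = static_cast<BlockHeader*>(ptr) - 1;
	return b->owner;
}

void sketch_free(void* ptr)
{
	::operator delete(static_cast<BlockHeader*>(ptr) - 1);
}

// Hypothetical helper simulating a buffer underflow that overwrites the header.
void simulate_underflow(void* ptr, unsigned char fill)
{
	std::memset(static_cast<unsigned char*>(ptr) - sizeof(BlockHeader), fill,
		sizeof(BlockHeader));
}
```

After simulate_underflow(), owner_of() still returns "a superblock pointer", just one built from whatever bytes were written over the header; the real free() would then go on to lock and unlock a "mutex" at that address.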

There is an assert to check whether the superblock is valid, but the validation it performs is quite weak:

  • Two values (numBlocks and sizeClass) must be greater than zero (effectively checking only the sign bit)
  • numAvailable must be less than numBlocks
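A sketch of how weak these two checks are, assuming a 32-bit int and a hypothetical struct holding only the three fields named above (the real superblock has more fields and a different layout). It mirrors the checks as described in this comment, not the actual libroot source.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical layout: only the fields named in the ticket are modeled.
struct SuperblockFields {
	int numBlocks;
	int numAvailable;
	int sizeClass;
};

// Mirrors the described checks: two positive fields, numAvailable < numBlocks.
bool weak_is_valid(const SuperblockFields& sb)
{
	return sb.numBlocks > 0 && sb.sizeClass > 0
		&& sb.numAvailable < sb.numBlocks;
}

// Reads arbitrary bytes as if they were a superblock, which is effectively
// what free() does when a block header points somewhere unrelated.
SuperblockFields view_as_superblock(const void* bytes)
{
	SuperblockFields sb;
	std::memcpy(&sb, bytes, sizeof sb);
	return sb;
}
```

For example, the 12 bytes of the ASCII text "zzzzaaaabbbb" pass: every byte is below 0x80, so each int is positive, and 0x61616161 is less than 0x7a7a7a7a regardless of byte order. Plenty of ordinary heap contents (strings, small counters) could slip through the same way.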

(block::isValid, which is called earlier in the function, is even worse: it is just a "return 1", and there isn't much more that can be checked.)

HEAP_DEBUG is not set, so we don't have the _magic field, which could be used for a more reliable check. And superblock::isValid does not even try to use that field anyway.
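A sketch of what a magic-based check could look like, assuming a layout with a 32-bit magic word in front of the same three fields. The constant kSuperblockMagic, the struct, and the helper names are hypothetical; the ticket does not show the real HEAP_DEBUG value or layout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical magic constant; not the real HEAP_DEBUG value.
constexpr std::uint32_t kSuperblockMagic = 0xCAFEBABE;

struct CheckedSuperblock {
	std::uint32_t magic;	// stands in for the _magic field present with HEAP_DEBUG
	std::int32_t numBlocks;
	std::int32_t numAvailable;
	std::int32_t sizeClass;
};

bool is_valid_with_magic(const CheckedSuperblock& sb)
{
	// The magic comparison rejects almost all unrelated memory up front;
	// the structural checks remain as a secondary sanity test.
	return sb.magic == kSuperblockMagic
		&& sb.numBlocks > 0 && sb.sizeClass > 0
		&& sb.numAvailable < sb.numBlocks;
}

// Reads arbitrary bytes as if they were a checked superblock.
CheckedSuperblock view_as_checked_superblock(const void* bytes)
{
	CheckedSuperblock sb;
	std::memcpy(&sb, bytes, sizeof sb);
	return sb;
}
```

A magic word is still probabilistic (unrelated memory could contain the constant by chance), and reading it through a wild pointer can itself fault, but it would turn most "this is not a superblock" cases into a clear diagnostic instead of a confusing mutex assertion.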

So, is it possible that we are simply looking at an invalid superblock, and the value being used is not at all a mutex, because we didn't actually get a pointer to a superblock?
