#9806 closed bug (fixed)

Opening FAT32 partition yields PANIC: double mutex lock

Reported by:	ttcoder	Owned by:	nobody
Priority:	normal	Milestone:	R1
Component:	File Systems/FAT	Version:	R1/Development
Keywords:		Cc:
Blocked By:		Blocking:
Platform:	All

Description

This is hrev45681

Steps:

boot to desktop
mount 'corecharlie' from mount menu (this is a FAT32 partition on the same HDD)
double-click its icon on desktop.

Result: I get dropped to KDL with said message. Reproduced 3 times in a row. Right-clicking the volume icon to drill down (instead of double-clicking it) does not crash. Gotta experiment more.

Attachments (1)

IMG_0344.JPG (1.6 MB ) - added by ttcoder 12 years ago.: dprintf() locking a mutex that's already (?) locked

Change History (10)

comment:1 by ttcoder, 12 years ago

I'm lending my camera to somebody so can't take a screenshot.. But from memory, the message and stack crawl look something like this:

PANIC: mutex_lock(): double lock at ... by thread ...

..
dprintf_args()
vm_page_fault()
..
strnlen()
vprintf..
dprintf_args( "lfn_entry with intervening.. erased entries" )
filesystem/fat
..
poseview

The fat32 add-on is obviously involved, [strike]but I find it interesting that the last chunk of code called before panicking is the vm (virtual memory?) one. Is that significant? Wondering.[/strike]

Update -- "mutex" command at KDL prompt:

mutex 0x801a0f04
name: debug output
flags: 0x0
holder: 405
waiting threads:

Last edited 12 years ago by ttcoder

comment:2 by anevilyak, 12 years ago

Component:	- General → File Systems/FAT

More details than that are realistically going to be needed. vm_page_fault() is invoked whenever a page fault occurs, so that could simply be a result of e.g. the fat32 module dereferencing a NULL pointer. At the very least the full stack trace / message are needed though, and ideally also the output of the KDL "mutex" command for the particular mutex indicated in the panic message.

A further question would be if this only occurs with that fat32 volume in particular, or with any random one you try.

comment:3 by ttcoder, 12 years ago

Gotta borrow a camera, possibly this week-end. I'll come back to this ticket then.

In the meantime I've posted a bit more stack trace (updated in the original comment for clarity). It seems things start to go south as the FAT addon wants to write to the syslog with dprintf(), which triggers a page fault, and then vm_page_fault() itself goes south in a dprintf() call of its own (!).

Also...

hrev44xxx panics exactly the same way as this more recent revision, so something is indeed up with that FAT volume, since I used to access it without trouble from both Haiku revs.

mounting an NTFS partition (the only other partition that is not BFS) works 100%, as well as reading from it. No other FAT partition to try out here.

I'm able to "continue" out of KDL, but then after a few seconds the system comes to a hard freeze (I needed to hold power button for 5 secs).

Will hopefully return with more..

by ttcoder, 12 years ago

Attachment:	IMG_0344.JPG added

dprintf() locking a mutex that's already (?) locked

follow-up: 5 comment:4 by bonefish, 12 years ago

The main issue is not so much the double mutex lock. The reason for that is the invocation of dprintf() with an invalid string pointer, so a page fault happens, and the page fault code tries to print some debug output (via dprintf()), causing the double lock. The bug is in the fat module's _next_dirent_(). At a quick glance the filename variable that is passed points to a buffer that hasn't been initialized yet -- that happens later in the function when an entry has been found -- so it doesn't make any sense to print it. There's a dprintf() a few lines earlier with the same issue. What may make some sense to print is the information passed to the DPRINTF() in line 81.
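The chain described above (a log call faults while holding the debug output lock, and the fault handler then tries to log as well) can be sketched in user space. This is only an illustration under stated assumptions: `sketch_mutex`, `sketch_dprintf()` and `sketch_page_fault()` are hypothetical stand-ins, not Haiku kernel APIs, a NULL pointer models the invalid string, and a counter takes the place of the real panic.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a non-recursive kernel mutex (not Haiku's type). */
typedef struct {
    int holder;  /* id of the thread holding the lock, or -1 when unlocked */
} sketch_mutex;

static sketch_mutex debug_output_lock = { -1 };
static int double_lock_count = 0;  /* counts what would be panics */

/* Returns false and records a "double lock" if the caller already holds it. */
static bool sketch_mutex_lock(sketch_mutex *m, int thread)
{
    if (m->holder == thread) {
        double_lock_count++;
        return false;
    }
    m->holder = thread;
    return true;
}

static void sketch_mutex_unlock(sketch_mutex *m)
{
    m->holder = -1;
}

static void sketch_dprintf(int thread, const char *text);

/* Simulated fault handler: it logs too, which re-enters sketch_dprintf(). */
static void sketch_page_fault(int thread)
{
    sketch_dprintf(thread, "page fault while formatting debug output");
}

/* text == NULL models the invalid string pointer passed by the caller. */
static void sketch_dprintf(int thread, const char *text)
{
    if (!sketch_mutex_lock(&debug_output_lock, thread))
        return;  /* a real kernel would panic here instead of returning */

    if (text == NULL)
        sketch_page_fault(thread);  /* the fault happens with the lock held */
    else
        printf("KERN: %s\n", text);

    sketch_mutex_unlock(&debug_output_lock);
}
```

Calling `sketch_dprintf(405, NULL)` takes the lock, "faults", and the nested call from the handler hits the same lock with the same holder, which is the double lock reported in the panic. A call with a valid string never reaches the handler.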

in reply to: 4 comment:5 by umccullough, 12 years ago

Replying to bonefish:

The bug is in the fat module's _next_dirent_(). At a quick glance the filename variable that is passed points to a buffer that hasn't been initialized yet -- that happens later in the function when an entry has been found -- so it doesn't make any sense to print it. There's a dprintf() a few lines earlier with the same issue. What may make some sense to print is the information passed to the DPRINTF() in line 81.

And it looks like there might be another case at the end of get_next_dirent() per CID 603015 that someone should probably clean up as well.
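One way to avoid this class of bug is to build the warning only from values that are already known at the point where it is emitted, rather than from a filename buffer that is filled in later. The helper below is a hypothetical sketch, not the actual change made to the FAT add-on. `format_lfn_warning()` and its `entry` parameter are assumptions for illustration, and the message wording follows the syslog line quoted later in this ticket.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats the warning from an integer that is valid
 * when the warning is raised. No pointer to a not-yet-filled buffer is
 * dereferenced, so the formatting itself cannot fault.
 * Returns the snprintf() result (length of the full message).
 */
static int format_lfn_warning(char *out, size_t size, unsigned entry)
{
    return snprintf(out, size,
        "lfn entry (%u) with intervening erased entries", entry);
}
```

What the number in parentheses stands for in the real driver is not stated in this ticket. Here it is simply some value the scanning loop already holds.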

comment:6 by waddlesplash, 11 years ago

This was fixed by hrev47244, right?

comment:7 by ttcoder, 11 years ago

The ticket can probably be closed; I believe I still have that extra HDD set up in a USB roaming 'mount' somewhere; I'll plug it into my laptop and confirm the KDL is gone in a few days. First gotta fix/diagnose #10864 (unbootable Haiku / BFS panic on my main development laptop).

comment:8 by axeld, 11 years ago

Resolution:	→ fixed
Status:	new → closed

Indeed, this one should be fixed now, thanks for the hint.

comment:9 by ttcoder, 11 years ago

Just a note to confirm it's fixed; the dprintf() to syslog occurs without incident now that no uninitialized variables are used:

KERN: lfn entry (4) with intervening erased entries
KERN: Last message repeated 60 times.

Thanks.
