Opened 9 years ago

Closed 9 years ago

#11718 closed bug (fixed)

userlandfs -> Failed to acquire spinlock for a long time (last caller: 0x0, value: deadbeef)

Reported by: ttcoder Owned by: bonefish
Priority: normal Milestone: Unscheduled
Component: File Systems/UserlandFS Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

This is hrev46819

When ejecting a cdda, it seems Tracker is still keeping a reference to a no-longer-valid resource, and its UpdateVolumeSpaceBar() calls BVolume::Capacity(), which in turn KDLs.

This is...

  • not continuable (the system returns to KDL after a few seconds of freeze)
  • easily reproducible, it seems (insert/mount CD, eject, notice that the volume is still on desktop, invoke the /bin/mount -t userlandfs... line again even though the CD drive is now empty)
  • backtrace seems similar to #11344 (which also has 0x0 last caller, IIUC)

Attachments (2)

spinlock_KDL_from_tracker.jpg (398.0 KB ) - added by ttcoder 9 years ago.
userlandfs mount after ejecting (sans proper un-mounting) the previous userlandfs CD: Tracker triggers a spinlock failure
IMG_3713.JPG (739.7 KB ) - added by dsuden 9 years ago.
Spinlock KDL


Change History (16)

by ttcoder, 9 years ago

userlandfs mount after ejecting (sans proper un-mounting) the previous userlandfs CD: Tracker triggers a spinlock failure

comment:1 by ttcoder, 9 years ago

BTW if I use userlandfs normally (i.e. wait until the second CD is inserted before invoking mount) everything works as it should, I've had 100% stability so far in normal use. Thought it would still be interesting to file this ticket as it might hint at something more serious.

BTW2 I might take a stab at implementing ticket:8376#comment:6 soon and report on whether it improves the "icon stuck on desktop" issue or not.

EDIT: this same KDL just occurred now, after invoking eject quickly after a CD rip.. a much more 'interesting' situation! But it can probably be worked around too, by waiting a few seconds after the rip ends before ejecting. As to the icon-stuck-on-desktop, as noted in 8376 my attempt was a dead-end so leaving that aside.

Last edited 9 years ago by ttcoder

comment:2 by diver, 9 years ago

Component: - General → File Systems/UserlandFS
Owner: changed from nobody to bonefish

comment:3 by dsuden, 9 years ago

I don't know if my KDL is different enough to warrant posting it, but here's another example, as it looked when it happened to me today.

Dane

by dsuden, 9 years ago

Attachment: IMG_3713.JPG added

Spinlock KDL

comment:4 by kallisti5, 9 years ago

Priority: normal → high

comment:5 by axeld, 9 years ago

What are you using userland FS with? It's not a file system itself, and that information is missing from this bug report. Are you using a CDDA-FS that sits on top of the userland FS or is it just the combination of the two?

comment:6 by ttcoder, 9 years ago

mkdir -p /audiocd ; mount -t userlandfs -p cdda /dev/disk/atapi/whatever  /audiocd

ticket:9858#comment:36

Is it a realistic prospect to try and compile a custom build of Tracker that does not call TrashWatcher ..etc on userlandfs devices ? We've been thinking of doing that as a workaround if all else fails.. So I thought I'd humbly ask for advice from the savvy ones here, before I embark on that kind of hacking, in case the more experienced devs tell me it's a low return on invested time tactic...

[removed uncalled-for pre emptive incantation as it seems I'm welcome again here 8-]

Last edited 9 years ago by ttcoder

in reply to:  6 comment:7 by anevilyak, 9 years ago

Replying to ttcoder:

Is it a realistic prospect to try and compile a custom build of Tracker that does not call TrashWatcher ..etc on userlandfs devices ? We've been thinking of doing that as a workaround if all else fails.. So I thought I'd humbly ask for advice from the savvy ones here, before I embark on that kind of hacking, in case the more experienced devs tell me it's a low return on invested time tactic...

The problem is, Tracker is simply a victim of a deeper underlying problem here; at the end of the day, since you're making use of cdda via various tools even without Tracker being involved, somebody is going to eventually wind up triggering this issue. Given that it happens both when cdda runs directly in the kernel and when cdda is mounted via userlandfs, this somewhat narrows down the possible candidates, but only to an extent.

The 0xdeadbeef in the panics here implies that someone is freeing kernel memory that's still in use/referenced somewhere. The main difference when cdda is invoked via userlandfs rather than via the kernel directly is that most of the meat of cdda runs in userland. However, userlandfs must still forward all of the kernel/VFS interactions back and forth (i.e. when the ripper requests to open a file, read a block, etc.). As such, there are two likely possibilities here. Either 1) cdda isn't doing some bookkeeping correctly when it interacts with the VFS, such as calling put_vnode() in a case where it shouldn't, or 2) the way cdda is interacting with the VFS is triggering a corner case/bug in the VFS itself. An outside edge case is that it could also be an issue with the ATAPI code, but that would suggest a similar problem could be triggered with data CDs, which to my knowledge has not been reported to be the case, so that one seems less likely.

The first case would probably be the easier one to try to investigate/eliminate since that would mainly require review of the cdda code on its own, whereas the second would obviously require reviewing the VFS and related code, which there is considerably more of, and is also significantly more complex.
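To illustrate the 0xdeadbeef remark above, here is a minimal C++ sketch of why a lock value equal to that pattern hints at freed memory. It assumes a debug allocator that overwrites released blocks with a poison pattern; `ToySpinlock`, `poison_block` and `looks_freed` are hypothetical names for this example only and are not taken from the Haiku kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Poison pattern assumed for this sketch; it matches the value in the panic.
constexpr uint32_t kPoison = 0xdeadbeef;

// Hypothetical stand-in for a lock embedded in a kernel object.
struct ToySpinlock {
    uint32_t value;  // 0 = unlocked, 1 = locked
};

// What a debug allocator might do when a block is freed: overwrite it word
// by word. The block is not actually released here, so reading it afterwards
// stays well-defined in this sketch.
inline void poison_block(void* block, size_t size)
{
    assert(block != nullptr);
    unsigned char* bytes = static_cast<unsigned char*>(block);
    for (size_t offset = 0; offset + sizeof(kPoison) <= size;
            offset += sizeof(kPoison)) {
        std::memcpy(bytes + offset, &kPoison, sizeof(kPoison));
    }
}

// A lock whose value is the poison pattern rather than 0 or 1 was most
// likely read from an object that had already been freed.
inline bool looks_freed(const ToySpinlock& lock)
{
    return lock.value == kPoison;
}
```

A waiter spinning on such a lock never sees it become 0, which would be consistent with the "Failed to acquire spinlock for a long time" message in the ticket title.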

Last edited 9 years ago by anevilyak

comment:8 by ttcoder, 9 years ago

Thank you for your professionalism and explaining the situation on this one, appreciated! I know it's critical to have a reproducible case for debugging that kind of issue, so maybe let's make a mental note that we do have one here.. Though I understand it's only so for people with a CD drive (not so commonplace these days) and it takes hours of spare time bashing one's head against the wall when the stars are aligned, and days if they are not :-/

Edit: note to self: should post to mention this is low priority, at least to us, since we're now reverting back to kernel-land after hrev48946, yay! :-)

Last edited 9 years ago by ttcoder

comment:9 by bonefish, 9 years ago

A quick info on the involved userlandfs classes:

  • Volume: Each mounted instance is associated with an instance of this class. Among other things it does a bit of bookkeeping so that a buggy client FS can't cause vnode or file descriptor reference imbalances and also has fallback implementations for vital FS hooks, so that the FS is well-behaved and can still be unmounted, even if the connection to the userlandfs server is lost (e.g. due to crashing or being killed).
  • FileSystem: Each client file system (type) that has a mounted volume is associated with an instance of this class. It manages the connection to the userlandfs server instance for that file system. FileSystem objects are reference counted. When the last volume of a client file system has been unmounted, the object is destroyed.
  • RequestPortPool: Manages a pool of (regular) ports that the userlandfs kernel module uses for communication with the userlandfs server instance. FileSystem has an aggregate RequestPortPool.
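The ownership relationships in the list above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the userlandfs source: the class names come from the list, but every member (`AcquireReference`, `ReleaseReference`, `CountReferences`, `GetPortPool`, `TryLock`, and so on) is hypothetical, and the toy lock is a plain atomic flag.

```cpp
#include <atomic>
#include <cassert>

// Simplified stand-in for the pool of ports shared with the server.
class RequestPortPool {
public:
    // Returns true if the lock was free and is now held by the caller.
    bool TryLock() { return !fLocked.exchange(true); }
    void Unlock() { fLocked.store(false); }

private:
    std::atomic<bool> fLocked{false};
};

// One per client file system type; reference counted.
class FileSystem {
public:
    void AcquireReference() { ++fReferenceCount; }

    // Returns true when the last reference is gone and the caller must
    // delete the object.
    bool ReleaseReference()
    {
        assert(fReferenceCount > 0);
        return --fReferenceCount == 0;
    }

    int CountReferences() const { return fReferenceCount; }
    RequestPortPool* GetPortPool() { return &fPortPool; }

private:
    int fReferenceCount = 0;
    RequestPortPool fPortPool;  // aggregate member: dies with the FileSystem
};

// One per mounted instance; holds a reference to its FileSystem.
class Volume {
public:
    explicit Volume(FileSystem* fileSystem) : fFileSystem(fileSystem)
    {
        fFileSystem->AcquireReference();
    }

    ~Volume()
    {
        // Destroy the FileSystem only when its last Volume goes away.
        if (fFileSystem->ReleaseReference())
            delete fFileSystem;
    }

    Volume(const Volume&) = delete;
    Volume& operator=(const Volume&) = delete;

    FileSystem* GetFileSystem() const { return fFileSystem; }

private:
    FileSystem* fFileSystem;
};
```

Because the pool is an aggregate member in this sketch, any code that reaches it through a Volume after the FileSystem has been destroyed would be touching freed memory.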

According to the stack trace the 0xdeadbeef is encountered when locking the FileSystem's RequestPortPool. That suggests that the FileSystem has been deleted already. Given that the FileSystem could be accessed via the Volume in the first place, the Volume hadn't been deleted at that point yet. Since a mutex is involved, it is possible that the Volume already has been deleted at the time of the panic, though.

So, the situation suggests either a reference counting/race condition issue in userlandfs or the VFS is calling the unmount() hook while the read_fs_info() hook is still being executed.

ATM I can't think of anything that the client FS (cddafs) can do to cause either problem. So I guess it's a genuine userlandfs or VFS bug.
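As a rough illustration of the second scenario above (unmount() arriving while read_fs_info() is still running), the following C++ sketch shows one conventional defense: the hook takes its own reference before touching the object. It uses `std::shared_ptr` purely for brevity; `ToyFileSystem`, `ToyVolume`, `Unmount` and `ReadFSInfo` are hypothetical names, the "concurrent" unmount is simulated by a callback on a single thread, and nothing here is meant to describe how userlandfs or the VFS actually handle this.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the per-file-system object.
struct ToyFileSystem {
    int requestsServed = 0;
};

struct ToyVolume {
    std::shared_ptr<ToyFileSystem> fileSystem;

    // Stands in for the unmount() hook: drops the volume's reference.
    void Unmount() { fileSystem.reset(); }

    // Stands in for read_fs_info(). It pins the object first, so the
    // ToyFileSystem stays alive for the whole call even if Unmount() runs
    // in the middle. `interrupt` simulates whatever else happens meanwhile.
    // Note: with real threads, taking the reference itself would also need
    // a lock or an atomic load; this sketch is single-threaded.
    template <typename Interruption>
    bool ReadFSInfo(Interruption interrupt)
    {
        std::shared_ptr<ToyFileSystem> pinned = fileSystem;
        if (!pinned)
            return false;          // already unmounted
        interrupt();               // e.g. an unmount arriving now
        ++pinned->requestsServed;  // still safe: we hold a reference
        return true;
    }
};
```

Without the pinned reference, the increment after the interruption would touch an object that the unmount path had already destroyed.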

comment:10 by waddlesplash, 9 years ago

With the cddafs KDLs now resolved in hrev48946, does this still need to be in B1 with "high" priority?

comment:11 by ttcoder, 9 years ago

Happy with it being any priority, even "low", as we're about to revert to kernel-land indeed :-) More info in a few days on that.. Dane will test for a few days to be 100% certain; if he finds something we might have to continue using userlandfs but I have a feeling we won't.

comment:12 by waddlesplash, 9 years ago

Milestone: R1/beta1 → Unscheduled
Priority: high → normal

comment:13 by mmlr, 9 years ago

Fixed in hrev48955. It was a logic reversal due to an oversight in hrev39870. In this particular case the FileSystem for CDDA was deleted while there were still Volumes.
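For readers unfamiliar with this class of bug, here is a hypothetical C++ sketch of what a reversed teardown condition can look like. It is not the code from hrev39870 or hrev48955; `ToyFileSystem`, `should_delete`, `should_delete_reversed` and `unmount_volume` are invented for illustration, and setting a flag stands in for actually deleting the object.

```cpp
#include <cassert>

// Hypothetical bookkeeping for one client file system.
struct ToyFileSystem {
    int volumeCount = 0;
    bool deleted = false;
};

// Intended rule: tear down only after the last volume is gone.
inline bool should_delete(const ToyFileSystem& fs)
{
    return fs.volumeCount == 0;
}

// Reversed rule: tears down exactly when volumes are still mounted.
inline bool should_delete_reversed(const ToyFileSystem& fs)
{
    return fs.volumeCount != 0;
}

// Unmounts one volume and applies the given teardown rule.
inline void unmount_volume(ToyFileSystem& fs,
    bool (*rule)(const ToyFileSystem&))
{
    assert(fs.volumeCount > 0);
    --fs.volumeCount;
    if (rule(fs))
        fs.deleted = true;  // stands in for `delete fileSystem`
}
```

With the reversed rule, unmounting one of two volumes marks the file system as deleted while one volume still points at it, which matches the symptom described in this comment.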

comment:14 by mmlr, 9 years ago

Resolution: fixed
Status: new → closed