Opened 10 years ago
Closed 10 years ago
#11718 closed bug (fixed)
userlandfs -> Failed to acquire spinlock for a long time (last caller: 0x0, value: deadbeef)
Reported by: | ttcoder | Owned by: | bonefish |
---|---|---|---|
Priority: | normal | Milestone: | Unscheduled |
Component: | File Systems/UserlandFS | Version: | R1/Development |
Keywords: | | Cc: | |
Blocked By: | | Blocking: | |
Platform: | All | | |
Description
This is hrev46819
When ejecting a cdda volume, it seems Tracker is still keeping a reference to the no-longer-valid resource, and its UpdateVolumeSpaceBar() calls BVolume::Capacity(), which in turn KDLs.
This is...
- not continuable (the system returns to KDL after a few seconds of freeze)
- easily reproducible, it seems (insert/mount a CD, eject it, notice that the volume is still on the desktop, then invoke the
/bin/mount -t userlandfs...
line again even though the CD drive is now empty)
- the backtrace seems similar to #11344 (which also has a 0x0 last caller, IIUC)
Attachments (2)
Change History (16)
by , 10 years ago
Attachment: | spinlock_KDL_from_tracker.jpg added |
---|
comment:1 by , 10 years ago
BTW, if I use userlandfs normally (i.e. wait until the second CD is inserted before invoking mount), everything works as it should; I've had 100% stability so far in normal use. I thought it would still be interesting to file this ticket as it might hint at something more serious.
BTW2: I might take a stab at implementing ticket:8376#comment:6 soon and report on whether it improves the "icon stuck on desktop" issue or not.
comment:2 by , 10 years ago
Component: | - General → File Systems/UserlandFS |
---|---|
Owner: | changed from | to
comment:3 by , 10 years ago
I don't know if my KDL is different enough to warrant posting it, but here's another example, as it looked when it happened to me today.
Dane
comment:4 by , 10 years ago
Priority: | normal → high |
---|
comment:5 by , 10 years ago
What are you using userland FS with? It's not a file system itself, and that information is missing from this bug report. Are you using a CDDA-FS that sits on top of the userland FS or is it just the combination of the two?
comment:6 by , 10 years ago (follow-up: comment:7)
mkdir -p /audiocd ; mount -t userlandfs -p cdda /dev/disk/atapi/whatever /audiocd
Is it a realistic prospect to try to compile a custom build of Tracker that does not call TrashWatcher etc. on userlandfs devices? We've been thinking of doing that as a workaround if all else fails. So I thought I'd humbly ask for advice from the savvy ones here before I embark on that kind of hacking, in case the more experienced devs tell me it's a low return on invested time tactic...
[removed uncalled-for pre-emptive incantation as it seems I'm welcome again here 8-]
comment:7 by , 10 years ago
Replying to ttcoder:
Is it a realistic prospect to try to compile a custom build of Tracker that does not call TrashWatcher etc. on userlandfs devices? We've been thinking of doing that as a workaround if all else fails. So I thought I'd humbly ask for advice from the savvy ones here before I embark on that kind of hacking, in case the more experienced devs tell me it's a low return on invested time tactic...
The problem is, Tracker is simply a victim of a deeper underlying problem here; at the end of the day, since you're making use of cdda via various tools even without Tracker being involved, somebody is eventually going to wind up triggering this issue. Given that it happens both when cdda runs directly in the kernel and when cdda is mounted via userlandfs, this somewhat narrows down the possible candidates, but only to an extent.
The 0xdeadbeef in the panics here implies that someone is freeing kernel memory that's still in use/referenced somewhere. The main difference when cdda is invoked via userlandfs rather than via the kernel directly is that most of the meat of cdda runs in userland. However, userlandfs must still forward all of the kernel/VFS interactions back and forth (i.e. when the ripper requests to open a file, read a block, etc.). As such, there are two likely possibilities here: either 1) cdda isn't doing some bookkeeping correctly when it interacts with the VFS, such as calling put_vnode() in a case where it shouldn't, or 2) the way cdda is interacting with the VFS is triggering a corner case/bug in the VFS itself. An outside possibility is that it could also be an issue with the ATAPI code, but that would suggest a similar problem could be triggered with data CDs, which to my knowledge has not been reported, so that one seems less likely.
The first case would probably be the easier one to investigate/eliminate, since that would mainly require review of the cdda code on its own, whereas the second would obviously require reviewing the VFS and related code, of which there is considerably more, and which is also significantly more complex.
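To make possibility 1) concrete, here is a minimal sketch of such a reference imbalance using the public fs_interface hooks; the hook body and the find_entry() helper are hypothetical, this is not the actual cdda code:

```c++
// Hypothetical, simplified lookup hook -- illustrates an unbalanced
// put_vnode(), not the actual cdda code.
static status_t
my_fs_lookup(fs_volume* volume, fs_vnode* dir, const char* name, ino_t* _id)
{
	ino_t id;
	if (!find_entry(dir, name, &id))	// hypothetical helper
		return B_ENTRY_NOT_FOUND;

	void* privateNode;
	// get_vnode() acquires a reference that the hook is supposed to
	// hand over to the VFS on success.
	status_t error = get_vnode(volume, id, &privateNode);
	if (error != B_OK)
		return error;

	// BUG: dropping the reference the VFS still believes it owns. Once
	// the count hits zero the vnode is freed, and a later access through
	// the stale pointer reads freed memory (filled with 0xdeadbeef by the
	// debug heap) -- the pattern seen in the panic.
	put_vnode(volume, id);

	*_id = id;
	return B_OK;
}
```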
comment:8 by , 10 years ago
Thank you for your professionalism and for explaining the situation on this one, appreciated! I know it's critical to have a reproducible case for debugging this kind of issue, so let's make a mental note that we do have one here... though I understand it's only reproducible for people with a CD drive (not so commonplace these days), and it takes hours of spare time bashing one's head against the wall when the stars are aligned, and days if they are not :-/
Edit: note to self: I should post to mention this is low priority, at least to us, since we're now reverting to kernel-land after hrev48946, yay! :-)
comment:9 by , 10 years ago
A quick info on the involved userlandfs classes:

- Volume: Each mounted instance is associated with an instance of this class. Among other things it does a bit of bookkeeping so that a buggy client FS can't cause vnode or file descriptor reference imbalances, and it also has fallback implementations for vital FS hooks, so that the FS is well-behaved and can still be unmounted even if the connection to the userlandfs server is lost (e.g. due to crashing or being killed).
- FileSystem: Each client file system (type) that has a mounted volume is associated with an instance of this class. It manages the connection to the userlandfs server instance for that file system. FileSystem objects are reference counted: when the last volume of a client file system has been unmounted, the object is destroyed.
- RequestPortPool: Manages a pool of (regular) ports that the userlandfs kernel module uses for communication with the userlandfs server instance. FileSystem has an aggregate RequestPortPool.
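Sketched as code, the lifetime relationships described above look roughly like this (the type names match the classes mentioned, but the members and methods are a simplified illustration, not the actual userlandfs source):

```c++
// Simplified illustration of the described ownership; not the real code.
struct RequestPortPool {
	mutex	fLock;		// the lock the panicking thread tries to take
	// ... the managed ports ...
};

struct FileSystem {
	int32			fReferenceCount;
	RequestPortPool	fPortPool;	// aggregate: freed together with FileSystem

	void AcquireReference() { atomic_add(&fReferenceCount, 1); }

	void ReleaseReference()
	{
		// Last volume unmounted: the object -- and with it the port
		// pool and its mutex -- is destroyed.
		if (atomic_add(&fReferenceCount, -1) == 1)
			delete this;
	}
};

struct Volume {
	FileSystem*	fFileSystem;	// only valid while a reference is held
};
```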
According to the stack trace, the 0xdeadbeef is encountered when locking the FileSystem's RequestPortPool. That suggests that the FileSystem has already been deleted. Given that the FileSystem could be accessed via the Volume in the first place, the Volume hadn't been deleted at that point yet. Since a mutex is involved, it is possible that the Volume had already been deleted by the time of the panic, though.
So, the situation suggests either a reference counting/race condition issue in userlandfs, or that the VFS is calling the unmount() hook while the read_fs_info() hook is still being executed.
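As a timeline, the suspected interleaving would look something like this (method names are hypothetical; this merely illustrates the race described above):

```c++
// Hypothetical interleaving, based on the analysis above:
//
// read_fs_info() thread                 unmount() thread
// -----------------------------------   -----------------------------------
// fs = volume->GetFileSystem();
//                                       volume->Unmount();
//                                       fs->ReleaseReference();
//                                         // last reference: FileSystem and
//                                         // its RequestPortPool are freed
// fs->GetPortPool()->Lock();
//   // mutex_lock() reads freed memory
//   // -> spinlock panic, value 0xdeadbeef
```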
ATM I can't think of anything that the client FS (cddafs) can do to cause either problem. So I guess it's a genuine userlandfs or VFS bug.
comment:10 by , 10 years ago
With the cddafs KDLs now resolved in hrev48946, does this still need to be in B1 with "high" priority?
comment:11 by , 10 years ago
Happy with it being any priority, even "low", as we're about to revert to kernel-land indeed :-) More info on that in a few days... Dane will test for a few days to be 100% certain; if he finds something we might have to continue using userlandfs, but I have a feeling we won't.
comment:12 by , 10 years ago
Milestone: | R1/beta1 → Unscheduled |
---|---|
Priority: | high → normal |
comment:13 by , 10 years ago
comment:14 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
userlandfs mount after ejecting (sans proper un-mounting) the previous userlandfs CD: Tracker triggers a spinlock failure