Opened 3 months ago
Closed 5 weeks ago
#19122 closed bug (fixed)
PANIC: rw_lock_destroy(): read-locked and caller doesn't hold the write lock
Reported by: | bbjimmy | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta6 |
Component: | File Systems/BFS | Version: | R1/beta5 |
Keywords: | Cc: | ||
Blocked By: | Blocking: | #8405, #19207 | |
Platform: | All |
Description
browsing https://arstechnica.com/ in WebPositive
Attachments (1)
Change History (14)
by , 3 months ago
Attachment: | IMG_20240923_113442545.jpg added |
---|
comment:1 by , 3 months ago
Blocking: | 8405 added |
---|---|
Component: | System/Kernel → File Systems/BFS |
Owner: | changed from | to
Priority: | high → normal |
This is probably the real cause of #8405.
comment:2 by , 3 months ago
Summary: | PANICK: rw_lock_destroy() → PANIC: rw_lock_destroy(): read-locked and caller doesn't hold the write lock |
---|
comment:3 by , 3 months ago
Actually, I think this is a VFS bug and not a BFS bug. The VFS should keep references to vnodes that are currently performing asynchronous I/O.
comment:4 by , 3 months ago
Milestone: | Unscheduled → R1/beta6 |
---|
comment:5 by , 2 months ago
It only seems to occur while using WebPositive. It is easy to reproduce: open Web+ and browse.
It takes about 5 minutes.
comment:7 by , 8 weeks ago
Blocking: | 19207 added |
---|
comment:9 by , 5 weeks ago
I still can't manage to reproduce this. It's probably because my VMs run on top of an SSD and so are too fast to hit the bug.
When the bug occurs, some other thread will be reading from or writing to the file. It would be good to check the other threads that are waiting on I/O in KDL; one of them will be the thread that holds this lock, and that's the one that should also hold a reference but doesn't.
If someone manages to reproduce and can ping me on IRC, I can walk through a KDL session.
comment:10 by , 5 weeks ago
OK, with some hacks (a snooze in the I/O callback) I managed to reproduce this myself. The I/O callback in question is PageWriteTransfer, which is odd, because it's supposed to hold references to all the vnodes it's doing I/O on.
comment:11 by , 5 weeks ago
Ah, no actually it isn't, that was a different request. The real faulting request has no finished callback set.
comment:12 by , 5 weeks ago
Alright, I think I see the problem here.
The issue is that vfs_read/write_pages are synchronous: they just wait for the I/O requests to complete before returning. bfs_io, meanwhile, doesn't keep references to the inodes; it assumes its caller will do that. But as it happens, vfs_read is notified of completion before bfs_io is, because IORequest notifies the finished condition before invoking the callbacks, so there's a small window in which the vnode can be un-referenced and deleted before BFS releases the read lock.
The solution here is probably just to move the finished condition notification after the callbacks are invoked.
screenshot of KDL