Opened 16 years ago
Closed 13 years ago
#3159 closed bug (fixed)
[kernel] PANIC: ASSERT FAILED: vnode->ref_count ==0 && vnode->busy
Reported by: | diver | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | File Systems/BFS | Version: | R1/Development |
Keywords: | | Cc: | arfonzo |
Blocked By: | | Blocking: | #8402, #8419 |
Platform: | All | | |
Description
Happened when I tried to duplicate 200 folders to find out why it takes so long to open the BeBook folder. hrev28683, VirtualBox.
Attachments (3)
Change History (16)
by , 16 years ago
Attachment: | assert.png added |
---|
follow-up: 2 comment:1 by , 15 years ago
comment:2 by , 15 years ago
Replying to diver:
Never saw it again. Ingo?
No idea. The stack trace looks like this happened when something went wrong while creating a directory in BFS. Maybe Axel has an idea. The ticket is assigned to him anyway.
comment:4 by , 13 years ago
Version: | R1/pre-alpha1 → R1/Development |
---|
comment:5 by , 13 years ago
I still haven't got a better idea what goes wrong here. Of the two assert conditions the busy condition should hold, since remove_vnode() explicitly sets the vnode to busy before calling free_vnode(). That leaves the ref count. remove_vnode() only frees nodes that haven't been published before and only new_vnode() creates a vnode that is not published yet. The created vnode has ref count 1 and is busy. Due to the latter get_vnode() cannot be called (respectively would hang). So at least via this method the ref count cannot be increased, which is why remove_vnode() expects it to be 0 after decrementing.
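The reasoning above can be sketched as a toy model. This is not the kernel code: `ToyVnode` and the `toy_*` functions are hypothetical stand-ins, and the real get_vnode() waits on a busy node rather than returning immediately. The sketch only shows why the assert condition is expected to hold when nothing else takes a reference.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a vnode (hypothetical); only the fields the assert cares about.
struct ToyVnode {
    int32_t ref_count;
    bool busy;
    bool published;
};

// Models new_vnode(): the node starts unpublished, busy, with one reference.
ToyVnode toy_new_vnode() {
    return ToyVnode{1, true, false};
}

// Models get_vnode(): a busy node cannot be acquired, so the count stays put.
// Simplification: this returns false instead of waiting.
bool toy_get_vnode(ToyVnode& vnode) {
    if (vnode.busy)
        return false;
    vnode.ref_count++;
    return true;
}

// Models the check in remove_vnode(): mark busy, drop the creator's
// reference, and report whether "ref_count == 0 && busy" holds.
bool toy_remove_vnode_invariant_holds(ToyVnode& vnode) {
    vnode.busy = true;
    vnode.ref_count--;
    return vnode.ref_count == 0 && vnode.busy;
}
```

In this model, a node created with `toy_new_vnode()` rejects `toy_get_vnode()` and then passes the check in `toy_remove_vnode_invariant_holds()`, so a failing assert implies some other path incremented the count directly.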
There are a few other code paths that directly increment a vnode ref count, but skimming through those I don't really see any that would be executed in this situation. A quick look at the BFS code doesn't reveal any interesting stuff besides the new_vnode() and remove_vnode() calls, but I haven't checked what is called indirectly. Maybe Axel has an idea.
For debugging purposes I think it would be worthwhile to force an error in the BFS code and see whether it triggers the KDL. If so, it shouldn't be hard to track down the issue.
comment:7 by , 13 years ago
Cc: | added |
---|
comment:8 by , 13 years ago
I have recreated this issue a couple of times with a "fresh" image running in VirtualBox.
I haven't yet gotten the precise steps down, but it involves executing a few commands from Terminal after booting an image I created for testing #8408 (gcc2 image which includes Development and Subversion packages).
Basically, this is what I did last to cause it:
~> checkfs -c /boot
~> svn co http://ports.haiku-files.org/svn/haikuporter/trunk haikuporter
~> checkfs -c /boot
~> cp -a /boot/common .
<KDL after a few seconds>
I'll attach a screenshot of the failure and keep testing to see how often I can repro it.
by , 13 years ago
Attachment: | Haiku vfs.cpp ASSERT FAILED.png added |
---|
ASSERT FAILED while cp -a /boot/common .
comment:9 by , 13 years ago
If you have a B+tree error, this is fairly easy to reproduce. I'm not sure where the problem is, but if you load some apps, build up some pressure on the cache, and then invoke some disk I/O with file copying, it seems to pop up easily. In fact, just downloading the git repo will often trigger this failure. The mkdir command, expander, and unzip seem to really agitate this problem, especially if you have a B+tree error in BFS. I frequently see this error during the unzip/make image portion of building, but it's gotten so bad lately on the 438xx series of nightlies that you can't even reliably build on Haiku without some sort of KDL or app server crash, typically with a vfs.cpp assert failure.
by , 13 years ago
Attachment: | IMG_1511.JPG added |
---|
comment:10 by , 13 years ago
Cc: | added; removed |
---|
comment:11 by , 13 years ago
mkdir seems to be the trigger on nightlies newer than 43769; I think the issue occurs around 43810.
comment:12 by , 13 years ago
I don't have any idea what is causing the issue either. The only reason this could happen is that someone else opens the inode in-between - directly, via inode ID. This again cannot happen, as Ingo pointed out, since the node is marked busy, and get_vnode() will fail in that case.
In any case, Urias just confirmed that the ref_count is 1 for the test he is running. I didn't manage to reproduce the issue yet the way he described.
comment:13 by , 13 years ago
Component: | System/Kernel → File Systems/BFS |
---|---|
Resolution: | → fixed |
Status: | new → closed |
Fixed in hrev43930: the tree in the transaction held a reference to the inode as well, and that reference was released neither before calling remove_vnode() (causing the panic) nor before deleting the inode (which would have caused a crash upon destruction of the transaction).
This would always happen if creating a directory failed late.
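As a rough illustration of that failure path, here is a toy model, not the actual BFS code. `ToyInode`, `ToyTransaction`, `HoldTree()`, `ReleaseTree()` and `toy_fail_late()` are hypothetical names, and it assumes the tree's reference shows up as one extra count on the node.

```cpp
#include <cassert>
#include <cstdint>

// Toy inode/vnode pair (hypothetical): freshly created, busy, one reference.
struct ToyInode {
    int32_t ref_count = 1;
    bool busy = true;
};

// Hypothetical stand-in for the transaction, whose tree holds its own
// reference to the inode for as long as it is attached.
struct ToyTransaction {
    ToyInode* held = nullptr;

    void HoldTree(ToyInode& inode) {
        held = &inode;
        inode.ref_count++;
    }

    void ReleaseTree() {
        if (held != nullptr) {
            held->ref_count--;
            held = nullptr;
        }
    }
};

// Models a late failure while creating a directory. Returns whether the
// "ref_count == 0 && busy" condition would hold at the remove_vnode() step.
bool toy_fail_late(ToyInode& inode, ToyTransaction& transaction,
    bool releaseTreeFirst) {
    if (releaseTreeFirst)
        transaction.ReleaseTree();  // the step this sketch treats as the fix
    inode.ref_count--;              // drop the creator's reference
    return inode.ref_count == 0 && inode.busy;
}
```

Without the release, the count ends at 1 and the condition fails, which agrees with the ref_count of 1 reported in comment 12. With the release done first, the count reaches 0 and the condition holds.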