Opened 12 years ago

Closed 9 years ago

#3159 closed bug (fixed)

[kernel] PANIC: ASSERT FAILED: vnode->ref_count ==0 && vnode->busy

Reported by: diver Owned by: axeld
Priority: normal Milestone: R1
Component: File Systems/BFS Version: R1/Development
Keywords: Cc: arfonzo
Blocked By: Blocking: #8402, #8419
Platform: All


Happened when I tried to duplicate 200 folders to find out why it takes so long to open the BeBook folder. hrev28683. VirtualBox.

Attachments (3)

assert.png (74.2 KB) - added by diver 12 years ago.
Haiku vfs.cpp ASSERT FAILED.png (371.4 KB) - added by umccullough 9 years ago.
ASSERT FAILED while cp -a /boot/common .
IMG_1511.JPG (4.9 MB) - added by SeanCollins 9 years ago.

Change History (16)

by diver, 12 years ago

Attachment: assert.png added

comment:1 by diver, 11 years ago

Never saw it again. Ingo?

in reply to: 1 comment:2 by bonefish, 11 years ago

Replying to diver:

Never saw it again. Ingo?

No idea. The stack trace looks like this happened when something went wrong while creating a directory in BFS. Maybe Axel has an idea. The ticket is assigned to him anyway.

comment:3 by anevilyak, 9 years ago

Blocking: 8402 added

(In #8402) Duplicate of #3159.

comment:4 by anevilyak, 9 years ago

Version: changed from R1/pre-alpha1 to R1/Development

comment:5 by bonefish, 9 years ago

I still haven't got a better idea what goes wrong here. Of the two assert conditions, the busy condition should hold, since remove_vnode() explicitly sets the vnode to busy before calling free_vnode(). That leaves the ref count. remove_vnode() only frees nodes that haven't been published before, and only new_vnode() creates a vnode that is not published yet. The created vnode has ref count 1 and is busy. Due to the latter, get_vnode() cannot be called on it (or rather, the call would hang). So at least via this method the ref count cannot be increased, which is why remove_vnode() expects it to be 0 after decrementing.
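The argument above can be condensed into a small user-space model. This is a minimal sketch under stated assumptions, not Haiku's VFS code: the names `ModelVnode`, `model_new_vnode`, `model_get_vnode`, `model_acquire_direct` and `model_remove_vnode` are hypothetical, the model `get_vnode` refuses a busy node instead of waiting on it, and the model `remove_vnode` reports the asserted condition instead of panicking. It shows that the assertion can only fire if some path other than `get_vnode()` adds a reference to the unpublished node.

```cpp
// Hypothetical, simplified model of the vnode invariant discussed above.
// None of these names are Haiku's real VFS API.
#include <cassert>
#include <cstdint>
#include <map>

struct ModelVnode {
    int32_t refCount = 0;
    bool busy = false;
};

static std::map<int64_t, ModelVnode> gVnodes;

// Like new_vnode(): the node starts with one reference, busy and unpublished.
inline bool model_new_vnode(int64_t id)
{
    if (gVnodes.count(id) != 0)
        return false;
    ModelVnode node;
    node.refCount = 1;
    node.busy = true;
    gVnodes[id] = node;
    return true;
}

// Like get_vnode(), except that the model refuses busy nodes instead of
// waiting on them. Either way it cannot add a reference to a busy node.
inline bool model_get_vnode(int64_t id)
{
    auto it = gVnodes.find(id);
    if (it == gVnodes.end() || it->second.busy)
        return false;
    it->second.refCount++;
    return true;
}

// Stand-in for "other code paths that directly increment a vnode ref count".
inline void model_acquire_direct(int64_t id)
{
    gVnodes.at(id).refCount++;
}

// Like remove_vnode() on an unpublished node: mark busy, drop the creator's
// reference, and expect nobody else to hold one. Returns the asserted
// condition; the real kernel panics when it is false.
inline bool model_remove_vnode(int64_t id)
{
    ModelVnode& node = gVnodes.at(id);
    node.busy = true;
    node.refCount--;
    const bool ok = node.refCount == 0 && node.busy;
    if (ok)
        gVnodes.erase(id);
    return ok;
}
```

Creating a node and removing it right away satisfies the check; calling `model_acquire_direct()` in between makes it fail, which is the shape of the panic in this ticket.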

There are a few other code paths that directly increment a vnode ref count, but skimming through those I don't really see any that would be executed in this situation. A quick look at the BFS code doesn't reveal any interesting stuff besides the new_vnode() and remove_vnode() calls, but I haven't checked what is called indirectly. Maybe Axel has an idea.

For debugging purposes I think it would be worthwhile to force an error in the BFS code and see whether it triggers the KDL. If so, it shouldn't be hard to track down the issue.
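One way to force such errors systematically is a fault-injection sweep: run the operation once to count its checkpoints, then rerun it with each checkpoint forced to fail in turn and check a cleanup invariant afterwards. The sketch below is a hypothetical user-space illustration of that idea; `FaultInjector` and `sweep_failures` are not part of BFS, and in a real test the operation would be the BFS directory-creation path with `ShouldFail()` calls placed where its error handling branches.

```cpp
// Hypothetical fault-injection helpers for exercising error paths.
// Not part of BFS or the Haiku kernel.
#include <cassert>
#include <functional>

// Fails the n-th checkpoint reached (1-based); 0 means never fail.
class FaultInjector {
public:
    explicit FaultInjector(int failAt) : fFailAt(failAt) {}

    bool ShouldFail() { return ++fReached == fFailAt; }
    int Reached() const { return fReached; }

private:
    int fFailAt;
    int fReached = 0;
};

// Runs `operation` once cleanly to count its checkpoints, then once per
// checkpoint with that checkpoint forced to fail. Returns the first
// checkpoint after which `invariantHolds` is false, or 0 if all pass.
inline int sweep_failures(const std::function<void(FaultInjector&)>& operation,
    const std::function<bool()>& invariantHolds,
    const std::function<void()>& reset)
{
    FaultInjector probe(0);
    operation(probe);
    const int checkpoints = probe.Reached();

    for (int k = 1; k <= checkpoints; k++) {
        reset();
        FaultInjector injector(k);
        operation(injector);
        if (!invariantHolds())
            return k;
    }
    return 0;
}
```

With "reference count back to 0" as the invariant, a sweep like this would point directly at the checkpoint whose error path forgets to drop a reference.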

comment:6 by anevilyak, 9 years ago

Blocking: 8419 added

(In #8419) Duplicate of #3159.

comment:7 by arfonzo, 9 years ago

Cc: art@… added

comment:8 by umccullough, 9 years ago

I have recreated this issue a couple of times with a "fresh" image running in VirtualBox.

I haven't yet gotten the precise steps down, but it involves executing a few commands from Terminal after booting an image I created for testing #8408 (gcc2 image which includes Development and Subversion packages).

Basically, this is what I did last to cause it:

~> checkfs -c /boot
~> svn co haikuporter
~> checkfs -c /boot
~> cp -a /boot/common .
<KDL after a few seconds>

I'll attach a screenshot of the failure and keep testing to see how often I can repro it.

by umccullough, 9 years ago

ASSERT FAILED while cp -a /boot/common .

comment:9 by SeanCollins, 9 years ago

If you have a B+tree error, this is fairly easy to reproduce. I'm not sure where the problem is, but if you load some apps, build up some pressure on the cache, and then cause some disk I/O by copying files, it seems to pop up easily. In fact, just downloading the git repo will often trigger this failure. The mkdir command, Expander and unzip seem to really agitate this problem, especially if you have a B+tree error in BFS. I frequently see this error during the unzip and make image portion of building, but it has gotten so bad lately on the 438xx series of nightlies that you can't even reliably build on Haiku without some sort of KDL or app server crash, typically with a vfs.cpp assert failure.

by SeanCollins, 9 years ago

Attachment: IMG_1511.JPG added

comment:10 by diver, 9 years ago

Cc: arfonzo added; art@… removed

comment:11 by SeanCollins, 9 years ago

mkdir seems to be the trigger on nightlies newer than 43769; I think the issue occurs around 43810.

comment:12 by axeld, 9 years ago

I don't have any idea what is causing the issue either. The only way this could happen is that someone else opens the inode in between, directly via its inode ID. That in turn cannot happen since, as Ingo pointed out, the node is marked busy, and get_vnode() will fail in that case.

In any case, Urias just confirmed that the ref_count is 1 for the test he is running. I haven't yet managed to reproduce the issue the way he described.

comment:13 by axeld, 9 years ago

Component: changed from System/Kernel to File Systems/BFS
Resolution: set to fixed
Status: changed from new to closed

Fixed in hrev43930: the tree in the transaction held a reference to the inode as well, and that reference was released neither before calling remove_vnode() (causing the panic) nor before deleting the inode (which would have caused a crash when the transaction was destroyed).

This would always happen if creating a directory failed late.
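The ordering problem can be illustrated with a tiny model. This is a hypothetical sketch, not the code from hrev43930: `ModelInode`, `ModelTransactionTree`, `remove_vnode_check` and `create_directory_fails_late` are illustrative names, and the model reports the asserted condition instead of panicking. While the transaction tree still holds its reference, the remove step sees a count of 1; releasing that reference first brings the count to 0 as expected.

```cpp
// Hypothetical model of the late-failure path described above.
// Not BFS code; all names are illustrative.
#include <cassert>

struct ModelInode {
    int refCount = 1;   // reference owned by the creator (new_vnode())
    bool busy = true;   // not yet published
};

// Stand-in for the transaction's tree, which keeps its own reference.
struct ModelTransactionTree {
    ModelInode* node = nullptr;

    void Attach(ModelInode* inode)
    {
        node = inode;
        node->refCount++;
    }

    void Release()
    {
        if (node != nullptr) {
            node->refCount--;
            node = nullptr;
        }
    }
};

// Mirrors the remove_vnode() check: true where the kernel would not panic.
inline bool remove_vnode_check(ModelInode& inode)
{
    inode.busy = true;
    inode.refCount--;
    return inode.refCount == 0 && inode.busy;
}

// Directory creation that fails after the tree has taken its reference.
// releaseTreeFirst == false keeps the order described in the ticket;
// true drops the tree's reference before removing the vnode.
inline bool create_directory_fails_late(bool releaseTreeFirst)
{
    ModelInode inode;
    ModelTransactionTree tree;
    tree.Attach(&inode);

    // ... a late step fails here, so the new node must be torn down ...
    if (releaseTreeFirst)
        tree.Release();

    // In the buggy order the tree would still point at a deleted inode;
    // the model does not reproduce that later crash.
    return remove_vnode_check(inode);
}
```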
