#2719 closed bug (fixed)
[bfs]: deadlock - mutex bfs inode+24.1243 not released on exit
Reported by: | emitrax | Owned by: | axeld |
---|---|---|---|
Priority: | high | Milestone: | R1 |
Component: | System/Kernel | Version: | R1/pre-alpha1 |
Keywords: | Cc: | imker@… | |
Blocked By: | Blocking: | ||
Platform: | All |
Description
It seems weird, since AutoLocker seems to be used everywhere, but this happened while bonnie++ was writing to the same directory in which I was creating some directories, and mkdir failed with "name too long". Could it be that it didn't release the lock?
    kdebug> thread bonnie++
    THREAD: 0x93197000
    id:                 4335 (0x10ef)
    name:               "bonnie++"
    all_next:           0x9316f800
    team_next:          0x00000000
    q_next:             0x913c5000
    priority:           10 (next 10)
    state:              waiting
    next_state:         waiting
    cpu:                0x00000000
    sig_pending:        0x0 (blocked: 0x0)
    in_kernel:          1
    waiting for:        rwlock 0x90dfb400
    fault_handler:      0x00000000
    args:               0x90d945a0 0x00000000
    entry:              0x8004b4d0
    team:               0x90c3e45c, "bonnie++"
    exit.sem:           43239
    exit.status:        0x0 (No error)
    exit.reason:        0x0
    exit.signal:        0x0
    exit.waiters:
    kernel_stack_area:  92166
    kernel_stack_base:  0x920a6000
    user_stack_area:    92168
    user_stack_base:    0x7efef000
    user_local_storage: 0x7ffef000
    kernel_errno:       0x0 (No error)
    kernel_time:        16259689
    user_time:          31183218
    flags:              0x200
    architecture dependant section:
        esp: 0x920a9cb8
        ss: 0x00000010
        fpu_state at 0x93197380

    kdebug> mutex 0x90dfb400
    mutex 0x90dfb400:
    name:            bfs inode+24.1243
    flags:           0xd8
    holder:          -1
    waiting threads: 4335

    kdebug> bt 4335
    stack trace for thread 4335 "bonnie++"
        kernel stack: 0x920a6000 to 0x920aa000
          user stack: 0x7efef000 to 0x7ffef000
    frame               caller     <image>:function + offset
     0 920a9d14 (+  32) 800439ce   <kernel>:context_switch__FP6threadT0 + 0x0026
     1 920a9d34 (+  64) 80043c38   <kernel>:scheduler_reschedule + 0x0248
     2 920a9d74 (+  48) 8003a104   <kernel>:rw_lock_wait__FP7rw_lockb + 0x00c4
     3 920a9da4 (+  64) 8003a666   <kernel>:rw_lock_write_lock + 0x00b6
     4 920a9de4 (+  64) 80594a7e   <bfs>:WriteAt__5InodeR11TransactionxPCUcPUl + 0x010a
     5 920a9e24 (+  96) 805a1857   <bfs>:bfs_write__FP9fs_volumeP8fs_vnodePvxPCvPUl + 0x00d3
     6 920a9e84 (+  64) 8009370f   <kernel>:file_write__FP15file_descriptorxPCvPUl + 0x0067
     7 920a9ec4 (+  80) 800833ad   <kernel>:common_user_io__FixPvUlb + 0x017d
     8 920a9f14 (+  48) 800838a0   <kernel>:_user_write + 0x0028
     9 920a9f44 (+ 100) 800c8852   <kernel>:pre_syscall_debug_done + 0x0002 (nearest)
    user iframe at 0x920a9fa8 (end = 0x920aa000)
     eax 0x82          ebx 0x2bbcdc      ecx 0x7ffeeb90   edx 0xffff0104
     esi 0x7ffeec3c    edi 0x14ed        ebp 0x7ffeebcc   esp 0x920a9fdc
     eip 0xffff0104  eflags 0x203   user esp 0x7ffeeb90
     vector: 0x63, error code: 0x0
    10 920a9fa8 (+   0) ffff0104
    11 7ffeebcc (+  48) 00203196   </boot/beos/bin/bonnie++@0x00200000>:unknown + 0x3196
    12 7ffeebfc (+ 128) 00206f89   </boot/beos/bin/bonnie++@0x00200000>:unknown + 0x6f89
    13 7ffeec7c (+ 768) 00206ac9   </boot/beos/bin/bonnie++@0x00200000>:unknown + 0x6ac9
    14 7ffeef7c (+  48) 002028ff   </boot/beos/bin/bonnie++@0x00200000>:unknown + 0x28ff
    15 7ffeefac (+  48) 001008ea   92169:runtime_loader_seg0ro@0x00100000 + 0x8ea
    16 7ffeefdc (+   0) 7ffeefec   92168:bonnie++_main_stack@0x7efef000 + 0xffffec
Change History (10)
follow-up: 2 comment:1 by , 16 years ago
comment:2 by , 16 years ago
Replying to bonefish:
The info is unfortunately not very helpful. I don't see how the description fits with the summary. Furthermore you should use the "rwlock" command when printing info for an R/W lock.
I thought the inode was the directory's, since I was also writing to the same directory from another team, and jumped to the wrong conclusion that mkdir failing without releasing the lock was the cause.
This might be a deadlock with the page writer. ATM it iteratively acquires read locks for the underlying files, which, if the same file occurs twice and another thread tries to write-lock it in between, would indeed lead to a deadlock.
This seems more reasonable. Feel free to change the summary if you think this might be the case.
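The page-writer interleaving from the quoted explanation can be replayed deterministically. The toy model below assumes, as that explanation implies, that the kernel's R/W lock makes a new reader wait whenever a writer is queued, even if the lock is currently only read-held. The class and function names are hypothetical and are not Haiku kernel API. It shows that read-locking the same file twice, with a write request arriving in between, leaves both threads waiting on each other, while distinct files do not.

```python
from typing import Dict, List, Optional, Tuple


class WriterPreferringRWLock:
    """Single-threaded toy model of a writer-preferring R/W lock.

    Assumption: a new reader must wait while any writer is queued, even if
    the lock is currently only read-held. Methods return False instead of
    blocking, so the interleaving can be replayed step by step.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.readers: Dict[str, int] = {}      # thread -> nested read count
        self.writer: Optional[str] = None
        self.waiting_writers: List[str] = []

    def try_read_lock(self, thread: str) -> bool:
        # A held or queued writer blocks every new read lock, including a
        # second read lock by a thread that already holds one.
        if self.writer is not None or self.waiting_writers:
            return False
        self.readers[thread] = self.readers.get(thread, 0) + 1
        return True

    def try_write_lock(self, thread: str) -> bool:
        if self.writer is None and not self.readers:
            self.writer = thread
            return True
        if thread not in self.waiting_writers:
            self.waiting_writers.append(thread)  # queue up and "block"
        return False


def replay(files: List[str]) -> List[Tuple[str, str]]:
    """Replay the interleaving described in the ticket.

    `files` lists the underlying files the page writer iterates over; a name
    appearing twice means the same file, and therefore the same lock.
    Returns the (thread, file) pairs that end up blocked.
    """
    locks = {name: WriterPreferringRWLock(name) for name in files}
    blocked: List[Tuple[str, str]] = []

    # 1. The page writer read-locks the first file.
    locks[files[0]].try_read_lock("page_writer")
    # 2. bonnie++ writes the same file (Inode::WriteAt takes the write lock).
    if not locks[files[0]].try_write_lock("bonnie++"):
        blocked.append(("bonnie++", files[0]))
    # 3. The page writer goes on read-locking the remaining files.
    for name in files[1:]:
        if not locks[name].try_read_lock("page_writer"):
            blocked.append(("page_writer", name))
            break
    return blocked


# Same file twice: each thread waits for the other, i.e. a deadlock.
print(replay(["A", "A"]))  # -> [('bonnie++', 'A'), ('page_writer', 'A')]
# Distinct files: only bonnie++ waits, until the page writer unlocks "A".
print(replay(["A", "B"]))  # -> [('bonnie++', 'A')]
```

In the first case the page writer still holds the read lock that bonnie++ is waiting for, and it cannot proceed to release it, which matches the backtrace above showing bonnie++ stuck in `rw_lock_write_lock`.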
comment:3 by , 16 years ago
Milestone: | R1/alpha1 → R1 |
---|---|
Priority: | normal → high |
Since I've never seen this bug, it shouldn't hold up the alpha, I guess.
comment:4 by , 16 years ago
Cc: | added |
---|
comment:5 by , 16 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Since this one hasn't been seen again, and I couldn't find anything suspicious by proof-reading the code, I'm now closing this as invalid.
comment:6 by , 16 years ago
Component: | File Systems/BFS → System/Kernel |
---|---|
Resolution: | invalid |
Status: | closed → reopened |
That's a nice coincidence: after 11 months of silence, and after having closed this bug just yesterday, Michael ran into it just today.
Investigating...
comment:7 by , 16 years ago
Hi Axel. Just some criticism I hope you take constructively: you have been closing bugs because no one reported any further occurrences, and, as this one just showed, that is not a good idea. A bug should only be closed if you can pinpoint what fixed it.
comment:8 by , 16 years ago
Problem potentially fixed in hrev31809; at least the deadlock should be gone.
If that bug decides to show up in another 11 months, we can just reopen it again, I guess :-)
To bga: yes and no. First, I have closed bugs that did not distinguish themselves from other, similar bugs and did not contain any useful information. Second, I am closing (and will continue to close) bugs like this one that happened once but that no one can reproduce; there is no reason to keep those bugs open, as it doesn't help anyone. If a problem happens again, one can just reopen the ticket, no harm done.
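The ticket doesn't record what hrev31809 actually changed. As a hypothetical illustration only, one common way to rule out the double read lock described in the page-writer explanation quoted in comment:2 is to collect the distinct files behind a batch of dirty pages before locking anything, so each file is read-locked at most once. `Page`, `file_id` and `files_to_read_lock` are invented for this sketch and are not Haiku kernel API.

```python
from typing import Iterable, List, NamedTuple


class Page(NamedTuple):
    # Hypothetical stand-in for a dirty cached page: the file it belongs to
    # and its offset within that file.
    file_id: int
    offset: int


def files_to_read_lock(pages: Iterable[Page]) -> List[int]:
    """Return each underlying file exactly once, in ascending ID order.

    Locking every file at most once means the page writer never requests a
    second read lock on a file it already holds, so a writer queued on that
    file in between can no longer wedge it. The sorted order additionally
    gives every caller of this helper the same lock order.
    """
    return sorted({page.file_id for page in pages})


batch = [Page(1243, 0), Page(7, 4096), Page(1243, 8192)]
print(files_to_read_lock(batch))  # -> [7, 1243]
```

The page writer would then read-lock the returned files in that order, write back the pages, and unlock them.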
comment:9 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
comment:10 by , 16 years ago
I am not talking about duplicate bugs or bugs with no information whatsoever. I am talking about bugs (like this one) that you closed based solely on "no one reported it anymore". Bugs serve as documentation of problems that happened and that may happen again. That's why I think closing bugs on the assumption that they fixed themselves is wrong. But whatever.
And, BTW, keeping the bug around also does not do any harm and, even better, serves a purpose.