Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

#2719 closed bug (fixed)

[bfs]: deadlock - mutex bfs inode+24.1243 not released on exit

Reported by: emitrax
Owned by: axeld
Priority: high
Milestone: R1
Component: System/Kernel
Version: R1/pre-alpha1
Keywords:
Cc: imker@…
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

It seems weird, since AutoLocker seems to be used everywhere, but this happened while bonnie++ was writing to the same directory in which I was creating some directories, and mkdir failed with "name too long". Could it be that it didn't release the lock?
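
For reference, the reasoning behind "AutoLocker is used everywhere" is that a scoped locker releases its lock on every exit path, including early error returns. The following is only a minimal sketch of that idea in standard C++, not the BFS code: `ToyLock`, `CreateEntry` and `kMaxNameLength` are hypothetical names invented for illustration, and `std::lock_guard` stands in for AutoLocker.

```cpp
#include <cstddef>
#include <mutex>
#include <string>

// Toy lock that only records whether it is held, so the effect is observable.
// Hypothetical: it is not a real kernel lock and does no blocking.
struct ToyLock {
    bool held = false;
    void lock() { held = true; }
    void unlock() { held = false; }
};

// Assumed limit, for illustration only.
const std::size_t kMaxNameLength = 255;

// Hypothetical stand-in for a mkdir-like operation on a locked directory.
bool CreateEntry(ToyLock& directoryLock, const std::string& name)
{
    // Scoped locker: locks here, unlocks in its destructor.
    std::lock_guard<ToyLock> locker(directoryLock);

    if (name.size() > kMaxNameLength)
        return false;   // early error exit; the destructor still unlocks

    // ... the entry would be created here while the lock is held ...
    return true;
}
```

If every path through the real code is guarded this way, a failing mkdir alone would not be expected to leave the lock held.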

kdebug> thread bonnie++
THREAD: 0x93197000
id:                 4335 (0x10ef)
name:               "bonnie++"
all_next:           0x9316f800
team_next:          0x00000000
q_next:             0x913c5000
priority:           10 (next 10)
state:              waiting
next_state:         waiting
cpu:                0x00000000 
sig_pending:        0x0 (blocked: 0x0)
in_kernel:          1
waiting for:        rwlock 0x90dfb400
fault_handler:      0x00000000
args:               0x90d945a0 0x00000000
entry:              0x8004b4d0
team:               0x90c3e45c, "bonnie++"
  exit.sem:         43239
  exit.status:      0x0 (No error)
  exit.reason:      0x0
  exit.signal:      0x0
  exit.waiters:
kernel_stack_area:  92166
kernel_stack_base:  0x920a6000
user_stack_area:    92168
user_stack_base:    0x7efef000
user_local_storage: 0x7ffef000
kernel_errno:       0x0 (No error)
kernel_time:        16259689
user_time:          31183218
flags:              0x200
architecture dependant section:
        esp: 0x920a9cb8
        ss: 0x00000010
        fpu_state at 0x93197380


kdebug> mutex 0x90dfb400
mutex 0x90dfb400:
  name:            bfs inode+24.1243
  flags:           0xd8
  holder:          -1
  waiting threads: 4335


kdebug> bt 4335
stack trace for thread 4335 "bonnie++"
    kernel stack: 0x920a6000 to 0x920aa000
      user stack: 0x7efef000 to 0x7ffef000
frame            caller     <image>:function + offset
 0 920a9d14 (+  32) 800439ce   <kernel>:context_switch__FP6threadT0 + 0x0026
 1 920a9d34 (+  64) 80043c38   <kernel>:scheduler_reschedule + 0x0248
 2 920a9d74 (+  48) 8003a104   <kernel>:rw_lock_wait__FP7rw_lockb + 0x00c4
 3 920a9da4 (+  64) 8003a666   <kernel>:rw_lock_write_lock + 0x00b6
 4 920a9de4 (+  64) 80594a7e   <bfs>:WriteAt__5InodeR11TransactionxPCUcPUl + 0x010a
 5 920a9e24 (+  96) 805a1857   <bfs>:bfs_write__FP9fs_volumeP8fs_vnodePvxPCvPUl + 0x00d3
 6 920a9e84 (+  64) 8009370f   <kernel>:file_write__FP15file_descriptorxPCvPUl + 0x0067
 7 920a9ec4 (+  80) 800833ad   <kernel>:common_user_io__FixPvUlb + 0x017d
 8 920a9f14 (+  48) 800838a0   <kernel>:_user_write + 0x0028
 9 920a9f44 (+ 100) 800c8852   <kernel>:pre_syscall_debug_done + 0x0002 (nearest)
user iframe at 0x920a9fa8 (end = 0x920aa000)
 eax 0x82           ebx 0x2bbcdc        ecx 0x7ffeeb90   edx 0xffff0104
 esi 0x7ffeec3c     edi 0x14ed          ebp 0x7ffeebcc   esp 0x920a9fdc
 eip 0xffff0104  eflags 0x203      user esp 0x7ffeeb90
 vector: 0x63, error code: 0x0
10 920a9fa8 (+   0) ffff0104
11 7ffeebcc (+  48) 00203196   </boot/beos/bin/bonnie++@0x00200000>:unknown + 0x3196
12 7ffeebfc (+ 128) 00206f89   </boot/beos/bin/bonnie++@0x00200000>:unknown + 0x6f89
13 7ffeec7c (+ 768) 00206ac9   </boot/beos/bin/bonnie++@0x00200000>:unknown + 0x6ac9
14 7ffeef7c (+  48) 002028ff   </boot/beos/bin/bonnie++@0x00200000>:unknown + 0x28ff
15 7ffeefac (+  48) 001008ea   92169:runtime_loader_seg0ro@0x00100000 + 0x8ea
16 7ffeefdc (+   0) 7ffeefec   92168:bonnie++_main_stack@0x7efef000 + 0xffffec

Change History (10)

comment:1 Changed 11 years ago by bonefish

The info is unfortunately not very helpful. I don't see how the description fits with the summary. Furthermore you should use the "rwlock" command when printing info for an R/W lock.

This might be a deadlock with the page writer. ATM it iteratively acquires read locks for the underlying files, which, if the same file occurs twice and another thread tries to write-lock it in between, would indeed lead to a deadlock.
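
The interleaving described above can be sketched with a small state model. This is a hedged illustration only: `RWLockModel` and `PageWriterGetsStuck` are hypothetical names, nothing here blocks or uses the kernel's rw_lock, and the model assumes that new readers must queue behind a waiting writer.

```cpp
// Toy model of an R/W lock in which a waiting writer holds back new readers.
// It only tracks state; "false" means the caller would have to wait.
struct RWLockModel {
    int activeReaders = 0;
    int waitingWriters = 0;

    // A reader is admitted only if no writer is already queued.
    bool TryReadLock()
    {
        if (waitingWriters > 0)
            return false;
        ++activeReaders;
        return true;
    }

    // A writer is admitted only if no reader is active; otherwise it queues.
    bool TryWriteLock()
    {
        if (activeReaders == 0)
            return true;
        ++waitingWriters;
        return false;
    }
};

// Returns true if the thread that read-locks the same file twice would have
// to wait on its second read lock while still holding the first one.
bool PageWriterGetsStuck(bool writerArrivesInBetween)
{
    RWLockModel inodeLock;

    inodeLock.TryReadLock();            // first occurrence of the file: granted
    if (writerArrivesInBetween)
        inodeLock.TryWriteLock();       // other thread queues behind the reader

    // Second occurrence of the same file.
    bool secondGranted = inodeLock.TryReadLock();
    return !secondGranted;
}
```

In this model `PageWriterGetsStuck(true)` reports a stuck thread: the writer waits for the first read lock to be released, while its holder waits behind the writer. `PageWriterGetsStuck(false)` does not, which would fit a bug that shows up only rarely.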

comment:2 in reply to:  1 Changed 11 years ago by emitrax

Replying to bonefish:

> The info is unfortunately not very helpful. I don't see how the description fits with the summary. Furthermore you should use the "rwlock" command when printing info for an R/W lock.

I thought the inode was that of the directory, since I was also writing to the same directory with another team, and jumped to the wrong conclusion that failing without releasing the lock would be the cause.

> This might be a deadlock with the page writer. ATM it iteratively acquires read locks for the underlying files, which, if the same file occurs twice and another thread tries to write-lock it in between, would indeed lead to a deadlock.

This seems more reasonable. Feel free to change the summary if you think this might be the case.

comment:3 Changed 10 years ago by axeld

Milestone: R1/alpha1 → R1
Priority: normal → high

Since I've never seen this bug, it shouldn't hold up the alpha, I guess.

comment:4 Changed 10 years ago by siarzhuk

Cc: imker@… added

comment:5 Changed 10 years ago by axeld

Resolution: invalid
Status: new → closed

Since this one hasn't been seen again, and I couldn't find anything suspicious by proof-reading the code, I'm now closing this as invalid.

comment:6 Changed 10 years ago by axeld

Component: File Systems/BFS → System/Kernel
Resolution: invalid
Status: closed → reopened

That's a nice coincidence: after 11 months of silence, and after having closed this bug just yesterday, Michael ran into it just today.

Investigating...

comment:7 Changed 10 years ago by bga

Hi Axel. Just some criticism I hope you take constructively: You have been closing bugs based on the fact that no one else reported any more occurrences of them and, as this just showed, this is not a good idea. A bug should only be closed if you can pinpoint what fixed it.

comment:8 Changed 10 years ago by axeld

Problem potentially fixed in hrev31809; at least the deadlock should be gone.

If that bug decides to show up in another 11 months, we can just reopen it again, I guess :-)

To bga: yes and no. First, I have closed bugs that did not distinguish themselves from other, similar bugs, and did not contain any useful information. Then, I am closing bugs like this one (and will continue to do so) that happened once but that no one can reproduce; there is no reason to keep those bugs open, as it doesn't help anyone. But if a problem happens again, one can just reopen the ticket, no harm done.

comment:9 Changed 10 years ago by axeld

Resolution: fixed
Status: reopened → closed

comment:10 Changed 10 years ago by bga

I am not talking about duplicate bugs or bugs with no information whatsoever. I am talking about bugs (like this one) that you closed based solely on "no one reported it anymore". Bugs serve as documentation of problems that happened and that may happen again. That's why I think closing bugs based on the assumption that they fixed themselves is wrong. But whatever.

And, BTW, keeping the bug around also does not do any harm and, even better, serves a purpose.
