Opened 16 years ago

Last modified 4 years ago

#2923 new bug

install-haiku target hangs when trying to install to /boot

Reported by: anevilyak Owned by: bonefish
Priority: normal Milestone: R1.1
Component: System/Kernel Version: R1/Development
Keywords: Cc: imker@…
Blocked By: Blocking:
Platform: All

Description

Just to see if it would work, I tried jam -q install-haiku from within Haiku. It got stuck while doing an ln however, in the following way:

 bt 34760
stack trace for thread 34760 "ln"
    kernel stack: 0x98735000 to 0x98739000
      user stack: 0x7efef000 to 0x7ffef000
frame               caller     <image>:function + offset
 0 98738664 (+  48) 800574d8   <kernel_x86> context_switch(thread*: 0x90136800, thread*: 0x819de000) + 0x003c
 1 98738694 (+  64) 800577bf   <kernel_x86> simple_reschedule() + 0x029f
 2 987386d4 (+  48) 8003cf61   <kernel_x86>:_mutex_lock + 0x0149
 3 98738704 (+  48) 8003c0d0   <kernel_x86>:recursive_lock_lock + 0x0068
 4 98738734 (+  48) 8006fef3   <kernel_x86> driver_removed(0x819bed60 "/boot/beos/system/add-ons/kernel/drivers/dev/bus/usb_raw") + 0x0037
 5 98738764 (+ 224) 80070cd1   <kernel_x86> DirectoryWatcher<0x8011f7ac>::EventOccured(NotificationService&: 0x8011f8a0, BPrivate8KMessage*: 0x987388d0, int16: 12899, int32: -1737258800, int8: 3, 0x98738e04, int32: -2146913081, 0x987388d0, 0xa4, 0x98738d34, 0x811ce040, 0x81171bec, 0x0, 0x98738d34) + 0x0439
 6 98738844 (+  64) 8008b344   <kernel_x86> NodeMonitorService<0x8011f8a0>::_SendNotificationMessage(BPrivate8KMessageP32interest&: 0x987388d0, 0x98738cfc, 0x807e561c00000001, 0x3, 0x9a901550, 0x246, 0x81171b0c, 0x98738980, 0x400044, 0x0, 0x400063, 0x0, 0x3, 0x2, 0x1, 0x6b4d7347, 0x1c, 0x0, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x987388fc, 0x400, 0x0, 0xa4, 0x6b4d7347, 0xc8, 0x4e444d4e, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff) + 0x0054
 7 98738884 (+1200) 8008b5db   <kernel_x86> NodeMonitorService<0x8011f8a0>::NotifyEntryCreatedOrRemoved(int32: 2, int32: 3, int64: 4194403, 0x98738e04 "usb_raw", int64: 4194372) + 0x0217
 8 98738d34 (+  64) 8008c49c   <kernel_x86>:notify_entry_removed + 0x002c
 9 98738d74 (+  96) 807de9b7   <bfs> bfs_unlink(fs_volume*: 0x81171ad4, fs_vnode*: 0x9a9036e8, 0x98738e04 "usb_raw") + 0x0153
10 98738dd4 (+ 304) 80099ba9   <kernel_x86> common_unlink(int32: -1, 0x819be100 "/Haiku", false) + 0x0065
11 98738f04 (+  64) 8009eb73   <kernel_x86>:_user_unlink + 0x0093
12 98738f44 (+ 100) 800cf621   <kernel_x86>:handle_syscall + 0x00be
user iframe at 0x98738fa8 (end = 0x98739000)
 eax 0x6e           ebx 0x2b971c        ecx 0x7ffeedf0   edx 0xffff0104
 esi 0x7ffef538     edi 0x7ffef5da      ebp 0x7ffeee1c   esp 0x98738fdc
 eip 0xffff0104  eflags 0x207      user esp 0x7ffeedf0
 vector: 0x63, error code: 0x0
13 98738fa8 (+   0) ffff0104   <commpage>:commpage_syscall + 0x0004
14 7ffeee1c (+ 208) 0020201e   <_APP_>:_term_after (nearest) + 0x04a2
15 7ffeeeec (+ 144) 0020286a   <_APP_>:main + 0x04e2
16 7ffeef7c (+  48) 00201a9b   <_APP_>:_start + 0x005b
17 7ffeefac (+  48) 001008ea   </boot/beos/system/runtime_loader@0x00100000>:unknown + 0x08ea
18 7ffeefdc (+   0) 7ffeefec   1127921:ln_main_stack@0x7efef000 + 0xffffec

id:          34760 (0x87c8)
name:        'ln'
args:        '/bin/ln -sfn ../../bin/usb_raw /Haiku/beos/system/add-ons/kerne'
next:        0x00000000
parent:      0x812f9990 (id = 34724)
children:    0x00000000
num_threads: 1
state:       0
flags:       0x1
io_context:  0x81320fc0
address_space: 0x9a8f5c30
main_thread: 0x90136800
thread_list: 0x90136800
group_id:    34494
session_id:  256

The mutex in question was:

kdebug> call 34760 2 -1
thread 34760, ln
987386d4 8003cf61   <kernel_x86>:_mutex_lock(0x8011f7c8)

kdebug> mutex 0x8011f7c8
mutex 0x8011f7c8:
  name:            legacy driver
  flags:           0x0
  count:           -3
  waiting threads: 34760 34772

Thread 34772:

THREAD: 0x90137000
id:                 34772 (0x87d4)
name:               "Terminal"
all_next:           0x90116800
team_next:          0x00000000
q_next:             0x80115b20
priority:           10 (next 10, I/O: -1)
state:              waiting
next_state:         waiting
cpu:                0x00000000
sig_pending:        0x0 (blocked: 0x0)
in_kernel:          1
waiting for:        mutex 0x8011f7c8
fault_handler:      0x00000000
args:               0x812fc348 0x00000000
entry:              0x8004cedc
team:               0x9b85c330, "Terminal"
  exit.sem:         1009244
  exit.status:      0x0 (No error)
  exit.reason:      0x0
  exit.signal:      0x0
  exit.waiters:
kernel_stack_area:  1127968
kernel_stack_base:  0x98749000
user_stack_area:    1127969
user_stack_base:    0x7efef000
user_local_storage: 0x7ffef000
kernel_errno:       0x0 (No error)
kernel_time:        6112
user_time:          12384
flags:              0x0
architecture dependant section:
        esp: 0x9874cbf8
        ss: 0x00000010
        fpu_state at 0x901373d0

stack trace for thread 34772 "Terminal"
    kernel stack: 0x98749000 to 0x9874d000
      user stack: 0x7efef000 to 0x7ffef000
frame               caller     <image>:function + offset
 0 9874cc54 (+  48) 800574d8   <kernel_x86> context_switch(thread*: 0x90137000, thread*: 0x80115b20) + 0x003c
 1 9874cc84 (+  64) 800577bf   <kernel_x86> simple_reschedule() + 0x029f
 2 9874ccc4 (+  48) 8003cf61   <kernel_x86>:_mutex_lock + 0x0149
 3 9874ccf4 (+  48) 8003c0d0   <kernel_x86>:recursive_lock_lock + 0x0068
 4 9874cd24 (+  48) 800715ed   <kernel_x86> LegacyDevice<0x811d4540>::InitDevice(0x0) + 0x0029
 5 9874cd54 (+ 320) 8006ae06   <kernel_x86> devfs_open(fs_volume*: 0x81171284, fs_vnode*: 0x812ab3b8, int32: 4098, 0x9874cec0) + 0x0092
 6 9874ce94 (+  48) 8009808c   <kernel_x86> open_vnode(vnode*: 0x812ab3b8, int32: 4098, false) + 0x0024
 7 9874cec4 (+  64) 800987ca   <kernel_x86> file_open(int32: -1, 0x819bd080 "/dev", int32: 4098, false) + 0x0062
 8 9874cf04 (+  64) 8009dff1   <kernel_x86>:_user_open + 0x00a1
 9 9874cf44 (+ 100) 800cf621   <kernel_x86>:handle_syscall + 0x00be
user iframe at 0x9874cfa8 (end = 0x9874d000)
 eax 0x60           ebx 0x84671c        ecx 0x7ffee600   edx 0xffff0104
 esi 0x18021fb0     edi 0x1             ebp 0x7ffee62c   esp 0x9874cfdc
 eip 0xffff0104  eflags 0x246      user esp 0x7ffee600
 vector: 0x63, error code: 0x0
10 9874cfa8 (+   0) ffff0104   <commpage>:commpage_syscall + 0x0004
11 7ffee62c (+  48) 00817e82   <libroot.so>:posix_openpt + 0x0026
12 7ffee65c (+ 528) 0021e854   <_APP_> Shell<0x1801fc80>::_Spawn(int32: 25, int32: 80, 0x231182 "UTF8", int32: 2, 0x1801a298 "@ã") + 0x005c
13 7ffee86c (+  64) 0021e40c   <_APP_> Shell<0x1801fc80>::Open(int32: 25, int32: 80, 0x231182 "UTF8", int32: 2, 0x1801a298 "@ã") + 0x0048
14 7ffee8ac (+ 144) 0022470e   <_APP_> TermView<0x18021fb0>::_InitObject(int32: 2, 0x1801a298 "@ã") + 0x056e
15 7ffee93c (+  64) 00223ddb   <_APP_>:__8TermViewiilPPCcl + 0x0167
16 7ffee97c (+  64) 002309e1   <_APP_>:__14CustomTermViewlllPPCcl + 0x0035
17 7ffee9bc (+ 240) 0022fb51   <_APP_> TermWindow<0x180221a8>::_AddTab(Arguments*: 0x1801c2b0) + 0x00ad
18 7ffeeaac (+  80) 0022c339   <_APP_>:__10TermWindowG5BRectPCcP9Arguments + 0x0165
19 7ffeeafc (+  64) 0022056d   <_APP_> TermApp<0x7ffeee50>::_MakeTermWindow(BRect&: 0x7ffeef68) + 0x0075
20 7ffeeb3c (+  96) 0021fc53   <_APP_> TermApp<0x7ffeee50>::ReadyToRun(0x7ffeecd4) + 0x00ef
21 7ffeeb9c (+ 496) 002eb311   <libbe.so> BApplication<0x7ffeee50>::DispatchMessage(BMessage*: 0x1800fd70, BHandler*: 0x7ffeee50) + 0x02f9
22 7ffeed8c (+  64) 002f648d   <libbe.so> BLooper<0x7ffeee50>::task_looper(0x7ffeef5c) + 0x0211
23 7ffeedcc (+  64) 002e9b89   <libbe.so> BApplication<0x7ffeee50>::Run(0x84) + 0x0075
24 7ffeee0c (+ 368) 0021bdd3   <_APP_>:main + 0x002f
25 7ffeef7c (+  48) 00215713   <_APP_>:_start + 0x005b
26 7ffeefac (+  48) 001008ea   </boot/beos/system/runtime_loader@0x00100000>:unknown + 0x08ea
27 7ffeefdc (+   0) 7ffeefec   1127969:Terminal_main_stack@0x7efef000 + 0xffffec

There doesn't seem to be an indicator as to who currently holds the mutex in the info though, so I'm not sure as to how to go about looking for the culprit of the deadlock.

Attachments (1)

syslog_ticket_2923.txt (113.3 KB ) - added by anevilyak 16 years ago.

Download all attachments as: .zip

Change History (25)

comment:1 by anevilyak, 16 years ago

Just noticed that the mutex holder is only stored with KDEBUG enabled...will see if I can duplicate this with debugging on.

comment:2 by anevilyak, 16 years ago

Hm...upon trying to replicate this with debugging enabled I wound up with the following KDL during the "Populating image" phase...

devfs: reload driver "pc_floppy"
PANIC: _mutex_lock(): double lock of 0x80120b00 by thread 3
Welcome to Kernel Debugging Land...
Thread 3 "kernel daemon" running on CPU 0
kdebug> bt
stack trace for thread 3 "kernel daemon"
    kernel stack: 0x8016e000 to 0x80172000
frame               caller     <image>:function + offset
 0 80171a58 (+  48) 8005d96d   <kernel_x86>:invoke_debugger_command + 0x00f5
 1 80171a88 (+  64) 8005d75d   <kernel_x86> invoke_pipe_segment(debugger_command_pipe*: 0x80126f40, int32: 0, 0x0 "<NULL>") + 0x0079
 2 80171ac8 (+  64) 8005dae4   <kernel_x86>:invoke_debugger_command_pipe + 0x009c
 3 80171b08 (+  48) 8005f06c   <kernel_x86> ExpressionParser<0x80171bbc>::_ParseCommandPipe(0x80171bb8) + 0x0234
 4 80171b38 (+  64) 8005e4a6   <kernel_x86> ExpressionParser<0x80171bbc>::EvaluateCommand(0x80116aa0 "bt", 0x80171bb8) + 0x02ba
 5 80171b78 (+ 224) 80060494   <kernel_x86>:evaluate_debug_command + 0x0088
 6 80171c58 (+  64) 8005b92e   <kernel_x86> kernel_debugger_loop() + 0x01ae
 7 80171c98 (+  32) 8005c799   <kernel_x86>:kernel_debugger + 0x004d
 8 80171cb8 (+ 192) 8005c741   <kernel_x86>:panic + 0x0029
 9 80171d78 (+  64) 8003ddc8   <kernel_x86>:_mutex_lock + 0x00f8
10 80171db8 (+  64) 8003c8e1   <kernel_x86> KernelDaemon<0x80120b00>::Register(0x9c5c8e44, int32: -1671643840, 0x5, 0x80120eec, 0x9c5cbd40, 0x80171ed8, 0x9c5c826c, 0x9c5c8e44, 0x9c5cbd40, 0x5, 0x9c5c816d, 0xd2c7, 0x81176b40, 0x0, 0x46, 0x8122fbb8, 0x0, 0x80171e78, 0x800b6f42, 0x8122fbb8, 0xd2c7, 0x80171e98, 0x9c5cbc80, 0x80116694, 0x9c5cbd00, 0x9c5c8e44, 0x80171eb0, 0x80171ea8, 0x9c5ca060, 0x9c5ca02b, 0x80036b18, 0x8122fbb8, 0x0) + 0x007d
11 80171df8 (+  32) 8003cc0c   <kernel_x86>:register_kernel_daemon + 0x001c
12 80171e18 (+ 192) 9c5c826c   </boot/beos/system/add-ons/kernel/drivers/dev/disk/floppy/pc_floppy>:init_driver + 0x0110
13 80171ed8 (+  48) 80072441   <kernel_x86> load_driver(legacy_driver*: 0x81176b40) + 0x0139
14 80171f08 (+  48) 80072a1d   <kernel_x86> reload_driver(legacy_driver*: 0x81176b40) + 0x0031
15 80171f38 (+  64) 80072b61   <kernel_x86> handle_driver_events(NULL, int32: 5629) + 0x0125
16 80171f78 (+  64) 8003cbbc   <kernel_x86> KernelDaemon<0x80120b00>::_DaemonThread(0x819b5000) + 0x006c
17 80171fb8 (+  32) 8003cb4a   <kernel_x86> KernelDaemon<0x80120b00>::_DaemonThreadEntry(NULL) + 0x0012
18 80171fd8 (+  32) 800529b3   <kernel_x86> _create_kernel_thread_kentry() + 0x001b
19 80171ff8 (+2145968136) 80052950   <kernel_x86> thread_kthread_exit() + 0x0000

The mutex in question is the kernel_daemon mutex itself. Attaching a serial log of that run up to the point where it died.

by anevilyak, 16 years ago

Attachment: syslog_ticket_2923.txt added

comment:3 by axeld, 16 years ago

Those all look like bugs in the driver reloading handling. This obviously needs some further work :-)

comment:4 by bonefish, 16 years ago

While driver reloading should be fixed, installing to /boot will most likely crash running apps, since the executed code is replaced by overwriting libraries and applications.

comment:5 by anevilyak, 16 years ago

Couldn't this be dealt with by using unlink() before copying the new data in place, like /bin/install does? Otherwise this seems to imply that it's impossible to use Haiku as primary OS for Haiku development without having at least two partitions, and this isn't the case for any other OS I know of.

comment:6 by axeld, 16 years ago

Of course you can solve this via unlink (everything else would just be a bug). BTW this problem is common between most operating systems I know of, for example Linux, and FreeBSD.

comment:7 by bonefish, 16 years ago

The original problem might be a dup of #2535. At least the legacy driver lock was involved. More is hard to guess with the info available.

comment:8 by anevilyak, 16 years ago

I do actually also run into another one sometimes that involves a double lock panic that I don't recall the mutex of right now, want me to try and duplicate later and post backtrace in case it's related?

in reply to:  8 comment:9 by bonefish, 16 years ago

Replying to anevilyak:

I do actually also run into another one sometimes that involves a double lock panic that I don't recall the mutex of right now, want me to try and duplicate later and post backtrace in case it's related?

Feel free. You could also open another ticket for the double lock reported here. This one is at least well understood -- a new kernel daemon is registered while executing a kernel daemon -- it just needs to be fixed.

comment:10 by anevilyak, 16 years ago

I'll check to make sure it's not that mutex as well, it may have been a different one. Will see.

comment:11 by anevilyak, 16 years ago

Never mind, same mutex as in previously reported trace. Should I go ahead and open a new ticket for the kernel_daemon double lock? And should it be considered Alpha milestone or no?

in reply to:  11 comment:12 by bonefish, 16 years ago

Replying to anevilyak:

Never mind, same mutex as in previously reported trace. Should I go ahead and open a new ticket for the kernel_daemon double lock?

Please do.

And should it be considered Alpha milestone or no?

Yeah, make it Alpha, but normal priority. We can still choose to ignore it, but I guess it's not hard to fix and thus a good victim for low-hanging-bug-hunting breaks. :-)

comment:13 by siarzhuk, 16 years ago

Cc: imker@… added

comment:14 by alexbl, 15 years ago

Could someone give a list of what the specific issues related to installing haiku in haiku are?

I think having a list, possibly with solutions (or at least ideas for solutions) might make this more tractable for someone wanting to fix it.

IMHO, this is absolutely necessary for R1.

comment:15 by axeld, 15 years ago

Version: R1/pre-alpha1R1/Development

There is an easy way around the problem for now: just rename /boot/system to something else, and then copy the new /boot/system folder to the boot volume. Of course, you won't be able to start new applications while there is no /system - copying with Tracker works just fine, though.

That's how I update my Haiku installation currently :-)

comment:16 by scottmc, 14 years ago

Milestone: R1R1/beta1

Is this still an issue? Assuming it is, moving it to hrev1/beta, if it's been fixed, then please verify and close.

comment:17 by anevilyak, 14 years ago

The issues affecting this have not been addressed, so it's indeed still valid.

comment:18 by scottmc, 14 years ago

Blocking: 7665 added

comment:19 by waddlesplash, 10 years ago

Still an issue after the PM merge?

comment:20 by pulkomandy, 10 years ago

Blocking: 7665 removed

Not a boot issue, removing from #7665.

comment:21 by pulkomandy, 10 years ago

Milestone: R1/beta1R1

This is not a common enough scenario to be in beta1. And it seems no one is interested in fixing it anyway.

comment:22 by waddlesplash, 10 years ago

Milestone: R1Unscheduled

You can now use pkgman install on the generated HPKGs, which effectively accomplishes the same thing, so moving this out of R1.

comment:23 by pulkomandy, 8 years ago

Milestone: UnscheduledR1

Moving back to R1. While the feature itself is not really needed now, the bugs that happens when doing such things should be fixed for R1.

comment:24 by pulkomandy, 4 years ago

Milestone: R1R1.1
Note: See TracTickets for help on using tickets.