Opened 16 years ago
Last modified 4 years ago
#2923 new bug
install-haiku target hangs when trying to install to /boot
Reported by: | anevilyak | Owned by: | bonefish |
---|---|---|---|
Priority: | normal | Milestone: | R1.1 |
Component: | System/Kernel | Version: | R1/Development |
Keywords: | Cc: | imker@… | |
Blocked By: | Blocking: | ||
Platform: | All |
Description
Just to see if it would work, I tried jam -q install-haiku from within Haiku. It got stuck while doing an ln however, in the following way:
bt 34760 stack trace for thread 34760 "ln" kernel stack: 0x98735000 to 0x98739000 user stack: 0x7efef000 to 0x7ffef000 frame caller <image>:function + offset 0 98738664 (+ 48) 800574d8 <kernel_x86> context_switch(thread*: 0x90136800, thread*: 0x819de000) + 0x003c 1 98738694 (+ 64) 800577bf <kernel_x86> simple_reschedule() + 0x029f 2 987386d4 (+ 48) 8003cf61 <kernel_x86>:_mutex_lock + 0x0149 3 98738704 (+ 48) 8003c0d0 <kernel_x86>:recursive_lock_lock + 0x0068 4 98738734 (+ 48) 8006fef3 <kernel_x86> driver_removed(0x819bed60 "/boot/beos/system/add-ons/kernel/drivers/dev/bus/usb_raw") + 0x0037 5 98738764 (+ 224) 80070cd1 <kernel_x86> DirectoryWatcher<0x8011f7ac>::EventOccured(NotificationService&: 0x8011f8a0, BPrivate8KMessage*: 0x987388d0, int16: 12899, int32: -1737258800, int8: 3, 0x98738e04, int32: -2146913081, 0x987388d0, 0xa4, 0x98738d34, 0x811ce040, 0x81171bec, 0x0, 0x98738d34) + 0x0439 6 98738844 (+ 64) 8008b344 <kernel_x86> NodeMonitorService<0x8011f8a0>::_SendNotificationMessage(BPrivate8KMessageP32interest&: 0x987388d0, 0x98738cfc, 0x807e561c00000001, 0x3, 0x9a901550, 0x246, 0x81171b0c, 0x98738980, 0x400044, 0x0, 0x400063, 0x0, 0x3, 0x2, 0x1, 0x6b4d7347, 0x1c, 0x0, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x987388fc, 0x400, 0x0, 0xa4, 0x6b4d7347, 0xc8, 0x4e444d4e, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff) + 0x0054 7 98738884 (+1200) 8008b5db <kernel_x86> NodeMonitorService<0x8011f8a0>::NotifyEntryCreatedOrRemoved(int32: 2, int32: 3, int64: 4194403, 0x98738e04 "usb_raw", int64: 4194372) + 0x0217 8 98738d34 (+ 64) 8008c49c <kernel_x86>:notify_entry_removed + 0x002c 9 98738d74 (+ 96) 807de9b7 <bfs> bfs_unlink(fs_volume*: 0x81171ad4, fs_vnode*: 0x9a9036e8, 0x98738e04 "usb_raw") + 0x0153 10 98738dd4 (+ 304) 80099ba9 <kernel_x86> common_unlink(int32: -1, 0x819be100 "/Haiku", false) + 0x0065 11 98738f04 (+ 64) 8009eb73 <kernel_x86>:_user_unlink + 0x0093 12 98738f44 (+ 100) 800cf621 <kernel_x86>:handle_syscall + 0x00be user iframe at 0x98738fa8 (end = 0x98739000) eax 0x6e ebx 0x2b971c ecx 0x7ffeedf0 edx 0xffff0104 esi 0x7ffef538 edi 0x7ffef5da ebp 0x7ffeee1c esp 0x98738fdc eip 0xffff0104 eflags 0x207 user esp 0x7ffeedf0 vector: 0x63, error code: 0x0 13 98738fa8 (+ 0) ffff0104 <commpage>:commpage_syscall + 0x0004 14 7ffeee1c (+ 208) 0020201e <_APP_>:_term_after (nearest) + 0x04a2 15 7ffeeeec (+ 144) 0020286a <_APP_>:main + 0x04e2 16 7ffeef7c (+ 48) 00201a9b <_APP_>:_start + 0x005b 17 7ffeefac (+ 48) 001008ea </boot/beos/system/runtime_loader@0x00100000>:unknown + 0x08ea 18 7ffeefdc (+ 0) 7ffeefec 1127921:ln_main_stack@0x7efef000 + 0xffffec id: 34760 (0x87c8) name: 'ln' args: '/bin/ln -sfn ../../bin/usb_raw /Haiku/beos/system/add-ons/kerne' next: 0x00000000 parent: 0x812f9990 (id = 34724) children: 0x00000000 num_threads: 1 state: 0 flags: 0x1 io_context: 0x81320fc0 address_space: 0x9a8f5c30 main_thread: 0x90136800 thread_list: 0x90136800 group_id: 34494 session_id: 256
The mutex in question was:
kdebug> call 34760 2 -1 thread 34760, ln 987386d4 8003cf61 <kernel_x86>:_mutex_lock(0x8011f7c8) kdebug> mutex 0x8011f7c8 mutex 0x8011f7c8: name: legacy driver flags: 0x0 count: -3 waiting threads: 34760 34772
Thread 34772:
THREAD: 0x90137000 id: 34772 (0x87d4) name: "Terminal" all_next: 0x90116800 team_next: 0x00000000 q_next: 0x80115b20 priority: 10 (next 10, I/O: -1) state: waiting next_state: waiting cpu: 0x00000000 sig_pending: 0x0 (blocked: 0x0) in_kernel: 1 waiting for: mutex 0x8011f7c8 fault_handler: 0x00000000 args: 0x812fc348 0x00000000 entry: 0x8004cedc team: 0x9b85c330, "Terminal" exit.sem: 1009244 exit.status: 0x0 (No error) exit.reason: 0x0 exit.signal: 0x0 exit.waiters: kernel_stack_area: 1127968 kernel_stack_base: 0x98749000 user_stack_area: 1127969 user_stack_base: 0x7efef000 user_local_storage: 0x7ffef000 kernel_errno: 0x0 (No error) kernel_time: 6112 user_time: 12384 flags: 0x0 architecture dependant section: esp: 0x9874cbf8 ss: 0x00000010 fpu_state at 0x901373d0 stack trace for thread 34772 "Terminal" kernel stack: 0x98749000 to 0x9874d000 user stack: 0x7efef000 to 0x7ffef000 frame caller <image>:function + offset 0 9874cc54 (+ 48) 800574d8 <kernel_x86> context_switch(thread*: 0x90137000, thread*: 0x80115b20) + 0x003c 1 9874cc84 (+ 64) 800577bf <kernel_x86> simple_reschedule() + 0x029f 2 9874ccc4 (+ 48) 8003cf61 <kernel_x86>:_mutex_lock + 0x0149 3 9874ccf4 (+ 48) 8003c0d0 <kernel_x86>:recursive_lock_lock + 0x0068 4 9874cd24 (+ 48) 800715ed <kernel_x86> LegacyDevice<0x811d4540>::InitDevice(0x0) + 0x0029 5 9874cd54 (+ 320) 8006ae06 <kernel_x86> devfs_open(fs_volume*: 0x81171284, fs_vnode*: 0x812ab3b8, int32: 4098, 0x9874cec0) + 0x0092 6 9874ce94 (+ 48) 8009808c <kernel_x86> open_vnode(vnode*: 0x812ab3b8, int32: 4098, false) + 0x0024 7 9874cec4 (+ 64) 800987ca <kernel_x86> file_open(int32: -1, 0x819bd080 "/dev", int32: 4098, false) + 0x0062 8 9874cf04 (+ 64) 8009dff1 <kernel_x86>:_user_open + 0x00a1 9 9874cf44 (+ 100) 800cf621 <kernel_x86>:handle_syscall + 0x00be user iframe at 0x9874cfa8 (end = 0x9874d000) eax 0x60 ebx 0x84671c ecx 0x7ffee600 edx 0xffff0104 esi 0x18021fb0 edi 0x1 ebp 0x7ffee62c esp 0x9874cfdc eip 0xffff0104 eflags 0x246 user esp 0x7ffee600 vector: 0x63, error code: 0x0 10 9874cfa8 (+ 0) ffff0104 <commpage>:commpage_syscall + 0x0004 11 7ffee62c (+ 48) 00817e82 <libroot.so>:posix_openpt + 0x0026 12 7ffee65c (+ 528) 0021e854 <_APP_> Shell<0x1801fc80>::_Spawn(int32: 25, int32: 80, 0x231182 "UTF8", int32: 2, 0x1801a298 "@ã") + 0x005c 13 7ffee86c (+ 64) 0021e40c <_APP_> Shell<0x1801fc80>::Open(int32: 25, int32: 80, 0x231182 "UTF8", int32: 2, 0x1801a298 "@ã") + 0x0048 14 7ffee8ac (+ 144) 0022470e <_APP_> TermView<0x18021fb0>::_InitObject(int32: 2, 0x1801a298 "@ã") + 0x056e 15 7ffee93c (+ 64) 00223ddb <_APP_>:__8TermViewiilPPCcl + 0x0167 16 7ffee97c (+ 64) 002309e1 <_APP_>:__14CustomTermViewlllPPCcl + 0x0035 17 7ffee9bc (+ 240) 0022fb51 <_APP_> TermWindow<0x180221a8>::_AddTab(Arguments*: 0x1801c2b0) + 0x00ad 18 7ffeeaac (+ 80) 0022c339 <_APP_>:__10TermWindowG5BRectPCcP9Arguments + 0x0165 19 7ffeeafc (+ 64) 0022056d <_APP_> TermApp<0x7ffeee50>::_MakeTermWindow(BRect&: 0x7ffeef68) + 0x0075 20 7ffeeb3c (+ 96) 0021fc53 <_APP_> TermApp<0x7ffeee50>::ReadyToRun(0x7ffeecd4) + 0x00ef 21 7ffeeb9c (+ 496) 002eb311 <libbe.so> BApplication<0x7ffeee50>::DispatchMessage(BMessage*: 0x1800fd70, BHandler*: 0x7ffeee50) + 0x02f9 22 7ffeed8c (+ 64) 002f648d <libbe.so> BLooper<0x7ffeee50>::task_looper(0x7ffeef5c) + 0x0211 23 7ffeedcc (+ 64) 002e9b89 <libbe.so> BApplication<0x7ffeee50>::Run(0x84) + 0x0075 24 7ffeee0c (+ 368) 0021bdd3 <_APP_>:main + 0x002f 25 7ffeef7c (+ 48) 00215713 <_APP_>:_start + 0x005b 26 7ffeefac (+ 48) 001008ea </boot/beos/system/runtime_loader@0x00100000>:unknown + 0x08ea 27 7ffeefdc (+ 0) 7ffeefec 1127969:Terminal_main_stack@0x7efef000 + 0xffffec
There doesn't seem to be an indicator as to who currently holds the mutex in the info though, so I'm not sure as to how to go about looking for the culprit of the deadlock.
Attachments (1)
Change History (25)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
Hm...upon trying to replicate this with debugging enabled I wound up with the following KDL during the "Populating image" phase...
devfs: reload driver "pc_floppy" PANIC: _mutex_lock(): double lock of 0x80120b00 by thread 3 Welcome to Kernel Debugging Land... Thread 3 "kernel daemon" running on CPU 0 kdebug> bt stack trace for thread 3 "kernel daemon" kernel stack: 0x8016e000 to 0x80172000 frame caller <image>:function + offset 0 80171a58 (+ 48) 8005d96d <kernel_x86>:invoke_debugger_command + 0x00f5 1 80171a88 (+ 64) 8005d75d <kernel_x86> invoke_pipe_segment(debugger_command_pipe*: 0x80126f40, int32: 0, 0x0 "<NULL>") + 0x0079 2 80171ac8 (+ 64) 8005dae4 <kernel_x86>:invoke_debugger_command_pipe + 0x009c 3 80171b08 (+ 48) 8005f06c <kernel_x86> ExpressionParser<0x80171bbc>::_ParseCommandPipe(0x80171bb8) + 0x0234 4 80171b38 (+ 64) 8005e4a6 <kernel_x86> ExpressionParser<0x80171bbc>::EvaluateCommand(0x80116aa0 "bt", 0x80171bb8) + 0x02ba 5 80171b78 (+ 224) 80060494 <kernel_x86>:evaluate_debug_command + 0x0088 6 80171c58 (+ 64) 8005b92e <kernel_x86> kernel_debugger_loop() + 0x01ae 7 80171c98 (+ 32) 8005c799 <kernel_x86>:kernel_debugger + 0x004d 8 80171cb8 (+ 192) 8005c741 <kernel_x86>:panic + 0x0029 9 80171d78 (+ 64) 8003ddc8 <kernel_x86>:_mutex_lock + 0x00f8 10 80171db8 (+ 64) 8003c8e1 <kernel_x86> KernelDaemon<0x80120b00>::Register(0x9c5c8e44, int32: -1671643840, 0x5, 0x80120eec, 0x9c5cbd40, 0x80171ed8, 0x9c5c826c, 0x9c5c8e44, 0x9c5cbd40, 0x5, 0x9c5c816d, 0xd2c7, 0x81176b40, 0x0, 0x46, 0x8122fbb8, 0x0, 0x80171e78, 0x800b6f42, 0x8122fbb8, 0xd2c7, 0x80171e98, 0x9c5cbc80, 0x80116694, 0x9c5cbd00, 0x9c5c8e44, 0x80171eb0, 0x80171ea8, 0x9c5ca060, 0x9c5ca02b, 0x80036b18, 0x8122fbb8, 0x0) + 0x007d 11 80171df8 (+ 32) 8003cc0c <kernel_x86>:register_kernel_daemon + 0x001c 12 80171e18 (+ 192) 9c5c826c </boot/beos/system/add-ons/kernel/drivers/dev/disk/floppy/pc_floppy>:init_driver + 0x0110 13 80171ed8 (+ 48) 80072441 <kernel_x86> load_driver(legacy_driver*: 0x81176b40) + 0x0139 14 80171f08 (+ 48) 80072a1d <kernel_x86> reload_driver(legacy_driver*: 0x81176b40) + 0x0031 15 80171f38 (+ 64) 80072b61 <kernel_x86> handle_driver_events(NULL, int32: 5629) + 0x0125 16 80171f78 (+ 64) 8003cbbc <kernel_x86> KernelDaemon<0x80120b00>::_DaemonThread(0x819b5000) + 0x006c 17 80171fb8 (+ 32) 8003cb4a <kernel_x86> KernelDaemon<0x80120b00>::_DaemonThreadEntry(NULL) + 0x0012 18 80171fd8 (+ 32) 800529b3 <kernel_x86> _create_kernel_thread_kentry() + 0x001b 19 80171ff8 (+2145968136) 80052950 <kernel_x86> thread_kthread_exit() + 0x0000
The mutex in question is the kernel_daemon mutex itself. Attaching a serial log of that run up to the point where it died.
by , 16 years ago
Attachment: | syslog_ticket_2923.txt added |
---|
comment:3 by , 16 years ago
Those all look like bugs in the driver reloading handling. This obviously needs some further work :-)
comment:4 by , 16 years ago
While driver reloading should be fixed, installing to /boot will most likely crash running apps, since the executed code is replaced by overwriting libraries and applications.
comment:5 by , 16 years ago
Couldn't this be dealt with by using unlink() before copying the new data in place, like /bin/install does? Otherwise this seems to imply that it's impossible to use Haiku as primary OS for Haiku development without having at least two partitions, and this isn't the case for any other OS I know of.
comment:6 by , 16 years ago
Of course you can solve this via unlink (everything else would just be a bug). BTW this problem is common between most operating systems I know of, for example Linux, and FreeBSD.
comment:7 by , 16 years ago
The original problem might be a dup of #2535. At least the legacy driver lock was involved. More is hard to guess with the info available.
follow-up: 9 comment:8 by , 16 years ago
I do actually also run into another one sometimes that involves a double lock panic that I don't recall the mutex of right now, want me to try and duplicate later and post backtrace in case it's related?
comment:9 by , 16 years ago
Replying to anevilyak:
I do actually also run into another one sometimes that involves a double lock panic that I don't recall the mutex of right now, want me to try and duplicate later and post backtrace in case it's related?
Feel free. You could also open another ticket for the double lock reported here. This one is at least well understood -- a new kernel daemon is registered while executing a kernel daemon -- it just needs to be fixed.
comment:10 by , 16 years ago
I'll check to make sure it's not that mutex as well, it may have been a different one. Will see.
follow-up: 12 comment:11 by , 16 years ago
Never mind, same mutex as in previously reported trace. Should I go ahead and open a new ticket for the kernel_daemon double lock? And should it be considered Alpha milestone or no?
comment:12 by , 16 years ago
Replying to anevilyak:
Never mind, same mutex as in previously reported trace. Should I go ahead and open a new ticket for the kernel_daemon double lock?
Please do.
And should it be considered Alpha milestone or no?
Yeah, make it Alpha, but normal priority. We can still choose to ignore it, but I guess it's not hard to fix and thus a good victim for low-hanging-bug-hunting breaks. :-)
comment:13 by , 16 years ago
Cc: | added |
---|
comment:14 by , 15 years ago
Could someone give a list of what the specific issues related to installing haiku in haiku are?
I think having a list, possibly with solutions (or at least ideas for solutions) might make this more tractable for someone wanting to fix it.
IMHO, this is absolutely necessary for R1.
comment:15 by , 15 years ago
Version: | R1/pre-alpha1 → R1/Development |
---|
There is an easy way around the problem for now: just rename /boot/system to something else, and then copy the new /boot/system folder to the boot volume. Of course, you won't be able to start new applications while there is no /system - copying with Tracker works just fine, though.
That's how I update my Haiku installation currently :-)
comment:16 by , 14 years ago
Milestone: | R1 → R1/beta1 |
---|
Is this still an issue? Assuming it is, moving it to hrev1/beta, if it's been fixed, then please verify and close.
comment:17 by , 14 years ago
The issues affecting this have not been addressed, so it's indeed still valid.
comment:18 by , 14 years ago
Blocking: | 7665 added |
---|
comment:21 by , 10 years ago
Milestone: | R1/beta1 → R1 |
---|
This is not a common enough scenario to be in beta1. And it seems no one is interested in fixing it anyway.
comment:22 by , 10 years ago
Milestone: | R1 → Unscheduled |
---|
You can now use pkgman install
on the generated HPKGs, which effectively accomplishes the same thing, so moving this out of R1.
comment:23 by , 8 years ago
Milestone: | Unscheduled → R1 |
---|
Moving back to R1. While the feature itself is not really needed now, the bugs that happens when doing such things should be fixed for R1.
comment:24 by , 4 years ago
Milestone: | R1 → R1.1 |
---|
Just noticed that the mutex holder is only stored with KDEBUG enabled...will see if I can duplicate this with debugging on.