Opened 10 years ago

Closed 9 years ago

Last modified 9 years ago

#12237 closed bug (fixed)

[launch_daemon] boot hangs at rocket icon

Reported by: diver Owned by: axeld
Priority: blocker Milestone: R1/beta1
Component: Servers/launch_daemon Version: R1/Development
Keywords: Cc:
Blocked By: Blocking: #12148
Platform: All

Description (last modified by diver)

Ever since I updated to the launch_daemon in hrev49432, the boot process sometimes hangs at the rocket icon. It happens in roughly 1 out of 5 boots.

Attachments (3)

boot_hang.png (72.8 KB ) - added by diver 10 years ago.
syslog_hrev49453.txt (45.0 KB ) - added by luroh 10 years ago.
kdl_session.png (46.0 KB ) - added by diver 9 years ago.


Change History (25)

by diver, 10 years ago

Attachment: boot_hang.png added

comment:1 by diver, 10 years ago

Description: modified (diff)

comment:2 by diver, 10 years ago

Note that you can ignore the "Launching x-vnd.haiku-midi_server failed: No such file or directory" message, as midi_server was blacklisted here. Enabling it didn't change anything.

comment:3 by luroh, 10 years ago

I've noticed the same in VirtualBox 5.0.0, attaching syslog from hung boot attempt, gcc2h hrev49453.

by luroh, 10 years ago

Attachment: syslog_hrev49453.txt added

comment:4 by waddlesplash, 10 years ago

Priority: high → blocker

Happened to another user who was trying Haiku out (gcc2hybrid), and happens very often on my fresh x86_64 install. Elevating to Blocker.

comment:5 by diver, 9 years ago

Also reproducible in VMware Fusion 7.1.2.

comment:6 by axeld, 9 years ago

Please do a "teams", and a "threads <id-of-launch-daemon>" and attach the output here when it happens next time, thanks!

comment:7 by axeld, 9 years ago

In KDL, I mean.

comment:8 by diver, 9 years ago

Entered KDL on OS X with VBoxManage controlvm Haiku keyboardputscancode 38 54 20
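
For anyone who wants to script this, here is a minimal sketch that builds the same VBoxManage invocation. It assumes the three values are PC set-1 "make" scancodes for Alt (38), SysRq (54) and D (20), i.e. the Alt+SysRq+D combination that enters KDL. The helper name kdl_scancode_command and the optional key-release codes are illustrative additions, not part of the original command.

```python
def kdl_scancode_command(vm_name: str, with_release: bool = False) -> list[str]:
    """Build the VBoxManage argument list that sends Alt+SysRq+D to a VM.

    Assumption: 0x38, 0x54 and 0x20 are set-1 make codes for Alt, SysRq and D.
    """
    make_codes = [0x38, 0x54, 0x20]
    codes = list(make_codes)
    if with_release:
        # Hypothetical extension: release the keys in reverse order.
        # In scancode set 1 the break code is the make code with bit 7 set.
        codes += [code | 0x80 for code in reversed(make_codes)]
    return [
        "VBoxManage", "controlvm", vm_name, "keyboardputscancode",
        *[f"{code:02x}" for code in codes],
    ]


if __name__ == "__main__":
    # Prints: VBoxManage controlvm Haiku keyboardputscancode 38 54 20
    print(" ".join(kdl_scancode_command("Haiku")))
```

The list can be passed to subprocess.run() as is. The command in this comment sends only the key-press codes, which was enough to enter KDL here.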

by diver, 9 years ago

Attachment: kdl_session.png added

comment:9 by michaelvoliveira, 9 years ago

The same here!

comment:10 by fishpond, 9 years ago

On my core i7 quad, boot went through on two occasions out of 20. Disabling SMP works around the problem reliably. On those sessions where things went through with SMP enabled, this behaviour was no longer observed.

So, could the blocking be related to the scheduler, i.e. #10454? Would be interesting to know whether users impacted by this one also suffer from #10454 and whether the problem is more present with quad+ core systems.


comment:11 by fishpond, 9 years ago

Another observation: This bug hits me 99% of the time when booting from a USB stick and then switching to a hard disk boot volume in the boot options. The failure rate drops to about 50% when I disconnect the USB stick before continuing the boot process. Don't know if that helps pinpoint the root cause...

comment:12 by waddlesplash, 9 years ago

Blocking: 12148 added

(In #12148) There's a newer issue that has more info than this one, closing as duplicate.

comment:13 by mmlr, 9 years ago

Please recheck with hrev49561, as it might affect this. If it does make a difference, there could be a timing-dependent deadlock that is either related to, or just made more obvious by, the regression fixed in that change.

comment:14 by diver, 9 years ago

hrev49564. Did 20 reboots (which was more than needed to reproduce it) and was about to close this one when it hung :/

comment:15 by vidrep, 9 years ago

I was having the same issue pre-launch daemon (#12148). Is there a possibility that the launch daemon is not responsible at all for the symptom?

comment:16 by diver, 9 years ago

Never happened for me pre-launch daemon.

comment:17 by fishpond, 9 years ago

Blacklisting cpuidle as indicated in #12148 removes the problem for me on the i7 quad.


comment:18 by mmlr, 9 years ago

Please recheck with hrev49570 as it fixes a use-after-free due to a race condition in registrar that was made more likely by the parallel application launch process introduced with the launch_daemon.

Generally this ticket might track two unrelated problems with the same generic symptoms.

comment:19 by diver, 9 years ago

hrev49570. Hang after 10 reboots. Anything I can try?

comment:20 by diver, 9 years ago

kdebug> teams
team           id  parent      name
0x828bf600      1  0x00000000  kernel_team
0x828b6c00    171  0x828bf600  launch_daemon
0x828b6600    173  0x828b6c00  app_server
0x828b6000    174  0x828b6c00  cddb_daemon
0x828b5a00    175  0x828b6c00  debug_server
0x828b4e00    177  0x828b6c00  midi_server
0x828b5400    178  0x828b6c00  mount_server
0x828b4800    179  0x828b6c00  net_server
0x828b4200    180  0x828b6c00  notification_server
0x828b3c00    181  0x828b6c00  package_daemon
0x828b3600    182  0x828b6c00  power_daemon
0x828b3000    183  0x828b6c00  print_server
0x828b2a00    184  0x828b6c00  registrar
0x828b2400    185  0x828b6c00  syslog_daemon

kdebug> threads
thread         id  state     wait for  object      cpu pri  stack      team  name
0x801d1f00      1  running   -                       0   0  0x81001000    1  idle thread 1
0x82a0fa20      2  waiting   cvar      0x801e3328    -  15  0x81105000    1  undertaker
0x82a0f5d0      3  zzz                               -   5  0x81a19000    1  kernel daemon
0x82a0f180      4  zzz                               -   5  0x81a1d000    1  resource resizer
0x82a0ed30      5  waiting   sem       12            -  21  0x81a22000    1  acpi_task
0x82a0e8e0      6  zzz                               -   1  0x81a2a000    1  page scrubber
0x82a0e490      7  waiting   cvar      0x801ec934    -  11  0x81a2e000    1  page writer
0x82a0e040      8  waiting   cvar      0x801ec974    -  10  0x81a32000    1  page daemon
0x82a0dbf0      9  waiting   cvar      0x801ec7d0    - 110  0x81a36000    1  object cache resizer
0x82a0d7a0     10  waiting   sem       29            -   5  0x81a3a000    1  low resource manager
0x82a0d350     11  waiting   cvar      0x801cb2b8    -  10  0x81a42000    1  dpc: normal priority
0x82a0cf00     12  waiting   cvar      0x801cb2f8    -  20  0x81a46000    1  dpc: high priority
0x82a0cab0     13  waiting   cvar      0x801cb338    - 100  0x81a4a000    1  dpc: real-time priority
0x82a0c660     14  waiting   sem       42            -   5  0x81a4e000    1  block notifier/writer
0x82a0bdc0    141  waiting   sem       551           -  20  0x81b56000    1  ohci finish thread
0x82a0b970    142  zzz                               -   5  0x81b5a000    1  usb explore
0x82a0b520    143  zzz                               -  10  0x81864000    1  media checker
0x82a0b0d0    157  waiting   sem       618           -  10  0x81868000    1  locked_pool_enlarger
0x82a0ac80    158  waiting   sem       634           -  20  0x81b5e000    1  scsi_bus_service
0x82a0a830    159  waiting   sem       701           -  20  0x81b66000    1  scsi_bus_service
0x82a0a3e0    160  waiting   sem       731           -  20  0x81b6e000    1  scsi_bus_service
0x82a09f90    161  waiting   cvar      0x8296dc54    -  12  0x81bf6000    1  scsi scheduler 1
0x82a09b40    162  waiting   cvar      0x8296dc7c    -  12  0x81bfa000    1  scsi notifier 1
0x82a096f0    163  waiting   cvar      0x8296db54    -  12  0x81c91000    1  scsi scheduler 2
0x82a092a0    164  waiting   cvar      0x8296db7c    -  12  0x81c95000    1  scsi notifier 2
0x82a08e50    169  waiting   sem       806           -  10  0x81914000    1  run_on_exit_loop
0x82a08a00    170  waiting   sem       810           -  10  0x81a0b000    1  invalidate_loop
0x82a085b0    171  waiting   cvar      0x828670f8    -  10  0x81703000  171  launch_daemon
0x82a0c210    172  waiting   sem       829           -  10  0x81a52000  171  worker
0x82a08160    173  waiting   cvar      0x82825cd0    -  10  0x81707000  173  app_server
0x82a07d10    174  waiting   cvar      0x82825910    -  10  0x8170b000  174  cddb_daemon
0x82a078c0    175  waiting   cvar      0xd394adb0    -  10  0x8170f000  175  debug_server
0x82a07020    177  waiting   cvar      0xd394abd0    -  10  0x81968000  177  midi_server
0x82a07470    178  waiting   cvar      0xd394a630    -  10  0x81964000  178  mount_server
0x82a06bd0    179  waiting   cvar      0xd394a3b0    -  10  0x8196c000  179  net_server
0x82a06780    180  waiting   cvar      0xd394a090    -  10  0x81970000  180  notification_server
0x82a06330    181  waiting   cvar      0xd394fd70    -  10  0x81985000  181  package_daemon
0x82a05ee0    182  waiting   cvar      0xd394faf0    -  10  0x81989000  182  power_daemon
0x82a05a90    183  waiting   cvar      0xd394f730    -  10  0x8198d000  183  print_server
0x82a05640    184  waiting   cvar      0xd394f4b0    -  10  0x81991000  184  registrar
0x82a051f0    185  waiting   cvar      0xd394f230    -  10  0x819db000  185  syslog_daemon

kdebug> threads 171
thread         id  state     wait for  object      cpu pri  stack      team  name
0x82a085b0    171  waiting   cvar      0x828670f8    -  10  0x81703000  171  launch_daemon
0x82a0c210    172  waiting   sem       829           -  10  0x81a52000  171  worker

kdebug> bt 171
stack trace for thread 171 "launch_daemon"
    kernel stack: 0x81703000 to 0x81707000
      user stack: 0x7088e000 to 0x7188e000
frame               caller     <image>:function + offset
 0 81706cc4 (+ 224) 80097597   <kernel_x86> reschedule(int32: 6) + 0x1007
 1 81706da4 (+  48) 80097641   <kernel_x86> scheduler_reschedule + 0x61
 2 81706dd4 (+  96) 80089136   <kernel_x86> thread_block_with_timeout + 0x1ae
 3 81706e34 (+  64) 800568d7   <kernel_x86> ConditionVariableEntry<0x81706ea4>::Wait(uint32: 0x11 (17), int64: 9223372036854775807) + 0x11f
 4 81706e74 (+  80) 8006c5c2   <kernel_x86> _get_port_message_info_etc + 0x14a
 5 81706ec4 (+  80) 8006c469   <kernel_x86> port_buffer_size_etc + 0x25
 6 81706f14 (+  48) 8006d8e9   <kernel_x86> _user_port_buffer_size_etc + 0x91
 7 81706f44 (+ 100) 80139c1f   <kernel_x86> handle_syscall + 0xdc
user iframe at 0x81706fa8 (end = 0x81707000)
 eax 0xd8          ebx 0x1babad8      ecx 0x7188dc7c  edx 0x614eb114
 esi 0xffffffff    edi 0x7fffffff     ebp 0x7188dca8  esp 0x81706fdc
 eip 0x614eb114 eflags 0x3202    user esp 0x7188dc7c
 vector: 0x63, error code: 0x0
 8 81706fa8 (+   0) 614eb114   <commpage> commpage_syscall + 0x04
 9 7188dca8 (+  48) 00ba73be   <libbe.so> BLooper<0x18205b48>::ReadRawFromPort(0x7188dd04, int64: 9223372036854775807) + 0x2e
10 7188dcd8 (+  48) 00ba744a   <libbe.so> BLooper<0x18205b48>::ReadMessageFromPort(int64: 9223372036854775807) + 0x2a
11 7188dd08 (+  48) 00ba6bff   <libbe.so> BLooper<0x18205b48>::MessageFromPort(int64: 9223372036854775807) + 0x27
12 7188dd38 (+  64) 00ba75d7   <libbe.so> BLooper<0x18205b48>::task_looper(0x7188dd95) + 0x6f
13 7188dd78 (+  64) 00b9a89d   <libbe.so> BApplication<0x18205b48>::Run(0x0) + 0x75
14 7188ddb8 (+  80) 01448f68   <_APP_> main + 0xf4
15 7188de08 (+  48) 01443893   <_APP_> _start + 0x5b
16 7188de38 (+  48) 01209b72   </boot/system/runtime_loader@0x011fa000> <unknown> + 0xfb72
17 7188de68 (+   0) 614eb250   <commpage> commpage_thread_exit + 0x00

kdebug> bt 172
stack trace for thread 172 "worker"
    kernel stack: 0x81a52000 to 0x81a56000
      user stack: 0x70668000 to 0x706a8000
frame               caller     <image>:function + offset
 0 81a55d34 (+ 224) 80097597   <kernel_x86> reschedule(int32: 6) + 0x1007
 1 81a55e14 (+  48) 80097641   <kernel_x86> scheduler_reschedule + 0x61
 2 81a55e44 (+  96) 80088f4e   <kernel_x86> thread_block + 0x10a
 3 81a55ea4 (+  96) 8006fd68   <kernel_x86> switch_sem_etc + 0x400
 4 81a55f04 (+  64) 80070b8e   <kernel_x86> _user_acquire_sem_etc + 0x9a
 5 81a55f44 (+ 100) 80139c1f   <kernel_x86> handle_syscall + 0xdc
user iframe at 0x81a55fa8 (end = 0x81a56000)
 eax 0x11          ebx 0x1babad8      ecx 0x706a7aec  edx 0x614eb114
 esi 0xffffffff    edi 0x7fffffff     ebp 0x706a7b28  esp 0x81a55fdc
 eip 0x614eb114 eflags 0x3202    user esp 0x706a7aec
 vector: 0x63, error code: 0x0
 6 81a55fa8 (+   0) 614eb114   <commpage> commpage_syscall + 0x04
 7 706a7b28 (+  96) 00d14c1f   <libbe.so> BSupportKit::BPrivate::JobQueue<0x18205ca0>::Pop(int64: 9223372036854775807, false, BSupportKit::BJob*: 0x706a7bc4) + 0x4ff
 8 706a7b88 (+  64) 014542b1   <_APP_> Worker<0x181eeb10>::Process(0x0) + 0x45
 9 706a7bc8 (+  48) 01454392   <_APP_> Worker<0x181eeb10>::_Process(NULL) + 0x2a
10 706a7bf8 (+  48) 01b01d3f   <libroot.so> _get_next_team_info (nearest) + 0x5f
11 706a7c28 (+   0) 614eb250   <commpage> commpage_thread_exit + 0x00

kdebug> threads 184
thread         id  state     wait for  object      cpu pri  stack      team  name
0x82a05640    184  waiting   cvar      0xd394f4b0    -  10  0x81991000  184  registrar

kdebug> bt 184
stack trace for thread 184 "registrar"
    kernel stack: 0x81991000 to 0x81995000
      user stack: 0x71c2a000 to 0x72c2a000
frame               caller     <image>:function + offset
 0 81994cc4 (+ 224) 80097597   <kernel_x86> reschedule(int32: 6) + 0x1007
 1 81994da4 (+  48) 80097641   <kernel_x86> scheduler_reschedule + 0x61
 2 81994dd4 (+  96) 80089136   <kernel_x86> thread_block_with_timeout + 0x1ae
 3 81994e34 (+  64) 800568d7   <kernel_x86> ConditionVariableEntry<0x81994ea4>::Wait(uint32: 0x11 (17), int64: 9223372036854775807) + 0x11f
 4 81994e74 (+  80) 8006c5c2   <kernel_x86> _get_port_message_info_etc + 0x14a
 5 81994ec4 (+  80) 8006c469   <kernel_x86> port_buffer_size_etc + 0x25
 6 81994f14 (+  48) 8006d8e9   <kernel_x86> _user_port_buffer_size_etc + 0x91
 7 81994f44 (+ 100) 80139c1f   <kernel_x86> handle_syscall + 0xdc
user iframe at 0x81994fa8 (end = 0x81995000)
 eax 0xd8          ebx 0x1e4cad8      ecx 0x72c28d0c  edx 0x620db114
 esi 0xffffffff    edi 0x7fffffff     ebp 0x72c28d38  esp 0x81994fdc
 eip 0x620db114 eflags 0x3202    user esp 0x72c28d0c
 vector: 0x63, error code: 0x0
 8 81994fa8 (+   0) 620db114   <commpage> commpage_syscall + 0x04
 9 72c28d38 (+  48) 00ff1a0b   <libbe.so> __cl__Q38BPrivate11BLooperList12FindPortPredRQ38BPrivate11BLooperList10LooperData (nearest) + 0x13b
10 72c28d68 (+ 128) 00ff5c02   <libbe.so> BMessage<0x72c28fb0>::_SendMessage(BMessage: 0xb, int32: 171, int32: -2, int32: 1925353320, BMessage*: 0xffffffff, int64: -2147483649, int64: 8243666964875051007) + 0x17a
11 72c28de8 (+  96) 00ffdc29   <libbe.so> BMessenger<0x180f6a60>::SendMessage(BMessenger: 0x72c28fb0, BMessage*: 0x72c28f68, BMessage*: 0xffffffff, int64: -2147483649, int64: 72212189238263807) + 0x61
12 72c28e48 (+  64) 01008d2e   <libbe.so> BRoster::Private<0x72c28ed4>::SendTo(BMessage*: 0x72c28fb0) + 0xa2
13 72c28e88 (+ 368) 00fe792a   <libbe.so> __10BClipboardPCcb + 0x122
14 72c28ff8 (+1104) 01296f12   <_APP_> main + 0x42
15 72c29448 (+  48) 01277cd3   <_APP_> _start + 0x5b
16 72c29478 (+  48) 0174eb72   </boot/system/runtime_loader@0x0173f000> <unknown> + 0xfb72
17 72c294a8 (+   0) 620db250   <commpage> commpage_thread_exit + 0x00
kdebug>
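
To compare dumps like the one above across several hung boots, a small script can pull out what each thread is blocked on. This is only a sketch: the regular expression is an assumption based on the column layout of the "threads" output shown here, and the function name blocked_by_team is made up for illustration. It is not a KDL feature.

```python
import re
from collections import defaultdict

# Matches only rows whose state is "waiting"; "zzz" and "running" rows are skipped.
# Columns assumed: thread, id, state, wait-for, object, cpu, pri, stack, team, name.
WAITING = re.compile(
    r"^0x[0-9a-f]+\s+(?P<id>\d+)\s+waiting\s+(?P<kind>\w+)\s+(?P<obj>\S+)"
    r"\s+\S+\s+\d+\s+0x[0-9a-f]+\s+(?P<team>\d+)\s+(?P<name>.+)$"
)


def blocked_by_team(dump: str) -> dict:
    """Map team id -> list of (thread name, wait kind, wait object)."""
    teams = defaultdict(list)
    for line in dump.splitlines():
        match = WAITING.match(line.strip())
        if match:
            teams[int(match["team"])].append(
                (match["name"].strip(), match["kind"], match["obj"])
            )
    return dict(teams)


# A few rows copied from the session above.
SAMPLE = """
0x82a0f5d0      3  zzz                               -   5  0x81a19000    1  kernel daemon
0x82a085b0    171  waiting   cvar      0x828670f8    -  10  0x81703000  171  launch_daemon
0x82a0c210    172  waiting   sem       829           -  10  0x81a52000  171  worker
0x82a05640    184  waiting   cvar      0xd394f4b0    -  10  0x81991000  184  registrar
"""

if __name__ == "__main__":
    for team, threads in sorted(blocked_by_team(SAMPLE).items()):
        print(team, threads)
```

On this sample it reports the launch_daemon main thread and registrar waiting on condition variables and the worker thread waiting on semaphore 829, which matches the backtraces: both main threads are blocked in port reads and the worker in JobQueue::Pop.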

comment:21 by mmlr, 9 years ago

Resolution: fixed
Status: new → closed

Fixed in hrev49583.

comment:22 by michaelvoliveira, 9 years ago

Thanks mmlr!!!

Finally Haiku boots successfully on my machine after 2 years, and with Wi-Fi enabled!!!

Now I can get back to work on HaikuPorts recipes.
