Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#6760 closed bug (fixed)

Unable to boot from usb key

Reported by: Anarchos Owned by: stippi
Priority: normal Milestone: R1
Component: Kits/Storage Kit Version:
Keywords: Cc: scorps@…
Blocked By: Blocking: #6819
Has a Patch: no Platform: x86

Description

Revision 39010, gcc2hybrid: I get the blue desktop with only the boot volume icon and the mouse, and then the whole system freezes.

Attachments (4)

syslog (94.2 KB) - added by Anarchos 9 years ago.
listdev (12.0 KB) - added by Anarchos 9 years ago.
listdev, as seen on BeOS R5.0.3
input_server_st.txt (10.8 KB) - added by idefix 9 years ago.
The stack traces of input_server.
media_addon_server_st.txt (16.2 KB) - added by idefix 9 years ago.
The stack traces of media_addon_server.

Change History (25)

Changed 9 years ago by Anarchos

Attachment: syslog added

comment:1 Changed 9 years ago by Anarchos

Notice some lines in the syslog complaining about ATA and more than one DMA mode.

Changed 9 years ago by Anarchos

Attachment: listdev added

listdev, as seen on BeOS R5.0.3

comment:2 Changed 9 years ago by anevilyak

Did previous builds boot, and if so can you narrow down the revision where it broke?

comment:3 in reply to:  2 Changed 9 years ago by Anarchos

R1alpha2 boots. It broke 3 months ago. I will try to narrow down the latest working revision.

comment:4 Changed 9 years ago by idefix

I experience the same problem when booting Haiku from an 11 year old laptop harddisk (IBM DKLA-24320).
Once I tried to narrow it down, but got frustrated because sometimes (~1 in 10) the same release would boot without any problem. I therefore suspect a race-condition somewhere...

comment:5 in reply to:  2 Changed 9 years ago by idefix

Replying to anevilyak:

Did previous builds boot, and if so can you narrow down the revision where it broke?

I've narrowed it down to somewhere between hrev38099 and hrev38307. Unfortunately there weren't any nightlies created between those changesets.

comment:6 Changed 9 years ago by idefix

Component: - General → Kits/Storage Kit
Owner: changed from nobody to axeld

It looks like changeset:38235 is the culprit.

hrev38234 (+hrev38237 to fix the build) boots fine, while hrev38235 (+hrev38237&hrev38238 to fix the build) freezes.
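
The narrowing-down in comments 4 to 6 is a bisection over revision numbers, made harder by the observation that a broken build still boots roughly 1 time in 10. Here is a minimal Python sketch of that reasoning, assuming lucky boots are independent of each other: a single freeze marks a revision as bad, while a clean boot only counts once it has been repeated often enough. The names `boots_needed` and `first_bad_revision` and the `freezes` callback are hypothetical, and the sketch ignores the extra changesets that were needed to fix the build.

```python
def boots_needed(p_lucky_boot, max_risk):
    """Clean boots required before the chance that a broken revision
    passed all of them by luck is at most max_risk."""
    if not 0 <= p_lucky_boot < 1 or max_risk <= 0:
        raise ValueError("need 0 <= p_lucky_boot < 1 and max_risk > 0")
    boots, risk = 0, 1.0
    while risk > max_risk:
        risk *= p_lucky_boot  # one more clean boot in a row
        boots += 1
    return boots


def first_bad_revision(good, bad, freezes, clean_boots):
    """Binary search for the first bad revision number.

    good/bad are revisions already known to boot/freeze. freezes(rev) is a
    hypothetical callback that boots rev once and returns True on a freeze.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        # One freeze is proof; a revision only counts as good after
        # clean_boots clean runs in a row.
        if any(freezes(mid) for _ in range(clean_boots)):
            bad = mid
        else:
            good = mid
    return bad


# Accept at most a 5% risk of misjudging a broken revision as good.
repeats = boots_needed(0.1, 0.05)
# Stand-in for real test boots: pretend everything from 38235 on freezes.
culprit = first_bad_revision(38099, 38307, lambda rev: rev >= 38235, repeats)
print(repeats, culprit)
```

With a 1-in-10 lucky boot, two clean boots in a row already push the risk of misjudging a broken revision down to about 1%, which is why repeating each test boot makes the search usable despite the flakiness.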

comment:7 Changed 9 years ago by stippi

If this changeset is indeed the problem, then you should still be able to enter KDL and get a stack trace of all the media_addon_server and input_server threads (both use the code in question). Use teams to get a list of all teams, then threads <id of media_addon_server> then bt <each thread id in turn>. The stack traces should reveal a deadlock.

comment:8 Changed 9 years ago by Michael S.

Cc: scorps@… added

comment:9 in reply to:  7 Changed 9 years ago by Michael S.

Replying to stippi:

KDL (Alt+Print+D) doesn't work. KDL only appears if I pull out the flash card.

The same is true of all the latest builds (gcc 2, 4, 2/4, 4/2).

comment:10 Changed 9 years ago by idefix

You can't enter KDL with the keyboard, but I never tried to enter it by pulling out a flashcard. Will try that trick next.

I think the problem happens when the input-server starts and wants to take control of the keyboard, because Haiku freezes approximately at the time that the numlock-led on the keyboard should turn off.

comment:11 Changed 9 years ago by stippi

Ok, so maybe it's the input_server which locks up. You could pull out that flash card and still get the back traces of the input_server threads once in KDL, like I described above. You probably have to take photos and attach them here. Thanks in advance and sorry for the hassle.

comment:12 Changed 9 years ago by idefix

It doesn't KDL when I pull out a flashcard; probably because I don't boot off a flashcard, but off a slow HD. I therefore tried the next best thing: cutting off the power to the HD. However, this wouldn't lead to KDL when the system was fully frozen, as there wasn't any HD activity. So I had to cut the power just before that moment, while hmulti_audio.media_addon was scanning /dev/audio/hmulti/. Is there a way to enter KDL via the serial port?

The stack traces of input_server:

kdebug> teams
team           id  parent      name
0x84623c48     94  0x82026e00  Tracker
0x82026e00      1  0x00000000  kernel_team
0x84623a88     95  0x82026e00  Deskbar
0x84908a90    127  0x846238c8  media_addon_server
0x846238c8     96  0x82026e00  media_server
0x84623708     97  0x82026e00  midi_server
0x82026380     66  0x82026e00  syslog_daemon
0x84623548     98  0x82026e00  print_server
0x84623388     99  0x82026e00  mail_daemon
0x846231c8    100  0x82026e00  cddb_daemon
0x84623008    101  0x82026e00  notification_server
0x82026a80     44  0x82026e00  registrar
0x820261c0     76  0x82026540  input_server
0x820268c0     49  0x82026e00  debug_server
0x82026700     50  0x82026e00  net_server
0x82026540     51  0x82026e00  app_server
0x82026000     85  0x82026e00  mount_server
kdebug> threads 76
thread         id  state     wait for   object  cpu pri  stack      team  name
0x844ec7d0     76  waiting   sem           513    -  20  0x805dc000   76  input_server
0x844ea490     82  waiting   cvar   0x81c0d030    - 103  0x805f4000   76  _input_server_event_loop_
0x844e9eb0     83  waiting   sem           508    -  10  0x805f8000   76  add-on manager
0x844e98d0     84  waiting   sem           508    -  10  0x805fc000   76  AddOnMonitor
0x844e8d10     87  waiting   cvar   0x81c0d15c    -  10  0x83727000   76  PathMonitor looper
kdebug> bt 76
stack trace for thread 76 "input_server"
    kernel stack: 0x805dc000 to 0x805e0000
      user stack: 0x7efef000 to 0x7ffef000
frame               caller     <image>:function + offset
 0 805dfe24 (+  48) 8006fd97   <kernel_x86> context_switch(thread*: 0x844ec7d0, thread*: 0x82167b60) + 0x003f
 1 805dfe54 (+  96) 800700b1   <kernel_x86> simple_reschedule() + 0x02d5
 2 805dfeb4 (+  80) 8005a679   <kernel_x86>:switch_sem_etc + 0x0359
 3 805dff04 (+  64) 8005b315   <kernel_x86>:_user_acquire_sem_etc + 0x00a5
 4 805dff44 (+ 100) 80108fa2   <kernel_x86>:handle_syscall + 0x00af
user iframe at 0x805dffa8 (end = 0x805e0000)
 eax 0xf            ebx 0x5e930c        ecx 0x7ffeea20   edx 0xffff0114
 esi 0xffffffff     edi 0x7fffffff      ebp 0x7ffeea5c   esp 0x805dffdc
 eip 0xffff0114  eflags 0x203      user esp 0x7ffeea20
 vector: 0x63, error code: 0x0
 5 805dffa8 (+   0) ffff0114   <commpage>:commpage_syscall + 0x0004
 6 7ffeea5c (+  64) 003007d6   <libbe.so> BLooper<0x18035120>::_LockComplete(BLooper*: 0x1, int32: 76, int32: 513, int32: -1, int64: 9223067388382019583) + 0x003e
 7 7ffeea9c (+  80) 00300737   <libbe.so> BLooper<0x18035120>::_Lock(BLooper*: 0xffffffff, int32: -1, int64: 9802732424658943) + 0x0177
 8 7ffeeaec (+  48) 002ffa74   <libbe.so> BLooper<0x18035120>::Lock(0x2) + 0x002c
 9 7ffeeb1c (+ 544) 002215da   <_APP_> BPrivate::Storage::AddOnMonitorHandler<0x1801aa50>::AddDirectory(node_ref*: 0x7ffeedec) + 0x003a
10 7ffeed3c (+ 224) 00219946   <_APP_> AddOnManager<0x18022f20>::_RegisterAddOns(0x2ff8b6) + 0x028a
11 7ffeee1c (+  48) 0021968b   <_APP_> AddOnManager<0x18022f20>::LoadState(0x212d9c) + 0x001f
12 7ffeee4c (+ 224) 00212e2d   <_APP_>:__11InputServer + 0x0411
13 7ffeef2c (+  64) 00217a09   <_APP_>:main + 0x0031
14 7ffeef6c (+  48) 002126ef   <_APP_>:_start + 0x005b
15 7ffeef9c (+  64) 00105d32   </boot/system/runtime_loader@0x00100000>:unknown + 0x5d32
16 7ffeefdc (+   0) 7ffeefec   1737:input_server_main_stack@0x7efef000 + 0xffffec
kdebug> bt 82
stack trace for thread 82 "_input_server_event_loop_"
    kernel stack: 0x805f4000 to 0x805f8000
      user stack: 0x70000000 to 0x70040000
frame               caller     <image>:function + offset
 0 805f7d94 (+  48) 8006fd97   <kernel_x86> context_switch(thread*: 0x844ea490, thread*: 0x844ec7d0) + 0x003f
 1 805f7dc4 (+  96) 800700b1   <kernel_x86> simple_reschedule() + 0x02d5
 2 805f7e24 (+  64) 80046575   <kernel_x86> ConditionVariableEntry<0x805f7ea8>::Wait(uint32: 0x1 (1), int64: 0) + 0x0199
 3 805f7e64 (+  96) 800572aa   <kernel_x86>:_get_port_message_info_etc + 0x0182
 4 805f7ec4 (+  80) 80057119   <kernel_x86>:port_buffer_size_etc + 0x0025
 5 805f7f14 (+  48) 80058370   <kernel_x86>:_user_port_buffer_size_etc + 0x009c
 6 805f7f44 (+ 100) 80108fa2   <kernel_x86>:handle_syscall + 0x00af
user iframe at 0x805f7fa8 (end = 0x805f8000)
 eax 0xca           ebx 0x5e930c        ecx 0x7003fef0   edx 0xffff0114
 esi 0x216740       edi 0x1802da90      ebp 0x7003ff1c   esp 0x805f7fdc
 eip 0xffff0114  eflags 0x206      user esp 0x7003fef0
 vector: 0x63, error code: 0x0
 7 805f7fa8 (+   0) ffff0114   <commpage>:commpage_syscall + 0x0004
 8 7003ff1c (+  96) 00216795   <_APP_> InputServer<0x18037240>::_EventLoop(0x0) + 0x002d
 9 7003ff7c (+  48) 0021675f   <_APP_> InputServer<0x18037240>::_EventLooper(NULL) + 0x001f
10 7003ffac (+  48) 00542ab2   <libroot.so>:_get_next_team_info (nearest) + 0x0072
11 7003ffdc (+   0) 7003ffec   1772:_input_server_event_loop__82_st@0x70000000 + 0x3ffec
kdebug> bt 83
stack trace for thread 83 "add-on manager"
    kernel stack: 0x805f8000 to 0x805fc000
      user stack: 0x70041000 to 0x70081000
frame               caller     <image>:function + offset
 0 805fbe24 (+  48) 8006fd97   <kernel_x86> context_switch(thread*: 0x844e9eb0, thread*: 0x8216c1e0) + 0x003f
 1 805fbe54 (+  96) 800700b1   <kernel_x86> simple_reschedule() + 0x02d5
 2 805fbeb4 (+  80) 8005a679   <kernel_x86>:switch_sem_etc + 0x0359
 3 805fbf04 (+  64) 8005b315   <kernel_x86>:_user_acquire_sem_etc + 0x00a5
 4 805fbf44 (+ 100) 80108fa2   <kernel_x86>:handle_syscall + 0x00af
user iframe at 0x805fbfa8 (end = 0x805fc000)
 eax 0xf            ebx 0x5e930c        ecx 0x70080e80   edx 0xffff0114
 esi 0xffffffff     edi 0x7fffffff      ebp 0x70080ebc   esp 0x805fbfdc
 eip 0xffff0114  eflags 0x203      user esp 0x70080e80
 vector: 0x63, error code: 0x0
 5 805fbfa8 (+   0) ffff0114   <commpage>:commpage_syscall + 0x0004
 6 70080ebc (+  64) 003007d6   <libbe.so> BLooper<0x18022f20>::_LockComplete(BLooper*: 0x1, int32: 83, int32: 508, int32: -1, int64: 2147483647) + 0x003e
 7 70080efc (+  80) 00300737   <libbe.so> BLooper<0x18022f20>::_Lock(BLooper*: 0xffffffff, int32: -1, int64: 21462022445072383) + 0x0177
 8 70080f4c (+  48) 002ffa74   <libbe.so> BLooper<0x18022f20>::Lock(0x0) + 0x002c
 9 70080f7c (+  48) 00300a7c   <libbe.so> BLooper<0x18022f20>::_task0_(NULL) + 0x0020
10 70080fac (+  48) 00542ab2   <libroot.so>:_get_next_team_info (nearest) + 0x0072
11 70080fdc (+   0) 70080fec   1774:add-on manager_83_stack@0x70041000 + 0x3ffec
kdebug> bt 84
stack trace for thread 84 "AddOnMonitor"
    kernel stack: 0x805fc000 to 0x80600000
      user stack: 0x70082000 to 0x700c2000
frame               caller     <image>:function + offset
 0 805ffe24 (+  48) 8006fd97   <kernel_x86> context_switch(thread*: 0x844e98d0, thread*: 0x82167b60) + 0x003f
 1 805ffe54 (+  96) 800700b1   <kernel_x86> simple_reschedule() + 0x02d5
 2 805ffeb4 (+  80) 8005a679   <kernel_x86>:switch_sem_etc + 0x0359
 3 805fff04 (+  64) 8005b315   <kernel_x86>:_user_acquire_sem_etc + 0x00a5
 4 805fff44 (+ 100) 80108fa2   <kernel_x86>:handle_syscall + 0x00af
user iframe at 0x805fffa8 (end = 0x80600000)
 eax 0xf            ebx 0x5e930c        ecx 0x700c1a80   edx 0xffff0114
 esi 0xffffffff     edi 0x7fffffff      ebp 0x700c1abc   esp 0x805fffdc
 eip 0xffff0114  eflags 0x203      user esp 0x700c1a80
 vector: 0x63, error code: 0x0
 5 805fffa8 (+   0) ffff0114   <commpage>:commpage_syscall + 0x0004
 6 700c1abc (+  64) 003007d6   <libbe.so> BLooper<0x18022f20>::_LockComplete(BLooper*: 0x2, int32: 84, int32: 508, int32: -1, int64: 8073858316066881535) + 0x003e
 7 700c1afc (+  80) 00300737   <libbe.so> BLooper<0x18022f20>::_Lock(BLooper*: 0xffffffff, int32: -1, int64: 9802732424658943) + 0x0177
 8 700c1b4c (+  48) 002ffa74   <libbe.so> BLooper<0x18022f20>::Lock(0x22699d) + 0x002c
 9 700c1b7c (+  80) 0021a813   <_APP_> AddOnManager<0x18022f20>::_RegisterDevice(BInputServerDevice*: 0x18037188, entry_ref&: 0x700c1c00, int32: 627) + 0x002b
10 700c1bcc (+  96) 00219f2b   <_APP_> AddOnManager<0x18022f20>::_RegisterAddOn(BEntry&: 0x700c1c6c) + 0x012f
11 700c1c2c (+ 128) 0021dc8f   <_APP_> AddOnManager::MonitorHandler<0x1801aa50>::AddOnEnabled(BPrivate::Storage::add_on_entry_info*: 0x700c1d68) + 0x006b
12 700c1cac (+ 560) 002238c9   <_APP_> BPrivate::Storage::AddOnMonitorHandler<0x1801aa50>::_HandlePulse(0x0) + 0x0591
13 700c1edc (+  48) 00221588   <_APP_> BPrivate::Storage::AddOnMonitorHandler<0x1801aa50>::MessageReceived(BMessage*: 0x1801ab90) + 0x002c
14 700c1f0c (+  48) 002ff57f   <libbe.so> BLooper<0x18035120>::DispatchMessage(BMessage*: 0x1801ab90, BHandler*: 0x1801aa50) + 0x005b
15 700c1f3c (+  64) 00300ea9   <libbe.so> BLooper<0x18035120>::task_looper(0x0) + 0x0205
16 700c1f7c (+  48) 00300a9b   <libbe.so> BLooper<0x18035120>::_task0_(NULL) + 0x003f
17 700c1fac (+  48) 00542ab2   <libroot.so>:_get_next_team_info (nearest) + 0x0072
18 700c1fdc (+   0) 700c1fec   1776:AddOnMonitor_84_stack@0x70082000 + 0x3ffec
kdebug> bt 87
stack trace for thread 87 "PathMonitor looper"
    kernel stack: 0x83727000 to 0x8372b000
      user stack: 0x700c3000 to 0x70103000
frame               caller     <image>:function + offset
 0 8372ad54 (+  48) 8006fd97   <kernel_x86> context_switch(thread*: 0x844e8d10, thread*: 0x844e34f0) + 0x003f
 1 8372ad84 (+  96) 800700b1   <kernel_x86> simple_reschedule() + 0x02d5
 2 8372ade4 (+  48) 80068de9   <kernel_x86>:thread_block_with_timeout_locked + 0x00e1
 3 8372ae14 (+  80) 80046516   <kernel_x86> ConditionVariableEntry<0x8372aea8>::Wait(uint32: 0x11 (17), int64: 9223372036854775807) + 0x013a
 4 8372ae64 (+  96) 800572aa   <kernel_x86>:_get_port_message_info_etc + 0x0182
 5 8372aec4 (+  80) 80057119   <kernel_x86>:port_buffer_size_etc + 0x0025
 6 8372af14 (+  48) 80058370   <kernel_x86>:_user_port_buffer_size_etc + 0x009c
 7 8372af44 (+ 100) 80108fa2   <kernel_x86>:handle_syscall + 0x00af
user iframe at 0x8372afa8 (end = 0x8372b000)
 eax 0xca           ebx 0x5e930c        ecx 0x70102e80   edx 0xffff0114
 esi 0xffffffff     edi 0x7fffffff      ebp 0x70102eac   esp 0x8372afdc
 eip 0xffff0114  eflags 0x216      user esp 0x70102e80
 vector: 0x63, error code: 0x0
 8 8372afa8 (+   0) ffff0114   <commpage>:commpage_syscall + 0x0004
 9 70102eac (+  48) 00300af2   <libbe.so> BLooper<0x18035050>::ReadRawFromPort(0x70102f08, int64: 9223372036854775807) + 0x002e
10 70102edc (+  48) 00300b7e   <libbe.so> BLooper<0x18035050>::ReadMessageFromPort(int64: 9223372036854775807) + 0x002a
11 70102f0c (+  48) 0030024f   <libbe.so> BLooper<0x18035050>::MessageFromPort(int64: 9223372036854775807) + 0x0027
12 70102f3c (+  64) 00300d17   <libbe.so> BLooper<0x18035050>::task_looper(0x0) + 0x0073
13 70102f7c (+  48) 00300a9b   <libbe.so> BLooper<0x18035050>::_task0_(NULL) + 0x003f
14 70102fac (+  48) 00542ab2   <libroot.so>:_get_next_team_info (nearest) + 0x0072
15 70102fdc (+   0) 70102fec   1851:PathMonitor looper_87_stack@0x700c3000 + 0x3ffec

comment:13 Changed 9 years ago by stippi

Thanks a lot, so it's indeed the input_server dead-locking, and you provided just the info needed! I don't know when I'll have time to fix this, but I'll try to do so ASAP.
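
One way to read the traces in comment 12 is as a wait-for graph. Thread 76 is blocked in `BLooper<0x18035120>::Lock()` (sem 513). Thread 84 is blocked in `BLooper<0x18022f20>::Lock()` (sem 508) while it is inside `BLooper<0x18035120>::DispatchMessage()`, which normally means it holds the lock of looper 0x18035120. The traces do not show who holds 0x18022f20; the Python sketch below assumes it is thread 76, which is inside `AddOnManager<0x18022f20>::_RegisterAddOns()`. Under that assumption the two threads wait on each other. `find_cycle` is a hypothetical helper, not part of any Haiku tool.

```python
# Lock each blocked thread is waiting for (taken from the bt output).
waits = {76: "0x18035120", 83: "0x18022f20", 84: "0x18022f20"}

# Thread holding each lock. 84 is inferred from its DispatchMessage frame;
# 76 is an ASSUMPTION, the traces do not show the owner of 0x18022f20.
holders = {"0x18035120": 84, "0x18022f20": 76}


def find_cycle(waits, holders, start):
    """Follow thread -> wanted lock -> holder from start.

    Returns the list of threads forming a cycle, or None if the chain
    ends at a thread that is not waiting for a known lock.
    """
    seen = []
    thread = start
    while thread is not None and thread not in seen:
        seen.append(thread)
        thread = holders.get(waits.get(thread))
    if thread is None:
        return None
    return seen[seen.index(thread):]


print(find_cycle(waits, holders, 76))
print(find_cycle(waits, holders, 83))
```

Starting from thread 83 leads into the same cycle: it is not part of the deadlock itself, but it can never get the lock it wants either.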

comment:14 in reply to:  12 Changed 9 years ago by bonefish

Replying to idefix:

Is there a way to enter KDL via the serial port?

No. In such a case one can add something like

(sleep 10; kernel_debugger) &

at a strategic place in the boot script, though.

comment:15 Changed 9 years ago by idefix

That's a much more civilized way of getting into KDL.

Using this technique, I could get the stack-traces when the system was completely frozen. I've attached them for completeness' sake: input_server_st.txt, media_addon_server_st.txt

Changed 9 years ago by idefix

Attachment: input_server_st.txt added

The stack traces of input_server.

Changed 9 years ago by idefix

Attachment: media_addon_server_st.txt added

The stack traces of media_addon_server.

comment:16 Changed 9 years ago by stippi

Owner: changed from axeld to stippi
Status: new → in-progress

comment:17 Changed 9 years ago by Anarchos

Maybe it is related to IRQ 5 being assigned to a lot of devices: http://imagebin.org/126227

comment:18 Changed 9 years ago by stippi

No, no. I am pretty sure I've understood the problem and am going to commit a patch which hopefully fixes the problem after I've done some testing. Incidentally it also fixes some other potential concurrency problems in add-on handling in the input_server.

comment:19 Changed 9 years ago by stippi

Resolution: fixed
Status: in-progress → closed

Problem should be fixed in hrev39741 and hrev39742. Please reopen if not.

comment:20 Changed 9 years ago by stippi

Blocking: 6819 added

(In #6819) I am marking this as duplicate. If hrev39741 and hrev39742 don't fix the problem, please re-open the ticket. The "dodgyness" you were observing is actually expected by the nature of the bug that was fixed. The lock-up happened depending on the timing with which input devices appeared in the system during the input_server initialization phase.
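
The timing dependence described here is typical of two threads taking the same two locks in opposite order: nothing goes wrong unless both happen to hold their first lock at the same moment. Below is a minimal Python sketch of that general pattern. It is not the input_server code, and not the change made in hrev39741 and hrev39742, which this ticket does not describe. The lock names, `run_pair` and the timeouts are hypothetical; a barrier forces the unlucky interleaving, and timeouts stand in for the freeze.

```python
import threading


def run_pair(order_a, order_b, timeout=0.5):
    """Run two threads that each take two locks in the given order.

    Returns True if both threads obtained both locks, False if at least
    one timed out waiting (a plain blocking lock would hang instead).
    """
    locks = {"manager": threading.Lock(), "monitor": threading.Lock()}
    rendezvous = threading.Barrier(2)
    outcomes = []

    def worker(order):
        first, second = locks[order[0]], locks[order[1]]
        with first:
            try:
                # Force the unlucky timing: wait until the other thread
                # also holds its first lock.
                rendezvous.wait(timeout)
            except threading.BrokenBarrierError:
                pass  # no rendezvous: the threads share the same first lock
            got_second = second.acquire(timeout=timeout)
            if got_second:
                second.release()
            outcomes.append(got_second)

    threads = [threading.Thread(target=worker, args=(order,))
               for order in (order_a, order_b)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return all(outcomes)


print(run_pair(("manager", "monitor"), ("monitor", "manager")))  # opposite order
print(run_pair(("manager", "monitor"), ("manager", "monitor")))  # same order
```

The first call should report False (at least one thread gives up waiting) and the second True. Agreeing on a single lock order is one common general remedy for this class of bug.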

comment:21 Changed 9 years ago by Anarchos

It worked for me.
Thanks a lot to idefix for spotting the faulty changeset, and to stippi for such a fast and efficient solution! I could not have imagined getting Haiku booting from hard drive and USB key again within this weekend :)

Really, we have experienced, first-class developers in Haiku.
