Opened 10 years ago

Closed 7 years ago

Last modified 17 months ago

#4500 closed bug (fixed)

Haiku can't boot on XenServer 5.5

Reported by: jackburton
Owned by: axeld
Priority: normal
Milestone: R1
Component: System
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking: #4574
Has a Patch: no
Platform: All

Description (last modified by jackburton)

The subject says it all. The boot hangs very early in the boot sequence. With APIC timers disabled, it gets further but still hangs before the boot icons light up.

Specification of the emulated guest: 256 MB RAM, 1 virtual processor, 5 GB hard drive

Attachments (8)

bootlog_extra.png (15.3 KB) - added by jackburton 10 years ago.
with extra debug
boot_hang.PNG (21.6 KB) - added by jackburton 10 years ago.
Hang while adding preloaded drivers
no_usb_disk.PNG (23.1 KB) - added by jackburton 10 years ago.
Without usb_disk
cpuinfo.txt (472 bytes) - added by jackburton 10 years ago.
cat /proc/cpuinfo of the XenServer host
hang.png (4.3 KB) - added by jackburton 10 years ago.
main2_backtrace.PNG (49.4 KB) - added by jackburton 10 years ago.
Backtrace of main2
info.png (77.8 KB) - added by jackburton 8 years ago.
more info
ioapic.png (51.5 KB) - added by jackburton 8 years ago.

Change History (35)

comment:1 by jackburton, 10 years ago

Description: modified (diff)

by jackburton, 10 years ago

Attachment: bootlog_extra.png added

with extra debug

comment:2 by jackburton, 10 years ago

Disabling the APIC timers (possible since hrev33132) makes the boot sequence proceed further. I'll attach more debug info.

comment:3 by jackburton, 10 years ago

Now hangs at "INIT: main: done... begin idle loop on cpu 0"

comment:4 by jackburton, 10 years ago

Also interesting: if I configure the guest machine to have 2 virtual processors, the boot hangs just before showing "INIT: main: done... begin idle loop on cpu 0", instead of after.

comment:5 by jackburton, 10 years ago

Description: modified (diff)

by jackburton, 10 years ago

Attachment: boot_hang.PNG added

Hang while adding preloaded drivers

comment:6 by jackburton, 10 years ago

I also tried switching to the IDE stack and removing most unneeded drivers/bus_managers like usb, network, sound, etc. but nothing changed.

comment:7 by jackburton, 10 years ago

Summary: Haiku Alpha 1 can't boot on XenServer 5.5 → Haiku can't boot on XenServer 5.5

More info: Removing the usb_disk from the FloppyImage (used for booting) makes the boot process go even further. Now it hangs later.

by jackburton, 10 years ago

Attachment: no_usb_disk.PNG added

Without usb_disk

comment:8 by mmlr, 10 years ago

It's pretty obvious: you don't get any interrupts, or you don't get interrupts from certain devices. That causes APIC timers to fail and, if timer interrupts are generally missing, the scheduler not to work. It also explains why USB hangs. Please check the ints command and try to look into the PIC setup during boot for the problem.

by jackburton, 10 years ago

Attachment: cpuinfo.txt added

cat /proc/cpuinfo of the XenServer host

in reply to:  8 comment:9 by jackburton, 10 years ago

Replying to mmlr:

It's pretty obvious: you don't get any interrupts, or you don't get interrupts from certain devices. That causes APIC timers to fail and, if timer interrupts are generally missing, the scheduler not to work. It also explains why USB hangs. Please check the ints command and try to look into the PIC setup during boot for the problem.

I had a look at the PIC setup but nothing seems suspicious there. I also double-checked against other OSes' code. Moreover, as shown by the ints.png screenshot, it looks like interrupts are not (completely) missing. But the fact that the disk controller has "0" handled interrupts definitely seems related to the problem.

by jackburton, 10 years ago

Attachment: hang.png added

comment:10 by jackburton, 10 years ago

As shown in the "hang.png" attachment, it hangs on a snooze() call (ATAChannel.cpp, line 407, "_FlushAndWait(150 * 1000)").

Any ideas? Marcus, could this be an ATA problem?

by jackburton, 10 years ago

Attachment: main2_backtrace.PNG added

Backtrace of main2

comment:11 by mmlr, 10 years ago

It's most likely that you don't get (all) timer interrupts, so the reschedule events never happen and the idle thread, once it is running, is never interrupted. You can verify that by running the ints command and noting down the numbers, then continuing and rechecking after a bit of time has passed. If the timer interrupt counts haven't gone up, then that's your problem. From the rest of what you report it really looks that way in any case.
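
The check described above amounts to taking two snapshots of the per-vector counters and looking for vectors that did not advance. A minimal sketch in C, assuming the counts have been copied by hand from two runs of the "ints" command; the int_sample type and find_stalled_vectors() are hypothetical names used for illustration, not part of Haiku:

```c
#include <stddef.h>
#include <stdint.h>

/* One row copied from the "ints" output: vector number and handled count. */
typedef struct {
	int vector;
	uint64_t handled;
} int_sample;

/*
 * Compares two snapshots that list the same vectors in the same order and
 * stores every vector whose count did not change in `stalled`, which must
 * have room for `count` entries. Returns the number of stalled vectors.
 */
static size_t
find_stalled_vectors(const int_sample* before, const int_sample* after,
	size_t count, int* stalled)
{
	size_t found = 0;
	for (size_t i = 0; i < count; i++) {
		/* Skip rows that do not line up between the two snapshots. */
		if (before[i].vector != after[i].vector)
			continue;
		if (after[i].handled == before[i].handled)
			stalled[found++] = after[i].vector;
	}
	return found;
}
```

If the timer vector shows up in the result after the system has been left running for a while, timer interrupts are not being delivered.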

comment:12 by jackburton, 10 years ago

I found the problem: the PIT isn't set to one shot mode, in pit_set_hardware_timer(), line 58. The correct line would read:

out8(0x38, 0x43);

Though I can't understand how it could work anywhere else until now, and how this would affect booting while not using the PIT timer (and using the LAPIC instead).

With this change, Haiku boots to the initial alert where you choose to launch the installer or the desktop. Mouse doesn't seem to work correctly, though. But this should be a different bug.

BTW, I also tested on VirtualBox and Haiku continues to work correctly. I'll also test on real hardware (this evening) before committing the change, though. Unless someone else does that first.

comment:13 by jackburton, 10 years ago

Blocking: 4574 added

(In #4574) Indeed, it's a dup of #4500

in reply to:  12 comment:14 by jackburton, 10 years ago

Replying to jackburton:

I found the problem: the PIT isn't set to one shot mode, in pit_set_hardware_timer(), line 58. The correct line would read:

out8(0x38, 0x43);

I obviously misread the PIT documentation. It's already set to one-shot mode (0x30). Back to the drawing board...
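
For reference, the two values discussed here can be told apart by decoding the 8253/8254 mode/command byte written to port 0x43. The following is a small sketch in C based on the standard 8254 bit layout; pit_command, pit_decode_command() and pit_mode_name() are hypothetical helper names and do not exist in the Haiku sources:

```c
#include <stdint.h>

/* Fields of the 8253/8254 PIT mode/command byte written to port 0x43. */
typedef struct {
	uint8_t channel;  /* bits 7-6: counter select */
	uint8_t access;   /* bits 5-4: 0 latch, 1 low byte, 2 high byte, 3 low then high */
	uint8_t mode;     /* bits 3-1: operating mode 0-5 */
	uint8_t bcd;      /* bit 0: 0 = 16-bit binary, 1 = four-digit BCD */
} pit_command;

static pit_command
pit_decode_command(uint8_t value)
{
	pit_command cmd;
	cmd.channel = (value >> 6) & 0x3;
	cmd.access = (value >> 4) & 0x3;
	cmd.mode = (value >> 1) & 0x7;
	/* The hardware treats mode bit patterns 6 and 7 as modes 2 and 3. */
	if (cmd.mode > 5)
		cmd.mode -= 4;
	cmd.bcd = value & 0x1;
	return cmd;
}

static const char*
pit_mode_name(uint8_t mode)
{
	/* Mode names as given in the 8254 datasheet. */
	static const char* const kNames[] = {
		"interrupt on terminal count",
		"hardware retriggerable one-shot",
		"rate generator",
		"square wave",
		"software triggered strobe",
		"hardware triggered strobe"
	};
	return mode < 6 ? kNames[mode] : "invalid";
}
```

Decoded this way, 0x30 is channel 0, low/high byte access, mode 0 (interrupt on terminal count), while 0x38 differs only in the mode field and selects mode 4 (software triggered strobe).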

comment:15 by jackburton, 10 years ago

At this point I wouldn't exclude a bug in Xen/XenServer. Since most OSes out there use the PIT in periodic mode, this could have gone unnoticed. Setting the PIT to mode 1, which is (almost) functionally equivalent to the mode 0 we are using, results in a working system.

comment:16 by jackburton, 10 years ago

I filed a bug in Xen about this. ticket n. 1513.

comment:17 by jackburton, 10 years ago

Some updates: I fixed a couple of problems in the HPET timer code. That timer hangs on real hardware, so it's disabled by default; to enable it, you have to recompile Haiku after modifying line 43 of src/system/kernel/arch/x86/x86_hpet.c, increasing the priority from 0 to 3. Using that timer, Haiku boots correctly on XenServer if I select fail-safe video mode and a depth of 16 bpp.

Mouse doesn't work (ticket #4569) and network doesn't work either (but it should, since XenServer emulates a Realtek 8139).

comment:18 by scottmc, 8 years ago

Can you recheck this with a recent Haiku build? It may have been fixed recently.

comment:19 by scottmc, 8 years ago

Blocking: 7665 added

comment:20 by jackburton, 8 years ago

Nope. I've been testing (and hacking) periodically, but as of today the problem persists.

by jackburton, 8 years ago

Attachment: info.png added

more info

comment:21 by jackburton, 8 years ago

The latest attachment shows some more info with latest builds. The panic is caused programmatically by me after 1000 reschedules of the scheduler (otherwise I can't enter KDL via the keyboard). I noticed that the only thread being scheduled continuously is the object cache resizer thread.

The screenshot shows also the ints command, and interrupts are actually being delivered.

As I said before, using HPET as the timer results in a working system. Does anyone have an idea of what is happening?

Last edited 8 years ago by jackburton

in reply to:  21 ; comment:22 by mmlr, 8 years ago

Replying to jackburton:

The screenshot shows also the ints command, and interrupts are actually being delivered.

As I said before, using HPET as the timer results in a working system. Does anyone have an idea of what is happening?

As timer interrupts are actually received, I don't really see how switching to HPET would change anything. So it's probably a side effect of enabling HPET that gets things going.

I could imagine a PIT interrupt storm (or actually an HPET one) or similar taking place. You could try enabling all interrupts so you would see what unhandled interrupts exist. You could do this by removing line 766 in haiku/trunk/src/system/kernel/arch/x86/ioapic.cpp so that all of them are enabled on the IO-APIC level (you could also extend the loop to go from 0 to 255). Unhandled interrupts should then be visible in the ints output as lines without an interrupt handler.

in reply to:  22 ; comment:23 by jackburton, 8 years ago

Replying to mmlr:

I could imagine a PIT interrupt storm (or actually an HPET one) or similar taking place. You could try enabling all interrupts so you would see what unhandled interrupts exist. You could do this by removing line 766 in haiku/trunk/src/system/kernel/arch/x86/ioapic.cpp so that all of them are enabled on the IO-APIC level (you could also extend the loop to go from 0 to 255). Unhandled interrupts should then be visible in the ints output as lines without an interrupt handler.

I commented out the line in question, so that the loop looks like this:

	// enable previously enabled legacy interrupts
	for (uint8 i = 0; i < 255; i++) {
		//if ((legacyInterrupts & (1 << i)) != 0)
			ioapic_enable_io_interrupt(i);
	}

But nothing changed. The "ints" command still shows only three interrupts installed. I also noticed this: the interrupt override list also shows interrupts 5, 10 and 11, but they aren't present in the "ints" command output.

by jackburton, 8 years ago

Attachment: ioapic.png added

in reply to:  23 comment:24 by mmlr, 8 years ago

Replying to jackburton:

I commented out the line in question, so that the loop looks like this: ... But nothing changed. The "ints" command still shows only three interrupts installed.

Then it looks like it isn't an interrupt storm, or at least not one of the routed interrupts. It could still be one in a local interrupt source like the APIC timer, thermal monitor or similar that we don't handle. That'd be rather obscure though, so more on the unlikely side.

I also noticed this: the interrupt override list also shows interrupts 5, 10 and 11, but they aren't present in the "ints" command output.

That's because they don't need a redirection entry as their IRQ is the same as the GSI. Only the trigger mode and polarity is programmed for the pins where necessary.
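
The distinction drawn above can be sketched as follows: a legacy IRQ maps to the GSI of the same number unless an interrupt source override says otherwise, and an override whose GSI equals its source IRQ only carries polarity and trigger-mode flags. This is an illustrative C sketch using the field layout of the ACPI MADT Interrupt Source Override entry; source_override, irq_to_gsi() and override_is_flags_only() are hypothetical names, not Haiku's actual code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of an ACPI MADT Interrupt Source Override entry. */
typedef struct {
	uint8_t source_irq;  /* legacy ISA IRQ number */
	uint32_t gsi;        /* global system interrupt it is wired to */
	uint16_t flags;      /* bits 1-0: polarity, bits 3-2: trigger mode */
} source_override;

/* Returns the GSI for a legacy IRQ; identity-mapped unless overridden. */
static uint32_t
irq_to_gsi(const source_override* overrides, size_t count, uint8_t irq)
{
	for (size_t i = 0; i < count; i++) {
		if (overrides[i].source_irq == irq)
			return overrides[i].gsi;
	}
	return irq;
}

/* True when the override only changes polarity/trigger, not the routing. */
static bool
override_is_flags_only(const source_override* entry)
{
	return entry->gsi == entry->source_irq;
}
```

With a made-up table such as { {0, 2, 0}, {5, 5, 0xF}, {10, 10, 0xF}, {11, 11, 0xF} }, only the first entry changes the routing; the entries for IRQs 5, 10 and 11 are flags-only.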

comment:25 by jackburton, 7 years ago

It's working now (in XenServer 6.0). I think it's either been fixed in XenServer or (more probably) by the AMD processor power management bugfix.

comment:26 by jackburton, 7 years ago

Resolution: fixed
Status: new → closed

comment:27 by waddlesplash, 17 months ago

Blocking: 7665 removed