#4500 closed bug (fixed)
Haiku can't boot on XenServer 5.5
Reported by: | jackburton | Owned by: | axeld |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | System | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | #4574 | |
Platform: | All |
Description (last modified by )
The subject says it all. Boot hangs very early in the boot sequence. With APIC timers disabled, it gets further but still hangs before the boot icons light up.
Specification of the emulated guest: 256 MB RAM, 1 virtual processor, 5 GB hard drive.
Attachments (8)
Change History (35)
comment:1 by , 15 years ago
Description: | modified (diff) |
---|
by , 15 years ago
Attachment: | bootlog_extra.png added |
---|
comment:2 by , 15 years ago
Disabling the APIC timers (possible since hrev33132) makes the boot sequence proceed further. I'll attach more debug info.
comment:4 by , 15 years ago
Also interesting: if I configure the guest machine to have 2 virtual processors, the boot hangs just before showing "INIT: main: done... begin idle loop on cpu 0", instead of after.
comment:5 by , 15 years ago
Description: | modified (diff) |
---|
comment:6 by , 15 years ago
I also tried switching to the IDE stack and removing most unneeded drivers/bus_managers like usb, network, sound, etc. but nothing changed.
comment:7 by , 15 years ago
Summary: | Haiku Alpha 1 can't boot on XenServer 5.5 → Haiku can't boot on XenServer 5.5 |
---|
More info: Removing the usb_disk from the FloppyImage (used for booting) makes the boot process go even further. Now it hangs later.
follow-up: 9 comment:8 by , 15 years ago
It's pretty obvious: you don't get any interrupts, or you don't get interrupts from certain devices. That causes the APIC timers to fail and, if timer interrupts are generally missing, the scheduler not to work. It also explains why USB hangs. Please check the ints command and try to look into the PIC setup during boot for the problem.
comment:9 by , 15 years ago
Replying to mmlr:
It's pretty obvious: you don't get any interrupts, or you don't get interrupts from certain devices. That causes the APIC timers to fail and, if timer interrupts are generally missing, the scheduler not to work. It also explains why USB hangs. Please check the ints command and try to look into the PIC setup during boot for the problem.
I had a look at the PIC setup, but nothing seems suspicious there. I also double-checked against other OSes' code. Moreover, as shown by the ints.png screenshot, it looks like interrupts are not (completely) missing. But the fact that the disk controller has "0" handled interrupts definitely seems related to the problem.
by , 15 years ago
comment:10 by , 15 years ago
As shown in the "hang.png" attachment, it hangs on a snooze() call (ATAChannel.cpp, line 407, "_FlushAndWait(150 * 1000)").
Any idea? Marcus, could this be an ATA problem?
comment:11 by , 15 years ago
It's most likely that you don't get (all) timer interrupts, therefore the reschedule events won't happen. So the idle thread, once it is running, won't ever get interrupted. You can verify that by running the ints command and noting down the numbers, then continuing from there and rechecking after a bit of time has passed. If the timer interrupts aren't counted up, then that's your problem. From the rest of what you report it really looks that way in any case.
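The check described above (two readings of the ints output, compared after some time has passed) can be sketched as follows. This is only an illustration: `int_sample` and `vector_is_ticking` are hypothetical names, not part of the Haiku kernel or its debugger, and the counts would have to be noted down by hand from the ints output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One row noted down from the "ints" output (hypothetical layout). */
typedef struct {
	int vector;			/* interrupt vector number */
	uint64_t handled;	/* handled count shown for that vector */
} int_sample;

/* Looks up a vector in a snapshot; returns NULL if it is not listed. */
static const int_sample*
find_sample(const int_sample* samples, size_t count, int vector)
{
	assert(samples != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (samples[i].vector == vector)
			return &samples[i];
	}
	return NULL;
}

/* Returns true only if the vector appears in both snapshots and its
   handled count grew between them. */
static bool
vector_is_ticking(const int_sample* before, size_t beforeCount,
	const int_sample* after, size_t afterCount, int vector)
{
	const int_sample* first = find_sample(before, beforeCount, vector);
	const int_sample* second = find_sample(after, afterCount, vector);
	if (first == NULL || second == NULL)
		return false;
	return second->handled > first->handled;
}
```

If the timer's vector does not tick between the two snapshots, the timer interrupt is not being delivered (or not being counted), which is the situation suspected in this comment.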
follow-up: 14 comment:12 by , 15 years ago
I found the problem: the PIT isn't set to one shot mode, in pit_set_hardware_timer(), line 58. The correct line would read:
out8(0x38, 0x43);
Though I can't understand how it could work anywhere else until now, and how this would affect booting while not using the PIT timer (and using the LAPIC instead).
With this change, Haiku boots to the initial alert where you choose to launch the installer or the desktop. Mouse doesn't seem to work correctly, though. But this should be a different bug.
BTW, I also tested on VirtualBox and Haiku continues to work correctly. I'll also test on real hardware (this evening) before committing the change, though. Unless someone else does that first.
comment:14 by , 15 years ago
Replying to jackburton:
I found the problem: the PIT isn't set to one shot mode, in pit_set_hardware_timer(), line 58. The correct line would read:
out8(0x38, 0x43);
I obviously misread the PIT documentation. It's already set to one-shot mode (0x30). Back to the drawing board...
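For reference, the byte written to port 0x43 can be decoded field by field. The following is a minimal sketch assuming the standard 8253/8254 command byte layout; `pit_command`, `pit_decode_command` and `pit_mode_name` are hypothetical helpers for illustration, not functions from the Haiku source.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the 8253/8254 mode/command byte (written to port 0x43). */
typedef struct {
	unsigned channel;	/* bits 7-6: counter select (3 = read-back) */
	unsigned access;	/* bits 5-4: 0 latch, 1 low byte, 2 high byte,
						   3 low byte then high byte */
	unsigned mode;		/* bits 3-1: operating mode, 0-5 */
	unsigned bcd;		/* bit 0: 0 = 16-bit binary, 1 = BCD */
} pit_command;

static pit_command
pit_decode_command(uint8_t value)
{
	pit_command command;
	command.channel = (value >> 6) & 0x3;
	command.access = (value >> 4) & 0x3;
	command.mode = (value >> 1) & 0x7;
	/* Encodings 6 and 7 are aliases of modes 2 and 3. */
	if (command.mode > 5)
		command.mode -= 4;
	command.bcd = value & 0x1;
	return command;
}

static const char*
pit_mode_name(unsigned mode)
{
	static const char* const kNames[] = {
		"interrupt on terminal count",
		"hardware retriggerable one-shot",
		"rate generator",
		"square wave generator",
		"software triggered strobe",
		"hardware triggered strobe"
	};
	assert(mode < 6);
	return kNames[mode];
}
```

Decoding 0x30 gives channel 0, low-then-high access, mode 0 (interrupt on terminal count, the mode this ticket calls one-shot). Decoding 0x38 gives the same channel and access but mode 4 (software triggered strobe), and 0x32 would select mode 1.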
comment:15 by , 15 years ago
At this point I wouldn't exclude a bug in Xen/XenServer. Since most OSes out there use the PIT in periodic mode, this could have gone unnoticed. Setting the PIT to mode 1, which is (almost) functionally equivalent to the mode 0 we are using, results in a working system.
comment:17 by , 15 years ago
Some updates: I fixed a couple of problems in the HPET timer code. That timer hangs on real hardware, so it's disabled by default: to enable it, you have to recompile Haiku after modifying line 43 of the file src/system/kernel/arch/x86/x86_hpet.c, increasing the priority from 0 to 3. Using that timer, Haiku boots correctly on XenServer if I select fail-safe video mode and a depth of 16 bpp.
The mouse doesn't work (ticket #4569) and the network doesn't work either (but it should, since XenServer emulates a Realtek 8139).
comment:18 by , 13 years ago
Can you recheck this with a recent Haiku build? It may have been fixed recently.
comment:19 by , 13 years ago
Blocking: | 7665 added |
---|
comment:20 by , 13 years ago
Nope. I'm testing (and hacking) periodically, but as of today the problem persists.
follow-up: 22 comment:21 by , 13 years ago
The latest attachment shows some more info with latest builds. The panic is caused programmatically by me after 1000 reschedules of the scheduler (otherwise I can't enter KDL via the keyboard). I noticed that the only thread being scheduled continuously is the object cache resizer thread.
The screenshot shows also the ints command, and interrupts are actually being delivered.
As I said before, using HPET as the timer results in a working system. Does anyone have an idea of what is happening?
follow-up: 23 comment:22 by , 13 years ago
Replying to jackburton:
The screenshot shows also the ints command, and interrupts are actually being delivered.
As I said before, using HPET as the timer results in a working system. Does anyone have an idea of what is happening?
As timer interrupts are actually received, I don't really see how switching to HPET would change anything. So it's probably a side effect of enabling HPET that gets things going.
I could imagine a PIT interrupt storm (or actually a HPET one) or similar taking place. You could try enabling all interrupts so you would see what unhandled interrupts exist. You could do this by removing line 766 in browser:haiku/trunk/src/system/kernel/arch/x86/ioapic.cpp so that all of them are enabled on the IO-APIC level (you could also extend the loop to go from 0 to 255). Unhandled interrupts should then be visible in the ints output as lines without interrupt handler.
follow-up: 24 comment:23 by , 13 years ago
Replying to mmlr:
I could imagine a PIT interrupt storm (or actually a HPET one) or similar taking place. You could try enabling all interrupts so you would see what unhandled interrupts exist. You could do this by removing line 766 in browser:haiku/trunk/src/system/kernel/arch/x86/ioapic.cpp so that all of them are enabled on the IO-APIC level (you could also extend the loop to go from 0 to 255). Unhandled interrupts should then be visible in the ints output as lines without interrupt handler.
I commented out the line in question, so that the loop looks like this:
    // enable previously enabled legacy interrupts
    for (uint8 i = 0; i < 255; i++) {
        //if ((legacyInterrupts & (1 << i)) != 0)
            ioapic_enable_io_interrupt(i);
    }
But nothing changed. The "ints" command still shows only three interrupts installed. I also noticed this: the interrupt override list also shows interrupts 5, 10 and 11, but they aren't present in the "ints" command output.
by , 13 years ago
Attachment: | ioapic.png added |
---|
comment:24 by , 13 years ago
Replying to jackburton:
I commented the line in question, so that the loop looks like this: ... But nothing changed. The "ints" command still shows only three interrupts installed.
Then it looks like it isn't an interrupt storm, or at least not one of the routed interrupts. It could still be one in a local interrupt source like the APIC timer, thermal monitor or similar that we don't handle. That'd be rather obscure though, so more on the unlikely side.
I also noticed this: the interrupt override list also shows interrupts 5, 10 and 11, but they aren't present in the "ints" command output.
That's because they don't need a redirection entry, as their IRQ is the same as the GSI. Only the trigger mode and polarity are programmed for the pins where necessary.
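The distinction made here can be sketched roughly as follows. The field layout follows the ACPI MADT Interrupt Source Override entry (source IRQ, GSI, flags); `irq_override`, `irq_to_gsi` and `override_needs_redirection` are hypothetical names for illustration and are not taken from Haiku's ioapic.cpp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified view of an ACPI MADT interrupt source override. */
typedef struct {
	uint8_t source;		/* ISA IRQ number */
	uint32_t gsi;		/* global system interrupt it is routed to */
	uint16_t flags;		/* bits 1-0: polarity, bits 3-2: trigger mode */
} irq_override;

/* Maps an ISA IRQ to its GSI; IRQs without an override map 1:1. */
static uint32_t
irq_to_gsi(const irq_override* overrides, size_t count, uint8_t irq)
{
	assert(overrides != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (overrides[i].source == irq)
			return overrides[i].gsi;
	}
	return irq;
}

/* An override whose GSI equals its source IRQ only changes polarity and
   trigger mode; it does not move the interrupt to another pin. */
static bool
override_needs_redirection(const irq_override* override)
{
	return override->gsi != override->source;
}
```

With this model, overrides for interrupts 5, 10 and 11 whose GSI equals the IRQ number would only carry polarity and trigger mode settings, which matches the explanation of why they do not show up as separate entries.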
comment:25 by , 12 years ago
It's working now (in XenServer 6.0). I think it has either been fixed in XenServer or (more probably) fixed by the AMD processor power management bugfix.
comment:26 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:27 by , 6 years ago
Blocking: | 7665 removed |
---|
with extra debug