Opened 8 years ago

Closed 8 years ago

#7498 closed bug (fixed)

[IO-APIC] KDL when booting

Reported by: luroh
Owned by: mmlr
Priority: normal
Milestone: R1
Component: System/Kernel
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

hrev41406, gcc4 (on an internal SATA hard disk) and gcc2 (on a USB stick).

Unfortunately, the KDL occurs too early in the boot process to generate any syslog, and there's no serial port on the box.

Attaching picture of KDL + ints and a syslog without IO-APIC enabled.

Attachments (15)

kdl_r41406.jpg (188.1 KB) - added by luroh 8 years ago.
syslog_r41406_without_IO-APIC.txt (84.5 KB) - added by luroh 8 years ago.
syslog_r41406_last_page.jpg (248.6 KB) - added by luroh 8 years ago.
syslog_r41406_last_page_with_TRACE.jpg (243.4 KB) - added by luroh 8 years ago.
syslog_r41417_without_TRACE.txt (84.7 KB) - added by luroh 8 years ago.
lshw.txt (20.6 KB) - added by luroh 8 years ago.
kdl_r41446.jpg (310.3 KB) - added by luroh 8 years ago.
syslog_r41446_(10_of_16).jpg (267.9 KB) - added by luroh 8 years ago.
syslog_r41446_(11_of_16).jpg (261.5 KB) - added by luroh 8 years ago.
syslog_r41446_(12_of_16).jpg (270.4 KB) - added by luroh 8 years ago.
syslog_r41446_(13_of_16).jpg (260.6 KB) - added by luroh 8 years ago.
syslog_r41446_(14_of_16).jpg (195.9 KB) - added by luroh 8 years ago.
syslog_r41446_(15_of_16).jpg (236.9 KB) - added by luroh 8 years ago.
syslog_r41446_(16_of_16).jpg (181.8 KB) - added by luroh 8 years ago.
syslog_r41512_with_TRACE.txt (118.2 KB) - added by luroh 8 years ago.

Change History (45)

Changed 8 years ago by luroh

Attachment: kdl_r41406.jpg added

Changed 8 years ago by luroh

comment:1 Changed 8 years ago by mmlr

Can you please run the "syslog" command and check how many entries the IO-APIC has? It should be in the last couple of lines. Ideally can you take a photo of those lines?

comment:2 Changed 8 years ago by luroh

Not quite sure what I'm looking for, attaching photo of the last page of output from the "syslog" command.

Changed 8 years ago by luroh

Attachment: syslog_r41406_last_page.jpg added

comment:3 Changed 8 years ago by mmlr

Yeah, sorry about that, it's a debug TRACE only... Can you enable TRACE_ARCH_INT in browser:haiku/trunk/src/system/kernel/arch/x86/arch_int.cpp and TRACE_PRT in browser:haiku/trunk/src/system/kernel/arch/x86/irq_routing_table.cpp and get that output instead?

comment:4 Changed 8 years ago by luroh

Indeed, that makes the syslog look more interesting, attaching the last page.

Changed 8 years ago by luroh

comment:5 Changed 8 years ago by mmlr

Please retest with hrev41416. If it was caused by an unaddressable IRQ it should now report that and fall back to not using the IO-APIC.
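The fallback described in this comment can be illustrated with a minimal sketch. This is not the code from hrev41416; the names `IoApicInfo`, `is_addressable` and `can_use_io_apic` are hypothetical, and the sketch assumes that each IO-APIC serves a contiguous range of global interrupt numbers, as ACPI describes them.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one IO-APIC; not Haiku's actual structure.
struct IoApicInfo {
	uint32_t globalBase;	// first global interrupt number it serves
	uint32_t entryCount;	// number of redirection entries (pins)
};

// Returns true if some IO-APIC covers the given global interrupt number.
bool is_addressable(const std::vector<IoApicInfo>& apics, uint32_t irq)
{
	for (const IoApicInfo& apic : apics) {
		if (irq >= apic.globalBase && irq - apic.globalBase < apic.entryCount)
			return true;
	}
	return false;
}

// Returns true only if every routed interrupt can be addressed. A caller
// would report the problem and not use the IO-APIC otherwise.
bool can_use_io_apic(const std::vector<IoApicInfo>& apics,
	const std::vector<uint32_t>& routedIrqs)
{
	for (uint32_t irq : routedIrqs) {
		if (!is_addressable(apics, irq))
			return false;
	}
	return true;
}
```

With a single 24-entry IO-APIC, a routing table that refers to interrupt 24 or higher would fail this check, which is the kind of situation that calls for the fallback.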

comment:6 Changed 8 years ago by luroh

Fixed, thanks. Tested with hrev41417, syslog attached (without the debug TRACEs, else it wouldn't build).

Changed 8 years ago by luroh

comment:7 Changed 8 years ago by mmlr

Interesting. What kind of machine is that? It obviously has multiple IO-APICs, which is usually only really common on servers.

comment:8 Changed 8 years ago by luroh

It's an old-ish desktop machine with an ASUS A8V-E Deluxe motherboard, single-core AMD Athlon 64 4000+, GeForce 7950 PCIe card, Audigy2 PCI card. Perhaps it's a VIA chipset thing? Output of lshw attached.

Changed 8 years ago by luroh

Attachment: lshw.txt added

comment:9 Changed 8 years ago by mmlr

Can you please check the outcome with a current revision? I've implemented support for multiple IO-APICs, so your system should now be able to utilize them as well. Output of the resulting interrupt routing would be interesting.

comment:10 Changed 8 years ago by luroh

hrev41443 no longer boots with just the IO-APIC option enabled. It hangs at the fourth (disk) icon, accepts no keyboard input and needs a hard reset. To make matters slightly worse, just by enabling on-screen debug output, the hanging problem is no longer present and the machine boots fine with IO-APIC enabled. Quantum bug. *grmbl*.

comment:11 in reply to:  10 Changed 8 years ago by bonefish

Replying to luroh:

hrev41443 no longer boots with just the IO-APIC option enabled. It hangs at the fourth (disk) icon, accepts no keyboard input and needs a hard reset.

Unless that hard reset is done by turning the machine off and on again, you should still be able to get the syslog from within the boot loader menu.

comment:12 Changed 8 years ago by luroh

Awesome, but how would I go about accessing this syslog after having pressed the reset button? I see no obvious entry in the boot menu to display any in-memory syslog. Does it mean it got nuked when resetting the computer? I'm not using the shutdown/power button.


comment:13 Changed 8 years ago by anevilyak

You would have to have booted with the "Enable debug syslog" option turned on. On reboot the loader will sense its presence and provide an additional menu item to save it to a usb stick.

comment:14 Changed 8 years ago by luroh

Yes, the "Enable debug syslog" option is enabled. I guess it either gets nuked when resetting or it's still too early in the boot process for it to exist. For future reference though, what file system should a USB stick contain in order to be able to save the in-memory syslog?

comment:15 in reply to:  14 Changed 8 years ago by anevilyak

Replying to luroh:

For future reference though, what file system should a USB stick contain in order to be able to save the in-memory syslog?

FAT32.

comment:16 Changed 8 years ago by luroh

Thanks, anevilyak. One more observation in case it helps: enabling serial debug output also makes the hang go away, letting the computer boot with IO-APIC enabled.

comment:17 Changed 8 years ago by bonefish

Replying to luroh:

Awesome, but how would I go about accessing this syslog after having pressed the reset button? I see no obvious entry in the boot menu to display any in-memory syslog.

There should be items in the "Select debug options" submenu to display and save the syslog.

Does it mean it got nuked when resetting the computer? I'm not using the shutdown/power button.

The in-memory syslog buffer is located in a somewhat higher memory range which normally shouldn't be overwritten by the BIOS or boot manager software (though I don't know how far e.g. grub can be trusted in this respect -- it did work with the grub version(s) I used when I introduced the feature). I did a quick test with a relatively recent revision and it seems the feature is simply broken ATM. :-/ Sorry.

comment:18 in reply to: 17 Changed 8 years ago by bonefish

Replying to bonefish:

I did a quick test with a relatively recent revision and it seems the feature is simply broken ATM. :-/ Sorry.

Should work again with hrev41446.

comment:19 in reply to: 18 Changed 8 years ago by luroh

Replying to bonefish:

Should work again with hrev41446.

Sorry, no change with hrev41446 here. Using grub version "1.98+20100804-5ubuntu3"


comment:20 in reply to: 19 Changed 8 years ago by bonefish

Replying to luroh:

Replying to bonefish:

Should work again with hrev41446.

Sorry, no change with hrev41446 here. Using grub version "1.98+20100804-5ubuntu3"

I have only tested with qemu, but at least there it works again. If it's an address issue in your case and you feel adventurous, you could try different base addresses for the syslog memory buffer. The address is hard-coded in src/system/boot/platform/bios_ia32/debug.cpp. Only the boot loader (haiku_loader) needs to be rebuilt and updated.

comment:21 in reply to:  20 Changed 8 years ago by luroh

Replying to bonefish:

I have only tested with qemu, but at least there it works again.

Yes, I can see it working in VMware as well, thanks a lot. I'll try booting from a USB stick first to see if that can help circumventing the problem.

comment:22 Changed 8 years ago by luroh

Some progress. Booting from a USB stick doesn't let me save the syslog either, but instead of freezing at the fourth icon with IO-APIC enabled, it KDLs! Attaching a photo of the panic plus the last seven interesting-looking screens of syslog output (pardon the spammage).
The machine boots fine from the same USB stick without IO-APIC enabled.

Changed 8 years ago by luroh

Attachment: kdl_r41446.jpg added

comment:23 Changed 8 years ago by anevilyak

Something looks very wrong while retrieving the routing table... the addresses should not all be 0xffff like that.

comment:24 in reply to:  23 Changed 8 years ago by mmlr

Replying to anevilyak:

Something looks very wrong while retrieving the routing table... the addresses should not all be 0xffff like that.

Not really. The address field in this case denotes a bridge relative PCI device:function address. The 0xffff is a wildcard meaning "all functions". This is correct (and actually mandatory) because the devices aren't resolved by their function number but by their interrupt pin (multiple functions can match a single _PRT entry because functions can share interrupt pins). A single 0xffff just means device 0. Since there are many bridges involved in that setup (PCIe ports included) there are so many of those device 0 entries. The output looks a bit blown out of proportion of course, but that's just because of the enabled debug output.

The resulting routing table that is actually put to use looks fine. So I'm starting to suspect some logic error on my side in the multi IO-APIC support, as so far 2 of 2 such systems have failed like this, whereas all the single IO-APIC configurations I was able to test worked just fine.

The KDL isn't really surprising, BTW. Since you're booting off a USB device, the inability to receive USB interrupts leads to addressing errors, which makes the boot drive unavailable. You'd get the same underlying failure (i.e. USB addressing failing) when booting from internal disks, but it would look different, mostly because our ATA drivers really don't give up and keep recovering the lost interrupts, even though that's of course pretty hopeless in such a situation.
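The _PRT address encoding explained in this comment can be sketched as follows. The sketch assumes the usual ACPI layout for PCI addresses (device number in the high 16 bits, function number in the low 16 bits). The names `PrtAddress`, `decode_prt_address` and `matches_function` are hypothetical and do not come from irq_routing_table.cpp.

```cpp
#include <cstdint>

// Low-word value meaning "all functions" in a _PRT address field.
constexpr uint16_t kAllFunctions = 0xffff;

// Hypothetical decoded form of a _PRT address field.
struct PrtAddress {
	uint16_t device;	// bridge-relative PCI device number
	uint16_t function;	// PCI function number, or kAllFunctions
};

// Splits the 32-bit address into its device and function parts.
constexpr PrtAddress decode_prt_address(uint32_t address)
{
	return PrtAddress{static_cast<uint16_t>(address >> 16),
		static_cast<uint16_t>(address & 0xffff)};
}

// An entry with the wildcard matches every function of its device; the
// actual routing is then selected by interrupt pin, not function number.
constexpr bool matches_function(const PrtAddress& entry, uint8_t function)
{
	return entry.function == kAllFunctions || entry.function == function;
}
```

Under this layout, an address that prints as a lone 0xffff decodes to device 0 with the function wildcard, which is why so many such entries show up when every bridge contributes its own table.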

comment:25 Changed 8 years ago by mmlr

Can you please retest with hrev41451 or newer? The wrong vector was assigned for entries in the second IO-APIC, so if the USB controllers are linked to that they wouldn't have worked.
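The ticket does not say how the wrong vector came about, so the following is only one plausible illustration of why vectors for a second IO-APIC need care. It assumes, hypothetically, that vectors are derived from the global interrupt number (the IO-APIC's global base plus the local pin). The names `kVectorBase`, `IoApic` and `vector_for_pin` and the value 32 are assumptions, not Haiku's actual code.

```cpp
#include <cstdint>

// Assumed first vector used for hardware interrupts (illustrative value).
constexpr uint32_t kVectorBase = 32;

// Hypothetical description of one IO-APIC.
struct IoApic {
	uint32_t globalBase;	// first global interrupt number it serves
	uint32_t entryCount;	// number of redirection entries (pins)
};

// Derives the vector from the global interrupt number. Using only the
// local pin here would make pins of a second IO-APIC collide with the
// vectors already assigned to the first one.
constexpr uint32_t vector_for_pin(const IoApic& apic, uint32_t pin)
{
	return kVectorBase + apic.globalBase + pin;
}
```

With two 24-entry IO-APICs under these assumptions, pin 0 of the second one is global interrupt 24, so it gets a vector distinct from pin 0 of the first.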

comment:26 Changed 8 years ago by luroh

No noticeable change with hrev41463, I'm afraid. Booting from internal HDD still freezes at the fourth icon with IO-APIC enabled and booting from USB stick still KDLs.

comment:27 Changed 8 years ago by mmlr

I'm really hopeful that hrev41476 fixes this. Please retest.

comment:28 Changed 8 years ago by luroh

Will check and report back on Sunday evening (GMT).

comment:29 Changed 8 years ago by luroh

hrev41512 works, both HDD and USB stick tested. Syslog attached. Nice work, booting Haiku from a USB stick has never fully worked on this machine before.

Changed 8 years ago by luroh

comment:30 in reply to:  29 Changed 8 years ago by mmlr

Resolution: fixed
Status: new → closed

Replying to luroh:

hrev41512 works, both HDD and USB stick tested. Syslog attached. Nice work, booting Haiku from a USB stick has never fully worked on this machine before.

Cool, very nice. Thanks a lot for testing and reporting back! Closing as fixed.
