Opened 19 years ago

Closed 11 years ago

#5 closed bug (fixed)

PCI bus_manager does not assign IRQs

Reported by: marcusoverhagen
Owned by: mmlr
Priority: high
Milestone: R1/beta1
Component: System/Kernel
Version: R1/Development
Keywords:
Cc: diver, siarzhuk, fredrik.holmqvist@…, planche2k@…, gsf747@…, admiral0@…, umccullough, johnpaul.harold@…
Blocked By:
Blocking: #4510
Platform: All

Description

The PCI bus_manager does not perform any IRQ line assignment.

Attachments (1)

acpi_routing_tables.patch (4.2 KB) - added by tqh 14 years ago.
Current WIP, arch_int 682 for-loop always sets irq -1…


Change History (51)

comment:1 by marcusoverhagen, 18 years ago

Status: changed from new to assigned

comment:2 by diver, 18 years ago

Cc: diver added

comment:3 by siarzhuk, 16 years ago

Cc: siarzhuk added
Platform: set to All

comment:4 by mmlr, 16 years ago

Blocking: 2401 added

(In #2401) In fact this is a duplicate of bug #5. The syslog shows that a UHCI and an EHCI controller were found, but that neither could be used because they have an invalid IRQ assignment (or probably none at all). As this should resolve automatically as soon as bug/enhancement #5 is done, it shouldn't really be necessary to keep this bug open.

comment:5 by mmlr, 16 years ago

Blocking: 2620 added

(In #2620) The cause can be seen in the first screenshot at the top. Both the UHCI and EHCI controllers present have invalid IRQ assignments, which means that the BIOS didn't initialize interrupt routing for them. So this is in fact a duplicate of bug #5. If it works when you have an IDE disk connected, the disk probably causes the BIOS to do some additional setup that it otherwise skips. Maybe there are some USB legacy settings in the BIOS that you can try changing; otherwise there's not much you can currently do.

comment:6 by anevilyak, 15 years ago

Blocking: 3313 added

comment:7 by tqh, 15 years ago

Cc: fredrik.holmqvist@… added
Version: R1 development

comment:8 by tqh, 15 years ago

Apparently trac decided to change version when I added myself to CC. I don't know what it is supposed to be so leaving as is.

comment:9 by axeld, 14 years ago

Blocking: 4568 added

(In #4568) No update, assuming diagnosis is correct.

comment:10 by mmadia, 14 years ago

Blocking: 5571 added

(In #5571) From my understanding, the FreeBSD wlan drivers are expecting IRQ Routing.

comment:11 by mmadia, 14 years ago

Blocking: 5511 added

(In #5511) Replying to colin:

... the right way to do it, would be to implement irq routing by using ACPI in Haiku. As far as I know someone is working on that already.

Setting #5 as a blocker.

comment:12 by mmadia, 14 years ago

Blocking: 5372 added

(In #5372) Replying to kallisti5:

<@mmlr_mc> in freebsd they take care to not share interrupts, so their drivers usually don't behave well when we doing it anyway

I'm guessing this can be blocked by #5 as well?

comment:13 by jackburton, 14 years ago

Blocking: 5511 removed

(In #5511) Fixed for me as well.

comment:14 by andreasf, 14 years ago

Cc: planche2k@… added

comment:15 by anevilyak, 14 years ago

Blocking: 5751 added

comment:16 by tangobravo, 14 years ago

Version: changed from R1/pre-alpha1 to R1/Development

To me this seems one of the biggest blockers holding up further Haiku releases. Is anyone working on it, or planning to? It's assigned to Marcus, but I'm not sure he has the time to really look into it. Tqh has done lots of the work on ACPI recently, but I think I read somewhere he thought he lacked the necessary knowledge to tackle this one (correct me if I'm wrong there). Given many people's hardware failures are eventually blamed on this, I'm sure a decent bounty could be raised if someone with the necessary skills could name their price! :)

comment:17 by mmlr, 14 years ago

It's a question of whether it's still relevant to implement the legacy PCI configuration. On reasonably modern systems, routing is either done through the IO-APIC and configured through ACPI, or MSIs are used, which don't need the extra hardware at all.

comment:18 by gsf, 14 years ago

Cc: gsf747@… added
patch: set to 0

Landed here via #2620. I would certainly contribute toward a bounty to see this fixed.

comment:19 by tqh, 14 years ago

Got some feedback on IRC about the same problem as in comment 4.

comment:20 by admiral0, 14 years ago

I'm having the same issue as in comment 4, as mentioned by tqh. How does Linux handle this situation?

comment:21 by admiral0, 14 years ago

Cc: admiral0@… added

comment:22 by tqh, 14 years ago

If you want to hack on this I think ACPI does what it should do.
There is a faulty check in source:haiku/trunk/src/system/kernel/arch/x86/irq_routing_table.cpp#109 to 113 that can be removed. (If I'm not mistaken).
After that it did iterate through a lot of things.

comment:23 by tqh, 14 years ago

I'm not sure what source:haiku/trunk/src/system/kernel/arch/x86/irq_routing_table.cpp#167 - 179 should do though. To me it looks like it returns the status of the last iteration of the while-loop, can that really be ok?

comment:24 by tqh, 14 years ago

patch: changed from 0 to 1

comment:26 by tqh, 14 years ago

Added a patch that reads all irq_routing_entries from ACPI. This seems to make qemu use the ioapic, but on real hardware I think things are missing: it seems int14 is missed and a root partition isn't found.

Several problems always caused it to fail before:

  • The check whether buffer.pointer is an ACPI_PACKAGE always failed. It is not one. We now trust the B_OK status completely.
  • We failed as soon as an ACPI device didn't have IRQ routing information. Not all of them have it, so now we read the routing entries from all devices and only fail if we find none.
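The second point can be illustrated with a small model. This is only a sketch in Python, not the code from the patch: `RoutingEntry`, `read_routing_entries` and `collect_routing_entries` are made-up names, and a plain dict stands in for an ACPI device handle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoutingEntry:
    """One hypothetical IRQ routing entry (field names are illustrative)."""
    device: str  # name of the ACPI device that provided the entry
    pin: int     # PCI interrupt pin, 0-3 for INTA-INTD
    irq: int     # interrupt the pin is routed to


def read_routing_entries(device: dict) -> Optional[list]:
    """Return the device's routing entries, or None if it has no table.

    `device` stands in for an ACPI device handle; a missing "prt" key
    models a device without IRQ routing information.
    """
    table = device.get("prt")
    if table is None:
        return None
    return [RoutingEntry(device["name"], pin, irq) for pin, irq in table]


def collect_routing_entries(devices: list) -> list:
    """Gather entries from every device and fail only if none are found."""
    entries = []
    for device in devices:
        found = read_routing_entries(device)
        if found is None:
            # Not every device carries routing information: skip it
            # instead of aborting the whole scan.
            continue
        entries.extend(found)
    if not entries:
        raise LookupError("no IRQ routing entries found on any device")
    return entries
```

The old behaviour described above corresponds to raising inside the loop on the first device without a table; the sketch moves the failure check to after the loop.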

Also, since pci_bus and pci_device were unused, I removed that complexity. It might be needed; if not, we can also remove the passing of pci_module, I guess.

This is as far as I can go, when it comes to ioapic it's way over my head.

comment:27 by tqh, 14 years ago

Formatting edit:

Several problems always caused it to fail before:

  • The check whether buffer.pointer is an ACPI_PACKAGE always failed. It is not one. We now trust the B_OK status completely.
  • We failed as soon as an ACPI device didn't have IRQ routing information. Not all of them have it, so now we read the routing entries from all devices and only fail if we find none.

by tqh, 14 years ago

Attachment: acpi_routing_tables.patch added

Current WIP, arch_int 682 for-loop always sets irq -1...

comment:28 by tqh, 14 years ago

Owner: changed from marcusoverhagen to tqh

comment:29 by umccullough, 14 years ago

Cc: umccullough added

comment:30 by anevilyak, 13 years ago

Blocking: 6955 added

(In #6955) Those are IO-APIC IRQs. Adding ticket #5 as a blocker.

comment:31 by anevilyak, 13 years ago

Blocking: 4510 added

(In #4510) Your problem is most likely related to interrupt sharing. You have two ethernet controllers and a USB controller sharing IRQs, which is something most of the FreeBSD-based network drivers aren't really geared towards dealing with since they assume the OS has configured everything such that that situation doesn't happen. Another victim of #5.

comment:32 by korli, 13 years ago

Blocking: 6955 removed

(In #6955) Thanks for the feedback!

comment:33 by john-paul.harold, 13 years ago

Cc: johnpaul.harold@… added

comment:34 by anevilyak, 13 years ago

Blocking: 7127 added

(In #7127) Another victim of #5.

comment:35 by umccullough, 13 years ago

I pledge $25 via paypal to the individual that gets this task completed so that it may finally resolve my interrupt sharing issues on my Acer Aspire One netbook.

comment:36 by tqh, 13 years ago

I'm sorry that real life is so busy for me right now, so can't really promise anything.

comment:37 by anevilyak, 13 years ago

Blocking: 7452 added

(In #7452) Cool, that makes this a dupe of #5. Enjoy!

comment:38 by tqh, 13 years ago

Owner: changed from tqh to mmlr
Status: changed from in-progress to assigned

comment:39 by mmlr, 13 years ago

I've completed the work on implementing IO-APIC support (adding PCI interrupt routing, configuration and PCI config updates) between hrev41328 and hrev41402. With that, configurations that previously didn't assign legacy IRQs should now do so via ACPI. Please retest the affected configurations with an installation that includes those changes and enable the IO-APIC by checking the "Enable IO-APIC" entry in the bootloader safemode options. Note that using the IO-APIC currently defaults to off, so you really need to enter the bootloader menu and explicitly enable it.

What issues to look for when using the IO-APIC:

  • If routing isn't possible due to some unexpected event, the routing code will panic, throwing you into KDL.
  • If just some interrupts aren't routed (because matching the interrupt routing table to PCI devices failed for example) you would notice that by the affected devices failing. For example USB not being able to address plugged in devices, audio cards not working or not being detected at all, network cards failing to receive data and so on.
  • If you happen to have a non-legacy disk controller where you boot from (AHCI mostly) it's also possible that the boot process would hang, possibly progressing very slowly with messages of "recovering lost interrupts" printed to debug output.
  • If some interrupts are routed to the wrong vectors, it is possible that the affected devices fail to work and that the syslog is filled with "more than 99% of interrupts on vector X are unhandled" messages. It is also possible that such systems appear to hang completely.
  • It is possible that MSIs (Message Signaled Interrupts) stop working, because ACPI is used for interrupt configuration without the corresponding ACPI support bits being set. You would experience this as failing network cards whose drivers are based on the FreeBSD compatibility layer and use the MSI capability.

Please report any of those issues (or ones I forgot about but are linked to enabling the IO-APIC) in new tickets. I will close duplicate tickets as duplicates if needed, so please open new tickets if you're not sure an already reported ticket matches your problem.

In most cases syslogs or serial debug output from a boot would be helpful to identify the problem. You can verify that you are running on IO-APIC by checking the syslog for the line "using ioapic for interrupt routing". Immediately before that line the used IRQ routing table is printed. Please include that, and ideally the whole part where the PCI configuration is printed, in reports.
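To pull the relevant bits out of a saved syslog before filing a report, something like the following Python sketch could be used. It is not part of Haiku. The message strings are the ones quoted in this comment, and the assumptions are that they appear verbatim (possibly with a prefix) in the log and that the vector number is printed in decimal; `summarize_syslog` and its `context` parameter are made-up names.

```python
import re

# Strings as quoted in this ticket; the real log lines may carry prefixes.
IOAPIC_MARKER = "using ioapic for interrupt routing"
LOST_INTERRUPTS = "recovering lost interrupts"
UNHANDLED = re.compile(
    r"more than 99% of interrupts on vector (\d+) are unhandled")


def summarize_syslog(text: str, context: int = 20) -> dict:
    """Collect the IO-APIC related hints from a syslog dump."""
    lines = text.splitlines()
    marker_index = next(
        (i for i, line in enumerate(lines) if IOAPIC_MARKER in line), None)

    # The routing table is printed right before the marker line, so keep
    # up to `context` lines preceding it for the report.
    before = []
    if marker_index is not None:
        before = lines[max(0, marker_index - context):marker_index]

    vectors = sorted({
        int(match.group(1))
        for line in lines
        for match in UNHANDLED.finditer(line)
    })
    return {
        "ioapic_in_use": marker_index is not None,
        "lines_before_marker": before,
        "unhandled_vectors": vectors,
        "lost_interrupts": any(LOST_INTERRUPTS in line for line in lines),
    }
```

How many lines the routing table actually spans depends on the system, so `context` would need adjusting per log.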

You can check the resulting interrupt configuration by entering KDL and running the "ints" debugger command. It prints the list of installed interrupt handlers and the vectors they are assigned to. Interrupts 0-15 are generally called ISA interrupts, although in legacy PIC mode PCI interrupts are mixed into them, because there simply are only 16 interrupts available. Depending on the configuration of your system and the firmware it uses, the resulting interrupt vectors will change when the IO-APIC is enabled. Usually vectors 16-19 are then used for the PCI interrupt lines; depending on system configuration this might also be 20-23, or both. So if you see handlers installed on those numbers, they are definitely using the IO-APIC. Vectors from 24 upward are Message Signaled Interrupts (MSIs), which right now are used only by network drivers built on the FreeBSD compatibility layer. The vectors above 200 are used for processor-to-processor communication, the APIC timer, spurious interrupts, SMP error interrupts and the like; they are triggered inside the Local APIC of the processors (so not through the IO-APIC).
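As a quick reference, the ranges described above can be written down as a lookup. This is only a Python sketch of the description, not kernel code: `classify_vector` is a made-up name, and treating exactly 200 as the last MSI vector is an assumption, since the text only says "above 200".

```python
def classify_vector(vector: int) -> str:
    """Map a vector number from the "ints" output to its usual meaning."""
    if vector < 0:
        raise ValueError("vector numbers are non-negative")
    if vector <= 15:
        # Legacy range; PCI interrupts may still end up here.
        return "ISA/legacy"
    if vector <= 23:
        # Usually 16-19, on some systems 20-23 or both.
        return "PCI via IO-APIC"
    if vector <= 200:
        # Assumption: 200 itself is counted as MSI.
        return "MSI"
    # IPIs, APIC timer, spurious and SMP error interrupts.
    return "Local APIC"
```

A handler reported on vector 17, for example, falls into the "PCI via IO-APIC" range, while one on vector 11 could be either a real ISA device or a PCI device that the firmware left in the legacy range.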

Note that it is possible that even after enabling the IO-APIC parts or even all of the vectors in use by PCI devices still reside in the legacy ISA range of 0-15. This is firmware and configuration dependent and, as long as the devices actually work, not a bug.

Relevance to this ticket: Setting up the IO-APIC involves switching the interrupt model. For one, this makes hardwired "PCI interrupt pin -> IO-APIC input" connections available. Additionally, the setup includes the configuration of so-called PCI interrupt link devices (where available), which also makes the "PCI interrupt pin -> interrupt link device -> IO-APIC input" connections available. This means that all devices that are physically routed in the system are assigned a proper IRQ and, since this information is made available via the PCI configuration space, drivers will be able to attach to these. Long story short: IO-APIC interrupt routing replaces the legacy configuration that this ticket is about. So unless additional configuration is missing (memory resources, ticket #3; IO port resources, ticket #4), your devices should now work.
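The two kinds of connections can be modelled roughly as follows. This is a simplified Python sketch with made-up names (`LinkDevice`, `RoutingEntry`, `resolve_irq`), not the actual routing code: a routing entry either names the IO-APIC input directly (hardwired) or points at a link device whose configured interrupt is used instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkDevice:
    """Hypothetical PCI interrupt link device."""
    name: str
    current_irq: Optional[int] = None  # None until the link is configured


@dataclass
class RoutingEntry:
    """Hypothetical routing table entry for one PCI interrupt pin."""
    pin: int                    # 0-3 for INTA-INTD
    link: Optional[LinkDevice]  # None means the pin is hardwired
    ioapic_input: int = 0       # only meaningful for hardwired entries


def resolve_irq(entry: RoutingEntry) -> int:
    """Follow pin -> (link device) -> IO-APIC input for one entry."""
    if entry.link is None:
        # Hardwired: the entry names the IO-APIC input itself.
        return entry.ioapic_input
    if entry.link.current_irq is None:
        raise ValueError(f"link device {entry.link.name} is not configured")
    # Routed through a link device: its configuration decides the input.
    return entry.link.current_irq
```

In this model, the value returned by `resolve_irq` is what would end up in the device's PCI configuration space for drivers to pick up.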

comment:40 by umccullough, 13 years ago (in reply to comment 35)

Replying to umccullough:

I pledge $25 via paypal to the individual that gets this task completed so that it may finally resolve my interrupt sharing issues on my Acer Aspire One netbook.

My AA1 works pretty good now - audio plays without issues during the use of USB and ethernet. Wifi seems to be working pretty good now - when I disable the io-apic again, then it starts having problems.

Great work! Payment sent - I threw an extra $5 in for good measure ;)

comment:41 by humdinger, 13 years ago

I just chipped in a few euros myself. This was an important ticket to solve.

comment:42 by luroh, 13 years ago

Blocking: 5866 added

(In #5866) Sounds good, thanks for reporting back. Closing this one as fixed.

comment:43 by scottmc, 13 years ago

Milestone: changed from R1 to R1/beta1

Is this one finished now?

comment:44 by scottmc, 13 years ago

Blocking: 5571 removed

comment:45 by scottmc, 13 years ago

Blocking: 5571 added

comment:46 by scottmc, 13 years ago

Blocking: 5571 removed

comment:47 by mmlr, 13 years ago (in reply to comment 43)

Replying to scottmc:

Is this one finished now?

Well. This ticket isn't really done (and probably won't ever get done) in the sense that the PCI bus_manager still doesn't assign interrupt lines (i.e. no PNPBIOS). However with ACPI PCI Interrupt Routing Table support and IO-APIC support being available now, the need for this feature isn't really there anymore and the symptoms it caused are gone as well for most moderately modern machines. As explained earlier, machines that don't provide support for ACPI PRTs and IO-APICs usually don't have this problem at all because either the BIOS initializes all devices in the first place or at least they provide the "PnP OS installed" BIOS option by which this can be enforced.

So I'm really a bit undecided. Leave this open but close all the blocked bugs (already done for the most part)? Close this one as "wontfix"? Close it as fixed as the symptoms it causes should be gone?

comment:48 by tqh, 13 years ago

I'd vote for 'Close this one as "wontfix"' with the comments you've already given, so that nobody assumes their problems are caused by this bug.

comment:49 by tqh, 13 years ago

patch: changed from 1 to 0

comment:50 by umccullough, 12 years ago

Blocking: 5372 removed

comment:51 by scottmc, 11 years ago

Blocking: 2401, 2620, 3313, 4568, 5751, 5866, 7127, 7452 removed
Resolution: fixed
Status: changed from assigned to closed

All but one of the blocking tickets have been closed out, and that one just needs confirmation as to whether it's been fixed or not. So I say let's close this one out as being fixed, seeing as the reported symptoms have gone away.
