Opened 8 years ago
Closed 8 years ago
#13381 closed bug (fixed)
IOMMU IRQ routing panic on UEFI Ryzen 7 X370 system
Reported by: | kallisti5 | Owned by: | kallisti5 |
---|---|---|---|
Priority: | normal | Milestone: | Unscheduled |
Component: | System/Kernel | Version: | R1/Development |
Keywords: | IOMMU | Cc: | |
Blocked By: | | Blocking: | |
Platform: | All | | |
Description
I've seen this crash a few times when booting an AMD Ryzen 7 system running Haiku hrev51008 x86_64, booted via our haiku_loader.efi EFI loader.
ACPI: Executed 1 blocks of module-level executable AML code
ACPI: 7 ACPI AML tables successfully acquired and loaded
ACPI Error: Needed [Integer/String/Buffer], found [Region] 0xffffffff82b0d460 (20160729/exresop-514)
ACPI Exception: AE_AML_OPERAND_TYPE, Could not execute arguments for [IOB2] (Region) (20160729/nsinit-515)
add_memory_type_range(177, 0xfed80000, 0x1000, 0)
add_memory_type_range(178, 0xdb135000, 0x1000, 0)
ACPI: Enabled 4 GPEs in block 00 to 1F
add_memory_type_range(179, 0xce477000, 0x1000, 0)
found io-apic with address 0xfec00000, global interrupt base 0, apic-id 17
mapping io-apic 0 at physical address 0xfec00000
add_memory_type_range(180, 0xfec00000, 0x1000, 0)
io-apic 0 has range 0-23, 24 entries, version 0x00178021, apic-id 17
found io-apic with address 0xfec01000, global interrupt base 24, apic-id 18
mapping io-apic 1 at physical address 0xfec01000
add_memory_type_range(181, 0xfec01000, 0x1000, 0)
io-apic 1 has range 24-55, 32 entries, version 0x001f8021, apic-id 18
setting ACPI interrupt model to APIC
PANIC: unable to find irq routing for PCI 0:0:2
Welcome to Kernel Debugging Land...
Thread 1 "idle thread 1" running on CPU 0
stack trace for thread 1 "idle thread 1"
    kernel stack: 0xffffffff8253d000 to 0xffffffff82542000
frame caller <image>:function + offset
 0 ffffffff82541a98 (+ 24) ffffffff80147a2c <kernel_x86_64> arch_debug_call_with_fault_handler() + 0x16
 1 ffffffff82541ab0 (+ 80) ffffffff800aa118 <kernel_x86_64> debug_call_with_fault_handler() + 0x68
 1 ffffffff82541ab0 (+ 80) ffffffff800aa118 <kernel_x86_64> debug_call_with_fault_handler() + 0x68
 2 ffffffff82541b00 (+ 80) ffffffff800abbdc <kernel_x86_64> _ZL20kernel_debugger_loopPKcS0_P13__va_list_tagi() + 0x30c
 3 ffffffff82541b50 (+ 80) ffffffff800abcee <kernel_x86_64> _ZL24kernel_debugger_internalPKcS0_P13__va_list_tagi() + 0x6e
 4 ffffffff82541ba0 (+ 240) ffffffff800ac03e <kernel_x86_64> panic() + 0xbe
 5 ffffffff82541c90 (+ 192) ffffffff8015c439 <kernel_x86_64> _ZL28ensure_all_functions_matchedP15pci_module_infohR6VectorI17irq_routing_entryES4_RS1_I11pci_addressE() + 0x569
 6 ffffffff82541d50 (+ 416) ffffffff8015c62e <kernel_x86_64> _ZL22read_irq_routing_tableP16acpi_module_infoR6VectorI17irq_routing_entryEPFbiE() + 0x1ae
 7 ffffffff82541ef0 (+ 48) ffffffff8015d1fb <kernel_x86_64> _Z19prepare_irq_routingP16acpi_module_infoR6VectorI17irq_routing_entryEPFbiE() + 0x1b
 8 ffffffff82541f20 (+ 144) ffffffff8015ae18 <kernel_x86_64> _Z11ioapic_initP11kernel_args() + 0x1d8
 9 ffffffff82541fb0 (+ 32) ffffffff80155b09 <kernel_x86_64> arch_int_init_io() + 0x19
10 ffffffff82541fd0 (+ 32) ffffffff8006111c <kernel_x86_64> _start() + 0x22c
11 ffffffff82541ff0 (+2108416016) 00000000ce4196d8
Attachments (5)
Change History (18)
comment:1 by , 8 years ago
Description: | modified (diff) |
---|---|
Summary: | ACPI crash on Ryzen 7 X370 system → ACPI crash on UEFI Ryzen 7 X370 system |
comment:2 by , 8 years ago
Component: | Drivers/ACPI → Drivers |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Summary: | ACPI crash on UEFI Ryzen 7 X370 system → IRQ routing panic on UEFI Ryzen 7 X370 system |
comment:3 by , 8 years ago
Component: | Drivers → System/Kernel |
---|
comment:4 by , 8 years ago
patch: | 0 → 1 |
---|
by , 8 years ago
Attachment: | haiku-irq-debug.txt added |
---|
serial output with IRQ routing trace enabled
comment:5 by , 8 years ago
patch: | 1 → 0 |
---|
comment:6 by , 8 years ago
The ACPI error seems unrelated. The same error shows up on Linux machines and seems to be a common cosmetic problem with this generation of boards (likely something the parser isn't aware of).
So the issue seems to be related to this device:
$ sudo lspci -vvv -s 00:02
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
There is an identical host bridge at 00:03, however it has things attached:
$ sudo lspci -vvv -s 00:03
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453 (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Bus: primary=00, secondary=28, subordinate=28, sec-latency=0
    I/O behind bridge: 0000d000-0000dfff [size=4K]
    Memory behind bridge: fe800000-fe8fffff [size=1M]
    Prefetchable memory behind bridge: 00000000e0000000-00000000f01fffff [size=258M]
    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort+ <SERR- <PERR-
    BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
        DevCap: MaxPayload 512 bytes, PhantFunc 0
            ExtTag+ RBE+
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L0s <512ns, L1 <64us
            ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
        SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
            Slot #0, PowerLimit 0.000W; Interlock- NoCompl+
        SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
            Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
        SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
            Changed: MRL- PresDet- LinkState+
        RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible+
        RootCap: CRSVisible+
        RootSta: PME ReqID 0000, PMEStatus- PMEPending-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd+
            AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS-
        DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd+
            AtomicOpsCtl: ReqEn- EgressBlck-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
            EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
    Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [c0] Subsystem: ASUSTeK Computer Inc. Device 8747
    Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
    Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [270 v1] #19
    Capabilities: [370 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
        L1SubCtl2:
    Capabilities: [3c4 v1] #23
    Kernel driver in use: pcieport
    Kernel modules: shpchp
So it seems related to the unused host bridge?
comment:7 by , 8 years ago
Attachments:
-[0000:00]-+-00.0 Advanced Micro Devices, Inc. [AMD] Device 1450
           +-00.2 Advanced Micro Devices, Inc. [AMD] Device 1451
           +-01.0 Advanced Micro Devices, Inc. [AMD] Device 1452
           +-01.1-[01]----00.0 Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
           +-01.3-[03-27]--+-00.0 Advanced Micro Devices, Inc. [AMD] Device 43b9
           |               +-00.1 Advanced Micro Devices, Inc. [AMD] Device 43b5
           |               \-00.2-[1d-27]--+-00.0-[21]--
           |                               +-02.0-[23]--
           |                               +-03.0-[24]--
           |                               +-04.0-[25]----00.0 ASMedia Technology Inc. Device 1343
           |                               +-06.0-[26]----00.0 Intel Corporation I211 Gigabit Network Connection
           |                               \-07.0-[27]--
           +-02.0 Advanced Micro Devices, Inc. [AMD] Device 1452
           +-03.0 Advanced Micro Devices, Inc. [AMD] Device 1452
           +-03.1-[28]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480]
           |            \-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
           +-04.0 Advanced Micro Devices, Inc. [AMD] Device 1452
           +-07.0 Advanced Micro Devices, Inc. [AMD] Device 1452
           +-07.1-[29]--+-00.0 Advanced Micro Devices, Inc. [AMD] Device 145a
           |            +-00.2 Advanced Micro Devices, Inc. [AMD] Device 1456
           |            \-00.3 Advanced Micro Devices, Inc. [AMD] Device 145c
           +-08.0 Advanced Micro Devices, Inc. [AMD] Device 1452
           +-08.1-[2a]--+-00.0 Advanced Micro Devices, Inc. [AMD] Device 1455
           |            +-00.2 Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
           |            \-00.3 Advanced Micro Devices, Inc. [AMD] Device 1457
           +-14.0 Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
           +-14.3 Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
           +-18.0 Advanced Micro Devices, Inc. [AMD] Device 1460
           +-18.1 Advanced Micro Devices, Inc. [AMD] Device 1461
           +-18.2 Advanced Micro Devices, Inc. [AMD] Device 1462
           +-18.3 Advanced Micro Devices, Inc. [AMD] Device 1463
           +-18.4 Advanced Micro Devices, Inc. [AMD] Device 1464
           +-18.5 Advanced Micro Devices, Inc. [AMD] Device 1465
           +-18.6 Advanced Micro Devices, Inc. [AMD] Device 1466
           \-18.7 Advanced Micro Devices, Inc. [AMD] Device 1467
comment:8 by , 8 years ago
Keywords: | IOMMU added |
---|---|
Summary: | IRQ routing panic on UEFI Ryzen 7 X370 system → IOMMU IRQ routing panic on UEFI Ryzen 7 X370 system |
Wait... 0:0:2 is 00.2 in the tree above, not 02.0.
PCI: [dom 0, bus 0] bus 0, device 0, function 2: vendor 1022, device 1451, revision 00
PCI: class_base 08, class_function 06, class_api 00
PCI: vendor 1022: Advanced Micro Devices, Inc. [AMD]
PCI: device 1451: Unknown
PCI: info: Generic system peripheral (IOMMU)
PCI: line_size 00, latency 00, header_type 80, BIST 00
PCI: ROM base host 00000000, pci 00000000, size 00000000
PCI: cardbus_CIS 00000000, subsystem_id 1451, subsystem_vendor_id 1022
PCI: interrupt_line ff, interrupt_pin 01, min_grant 00, max_latency 00
PCI: base reg 0: host 00000000, pci 00000000, size 00000000, flags 00
PCI: base reg 1: host 00000000, pci 00000000, size 00000000, flags 00
PCI: base reg 2: host 00000000, pci 00000000, size 00000000, flags 00
PCI: base reg 3: host 00000000, pci 00000000, size 00000000, flags 00
PCI: base reg 4: host 00000000, pci 00000000, size 00000000, flags 00
PCI: base reg 5: host 00000000, pci 00000000, size 00000000, flags 00
PCI: Capabilities: Secure Device, MSI, HyperTransport
It looks like that's the IOMMU, which would explain why I didn't see this bug the first time, and why jua "started seeing it". I bet having IOMMU enabled in the BIOS causes it :-)
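The mix-up between 00.2 and 02.0 comes from two notations for the same bus/device/function triple. A minimal sketch of both, assuming the kernel panic prints the triple as decimal `bus:device:function` and lspci prints hexadecimal `bb:dd.f`; `PciAddress`, `ToKernelStyle` and `ToLspciStyle` are hypothetical names for illustration, not Haiku APIs:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper type for illustration; not a Haiku kernel structure.
struct PciAddress {
    uint8_t bus;
    uint8_t device;    // valid range 0-31
    uint8_t function;  // valid range 0-7
};

// Panic-message style, e.g. "0:0:2" (assumed decimal bus:device:function).
std::string ToKernelStyle(const PciAddress& a)
{
    assert(a.device < 32 && a.function < 8);
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%u:%u:%u",
        static_cast<unsigned>(a.bus),
        static_cast<unsigned>(a.device),
        static_cast<unsigned>(a.function));
    return buf;
}

// lspci style, e.g. "00:00.2" (hexadecimal bus:device.function).
std::string ToLspciStyle(const PciAddress& a)
{
    assert(a.device < 32 && a.function < 8);
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%02x:%02x.%x",
        static_cast<unsigned>(a.bus),
        static_cast<unsigned>(a.device),
        static_cast<unsigned>(a.function));
    return buf;
}
```

With these helpers, `PciAddress{0, 0, 2}` formats as `0:0:2` and `00:00.2` (the IOMMU function), while the host bridge looked at earlier is `PciAddress{0, 2, 0}`, i.e. `00:02.0`.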
# lspci -s 00:00.2 -vv
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 1451
    Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1451
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
    Interrupt: pin A routed to IRQ 27
    Capabilities: [40] Secure device <?>
    Capabilities: [64] MSI: Enable+ Count=1/4 Maskable- 64bit+
        Address: 00000000fee00000 Data: 4091
    Capabilities: [74] HyperTransport: MSI Mapping Enable+ Fixed+
comment:9 by , 8 years ago
patch: | 0 → 1 |
---|
by , 8 years ago
Attachment: | iommu-pci-extradbg2.txt added |
---|
extra debugging output with more debug sauce
comment:10 by , 8 years ago
That extra bit of debugging shows that PCI bus 0, device 0 doesn't have any IRQ routing tables from ACPI. Is that valid?
by , 8 years ago
Attachment: | acpi-tables.tar.gz added |
---|
comment:11 by , 8 years ago
We're talking about checking for bus 0, device 0 and only raising a warning instead of a panic in this case. Another option is to detect the IOMMU device type and simply leave a TODO about better IOMMU knowledge (in case someone wants to implement an IOMMU driver someday).
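A rough sketch of the two options discussed above, written as a standalone decision function. This is not the change that eventually landed and not the code in irq_routing_table.cpp; `PciFunctionInfo`, `UnmatchedAction` and `ActionForUnmatchedFunction` are hypothetical names. The class codes (base 0x08, sub 0x06) are the ones the PCI dump reports for the IOMMU function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of one PCI function; not a Haiku structure.
struct PciFunctionInfo {
    uint8_t bus;
    uint8_t device;
    uint8_t function;
    uint8_t classBase;     // class_base from config space
    uint8_t classSub;      // class_function (sub-class) from config space
    uint8_t interruptPin;  // 0 = no INTx pin, 1 = INTA ... 4 = INTD
};

enum class UnmatchedAction { Ignore, Warn, Panic };

// "Generic system peripheral (IOMMU)" as reported in the PCI dump.
constexpr uint8_t kClassBaseSystemPeripheral = 0x08;
constexpr uint8_t kClassSubIommu = 0x06;

bool IsIommu(const PciFunctionInfo& info)
{
    return info.classBase == kClassBaseSystemPeripheral
        && info.classSub == kClassSubIommu;
}

// Decide what to do with a function that has no IRQ routing entry.
UnmatchedAction ActionForUnmatchedFunction(const PciFunctionInfo& info)
{
    assert(info.device < 32 && info.function < 8);

    // A function without an interrupt pin needs no routing at all.
    if (info.interruptPin == 0)
        return UnmatchedAction::Ignore;

    // Option 2: recognise the IOMMU by class code and only warn.
    // TODO (illustrative): handle this properly once an IOMMU driver exists.
    if (IsIommu(info))
        return UnmatchedAction::Warn;

    // Option 1: anything on bus 0, device 0 only raises a warning.
    if (info.bus == 0 && info.device == 0)
        return UnmatchedAction::Warn;

    return UnmatchedAction::Panic;
}
```

For the function in this ticket (bus 0, device 0, function 2, class 08/06, interrupt pin 01) either option alone yields a warning; keeping both checks in the sketch just shows how they would sit side by side.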
The IVRS table in ACPI confirms the device via the DeviceId below (the format is found in AMD's IOMMU spec, page 254: http://support.amd.com/TechDocs/48882_IOMMU.pdf):
[000h 0000 4] Signature : "IVRS" [I/O Virtualization Reporting Structure]
[004h 0004 4] Table Length : 000000D0
[008h 0008 1] Revision : 02
[009h 0009 1] Checksum : 94
[00Ah 0010 6] Oem ID : ""
[010h 0016 8] Oem Table ID : ""
[018h 0024 4] Oem Revision : 00000001
[01Ch 0028 4] Asl Compiler ID : "AMD "
[020h 0032 4] Asl Compiler Revision : 00000000
[024h 0036 4] Virtualization Info : 00203041
[028h 0040 8] Reserved : 0000000000000000
[030h 0048 1] Subtable Type : 10 [Hardware Definition Block]
[031h 0049 1] Flags : B0
[032h 0050 2] Length : 0048
[034h 0052 2] DeviceId : 0002
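To see why DeviceId 0002 points at PCI 0:0:2, the 16-bit value can be split the usual PCI way: bus in bits 15:8, device in bits 7:3, function in bits 2:0. A small sketch; `Bdf`, `DecodeDeviceId` and `EncodeDeviceId` are hypothetical helper names, not taken from the Haiku tree or the AMD spec:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration only.
struct Bdf {
    uint8_t bus;
    uint8_t device;
    uint8_t function;
};

// Split a 16-bit DeviceId into bus (15:8), device (7:3), function (2:0).
constexpr Bdf DecodeDeviceId(uint16_t id)
{
    return Bdf{
        static_cast<uint8_t>(id >> 8),
        static_cast<uint8_t>((id >> 3) & 0x1f),
        static_cast<uint8_t>(id & 0x07)
    };
}

// Inverse operation: pack bus/device/function back into a DeviceId.
inline uint16_t EncodeDeviceId(const Bdf& bdf)
{
    assert(bdf.device < 32 && bdf.function < 8);
    return static_cast<uint16_t>(
        (bdf.bus << 8) | (bdf.device << 3) | bdf.function);
}

// DeviceId 0002 from the IVRS dump: bus 0, device 0, function 2.
static_assert(DecodeDeviceId(0x0002).bus == 0, "bus");
static_assert(DecodeDeviceId(0x0002).device == 0, "device");
static_assert(DecodeDeviceId(0x0002).function == 2, "function");
```

By the same layout, the unused host bridge at 00:02.0 would encode as 0x0010, so the IVRS entry cannot be referring to it.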
comment:12 by , 8 years ago
Owner: | changed from | to
---|
comment:13 by , 8 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Worked around via hrev51032.
It's not really an ACPI problem, it's an IRQ mapping problem: http://cgit.haiku-os.org/haiku/tree/src/system/kernel/arch/x86/irq_routing_table.cpp#n562