Opened 8 years ago

Closed 8 years ago

#13381 closed bug (fixed)

IOMMU IRQ routing panic on UEFI Ryzen 7 X370 system

Reported by: kallisti5
Owned by: kallisti5
Priority: normal
Milestone: Unscheduled
Component: System/Kernel
Version: R1/Development
Keywords: IOMMU
Cc:
Blocked By:
Blocking:
Platform: All

Description (last modified by kallisti5)

I've seen this crash a few times when booting an AMD Ryzen 7 system. Haiku hrev51008 x86_64, booted via our haiku_loader.efi EFI loader.

ACPI: Executed 1 blocks of module-level executable AML code
ACPI: 7 ACPI AML tables successfully acquired and loaded

ACPI Error: Needed [Integer/String/Buffer], found [Region] 0xffffffff82b0d460 (20160729/exresop-514)
ACPI Exception: AE_AML_OPERAND_TYPE, Could not execute arguments for [IOB2] (Region) (20160729/nsinit-515)
add_memory_type_range(177, 0xfed80000, 0x1000, 0)
add_memory_type_range(178, 0xdb135000, 0x1000, 0)
ACPI: Enabled 4 GPEs in block 00 to 1F
add_memory_type_range(179, 0xce477000, 0x1000, 0)
found io-apic with address 0xfec00000, global interrupt base 0, apic-id 17
mapping io-apic 0 at physical address 0xfec00000
add_memory_type_range(180, 0xfec00000, 0x1000, 0)
io-apic 0 has range 0-23, 24 entries, version 0x00178021, apic-id 17
found io-apic with address 0xfec01000, global interrupt base 24, apic-id 18
mapping io-apic 1 at physical address 0xfec01000
add_memory_type_range(181, 0xfec01000, 0x1000, 0)
io-apic 1 has range 24-55, 32 entries, version 0x001f8021, apic-id 18
setting ACPI interrupt model to APIC
PANIC: unable to find irq routing for PCI 0:0:2
Welcome to Kernel Debugging Land...
Thread 1 "idle thread 1" running on CPU 0
stack trace for thread 1 "idle thread 1"
    kernel stack: 0xffffffff8253d000 to 0xffffffff82542000
frame                       caller             <image>:function + offset
 0 ffffffff82541a98 (+  24) ffffffff80147a2c   <kernel_x86_64> arch_debug_call_with_fault_handler() + 0x16
 1 ffffffff82541ab0 (+  80) ffffffff800aa118   <kernel_x86_64> debug_call_with_fault_handler() + 0x68
 2 ffffffff82541b00 (+  80) ffffffff800abbdc   <kernel_x86_64> _ZL20kernel_debugger_loopPKcS0_P13__va_list_tagi() + 0x30c
 3 ffffffff82541b50 (+  80) ffffffff800abcee   <kernel_x86_64> _ZL24kernel_debugger_internalPKcS0_P13__va_list_tagi() + 0x6e
 4 ffffffff82541ba0 (+ 240) ffffffff800ac03e   <kernel_x86_64> panic() + 0xbe
 5 ffffffff82541c90 (+ 192) ffffffff8015c439   <kernel_x86_64> _ZL28ensure_all_functions_matchedP15pci_module_infohR6VectorI17irq_routing_entryES4_RS1_I11pci_addressE() + 0x569
 6 ffffffff82541d50 (+ 416) ffffffff8015c62e   <kernel_x86_64> _ZL22read_irq_routing_tableP16acpi_module_infoR6VectorI17irq_routing_entryEPFbiE() + 0x1ae
 7 ffffffff82541ef0 (+  48) ffffffff8015d1fb   <kernel_x86_64> _Z19prepare_irq_routingP16acpi_module_infoR6VectorI17irq_routing_entryEPFbiE() + 0x1b
 8 ffffffff82541f20 (+ 144) ffffffff8015ae18   <kernel_x86_64> _Z11ioapic_initP11kernel_args() + 0x1d8
 9 ffffffff82541fb0 (+  32) ffffffff80155b09   <kernel_x86_64> arch_int_init_io() + 0x19
10 ffffffff82541fd0 (+  32) ffffffff8006111c   <kernel_x86_64> _start() + 0x22c
11 ffffffff82541ff0 (+2108416016) 00000000ce4196d8

Attachments (5)

lspci.txt (19.5 KB ) - added by kallisti5 8 years ago.
lspci from linux
haiku-irq-debug.txt (78.8 KB ) - added by kallisti5 8 years ago.
serial output with IRQ routing trace enabled
iommu-pci-extradbg.txt (76.7 KB ) - added by kallisti5 8 years ago.
extra debugging output
iommu-pci-extradbg2.txt (86.9 KB ) - added by kallisti5 8 years ago.
extra debugging output with more debug sauce
acpi-tables.tar.gz (71.9 KB ) - added by kallisti5 8 years ago.


Change History (18)

comment:1 by kallisti5, 8 years ago

Description: modified (diff)
Summary: ACPI crash on Ryzen 7 X370 system → ACPI crash on UEFI Ryzen 7 X370 system

comment:2 by tqh, 8 years ago

Component: Drivers/ACPI → Drivers
Owner: changed from tqh to nobody
Status: new → assigned
Summary: ACPI crash on UEFI Ryzen 7 X370 system → IRQ routing panic on UEFI Ryzen 7 X370 system

It's not really an ACPI problem, it's an IRQ mapping problem: http://cgit.haiku-os.org/haiku/tree/src/system/kernel/arch/x86/irq_routing_table.cpp#n562

comment:3 by waddlesplash, 8 years ago

Component: Drivers → System/Kernel

by kallisti5, 8 years ago

Attachment: lspci.txt added

lspci from linux

comment:4 by kallisti5, 8 years ago

patch: 0 → 1

by kallisti5, 8 years ago

Attachment: haiku-irq-debug.txt added

serial output with IRQ routing trace enabled

comment:5 by kallisti5, 8 years ago

patch: 1 → 0

comment:6 by kallisti5, 8 years ago

The ACPI error seems unrelated. The same error can be seen on Linux machines and seems to be a common cosmetic problem with this generation of boards. (likely something the parser isn't aware of)

So the issue seems to be related to this device:

$ sudo lspci -vvv -s 00:02
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

There is an identical host bridge at 00:03, however it has things attached:

$ sudo lspci -vvv -s 00:03
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 1452
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453 (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=28, subordinate=28, sec-latency=0
	I/O behind bridge: 0000d000-0000dfff [size=4K]
	Memory behind bridge: fe800000-fe8fffff [size=1M]
	Prefetchable memory behind bridge: 00000000e0000000-00000000f01fffff [size=258M]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L0s <512ns, L1 <64us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 0.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible+
		RootCap: CRSVisible+
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd+
		AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS-
		DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd+
		AtomicOpsCtl: ReqEn- EgressBlck-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [c0] Subsystem: ASUSTeK Computer Inc. Device 8747
	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [270 v1] #19
	Capabilities: [370 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-

		L1SubCtl2:
	Capabilities: [3c4 v1] #23
	Kernel driver in use: pcieport
	Kernel modules: shpchp

So it seems related to the unused host bridge?

https://rog.asus.com/media/14878984098.gif


comment:7 by kallisti5, 8 years ago

Attachments:

-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1450
           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1451
           +-01.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-01.1-[01]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
           +-01.3-[03-27]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 43b9
           |               +-00.1  Advanced Micro Devices, Inc. [AMD] Device 43b5
           |               \-00.2-[1d-27]--+-00.0-[21]--
           |                               +-02.0-[23]--
           |                               +-03.0-[24]--
           |                               +-04.0-[25]----00.0  ASMedia Technology Inc. Device 1343
           |                               +-06.0-[26]----00.0  Intel Corporation I211 Gigabit Network Connection
           |                               \-07.0-[27]--
           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-03.1-[28]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480]
           |            \-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-07.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-07.1-[29]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 145a
           |            +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1456
           |            \-00.3  Advanced Micro Devices, Inc. [AMD] Device 145c
           +-08.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-08.1-[2a]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1455
           |            +-00.2  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
           |            \-00.3  Advanced Micro Devices, Inc. [AMD] Device 1457
           +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
           +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
           +-18.0  Advanced Micro Devices, Inc. [AMD] Device 1460
           +-18.1  Advanced Micro Devices, Inc. [AMD] Device 1461
           +-18.2  Advanced Micro Devices, Inc. [AMD] Device 1462
           +-18.3  Advanced Micro Devices, Inc. [AMD] Device 1463
           +-18.4  Advanced Micro Devices, Inc. [AMD] Device 1464
           +-18.5  Advanced Micro Devices, Inc. [AMD] Device 1465
           +-18.6  Advanced Micro Devices, Inc. [AMD] Device 1466
           \-18.7  Advanced Micro Devices, Inc. [AMD] Device 1467

comment:8 by kallisti5, 8 years ago

Keywords: IOMMU added
Summary: IRQ routing panic on UEFI Ryzen 7 X370 system → IOMMU IRQ routing panic on UEFI Ryzen 7 X370 system

wwwait.. 0:0:2 is 00.2 in the tree above, not 02.0.

PCI: [dom 0, bus  0] bus   0, device  0, function  2: vendor 1022, device 1451, revision 00
PCI:   class_base 08, class_function 06, class_api 00
PCI:   vendor 1022: Advanced Micro Devices, Inc. [AMD]
PCI:   device 1451: Unknown
PCI:   info: Generic system peripheral (IOMMU)
PCI:   line_size 00, latency 00, header_type 80, BIST 00
PCI:   ROM base host 00000000, pci 00000000, size 00000000
PCI:   cardbus_CIS 00000000, subsystem_id 1451, subsystem_vendor_id 1022
PCI:   interrupt_line ff, interrupt_pin 01, min_grant 00, max_latency 00
PCI:   base reg 0: host 00000000, pci 00000000, size 00000000, flags 00
PCI:   base reg 1: host 00000000, pci 00000000, size 00000000, flags 00
PCI:   base reg 2: host 00000000, pci 00000000, size 00000000, flags 00
PCI:   base reg 3: host 00000000, pci 00000000, size 00000000, flags 00
PCI:   base reg 4: host 00000000, pci 00000000, size 00000000, flags 00
PCI:   base reg 5: host 00000000, pci 00000000, size 00000000, flags 00
PCI:   Capabilities: Secure Device, MSI, HyperTransport

It looks like that's the IOMMU, which would explain why I didn't see this bug the first time, and why jua "started seeing it". I bet having IOMMU enabled in the BIOS causes it :-)

# lspci -s 00:00.2 -vv
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 1451
	Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1451
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
	Interrupt: pin A routed to IRQ 27
	Capabilities: [40] Secure device <?>
	Capabilities: [64] MSI: Enable+ Count=1/4 Maskable- 64bit+
		Address: 00000000fee00000  Data: 4091
	Capabilities: [74] HyperTransport: MSI Mapping Enable+ Fixed+
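
The mix-up above comes from two notations for the same address: the panic message prints bus:device:function, while lspci prints bus:device.function in zero-padded hex. A minimal sketch of both formats, for illustration only. The PciAddress struct and the helper names are made up here, and the assumption that the kernel message uses plain decimal fields is mine, not something the ticket confirms.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper type for illustration; not the kernel's own struct.
struct PciAddress {
    uint8_t bus;
    uint8_t device;    // valid range 0-31
    uint8_t function;  // valid range 0-7
};

// Panic-message style: "bus:device:function" (assumed decimal, no padding).
std::string format_kernel_style(const PciAddress& a)
{
    assert(a.device < 32 && a.function < 8);
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%u:%u:%u",
        static_cast<unsigned>(a.bus),
        static_cast<unsigned>(a.device),
        static_cast<unsigned>(a.function));
    return buf;
}

// lspci style: "bus:device.function", zero-padded hexadecimal.
std::string format_lspci_style(const PciAddress& a)
{
    assert(a.device < 32 && a.function < 8);
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%02x:%02x.%x",
        static_cast<unsigned>(a.bus),
        static_cast<unsigned>(a.device),
        static_cast<unsigned>(a.function));
    return buf;
}
```

With these helpers, {0, 0, 2} formats as 0:0:2 and 00:00.2 (the IOMMU), while {0, 2, 0} formats as 0:2:0 and 00:02.0 (the unused host bridge from comment 6).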

by kallisti5, 8 years ago

Attachment: iommu-pci-extradbg.txt added

extra debugging output

comment:9 by kallisti5, 8 years ago

patch: 0 → 1

by kallisti5, 8 years ago

Attachment: iommu-pci-extradbg2.txt added

extra debugging output with more debug sauce

comment:10 by kallisti5, 8 years ago

That extra bit of debugging shows that PCI bus 0, device 0 doesn't have any IRQ routing tables from ACPI. Is that valid?

by kallisti5, 8 years ago

Attachment: acpi-tables.tar.gz added

comment:11 by kallisti5, 8 years ago

We're talking about checking for bus 0, device 0 and only raising a warning instead of a panic in this case. Another option is to detect the device type of IOMMU and simply raise a TODO around better IOMMU knowledge (in case someone wants to implement an IOMMU driver someday).
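
A rough sketch of the second option, to make the idea concrete. This is not the change that went into the tree; every name below (PciFunction, UnmatchedAction, classify_unmatched) is hypothetical, and the class codes are taken from the PCI dump in comment 8 (class_base 08, class_function 06). It also assumes that a function with interrupt_pin 0 needs no routing entry at all.

```cpp
#include <cassert>
#include <cstdint>

// Class codes as reported for 0:0:2 in comment 8.
constexpr uint8_t kClassBasePeripheral = 0x08;  // generic system peripheral
constexpr uint8_t kSubclassIommu = 0x06;        // IOMMU

// Hypothetical summary of what the routing check knows about one function.
struct PciFunction {
    uint8_t bus, device, function;
    uint8_t class_base, class_sub;
    uint8_t interrupt_pin;    // 0 means the function uses no INTx pin
    bool has_routing_entry;   // true if ACPI supplied a matching entry
};

enum class UnmatchedAction { None, Warn, Panic };

// Decide how to react to a function while verifying the routing table.
UnmatchedAction classify_unmatched(const PciFunction& f)
{
    assert(f.device < 32 && f.function < 8);

    // Nothing to route, or routing was found: no complaint.
    if (f.interrupt_pin == 0 || f.has_routing_entry)
        return UnmatchedAction::None;

    // An IOMMU without a routing entry is tolerated with a warning,
    // since nothing drives it yet.
    if (f.class_base == kClassBasePeripheral && f.class_sub == kSubclassIommu)
        return UnmatchedAction::Warn;

    // Anything else keeps the existing behaviour.
    return UnmatchedAction::Panic;
}
```

Matching on the class code rather than on bus 0, device 0 keeps the panic in place for any other device at that address that really is missing its routing.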

The IVRS table in ACPI confirms the device via the DeviceId below (see AMD's IOMMU spec, page 254: http://support.amd.com/TechDocs/48882_IOMMU.pdf):

[000h 0000   4]                    Signature : "IVRS"    [I/O Virtualization Reporting Structure]
[004h 0004   4]                 Table Length : 000000D0
[008h 0008   1]                     Revision : 02
[009h 0009   1]                     Checksum : 94
[00Ah 0010   6]                       Oem ID : ""
[010h 0016   8]                 Oem Table ID : ""
[018h 0024   4]                 Oem Revision : 00000001
[01Ch 0028   4]              Asl Compiler ID : "AMD "
[020h 0032   4]        Asl Compiler Revision : 00000000

[024h 0036   4]          Virtualization Info : 00203041
[028h 0040   8]                     Reserved : 0000000000000000

[030h 0048   1]                Subtable Type : 10 [Hardware Definition Block]
[031h 0049   1]                        Flags : B0
[032h 0050   2]                       Length : 0048
[034h 0052   2]                     DeviceId : 0002
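
For reference, here is how DeviceId 0002 maps back to 0:0:2. This is a small sketch assuming the usual 16-bit PCI bus/device/function packing (bus in bits 15:8, device in bits 7:3, function in bits 2:0). The struct and function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration.
struct PciAddress {
    uint8_t bus;
    uint8_t device;
    uint8_t function;
};

// Split a 16-bit DeviceId into bus (15:8), device (7:3), function (2:0).
PciAddress decode_device_id(uint16_t id)
{
    return PciAddress{
        static_cast<uint8_t>(id >> 8),
        static_cast<uint8_t>((id >> 3) & 0x1f),
        static_cast<uint8_t>(id & 0x07)};
}

// Inverse of decode_device_id, with range checks on the packed fields.
uint16_t encode_device_id(const PciAddress& a)
{
    assert(a.device < 32 && a.function < 8);
    return static_cast<uint16_t>((a.bus << 8) | (a.device << 3) | a.function);
}
```

Under that layout, 0x0002 decodes to bus 0, device 0, function 2, which is the address from the panic message. The host bridge at 00:02.0 would instead be 0x0010.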

comment:12 by kallisti5, 8 years ago

Owner: changed from nobody to kallisti5

comment:13 by kallisti5, 8 years ago

Resolution: fixed
Status: assigned → closed

Worked around via hrev51032
