Opened 2 years ago

Closed 2 years ago

#13381 closed bug (fixed)

IOMMU IRQ routing panic on UEFI Ryzen 7 X370 system

Reported by: kallisti5 Owned by: kallisti5
Priority: normal Milestone: Unscheduled
Component: System/Kernel Version: R1/Development
Keywords: IOMMU Cc:
Blocked By: Blocking:
Has a Patch: yes Platform: All

Description (last modified by kallisti5)

I've seen this crash a few times when booting an AMD Ryzen 7 system. Haiku hrev51008 x86_64, booted via our haiku_loader.efi EFI loader.

ACPI: Executed 1 blocks of module-level executable AML code
ACPI: 7 ACPI AML tables successfully acquired and loaded

ACPI Error: Needed [Integer/String/Buffer], found [Region] 0xffffffff82b0d460 (20160729/exresop-514)
ACPI Exception: AE_AML_OPERAND_TYPE, Could not execute arguments for [IOB2] (Region) (20160729/nsinit-515)
add_memory_type_range(177, 0xfed80000, 0x1000, 0)
add_memory_type_range(178, 0xdb135000, 0x1000, 0)
ACPI: Enabled 4 GPEs in block 00 to 1F
add_memory_type_range(179, 0xce477000, 0x1000, 0)
found io-apic with address 0xfec00000, global interrupt base 0, apic-id 17
mapping io-apic 0 at physical address 0xfec00000
add_memory_type_range(180, 0xfec00000, 0x1000, 0)
io-apic 0 has range 0-23, 24 entries, version 0x00178021, apic-id 17
found io-apic with address 0xfec01000, global interrupt base 24, apic-id 18
mapping io-apic 1 at physical address 0xfec01000
add_memory_type_range(181, 0xfec01000, 0x1000, 0)
io-apic 1 has range 24-55, 32 entries, version 0x001f8021, apic-id 18
setting ACPI interrupt model to APIC
PANIC: unable to find irq routing for PCI 0:0:2
Welcome to Kernel Debugging Land...
Thread 1 "idle thread 1" running on CPU 0
stack trace for thread 1 "idle thread 1"
    kernel stack: 0xffffffff8253d000 to 0xffffffff82542000
frame                       caller             <image>:function + offset
 0 ffffffff82541a98 (+  24) ffffffff80147a2c   <kernel_x86_64> arch_debug_call_with_fault_handler() + 0x16
 1 ffffffff82541ab0 (+  80) ffffffff800aa118   <kernel_x86_64> debug_call_with_fault_handler() + 0x68
 2 ffffffff82541b00 (+  80) ffffffff800abbdc   <kernel_x86_64> _ZL20kernel_debugger_loopPKcS0_P13__va_list_tagi() + 0x30c
 3 ffffffff82541b50 (+  80) ffffffff800abcee   <kernel_x86_64> _ZL24kernel_debugger_internalPKcS0_P13__va_list_tagi() + 0x6e
 4 ffffffff82541ba0 (+ 240) ffffffff800ac03e   <kernel_x86_64> panic() + 0xbe
 5 ffffffff82541c90 (+ 192) ffffffff8015c439   <kernel_x86_64> _ZL28ensure_all_functions_matchedP15pci_module_infohR6VectorI17irq_routing_entryES4_RS1_I11pci_addressE() + 0x569
 6 ffffffff82541d50 (+ 416) ffffffff8015c62e   <kernel_x86_64> _ZL22read_irq_routing_tableP16acpi_module_infoR6VectorI17irq_routing_entryEPFbiE() + 0x1ae
 7 ffffffff82541ef0 (+  48) ffffffff8015d1fb   <kernel_x86_64> _Z19prepare_irq_routingP16acpi_module_infoR6VectorI17irq_routing_entryEPFbiE() + 0x1b
 8 ffffffff82541f20 (+ 144) ffffffff8015ae18   <kernel_x86_64> _Z11ioapic_initP11kernel_args() + 0x1d8
 9 ffffffff82541fb0 (+  32) ffffffff80155b09   <kernel_x86_64> arch_int_init_io() + 0x19
10 ffffffff82541fd0 (+  32) ffffffff8006111c   <kernel_x86_64> _start() + 0x22c
11 ffffffff82541ff0 (+2108416016) 00000000ce4196d8

Attachments (5)

lspci.txt (19.5 KB ) - added by kallisti5 2 years ago.
lspci from linux
haiku-irq-debug.txt (78.8 KB ) - added by kallisti5 2 years ago.
serial output with IRQ routing trace enabled
iommu-pci-extradbg.txt (76.7 KB ) - added by kallisti5 2 years ago.
extra debugging output
iommu-pci-extradbg2.txt (86.9 KB ) - added by kallisti5 2 years ago.
extra debugging output with more debug sauce
acpi-tables.tar.gz (71.9 KB ) - added by kallisti5 2 years ago.


Change History (18)

comment:1 by kallisti5, 2 years ago

Description: modified
Summary: ACPI crash on Ryzen 7 X370 system → ACPI crash on UEFI Ryzen 7 X370 system

comment:2 by tqh, 2 years ago

Component: Drivers/ACPI → Drivers
Owner: changed from tqh to nobody
Status: new → assigned
Summary: ACPI crash on UEFI Ryzen 7 X370 system → IRQ routing panic on UEFI Ryzen 7 X370 system

It's not really an ACPI problem, it's an IRQ mapping problem: http://cgit.haiku-os.org/haiku/tree/src/system/kernel/arch/x86/irq_routing_table.cpp#n562

comment:3 by waddlesplash, 2 years ago

Component: Drivers → System/Kernel

by kallisti5, 2 years ago

Attachment: lspci.txt added

lspci from linux

comment:4 by kallisti5, 2 years ago

Has a Patch: set

by kallisti5, 2 years ago

Attachment: haiku-irq-debug.txt added

serial output with IRQ routing trace enabled

comment:5 by kallisti5, 2 years ago

Has a Patch: unset

comment:6 by kallisti5, 2 years ago

EDIT: wrong device, removed

Last edited 2 years ago by kallisti5

comment:7 by kallisti5, 2 years ago

Attachments:

-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1450
           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1451
           +-01.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-01.1-[01]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
           +-01.3-[03-27]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 43b9
           |               +-00.1  Advanced Micro Devices, Inc. [AMD] Device 43b5
           |               \-00.2-[1d-27]--+-00.0-[21]--
           |                               +-02.0-[23]--
           |                               +-03.0-[24]--
           |                               +-04.0-[25]----00.0  ASMedia Technology Inc. Device 1343
           |                               +-06.0-[26]----00.0  Intel Corporation I211 Gigabit Network Connection
           |                               \-07.0-[27]--
           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-03.1-[28]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480]
           |            \-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-07.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-07.1-[29]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 145a
           |            +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1456
           |            \-00.3  Advanced Micro Devices, Inc. [AMD] Device 145c
           +-08.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-08.1-[2a]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1455
           |            +-00.2  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
           |            \-00.3  Advanced Micro Devices, Inc. [AMD] Device 1457
           +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
           +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
           +-18.0  Advanced Micro Devices, Inc. [AMD] Device 1460
           +-18.1  Advanced Micro Devices, Inc. [AMD] Device 1461
           +-18.2  Advanced Micro Devices, Inc. [AMD] Device 1462
           +-18.3  Advanced Micro Devices, Inc. [AMD] Device 1463
           +-18.4  Advanced Micro Devices, Inc. [AMD] Device 1464
           +-18.5  Advanced Micro Devices, Inc. [AMD] Device 1465
           +-18.6  Advanced Micro Devices, Inc. [AMD] Device 1466
           \-18.7  Advanced Micro Devices, Inc. [AMD] Device 1467

comment:8 by kallisti5, 2 years ago

Keywords: IOMMU added
Summary: IRQ routing panic on UEFI Ryzen 7 X370 system → IOMMU IRQ routing panic on UEFI Ryzen 7 X370 system

Wait... 0:0:2 is 00.2 in the tree above (device 0, function 2), not 02.0 (device 2, function 0).

PCI: [dom 0, bus  0] bus   0, device  0, function  2: vendor 1022, device 1451, revision 00
PCI:   class_base 08, class_function 06, class_api 00
PCI:   vendor 1022: Advanced Micro Devices, Inc. [AMD]
PCI:   device 1451: Unknown
PCI:   info: Generic system peripheral (IOMMU)
PCI:   line_size 00, latency 00, header_type 80, BIST 00
PCI:   ROM base host 00000000, pci 00000000, size 00000000
PCI:   cardbus_CIS 00000000, subsystem_id 1451, subsystem_vendor_id 1022
PCI:   interrupt_line ff, interrupt_pin 01, min_grant 00, max_latency 00
PCI:   base reg 0: host 00000000, pci 00000000, size 00000000, flags 00
PCI:   base reg 1: host 00000000, pci 00000000, size 00000000, flags 00
PCI:   base reg 2: host 00000000, pci 00000000, size 00000000, flags 00
PCI:   base reg 3: host 00000000, pci 00000000, size 00000000, flags 00
PCI:   base reg 4: host 00000000, pci 00000000, size 00000000, flags 00
PCI:   base reg 5: host 00000000, pci 00000000, size 00000000, flags 00
PCI:   Capabilities: Secure Device, MSI, HyperTransport

It looks like that's the IOMMU, which would explain why I didn't see this bug the first time and why jua "started seeing it". I bet having IOMMU enabled in the BIOS causes it :-)

# lspci -s 00:00.2 -vv
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 1451
	Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1451
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
	Interrupt: pin A routed to IRQ 27
	Capabilities: [40] Secure device <?>
	Capabilities: [64] MSI: Enable+ Count=1/4 Maskable- 64bit+
		Address: 00000000fee00000  Data: 4091
	Capabilities: [74] HyperTransport: MSI Mapping Enable+ Fixed+

by kallisti5, 2 years ago

Attachment: iommu-pci-extradbg.txt added

extra debugging output

comment:9 by kallisti5, 2 years ago

Has a Patch: set

by kallisti5, 2 years ago

Attachment: iommu-pci-extradbg2.txt added

extra debugging output with more debug sauce

comment:10 by kallisti5, 2 years ago

That extra bit of debugging shows that PCI bus 0, device 0 doesn't have any IRQ routing tables from ACPI. Is that valid?
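
To spell out what "doesn't have any IRQ routing tables" means for the lookup, here is a simplified sketch of matching a PCI function against ACPI _PRT entries. It relies on two things from the ACPI and PCI specs: a _PRT address carries the device number in its high word (the low word is 0xFFFF, meaning all functions), and _PRT pins are numbered 0 to 3 for INTA to INTD while the PCI interrupt_pin register uses 1 to 4. The PrtEntry struct and find_route() are hypothetical names for this example, not the actual types in irq_routing_table.cpp.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one _PRT entry.
struct PrtEntry {
    uint32_t address;   // high word: device number, low word: 0xFFFF
    uint8_t  pin;       // 0 = INTA ... 3 = INTD
    uint32_t gsi;       // global system interrupt the pin routes to
};

// Look up the route for a device on the bus this table describes.
// interruptPin is the raw PCI config value: 0 = none, 1 = INTA ... 4 = INTD.
// Returns nullptr when the table has no matching entry.
inline const PrtEntry*
find_route(const std::vector<PrtEntry>& table, uint8_t device,
    uint8_t interruptPin)
{
    if (interruptPin == 0 || interruptPin > 4)
        return nullptr;     // function does not use a legacy INTx pin

    const uint8_t prtPin = interruptPin - 1;
    for (const PrtEntry& entry : table) {
        if ((entry.address >> 16) == device && entry.pin == prtPin)
            return &entry;
    }
    return nullptr;
}
```

In this model, the IOMMU at 00.2 reports interrupt_pin 01, so the lookup is for device 0, INTA. If the bus 0 table has no entry whose high word is 0, find_route() returns nullptr, which corresponds to the case that triggers the panic.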

by kallisti5, 2 years ago

Attachment: acpi-tables.tar.gz added

comment:11 by kallisti5, 2 years ago

We're talking about checking for bus 0, device 0 and only raising a warning instead of a panic in this case. Another option is to detect the IOMMU device type and simply leave a TODO about better IOMMU knowledge (in case someone wants to implement an IOMMU driver someday).

The IVRS table in ACPI confirms the device via the DeviceId below (described in AMD's IOMMU spec, page 254: http://support.amd.com/TechDocs/48882_IOMMU.pdf):
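
A rough sketch of the two options, only to make the trade-off concrete. This is not the change that went into the tree; PciFunctionInfo, MissingRouteAction and both functions are invented for this example. The class codes come from the PCI dump in comment 8 (class_base 08, class_function 06, reported as IOMMU).

```cpp
#include <cstdint>

// What to do when a function with an interrupt pin has no routing entry.
enum class MissingRouteAction { Panic, Warn };

// Hypothetical summary of one PCI function; not a Haiku structure.
struct PciFunctionInfo {
    uint8_t bus;
    uint8_t device;
    uint8_t function;
    uint8_t classBase;
    uint8_t classSub;
};

// PCI class codes: base 0x08 is "generic system peripheral",
// subclass 0x06 under it is IOMMU.
constexpr uint8_t kClassBaseSystemPeripheral = 0x08;
constexpr uint8_t kClassSubIommu = 0x06;

constexpr bool
is_iommu(const PciFunctionInfo& info)
{
    return info.classBase == kClassBaseSystemPeripheral
        && info.classSub == kClassSubIommu;
}

// Option 1: anything on bus 0, device 0 only gets a warning.
constexpr MissingRouteAction
action_by_slot(const PciFunctionInfo& info)
{
    return (info.bus == 0 && info.device == 0)
        ? MissingRouteAction::Warn : MissingRouteAction::Panic;
}

// Option 2: only devices identified as an IOMMU get a warning.
// TODO (hypothetical): hand these to an IOMMU driver once one exists.
constexpr MissingRouteAction
action_by_class(const PciFunctionInfo& info)
{
    return is_iommu(info)
        ? MissingRouteAction::Warn : MissingRouteAction::Panic;
}
```

Both variants return Warn for 0:0:2 on this machine. The slot check is broader, since it also excuses any other function on device 0, while the class check keeps the panic for everything that is not an IOMMU.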

[000h 0000   4]                    Signature : "IVRS"    [I/O Virtualization Reporting Structure]
[004h 0004   4]                 Table Length : 000000D0
[008h 0008   1]                     Revision : 02
[009h 0009   1]                     Checksum : 94
[00Ah 0010   6]                       Oem ID : ""
[010h 0016   8]                 Oem Table ID : ""
[018h 0024   4]                 Oem Revision : 00000001
[01Ch 0028   4]              Asl Compiler ID : "AMD "
[020h 0032   4]        Asl Compiler Revision : 00000000

[024h 0036   4]          Virtualization Info : 00203041
[028h 0040   8]                     Reserved : 0000000000000000

[030h 0048   1]                Subtable Type : 10 [Hardware Definition Block]
[031h 0049   1]                        Flags : B0
[032h 0050   2]                       Length : 0048
[034h 0052   2]                     DeviceId : 0002
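
For reference, the DeviceId field is a 16-bit bus/device/function value: bus in bits 15:8, device in bits 7:3, function in bits 2:0. A small sketch of the decoding follows; the Bdf struct and decode_device_id() are names made up for this example.

```cpp
#include <cstdint>

// Hypothetical helper type for this example.
struct Bdf {
    uint8_t bus;
    uint8_t device;
    uint8_t function;
};

// Split a 16-bit DeviceId into bus [15:8], device [7:3], function [2:0].
constexpr Bdf
decode_device_id(uint16_t deviceId)
{
    return Bdf{
        uint8_t(deviceId >> 8),
        uint8_t((deviceId >> 3) & 0x1f),
        uint8_t(deviceId & 0x07)
    };
}

// DeviceId 0002 from the IVRS dump decodes to bus 0, device 0, function 2.
static_assert(decode_device_id(0x0002).bus == 0, "bus");
static_assert(decode_device_id(0x0002).device == 0, "device");
static_assert(decode_device_id(0x0002).function == 2, "function");
```

So DeviceId 0002 is 0:0:2, the same function named in the panic. As a second data point, a value of 0x00A3 would decode to device 0x14, function 3, which is the 14.3 LPC bridge in the lspci tree.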

comment:12 by kallisti5, 2 years ago

Owner: changed from nobody to kallisti5

comment:13 by kallisti5, 2 years ago

Resolution: fixed
Status: assigned → closed

Worked around via hrev51032
