Opened 8 years ago

Closed 5 years ago

#12885 closed bug (fixed)

Invalid PCI bus access by XHCI with USB 3.1 / USB C

Reported by: kallisti5 Owned by: nobody
Priority: normal Milestone: R1/beta2
Component: Drivers/USB/XHCI Version: R1/Development
Keywords: skylake ryzen xhci usb-c usb Cc:
Blocked By: Blocking: #13372, #13735, #14557, #14608
Platform: All

Description (last modified by kallisti5)

This issue generally occurs when booting from a USB 3 flash drive.

_ZN4XHCIC2EP8pci_infoP5Stack() -> XHCI::XHCI(pci_info*, Stack*)
_ZN4XHCI5AddToEP5Stack ()      -> XHCI::AddTo(Stack*)
_ZN5StackC1Ev ()               -> Stack::Stack()

This line seems to be a core issue in syslog:

usb xhci -1: using message signaled interrupts
usb xhci -1: stating XHCI host controller
usb hub 2: hub supports more ports than we do (18 vs. 16)
usb xhci -1: successfully started the controller

Attachments (11)

1.jpg (157.1 KB ) - added by kallisti5 8 years ago.
2.jpg (127.1 KB ) - added by kallisti5 8 years ago.
3.jpg (175.1 KB ) - added by kallisti5 8 years ago.
4.jpg (164.1 KB ) - added by kallisti5 8 years ago.
XHCI-BACKTRACE.jpg (365.9 KB ) - added by kallisti5 8 years ago.
backtrace from USB 3 flash drive
XHCI-TRACE-LOG.jpg (3.2 MB ) - added by kallisti5 8 years ago.
XHCI Log output with tracing enabled
IMG_20161027_092549.jpg (487.1 KB ) - added by kallisti5 7 years ago.
KDL BT - hrev50621, x86_64, Sky Lake, USB 3.0
IMG_20161027_092648.jpg (460.5 KB ) - added by kallisti5 7 years ago.
syslog - hrev50621, x86_64, Sky Lake, USB 3.0
IMG_20161029_174005.jpg (153.2 KB ) - added by kallisti5 7 years ago.
Backtrace, clearer
IMG_20161101_154045.jpg (158.4 KB ) - added by kallisti5 7 years ago.
Trace ReadCapReg32 calls
IMG_20190122_185027.jpg (891.5 KB ) - added by kallisti5 5 years ago.
XHCI, XPS 13, EFI hrev52781

Change History (53)

by kallisti5, 8 years ago

Attachment: 1.jpg added

by kallisti5, 8 years ago

Attachment: 2.jpg added

by kallisti5, 8 years ago

Attachment: 3.jpg added

by kallisti5, 8 years ago

Attachment: 4.jpg added

comment:1 by kallisti5, 8 years ago

(at boot from USB 2.0 stick plugged into USB 3.1 socket. hrev50452 x86_64)

comment:2 by kallisti5, 8 years ago

Tested hrev50454, issue still exists w/no changes.

comment:3 by korli, 8 years ago

Weird, it is as if there were a second XHCI controller to initialize (the first is started, an allocation for a second controller is being done while KDLed).

comment:4 by kallisti5, 8 years ago

Linux shows only one USB controller:

$ sudo lspci -s 00:14.0 -nnn -vv
00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f] (rev 21) (prog-if 30 [XHCI])
	Subsystem: Dell Sunrise Point-LP USB 3.0 xHCI Controller [1028:0704]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 122
	Region 0: Memory at dc210000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [70] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
		Address: 00000000fee00238  Data: 0000
	Kernel driver in use: xhci_hcd

lsusb:

$ lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 004: ID 0bda:5682 Realtek Semiconductor Corp. 
Bus 001 Device 003: ID 04f3:20d0 Elan Microelectronics Corp. 
Bus 001 Device 002: ID 8087:0a2a Intel Corp. 
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

dmesg:

$ dmesg | grep -i xhci
[    1.195032] xhci_hcd 0000:00:14.0: xHCI Host Controller
[    1.195037] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
[    1.196210] xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x00109810
[    1.196215] xhci_hcd 0000:00:14.0: cache line size of 64 is not supported
[    1.196299] usb usb1: Product: xHCI Host Controller
[    1.196301] usb usb1: Manufacturer: Linux 4.4.0-31-generic xhci-hcd
[    1.203786] xhci_hcd 0000:00:14.0: xHCI Host Controller
[    1.203790] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
[    1.203817] usb usb2: Product: xHCI Host Controller
[    1.203819] usb usb2: Manufacturer: Linux 4.4.0-31-generic xhci-hcd
[    1.568146] usb 1-3: new full-speed USB device number 2 using xhci_hcd
[    1.920337] usb 1-4: new full-speed USB device number 3 using xhci_hcd
[    2.272169] usb 1-5: new high-speed USB device number 4 using xhci_hcd

comment:5 by kallisti5, 8 years ago

Interestingly:

kallisti5@ares ~ :) $ cat /sys/bus/usb/devices/usb1/product 
xHCI Host Controller
kallisti5@ares ~ :) $ cat /sys/bus/usb/devices/usb2/product 
xHCI Host Controller
kallisti5@ares ~ :) $ lsusb -tv
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/12p, 480M
    |__ Port 3: Dev 2, If 0, Class=Wireless, Driver=btusb, 12M
    |__ Port 3: Dev 2, If 1, Class=Wireless, Driver=btusb, 12M
    |__ Port 4: Dev 3, If 0, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 5: Dev 4, If 0, Class=Video, Driver=uvcvideo, 480M
    |__ Port 5: Dev 4, If 1, Class=Video, Driver=uvcvideo, 480M

comment:6 by korli, 8 years ago

Just a guess: the USB 2.0 stick may be on the last port of the hub, in which case we would then use the next, out-of-bounds index here. You could try increasing USB_MAX_PORT_COUNT to 32, for instance, and see what happens. (The lsusb output doesn't list the USB 2.0 stick, so it's not clear what happens.)
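The out-of-bounds guess above can be illustrated with a small bounds check. This is only a sketch: `kMaxPortCount`, `UsablePortCount` and `IsValidPortIndex` are hypothetical names, not the driver's, and the limit of 16 is taken from the "18 vs. 16" syslog line rather than from the source tree.

```cpp
#include <cstdint>

// Hypothetical stand-in for the driver's compile-time port limit; the
// syslog line "18 vs. 16" suggests it was 16 at the time.
constexpr std::uint8_t kMaxPortCount = 16;

// Clamp the port count a hub reports to what fixed-size tables can hold.
constexpr std::uint8_t UsablePortCount(std::uint8_t reportedPorts)
{
	return reportedPorts > kMaxPortCount ? kMaxPortCount : reportedPorts;
}

// True only if a zero-based port index is safe to use as an array index.
constexpr bool IsValidPortIndex(std::uint8_t index, std::uint8_t reportedPorts)
{
	return index < UsablePortCount(reportedPorts);
}
```

With a hub reporting 18 ports, indices 16 and 17 are rejected instead of being used to index past the end of a 16-entry table.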

comment:7 by kallisti5, 8 years ago

The logs that don't show that error already had the maximum hub port count bumped to 24. The same crash appeared (minus the message about 18 vs. 16).

comment:8 by kallisti5, 8 years ago

Bus 02 appears to be the USB-C port and everything external (USB 3.1). Bus 01 seems to be an internal bus operating at USB 2.0 speeds.

So this looks like a USB host controller that supports more than one bus? I wonder if EHCI did that.

in reply to:  8 comment:9 by korli, 8 years ago

Replying to kallisti5:

So this looks like a USB host controller that supports more than one bus? I wonder if EHCI did that.

No, the XHCI Linux driver chooses to expose two root hubs (2.0 and 3.x); each port is present on both hubs. If a USB2 device is connected, it appears on the 2.0 root hub, otherwise on the 3.x root hub.

It might help if you could try with TRACE_USB enabled.

by kallisti5, 8 years ago

Attachment: XHCI-BACKTRACE.jpg added

backtrace from USB 3 flash drive

by kallisti5, 8 years ago

Attachment: XHCI-TRACE-LOG.jpg added

XHCI Log output with tracing enabled

comment:10 by korli, 8 years ago

Owner: changed from korli to nobody
Status: new → assigned

by kallisti5, 7 years ago

Attachment: IMG_20161027_092549.jpg added

KDL BT - hrev50621, x86_64, Sky Lake, USB 3.0

by kallisti5, 7 years ago

Attachment: IMG_20161027_092648.jpg added

syslog - hrev50621, x86_64, Sky Lake, USB 3.0

comment:11 by kallisti5, 7 years ago

Description: modified (diff)

comment:12 by kallisti5, 7 years ago

I wonder if all this pain is because of the hack in https://github.com/haiku/haiku/commit/192f01c669102651bdc81273811079e90e0a29e5 ?

That little hack comment sounds like a *big* issue.

comment:13 by kallisti5, 7 years ago

For the "hub supports more ports than we do" limit, I've *finally* found a reference for a sane value. (The xHCI specs don't really mention this limit.) Our USB driver unsurprisingly works a lot like FreeBSD's. They have a MAX_USB_PORTS define that their hub code uses to iterate over the hub ports; their limit is 255.

https://github.com/freebsd/freebsd/blob/master/sys/dev/usb/usb_freebsd.h#L82

Sounds like I can at least change that to something better with a little backing knowledge now.

comment:14 by korli, 7 years ago

The hack would only be a problem for non-root hubs, and the syslog shows none.

by kallisti5, 7 years ago

Attachment: IMG_20161029_174005.jpg added

Backtrace, clearer

comment:15 by kallisti5, 7 years ago

So, the original XHCI_BACKTRACE image showed the problem.

This read is looking outside of the mapped memory: http://cgit.haiku-os.org/haiku/tree/src/add-ons/kernel/busses/usb/xhci.cpp#n201

The screenshot shows an odd array of 0xf's sneaking into the registers. (check out the length... the rest of the controllers map as 0x80 length, this one maps as 0xff)

capability + operational are 0xffffffff prefixed, then runtime + doorbell are prefixed with 0x00000000.
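One way to guard against reads outside the mapped area is to check every register offset against the size of the mapping before dereferencing. A minimal sketch, assuming a hypothetical helper `RegisterInRange` that is not part of the driver, and mapping sizes chosen only for illustration:

```cpp
#include <cstddef>
#include <cstdint>

// True if a 32-bit register at `offset` lies entirely inside a mapping of
// `mappedSize` bytes. Written so that offset + 4 can never overflow.
constexpr bool RegisterInRange(std::uint64_t offset, std::uint64_t mappedSize)
{
	return mappedSize >= sizeof(std::uint32_t)
		&& offset <= mappedSize - sizeof(std::uint32_t);
}
```

An offset derived from a register that read back as all ones (for example 0x3fffc) fails this check for a 64K mapping, so the driver could bail out instead of faulting.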

comment:16 by kallisti5, 7 years ago

A quick fix of the tracing shows the eecp register is 0x0003fffc.
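The value 0x0003fffc is what one would expect if the capability parameters register itself read back as 0xffffffff. In the xHCI register layout, the extended capabilities pointer (xECP) is bits 31:16 of HCCPARAMS1, counted in 32-bit words, so an all-ones field of 0xffff shifted left by 2 gives 0x3fffc. The sketch below assumes the driver derives eecp this way, which this ticket does not confirm; `ExtendedCapsByteOffset` is a hypothetical name.

```cpp
#include <cstdint>

// xECP occupies bits 31:16 of HCCPARAMS1 and counts 32-bit words from the
// start of the MMIO region, so the byte offset is the field shifted left by 2.
constexpr std::uint32_t ExtendedCapsByteOffset(std::uint32_t hccparams)
{
	return ((hccparams >> 16) & 0xffffu) << 2;
}
```

For comparison, the "hcc params 0x200077c1" value from the Linux dmesg output earlier in this ticket yields a byte offset of 0x8000 with the same arithmetic.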

comment:17 by korli, 7 years ago

Seems you're using a hotplugged USB hub. Maybe you should first try without the USB-C hub connected. https://wiki.archlinux.org/index.php/Dell_XPS_13_(2016)#lspci

The lspci you provided doesn't seem quite enough to me. Please correct me if I'm wrong.

comment:18 by kallisti5, 7 years ago

There was no hub during most of my testing. For a short time I plugged a USB 2.0 hub in between the USB 3 port and the USB 3 flash drive to see if it would help; it didn't.

comment:19 by korli, 7 years ago

Could you post a syslog with the XHCI module disabled?

comment:20 by kallisti5, 7 years ago

With XHCI disabled, I get a simple uhci: No devices found, ohci: No devices found, ehci: No devices found... followed by a "no boot partitions" KDL.

comment:21 by kallisti5, 7 years ago

It feels like the XHCI bus driver is mapping the wrong memory location. Everything it reads is 0xffffffff (including XHCI_HCI_CAPLENGTH)

I see mentions of using BAR0 and BAR1 on 64-bit platforms for the register locations in the XHCI spec... but I don't see code for that in our XHCI driver :-|

comment:22 by kallisti5, 7 years ago

Intel xHCI, page 464:

The PCI Configuration space BAR0 and BAR1 fields contain a 64 bit address that points to the base of the xHC PF0 MMIO space. This pointer will be referred to as PBAR0.

I see the ohci driver doing this:

    uint32 offset = sPCIModule->read_pci_config(fPCIInfo->bus,
        fPCIInfo->device, fPCIInfo->function, PCI_base_registers, 4);

Does that somehow factor in the BAR0 vs BAR1 for 64-bit addresses? _PCI::_GetBarInfo used by PCI_base_registers seems to do a little extra math around 64-bit.
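For reference, this is how a 64-bit memory BAR is normally assembled from two consecutive config dwords under the standard PCI BAR encoding. It is a sketch only: `IsMemoryBar64` and `MemoryBarAddress` are hypothetical helpers, not functions from Haiku's PCI bus manager, and it says nothing about what _PCI::_GetBarInfo actually does.

```cpp
#include <cstdint>

// Bit 0 clear means a memory BAR; bits 2:1 == 0b10 mean it is 64 bits wide.
constexpr bool IsMemoryBar64(std::uint32_t bar0)
{
	return (bar0 & 0x1u) == 0 && ((bar0 >> 1) & 0x3u) == 0x2u;
}

// Combine BAR0 (low dword, flag bits masked off) with BAR1 (high dword).
// For a 32-bit memory BAR the second dword is ignored.
constexpr std::uint64_t MemoryBarAddress(std::uint32_t bar0, std::uint32_t bar1)
{
	return IsMemoryBar64(bar0)
		? (static_cast<std::uint64_t>(bar1) << 32) | (bar0 & ~0xfu)
		: (bar0 & ~0xfu);
}
```

A raw BAR0 of 0xdc210004 with BAR1 of 0 (values inferred from the "Memory at dc210000 (64-bit, non-prefetchable)" lspci line, not read from hardware) decodes to 0xdc210000. A `uint32` alone is enough only while the high dword is zero.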


comment:23 by pulkomandy, 7 years ago

I'm not sure how an "uint32 offset" can take into account anything 64-bit.

In src/add-ons/kernel/bus_managers/pci/pci.cpp, the implementation of PCI::ReadConfig is possibly where the 0xFFFFFFFF comes from. Which means reading from PCI config doesn't work for some reason. Could be an invalid address or something else.
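On PCI, a config read that no function answers comes back as all ones, which is why a vendor ID of 0xFFFF is conventionally treated as "nothing here". A minimal sketch of that check, using hypothetical helper names that are not taken from pci.cpp:

```cpp
#include <cstdint>

// The first config dword holds the vendor ID in its low 16 bits and the
// device ID in its high 16 bits.
constexpr std::uint16_t VendorId(std::uint32_t dword0)
{
	return static_cast<std::uint16_t>(dword0 & 0xffffu);
}

constexpr std::uint16_t DeviceId(std::uint32_t dword0)
{
	return static_cast<std::uint16_t>(dword0 >> 16);
}

// 0xFFFF is never a valid vendor ID, so it marks an absent function.
constexpr bool FunctionPresent(std::uint32_t dword0)
{
	return VendorId(dword0) != 0xffffu;
}
```

A caller that sees `FunctionPresent` return false can skip the function before handing it to a bus driver.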

comment:24 by korli, 7 years ago

I think you mentioned this is a Dell XPS 2016. Looking it up online, there are Dell XPS references with two USB controllers: Intel Corporation Device 9d2f and Intel Corporation Device 15b5. The first one is located on the host PCI bridge, but the second one is not. That would explain the behavior you encounter :) But as you wrote there was only one controller, I'm clueless.

comment:25 by kallisti5, 7 years ago

Here is the lspci output from linux:

$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Skylake Host Bridge/DRAM Registers [8086:1904] (rev 08)
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 520 [8086:1916] (rev 07)
00:04.0 Signal processing controller [1180]: Intel Corporation Skylake Processor Thermal Subsystem [8086:1903] (rev 08)
00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f] (rev 21)
00:14.2 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Thermal subsystem [8086:9d31] (rev 21)
00:15.0 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 [8086:9d60] (rev 21)
00:15.1 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 [8086:9d61] (rev 21)
00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-LP CSME HECI #1 [8086:9d3a] (rev 21)
00:17.0 SATA controller [0106]: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] [8086:9d03] (rev 21)
00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1)
00:1c.4 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 [8086:9d14] (rev f1)
00:1c.5 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 [8086:9d15] (rev f1)
00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-LP LPC Controller [8086:9d48] (rev 21)
00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-LP PMC [8086:9d21] (rev 21)
00:1f.3 Audio device [0403]: Intel Corporation Sunrise Point-LP HD Audio [8086:9d70] (rev 21)
00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-LP SMBus [8086:9d23] (rev 21)
3a:00.0 Network controller [0280]: Intel Corporation Wireless 7265 [8086:095a] (rev 59)
3b:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a] (rev 01)

Details:

$ sudo lspci -nn -s 00:14.0 -vvv
00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f] (rev 21) (prog-if 30 [XHCI])
	Subsystem: Dell Device [1028:0704]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 123
	Region 0: Memory at dc210000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [70] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
		Address: 00000000fee00258  Data: 0000
	Kernel driver in use: xhci_hcd

by kallisti5, 7 years ago

Attachment: IMG_20161101_154045.jpg added

Trace ReadCapReg32 calls

comment:26 by kallisti5, 7 years ago

OK... I've figured out the 0xffff's.

There are several PCI devices Haiku sees that Linux hides. Thunderbolt 3 controller, USB 3.1 controller, etc.

I took a USB C (USB 3.1) dongle and plugged it into the USB C port. Now the controller it's probing no longer returns trash 0xffffffff and returns sane values (8086:15b5). Interface version is proper 0x110 (USB 3.1)

So! It appears Linux and other operating systems "ignore" PCI devices we don't... now to figure out why the hell Linux ignores them until a "dongle" is plugged in :-|

Still no boot partitions found, but this explains the crashing.
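The 0x110 interface version mentioned above is a BCD-style value: the high byte is the major revision and the next nibble the minor one, so 0x0110 reads as xHCI 1.1. The sketch below decodes it and flags the all-ones and all-zero patterns as implausible; `XhciVersion` and `DecodeHciVersion` are hypothetical names, and the plausibility rule is an assumption rather than anything the driver is known to do.

```cpp
#include <cstdint>

struct XhciVersion {
	unsigned majorRev;  // high byte of HCIVERSION
	unsigned minorRev;  // upper nibble of the low byte
	bool plausible;     // false for the 0xffff / 0x0000 patterns
};

constexpr XhciVersion DecodeHciVersion(std::uint16_t raw)
{
	return XhciVersion{
		static_cast<unsigned>(raw >> 8),
		static_cast<unsigned>((raw >> 4) & 0xfu),
		raw != 0xffffu && raw != 0
	};
}
```

With the dongle attached, 0x0110 decodes to 1.1; the "hci version 0x100" from the Linux dmesg output decodes to 1.0; a controller that is not really there would report 0xffff and be flagged.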

comment:27 by kallisti5, 7 years ago

Linux, no USB C dongles:

$ lspci -tvnn
-[0000:00]-+-00.0  Intel Corporation Skylake Host Bridge/DRAM Registers [8086:1904]
           +-02.0  Intel Corporation HD Graphics 520 [8086:1916]
           +-04.0  Intel Corporation Skylake Processor Thermal Subsystem [8086:1903]
           +-14.0  Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f]
           +-14.2  Intel Corporation Sunrise Point-LP Thermal subsystem [8086:9d31]
           +-15.0  Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 [8086:9d60]
           +-15.1  Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 [8086:9d61]
           +-16.0  Intel Corporation Sunrise Point-LP CSME HECI #1 [8086:9d3a]
           +-17.0  Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] [8086:9d03]
           +-1c.0-[01-39]--
           +-1c.4-[3a]----00.0  Intel Corporation Wireless 7265 [8086:095a]
           +-1c.5-[3b]----00.0  Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a]
           +-1f.0  Intel Corporation Sunrise Point-LP LPC Controller [8086:9d48]
           +-1f.2  Intel Corporation Sunrise Point-LP PMC [8086:9d21]
           +-1f.3  Intel Corporation Sunrise Point-LP HD Audio [8086:9d70]
           \-1f.4  Intel Corporation Sunrise Point-LP SMBus [8086:9d23]

Linux, USB C dongle plugged in:

-[0000:00]-+-00.0  Intel Corporation Skylake Host Bridge/DRAM Registers [8086:1904]
           +-02.0  Intel Corporation HD Graphics 520 [8086:1916]
           +-04.0  Intel Corporation Skylake Processor Thermal Subsystem [8086:1903]
           +-14.0  Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f]
           +-14.2  Intel Corporation Sunrise Point-LP Thermal subsystem [8086:9d31]
           +-15.0  Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 [8086:9d60]
           +-15.1  Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 [8086:9d61]
           +-16.0  Intel Corporation Sunrise Point-LP CSME HECI #1 [8086:9d3a]
           +-17.0  Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] [8086:9d03]
           +-1c.0-[01-39]----00.0-[02-39]--+-00.0-[03]--
           |                               +-01.0-[04-38]--
           |                               \-02.0-[39]----00.0  Intel Corporation DSL6340 USB 3.1 Controller [Alpine Ridge] [8086:15b5]
           +-1c.4-[3a]----00.0  Intel Corporation Wireless 7265 [8086:095a]
           +-1c.5-[3b]----00.0  Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a]
           +-1f.0  Intel Corporation Sunrise Point-LP LPC Controller [8086:9d48]
           +-1f.2  Intel Corporation Sunrise Point-LP PMC [8086:9d21]
           +-1f.3  Intel Corporation Sunrise Point-LP HD Audio [8086:9d70]
           \-1f.4  Intel Corporation Sunrise Point-LP SMBus [8086:9d23]

So the problem is Haiku iterates through the PCI devices that aren't actually plugged in / attached yet?... which is weird. I don't know the PCI standard that handles this.
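In the `lspci -t` trees above, `1c.0-[01-39]--` shows a bridge that has bus numbers 01 through 39 (hex) reserved behind it even though nothing is attached. A rough model of that range, with hypothetical names and no claim about how Haiku's PCI bus manager walks it:

```cpp
#include <cstdint>

struct BridgeBusRange {
	std::uint8_t secondary;    // first bus number behind the bridge
	std::uint8_t subordinate;  // last bus number behind the bridge
};

// True if `bus` is one of the bus numbers routed through this bridge.
constexpr bool BusBehindBridge(BridgeBusRange range, std::uint8_t bus)
{
	return bus >= range.secondary && bus <= range.subordinate;
}

// Number of bus numbers the bridge reserves, populated or not.
constexpr unsigned ReservedBusCount(BridgeBusRange range)
{
	return range.subordinate >= range.secondary
		? static_cast<unsigned>(range.subordinate - range.secondary) + 1u
		: 0u;
}
```

For the range 01 to 39 this gives 57 reserved bus numbers, all of which are empty until a device is hot-plugged behind the bridge.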

comment:28 by kallisti5, 7 years ago

Blocked By: 13046 added

comment:29 by kallisti5, 6 years ago

Still happens post hrev51536. Same output + error

comment:30 by kallisti5, 6 years ago

Blocking: 13735 added

comment:31 by kallisti5, 6 years ago

Blocking: 13372 added

comment:32 by kallisti5, 6 years ago

Keywords: ryzen xhci usb-c usb added
Summary: XHCI page fault under skylake → Invalid PCI bus access by XHCI with USB 3.1 / USB C

comment:33 by kallisti5, 6 years ago

Turns out I've seen this a few times on Ryzen as well per #13372. Ryzen doesn't have Thunderbolt... so it must be an issue around some new PCI spec to hide devices? This board *does* have NVMe built-in. It's possible I didn't have the NVMe port populated which could result in a similar "hidden PCI bus".

-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
           +-00.2  Advanced Micro Devices, Inc. [AMD] Device [1022:1451]
           +-01.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
           +-01.1-[01]----00.0  Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 [144d:a804]
           +-01.3-[03-27]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:43b9]
           |               +-00.1  Advanced Micro Devices, Inc. [AMD] Device [1022:43b5]
           |               \-00.2-[1d-27]--+-00.0-[21]--
           |                               +-02.0-[23]--
           |                               +-03.0-[24]--
           |                               +-04.0-[25]----00.0  ASMedia Technology Inc. Device [1b21:1343]
           |                               +-06.0-[26]----00.0  Intel Corporation I211 Gigabit Network Connection [8086:1539]
           |                               \-07.0-[27]--
           +-02.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
           +-03.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
           +-03.1-[28]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/580] [1002:67df]
           |            \-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580] [1002:aaf0]
           +-03.2-[29]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Redwood XT GL [FirePro V4800] [1002:68c8]
           |            \-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Redwood HDMI Audio [Radeon HD 5000 Series] [1002:aa60]
           +-04.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
           +-07.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
           +-07.1-[2a]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
           |            +-00.2  Advanced Micro Devices, Inc. [AMD] Device [1022:1456]
           |            \-00.3  Advanced Micro Devices, Inc. [AMD] USB3 Host Controller [1022:145c]
           +-08.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
           +-08.1-[2b]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
           |            +-00.2  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901]
           |            \-00.3  Advanced Micro Devices, Inc. [AMD] Device [1022:1457]
           +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b]
           +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e]
           +-18.0  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
           +-18.1  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
           +-18.2  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
           +-18.3  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
           +-18.4  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
           +-18.5  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
           +-18.6  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric Device 18h Function 6 [1022:1466]
           \-18.7  Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]

comment:34 by waddlesplash, 6 years ago

Possibly relevant bits from the FreeBSD PCI bus system that I don't see in ours:

1) Ignore BARs that do not conform to the spec: http://xref.plausible.coop/source/xref/freebsd-11-stable/sys/dev/pci/pci.c#3173

2) Ignore EA-BAR if not enabled (I don't think we implement EA-BAR at all?) http://xref.plausible.coop/source/xref/freebsd-11-stable/sys/dev/pci/pci.c#3828

comment:35 by kallisti5, 6 years ago

Dell Precision 5510 (no USB C dongle)

-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:1910]
           +-01.0-[01]----00.0  NVIDIA Corporation GM107GLM [Quadro M1000M] [10de:13b1]
           +-02.0  Intel Corporation HD Graphics 530 [8086:191b]
           +-04.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903]
           +-14.0  Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f]
           +-14.2  Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem [8086:a131]
           +-15.0  Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #0 [8086:a160]
           +-15.1  Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #1 [8086:a161]
           +-16.0  Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 [8086:a13a]
           +-17.0  Intel Corporation SATA Controller [RAID mode] [8086:2822]
           +-1c.0-[02]----00.0  Intel Corporation Wireless 8260 [8086:24f3]
           +-1c.1-[03]----00.0  Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a]
           +-1d.0-[04]----00.0  Toshiba America Info Systems XG4 NVMe SSD Controller [1179:0115]
           +-1d.4-[05]--
           +-1d.6-[06-3e]--
           +-1f.0  Intel Corporation CM236 Chipset LPC/eSPI Controller [8086:a150]
           +-1f.2  Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121]
           +-1f.3  Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170]
           \-1f.4  Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123]

Dell Precision 5510 (USB C dongle)

-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:1910]
           +-01.0-[01]----00.0  NVIDIA Corporation GM107GLM [Quadro M1000M] [10de:13b1]
           +-02.0  Intel Corporation HD Graphics 530 [8086:191b]
           +-04.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903]
           +-14.0  Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f]
           +-14.2  Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem [8086:a131]
           +-15.0  Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #0 [8086:a160]
           +-15.1  Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #1 [8086:a161]
           +-16.0  Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 [8086:a13a]
           +-17.0  Intel Corporation SATA Controller [RAID mode] [8086:2822]
           +-1c.0-[02]----00.0  Intel Corporation Wireless 8260 [8086:24f3]
           +-1c.1-[03]----00.0  Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a]
           +-1d.0-[04]----00.0  Toshiba America Info Systems XG4 NVMe SSD Controller [1179:0115]
           +-1d.4-[05]--
           +-1d.6-[06-3e]----00.0-[07-3e]--+-00.0-[08]--
           |                               +-01.0-[09-3d]--
           |                               \-02.0-[3e]----00.0  Intel Corporation DSL6340 USB 3.1 Controller [Alpine Ridge] [8086:15b5]
           +-1f.0  Intel Corporation CM236 Chipset LPC/eSPI Controller [8086:a150]
           +-1f.2  Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121]
           +-1f.3  Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170]
           \-1f.4  Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123]

Precision 5510 (USB C Thunderbolt 3 dock attached)

$ lspci -tvnn
-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:1910]
           +-01.0-[01]----00.0  NVIDIA Corporation GM107GLM [Quadro M1000M] [10de:13b1]
           +-02.0  Intel Corporation HD Graphics 530 [8086:191b]
           +-04.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903]
           +-14.0  Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f]
           +-14.2  Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem [8086:a131]
           +-15.0  Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #0 [8086:a160]
           +-15.1  Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #1 [8086:a161]
           +-16.0  Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 [8086:a13a]
           +-17.0  Intel Corporation SATA Controller [RAID mode] [8086:2822]
           +-1c.0-[02]----00.0  Intel Corporation Wireless 8260 [8086:24f3]
           +-1c.1-[03]----00.0  Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a]
           +-1d.0-[04]----00.0  Toshiba America Info Systems XG4 NVMe SSD Controller [1179:0115]
           +-1d.4-[05]--
           +-1d.6-[06-3e]----00.0-[07-3e]--+-00.0-[08]----00.0  Intel Corporation DSL6340 Thunderbolt 3 NHI [Alpine Ridge 2C 2015] [8086:1575]
           |                               +-01.0-[09-3d]----00.0-[0a-3d]--+-00.0-[0b]----00.0  Fresco Logic FL1100 USB 3.0 Host Controller [1b73:1100]
           |                               |                               +-01.0-[0c]----00.0  Intel Corporation I210 Gigabit Network Connection [8086:1533]
           |                               |                               \-04.0-[0d-3d]--
           |                               \-02.0-[3e]--
           +-1f.0  Intel Corporation CM236 Chipset LPC/eSPI Controller [8086:a150]
           +-1f.2  Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121]
           +-1f.3  Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170]
           \-1f.4  Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123]

Bridge information:

00:1d.4 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #13 (rev f1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
	I/O behind bridge: None
	Memory behind bridge: None
	Prefetchable memory behind bridge: None
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot-), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #13, Speed 8GT/s, Width x2, ASPM L0s L1, Exit Latency L0s unlimited, L1 <4us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x0, TrErr- Train+ SlotClk+ DLActive- BWMgmt- ABWMgmt-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Via WAKE# ARIFwd+
			 AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
			 AtomicOpsCtl: ReqEn- EgressBlck-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [90] Subsystem: Dell Device 06e5
	Capabilities: [a0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D3 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:1d.6 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #15 (rev f1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin C routed to IRQ 18
	Bus: primary=00, secondary=06, subordinate=3e, sec-latency=0
	I/O behind bridge: 00002000-00002fff [size=4K]
	Memory behind bridge: c4000000-da0fffff [size=353M]
	Prefetchable memory behind bridge: 0000000080000000-00000000a1ffffff [size=544M]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #15, Speed 8GT/s, Width x2, ASPM not supported
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x2, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+
			Slot #18, PowerLimit 25.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
			 AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
			 AtomicOpsCtl: ReqEn- EgressBlck-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [90] Subsystem: Dell Device 06e5
	Capabilities: [a0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
		RootCmd: CERptEn- NFERptEn- FERptEn-
		RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
			 FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
		ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
	Capabilities: [140 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [220 v1] #19
	Kernel driver in use: pcieport
	Kernel modules: shpchp

Last edited 6 years ago by kallisti5

comment:37 by kallisti5, 6 years ago

Yeah, I feel the lack of PCI EA (Enhanced Allocation) support is definitely the cause of this one. Nice catch!

I have some WIP work on supporting it here: https://gitlab.com/kallisti5/haiku/tree/pci-ea

I'm going for FreeBSD's basic "if plugged in at boot, it works" strategy, since that makes the most sense for R1.
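For readers unfamiliar with PCI EA: a device advertises Enhanced Allocation through an entry in its standard capability list (capability ID 0x14), the same list lspci prints as "Capabilities: [40] ...". The following is only a minimal sketch of how such an entry could be detected from a raw 256-byte config-space snapshot. It is not code from the Haiku PCI bus manager or from the WIP branch; `FindCapability` and the `kPci...` constant names are hypothetical, while the register offsets and IDs are the standard ones from the PCI specification.

```cpp
#include <cstdint>
#include <optional>

// Standard PCI config-space offsets and IDs (constant names are made up).
constexpr uint8_t kPciStatus = 0x06;            // Status register, 16 bit
constexpr uint16_t kPciStatusCapList = 0x0010;  // shown as "Cap+" by lspci
constexpr uint8_t kPciCapPointer = 0x34;        // offset of first capability
constexpr uint8_t kPciCapIdEA = 0x14;           // Enhanced Allocation

// Hypothetical helper: returns the config-space offset of the capability
// with the given ID, or std::nullopt if the device does not advertise it.
std::optional<uint8_t>
FindCapability(const uint8_t (&config)[256], uint8_t wantedId)
{
	uint16_t status = config[kPciStatus] | (config[kPciStatus + 1] << 8);
	if ((status & kPciStatusCapList) == 0)
		return std::nullopt;

	// The low two bits of every capability pointer are reserved.
	uint8_t offset = config[kPciCapPointer] & 0xfc;

	// Bound the walk so a malformed (looping) list cannot hang the caller.
	for (int guard = 0; guard < 48 && offset >= 0x40; guard++) {
		if (config[offset] == wantedId)
			return offset;
		offset = config[offset + 1] & 0xfc;
	}
	return std::nullopt;
}
```

Calling `FindCapability(config, kPciCapIdEA)` on a device's config space tells a bus manager whether it has to parse EA entries for that device instead of relying only on the classic BARs and bridge windows.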

comment:38 by waddlesplash, 6 years ago

Blocking: 14557 added

comment:39 by waddlesplash, 6 years ago

Blocking: 14608 added

comment:40 by pulkomandy, 6 years ago

Milestone: Unscheduled → R1/beta2

comment:41 by kallisti5, 5 years ago

Just tried the latest hrev52781 on my XPS 13 (Skylake), which was having issues before.

  • MBR: Insta-reboot
  • EFI: No boot partitions error. KDL syslog inbound.

by kallisti5, 5 years ago

Attachment: IMG_20190122_185027.jpg added

XHCI, XPS 13, EFI hrev52781

comment:42 by waddlesplash, 5 years ago

Blocked By: 13046 removed
Resolution: fixed
Status: assigned → closed

The error messages are tracked in #13772, but they mean the controller was started, so this is fixed!
