#13343 closed bug (fixed)
QEMU XHCI Devices not bootable
Reported by: | kallisti5 | Owned by: | nobody |
---|---|---|---|
Priority: | high | Milestone: | R1/beta1 |
Component: | Drivers/USB/XHCI | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | #12895 | |
Platform: | All |
Description
Haiku isn't able to detect partitions on XHCI boot media.
This is a high priority bug because most users could easily try to boot the haiku installation media over USB 3.
This can be reproduced in qemu:
XHCI (failure)
- plug a USB 3 flash device into a usb 3 port on your machine running linux.
- write haiku-anyboot to a USB 3 flash device
- start up a new QEMU vm attached to the USB vendor/product:
sudo qemu-system-x86_64 -device nec-usb-xhci,id=xhci,addr=0x5 -device usb-host,bus=xhci.0,vendorid=0x13fe,productid=0x5500 -net none --enable-kvm -m 2048 -serial stdio
haiku_loader can't find any partitions.
(XHCI_BOOT attached)
EHCI (Success)
- plug your USB 3 flash device into a usb 2 port on your machine running linux.
- start up a new QEMU vm attached to the USB vendor/product with an ehci bus:
sudo qemu-system-x86_64 -device usb-ehci,id=ehci,addr=0x5 -device usb-host,bus=ehci.0,vendorid=0x13fe,productid=0x5500 -net none --enable-kvm -m 2048 -serial stdio
Boots as expected.
(EHCI_BOOT attached)
Attachments (8)
Change History (37)
by , 8 years ago
Attachment: | XHCI_BOOT.txt added |
---|
comment:2 by , 8 years ago
Summary: | XHCI Devices not bootable → QEMU XHCI Devices not bootable |
---|
comment:3 by , 8 years ago
Well you need to provide the Haiku hrev, the QEMU and SeaBIOS versions... Support for XHCI USB controllers appeared in SeaBIOS 1.7.4...
comment:4 by , 8 years ago
- qemu 2.6.1
- seabios 1.8.2
- hrev50967
Keep in mind this issue closely matches the issue we see on most moden real hardware... thus this is a good low-hanging fruit to troubleshoot some of our xhci boot issues.
comment:5 by , 8 years ago
Blocking: | 12895 added |
---|
#12895 is definitely related. This shows how to replicate it in qemu.
comment:6 by , 8 years ago
I've attached a debug log of Haiku's syslog output and QEMU's XHCI debugging inline
sudo qemu-system-x86_64 -device nec-usb-xhci,id=xhci,addr=0x5 -device usb-host,bus=xhdorid=0x13fe,productid=0x5500 -net none --enable-kvm -m 2048 -serial stdio -d trace:usb_xhci_* > ~/XHCI_BOOT_QEMUDBG.txt 2>&1
Should be *really* interesting to see what Haiku is actually doing to the XHCI controller :-)
comment:7 by , 8 years ago
comment:8 by , 8 years ago
Keep in mind that FreeBSD couldn't boot on QEMU via XHCI at least in 2015. So having it working might not be so blocking in the end.
by , 8 years ago
Attachment: | XHCI_BOOT_HKUDBG_QEMUDBG.txt added |
---|
Failed XHCI boot w/qemu xhci debug inline and Haiku usb tracing
comment:9 by , 8 years ago
Component: | Drivers/USB → Drivers/USB/XHCI |
---|---|
Owner: | changed from | to
comment:10 by , 8 years ago
After looking at this for a while, I think part of the issue is the SetAddress fails:
usb xhci -1: Set Address usb xhci -1: command[4] = b (0x00000000066bb000, 0x00000000, 0x02002c00) usb xhci -1: Ding Dong! slot:0 endpoint 0 2055@1487952628.192117:usb_xhci_doorbell_write off 0x0000, val 0x00000000 2055@1487952628.192123:usb_xhci_fetch_trb addr 00000000064fed80, CR_ADDRESS_DEVICE, p 00000000066bb000, s 00000000, c 0x02002c01 2055@1487952628.192131:usb_xhci_slot_address slotid 2, port 1 2055@1487952628.192136:usb_xhci_queue_event v 0, idx 7, ER_COMMAND_COMPLETE, CC_TRB_ERROR, p 00000000064fed80, s 05000000, c 0x02008401 2055@1487952628.192141:usb_xhci_irq_msi nr 0 2055@1487952628.192147:usb_xhci_fetch_trb addr 00000000064fed90, TRB_RESERVED, p 0000000000000000, s 00000000, c 0x00000000 2055@1487952628.192161:usb_xhci_oper_read off 0x0004, ret 0x00000008 2055@1487952628.192172:usb_xhci_runtime_read off 0x0020, ret 0x00000003 2055@1487952628.192183:usb_xhci_oper_write off 0x0004, val 0x00000008 2055@1487952628.192194:usb_xhci_runtime_write off 0x0020, val 0x00000003 usb xhci -1: Event Interrupt usb xhci -1: event[7] = 33 (0x00000000064fed80 0x05000000 0x02008401) usb xhci -1: event[7] = 33 (0x00000000064fed80 0x05000000 0x02008401) usb xhci -1: Received command event usb xhci -1: event[8] = 0 (0x0000000000000000 0x00000000 0x00000000) 2055@1487952628.196027:usb_xhci_runtime_write off 0x0038, val 0x064fe0c8 2055@1487952628.196037:usb_xhci_runtime_write off 0x003c, val 0x00000000 2055@1487952628.196066:usb_xhci_doorbell_read off 0x0000, ret 0x00000000 usb xhci -1: Command Complete usb error xhci -1: unsuccessful command TRB (5) usb xhci -1: Storing trb 0x05000000 0x02008401 usb error xhci -1: unable to set address usb xhci roothub: request: 1 usb xhci roothub: clear feature: 1
SetAddress() only gets called once and fails with an CC_TRB_ERROR from the XHCI host controller.
comment:11 by , 8 years ago
More interesting, AllocateDevice gets called twice on hubAddress 1, hubPort 1 speed 3. The first time is successful. Second time fails.
I diffed' the similar calls and took of screenshot of the differences in meld.
by , 8 years ago
meld of two allocatedevice calls on same port. first works, second fails.
comment:12 by , 8 years ago
Milestone: | Unscheduled → R1/beta1 |
---|
comment:13 by , 8 years ago
Looking through the XHCI tickets. We commonly see the following errors per the XHCI spec:
- 5 TRB Error
- Asserted when a TRB parameter error condition (e.g., out of range or invalid parameter) is detected in a TRB. Refer to section 4.10.2.2 for examples
4.10.2.2 TRB Error A TRB Error indicates the TRB field values are out of range or that the xHC has determined that a TRB is incorrectly formed. This error condition may be reported in a Transfer Event or a Command Completion Event due to an error detected on a Transfer or Command TRB, respectively. This error will not be reported in any other Event TRB types. Note: A Transfer Ring TRB Error should transition an endpoint to the Error state (refer to section 4.8.3), however an xHC implementation may assert HCE18 due to the detection of TRB Error related error conditions. It is the responsibility of software to always present correctly formed TRBs to the xHC.
- Asserted when a TRB parameter error condition (e.g., out of range or invalid parameter) is detected in a TRB. Refer to section 4.10.2.2 for examples
- 17 Parameter Error
- Asserted by a command if a Context parameter is invalid.
All of the errors we see seem to align to an incorrectly set parameter field.
comment:14 by , 8 years ago
Interestingly, on qemu we see the following:
usb xhci -1: successfully started the controller usb xhci -1: No-Op test... usb xhci -1: Noop usb xhci -1: command[0] = 17 (0x0000000000000000, 0x00000000, 0x00005c00) usb xhci -1: Ding Dong! slot:0 endpoint 0 usb xhci -1: Event Interrupt usb xhci -1: event[0] = 33 (0x0000000006503d40 0x05000000 0x00008401) usb xhci -1: event[0] = 33 (0x0000000006503d40 0x05000000 0x00008401) usb xhci -1: Received command event usb xhci -1: event[1] = 0 (0x0000000000000000 0x00000000 0x00000000) usb xhci -1: Command Complete. Result: 5 usb error xhci -1: unsuccessful command TRB (5) usb xhci -1: Storing trb 0x05000000 0x00008401 usb xhci -1: No-Op unsuccessful usb stack 0: module busses/usb/xhci successfully loaded
So:
- Noop command issued
- "something" sends command 17 (reset). 0x00005c00 is the reset TRB with everything else 0'ed out.
- I say something because in these logs I never see the "XHCI::ResetDevice" trace message with tracing enabled.
- We see successful results, followed by the initial No-Op being unsuccessful (DoCommand error)
comment:15 by , 8 years ago
Ok.. that command 17 was actually (0x17) so 23 :-| (fixed via hrev51050)
This all makes a lot more sense now:
usb hub 2: initialised ok usb xhci -1: successfully started the controller usb xhci -1: No-Op test... usb xhci -1: Issue No-Op usb xhci -1: command[0] = 23 (0x0000000000000000, 0x00000000, 0x00005c00) usb xhci -1: Ding Dong! slot:0 endpoint 0 usb xhci -1: Event Interrupt usb xhci -1: event[0] = 33 (0x0000000006503d40 0x05000000 0x00008401) usb xhci -1: event[0] = 33 (0x0000000006503d40 0x05000000 0x00008401) usb xhci -1: Received command event usb xhci -1: event[1] = 0 (0x0000000000000000 0x00000000 0x00000000) usb xhci -1: Command Complete. Result: 5 usb error xhci -1: unsuccessful command TRB (5) usb xhci -1: Storing trb 0x05000000 0x00008401 usb xhci -1: No-Op unsuccessful usb xhci -1: command[0] = 23 (0x0000000000000000, 0x00000000, 0x00005c01) usb xhci -1: command[1] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb xhci -1: command[2] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb xhci -1: command[3] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb xhci -1: command[4] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb xhci -1: command[5] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb xhci -1: command[6] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb xhci -1: command[7] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb xhci -1: command[8] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb xhci -1: command[9] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb xhci -1: command[10] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb xhci -1: command[11] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb xhci -1: command[12] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb xhci -1: command[13] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb xhci -1: command[14] = 0 (0x0000000000000000, 0x00000000, 0x00000000) usb stack 0: module busses/usb/xhci successfully loaded
So we're inserting the No-Op, and the XHCI controller is reporting it failed. (updating the cycle bit to 1 as expected)
comment:16 by , 8 years ago
patch: | 0 → 1 |
---|
comment:17 by , 8 years ago
patch: | 1 → 0 |
---|
by , 7 years ago
Attachment: | 0001-xhci-Transfer-complete-semaphore-was-not-being-relea.patch added |
---|
A patch for xhci driver, but not directly impacting QEMU.
comment:18 by , 7 years ago
patch: | 0 → 1 |
---|
comment:19 by , 7 years ago
I attached a patch that seems to get things working on my NEC/Renesas chipsets. I'm sure some of my findings will impact several tickets. It is still unstable, especially for pre rev 1.0 chipsets.
The most critical item is that there was an difference between an offset value compared to trb value, which was always off by 1. The semaphore for transfers was never being released and timed out. Actual command transfers, such as the NoOp() worked consistently, and didn't rely on that semaphore mechanism. The driver never completed getting the device descriptors.
There is also a post in the discussion forum. Maybe I should continue the topic in the development board.
comment:20 by , 7 years ago
Thanks for the patch. Please consider splitting the patch in fixes and logging parts. It's then easier to review and commit separately if needed.
follow-up: 22 comment:21 by , 7 years ago
@korli: Huh? This is already in "git format-patch" format!
@Greg: Wow, thanks for investigating this!
For the "early revision" chipsets -- do you mean that this causes a regression, and that after this patch they no longer work? Or are they just unstable, and didn't work at all before this? (Do they even work on Linux?)
comment:22 by , 7 years ago
Replying to waddlesplash:
For the "early revision" chipsets -- do you mean that this causes a regression, and that after this patch they no longer work? Or are they just unstable, and didn't work at all before this? (Do they even work on Linux?)
For me, nothing worked at all previously in Haiku. No detection of anything. I haven't tried any other OS. It would fail trying to get the device descriptors, and therefore, couldn't even detect anything connected. Now, it gets the device descriptors, and Haiku see's connected devices. When you try to mount one of those devices, it fails, and the interrupts from the host stop. So trying to access the published device on xhci with the early chipsets causes Haiku to slowdown for a minute or something. I think it does recover, but for a casual user, it could be frustrating. But this is only for NEC/Renesas. I don't have any other hardware to test. Maybe it's ok for others.
comment:23 by , 7 years ago
Testing on a recent Linux or FreeBSD might be nice, but not necessary.
Thanks for this! I've got it in my commitqueue, but waiting on @kallisti5's feedback before I push it anywhere.
follow-up: 26 comment:24 by , 7 years ago
Great work!
Some feedback:
- WriteOpReg(XHCI_CRCR_LO, CRCR_CA );
Style, remove space after "_CA"
- TRACE_ERROR("AllocateDevice() failed enable slot %d\n",slot);
Style, add space after ,
- TRACE_ERROR("_LinkDescriptorForPipe max transfers count exceeded %d\n",endpoint->used );
Style, add space after , Style, remove space after "used"
- offset = offset / sizeof(xhci_trb) +1;
Style, Add space after +
- #ifdef TRACE_USB
This seems a little messy. For now it's fine though.
- - ReadOpReg(portRegister);
- - ReadOpReg(portRegister);
A lot of times, the documentation for XHCI says a read is required after making adjustments. Was there a reason to remove these?
- Make sure Command Ring is stopped
- WriteOpReg(XHCI_CRCR_LO, CRCR_CA );
Good catch here. Looks like FreeBSD unsets CS and CA to stop though? (via the ~ mask of the bits) https://github.com/freebsd/freebsd/blob/master/sys/dev/usb/controller/xhci.c#L294
EDIT: Actually, no, you were correct. They do later set CS and CA to stop. We might want to follow suite and mimic them more? (aka, set CS bit, write HI 0, set CA bit, write HI 0) Keep in mind this is hardware, which can have lots of non-nonsensical writes.
comment:25 by , 7 years ago
korli committed one part of this (the 'zero struct field' bit: http://cgit.haiku-os.org/haiku/commit/?id=5ed6ef8e18df700101160a4fd5da7edb73408411)
comment:26 by , 7 years ago
Replying to Alexander von Gluck:
Some feedback:
Ok, thanks for feedback!
- - ReadOpReg(portRegister);
- - ReadOpReg(portRegister);
A lot of times, the documentation for XHCI says a read is required after making adjustments. Was there a reason to remove these?
I checked the spec and there is no behavior on the bits in this register that change based on a read. Further, I compared to the FreeBSD code, the similar function returned the read value, where we are just doing a read. So this seemed like superfluous code. But I can't seem to find it in the FreeBSD code at the moment.
- Make sure Command Ring is stopped
- WriteOpReg(XHCI_CRCR_LO, CRCR_CA );
Good catch here. Looks like FreeBSD unsets CS and CA to stop though? (via the ~ mask of the bits) https://github.com/freebsd/freebsd/blob/master/sys/dev/usb/controller/xhci.c#L294
EDIT: Actually, no, you were correct. They do later set CS and CA to stop. We might want to follow suite and mimic them more? (aka, set CS bit, write HI 0, set CA bit, write HI 0) Keep in mind this is hardware, which can have lots of non-nonsensical writes.
I can't prove that this was a problem, but I spent ages trying to figure out why I wasn't able to read back the address that was written. I finally found a note in the spec that said all reads of the CRCR register return 0. Then the spec also stated that all writes are ignored if it is running. So I thought a good idea to add this.
The MAIN issue I found was the function 'HandleTransferComplete', this statement:
offset = offset / sizeof(xhci_trb);
when that offset value was used in a comparision, it always failed, and therefore, would never reach the code:
release_sem(fFinishTransfersSem);
And then the semaphore would time out, and you can see lots of statements in debug logs:
usb xhci -1: cancel queued transfers for pipe 0xdfc65da0 (2)
usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out
Offset was 0, when the trb_count would be 1. I added a correction,
offset = offset / sizeof(xhci_trb) + 1;
I checked with FreeBSD driver, and they did something similar, but in a different way:
for (i = (offset / 16) + 1; i < td->ntrb; i++) {
But their code structure and ours is a bit different in this area, so it seems ok for now. But please review !
comment:27 by , 7 years ago
Ah yeah, ok. That all makes sense. It looks like korli jumped in and did the CS/CA fixes. (with you maintained as the author in git of course)
Nice catch on bumping the offset. I think that's the final part of these changes that'll need to be re-based and applied. I'll run some tests after the + 1 to see if there is an improvement (beyond the improvements you've already witnessed)
comment:28 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Pushed in hrev51535 I tested this in qemu (like the original tests) and it works! qemu can now boot from Haiku via XHCI with minimal errors.
usb xhci -1: interface version: 0x0100 usb xhci -1: structural parameters: 1:0x08001040 2:0x0000000f 3:0x00000000 usb xhci -1: capability params: 0x00087001 allocate_io_interrupt_vectors: allocated 1 vectors starting from 24 msi_allocate_vectors: allocated 1 vectors starting from 24 msi enabled: 0x0089 usb xhci -1: using message signaled interrupts usb xhci -1: starting XHCI host controller usb xhci -1: successfully started the controller usb uhci: no devices found usb ohci: no devices found usb ehci: no devices found usb hub 2: port 0: new device connected usb error xhci -1: unsuccessful command TRB (5) usb error xhci -1: unsuccessful command TRB (5) usb xhci -1: cancel queued transfers for pipe 0xffffffff812cce30 (0) usb hub 2: port 1: device removed usb hub 2: port 2: device removed usb hub 2: port 3: device removed usb hub 2: port 4: device removed usb hub 2: port 5: device removed usb hub 2: port 6: device removed usb hub 2: port 7: device removed usb_disk: device reports a lun count of 1 usb_disk: vendor_identification " " usb_disk: product_identification " " usb_disk: product_revision_level "PMAP" get_boot_partitions(): boot volume message:
comment:29 by , 7 years ago
Greg: Also, if you're seeing issues with your hardware with an early XHCI rev, could you open an additional ticket with information? I wasn't trying to cut you off closing this one... the original issue described is just now solved with your changes :-)
Failed XHCI boot