Opened 3 months ago

Last modified 2 months ago

#18768 new bug

Crash on boot from XHCI

Reported by: david.given
Owned by: waddlesplash
Priority: normal
Milestone: Unscheduled
Component: Drivers/USB/XHCI
Version: R1/Development
Keywords:
Cc:
Blocked By: #17549
Blocking:
Platform: All

Description

I have a reliable (i.e. showstopping, system will not start) crash on boot which looks like it's coming from the XHCI driver. For confirmation, I have tried disabling the XHCI driver, and booting succeeds, but I seem to have an XHCI-only chipset because keyboard and mouse don't work.

This is happening on the latest nightly as of writing, but it has also been happening on my main system for a really long time (I've only just got round to looking into it). It didn't happen when I first installed my main system, but I don't recall which version that was, sorry; it could have been R1/beta4 or a nightly.

I'm attaching both a KDL snapshot and the output from dmidecode (on Linux), which should give full system information, but the short version is that my machine is an AMD Ryzen 9 7900X. Let me know if you need anything else.

Attachments (3)

dmidecode.txt (23.2 KB ) - added by david.given 3 months ago.
PXL_20240124_205601074.jpg (1.2 MB ) - added by david.given 3 months ago.
fdd-descriptor.txt (2.3 KB ) - added by david.given 3 months ago.

Change History (14)

by david.given, 3 months ago

Attachment: dmidecode.txt added

by david.given, 3 months ago

Attachment: PXL_20240124_205601074.jpg added

comment:1 by waddlesplash, 3 months ago

Component: changed from - General to Drivers/USB/XHCI
Owner: changed from nobody to waddlesplash

comment:2 by waddlesplash, 3 months ago

Maybe related to #17549.

comment:3 by waddlesplash, 3 months ago

That address appears to be a kernel one so it's odd SMEP is being triggered.

Very strange that it happens on both pieces of hardware you've tried Haiku on, because otherwise I can't remember anyone seeing anything like this except maybe in some very specific circumstances.

Can you type at the KDL prompt? I guess you have a USB keyboard, not a PS/2 one, so probably not?

comment:4 by david.given, 3 months ago

This is all on a single system, BTW --- I've just been trying several different builds on it.

I also tried disabling SMEP/SMAP in the boot menu and it still crashed, but this time with a kernel page fault (as expected).

And, sadly, I don't believe I have a PS/2 port. I'll have a look later; if I can find one, is there anything you want me to look for? Also, I did track down the XHCI driver, which contains lots of tracing; how do I turn it on?

comment:5 by korli, 3 months ago

Looks like these systems can have 5 xhci PCI devices, example: https://linux-hardware.org/?probe=f108558763&log=lspci

Does booting without any USB devices connected also fail? If it works, maybe try plugging the devices into a different USB port.

comment:6 by david.given, 3 months ago

I got more data!

After finally thinking to unplug some devices and see what happened, I managed to track the crash down to having one specific device plugged in: an 03ee:6901 Mitsumi SmartDisk FDD floppy disk drive. Having this plugged in seems to trigger a crash, either on bootup or, if the system is already booted, about ten seconds after connecting it. I'm attaching a copy of the descriptor tree. I also tried connecting a second, different floppy drive and got a different crash, somewhere in what looked like the block driver system, but I was unable to reproduce it, so that can probably be discounted for now.

by david.given, 3 months ago

Attachment: fdd-descriptor.txt added

comment:7 by waddlesplash, 3 months ago

Blocked By: 17549 added

So, this is almost certainly a duplicate of #17549 then, as that's also got a USB FDD involved.

comment:8 by waddlesplash, 3 months ago

This looks strongly like a use-after-free. However, determining where or what is causing it may prove difficult. The USB floppy disk driver is part of the standard USB disk driver, so I don't think this is a module-unload problem.

The "finish" thread might not account for transfers having been cancelled in the interim, though, and that may be a concern. I guess logic could be added there, but that'd be a total guess. Without a way to interact with the KDL session (whether via keyboard or through a serial interface) this may be hard to debug.
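To make the idea concrete, here is a minimal standalone sketch of the kind of guard meant above: the completion side checks whether a transfer is still tracked before touching it. It is not the actual XHCI driver code. `Transfer`, `TransferTracker` and all of their methods are hypothetical names invented for illustration, and real driver code would also need locking around the set and some protection against a freed pointer being reused for a new transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical stand-in for a queued USB transfer; not Haiku's real type.
struct Transfer {
    uint32_t id = 0;
    bool completed = false;
};

// Hypothetical tracker holding the transfers that are still in flight.
// Cancelling removes the entry before the transfer's memory is released.
class TransferTracker {
public:
    void Submit(Transfer* transfer)
    {
        assert(transfer != nullptr);
        fLive.insert(transfer);
    }

    // Called on cancel: forget the transfer so that a later completion
    // event referring to it is ignored.
    void Cancel(Transfer* transfer) { fLive.erase(transfer); }

    // Called by the "finish" side for each completion event. Returns true
    // only if the transfer was still live and has now been completed.
    bool Finish(Transfer* transfer)
    {
        auto it = fLive.find(transfer);
        if (it == fLive.end())
            return false; // cancelled in the interim: do not dereference
        fLive.erase(it);
        transfer->completed = true;
        return true;
    }

    size_t CountLive() const { return fLive.size(); }

private:
    std::unordered_set<Transfer*> fLive;
};
```

With this shape, a completion event that arrives after `Cancel()` is simply dropped instead of writing through a stale pointer. Whether this matches the actual failure here is still a guess.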

comment:9 by waddlesplash, 3 months ago

Please retest after hrev57542.

comment:10 by david.given, 3 months ago

I tried this with hrev57554 and the crash still happened when connecting the device.

comment:11 by waddlesplash, 2 months ago

Without being able to type in KDL this is going to be hard to debug.

Any chance the problem reproduces in a VM (with the XHCI driver, of course) when the USB device is forwarded into the VM?
