#16794 closed bug (fixed)
XHCI: Divide error exception in SubmitNormalRequest
Reported by: | LSS37040 | Owned by: | waddlesplash |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta4 |
Component: | Drivers/USB/XHCI | Version: | R1/Development |
Keywords: | Cc: | korli, x512 | |
Blocked By: | Blocking: | #16586, #16603, #16878, #17287 | |
Platform: | All |
Description
I have an SD/MMC/MS card reader on my case that is connected to the motherboard via a USB header. When the card reader is empty, DriveSetup first hangs for a few seconds on startup, then the system crashes with a panic.
It happens on R1/beta2 as well as the most recent nightly, hrev54954. At the time it blocked a successful installation, since I need to format a partition as BeFS before installing the system onto it.
I was able to work around it by putting the Haiku install image on an SD card, inserting it into the card reader, and installing from there. When the card reader is populated, DriveSetup runs without any issue.
I'm attaching the photo I took when the panic happened. It mentions XHCI, so I think it's possible that all USB ports/headers on my motherboard (whether 2.0 or 3.0) are provided through USB 3.0 host controllers.
The motherboard is ASRock X570 Taichi Razer Edition.
PS: In my past experience, some chassis card readers are known to cause issues under certain circumstances when unpopulated (empty).
Attachments (6)
Change History (38)
by , 4 years ago
Attachment: | IMG_20210213_204101.jpg added |
---|
comment:1 by , 4 years ago
Cc: | added |
---|---|
Summary: | DriveSetup causes panic with an empty card reader. → XHCI: Divide error exception in SubmitNormalRequest |
I was under the impression that this is what hrev54876 was supposed to fix. korli, any ideas?
comment:3 by , 4 years ago
This photo was made on an installed hrev54954 system.
I managed to install it by putting the ISO onto an SD card, inserting it into the card reader, and booting from there. This way the card reader is not empty and I'm able to proceed with the installation without issues.
comment:4 by , 4 years ago
Well, SubmitNormalRequest should fail in all cases if trbSize is found to be zero (original patch: https://review.haiku-os.org/c/haiku/+/3611/1 )
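The idea of failing early on a zero trbSize can be illustrated with a minimal sketch. This is not the code from the linked patch: `ComputeTrbCount`, the `Status` enum, and the parameter names are hypothetical stand-ins, and the real driver uses Haiku's own `status_t` codes and data structures.

```cpp
#include <cstddef>

// Hypothetical status codes standing in for Haiku's status_t values.
enum Status { kOk = 0, kBadValue = -1 };

// Computes how many TRBs are needed to cover `length` bytes when each TRB
// carries `trbSize` bytes. Fails instead of dividing when trbSize is zero.
Status
ComputeTrbCount(size_t length, size_t trbSize, size_t* _count)
{
	if (_count == nullptr)
		return kBadValue;

	if (trbSize == 0) {
		// A zero size means the endpoint data is bogus; refuse the request
		// rather than triggering a divide error exception.
		return kBadValue;
	}

	// Round up so a partial final chunk still gets its own TRB.
	*_count = (length + trbSize - 1) / trbSize;
	return kOk;
}
```

A caller would propagate the error status upward instead of continuing with the transfer.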
follow-up: 7 comment:5 by , 4 years ago
I wonder if the problem here is that the xhci_endpoint is actually not initialized. Adding checks for that may solve this. However I am not sure how we can wind up in a state where the endpoint is not initialized but a Pipe exists...
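A check of the kind suggested here might look like the following sketch. The `Endpoint` struct and `IsEndpointInitialized` are hypothetical and heavily simplified; the real xhci_endpoint in the driver has different and additional fields, and which fields indicate "initialized" is an assumption made for illustration.

```cpp
#include <cstdint>

// Hypothetical, simplified endpoint record (not the driver's xhci_endpoint).
struct Endpoint {
	void*    trbs;           // ring memory, null until the endpoint is set up
	uint16_t maxPacketSize;  // zero until the endpoint is configured
};

// Returns true only if the endpoint looks fully set up. A request path
// could call this first and fail cleanly when it returns false.
bool
IsEndpointInitialized(const Endpoint* endpoint)
{
	return endpoint != nullptr
		&& endpoint->trbs != nullptr
		&& endpoint->maxPacketSize != 0;
}
```

Such a check only reports the bad state; it does not explain how a Pipe can exist without an initialized endpoint.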
comment:6 by , 4 years ago
Blocking: | 16878 added |
---|
comment:7 by , 4 years ago
Replying to waddlesplash:
I wonder if the problem here is that the xhci_endpoint is actually not initialized. Adding checks for that may solve this. However I am not sure how we can wind up in a state where the endpoint is not initialized but a Pipe exists...
I'm not an expert, but I do recall seeing that some BIOSes/bootloaders don't play well with empty card readers (so that I either need to insert a card or hide the reader somehow).
I think the BIOS could interpret empty card readers in all kinds of ways, and in some cases this would cause certain BIOS/system calls to return invalid values that make the caller freak out.
by , 3 years ago
Attachment: | photo_2021-06-08 00.16.16.jpeg added |
---|
Panic when touching usb wireless mouse dongle
comment:9 by , 3 years ago
Just tried reinstalling the same system with R1 Beta3 and the issue persists. I still need to populate the chassis SD card reader to prevent DriveSetup from crashing.
Also, while the error is the same (Divide Error Exception), the stacktrace looks a bit different this time.
by , 3 years ago
Attachment: | IMG_20210727_233017.jpg added |
---|
Divide Error Exception reproduced on R1 Beta3.
comment:10 by , 3 years ago
Cc: | added |
---|
This patch seems to be related https://github.com/X547/Haiku-riscv/blob/main/patchset-hrev55144/0012-XHCI-add-zero-division-checks.patch
comment:11 by , 3 years ago
That patch would "fix" the KDL, but really just masks it. The code is structured such that those values should never be 0, and if they are, something has gone wrong somewhere else earlier. One such path was cut off by a patch of korli's, but something else must be going wrong in order to hit this result, and I would strongly prefer that be fixed rather than masking the problem.
If there are certain devices that reliably trigger the panic when removing/etc. them, perhaps I or someone else can try to look into it using that; or other information about what reliably causes it.
comment:12 by , 3 years ago
Perhaps adding a trace at https://github.com/haiku/haiku/blob/master/src/add-ons/kernel/bus_managers/usb/usb.cpp#L470 to check if packetCount is non-zero?
This gets used at https://github.com/haiku/haiku/blob/master/src/add-ons/kernel/busses/usb/xhci.cpp#L2949, maybe a possible divide by zero?
Running an objdump on xhci and searching for div instructions yielded the following functions:
- GetUSBID
- FinishTransfers
- _Resize
- PhysicalMemoryAllocator
- Allocate
- Deallocate
Also note that in the last attachment, that's being triggered by a stat syscall, pretty basic code there.
comment:13 by , 3 years ago
All the stack traces in question are Bulk or Interrupt operations, not Isochronous (which only audio/video devices use), though; and all of them appear to be in the Submit function, not the Finish function.
comment:14 by , 3 years ago
Platform: | x86-64 → All |
---|
comment:15 by , 3 years ago
I am able to reliably trigger this with my slightly bent wireless keyboard dongle. Since it's bent, sometimes it doesn't work, and fiddling with it makes it cycle between on and off very fast. Then it doesn't take long before this KDL comes up.
comment:16 by , 3 years ago
If I have surmised the nature of the problem correctly, this patch should fix it: https://review.haiku-os.org/c/haiku/+/4421
KapiX, if you can reproduce it so easily, you seem to be the ideal candidate to build and test it. :)
comment:17 by , 3 years ago
Blocking: | 16603 added |
---|
comment:18 by , 3 years ago
Blocking: | 16586 added |
---|
comment:19 by , 3 years ago
hrev55404 should probably change this KDL from "division by zero" to "endpoint is not initialized".
comment:20 by , 3 years ago
Tested hrev55409 and hrev55410.
An empty card reader still KDLs the system when starting DriveSetup.
However, when the system KDLs, only a few lines of white area appear at the top of the screen, and then the system appears frozen without showing any actual KDL messages, not even "endpoint is not initialized".
comment:21 by , 3 years ago
Probably because it is trying to use the usb_keyboard kernel debug addon. You may get a backtrace by blacklisting that.
by , 3 years ago
Attachment: | IMG_20210912_204926.jpg added |
---|
"endpoint not initialized" KDL when accessing DriveSetup with the chassis SD card reader empty.
comment:22 by , 3 years ago
Yeah, after disabling usb_keyboard kernel debug addon I'm able to get an "endpoint not initialized" KDL backtrace. I've attached one. I can reliably reproduce this by accessing DriveSetup with the chassis SD card reader left empty.
comment:23 by , 3 years ago
I might have found a way to reproduce a perhaps closely related bug in QEMU by rapidly adding and removing USB devices: a NULL dereference in USB ECM called from the explore thread, apparently into a device descriptor. It seems the device descriptor was already removed -- but that should only have occurred in the explore thread itself.
comment:24 by , 3 years ago
Milestone: | Unscheduled → R1/beta4 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
I believe this should be fixed in hrev55429. There is a pending change on Gerrit that may resolve some other race conditions; it should be merged pretty soon and resolve whatever other problems remain.
by , 3 years ago
Attachment: | IMG_20210919_205814.jpg added |
---|
KDL with empty chassis card reader on hrev55436.
comment:25 by , 3 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Just tested hrev55436 and the issue persists under the same circumstances (the presence of an empty chassis card reader), with a different KDL.
Now the panic says "USB object did not become unbusy!"
comment:26 by , 3 years ago
I guess this is probably due to the Control pipe features being in use. Seems we need a more cogent strategy for that teardown.
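For readers unfamiliar with this kind of panic, the sketch below shows one generic way a teardown path can wait for an object to stop being in use and give up after a timeout. It is an illustration under stated assumptions, not the Haiku USB stack's implementation: the `BusyObject` class, its method names, and the polling interval are all hypothetical.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical object with a usage counter that teardown must wait on.
class BusyObject {
public:
	// Marks the object as in use (true) or releases one use (false).
	void SetBusy(bool busy)
	{
		if (busy)
			fBusy.fetch_add(1);
		else
			fBusy.fetch_sub(1);
	}

	// Polls until no users remain. Returns false if the timeout expires
	// first, which is the situation a teardown path has to handle.
	bool WaitUntilIdle(std::chrono::milliseconds timeout)
	{
		const auto deadline = std::chrono::steady_clock::now() + timeout;
		while (fBusy.load() != 0) {
			if (std::chrono::steady_clock::now() >= deadline)
				return false;
			std::this_thread::sleep_for(std::chrono::milliseconds(1));
		}
		return true;
	}

private:
	std::atomic<int> fBusy{0};
};
```

If something such as a control pipe operation still holds the object when teardown begins, a wait like this times out, and the caller must decide whether to panic, retry, or cancel the outstanding use first.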
comment:27 by , 3 years ago
This change should hopefully resolve the problem: https://review.haiku-os.org/c/haiku/+/4491/1
When test builds are available, I will post a link here.
comment:28 by , 3 years ago
comment:29 by , 3 years ago
Tested hrev55443. DriveSetup no longer KDLs the system, but it's not functioning correctly.
With an empty chassis card reader, it takes several minutes to show the main screen, and even after that the DriveSetup window looks totally frozen. I can't do anything with it, such as scrolling down to see what an empty card reader would look like, or closing the window.
For that I'll file a separate issue as the KDL, which is the original topic of this issue, is now gone.
comment:30 by , 3 years ago
Ack... spoke too soon. That was tested during install phase.
This time I actually installed the system and tried opening DriveSetup in the newly installed environment and I got a much different KDL, "vm_page_fault".
What I did was open DriveSetup while my card reader was empty, then open some folders while DriveSetup loaded (which eventually KDLs).
comment:31 by , 3 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
That appears to be a separate issue in a different component, please open a new ticket for it.
comment:32 by , 3 years ago
Blocking: | 17287 added |
---|
Panic when opening DriveSetup with the card reader unpopulated (empty).