#14506 closed bug (not reproducible)
XHCI: faulty event ring handling?
Reported by: | smallstepforman | Owned by: | waddlesplash |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta2 |
Component: | Drivers/USB/XHCI | Version: | R1/Development |
Keywords: | XHCI KDL | Cc: | |
Blocked By: | | Blocking: | |
Platform: | All | | |
Description
I've finally found a way to reproduce the XHCI KDL on my box 100% of the time. Before, it was hit or miss, with the typical symptoms: first the mouse would die, or the keyboard would constantly repeat, and eventually the USB hard disk would die. I now also have a KDL screenshot with XHCI errors. Maybe investigating this issue will finally expose the XHCI timing bug affecting everyone.
Steps to reproduce:
#include <cassert>

int main(int argc, char** argv) {
    // During setup of my video editor, after setting up OpenGL/windows/Media Kit
    // ...
    // A simple assert will KDL
    assert(0);
}
On my box, 100% KDL with XHCI errors.
KDL trace (see screenshot), summarized in text form below:
<kernel_x86_64> memcpy + 0x51
<xhci> XHCI::ReadDescriptorChain(xhci_td*, iovec*, unsigned long) + 0xa2
<xhci> XHCI::FinishTransfers() + 0x1d1
<xhci> XHCI::FinishThread(void *) + 0x09
<kernel_x86_64> common_thread_entry(void *) + 0x37
Attachments (3)
Change History (13)
comment:1 by , 6 years ago
Looking at XHCI::FinishTransfers(), could we be dealing with a threading issue, where transfer->Vector() is modified by another thread? That would invalidate transfer->VectorCount().
Also, we never check the return value of Transfer::PrepareKernelAccess(), which is asking for trouble.
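To illustrate the two concerns raised in this comment, here is a minimal standalone sketch. It is not the Haiku driver code: `IoVec`, `Status`, `PrepareAccess` and `CopyToVectors` are hypothetical stand-ins invented for this example. It assumes the copy loop works on a private snapshot of the vector list (so a concurrent modification cannot change the count mid-copy), checks the result of the prepare step, and clamps each memcpy to the space actually available.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins; these are NOT the Haiku driver's real types.
struct IoVec {
    void*  base;
    size_t len;
};

enum Status { kOk = 0, kError = -1 };

// Hypothetical prepare step. The point is that the caller checks its result.
Status PrepareAccess(const std::vector<IoVec>& vecs)
{
    for (const IoVec& v : vecs) {
        if (v.base == nullptr && v.len != 0)
            return kError;
    }
    return kOk;
}

// Copy `source` into the given vectors and return the number of bytes copied.
// `snapshot` is taken by value, so the list and its count cannot change
// underneath the loop even if the original is modified elsewhere.
size_t CopyToVectors(const uint8_t* source, size_t sourceLen,
    std::vector<IoVec> snapshot)
{
    // Bail out instead of copying if the prepare step failed.
    if (PrepareAccess(snapshot) != kOk)
        return 0;

    size_t copied = 0;
    for (const IoVec& v : snapshot) {
        if (copied >= sourceLen)
            break;
        // Never write more than the vector holds or the source provides.
        size_t chunk = std::min(v.len, sourceLen - copied);
        if (chunk == 0)
            continue;
        std::memcpy(v.base, source + copied, chunk);
        copied += chunk;
    }
    return copied;
}
```

Whether a snapshot is affordable in the real driver is a separate question; the sketch only shows the shape of the checks.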
comment:2 by , 6 years ago
Seems to have been fixed in hrev52357. The assert(0) will no longer trigger the KDL.
comment:3 by , 6 years ago
That would suggest that there is an interrupt-related issue that leads the XHCI driver to do something (finish a descriptor chain early?) which is then repeated. The fact that the IO-APIC is now probably used apparently masks the issue. It should still be fixed, though, as it may come up in a different incarnation. Can you please provide a syslog with the new revision and try to reproduce the issue with IO-APICs disabled from the boot menu?
comment:4 by , 6 years ago
Sadly, the XHCI issue still exists (losing mouse/keyboard) even without disabling IO-APIC. Michael, you were right: it just masked the issue in my initial reproducible scenario. I no longer get a KDL after the assert(0), however, so there is a small benefit :)
See attached syslog (above). Lots of USB resets. I use an external USB2 hard disk to boot Haiku on a 2014 MacBookPro (11.3).
comment:5 by , 6 years ago
Another KDL with an almost identical stack trace, except this time memcpy() is 2 bytes away from the previous crash. See attachment (uploading).
comment:6 by , 6 years ago
Milestone: | Unscheduled → R1/beta2 |
---|
comment:7 by , 6 years ago
USB resets were fixed in master (but not in beta1). According to various reports, random disconnects (pipe stalls) may now be fixed as of db360a20648 & hrev52890, so please upgrade and test.
comment:8 by , 6 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Summary: | XHCI KDL - possible root cause → XHCI: faulty event ring handling? |
There were some locking issues in HandleTransfersComplete that I've fixed in hrev52931, which may have been the cause of this.
However, I haven't yet audited our Event Ring handling code, which may be the source of these issues. So I'll leave this ticket open until I do.
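As a generic illustration of the kind of locking a completion handler needs, here is a small standalone sketch. It does not show what hrev52931 changed; `TransferQueue`, `PendingTransfer` and all method names are hypothetical, and `std::mutex` stands in for whatever lock the kernel driver uses. The assumption is that one context marks transfers as completed while a finisher thread collects them, so every access to the shared list happens under the same lock.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical record for a transfer that has been submitted.
struct PendingTransfer {
    int  id;
    bool completed;
};

// Hypothetical queue shared between a completion path and a finisher thread.
class TransferQueue {
public:
    void Add(int id)
    {
        std::lock_guard<std::mutex> guard(fLock);
        fPending.push_back({id, false});
    }

    // Completion path: flag a transfer as done while holding the lock.
    void MarkCompleted(int id)
    {
        std::lock_guard<std::mutex> guard(fLock);
        for (PendingTransfer& t : fPending) {
            if (t.id == id)
                t.completed = true;
        }
    }

    // Finisher path: detach all completed transfers under the lock and
    // return their ids, so they can be processed without holding it.
    std::vector<int> TakeCompleted()
    {
        std::vector<int> done;
        std::vector<PendingTransfer> remaining;
        std::lock_guard<std::mutex> guard(fLock);
        for (const PendingTransfer& t : fPending) {
            if (t.completed)
                done.push_back(t.id);
            else
                remaining.push_back(t);
        }
        fPending.swap(remaining);
        return done;
    }

    size_t CountPending()
    {
        std::lock_guard<std::mutex> guard(fLock);
        return fPending.size();
    }

private:
    std::mutex fLock;
    std::vector<PendingTransfer> fPending;
};
```

Detaching the completed entries before processing them keeps the lock hold time short, which matters more in a driver than in this toy example.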
comment:9 by , 6 years ago
Resolution: | → not reproducible |
---|---|
Status: | assigned → closed |
The Event Ring code looks perfectly fine. So, closing this as not reproducible; we can make a new ticket if it reappears.
comment:10 by , 5 years ago
Remove milestone for tickets with status = closed and resolution != fixed
Screenshot XHCI KDL