Opened 17 months ago

Last modified 7 weeks ago

#13792 new bug

xhci: stall error does not recover

Reported by: GregCrain
Owned by: nobody
Priority: normal
Milestone: R1/beta2
Component: Drivers/USB/XHCI
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking: #14756
Has a Patch: no
Platform: All

Description

I have an early-revision NEC USB 3.0 host controller:

: PCI:   class_base 0c, class_function 03, class_api 30
: PCI:   vendor 1033: NEC Corporation
: PCI:   device 0194: uPD720200 USB 3.0 Host Controller
: PCI:   info: Serial bus controller (USB controller, XHCI)

usb xhci -1: interface version: 0x0096
usb xhci -1: structural parameters: 1:0x04000820 2:0x00000011 3:0x00000000
usb xhci -1: capability params: 0x014042cb


During normal operation, the following sequence of events repeats and transfers seem to complete fine:
usb xhci -1: SubmitTransfer()
usb xhci -1: Ding Dong! slot:1 endpoint 1
usb xhci -1: event[14] = 32 (0x000000000d8a1020 0x01000000 0x02018001)
usb xhci -1: slot=1 epno=1 remainder=0 status=1 halted=0

. . .

I added some debugging code borrowed from FreeBSD:

/* check if error means halted */
halted = (completionCode != COMP_SHORT_PACKET
	&& completionCode != COMP_SUCCESS);

TRACE_ALWAYS("slot=%u epno=%u remainder=%lu status=%u halted=%u\n",
	slot, endpointNumber, remainder, completionCode, halted);

But at some point, in the function HandleTransferComplete(xhci_trb* trb), a Stall Error occurs:

usb xhci -1: slot=1 epno=1 remainder=9 status=6 halted=1

The Stall Error is indicated by status=6, which is the TRB completion code for a stall.
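As a rough illustration of where status and remainder come from, the sketch below decodes the two 32-bit words of a Transfer Event TRB using the field layout from the xHCI specification (completion code in bits 31:24 and residual length in bits 23:0 of the status word; TRB type in bits 15:10 and endpoint ID in bits 20:16 of the flags word). The struct, the function names, and the COMP_STALL name are made up for this example and are not taken from the Haiku driver; only the halted check mirrors the debugging code above.

```c
#include <stdint.h>

/* Completion codes from the xHCI specification (subset).
 * COMP_STALL is a name assumed for this sketch. */
enum {
	COMP_SUCCESS = 1,
	COMP_STALL = 6,
	COMP_SHORT_PACKET = 13
};

/* Hypothetical container for the decoded fields. */
struct transfer_event {
	uint32_t remainder;       /* status bits 23:0, bytes not transferred */
	uint8_t completion_code;  /* status bits 31:24 */
	uint8_t endpoint_id;      /* flags bits 20:16 */
	uint8_t trb_type;         /* flags bits 15:10 */
};

/* Split the status and flags words of a Transfer Event TRB. */
static struct transfer_event
decode_transfer_event(uint32_t status, uint32_t flags)
{
	struct transfer_event event;
	event.remainder = status & 0x00ffffff;
	event.completion_code = (uint8_t)(status >> 24);
	event.endpoint_id = (uint8_t)((flags >> 16) & 0x1f);
	event.trb_type = (uint8_t)((flags >> 10) & 0x3f);
	return event;
}

/* Same test as the debugging code: anything other than success or a
 * short packet is treated as having halted the endpoint. */
static int
completion_means_halted(uint8_t code)
{
	return code != COMP_SUCCESS && code != COMP_SHORT_PACKET;
}
```

For the event logged during normal operation, decode_transfer_event(0x01000000, 0x02018001) gives completion code 1, remainder 0, endpoint ID 1 and TRB type 32, which agrees with "event[14] = 32" and "status=1". A status word of 0x06000009 would decode to the failing case, status=6 with remainder=9.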

Some time after that, this error is logged:

usb error xhci -1: _LinkDescriptorForPipe max transfers count exceeded 8

No interrupts occur after this.

It happens very quickly on my controller (interface version 0x0096), but I believe it eventually happens on other chipsets as well.

The driver doesn't seem to recover from a Stall Error, or to react to it at all. It eventually stops making progress: transfers are still being issued, as shown by

usb xhci -1: SubmitTransfer()

but no more interrupts occur.

Change History (6)

comment:1 Changed 17 months ago by diver

Component changed from - General to Drivers/USB/XHCI

comment:2 Changed 6 months ago by pulkomandy

Milestone changed from Unscheduled to R1/beta2

comment:3 Changed 2 months ago by waddlesplash

Blocking: 14756 added

comment:4 Changed 2 months ago by waddlesplash

According to various reports on IRC, this now seems largely unreproducible after db360a20648 & hrev52890. My guess is that the first commit was the fix: we were creating a NULL descriptor and trying to submit it as a transfer, which of course did nothing, and then we never got any reply.

We still don't handle stall errors properly (per the spec we need to reset the endpoint), and perhaps things still are not so good on "early-revision" controllers. Greg, if you could re-test this, that'd be great.
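For reference, a minimal sketch of the two command TRBs that the xHCI specification describes for clearing a halted endpoint: a Reset Endpoint command (TRB type 14), followed by a Set TR Dequeue Pointer command (TRB type 16) that moves the ring past the failed transfer. For non-control endpoints the device side of the stall also has to be cleared with a ClearFeature(ENDPOINT_HALT) request. The struct and helper names below are hypothetical and are not the Haiku driver's API; the cycle bit of each command TRB (control bit 0) is left for whatever code enqueues it on the command ring.

```c
#include <stdint.h>

/* TRB type values from the xHCI specification. */
enum {
	TRB_TYPE_RESET_ENDPOINT = 14,
	TRB_TYPE_SET_TR_DEQUEUE = 16
};

/* Hypothetical in-memory layout of one 16-byte command TRB. */
struct command_trb {
	uint64_t parameter;
	uint32_t status;
	uint32_t control;
};

/* Slot ID in bits 31:24, endpoint ID in bits 20:16, type in bits 15:10. */
static uint32_t
command_control(uint32_t type, uint8_t slot, uint8_t endpoint_id)
{
	return ((uint32_t)slot << 24)
		| ((uint32_t)(endpoint_id & 0x1f) << 16)
		| (type << 10);
}

/* Step 1: take the endpoint out of the Halted state. */
static struct command_trb
make_reset_endpoint(uint8_t slot, uint8_t endpoint_id)
{
	struct command_trb trb = { 0, 0, 0 };
	trb.control = command_control(TRB_TYPE_RESET_ENDPOINT, slot,
		endpoint_id);
	return trb;
}

/* Step 2: point the controller at the next TRB to execute. The ring
 * address must be 16-byte aligned; bit 0 carries the dequeue cycle state. */
static struct command_trb
make_set_tr_dequeue(uint8_t slot, uint8_t endpoint_id,
	uint64_t ring_address, int cycle_state)
{
	struct command_trb trb = { 0, 0, 0 };
	trb.parameter = (ring_address & ~(uint64_t)0xf)
		| (cycle_state ? 1u : 0u);
	trb.control = command_control(TRB_TYPE_SET_TR_DEQUEUE, slot,
		endpoint_id);
	return trb;
}
```

A driver would presumably queue these from the stall branch of its transfer-completion handler and only resubmit transfers once both commands have completed; that sequencing is not shown here.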

comment:5 Changed 7 weeks ago by waddlesplash

It seems that the two commits I referenced in my previous message did improve the situation on quite a lot of hardware, but talking to GregCrain and kallisti5 on IRC, it seems their hardware still fails -- specifically, the transfers stall during or shortly after partition identification of the disks.

So at least we do get some transfers through before everything grinds to a halt; it seems we are "almost correct" at this point.

comment:6 Changed 7 weeks ago by waddlesplash

Please retest after hrev52966; this fixed a nasty race condition which was likely related.
