Opened 7 months ago

Last modified 8 weeks ago

#14835 new bug

libusb syscall stalls when EHCI errors

Reported by: pulkomandy Owned by: mmlr
Priority: normal Milestone: Unscheduled
Component: Drivers/USB/EHCI Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

https://github.com/haikuports/haikuports/issues/1345

KERN: usb_disk: got device name "Canon MP620 series 0109": No error
KERN: usb error ehci -1: qtd (0xe4c7000) error: 0x40000148
KERN: usb error ehci -1: qtd (0xe30fa80) error: 0x801f8049
KERN: usb_disk: sending the command block wrapper failed: Device check-sum error
KERN: usb error ehci -1: qtd (0xe30fc00) error: 0x00080248
KERN: usb error ehci -1: qtd (0xe30fe00) error: 0x00080248
KERN: usb error ehci -1: qtd (0xe310000) error: 0x00080248
KERN: usb hub 18: port 4: device removed

Error 0x80248 means we failed sending an 8 byte setup token, transaction halted because device is in error state.

The transfer should either be retried, or the error raised to the calling code by the mean of the USB transfer completion callback. But it seems this never happens. As a result the ioctl syscall doing the operation never returns.

From a quick look at the EHCI code, it appears that we don't call the callback for cancelled transfers. We cancel transfers without calling the callback if doing a "force" cancel, which happens in bus manager Pipe destructor.

So basically, we could end up in this situation if the Pipe was deleted while transfers where queued, and they were not properly cancelled.

I can produce a similar issue with sigrok-cli and an fx2lafw device.

I will be studying the bus manager code, but if someone with more knowledge of the architecture could have a look, that would be great.

Change History (5)

comment:1 Changed 6 months ago by waddlesplash

I guess it makes sense to force-cancel transfers from Pipe destructor; but I'm not sure why the transfers were not cancelled earlier. Pipe destruction happens during Device teardown; likely the Device teardown should cancel any queued transfers "normally"?

comment:2 Changed 4 months ago by waddlesplash

http://xref.plausible.coop/source/xref/haiku/src/add-ons/kernel/bus_managers/usb/Device.cpp#556

Instead of just "delete"ing the Pipe here, try calling CancelQueuedTransfers with "force" as "false" before doing so. I think it is still acceptable to do that at this stage.

comment:3 Changed 8 weeks ago by pulkomandy

While running sigrok-cli:

KERN: usb xhci 0: KERN: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: TRB 0xb3b3da0 was not found in the endpoint!
KERN: usb xhci 0: KERN: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: TRB 0xb3b3de0 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b3e20 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b3e60 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: TRB 0xb3b3ea0 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: TRB 0xb3b3ee0 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b3f20 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: TRB 0xb3b3f60 was not found in the endpoint!
KERN: usb xhci 0: KERN: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: TRB 0xb3b40a0 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b40e0 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b4120 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b4160 was not found in the endpoint!
KERN: usb xhci 0: KERN: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b41a0 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b41e0 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b4220 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b4260 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b42a0 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b42e0 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b4320 was not found in the endpoint!
KERN: usb xhci 0: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 0: KERN: TRB 0xb3b4360 was not found in the endpoint!

At this point sigrok-cli says there is an error and stalls (I can't kill it with control C, the main thread is gone but there are worker threads still running).

Here I unplug the device:

KERN: usb hub 7: KERN: port 2: device removed
KERN: usb xhci 1: cancel queued transfers for pipe 0xf5719c38 (0)
KERN: usb hub 7: port 2: new device connected
KERN: usb xhci 1: KERN: cancel queued transfers for pipe 0x81feeeb4 (0)

And a little later:

KERN: usb xhci 1: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 1: TRB 0xb3b5260 was not found in the endpoint!
KERN: usb xhci 1: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 1: KERN: TRB 0xb3b52a0 was not found in the endpoint!
KERN: usb xhci 1: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 1: KERN: TRB 0xb3b52e0 was not found in the endpoint!
KERN: usb xhci 1: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 1: KERN: TRB 0xb3b5320 was not found in the endpoint!
KERN: usb xhci 1: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 1: KERN: TRB 0xb3b5360 was not found in the endpoint!
KERN: usb xhci 1: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 1: KERN: TRB 0xb3b53a0 was not found in the endpoint!
KERN: usb xhci 1: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 1: KERN: TRB 0xb3b5460 was not found in the endpoint!
KERN: usb xhci 1: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 1: KERN: TRB 0xb3b54a0 was not found in the endpoint!
KERN: usb xhci 1: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 1: KERN: TRB 0xb3b54e0 was not found in the endpoint!
KERN: usb xhci 1: got an interrupt for a non-Event Data TRB!
KERN: usb error xhci 1: KERN: TRB 0xb3b5520 was not found in the endpoint!
KERN: usb xhci 1: got an interrupt for a non-Event Data TRB!
KERN: usb hub 7: KERN: port 2: device removed
KERN: usb xhci 1: cancel queued transfers for pipe 0xf974d648 (0)
KERN: usb xhci 1: KERN: cancel queued transfers for pipe 0xf4840c40 (2)
KERN: usb error xhci 1: unsuccessful command 15, error Context state (19)

comment:4 Changed 8 weeks ago by pulkomandy

I tried a second time and got less errors:

Connecting the device:

KERN: usb hub 7: port 2: new device connected
KERN: usb xhci 1: cancel queued transfers for pipe 0x81feeeb4 (0)

Running the command:

KERN: usb hub 7: port 2: device removed
KERN: usb xhci 1: cancel queued transfers for pipe 0xf6c46410 (0)
KERN: usb hub 7: port 2: new device connected
KERN: usb xhci 1: cancel queued transfers for pipe 0x81feeeb4 (0)

Unplugging the device:

KERN: usb xhci 1: got an interrupt for a non-Event Data TRB!
KERN: usb hub 7: port 2: device removed
KERN: usb xhci 1: cancel queued transfers for pipe 0xf970cb60 (0)
KERN: usb xhci 1: KERN: cancel queued transfers for pipe 0xf7824910 (2)
KERN: usb error xhci 1: unsuccessful command 15, error Context state (19)

comment:5 Changed 8 weeks ago by pulkomandy

Sigrok log for reference:

sr: [00:00.000002] log: libsigrok loglevel set to 5.
sr: [00:00.000033] backend: libsigrok 0.5.1/5:0:1 (rt: 0.5.1/5:0:1).
sr: [00:00.000062] backend: Libs: glib 2.57.1 (rt: 2.57.1/5701:0), libzip 1.2.0, libusb-1.0 1.0.22.11312 API 0x01000106, libftdi 1.4.
sr: [00:00.000070] backend: Host: i586-pc-haiku, little-endian.
sr: [00:00.000083] backend: SCPI backends: TCP, USBTMC.
sr: [00:00.000092] backend: Firmware search paths:
sr: [00:00.000129] backend:  - /boot/home/config/non-packaged/data/sigrok-firmware
sr: [00:00.000138] backend:  - /packages/libsigrok_x86-0.5.1-4/.self/data/sigrok-firmware
sr: [00:00.000159] backend:  - /boot/system/non-packaged/data/sigrok-firmware
sr: [00:00.000196] backend:  - /boot/system/data/sigrok-firmware
sr: [00:00.000255] backend: Sanity-checking all drivers.
sr: [00:00.000266] backend: Sanity-checking all input modules.
sr: [00:00.000281] backend: Sanity-checking all output modules.
sr: [00:00.000297] backend: Sanity-checking all transform modules.
sr: [00:00.001151] ezusb: uploading firmware to device on 1.3
sr: [00:00.001196] ezusb: setting CPU reset mode on...
sr: [00:00.001779] resource: Attempt to open '/boot/home/config/non-packaged/data/sigrok-firmware/fx2lafw-saleae-logic.fw' failed: No such file or directory
sr: [00:00.001831] resource: Opened '/packages/libsigrok_x86-0.5.1-4/.self/data/sigrok-firmware/fx2lafw-saleae-logic.fw'.
sr: [00:00.001863] ezusb: Uploading firmware 'fx2lafw-saleae-logic.fw'.
sr: [00:00.010282] ezusb: Uploaded 4096 bytes.
sr: [00:00.018498] ezusb: Uploaded 4024 bytes.
sr: [00:00.018511] ezusb: Firmware upload done.
sr: [00:00.018521] ezusb: setting CPU reset mode off...
sr: [00:00.019287] hwdriver: Scan found 1 devices (fx2lafw).
sr: [00:00.019341] device: fx2lafw: Opening device instance.
sr: [00:00.019350] fx2lafw: Waiting for device to reset.
sr: [00:00.419390] fx2lafw: Waited 400ms.
sr: [00:00.519414] fx2lafw: Waited 500ms.
sr: [00:00.619440] fx2lafw: Waited 600ms.
sr: [00:00.719464] fx2lafw: Waited 700ms.
sr: [00:00.819486] fx2lafw: Waited 800ms.
sr: [00:00.919510] fx2lafw: Waited 900ms.
sr: [00:01.019539] fx2lafw: Waited 1000ms.
sr: [00:01.119574] fx2lafw: Waited 1100ms.
sr: [00:01.219604] fx2lafw: Waited 1200ms.
sr: [00:01.319628] fx2lafw: Waited 1300ms.
sr: [00:01.419655] fx2lafw: Waited 1400ms.
sr: [00:01.519690] fx2lafw: Waited 1500ms.
sr: [00:01.619715] fx2lafw: Waited 1600ms.
sr: [00:01.719739] fx2lafw: Waited 1700ms.
sr: [00:01.819759] fx2lafw: Waited 1800ms.
sr: [00:01.919782] fx2lafw: Waited 1900ms.
sr: [00:02.019814] fx2lafw: Waited 2000ms.
sr: [00:02.119846] fx2lafw: Waited 2100ms.
sr: [00:02.219880] fx2lafw: Waited 2200ms.
sr: [00:02.319920] fx2lafw: Waited 2300ms.
sr: [00:02.419955] fx2lafw: Waited 2400ms.
sr: [00:02.519991] fx2lafw: Waited 2500ms.
sr: [00:02.620027] fx2lafw: Waited 2600ms.
sr: [00:02.720063] fx2lafw: Waited 2700ms.
sr: [00:02.721193] fx2lafw: Opened device on 1.3 (logical) / usb/1-3 (physical), interface 0, firmware 1.3.
sr: [00:02.721209] fx2lafw: Detected REVID=1, it's a Cypress CY7C68013A (FX2LP).
sr: [00:02.721217] fx2lafw: Device came back after 2700ms.
sr: [00:02.721308] session: Using thread-default main context.
sr: [00:02.721326] session: Starting.
sr: [00:02.721339] hwdriver: fx2lafw: Starting acquisition.
sr: [00:02.721371] fx2lafw: submitting transfer: 0
sr: [00:02.721384] fx2lafw: submitting transfer: 1
sr: [00:02.721394] fx2lafw: submitting transfer: 2
sr: [00:02.721403] fx2lafw: submitting transfer: 3
sr: [00:02.721413] fx2lafw: submitting transfer: 4
sr: [00:02.721422] fx2lafw: submitting transfer: 5
sr: [00:02.721430] fx2lafw: submitting transfer: 6
sr: [00:02.721438] fx2lafw: submitting transfer: 7
sr: [00:02.721446] fx2lafw: submitting transfer: 8
sr: [00:02.721454] fx2lafw: submitting transfer: 9
sr: [00:02.721462] fx2lafw: submitting transfer: 10
sr: [00:02.721472] fx2lafw: submitting transfer: 11
sr: [00:02.721483] fx2lafw: submitting transfer: 12
sr: [00:02.721493] fx2lafw: submitting transfer: 13
sr: [00:02.721511] fx2lafw: submitting transfer: 14
sr: [00:02.721520] fx2lafw: submitting transfer: 15
sr: [00:02.721528] fx2lafw: submitting transfer: 16
sr: [00:02.721537] fx2lafw: submitting transfer: 17
sr: [00:02.721547] fx2lafw: submitting transfer: 18
sr: [00:02.721579] session: bus: Received SR_DF_HEADER packet.
cli: Received SR_DF_HEADER.
sr: [00:02.721677] hwdriver: sr_config_get(): key 30000 (samplerate) sdi 0x4b0b40 cg NULL -> uint64 20000
sr: [00:02.721695] fx2lafw: GPIF delay = 1499, clocksource = 30MHz.
sr: [00:02.821749] fx2lafw: Unable to send start command: LIBUSB_ERROR_TIMEOUT.
sr: [00:02.821773] session: Could not start fx2lafw device usb/1-3 acquisition.
sr: [00:02.821780] hwdriver: fx2lafw: Stopping acquisition.
Failed to start session.
Note: See TracTickets for help on using tickets.