Opened 6 years ago
Last modified 3 years ago
#14835 new bug
libusb syscall stalls when USB errors
Reported by: | pulkomandy | Owned by: | mmlr |
---|---|---|---|
Priority: | normal | Milestone: | Unscheduled |
Component: | Drivers/USB | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description
https://github.com/haikuports/haikuports/issues/1345
KERN: usb_disk: got device name "Canon MP620 series 0109": No error KERN: usb error ehci -1: qtd (0xe4c7000) error: 0x40000148 KERN: usb error ehci -1: qtd (0xe30fa80) error: 0x801f8049 KERN: usb_disk: sending the command block wrapper failed: Device check-sum error KERN: usb error ehci -1: qtd (0xe30fc00) error: 0x00080248 KERN: usb error ehci -1: qtd (0xe30fe00) error: 0x00080248 KERN: usb error ehci -1: qtd (0xe310000) error: 0x00080248 KERN: usb hub 18: port 4: device removed
Error 0x80248 means we failed sending an 8 byte setup token, transaction halted because device is in error state.
The transfer should either be retried, or the error raised to the calling code by the mean of the USB transfer completion callback. But it seems this never happens. As a result the ioctl syscall doing the operation never returns.
From a quick look at the EHCI code, it appears that we don't call the callback for cancelled transfers. We cancel transfers without calling the callback if doing a "force" cancel, which happens in bus manager Pipe destructor.
So basically, we could end up in this situation if the Pipe was deleted while transfers where queued, and they were not properly cancelled.
I can produce a similar issue with sigrok-cli and an fx2lafw device.
I will be studying the bus manager code, but if someone with more knowledge of the architecture could have a look, that would be great.
Change History (11)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
http://xref.plausible.coop/source/xref/haiku/src/add-ons/kernel/bus_managers/usb/Device.cpp#556
Instead of just "delete"ing the Pipe here, try calling CancelQueuedTransfers with "force" as "false" before doing so. I think it is still acceptable to do that at this stage.
comment:3 by , 5 years ago
While running sigrok-cli:
KERN: usb xhci 0: KERN: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: TRB 0xb3b3da0 was not found in the endpoint! KERN: usb xhci 0: KERN: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: TRB 0xb3b3de0 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b3e20 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b3e60 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: TRB 0xb3b3ea0 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: TRB 0xb3b3ee0 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b3f20 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: TRB 0xb3b3f60 was not found in the endpoint! KERN: usb xhci 0: KERN: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: TRB 0xb3b40a0 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b40e0 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b4120 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b4160 was not found in the endpoint! KERN: usb xhci 0: KERN: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b41a0 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b41e0 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b4220 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b4260 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b42a0 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b42e0 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b4320 was not found in the endpoint! KERN: usb xhci 0: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 0: KERN: TRB 0xb3b4360 was not found in the endpoint!
At this point sigrok-cli says there is an error and stalls (I can't kill it with control C, the main thread is gone but there are worker threads still running).
Here I unplug the device:
KERN: usb hub 7: KERN: port 2: device removed KERN: usb xhci 1: cancel queued transfers for pipe 0xf5719c38 (0) KERN: usb hub 7: port 2: new device connected KERN: usb xhci 1: KERN: cancel queued transfers for pipe 0x81feeeb4 (0)
And a little later:
KERN: usb xhci 1: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 1: TRB 0xb3b5260 was not found in the endpoint! KERN: usb xhci 1: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 1: KERN: TRB 0xb3b52a0 was not found in the endpoint! KERN: usb xhci 1: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 1: KERN: TRB 0xb3b52e0 was not found in the endpoint! KERN: usb xhci 1: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 1: KERN: TRB 0xb3b5320 was not found in the endpoint! KERN: usb xhci 1: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 1: KERN: TRB 0xb3b5360 was not found in the endpoint! KERN: usb xhci 1: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 1: KERN: TRB 0xb3b53a0 was not found in the endpoint! KERN: usb xhci 1: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 1: KERN: TRB 0xb3b5460 was not found in the endpoint! KERN: usb xhci 1: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 1: KERN: TRB 0xb3b54a0 was not found in the endpoint! KERN: usb xhci 1: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 1: KERN: TRB 0xb3b54e0 was not found in the endpoint! KERN: usb xhci 1: got an interrupt for a non-Event Data TRB! KERN: usb error xhci 1: KERN: TRB 0xb3b5520 was not found in the endpoint! KERN: usb xhci 1: got an interrupt for a non-Event Data TRB! KERN: usb hub 7: KERN: port 2: device removed KERN: usb xhci 1: cancel queued transfers for pipe 0xf974d648 (0) KERN: usb xhci 1: KERN: cancel queued transfers for pipe 0xf4840c40 (2) KERN: usb error xhci 1: unsuccessful command 15, error Context state (19)
comment:4 by , 5 years ago
I tried a second time and got less errors:
Connecting the device:
KERN: usb hub 7: port 2: new device connected KERN: usb xhci 1: cancel queued transfers for pipe 0x81feeeb4 (0)
Running the command:
KERN: usb hub 7: port 2: device removed KERN: usb xhci 1: cancel queued transfers for pipe 0xf6c46410 (0) KERN: usb hub 7: port 2: new device connected KERN: usb xhci 1: cancel queued transfers for pipe 0x81feeeb4 (0)
Unplugging the device:
KERN: usb xhci 1: got an interrupt for a non-Event Data TRB! KERN: usb hub 7: port 2: device removed KERN: usb xhci 1: cancel queued transfers for pipe 0xf970cb60 (0) KERN: usb xhci 1: KERN: cancel queued transfers for pipe 0xf7824910 (2) KERN: usb error xhci 1: unsuccessful command 15, error Context state (19)
comment:5 by , 5 years ago
Sigrok log for reference:
sr: [00:00.000002] log: libsigrok loglevel set to 5. sr: [00:00.000033] backend: libsigrok 0.5.1/5:0:1 (rt: 0.5.1/5:0:1). sr: [00:00.000062] backend: Libs: glib 2.57.1 (rt: 2.57.1/5701:0), libzip 1.2.0, libusb-1.0 1.0.22.11312 API 0x01000106, libftdi 1.4. sr: [00:00.000070] backend: Host: i586-pc-haiku, little-endian. sr: [00:00.000083] backend: SCPI backends: TCP, USBTMC. sr: [00:00.000092] backend: Firmware search paths: sr: [00:00.000129] backend: - /boot/home/config/non-packaged/data/sigrok-firmware sr: [00:00.000138] backend: - /packages/libsigrok_x86-0.5.1-4/.self/data/sigrok-firmware sr: [00:00.000159] backend: - /boot/system/non-packaged/data/sigrok-firmware sr: [00:00.000196] backend: - /boot/system/data/sigrok-firmware sr: [00:00.000255] backend: Sanity-checking all drivers. sr: [00:00.000266] backend: Sanity-checking all input modules. sr: [00:00.000281] backend: Sanity-checking all output modules. sr: [00:00.000297] backend: Sanity-checking all transform modules. sr: [00:00.001151] ezusb: uploading firmware to device on 1.3 sr: [00:00.001196] ezusb: setting CPU reset mode on... sr: [00:00.001779] resource: Attempt to open '/boot/home/config/non-packaged/data/sigrok-firmware/fx2lafw-saleae-logic.fw' failed: No such file or directory sr: [00:00.001831] resource: Opened '/packages/libsigrok_x86-0.5.1-4/.self/data/sigrok-firmware/fx2lafw-saleae-logic.fw'. sr: [00:00.001863] ezusb: Uploading firmware 'fx2lafw-saleae-logic.fw'. sr: [00:00.010282] ezusb: Uploaded 4096 bytes. sr: [00:00.018498] ezusb: Uploaded 4024 bytes. sr: [00:00.018511] ezusb: Firmware upload done. sr: [00:00.018521] ezusb: setting CPU reset mode off... sr: [00:00.019287] hwdriver: Scan found 1 devices (fx2lafw). sr: [00:00.019341] device: fx2lafw: Opening device instance. sr: [00:00.019350] fx2lafw: Waiting for device to reset. sr: [00:00.419390] fx2lafw: Waited 400ms. sr: [00:00.519414] fx2lafw: Waited 500ms. sr: [00:00.619440] fx2lafw: Waited 600ms. sr: [00:00.719464] fx2lafw: Waited 700ms. sr: [00:00.819486] fx2lafw: Waited 800ms. sr: [00:00.919510] fx2lafw: Waited 900ms. sr: [00:01.019539] fx2lafw: Waited 1000ms. sr: [00:01.119574] fx2lafw: Waited 1100ms. sr: [00:01.219604] fx2lafw: Waited 1200ms. sr: [00:01.319628] fx2lafw: Waited 1300ms. sr: [00:01.419655] fx2lafw: Waited 1400ms. sr: [00:01.519690] fx2lafw: Waited 1500ms. sr: [00:01.619715] fx2lafw: Waited 1600ms. sr: [00:01.719739] fx2lafw: Waited 1700ms. sr: [00:01.819759] fx2lafw: Waited 1800ms. sr: [00:01.919782] fx2lafw: Waited 1900ms. sr: [00:02.019814] fx2lafw: Waited 2000ms. sr: [00:02.119846] fx2lafw: Waited 2100ms. sr: [00:02.219880] fx2lafw: Waited 2200ms. sr: [00:02.319920] fx2lafw: Waited 2300ms. sr: [00:02.419955] fx2lafw: Waited 2400ms. sr: [00:02.519991] fx2lafw: Waited 2500ms. sr: [00:02.620027] fx2lafw: Waited 2600ms. sr: [00:02.720063] fx2lafw: Waited 2700ms. sr: [00:02.721193] fx2lafw: Opened device on 1.3 (logical) / usb/1-3 (physical), interface 0, firmware 1.3. sr: [00:02.721209] fx2lafw: Detected REVID=1, it's a Cypress CY7C68013A (FX2LP). sr: [00:02.721217] fx2lafw: Device came back after 2700ms. sr: [00:02.721308] session: Using thread-default main context. sr: [00:02.721326] session: Starting. sr: [00:02.721339] hwdriver: fx2lafw: Starting acquisition. sr: [00:02.721371] fx2lafw: submitting transfer: 0 sr: [00:02.721384] fx2lafw: submitting transfer: 1 sr: [00:02.721394] fx2lafw: submitting transfer: 2 sr: [00:02.721403] fx2lafw: submitting transfer: 3 sr: [00:02.721413] fx2lafw: submitting transfer: 4 sr: [00:02.721422] fx2lafw: submitting transfer: 5 sr: [00:02.721430] fx2lafw: submitting transfer: 6 sr: [00:02.721438] fx2lafw: submitting transfer: 7 sr: [00:02.721446] fx2lafw: submitting transfer: 8 sr: [00:02.721454] fx2lafw: submitting transfer: 9 sr: [00:02.721462] fx2lafw: submitting transfer: 10 sr: [00:02.721472] fx2lafw: submitting transfer: 11 sr: [00:02.721483] fx2lafw: submitting transfer: 12 sr: [00:02.721493] fx2lafw: submitting transfer: 13 sr: [00:02.721511] fx2lafw: submitting transfer: 14 sr: [00:02.721520] fx2lafw: submitting transfer: 15 sr: [00:02.721528] fx2lafw: submitting transfer: 16 sr: [00:02.721537] fx2lafw: submitting transfer: 17 sr: [00:02.721547] fx2lafw: submitting transfer: 18 sr: [00:02.721579] session: bus: Received SR_DF_HEADER packet. cli: Received SR_DF_HEADER. sr: [00:02.721677] hwdriver: sr_config_get(): key 30000 (samplerate) sdi 0x4b0b40 cg NULL -> uint64 20000 sr: [00:02.721695] fx2lafw: GPIF delay = 1499, clocksource = 30MHz. sr: [00:02.821749] fx2lafw: Unable to send start command: LIBUSB_ERROR_TIMEOUT. sr: [00:02.821773] session: Could not start fx2lafw device usb/1-3 acquisition. sr: [00:02.821780] hwdriver: fx2lafw: Stopping acquisition. Failed to start session.
comment:6 by , 5 years ago
https://review.haiku-os.org/c/haiku/+/1424 was the fix for this. I don't know if anyone ever tested it however.
comment:7 by , 5 years ago
I will check if the lower polling rate in the ehci driver helps with this, when I get back home and have access to the hardware again.
comment:8 by , 5 years ago
Still broken. As for the change in 1424, you called it "obviously wrong", so I'm happy to test a less wrong version if you can provide one :)
comment:9 by , 5 years ago
The first version of the change in 1424 was wrong, however, I think the current version is completely different and actually correct?
comment:10 by , 3 years ago
Component: | Drivers/USB/EHCI → Drivers/USB |
---|---|
Summary: | libusb syscall stalls when EHCI errors → libusb syscall stalls when USB errors |
This still happens, and under XHCI too. Example backtrace (trimmed):
stack trace for thread 20814 "cameralooper" kernel stack: 0xffffffff81f7b000 to 0xffffffff81f80000 user stack: 0x00007fd327bf6000 to 0x00007fd327c36000 frame caller <image>:function + offset 0 ffffffff81f7fcc0 (+ 128) ffffffff800997b3 <kernel_x86_64> reschedule(int) + 0x433 1 ffffffff81f7fd40 (+ 48) ffffffff80089626 <kernel_x86_64> thread_block + 0xc6 2 ffffffff81f7fd70 (+ 160) ffffffff80071195 <kernel_x86_64> switch_sem_etc + 0x3e5 3 ffffffff81f7fe10 (+ 176) ffffffff8166436b </boot/system/add-ons/kernel/drivers/dev/bus/usb_raw> usb_raw_ioctl(void*, unsigned int, void*, unsigned long) + 0x33b 4 ffffffff81f7fec0 (+ 64) ffffffff800e79da <kernel_x86_64> fd_ioctl(bool, int, unsigned int, void*, unsigned long) + 0x5a 5 ffffffff81f7ff00 (+ 32) ffffffff800e86aa <kernel_x86_64> _user_ioctl + 0x3a 6 ffffffff81f7ff20 (+ 16) ffffffff8014622f <kernel_x86_64> x86_64_syscall_entry + 0xfb user iframe at 0xffffffff81f7ff30 (end = 0xffffffff81f7fff8) rax 0x92 rbx 0x1109fcd14420 rcx 0x8d6eeb410c rdx 0x7fd327c35490 rsi 0x4002 rdi 0xe rbp 0x7fd327c35480 r8 0x7fd327c35770 r9 0x1109fcc88340 r10 0x30 r11 0x202 r12 0xc r13 0x7fd327c354f0 r14 0x1109fcc88340 r15 0x1109fcc883c8 rip 0x8d6eeb410c rsp 0x7fd327c35468 rflags 0x202 vector: 0x63, error code: 0x0 7 ffffffff81f7ff30 (+140546996393296) 0000008d6eeb410c <libroot.so> _kern_ioctl + 0x0c 8 00007fd327c35480 (+ 96) 0000013f725c7eef <libdevice.so> BUSBEndpoint::BulkTransfer const(void*, unsigned long) + 0x4f
There are various instances of this in the syslog:
usb xhci 0: transfer error on slot 2 endpoint 1: Stall usb error control pipe 15: timeout waiting for queued request to complete usb xhci 0: cancel queued transfers (1) for pipe 0xffffffffe5874e00 (0) usb error xhci 0: cancel queued transfers: halted endpoint, reset!
However, the device was not actually disconnected, so we should not be stalled here.
comment:11 by , 3 years ago
I improved things a bit in hrev55806; at least applications are now killable without removing the USB devices entirely.
I guess it makes sense to force-cancel transfers from Pipe destructor; but I'm not sure why the transfers were not cancelled earlier. Pipe destruction happens during Device teardown; likely the Device teardown should cancel any queued transfers "normally"?