Opened 13 years ago
Last modified 18 months ago
#7669 in-progress bug
SCSI sync command hangs usb mass storage device
Reported by: | pulkomandy | Owned by: | mmlr |
---|---|---|---|
Priority: | normal | Milestone: | R1.1 |
Component: | Drivers/Disk/USB | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | #7673 | |
Platform: | All |
Description
I have a cheap MP3 player that works as an usb mass storage device. However, it seems it's too slow for Haiku to handle. When I plug it I get a panic : "Could not write back block nnn (operation timed out)"
I think the timeout delay should be increased to avoid this.
Currently running on pre-alpha3, up to 41941 merged from trunk.
Attachments (1)
Change History (16)
by , 13 years ago
Attachment: | P1100427.JPG added |
---|
comment:1 by , 13 years ago
Owner: | changed from | to
---|---|
Status: | new → in-progress |
Replying to pulkomandy:
I think the timeout delay should be increased to avoid this.
The timeout is at 10 seconds right now, but the problem is that it is a fixed timeout, one that doesn't actually take the amount of data (or any other factors like slower partial block reads/writes) into account. In principle the timeout could be removed completely, I put it in there to avoid systems just hanging without an indication of what's going on. If it actually failed and the device stalled/stopped, it'll be stopped after 20, 30 seconds still, so increasing the timeout to some rather large value should be fine as well. I'd just hate to see people getting impatient and writing bug reports about generic hangs or worse yet no bug reports at all. A panic like this usually is easier to report.
I'll see about increasing the timeout at least in the alpha branch. However I'd not rule out that in your case there actually is a problem and the device stalled, or that there is a problem that makes the device so slow in the first place. So the panic might not be unjustified.
comment:2 by , 13 years ago
Blocking: | 7673 added |
---|
comment:3 by , 13 years ago
i hit the exact same bug with hrev42807 but for some reason it does not occur with the gcc4 hybrid build (same version). My hardware eee pc 701
comment:4 by , 13 years ago
Summary: | Agressive timeout fails on slow USB device → SCSI sync command hangs usb mass storage device |
---|
After a try to increase the timeout, I still got the bug. Further investigagion at BeGeistert showed that the problem is actually elsewhere. The SCSI Sync (0x35) command hangs this device. We notice it and won't use it again after the first try, but it's too late, the device is hanged and won't answer to anything else after that. Seems we need a quirk to reset the device or something.
Listusb for that device :
10d6:1101 /dev/bus/usb/4/0 "" "USB MASS STORAGE CLASS " ver. 0100
This device is an "s1mp3" MP3 player. It uses a fairly widespread chip (AK1025). I expect any device from the same vendor (10d6, Actions Ltd), to have the same problem.
comment:5 by , 10 years ago
We might have to consider (as much as I hate it) to have some kind of quirks / blacklist in the usb_disk driver or so to prevent certain devices from locking up like this. We had to resort to that in ZETA as well, to keep a large number of USB devices from locking up like this.
comment:6 by , 10 years ago
Some updates here. I tried your changes to handle the device stall more properly. The device still gets confused and apparently powers itself off or enters some kind of debug mode (I know there is a "firmware upgrade" mode reachable by sending specific SCSI commands to it).
After this, I get the "could not write back block" error only as a syslog message. Then I pressed "stop" in Tracker to stop the copy, and at that point I got a NULL pointer dereference in the FAT filesystem (_fat_ioctl_
does a memcpy with a NULL pointer).
Adding a quirk for this device is fine, but even without that, we should make sure the system doesn't crash. We made a small step towards that, but it may need some extra efforts before the error gets up to Tracker and can be reported as a BAlert.
comment:7 by , 6 years ago
Component: | Drivers/Disk → Drivers/Disk/USB |
---|
comment:8 by , 4 years ago
Milestone: | R1 → R1.1 |
---|
comment:10 by , 18 months ago
I doubt it will change anything here, the issue is very specific to a bad implementation of mass storage in that specific device. Is it really useful if I retry it?
I mean, the best I can hope is that the device will still crash, but maybe it will not bring down my Haiku system with it?
comment:12 by , 18 months ago
No crash anymore.
I can mount the drive and access it without problems as long as I don't write to it.
Writing works fine until the system attempts to send the sync command. Then, Tracker freezes for a while and eventually I get this in syslog:
kern: usb error control pipe 94: timeout waiting for queued request to complete KERN: usb_disk: operation 0x35 failed at the SCSI level KERN: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out Last message repeated 2 times KERN: usb_disk: getting request sense data failed: Operation timed out
More errors after a while and maybe as I try to unmount the device:
kern: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out KERN: could not write back block 232 (Operation timed out) KERN: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out Last message repeated 1 time KERN: could not write back block 233 (Operation timed out) KERN: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out KERN: could not write back block 1 (Operation timed out) KERN: slab memory manager: created area 0xffffffffb3001000 (32535) KERN: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out KERN: could not write back block 234 (Operation timed out) KERN: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out KERN: could not write back block 232 (Operation timed out)
And after I unplug the device:
kern: usb xhci 0: transfer error on slot 10 endpoint 2: USB transaction KERN: usb_disk: sending the command block wrapper failed: Device check-sum error KERN: usb xhci 0: transfer error on slot 10 endpoint 1: USB transaction KERN: usb hub 2: port 0: device removedKERN: usb xhci 0: KERN: could not write back block 235 (I/O error)KERN: cancel queued transfers (1) for pipe 0xffffffffaf691d00 (0) KERN: usb error xhci 0: KERN: cancel queued transfers: halted endpoint, reset! KERN: usb xhci 0: cancel queued transfers (1) for pipe 0xffffffffaf2fc6e0 (1) KERN: usb error xhci 0: KERN: cancel queued transfers: halted endpoint, reset! KERN: usb xhci 0: cancel queued transfers (0) for pipe 0xffffffffaf2fca50 (2) KERN: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out KERN: could not write back block 288 (Operation timed out) KERN: usb_disk: failed to queue data transfer: Unknown Device Error (-2147442669) KERN: could not write back block 236 (Unknown Device Error (-2147442669)) KERN: usb_disk: failed to queue data transfer: Unknown Device Error (-2147442669) Last message repeated 1 time KERN: could not write back block 289 (Bad file descriptor) kern: could not write back block 237 (Bad file descriptor) KERN: could not write back block 290 (Bad file descriptor) KERN: usb xhci 0: KERN: could not write back block 238 (Bad file descriptor) KERN: could not write back block 291 (Bad file descriptor)KERN: cancel queued transfers (0) for pipe 0xffffffffaf691d00 (0) KERN: could not write back block 239 (Bad file descriptor) KERN: could not write back block 292 (Bad file descriptor) KERN: usb xhci 0: KERN: could not write back block 240 (Bad file descriptor) KERN: could not write back block 293 (Bad file descriptor) KERN: cancel queued transfers (0) for pipe 0xffffffffaf691d00 (0) KERN: could not write back block 241 (Bad file descriptor)
followed by more "could not write back block" errors
comment:13 by , 18 months ago
It appears we do not attempt reset_recovery on synchronize failing or timing out. We should likely do that.
comment:14 by , 18 months ago
Please try this patch:
diff --git a/src/add-ons/kernel/drivers/disk/usb/usb_disk/usb_disk.cpp b/src/add-ons/kernel/drivers/disk/usb/usb_disk/usb_disk.cpp index dd260d2c94..164eba358d 100644 --- a/src/add-ons/kernel/drivers/disk/usb/usb_disk/usb_disk.cpp +++ b/src/add-ons/kernel/drivers/disk/usb/usb_disk/usb_disk.cpp @@ -88,7 +88,6 @@ static void usb_disk_callback(void *cookie, status_t status, void *data, size_t actualLength); uint8 usb_disk_get_max_lun(disk_device *device); -void usb_disk_reset_recovery(disk_device *device); status_t usb_disk_receive_csw(disk_device *device, usb_massbulk_command_status_wrapper *status); @@ -199,7 +198,7 @@ usb_disk_clear_halt(usb_pipe pipe) } -void +static void usb_disk_reset_recovery(disk_device *device, err_act *_action) { TRACE("reset recovery\n"); @@ -999,6 +998,11 @@ usb_disk_synchronize(device_lun *lun, bool force) return B_OK; } + if (result == B_DEV_STALLED || result == B_TIMED_OUT) { + usb_disk_reset_recovery(lun->device, NULL); + result = B_DEV_INVALID_IOCTL; + } + if (result == B_DEV_INVALID_IOCTL) lun->device->sync_support = 0; else
comment:15 by , 18 months ago
As I expected the patch does not make much of a difference. Once we send the unrecognized sync command, the device crashes. The screen turns off and pressing the buttons don't do anything. There is no way we can get out of this from Haiku side. But really, this is because this device is extremely broken.
As discussed earlier in the ticket, we would need a quirk to never send this sync command to this device (identified by VID/PID).
Panic and backtrace.