Opened 13 years ago

Last modified 18 months ago

#7669 in-progress bug

SCSI sync command hangs usb mass storage device

Reported by: pulkomandy Owned by: mmlr
Priority: normal Milestone: R1.1
Component: Drivers/Disk/USB Version: R1/Development
Keywords: Cc:
Blocked By: Blocking: #7673
Platform: All

Description

I have a cheap MP3 player that works as an usb mass storage device. However, it seems it's too slow for Haiku to handle. When I plug it I get a panic : "Could not write back block nnn (operation timed out)"

I think the timeout delay should be increased to avoid this.

Currently running on pre-alpha3, up to 41941 merged from trunk.

Attachments (1)

P1100427.JPG (1.8 MB ) - added by pulkomandy 13 years ago.
Panic and backtrace.

Download all attachments as: .zip

Change History (16)

by pulkomandy, 13 years ago

Attachment: P1100427.JPG added

Panic and backtrace.

in reply to:  description comment:1 by mmlr, 13 years ago

Owner: changed from marcusoverhagen to mmlr
Status: newin-progress

Replying to pulkomandy:

I think the timeout delay should be increased to avoid this.

The timeout is at 10 seconds right now, but the problem is that it is a fixed timeout, one that doesn't actually take the amount of data (or any other factors like slower partial block reads/writes) into account. In principle the timeout could be removed completely, I put it in there to avoid systems just hanging without an indication of what's going on. If it actually failed and the device stalled/stopped, it'll be stopped after 20, 30 seconds still, so increasing the timeout to some rather large value should be fine as well. I'd just hate to see people getting impatient and writing bug reports about generic hangs or worse yet no bug reports at all. A panic like this usually is easier to report.

I'll see about increasing the timeout at least in the alpha branch. However I'd not rule out that in your case there actually is a problem and the device stalled, or that there is a problem that makes the device so slow in the first place. So the panic might not be unjustified.

comment:2 by korli, 13 years ago

Blocking: 7673 added

comment:3 by sepp, 13 years ago

i hit the exact same bug with hrev42807 but for some reason it does not occur with the gcc4 hybrid build (same version). My hardware eee pc 701

comment:4 by pulkomandy, 13 years ago

Summary: Agressive timeout fails on slow USB deviceSCSI sync command hangs usb mass storage device

After a try to increase the timeout, I still got the bug. Further investigagion at BeGeistert showed that the problem is actually elsewhere. The SCSI Sync (0x35) command hangs this device. We notice it and won't use it again after the first try, but it's too late, the device is hanged and won't answer to anything else after that. Seems we need a quirk to reset the device or something.

Listusb for that device :

10d6:1101 /dev/bus/usb/4/0 "" "USB MASS STORAGE CLASS " ver. 0100

This device is an "s1mp3" MP3 player. It uses a fairly widespread chip (AK1025). I expect any device from the same vendor (10d6, Actions Ltd), to have the same problem.

comment:5 by ithamar, 10 years ago

We might have to consider (as much as I hate it) to have some kind of quirks / blacklist in the usb_disk driver or so to prevent certain devices from locking up like this. We had to resort to that in ZETA as well, to keep a large number of USB devices from locking up like this.

comment:6 by pulkomandy, 10 years ago

Some updates here. I tried your changes to handle the device stall more properly. The device still gets confused and apparently powers itself off or enters some kind of debug mode (I know there is a "firmware upgrade" mode reachable by sending specific SCSI commands to it).

After this, I get the "could not write back block" error only as a syslog message. Then I pressed "stop" in Tracker to stop the copy, and at that point I got a NULL pointer dereference in the FAT filesystem (_fat_ioctl_ does a memcpy with a NULL pointer).

Adding a quirk for this device is fine, but even without that, we should make sure the system doesn't crash. We made a small step towards that, but it may need some extra efforts before the error gets up to Tracker and can be reported as a BAlert.

comment:7 by pulkomandy, 6 years ago

Component: Drivers/DiskDrivers/Disk/USB

comment:8 by pulkomandy, 4 years ago

Milestone: R1R1.1

comment:9 by waddlesplash, 18 months ago

Please retest after hrev57051.

comment:10 by pulkomandy, 18 months ago

I doubt it will change anything here, the issue is very specific to a bad implementation of mass storage in that specific device. Is it really useful if I retry it?

I mean, the best I can hope is that the device will still crash, but maybe it will not bring down my Haiku system with it?

comment:11 by waddlesplash, 18 months ago

Yes, that's the idea.

comment:12 by pulkomandy, 18 months ago

No crash anymore.

I can mount the drive and access it without problems as long as I don't write to it.

Writing works fine until the system attempts to send the sync command. Then, Tracker freezes for a while and eventually I get this in syslog:

kern: usb error control pipe 94: timeout waiting for queued request to complete
KERN: usb_disk: operation 0x35 failed at the SCSI level
KERN: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out
Last message repeated 2 times
KERN: usb_disk: getting request sense data failed: Operation timed out

More errors after a while and maybe as I try to unmount the device:

kern: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out
KERN: could not write back block 232 (Operation timed out)
KERN: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out
Last message repeated 1 time
KERN: could not write back block 233 (Operation timed out)
KERN: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out
KERN: could not write back block 1 (Operation timed out)
KERN: slab memory manager: created area 0xffffffffb3001000 (32535)
KERN: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out
KERN: could not write back block 234 (Operation timed out)
KERN: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out
KERN: could not write back block 232 (Operation timed out)

And after I unplug the device:

kern: usb xhci 0: transfer error on slot 10 endpoint 2: USB transaction
KERN: usb_disk: sending the command block wrapper failed: Device check-sum error
KERN: usb xhci 0: transfer error on slot 10 endpoint 1: USB transaction
KERN: usb hub 2: port 0: device removedKERN: usb xhci 0: 
KERN: could not write back block 235 (I/O error)KERN: cancel queued transfers (1) for pipe 0xffffffffaf691d00 (0)
KERN: usb error xhci 0: 
KERN: cancel queued transfers: halted endpoint, reset!
KERN: usb xhci 0: cancel queued transfers (1) for pipe 0xffffffffaf2fc6e0 (1)
KERN: usb error xhci 0: KERN: cancel queued transfers: halted endpoint, reset!
KERN: usb xhci 0: cancel queued transfers (0) for pipe 0xffffffffaf2fca50 (2)
KERN: usb_disk: acquire_sem failed while waiting for data transfer: Operation timed out
KERN: could not write back block 288 (Operation timed out)
KERN: usb_disk: failed to queue data transfer: Unknown Device Error (-2147442669)
KERN: could not write back block 236 (Unknown Device Error (-2147442669))
KERN: usb_disk: failed to queue data transfer: Unknown Device Error (-2147442669)
Last message repeated 1 time
KERN: could not write back block 289 (Bad file descriptor)
kern: could not write back block 237 (Bad file descriptor)
KERN: could not write back block 290 (Bad file descriptor)
KERN: usb xhci 0: KERN: could not write back block 238 (Bad file descriptor)
KERN: could not write back block 291 (Bad file descriptor)KERN: cancel queued transfers (0) for pipe 0xffffffffaf691d00 (0)
KERN: could not write back block 239 (Bad file descriptor)
KERN: could not write back block 292 (Bad file descriptor)
KERN: usb xhci 0: KERN: could not write back block 240 (Bad file descriptor)
KERN: could not write back block 293 (Bad file descriptor)
KERN: cancel queued transfers (0) for pipe 0xffffffffaf691d00 (0)
KERN: could not write back block 241 (Bad file descriptor)

followed by more "could not write back block" errors

comment:13 by waddlesplash, 18 months ago

It appears we do not attempt reset_recovery on synchronize failing or timing out. We should likely do that.

comment:14 by waddlesplash, 18 months ago

Please try this patch:

diff --git a/src/add-ons/kernel/drivers/disk/usb/usb_disk/usb_disk.cpp b/src/add-ons/kernel/drivers/disk/usb/usb_disk/usb_disk.cpp
index dd260d2c94..164eba358d 100644
--- a/src/add-ons/kernel/drivers/disk/usb/usb_disk/usb_disk.cpp
+++ b/src/add-ons/kernel/drivers/disk/usb/usb_disk/usb_disk.cpp
@@ -88,7 +88,6 @@ static void	usb_disk_callback(void *cookie, status_t status, void *data,
 				size_t actualLength);
 
 uint8		usb_disk_get_max_lun(disk_device *device);
-void		usb_disk_reset_recovery(disk_device *device);
 status_t	usb_disk_receive_csw(disk_device *device,
 				usb_massbulk_command_status_wrapper *status);
 
@@ -199,7 +198,7 @@ usb_disk_clear_halt(usb_pipe pipe)
 }
 
 
-void
+static void
 usb_disk_reset_recovery(disk_device *device, err_act *_action)
 {
 	TRACE("reset recovery\n");
@@ -999,6 +998,11 @@ usb_disk_synchronize(device_lun *lun, bool force)
 		return B_OK;
 	}
 
+	if (result == B_DEV_STALLED || result == B_TIMED_OUT) {
+		usb_disk_reset_recovery(lun->device, NULL);
+		result = B_DEV_INVALID_IOCTL;
+	}
+
 	if (result == B_DEV_INVALID_IOCTL)
 		lun->device->sync_support = 0;
 	else

comment:15 by pulkomandy, 18 months ago

As I expected the patch does not make much of a difference. Once we send the unrecognized sync command, the device crashes. The screen turns off and pressing the buttons don't do anything. There is no way we can get out of this from Haiku side. But really, this is because this device is extremely broken.

As discussed earlier in the ticket, we would need a quirk to never send this sync command to this device (identified by VID/PID).

Note: See TracTickets for help on using tickets.