Opened 12 years ago

Closed 5 years ago

#2367 closed bug (fixed)

Media checker blocks in USB when booting from USB

Reported by: axeld
Owned by: mmlr
Priority: normal
Milestone: R1
Component: Drivers/USB
Version: R1/pre-alpha1
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description (last modified by axeld)

I've installed Haiku on a 128 MB USB stick, and booted it from there. Booting itself went fine, but applications using the disk device API (like mountvolume, DriveSetup) hang when I start them.

The media checker waits on some EHCI finisher which never seems to return. When I disable the media checker in the kernel, the lockup does not happen, and the above mentioned apps work nicely.

Tested with hrev25898.

Attachments (2)

eepc-usb-lockup.jpg (199.1 KB) - added by axeld 12 years ago.
Media checker is waiting forever…
ehci_finish_every_ms.diff (571 bytes) - added by mmlr 12 years ago.
Possible workaround for lost interrupts on broken controllers.


Change History (16)

comment:1 by axeld, 12 years ago

Description: modified
Priority: normal → high

comment:2 by mmlr, 12 years ago

Status: new → assigned

Does this happen with the boot USB stick alone or are there other mass storage devices? If the media checker blocks waiting for a transfer that never finishes (which is possible as there is no timeout handling in usb_disk yet) then theoretically all IO to the boot volume should block too.

comment:3 by mmlr, 12 years ago

I just checked here with hrev25882 (no changes in that regard compared to hrev25898) and I am not able to reproduce it. I booted off the stick and ran DriveSetup and also mountvolume. Both worked as expected and didn't hang. I find it a bit strange that it waits for the EHCI finisher, as this thread just finishes transfers and calls their callbacks. Even if a callback queued a new transfer, this would be done asynchronously, so it cannot really deadlock there (at least not with itself). Why do you think it waits for the finisher? Can you provide some sort of debug output? I can only imagine that, as mentioned above, the device simply never acts on a queued transfer (although then the controller should still return the transfer with a timeout at some point). What would surprise me then is if other IO to that device still worked, as that should simply lock up usb_disk.

comment:4 by axeld, 12 years ago

There is only a single USB disk attached, plus the built-in SATA mass storage (two drives). I don't have a serial output from that machine, but I'll try to get more specific data.

The media checker itself waited for some disk device lock, and the lock owner (I don't remember who) was waiting for the EHCI finisher.

I probably don't have the time to do this before Thursday, though.

comment:5 by mmlr, 12 years ago

When you're at that machine again, could you please check which chip it uses for EHCI? Other EHCI drivers seem to apply workarounds, namely for broken VIA chips that simply lose completion interrupts...

comment:6 by miqlas, 12 years ago

"I've installed Haiku on a 128 MB USB stick.."

How did you install Haiku on a 128 MB stick? The image size is 262 MB (it contains ~100 MB of free space, but 262 - 100 = 162, and 162 > 128). Are you sure you made a correct boot disk? How did you make it?

by axeld, 12 years ago

Attachment: eepc-usb-lockup.jpg added

Media checker is waiting forever...

comment:7 by axeld, 12 years ago

To mmlr: added image of media checker stack trace. Seems I could have remembered better... :-)

To miqlas: just remove all optional packages, and Haiku installs fine on smaller images. The image was actually only 100 MB in size, I don't remember how much free space was left, though.

comment:8 by mmlr, 12 years ago

Axel, any news regarding the EHCI chip in use? If it really is a VIA or ATI one, a workaround for lost interrupts might be in order. I'll attach a patch to this ticket that should do pretty much that by setting a timeout on the semaphore acquisition in the finisher thread, so that it unconditionally wakes up once every ms. Could you please try it and check whether it solves the problem? If so, I would like to blacklist your chip so that it always uses such a workaround.
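The blacklisting idea above could look roughly like the following: a table of PCI IDs for which the finisher should poll instead of trusting interrupts. This is a minimal sketch, not Haiku's code. The table contents, the `0xffff` wildcard convention and `needs_finisher_polling()` are all hypothetical; only the PCI vendor IDs (VIA 0x1106, ATI 0x1002, Intel 0x8086) are real.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Real PCI vendor IDs. */
#define PCI_VENDOR_ATI   0x1002
#define PCI_VENDOR_VIA   0x1106
#define PCI_VENDOR_INTEL 0x8086

/* Hypothetical convention for this sketch: 0xffff matches any device ID. */
#define PCI_ANY_DEVICE 0xffff

typedef struct {
    uint16_t vendor;
    uint16_t device;
} pci_id;

/* Hypothetical quirk table: controllers assumed to lose completion interrupts. */
static const pci_id lost_interrupt_quirks[] = {
    { PCI_VENDOR_VIA, PCI_ANY_DEVICE },
    { PCI_VENDOR_ATI, PCI_ANY_DEVICE },
};

/* Returns true if the finisher should wake up periodically for this controller. */
static bool needs_finisher_polling(uint16_t vendor, uint16_t device)
{
    size_t count = sizeof(lost_interrupt_quirks) / sizeof(lost_interrupt_quirks[0]);
    for (size_t i = 0; i < count; i++) {
        const pci_id *quirk = &lost_interrupt_quirks[i];
        if (quirk->vendor == vendor
            && (quirk->device == PCI_ANY_DEVICE || quirk->device == device))
            return true;
    }
    return false;
}
```

A controller from another vendor, such as an Intel one, would not match a vendor-based table like this and would keep the plain interrupt-driven path.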

by mmlr, 12 years ago

Attachment: ehci_finish_every_ms.diff added

Possible workaround for lost interrupts on broken controllers.
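The workaround described in comment 8 (a timeout on the finisher's semaphore so it wakes up every ms) can be sketched with POSIX semaphores. This is an illustration under assumptions, not the contents of ehci_finish_every_ms.diff: `transfer_list` and `finisher_pass()` are hypothetical, and POSIX `sem_timedwait()` stands in for the Haiku kernel call (presumably `acquire_sem_etc()` with `B_RELATIVE_TIMEOUT`).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical bookkeeping standing in for the controller's transfer list. */
typedef struct {
    int completed_by_hw; /* transfers the controller has retired, interrupt or not */
    int finished;        /* transfers whose callbacks the finisher has already run */
} transfer_list;

/* Absolute CLOCK_REALTIME deadline `ms` milliseconds from now, as sem_timedwait expects. */
static struct timespec deadline_after_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_nsec += ms * 1000000L;
    ts.tv_sec += ts.tv_nsec / 1000000000L;
    ts.tv_nsec %= 1000000000L;
    return ts;
}

/* One pass of the finisher loop. Returns true if the interrupt handler
 * released the semaphore, false if the 1 ms timeout expired first. Either
 * way the list is scanned, so a lost interrupt only delays completion
 * instead of stalling it forever. */
static bool finisher_pass(sem_t *finish_sem, transfer_list *list)
{
    struct timespec deadline = deadline_after_ms(1);
    int rc;
    do {
        rc = sem_timedwait(finish_sem, &deadline);
    } while (rc != 0 && errno == EINTR);

    while (list->finished < list->completed_by_hw) {
        /* A real driver would invoke the transfer's callback here. */
        list->finished++;
    }
    return rc == 0;
}
```

The cost of this design is a wakeup every millisecond even when the bus is idle, which is why it would only be enabled for controllers known to be broken.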

comment:9 by mmlr, 12 years ago

I've added a timeout in usb_disk in hrev26082, which might solve or at least work around this issue. Could you please retry with that?
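A caller-side timeout of the kind added to usb_disk can be sketched as a bounded wait for the transfer's completion callback. This is a minimal sketch with hypothetical names (`pending_transfer`, `transfer_callback()`, `wait_for_transfer()`), using a pthread condition variable in place of whatever primitive usb_disk actually uses. The point is that a transfer the device never answers turns into an error instead of a thread, and any disk device lock it holds, blocked forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-transfer state shared by the waiter and the completion callback. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t done_cond;
    bool done;
    int status; /* result reported by the callback */
} pending_transfer;

static void pending_transfer_init(pending_transfer *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->done_cond, NULL); /* default clock: CLOCK_REALTIME */
    t->done = false;
    t->status = 0;
}

/* Completion callback, run from the host controller's finisher thread. */
static void transfer_callback(pending_transfer *t, int status)
{
    pthread_mutex_lock(&t->lock);
    t->status = status;
    t->done = true;
    pthread_cond_signal(&t->done_cond);
    pthread_mutex_unlock(&t->lock);
}

/* Wait for the callback, but give up after timeout_ms. Returns the transfer
 * status, or -ETIMEDOUT if the callback never ran. */
static int wait_for_transfer(pending_transfer *t, long timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = 0;
    pthread_mutex_lock(&t->lock);
    /* Spurious wakeups return 0 and simply loop; any error ends the wait. */
    while (!t->done && rc == 0)
        rc = pthread_cond_timedwait(&t->done_cond, &t->lock, &deadline);
    int result = t->done ? t->status : -ETIMEDOUT;
    pthread_mutex_unlock(&t->lock);
    return result;
}
```

After a timeout, a real driver must cancel the outstanding transfer before reusing or freeing this state. Otherwise a late callback would write into memory the caller has already given up on.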

comment:10 by axeld, 12 years ago

The timeout seems to have successfully worked around the issue. I get tons of the following messages in syslog:

usb_disk: sending the command block wrapper failed
usb_ehci: qtd (0x0f015f00) error: 0x80008d40
usb_disk: acquire_sem failed while waiting for data transfer

Not sure what this means; maybe the command couldn't even be sent in the first place? Is it possible to differentiate between devices where it makes sense to check for media and those where it doesn't?

The EHCI controller is an Intel one (device ID 0x265c), and so are the UHCI controllers (ICH6).
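To read the `qtd ... error: 0x80008d40` line, the value can be split into the fields of an EHCI qTD token. This sketch assumes the logged number is the raw token dword; the bit layout follows the EHCI specification, while the type and function names are hypothetical. Under that assumption the value decodes to: Halted set, PID IN, error counter (CERR) still at 3, interrupt-on-complete set, 0 bytes remaining, data toggle 1. A halt with no babble or transaction error and CERR not exhausted is commonly read as the device answering with a STALL handshake.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an EHCI qTD token dword. */
typedef struct {
    bool active, halted, buffer_error, babble, transaction_error, missed_microframe;
    unsigned pid;               /* 0 = OUT, 1 = IN, 2 = SETUP */
    unsigned error_counter;     /* CERR: decremented on transaction errors */
    bool interrupt_on_complete;
    unsigned total_bytes;       /* bytes still left to transfer */
    bool data_toggle;
} qtd_token;

static qtd_token decode_qtd_token(uint32_t token)
{
    qtd_token t;
    t.active = (token >> 7) & 1u;
    t.halted = (token >> 6) & 1u;
    t.buffer_error = (token >> 5) & 1u;
    t.babble = (token >> 4) & 1u;
    t.transaction_error = (token >> 3) & 1u;
    t.missed_microframe = (token >> 2) & 1u;
    t.pid = (token >> 8) & 0x3u;
    t.error_counter = (token >> 10) & 0x3u;
    t.interrupt_on_complete = (token >> 15) & 1u;
    t.total_bytes = (token >> 16) & 0x7fffu;
    t.data_toggle = (token >> 31) & 1u;
    return t;
}

/* Halted without babble/buffer/transaction errors and with CERR left:
 * the usual reading is a STALL from the device. */
static bool looks_like_stall(qtd_token t)
{
    return t.halted && !t.babble && !t.buffer_error && !t.transaction_error
        && !t.missed_microframe && t.error_counter > 0;
}
```

Since the three log lines repeat many times, this does not by itself tell which usb_disk message a given qTD error belongs to.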

comment:11 by axeld, 12 years ago

Milestone: R1/alpha1 → R1
Priority: high → normal

Since the lockup is gone, I'm changing the milestone.

comment:12 by mmlr, 11 years ago

Replying to axeld:

Not sure what this means; maybe the command couldn't even be sent in the first place? Is it possible to differentiate between devices where it makes sense to check for media and those where it doesn't?

Well, it makes sense to check for media when the device declares itself as removable. Most USB drives are flagged removable, though, so this is not really a good way of telling. What could, and probably should, be done is to simply stop checking for media changes when the test unit ready command doesn't work.
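For context on the "sending the command block wrapper failed" message: on a Bulk-Only Transport device, each SCSI command (including TEST UNIT READY) is sent as a 31-byte command block wrapper (CBW) on the bulk-out pipe. The sketch below builds one following the layout in the USB Mass Storage Bulk-Only Transport specification; the helper names are hypothetical and this is not usb_disk's code.

```c
#include <stdint.h>
#include <string.h>

#define CBW_SIGNATURE    0x43425355u /* bytes "USBC" on the wire (little-endian) */
#define CBW_LENGTH       31
#define CBW_MAX_CDB      16
#define CBW_FLAG_DATA_IN 0x80

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Fill a command block wrapper. Returns 0, or -1 if the CDB length is invalid. */
static int build_cbw(uint8_t cbw[CBW_LENGTH], uint32_t tag, uint32_t data_length,
    int data_in, uint8_t lun, const uint8_t *cdb, uint8_t cdb_length)
{
    if (cdb_length == 0 || cdb_length > CBW_MAX_CDB)
        return -1;
    memset(cbw, 0, CBW_LENGTH);
    put_le32(cbw + 0, CBW_SIGNATURE);           /* dCBWSignature */
    put_le32(cbw + 4, tag);                     /* dCBWTag, echoed in the CSW */
    put_le32(cbw + 8, data_length);             /* dCBWDataTransferLength */
    cbw[12] = data_in ? CBW_FLAG_DATA_IN : 0;   /* bmCBWFlags: direction */
    cbw[13] = lun & 0x0f;                       /* bCBWLUN */
    cbw[14] = cdb_length;                       /* bCBWCBLength */
    memcpy(cbw + 15, cdb, cdb_length);          /* CBWCB */
    return 0;
}

/* TEST UNIT READY: 6-byte CDB with opcode 0x00 and no data phase. */
static int build_test_unit_ready(uint8_t cbw[CBW_LENGTH], uint32_t tag, uint8_t lun)
{
    static const uint8_t cdb[6] = { 0x00, 0, 0, 0, 0, 0 };
    return build_cbw(cbw, tag, 0, 0, lun, cdb, (uint8_t)sizeof(cdb));
}
```

Because TEST UNIT READY has no data phase, only the CBW and the status wrapper (CSW) cross the bus, so a failure at the CBW stage means the device rejected or never accepted the command itself.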

In any case, could you check with a revision >= hrev28934 to see if the fixed reset recovery solves this issue?
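The ticket does not say what exactly was fixed in the reset recovery. For reference, the Bulk-Only Transport specification defines reset recovery as three control requests in order: a Bulk-Only Mass Storage Reset to the interface, then CLEAR_FEATURE(ENDPOINT_HALT) on the bulk-in endpoint, then on the bulk-out endpoint. This sketch only builds those SETUP packets; the struct and function names, and the example endpoint addresses in the test, are hypothetical.

```c
#include <stdint.h>

/* Fields of a standard 8-byte USB SETUP packet. */
typedef struct {
    uint8_t request_type;
    uint8_t request;
    uint16_t value;
    uint16_t index;
    uint16_t length;
} usb_setup;

/* Bulk-Only Mass Storage Reset: class request 0xFF, host-to-device, to the interface. */
static usb_setup bulk_only_reset(uint8_t interface_number)
{
    usb_setup s = { 0x21, 0xFF, 0, interface_number, 0 };
    return s;
}

/* CLEAR_FEATURE (0x01) with feature selector ENDPOINT_HALT (0), to an endpoint. */
static usb_setup clear_endpoint_halt(uint8_t endpoint_address)
{
    usb_setup s = { 0x02, 0x01, 0, endpoint_address, 0 };
    return s;
}

/* Fill `out` with the reset recovery requests in the required order; returns the count. */
static int reset_recovery_sequence(usb_setup out[3], uint8_t interface_number,
    uint8_t bulk_in_address, uint8_t bulk_out_address)
{
    out[0] = bulk_only_reset(interface_number);
    out[1] = clear_endpoint_halt(bulk_in_address);
    out[2] = clear_endpoint_halt(bulk_out_address);
    return 3;
}
```

Skipping the two CLEAR_FEATURE requests leaves the bulk endpoints halted after a STALL, and every following command would then fail in the same way.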

comment:13 by mmlr, 10 years ago

As mentioned in the comment above, disabling media checking on devices that don't seem to support it was implemented some time ago. The fixed reset recovery might have improved things as well, so please retest if possible.
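Disabling media checking after TEST UNIT READY keeps failing could be tracked per device roughly as below, combined with the removable (RMB) bit from the SCSI INQUIRY data (byte 1, bit 7). This is a minimal sketch: the threshold of 3 is a hypothetical value (the ticket only says "too often"), counting consecutive failures is an assumption, and all names are illustrative.

```c
#include <stdbool.h>

/* Hypothetical threshold; the ticket does not give the real number. */
#define MAX_TUR_FAILURES 3

typedef struct {
    bool removable;            /* RMB bit from INQUIRY data */
    unsigned tur_failures;     /* consecutive TEST UNIT READY failures */
    bool media_check_disabled;
} media_check_state;

static void media_check_init(media_check_state *s, const unsigned char *inquiry_data)
{
    s->removable = (inquiry_data[1] & 0x80) != 0;
    s->tur_failures = 0;
    s->media_check_disabled = false;
}

/* Poll only devices that claim removable media and haven't been given up on. */
static bool should_poll_media(const media_check_state *s)
{
    return s->removable && !s->media_check_disabled;
}

/* Record one TEST UNIT READY outcome; a success resets the failure count. */
static void record_tur_result(media_check_state *s, bool succeeded)
{
    if (succeeded) {
        s->tur_failures = 0;
        return;
    }
    if (++s->tur_failures >= MAX_TUR_FAILURES)
        s->media_check_disabled = true;
}
```

Resetting the counter on success means a device that only occasionally fails keeps being polled, while one that never answers stops generating the syslog noise reported above.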

comment:14 by mmlr, 5 years ago

Resolution: fixed
Status: in-progress → closed

As mentioned in the comments above:

  1. Timeout handling in usb_disk has been implemented.
  2. The test unit ready command is now disabled on devices that fail it too often.

The issue should therefore be long gone.
