Ticket #2083 (assigned bug)

Opened 3 months ago

Last modified 4 weeks ago

EHCI bus hanging at boot.

Reported by: bga Owned by: mmlr
Priority: blocker Milestone: R1
Component: Drivers/USB Version: R1 development
Cc: Blocked By:
Platform: All Blocking:

Description

Tried this with the latest revision, r24979, and the problem is still there. When booting, the system will hang when the ehci bus is trying to "steal" the host controller from the BIOS. I could not get serial debug output (as my computer has no serial port) so i took a picture of the point where it stops (hope it is clear enough). Note that the boot actually gets pretty fast up to the line that says:

More than 99% interrupts of vector 7 are unhandled

It takes several minutes for a new message to appear (probably because the previous message repeated for like 4000 times as the output shows).

Note the around one week ago it worked flawlessly on my computer.

Attachments

output.JPG (411.3 kB) - added by bga 3 months ago.
Debug output (crappy picture, sorry)
lspci.txt (8.8 kB) - added by pieterpan 3 months ago.
lspci from linux
lsusb.txt (34.6 kB) - added by pieterpan 3 months ago.
lsusb from linux
diff_no_usb_trace.diff (0.7 kB) - added by pieterpan 3 months ago.
cant get usb traccking output with this... why not?
DSC02671 (Custom).JPG (155.0 kB) - added by pieterpan 3 months ago.
bunch of pictures of all usb related output
DSC02672 (Custom).JPG (95.5 kB) - added by pieterpan 3 months ago.
bunch of pictures of all usb related output
DSC02673 (Custom).JPG (103.0 kB) - added by pieterpan 3 months ago.
bunch of pictures of all usb related output
DSC02674 (Custom).JPG (93.7 kB) - added by pieterpan 3 months ago.
bunch of pictures of all usb related output
DSC02676 (Custom).JPG (93.3 kB) - added by pieterpan 3 months ago.
bunch of pictures of all usb related output
DSC02677 (Custom).JPG (88.3 kB) - added by pieterpan 3 months ago.
bunch of pictures of all usb related output
DSC02678 (Custom).JPG (149.5 kB) - added by pieterpan 3 months ago.
bunch of pictures of all usb related output

Change History

Changed 3 months ago by bga

Debug output (crappy picture, sorry)

Changed 3 months ago by mmlr

  • status changed from new to assigned

It seems that the BIOS first does not hand over the control, but then still decides to do so which causes an interrupt flood. I've increased the amount of time the handover code tries to get the controller with r24980 and also let it disable interrupts if the handover fails. Could you please check if this allows to boot, and if so if the EHCI controller then actually failed or succeeded in its initialization? You should see corresponding lines in the syslog. In fact Linux does not really care about a failed handover. It just force turns off the BIOS owned bit and disables all the SMIs and then goes on assuming now owning the controller. If it turns out that there are BIOSes that do in fact not complete the handover at all then we might want to do the same thing in the future.

Changed 3 months ago by bga

The system booted, but I am not sure ehci was ever initialized. The only references to ehci in the syslog are from the PCI bus scan. I then searched for usb and besides the PCI scan, I only found the following messages:

KERN: loaded driver /boot/beos/system/add-ons/kernel/drivers/dev/input/usb_hid

and

KERN: loaded driver /boot/beos/system/add-ons/kernel/drivers/dev/bus/usb_raw

Mouse is connected through USB and is working. Also:

~>ls /dev/bus/usb/ 0 1 2 3 4 5 6 raw

and

~>ls /dev/bus/usb/3 1 hub

~>ls /dev/bus/usb/5 1 hub

(those probably are the only 2 devices I have connected through USB. The mouse and a webcam).

If you need any other information, let me know.

Changed 3 months ago by mmlr

You could use usb_dev_info to get the info about all of the hubs (i.e. "usb_dev_info /dev/bus/usb/x/hub"). Their product string will tell you whether they are UHCI or EHCI root hubs. Usually you have a 1:3 ratio between UHCI and EHCI devices. So I guess with the 7 devices that 6 of them (0-5) should be UHCI and one should be EHCI (6) with one EHCI controller probably missing. If you have USB 2.0 devices you can just plug them into every port you have and see where they get published. If they end up under a UHCI root hub then the EHCI controller that's responsible for these ports wasn't initialized.

Changed 3 months ago by bga

You were right. 6 was EHCI. Just to be sure I connected a USB card reader to the computer and got this (this is a multi-port reader with only one MS Pro attached to it):

KERN: usb_disk: device reports a lun count of 4 KERN: usb_disk: vendor_identification "Generic " KERN: usb_disk: product_identification "USB SD Reader " KERN: usb_disk: product_revision_level "1.00" KERN: usb_disk: operation 0x00 failed at the SCSI level KERN: usb_disk: vendor_identification "Generic " KERN: usb_disk: product_identification "USB CF Reader " KERN: usb_disk: product_revision_level "1.01" KERN: usb_disk: operation 0x00 failed at the SCSI level KERN: usb_disk: vendor_identification "Generic " KERN: usb_disk: product_identification "USB SM Reader " KERN: usb_disk: product_revision_level "1.02" KERN: usb_disk: operation 0x00 failed at the SCSI level KERN: usb_disk: vendor_identification "Generic " KERN: usb_disk: product_identification "USB MS Reader " KERN: usb_disk: product_revision_level "1.03" KERN: usb_disk: operation 0x00 failed at the SCSI level KERN: usb_disk: request_sense: key: 0x06; asc: 0x28; ascq: 0x00; KERN: usb_disk: request_sense: media changed KERN: usb_disk: operation 0x00 failed at the SCSI level KERN: usb_ehci: qtd (0x0393d100) error: 0x00088d40 KERN: usb_ehci: qtd (0x0393d300) error: 0x80008d40 KERN: usb_ehci: qtd (0x0393d480) error: 0x000d8d40 KERN: usb_ehci: qtd (0x0393d680) error: 0x80008d40 KERN: usb_ehci: qtd (0x0393d800) error: 0x000d8d40 KERN: usb_disk: receiving the command status wrapper failed KERN: usb_ehci: qtd (0x0393dc00) error: 0x80008d40 KERN: usb_disk: failed to update capacity

After this the system stop responding. I can for example type ls on the terminal but it will hang just after I hit enter. I can also not start any other apps and the ones that are running eventually hang too. Then I remove the card reader form the USB port and get this:

KERN: usb_ehci: qtd (0x0393df80) error: 0x001f8049 KERN: usb_disk: sending the command block wrapper failed KERN: usb_ehci: qtd (0x0393e100) error: 0x00080248 KERN: usb_ehci: qtd (0x0393e300) error: 0x00080248 KERN: usb_ehci: qtd (0x0393e500) error: 0x00080248 KERN: usb_ehci: qtd (0x0393e700) error: 0x001f8049 KERN: usb_disk: sending the command block wrapper failed KERN: usb_ehci: qtd (0x0393e880) error: 0x00080248 KERN: usb_ehci: qtd (0x0393ea80) error: 0x00080248 KERN: usb_ehci: qtd (0x0393ec80) error: 0x00080248 KERN: usb_disk: failed to update capacity KERN: usb_ehci: qtd (0x0393ee80) error: 0x001f8049 KERN: usb_disk: sending the command block wrapper failed KERN: usb_ehci: qtd (0x0393f000) error: 0x00080248 KERN: usb_ehci: qtd (0x0393f200) error: 0x00080248 KERN: usb_ehci: qtd (0x0393f400) error: 0x00080248 KERN: usb_ehci: qtd (0x0393f600) error: 0x001f8049 KERN: usb_disk: sending the command block wrapper failed KERN: usb_ehci: qtd (0x0393f780) error: 0x00080248 KERN: usb_ehci: qtd (0x0393f980) error: 0x00080248 KERN: usb_ehci: qtd (0x0393fb80) error: 0x00080248 KERN: usb_ehci: qtd (0x0393fd80) error: 0x001f8049 KERN: usb_disk: sending the command block wrapper failed KERN: usb_ehci: qtd (0x0393ff00) error: 0x00080248 KERN: usb_ehci: qtd (0x03940100) error: 0x00080248 KERN: usb_ehci: qtd (0x03940300) error: 0x00080248 KERN: usb_disk: failed to update capacity KERN: usb_ehci: qtd (0x03940500) error: 0x001f8049 KERN: usb_disk: sending the command block wrapper failed KERN: usb_ehci: qtd (0x03940680) error: 0x00080248 KERN: usb_ehci: qtd (0x03940880) error: 0x00080248 KERN: usb_ehci: qtd (0x03940a80) error: 0x00080248 KERN: usb_ehci: qtd (0x03940c80) error: 0x001f8049 KERN: usb_disk: sending the command block wrapper failed KERN: usb_ehci: qtd (0x03940e00) error: 0x00080248 KERN: usb_ehci: qtd (0x03941000) error: 0x00080248 KERN: usb_ehci: qtd (0x03941200) error: 0x00080248 KERN: usb_ehci: qtd (0x03941400) error: 0x001f8049 KERN: usb_disk: sending the command block wrapper failed KERN: usb_ehci: qtd (0x03941580) error: 0x00080248 KERN: usb_ehci: qtd (0x03941880) error: 0x00080248 KERN: usb_ehci: qtd (0x03941a80) error: 0x00080248 KERN: usb_disk: failed to update capacity KERN: usb_ehci: qtd (0x03941c80) error: 0x001f8049 KERN: usb_disk: sending the command block wrapper failed KERN: usb_ehci: qtd (0x03941e00) error: 0x00080248 KERN: usb_ehci: qtd (0x03942000) error: 0x00080248 KERN: usb_ehci: qtd (0x03942200) error: 0x00080248 KERN: usb_ehci: qtd (0x03942400) error: 0x001f8049 KERN: usb_disk: sending the command block wrapper failed KERN: usb_ehci: qtd (0x03942580) error: 0x00080248 KERN: usb_ehci: qtd (0x03942780) error: 0x00080248 KERN: usb_ehci: qtd (0x03942980) error: 0x00080248

And everything starts working again.

At least it seems EHCI is at least kinda working. :)

Changed 3 months ago by pieterpan

Tried booting haiku on my laptop, hadn't done so in a while. It used to boot fine, but now it also hangs. Sometimes the last debug output is "usb_uhci: successfully started the controller". Also, sometimes it gets further and shows the "More than 99% interrupts of vector 7 are unhandled" Once, after entering the kdb with F12 and continue-ing it actually booted. Have tried to reproduce this but it wouldn't work again. How does one use the bfs_shell to remove the usb_uhci driver to see if it continues booting?

Changed 3 months ago by mmlr

If it explicitly says that it loaded and started UHCI correctly then I wouldn't suspect the problem to be UHCI. It could be EHCI that is loaded after UHCI. If you build your installation yourself then you can simply remove "<usb>ehci" or "<usb>uhci" or both from the build/jam/HaikuImage.

Changed 3 months ago by meanwhile

Bug #1919 looks related, as the same line "More than 99% interrupts of vector 7 are unhandled" shows up (that's a very superficial argument, matching my insight perfectly :p).

Changed 3 months ago by mmlr

Bug #1919 is most probably not related to this one (no USB controller present at the mentioned interrupt vector). That interrupts are unhandled is a common symptom that can be triggered by most drivers if they activate interrupts but fail to handle/acknowledge them correctly.

Changed 3 months ago by pieterpan

I'm building Haiku from Ubuntu. Removing ehci in HaikuImage 'fixed' the problem indeed, it boots again. Anything I can try for debugging?

Changed 3 months ago by mmlr

You could enable debug output for EHCI by adding #define TRACE_USB at the top of src/add-ons/kernel/busses/usb/ehci.cpp. Then (adding EHCI back to the image first of course) you can check the debug output (best via serial debugging or otherwise by enabling on-screen debug output) and post here where it gets stuck exactly.

Changed 3 months ago by pieterpan

For some reason that does not yield any additional output, though looking at the code I should get plenty. I added EHCI back of course. (though I probably would have forgotten first try if you hadn't reminde me :-) ) I attached the diff of my attempt to the latest revision. I did a jam clean and rebuild, no change. Even enabling ENABLE_TRACING did not help. Am I missing something?

Changed 3 months ago by pieterpan

lspci from linux

Changed 3 months ago by pieterpan

lsusb from linux

Changed 3 months ago by pieterpan

cant get usb traccking output with this... why not?

Changed 3 months ago by mmlr

Well that tracing and TRACE_USB has nothing to do with eachother ;-). TRACE_USB is enough to define all the TRACE() statements to dprintf which will/should cause the debug output to appear. You could globally enable TRACE_USB by adding that define at the top of "src/add-ons/kernel/bus_managers/usb/usb_p.h". This will enable debug output for all of the USB stack and the host controller drivers. Possibly it's not EHCI, but another UHCI controller that fails in that case.

Changed 3 months ago by pieterpan

I knew that :-), but since I got no output I gave it a try anyway... I have enabled TRACE_USB in usb_p as you said, and now we get lots of output, hope it is useful for you. Sorry it took a while, i had taken the pics with my nice camera, but we're in the middle of a move, so that camera is now in our new house :) You'll have to make do with the crappy camera. The most interesting bit probably is (Only after I pressed F12, the "Last message repeated..." appeared)

usb_uhci: installing interrupt handler
usb_uhci: host controller halted
Last message repeated 4459366 times

Let me know if I can try something else

Changed 3 months ago by pieterpan

bunch of pictures of all usb related output

Changed 3 months ago by pieterpan

bunch of pictures of all usb related output

Changed 3 months ago by pieterpan

bunch of pictures of all usb related output

Changed 3 months ago by pieterpan

bunch of pictures of all usb related output

Changed 3 months ago by pieterpan

bunch of pictures of all usb related output

Changed 3 months ago by pieterpan

bunch of pictures of all usb related output

Changed 3 months ago by pieterpan

bunch of pictures of all usb related output

Changed 6 weeks ago by mmlr

Bruno, does this still occure? I have changed EHCI to force take-over the controller in case the BIOS fails to hand it over in r25918. Could you please check if the EHCI controllers actually work now? Best by checking a USB 2.0 device on the previously disabled ports, and/or by checking the presence with usb_dev_info.

Pieterpan the problem you see is in fact non-EHCI but UHCI. Could you please verify if it still happens in current revisions. If it does, please open a new bug report for that.

Changed 6 weeks ago by bga

I will check but I can not do that until next week when I am back home (I am in the US right now).

Changed 5 weeks ago by bga

During boot I get the message that the BIOS is not giving up the controller and that it will ignore it. The next message seems to indicate that the controller was correctly initialized.

I still can not connect my USB card reader tough. If I do, the system stops completely until I remove it (sometimes not even after that).

Changed 4 weeks ago by pieterpan

Okay I'm back from my vacation in the USA. Indeed the problem now seems related to OHCI. I added my issues to #1044.

Note: See TracTickets for help on using tickets.