Opened 9 years ago

Last modified 2 years ago

#11619 new bug

No USB on Poulsbo hardware

Reported by: edglex
Owned by: mmlr
Priority: normal
Milestone: R1
Component: Drivers/USB/EHCI
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking: #12749
Platform: x86

Description

I am using a Sony Vaio P series, which uses the Intel Poulsbo chipset, and USB does not work at all (I had to dd an image onto the drive to make it bootable). This is a USB 2 machine, so the problem is not USB 3 related. The relevant listdev output is:

device Serial bus controller (USB controller, EHCI) [c|3|20]
  vendor 8086: Intel Corporation
  device 8117: System Controller Hub (SCH Poulsbo) USB EHCI #1

device Serial bus controller (USB controller, UHCI) [c|3|0]
  vendor 8086: Intel Corporation
  device 8116: System Controller Hub (SCH Poulsbo) USB UHCI #3

device Serial bus controller (USB controller, UHCI) [c|3|0]
  vendor 8086: Intel Corporation
  device 8115: System Controller Hub (SCH Poulsbo) USB UHCI #2

device Serial bus controller (USB controller, UHCI) [c|3|0]
  vendor 8086: Intel Corporation
  device 8114: System Controller Hub (SCH Poulsbo) USB UHCI #1

This is hrev 48481.

Attachments (4)

listusb (3.5 KB) - added by edglex 9 years ago.
listusb -v output
syslog.old (512.0 KB) - added by edglex 9 years ago.
relevant syslog
0001-EHCI-USB-process-the-extended-capabilities-chain.patch (4.8 KB) - added by korli 9 years ago.
EHCI USB: process the extended capabilities chain
P5303292.JPG (3.8 MB) - added by edglex 9 years ago.
Boot console output

Change History (33)

comment:1 by humdinger, 9 years ago

You may also want to attach the output of listusb -v and see if /var/log/syslog has any more info.

by edglex, 9 years ago

Attachment: listusb added

listusb -v output

by edglex, 9 years ago

Attachment: syslog.old added

relevant syslog

comment:2 by edglex, 9 years ago

Not sure if there is anything relevant in syslog. listusb output is also attached.

I'm surprised the USB hardware is detected but not working, TBH. No clue where to start debugging...

comment:3 by korli, 9 years ago

These lines are of interest:

PCI: can't read config for domain 0, bus 0, device 29, function 7, offset 255, size 4
usb ehci -1: extended capability is not a legacy support register
usb error ehci -1: host controller failed to reset
usb ehci: bus failed init check

by korli, 9 years ago

EHCI USB: process the extended capabilities chain

comment:4 by korli, 9 years ago

patch: 0 → 1

comment:5 by korli, 9 years ago

Could you try this patch? It's supposed to try the next capabilities registers, after the first one.

comment:6 by edglex, 9 years ago

Wow, that was fast! I need to get set up to build Haiku, which I may not be able to do today, but I'll give it a whirl ASAP.

comment:7 by diver, 9 years ago

Have you gotten around to testing it yet?

comment:8 by waddlesplash, 9 years ago

Component: Drivers/USB → Drivers/USB/EHCI

Sorting out Drivers/USB/EHCI tickets from Drivers/USB.

comment:9 by streak, 9 years ago

Could anybody apply this patch to an official build so I can test it on my machine?

I have the same problem: https://dev.haiku-os.org/ticket/9118

comment:10 by edglex, 9 years ago

I finally got around to testing this (sorry, my son was born just after I reported this bug so I've been kind of busy!).

Unfortunately, although it now appears to detect some USB hardware, it doesn't boot anymore. I get the attached output when I enable on-screen debugging. Any suggestions are appreciated; I'll try any patches right away, as I am now all set to build Haiku.

by edglex, 9 years ago

Attachment: P5303292.JPG added

Boot console output

comment:11 by edglex, 9 years ago

Wow, sorry the image is so big. Is my screen really that dirty?!

comment:12 by edglex, 9 years ago

Another thing, I just tried to boot again and checked whether it says anything about usb ehci - it doesn't :(

comment:13 by edglex, 9 years ago

I've been poking at this for a while. With the supplied patch, we're in an infinite loop. Apparently this happens with some hardware: it is reported as a bug in illumos (https://www.illumos.org/issues/4225), and Linux only ever loops 64 times here. Booting Linux, I get an error that BIOS handoff failed, but USB still works there.

I noticed that Linux also attempts to disable interrupts and take control regardless of whether the handoff worked, so I tried doing the same thing (as well as looping 64 times before giving up). Although this allows me to boot, I still don't have working USB.

I also tried through a hub, to rule out it being the same problem as #9118.

What I don't understand is why it is failing to read the PCI config register. Any idea?
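The bounded walk described above can be sketched as follows. This is a minimal illustration, not the code from the attached patch or from Linux: `ConfigSpace`, `ReadConfig32` and `FindLegacySupport` are hypothetical names, the configuration space is simulated with an array, and the register layout (capability ID in bits 7:0, next pointer in bits 15:8, pointers below 0x40 invalid) follows the EHCI specification as I understand it.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the 256-byte PCI configuration space of one device.
using ConfigSpace = std::array<uint8_t, 256>;

constexpr uint8_t kLegacySupportId = 0x01;  // EHCI legacy support capability ID
constexpr int kMaxCapabilities = 64;        // iteration bound, the value Linux is said to use

// Reads a little-endian dword. Unaligned or out-of-range offsets return all
// ones, which is how a failed read is modelled in this sketch.
inline uint32_t ReadConfig32(const ConfigSpace& space, uint32_t offset)
{
    if ((offset & 3) != 0 || offset + 4 > space.size())
        return 0xffffffff;
    return uint32_t(space[offset]) | (uint32_t(space[offset + 1]) << 8)
        | (uint32_t(space[offset + 2]) << 16)
        | (uint32_t(space[offset + 3]) << 24);
}

// Follows the capability chain starting at `first`. Returns the offset of the
// legacy support capability, or 0 if the chain ends, a read fails, or the
// iteration bound is hit (which is what protects against a looping chain).
inline uint32_t FindLegacySupport(const ConfigSpace& space, uint32_t first)
{
    uint32_t offset = first;
    for (int i = 0; i < kMaxCapabilities && offset >= 0x40; i++) {
        uint32_t value = ReadConfig32(space, offset);
        if (value == 0xffffffff)
            return 0;
        if ((value & 0xff) == kLegacySupportId)
            return offset;
        offset = (value >> 8) & 0xff;  // next capability pointer
    }
    return 0;
}
```

With this shape, a chain whose next pointer refers back to itself terminates after 64 iterations instead of hanging the boot, and a starting offset of 255 is rejected on the first read.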


comment:14 by edglex, 9 years ago

I think the offset for the extended capabilities pointer is wrong; this seems to be the only reason the reads and writes would fail. PCI::ReadConfig and WriteConfig have (size == 4 && (offset & 3) != 0) as a condition for failing without even attempting the read/write. So extendedCapPointer should not be 0xFF/255 as it is (see the read failure in the "These lines are of interest" comment above).

I suspect that if this is corrected the writes to disable interrupts/BIOS control may work and then the controller might reset correctly. However at the moment I have no idea why the offset would be wrong (or even if I am just barking up the wrong tree).
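To make the effect of that condition concrete, here is a small sketch that applies it to a few offsets. `IsRejectedByAlignmentCheck` is a hypothetical helper that only mirrors the quoted expression; it is not the actual PCI::ReadConfig code. The offsets are the ones that appear in this ticket: 255 (the failing read), 0x33 (the value mentioned for Linux) and 104 (the value seen when EHCI works).

```cpp
#include <cstdint>

// Mirrors the quoted condition: a 4-byte access is refused unless the offset
// is a multiple of 4.
constexpr bool IsRejectedByAlignmentCheck(uint32_t offset, uint32_t size)
{
    return size == 4 && (offset & 3) != 0;
}

static_assert(IsRejectedByAlignmentCheck(255, 4),
    "offset 255 is not dword aligned, so the read is refused");
static_assert(IsRejectedByAlignmentCheck(0x33, 4),
    "offset 0x33 would be refused as well");
static_assert(!IsRejectedByAlignmentCheck(104, 4),
    "offset 104 (0x68) is dword aligned and passes");
```

So the check itself only explains why the read at 255 is refused outright; it does not explain where the value 255 came from.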


comment:15 by korli, 9 years ago

Congratulations on your new arrival!

About this check (size == 4 && (offset & 3) != 0), it should probably be moved to the PCI arch implementations pci_mech[1|2]_read_config and pci_mech[1|2]_write_config because it doesn't seem relevant anymore for pci_mechpcie_read_config and pci_mechpcie_write_config. It could change something for your system because a mechanism pcie controller was found (see syslog).

[src/add-ons/kernel/bus_managers/pci/arch/x86/pci_controller.cpp]

comment:16 by edglex, 9 years ago

Thanks :)

I actually tried commenting out these checks, and it didn't help. The really telling thing is that Linux reports a completely different value for the extended capabilities pointer offset, although from looking at it I think that value would also fail these checks. So when I get back to this, I think I may start by removing those checks and hard-coding the cap offset to what Linux uses (super hacky, but it might just work). Whether or not that works, I'll then try to work backwards to see how the offset may have ended up wrong. Unless you have a better suggestion, maybe? Thanks for your help on this already :)

comment:17 by edglex, 9 years ago

I didn't get much time last night, but I tried hard-coding the legacy support offset and it did actually manage to properly iterate over all the legacy support pointers. However, the controller still would not reset, because many of the other offsets also appear to be wrong. Next I will try to work out exactly what is going wrong; hopefully I'll get some more time tonight.

in reply to:  13 ; comment:18 by mmlr, 9 years ago

Replying to edglex:

I've been poking at this for a while today. With the supplied patch, we're in an infinite loop. Apparently this happens with some hardware, it is reported as a bug in illumos (https://www.illumos.org/issues/4225) and Linux actually only ever loops 64 times here before giving up. I've checked and booting Linux I get an error that BIOS handoff failed, but USB still works in Linux.

In PCI getting a value of 255 generally means an error or not present device. So an extended capability pointer of 255 likely hints at a failure to actually read the corresponding register, not that the capability pointer actually has a value of 255.

Also since the device doesn't advertise the PCIe capability it should be treated like a normal PCI device (which is usual for EHCI devices). In that case the config space is only 256 bytes to begin with, making an offset of 255 bytes not useful.

The bug report you linked describes exactly the same issue.

I noticed that Linux also attempts to disable interrupts and take control regardless of whether this worked, so I attempted to do the same thing (as well as looping 64 times before giving up), but although this allows me to boot, I get reports that at least one of the writes failed, and I still don't have working USB.

That's what the Haiku implementation does as well. It does only do so if it finds the corresponding legacy support register, because it can't unset the semaphore if there isn't one.

Generally not using the hand off mechanism is unproblematic as long as the firmware does not have any SMIs (System Management Interrupts) enabled on the device. Since the SMI enabled registers are also within the legacy support registers, the same as above applies.

A failing reset would further point towards not actually getting sensible values from/to the registers at all, as a reset should always work (even if legacy support was enabled).

To debug this, please enable full tracing in ehci.cpp and also output and post the value of the full EHCI_HCCPARAMS register:

TRACE_ALWAYS("host controller parameters: %#" B_PRIx32 "\n", ReadCapReg32(EHCI_HCCPARAMS));

If reading the register fails as suspected it should output 0xffffffff.
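As an illustration of how the register value relates to the "extended capabilities register at ..." trace lines, here is a minimal sketch. It assumes the EHCI layout in which bits 15:8 of HCCPARAMS hold the extended capabilities pointer; `ExtendedCapPointer` and `LooksLikeFailedRead` are hypothetical helper names, not functions from ehci.cpp. The two input values are the capability parameters reported in the traces in this ticket.

```cpp
#include <cstdint>

// A register that reads back as all ones usually means nothing answered.
constexpr bool LooksLikeFailedRead(uint32_t value)
{
    return value == 0xffffffff;
}

// Extracts the extended capabilities pointer (assumed to be bits 15:8).
constexpr uint32_t ExtendedCapPointer(uint32_t hccparams)
{
    return (hccparams >> 8) & 0xff;
}

// Value traced when EHCI works: the pointer is 0x68, i.e. offset 104.
static_assert(ExtendedCapPointer(0x00006871) == 104, "working case");
// Value traced when EHCI fails: all ones, so the "pointer" is just 0xff.
static_assert(ExtendedCapPointer(0xffffffff) == 255, "failing case");
static_assert(LooksLikeFailedRead(0xffffffff), "all ones is treated as a failed read");
```

This is consistent with the point made above: the offset of 255 is a symptom of the register read returning all ones, not a real capability pointer.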

I also tried through a hub, to rule out it being the same problem as 9118.

#9118 is not related to this. It is the opposite case where the UHCI root port doesn't work whereas the EHCI one does (and using a 2.0 hub causes the root port to be driven by EHCI).

What I don't understand is why it is failing to read/write the PCI config register, this seems to be crucial. Any idea? I'm having trouble following the rest of the linux spaghetti code, so can't figure out what they do next.

The BSDs usually have code that is much easier to follow if you want to compare other implementations.

in reply to:  18 comment:19 by edglex, 9 years ago

Replying to mmlr:

Replying to edglex:

I've been poking at this for a while today. With the supplied patch, we're in an infinite loop. Apparently this happens with some hardware, it is reported as a bug in illumos (https://www.illumos.org/issues/4225) and Linux actually only ever loops 64 times here before giving up. I've checked and booting Linux I get an error that BIOS handoff failed, but USB still works in Linux.

In PCI getting a value of 255 generally means an error or not present device. So an extended capability pointer of 255 likely hints at a failure to actually read the corresponding register, not that the capability pointer actually has a value of 255.

This makes sense.

Also since the device doesn't advertise the PCIe capability it should be treated like a normal PCI device (which is usual for EHCI devices). In that case the config space is only 256 bytes to begin with, making an offset of 255 bytes not useful.

Does this mean that the config read/write checks ((size == 4 && (offset & 3) != 0)) are ok? It's odd because linux gives an offset of 0x33 for the legacy support register, which would also fail this check.

The bug report you linked describes exactly the same issue.

I noticed that Linux also attempts to disable interrupts and take control regardless of whether this worked, so I attempted to do the same thing (as well as looping 64 times before giving up), but although this allows me to boot, I get reports that at least one of the writes failed, and I still don't have working USB.

That's what the Haiku implementation does as well. It does only do so if it finds the corresponding legacy support register, because it can't unset the semaphore if there isn't one.

The patch above that loops over all the legacy support registers wasn't doing this, so I added it.

Generally not using the hand off mechanism is unproblematic as long as the firmware does not have any SMIs (System Management Interrupts) enabled on the device. Since the SMI enabled registers are also within the legacy support registers, the same as above applies.

A failing reset would further point towards not actually getting sensible values from/to the registers at all, as a reset should always work (even if legacy support was enabled).

Yes, I'm pretty well certain that these values are wrong.

To debug this, please enable full tracing in ehci.cpp and also output and post the value of the full EHCI_HCCPARAMS register:

TRACE_ALWAYS("host controller parameters: %#" B_PRIx32 "\n", ReadCapReg32(EHCI_HCCPARAMS));

If reading the register fails as suspected it should output 0xffffffff.

I already did that, and am fairly sure that I did indeed get 0xffffffff.

I also tried through a hub, to rule out it being the same problem as 9118.

#9118 is not related to this. It is the opposite case where the UHCI root port doesn't work whereas the EHCI one does (and using a 2.0 hub causes the root port to be driven by EHCI).

What I don't understand is why it is failing to read/write the PCI config register, this seems to be crucial. Any idea? I'm having trouble following the rest of the linux spaghetti code, so can't figure out what they do next.

The BSDs usually have code that is much easier to follow if you want to compare other implementations.

Yes, I have also been looking at the FreeBSD code; it is much more similar to the Haiku code (and I see at least the structure of some of the Haiku code is based on NetBSD). It is hard to know how useful it is, though, because I don't know if USB is working under FreeBSD on this hardware. Perhaps I should boot a live FreeBSD version and find out.

Thanks for your help!

comment:20 by edglex, 9 years ago

For reference, here's the output with tracing:

usb ehci -1: constructing new EHCI host controller driver
usb ehci -1: map physical memory 0x942c4000 (base: 0x942c4000; offset: 0); size 1024
add_memory_type_range(459, 0x942c4000, 0x1000, 0)
usb ehci -1: mapped capability registers: 0x0x81d3c000
usb ehci -1: mapped operational registers: 0x0x81d3c0ff
usb ehci -1: structural parameters: 0xffffffff
usb ehci -1: capability parameters: 0xffffffff

After that it fails (failed to reset, failed init check, tear down, no devices found).

comment:21 by edglex, 9 years ago

This bug (or a very similar one, on the same hardware anyway) appeared in Linux and could be avoided with the kernel parameter pci=nocrs. It was present in 2.6.35 but had disappeared by 2.6.37. I haven't yet looked at the diffs between those versions, or into what ACPI Current Resource Settings really are, but presumably there is some workaround in the kernel now. There is a bug report for this here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/661600

So following this lead I disabled ACPI, and USB works! It's good that there's a workaround, but it's sort of a rock vs hard place situation because a laptop without ACPI is a bit limiting, so I will still try to fix this.

comment:22 by waddlesplash, 8 years ago

ACPICA got an upgrade; can you try again?

in reply to:  22 comment:23 by edglex, 8 years ago

Replying to waddlesplash:

ACPICA got an upgrade; can you try again?

Just tried again, and the problem remains :(

comment:24 by pulkomandy, 8 years ago

Blocking: 12749 added

(In #12749) Thanks, closing as duplicate of the 2 others then.

comment:25 by edglex, 8 years ago

I've been having another look at this. The Ubuntu bug I linked to above mentions that the issue was fixed upstream. I found that upstream bug report [1], and it has a really good explanation of what is going on. I don't have a strong understanding, but it appears the space for the USB controller's PCI registers might be getting allocated over the framebuffer space. This is why we see:

KERN: add_memory_type_range(458, 0x942c4000, 0x1000, 0)
KERN: PCI: can't read config for domain 0, bus 0, device 29, function 7, offset 255, size 4

The EHCI base register is at 0x942c4000. The giveaway was the PCI error; the rest of the EHCI errors are misleading.

It looks like this is caused by ignoring the e820 reserved ranges when doing the MTRR allocation for the EHCI controller, because the e820 range isn't also recorded in the ACPI CRS. With ACPI enabled, the CRS doesn't say to avoid the range, so we allocate the controller there even though the e820 map told us to leave it alone. If you disable ACPI, there's no CRS, and it works. I think this is probably beyond me to fix (without spending a lot of time learning how this works, anyway); it would be great if someone could have a further look at it.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=22132
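The hypothesis above boils down to an interval overlap test between the controller's register window and the reserved ranges. The sketch below shows only that test; `Range`, `Overlaps` and `OverlapsAny` are hypothetical names, and no actual e820 or CRS data from this machine is known, so any reserved ranges passed in are placeholders.

```cpp
#include <cstdint>
#include <vector>

// A physical address range [base, base + size).
struct Range {
    uint64_t base;
    uint64_t size;
};

// Two half-open ranges overlap when each one starts before the other ends.
inline bool Overlaps(const Range& a, const Range& b)
{
    return a.base < b.base + b.size && b.base < a.base + a.size;
}

// Returns true if the window collides with any entry of a reserved list,
// for example one built from the firmware memory map.
inline bool OverlapsAny(const Range& window, const std::vector<Range>& reserved)
{
    for (const Range& range : reserved) {
        if (Overlaps(window, range))
            return true;
    }
    return false;
}
```

For this ticket the window would be {0x942c4000, 0x1000}, taken from the add_memory_type_range line. Whether it really collides with a reserved range on this hardware has not been verified.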


comment:26 by edglex, 8 years ago

I've done some more digging. It isn't exactly the same problem as the Ubuntu one (I'm not sure how, or if, it relates to e820 and ACPI CRS), but it does appear to be related to memory allocation. I turned on lots of tracing, and with ACPI disabled (EHCI works) I get the following:

...
usb ehci -1: constructing new EHCI host controller driver
usb ehci -1: map physical memory 0x942c4000
add_memory_type_range(148, 0x942c4000, 0x1000, 0)
usb ehci -1: mapped capability registers: 0x0x81d2b000
usb ehci -1: mapped operational registers: 0x0x81d2b020
usb ehci -1: structural parameters: 0x00103208
usb ehci -1: capability parameters: 0x00006871
usb ehci -1: extended capabilities register at 104
usb ehci -1: the host controller is bios owned, claiming ownership
usb ehci -1: controller is still bios owned, waiting
usb ehci -1: successfully took ownership of the host controller
usb ehci -1: creating interrupt entries
...

And with ACPI enabled (EHCI not working):

...
usb ehci -1: constructing new EHCI host controller driver
usb ehci -1: map physical memory 0x942c4000
add_memory_type_range(462, 0x942c4000, 0x1000, 0)
usb ehci -1: mapped capability registers: 0x0x812b3000
usb ehci -1: mapped operational registers: 0x0x812b30ff
usb ehci -1: structural parameters: 0xffffffff
usb ehci -1: capability parameters: 0xffffffff
usb ehci -1: extended capabilities register at 255
PCI: can't read config for domain 0, bus 0, device 29, function 7, offset 255, size 4
usb ehci -1: extended capabilities register is not a legacy support register
usb error ehci -1: host controller failed to reset
usb ehci: bus failed init check
usb ehci -1: tear down EHCI host controller driver
remove_memory_type_range(462, 0x942c4000, 0x1000, 0)
usb_ehci: no devices found
...
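One more detail can be read out of the two traces. Under the EHCI layout, the operational registers start at the capability register base plus the CAPLENGTH byte, so the difference between the two mapped addresses is the CAPLENGTH value the driver read. The helper below is a hypothetical sketch of that arithmetic using the addresses from the traces; it is not driver code.

```cpp
#include <cstdint>

// The operational base is assumed to be the capability base plus CAPLENGTH,
// so subtracting the two traced addresses recovers the CAPLENGTH that was read.
constexpr uint32_t CapLengthFromMapping(uint32_t capabilityBase,
    uint32_t operationalBase)
{
    return operationalBase - capabilityBase;
}

// ACPI disabled: a plausible capability length of 0x20 bytes.
static_assert(CapLengthFromMapping(0x81d2b000, 0x81d2b020) == 0x20, "working case");
// ACPI enabled: 0xff, i.e. the very first register read already returned all ones.
static_assert(CapLengthFromMapping(0x812b3000, 0x812b30ff) == 0xff, "failing case");
```

In other words, with ACPI enabled every read from the mapped window returns all ones, starting with the first one, which fits the idea that the problem lies in how the memory range is set up rather than in the EHCI handoff logic.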

comment:27 by pulkomandy, 6 years ago

patch: 1 → 0

comment:28 by pulkomandy, 6 years ago

Patch migrated to Gerrit: https://review.haiku-os.org/86

comment:29 by edglex, 2 years ago

I just tried this again (hrev55836), and the problem is still there, I'm afraid...
