Opened 7 years ago

Closed 6 years ago

#13768 closed bug (fixed)

USB hub spamming syslog

Reported by: humdinger Owned by: nobody
Priority: normal Milestone: R1/beta2
Component: Drivers/USB/XHCI Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description

This is hrev51536.

Since the USB3 changes in hrev51533 I un-blacklisted xhci and now see the syslog being continously spammed with:

KERN: usb hub 2: KERN: port 15 is not in an over current state
KERN: usb hub 2: port 15 was reset
KERN: usb hub 2: port 16 is not suspended
KERN: usb hub 2: port 16 is not in an over current state
KERN: usb hub 2: port 16 was reset
KERN: usb hub 2: port 17 is not suspended
KERN: usb hub 2: port 17 is not in an over current state
KERN: usb hub 2: port 17 was reset
KERN: usb hub 2: port 18 is not suspended
KERN: usb hub 2: port 18 is not in an over current state
KERN: usb hub 2: port 18 was reset
KERN: usb hub 2: port 15 is not suspended
KERN: usb hub 2: KERN: port 15 is not in an over current state
KERN: usb hub 2: port 15 was reset
KERN: usb hub 2: port 16 is not suspended
KERN: usb hub 2: port 16 is not in an over current state
KERN: usb hub 2: port 16 was reset
KERN: usb hub 2: port 17 is not suspended
KERN: usb hub 2: port 17 is not in an over current state
KERN: usb hub 2: port 17 was reset
KERN: usb hub 2: port 18 is not suspended
KERN: usb hub 2: port 18 is not in an over current state
KERN: usb hub 2: port 18 was reset
KERN: usb hub 2: port 15 is not suspended
KERN: usb hub 2: KERN: port 15 is not in an over current state
KERN: usb hub 2: port 15 was reset
KERN: usb hub 2: port 16 is not suspended
KERN: usb hub 2: port 16 is not in an over current state
KERN: usb hub 2: port 16 was reset
KERN: usb hub 2: port 17 is not suspended
KERN: usb hub 2: KERN: port 17 is not in an over current state
KERN: usb hub 2: port 17 was reset
KERN: usb hub 2: port 18 is not suspended
KERN: usb hub 2: port 18 is not in an over current state
KERN: usb hub 2: port 18 was reset

Attachments (8)

syslog_blacklisted_xhci.txt (258.3 KB ) - added by humdinger 7 years ago.
All OK when XHCI is blacklisted
syslog_spam.txt (267.9 KB ) - added by humdinger 7 years ago.
Spamming after a 'warm' reboot
syslog_No_spam.txt (258.7 KB ) - added by humdinger 7 years ago.
No spamming when clean booting from powered off
cold_vs_warm.diff (192.0 KB ) - added by waddlesplash 7 years ago.
0001-xhci-fix-hub-descriptor-command-request.patch (1.1 KB ) - added by GregCrain 7 years ago.
syslog (291.9 KB ) - added by hugeping 6 years ago.
usb3.patch (1.9 KB ) - added by hugeping 6 years ago.
Clear status fix
0001-xhci-GetPortStatus-ClearPortFeature-PS_PLC-PS_WRC-fi.patch (2.4 KB ) - added by hugeping 6 years ago.

Download all attachments as: .zip

Change History (31)

comment:1 by waddlesplash, 7 years ago

Component: Drivers/USB/XHCIDrivers/USB
Owner: changed from nobody to mmlr

Those messages are coming from the USB hub manager, which has them as TRACE_ALWAYS which seems odd: http://xref.plausible.coop/source/xref/haiku/src/add-ons/kernel/bus_managers/usb/Hub.cpp#325

comment:2 by humdinger, 7 years ago

I've switched to another port and that one isn't spamming the syslog (at least for this session...)

in reply to:  1 ; comment:3 by mmlr, 7 years ago

Component: Drivers/USBDrivers/USB/XHCI
Owner: changed from mmlr to nobody

Replying to waddlesplash:

Those messages are coming from the USB hub manager, which has them as TRACE_ALWAYS which seems odd

It's not odd, it's supposed to always be logged. These are change indicator bits, which are then acknowledged through the following clear feature request.

Since it's unlikely that the bits actually toggle the whole time, I'd guess that the clear feature request to reset them just doesn't go through. That would then be an issue in the corresponding driver.

What's odd is that it apparently doesn't happen on another port. Is this a root port of the machine or is there an actual hub in between? A syslog may shed some more light on the actual topology and possible issues in setting up any ports.

comment:4 by humdinger, 7 years ago

New info: Apparently the spamming only happens when having rebooted. When cold starting after a real power-off shutdown, there's no syslog spamming.

Hardwareinfo:

listusb

062a:4102 /dev/bus/usb/0/1 "MosArt Semiconductor Corp." "2.4G Wireless Mouse" ver. 0312
8087:07da /dev/bus/usb/0/6 "Intel Corp." "" ver. 7869
04f2:b35a /dev/bus/usb/0/7 "Chicony Electronics Co., Ltd" "Chicony USB 2.0 Camera" ver. 9162
0000:0000 /dev/bus/usb/0/hub "HAIKU Inc." "XHCI RootHub" ver. 0300
8087:8008 /dev/bus/usb/1/0/hub "Intel Corp." "" ver. 0005
0000:0000 /dev/bus/usb/1/hub "HAIKU Inc." "EHCI RootHub" ver. 0200
8087:8000 /dev/bus/usb/2/0/hub "Intel Corp." "" ver. 0005
0000:0000 /dev/bus/usb/2/hub "HAIKU Inc." "EHCI RootHub" ver. 0200

listdev

device Serial bus controller (USB controller, EHCI) [c|3|20]
  vendor 8086: Intel Corporation
  device 8c26: 8 Series/C220 Series Chipset Family USB EHCI #1

device Serial bus controller (USB controller, EHCI) [c|3|20]
  vendor 8086: Intel Corporation
  device 8c2d: 8 Series/C220 Series Chipset Family USB EHCI #2

device Serial bus controller (USB controller, XHCI) [c|3|30]
  vendor 8086: Intel Corporation
  device 8c31: 8 Series/C220 Series Chipset Family USB xHCI

I attach a spamming and a non-spamming syslog, both when my wireless mouse toggle is in the USB3 port on the left of my norebook.

comment:5 by mmlr, 7 years ago

The syslogs are unfortunately incomplete. The USB initialization, where the hub topology would be visible isn't in there as it happens quite early.

comment:6 by waddlesplash, 7 years ago

My machine has the same issue. Would just one syslog plus a listdev be good enough? Or do you really need the USB init output (if the latter, perhaps I need to write a userbootscript to copy the syslogs immediately after startup...)

by humdinger, 7 years ago

Attachment: syslog_blacklisted_xhci.txt added

All OK when XHCI is blacklisted

by humdinger, 7 years ago

Attachment: syslog_spam.txt added

Spamming after a 'warm' reboot

by humdinger, 7 years ago

Attachment: syslog_No_spam.txt added

No spamming when clean booting from powered off

comment:9 by humdinger, 7 years ago

I removed the useless syslogs and up'ed 3 new ones
1 - when the xhci is blacklisted - all OK
2 - after a 'warm' reboot - spamming (no blacklisting)
3 - when freshly booting after a powerdown - no spamming (no blacklisting)

comment:10 by waddlesplash, 7 years ago

Weirdly, these exact lines appear in both the warm and cold boots:

KERN: usb xhci -1: starting XHCI host controller
KERN: usb xhci -1: successfully started the controller
KERN: usb uhci: no devices found
KERN: usb ohci: no devices found
KERN: add_memory_type_range(802, 0xf7e1c000, 0x1000, 0)
KERN: sitd entry size 64, itd entry size 128
KERN: usb ehci -1: successfully started the controller
KERN: add_memory_type_range(807, 0xf7e1b000, 0x1000, 0)
KERN: sitd entry size 64, itd entry size 128
KERN: usb ehci -1: successfully started the controller
KERN: usb hub 2: port 1: new device connected
KERN: usb error xhci -1: unsuccessful command Parameter (17)
KERN: usb xhci -1: cancel queued transfers for pipe 0x827a3ec8 (0)
KERN: usb hub 2: port 6: new device connected
KERN: usb error xhci -1: unsuccessful command Parameter (17)
Last message repeated 4 times
KERN: usb xhci -1: cancel queued transfers for pipe 0x827a3ec8 (0)
KERN: usb hub 2: port 7: new device connected
KERN: usb error xhci -1: unsuccessful command Parameter (17)
KERN: usb xhci -1: cancel queued transfers for pipe 0x827a3ec8 (0)

Immediately after this in the "warm" boot the spamming begins, whereas in the "cold" boot (and after one round of spam in the "warm" boot):

KERN: usb hub 7: port 0: new device connected
KERN: usb hub 12: port 0: new device connected
KERN: get_boot_partitions(): boot volume message:

Attaching a diff the two boots for easy comparison (notable: APIC ticks are different, add_memory_type_range is off-by-one.)

by waddlesplash, 7 years ago

Attachment: cold_vs_warm.diff added

comment:11 by GregCrain, 7 years ago

patch: 01

comment:12 by GregCrain, 7 years ago

So I found a bug in the xhci driver when getting the descriptor for a hub. I attached a patch here. I don't know if it impacts this ticket or not, since I haven't reproduced the exact same issue in this ticket. But your error log shows a lot of hub errors.

On my machine, with a USB2 hub connected on the XHCI port, it wouldn't even iterate. The command to get the hub descriptor was incorrect, and it created a stall error. The stall is not handled in the driver, and it looks like the hub driver continues trying to address it with no response, which might be the spamming issue that you see.

This is also related to ticket #13792.

comment:13 by humdinger, 7 years ago

I'm sad to say that this didn't fix my syslog spamming issue. Would've been a sensational one-line fix. :)

comment:15 by pulkomandy, 7 years ago

patch: 10

in reply to:  3 ; comment:16 by KapiX, 7 years ago

Replying to Michael Lotz:

It's not odd, it's supposed to always be logged. These are change indicator bits, which are then acknowledged through the following clear feature request.

I'd argue for TRACE_ALWAYS to be disabled until this issue is fixed. It causes syslog to grow about 1.5 KiB every second which wears out SSD without purpose. Unless someone is actively investigating the issue, these traces are useless.

in reply to:  16 comment:17 by GregCrain, 7 years ago

I'd argue for TRACE_ALWAYS to be disabled until this issue is fixed. It causes syslog to grow about 1.5 KiB every second which wears out SSD without purpose. Unless someone is actively investigating the issue, these traces are useless.

I would be good to disable logging until the problem is fixed. It's not really giving any useful information for the open issues, and just slowing the system. I don't see this spamming issue on my NEC/Renesas chipset, but it occurs consistently on an Intel chipset.

I had a computer and hard drive go bad, and now that I've got things set back up, I've been spending a lot of time trying to understand where and how things are failing with the xhci. I've added a bunch of debug output in my own build.

On my system at some point, usually after scanning partitions of a bulk device, interrupts completely stop for Normal or Control requests. All status registers, slot contexts and endpoint contexts report no errors, running, not stalled or halted.

The Ring Doorbell function gets rang until endpoint transfers used is exceeded, up to XHCI_MAX_TRANSFERS, with no interrupts. No-Op Command IRQ continues to work fine though.

Then I get "_LinkDescriptorForPipe max transfers count exceeded 8". I'm a bit baffled.

comment:18 by hugeping, 6 years ago

Same problem here. xhci spams syslog. Here is the log with USB tracing enabled:

https://dev.haiku-os.org/attachment/ticket/13768/syslog

Last edited 6 years ago by hugeping (previous) (diff)

by hugeping, 6 years ago

Attachment: syslog added

by hugeping, 6 years ago

Attachment: usb3.patch added

Clear status fix

comment:19 by hugeping, 6 years ago

Ok, i have made the patch: https://dev.haiku-os.org/attachment/ticket/13768/usb3.patch that works for me. This is still dirty, but i hope it is correct. Please, verify it. The problem was in error mixing bits number vs bitmask while clearing status.

Last edited 6 years ago by hugeping (previous) (diff)

comment:20 by tqh, 6 years ago

Can verify that patch does fix log issue (on my machine), have not done any actual XHCI testing.

Last edited 6 years ago by tqh (previous) (diff)

comment:21 by korli, 6 years ago

About the patch:

  • The features used in XHCI::ClearPortFeature() should be C_PORT_LINK_STATE and C_PORT_BH_PORT_RESET.
  • The same applies for Hub::Explore(): USB_REQUEST_CLEAR_FEATURE should be used with C_PORT_LINK_STATE and C_PORT_BH_PORT_RESET.

comment:22 by hugeping, 6 years ago

I have attached cleaned patch. https://dev.haiku-os.org/attachment/ticket/13768/0001-xhci-GetPortStatus-ClearPortFeature-PS_PLC-PS_WRC-fi.patch

Currently i have no idea how to work with gerrit and i have no time, exuse me. Will try it later, after reading manuals.

comment:24 by pulkomandy, 6 years ago

Milestone: UnscheduledR1/beta2

comment:25 by waddlesplash, 6 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.