Opened 3 years ago

Closed 7 months ago

Last modified 5 months ago

#12929 closed bug (fixed)

[XHCI] crash in _LinkDescriptorForPipe()

Reported by: diver Owned by: nobody
Priority: normal Milestone: R1/beta2
Component: Drivers/USB/XHCI Version: R1/Development
Keywords: Cc: korli
Blocked By: Blocking: #12915, #12951, #13912, #14798, #15013
Has a Patch: no Platform: All

Description

hrev50496 running in VirtualBox 5.0.26.

KDL often occurs on VM resume after pause.

Attachments (1)

kdl.png (44.7 KB ) - added by diver 3 years ago.

Download all attachments as: .zip

Change History (21)

by diver, 3 years ago

Attachment: kdl.png added

comment:1 by diver, 3 years ago

Owner: changed from nobody to korli
Status: newassigned

comment:2 by korli, 3 years ago

Owner: changed from korli to nobody

comment:3 by diver, 3 years ago

Blocking: 12951 added

comment:4 by pulkomandy, 11 months ago

Milestone: UnscheduledR1/beta2

comment:5 by waddlesplash, 11 months ago

Still happens?

comment:6 by waddlesplash, 9 months ago

Blocking: 13912 added

comment:7 by waddlesplash, 9 months ago

Found an easy way to reproduce this:

  1. Connect USB MIDI device to a USB3 port (any Android phone running 6.0+ can operate in USB MIDI device mode)
  2. Run find /dev
  3. Instant KDL with this backtrace.

comment:8 by waddlesplash, 9 months ago

Summary: [XHCI] crash in LinkDescriptioinForPipe()[XHCI] crash in _LinkDescriptorForPipe()

comment:9 by waddlesplash, 9 months ago

Blocking: 14798 added

comment:10 by waddlesplash, 8 months ago

Blocking: 12915 added

comment:11 by waddlesplash, 7 months ago

With tracing on:

usb xhci -1: SubmitTransfer()
usb xhci -1: SubmitControlRequest() length 0
usb xhci -1: _LinkDescriptorForPipe
usb xhci -1: _LinkDescriptorForPipe current 7, next 0

And then the page fault occurs.

comment:12 by waddlesplash, 7 months ago

Cc: korli added

So, it appears that since the ControlRequest length is 0, we won't get any TDs at all:

	// keep one trb for linking
	int32 tdCount = (trbCount + XHCI_MAX_TRBS_PER_TD - 2)
		/ (XHCI_MAX_TRBS_PER_TD - 1);

Probably the page fault is just LinkDescriptorForPipe trying to link the TD, since we wrapped from 7 -> 0, but it can't because it has no link TD.

korli, you originally added the logic above, can you please comment here?

comment:13 by korli, 7 months ago

Control requests actually happen here and don't call CreateDescriptorChain() but CreateDescriptor()

Anyway for normal requests, I see that Linux adds one trb when the buffer length is zero: https://github.com/torvalds/linux/blob/master/drivers/usb/host/xhci-ring.c#L2954 We should probably do the same. This means the trbCount is to increment, the tdCount computation seems already correct.

BTW I'm not sure this has to do with this bug, since ATM CreateDescriptorChain() returns NULL for a zero buffer length, _LinkDescriptorForPipe() shouldn't be called at all. Maybe I misread the code.

comment:14 by waddlesplash, 7 months ago

BTW I'm not sure this has to do with this bug, since ATM CreateDescriptorChain() returns NULL for a zero buffer length, _LinkDescriptorForPipe() shouldn't be called at all. Maybe I misread the code.

Well, if we should make sure trbCount is non-zero, then tdCount will of course be 1 in that case, yes?

Control requests actually happen ​here and don't call ​CreateDescriptorChain() but ​CreateDescriptor()

OK, then in that case I really don't understand how we get a page fault here.

Is there a reason we allocate/free descriptors for every request? That seems a little slow to say the least.

in reply to:  14 comment:15 by korli, 7 months ago

Replying to waddlesplash:

Is there a reason we allocate/free descriptors for every request? That seems a little slow to say the least.

This mirrors what EHCI does already, aka the USB stack manages memory chunks for the bus driver. Yes, xhci_td could be cached, but I suppose the overhead isn't noticeable.

in reply to:  14 comment:16 by korli, 7 months ago

Replying to waddlesplash:

Well, if we should make sure trbCount is non-zero, then tdCount will of course be 1 in that case, yes?

Yes

OK, then in that case I really don't understand how we get a page fault here.

SubmitControlRequest() could at least check _LinkDescriptorForPipe() return value and exit, instead of calling Ring() and returning B_OK.

comment:17 by waddlesplash, 7 months ago

SubmitControlRequest() could at least check _LinkDescriptorForPipe() return value and exit, instead of calling Ring() and returning B_OK.

Except the page fault occurs inside _LinkDescriptorForPipe, so I'm not sure what difference that will make?

I'll commit that change and the trbCount one, though.

comment:18 by waddlesplash, 7 months ago

Done in hrev52888.

comment:19 by waddlesplash, 7 months ago

Resolution: fixed
Status: assignedclosed

Fixed the KDL in hrev52963. The underlying cause still exists, though, but a new ticket should be made for it.

comment:20 by waddlesplash, 5 months ago

Blocking: 15013 added
Note: See TracTickets for help on using tickets.