Opened 10 years ago

Last modified 2 months ago

#4604 reopened bug

USB errors after unplugging Wacom tablet

Reported by: idefix Owned by: mmlr
Priority: normal Milestone: R1
Component: Drivers/Input/USB-HID Version: R1/alpha1
Keywords: Cc: pete.goodeve@…
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

When I unplug my Wacom Graphire 2 tablet, a lot of USB errors are showing up in the serial log, like:

usb error uhci 0: td (0x0243e4c0) error: status: 0x254507ff; token: 0x00e88769;

The strange thing is that the amount of errors (after the wacom: interrupt transfer - failure: -2147442668 line) is related to the duration the tablet was connected to the computer:

But when the pen is always in proximity of the tablet when connected to the computer, the amount of errors is:

Compare that to the amount of errors when the pen wasn't in proximity:

Because the computer is logging to the serial port, this temporarily results in an unresponsive system when disconnecting the tablet. When serial debugging is disabled, the system stays responsive but the CPU usage is 100% for several seconds (especially when the tablet has been connected to the computer for several hours).

Tested in Haiku R1/alpha1 on real hardware (PII@400MHz, 384 MB, Intel 440BX).

Attachments (8)

2seconds.txt (871 bytes) - added by idefix 10 years ago.
serial log showing: tablet connected - waited 2 seconds - tablet disconnected
5seconds.txt (1.4 KB) - added by idefix 10 years ago.
serial log showing: tablet connected - waited 5 seconds - tablet disconnected
10seconds.txt (2.1 KB) - added by idefix 10 years ago.
serial log showing: tablet connected - waited 10 seconds - tablet disconnected
30seconds-penontablet.txt (547 bytes) - added by idefix 10 years ago.
serial log showing: pen on tablet - tablet connected - waited 30 seconds - tablet disconnected - pen off tablet
30seconds-nopenontablet.txt (5.4 KB) - added by idefix 10 years ago.
serial log showing: tablet connected - waited 30 seconds - tablet disconnected
notontablet.txt (2.9 KB) - added by idefix 10 years ago.
serial log showing: tablet connected - waited less than 1 second - tablet disconnected
ontablet.txt (12.6 KB) - added by idefix 10 years ago.
serial log showing: pen on tablet - tablet connected - waited less than 1 second - tablet disconnected - pen off tablet
wacomstopsworking.zip (72.7 KB) - added by idefix 9 years ago.
serial log showing: tablet stopped working - tablet disconnected

Download all attachments as: .zip

Change History (28)

Changed 10 years ago by idefix

Attachment: 2seconds.txt added

serial log showing: tablet connected - waited 2 seconds - tablet disconnected

Changed 10 years ago by idefix

Attachment: 5seconds.txt added

serial log showing: tablet connected - waited 5 seconds - tablet disconnected

Changed 10 years ago by idefix

Attachment: 10seconds.txt added

serial log showing: tablet connected - waited 10 seconds - tablet disconnected

Changed 10 years ago by idefix

Attachment: 30seconds-penontablet.txt added

serial log showing: pen on tablet - tablet connected - waited 30 seconds - tablet disconnected - pen off tablet

Changed 10 years ago by idefix

Attachment: 30seconds-nopenontablet.txt added

serial log showing: tablet connected - waited 30 seconds - tablet disconnected

comment:1 Changed 10 years ago by mmlr

It's not related to how long the tablet has been connected. It purely depends on the poll. Device removal is detected from the explore thread which runs once a second. If the thread just ran, the time for it to run again is longer than if it happened to be close to running again. The errors are harmless and can't really be avoided. They happen between the actual unplug and the system noticing the unplug. The time could be reduced by making the explore thread run more frequently, but even the 1 second it is at now is a waste of resources most of the time. What is taking down the system is the serial and syslog output. Serial should be disabled by default on the alpha builds though, so you shouldn't see it there. A different possibility would be to check for device unplugs on this kind of error. This is kinda involved though.b

comment:2 in reply to:  1 Changed 10 years ago by idefix

Replying to mmlr:

It's not related to how long the tablet has been connected. It purely depends on the poll. Device removal is detected from the explore thread which runs once a second. If the thread just ran, the time for it to run again is longer than if it happened to be close to running again.

But wouldn't you see much more variation then, instead of this linear response?
And there would be an upper limit of 1 second. Because of the serial output, this will be a higher limit, but there still should be one.

What is taking down the system is the serial and syslog output. Serial should be disabled by default on the alpha builds though, so you shouldn't see it there.

Yes, when I have serial output disabled, the system isn't taken down although you see the CPU usage go to 100%.

I will perform another test in which I will connect the tablet for one, two and three hours (with serial output disabled) and then measure the duration of the 100% CPU usage. According to your explanation it should never be more than 1 second.

One thing I still can't explain are the zero lines of error code when the pen was in proximity of the tablet all the time. Have you any idea why this causes no errors at all?

comment:3 Changed 10 years ago by stippi

Very interesting. When the pen is not in proximation, the wacom tablet will never send an interrupt transfer that the driver scheduled. Instead, the driver will hit a timeout and reschedule the next transfer (with the same timeout). This was designed such that the transfer thread would exit cleanly and not block forever when the tablet is unplugged (this was necessary on BeOS R5 when the driver was written). Maybe there is something wrong with how the driver cleans up (respectively doesn't clean up) the timed out transfers.

comment:4 in reply to:  3 Changed 10 years ago by idefix

I have turned on the driver debugging code and you're correct: the amount of usb errors is exactly the amount of timed out transfers. notontablet.txt shows three B_TIMED_OUT events and three usb errors after the wacom: interrupt transfer - failure: line. Compare that to ontablet.txt showing no B_TIMED_OUT events and no usb errors.

Changed 10 years ago by idefix

Attachment: notontablet.txt added

serial log showing: tablet connected - waited less than 1 second - tablet disconnected

Changed 10 years ago by idefix

Attachment: ontablet.txt added

serial log showing: pen on tablet - tablet connected - waited less than 1 second - tablet disconnected - pen off tablet

comment:5 Changed 9 years ago by idefix

After some hours my Wacom tablet stopped working and the serial log showed (excerpt):

PMA: found no free slot to store 32 bytes (4999 tries left)
PMA: found no free slot to store 32 bytes (4998 tries left)
PMA: found no free slot to store 32 bytes (4997 tries left)
PMA: found no free slot to store 32 bytes (4996 tries left)
PMA: found no free slot to store 32 bytes (4995 tries left)
[...]
PMA: found no free slot to store 32 bytes (5 tries left)
PMA: found no free slot to store 32 bytes (4 tries left)
PMA: found no free slot to store 32 bytes (3 tries left)
PMA: found no free slot to store 32 bytes (2 tries left)
PMA: found no free slot to store 32 bytes (1 tries left)
PMA: found no free slot to store 32 bytes (0 tries left)
usb error uhci 0: failed to allocate a transfer descriptor
wacom: device_read(0x80fbd6e0) name = "input/wacom/usb/0" -> error queuing interrupt: -2147483648
usb error uhci 0: td (0x0213bc20) error: status: 0x254507ff; token: 0x00e88269;
usb error uhci 0: td (0x0213bd20) error: status: 0x254507ff; token: 0x00e88269;
usb error uhci 0: td (0x0213bd60) error: status: 0x254507ff; token: 0x00e88269;
usb error uhci 0: td (0x0213bda0) error: status: 0x254507ff; token: 0x00e88269;
usb error uhci 0: td (0x0213be00) error: status: 0x254507ff; token: 0x00e88269;
[...]
usb error uhci 0: td (0x0213b460) error: status: 0x254507ff; token: 0x00e88269;
usb error uhci 0: td (0x0213b7c0) error: status: 0x254507ff; token: 0x00e88269;
usb error uhci 0: td (0x0213b900) error: status: 0x254507ff; token: 0x00e88269;
usb error uhci 0: td (0x0213bb00) error: status: 0x254507ff; token: 0x00e88269;
usb error uhci 0: td (0x0213bb40) error: status: 0x254507ff; token: 0x00e88269;
usb hub 2: port 0: device removed
usb hub 2: port 0 disabled
wacom: device_removed() open: 1
wacom: device /dev/input/wacom/usb/0 still open -- marked for removal
void AddOnManager::MessageReceived(BMessage *) what: NMP_
Last message repeated 3 times.
[33musb_asix:[0m198.45.287:init_driver::ver.0.8.3

Attached is the full serial log (zipped, due to the large size).

Changed 9 years ago by idefix

Attachment: wacomstopsworking.zip added

serial log showing: tablet stopped working - tablet disconnected

comment:6 Changed 9 years ago by mmlr

Can you please check with hrev34915? It should fix both issues.

comment:7 Changed 9 years ago by idefix

Yes, the usb errors are gone. But the tablet will now only work when the pen is always in proximity of the tablet; as soon as the pen isn't in proximity the serial log shows:

wacom: interrupt transfer - failure: -2147483636

And, after that, the tablet won't work anymore. I have to reconnect the USB plug (and put the pen in proximity) for it to work again (until the pen isn't in proximity).

comment:8 in reply to:  7 Changed 9 years ago by Pete

Cc: pete.goodeve@… added

Replying to idefix:

Yes, the usb errors are gone. But the tablet will now only work when the pen is always in proximity of the tablet; [....]

I can confirm this exact behaviour with my Intuos tablet. I guess timeout errors are better than not working at all, but I hope a complete fix can be figured out.

comment:9 Changed 9 years ago by mmlr

Yeah, I've introduced a bug with that change. Can you please retest with hrev34932? Keeping the previous leak is definitely not an option, as it simply is a bug (after a long enough time it would exhaust the USB physical memory pool and kill off all other USB transfers).

comment:10 in reply to:  9 Changed 9 years ago by Pete

Replying to mmlr:

Yeah, I've introduced a bug with that change. Can you please retest with hrev34932? Keeping the previous leak is definitely not an option, as it simply is a bug (after a long enough time it would exhaust the USB physical memory pool and kill off all other USB transfers).

That's better! (:-)) Yep, it seems to do everything properly now. No errors at unplug, and it doesn't go dead with the pen removed.

comment:11 Changed 9 years ago by stippi

Resolution: fixed
Status: newclosed

Awesome work, mmlr! Sorry I never took the time to dig into this deeper myself.

comment:12 in reply to:  9 Changed 9 years ago by idefix

Also a big Thank you! from me! :)

comment:13 Changed 9 years ago by idefix

I think I've cheered too early.

I still see a linearity in the amount of usb errors and the time the pen isn't in proximity, but:

  • there are a lot less errors per time,
  • the amount seems to reset as soon as the pen is in proximity again.

Will test some more this weekend.

comment:14 Changed 9 years ago by Pete

Resolution: fixed
Status: closedreopened

Again it seems idefix is right. I guess I didn't leave the tablet connected long enough last night. There are very many fewer errors now though. What I see -- maybe one per minute -- is:

usb error uhci 0: td (0x020e1c20) error: status: 0x214507ff; token: 0x01208969;

The status is slightly different from both what idefix reported and what I got yesterday (0x204507ff). Is there any cheat-sheet for these codes anywhere?

comment:15 in reply to:  14 ; Changed 9 years ago by mmlr

Replying to Pete:

Again it seems idefix is right. I guess I didn't leave the tablet connected long enough last night. There are very many fewer errors now though. What I see -- maybe one per minute -- is:

usb error uhci 0: td (0x020e1c20) error: status: 0x214507ff; token: 0x01208969;

I don't quite understand: did these happen during normal use already before? I thought this was only about unplugging. There will always be such error output on unplug as there simply is a time between the device not being connected anymore and the stack getting the unplug event. These are totally harmless and expected.

It is entirely possible that the device stalls some of the transfers from time to time for various (vendor/device specific) reasons. This is largely unproblematic as long as the driver handles these cases gracefully (usb_hid at least does so). So if the device now works despite these errors I'd like to close this one. Actually since this bug report is specifically about the leaked transfers I'd like to close it in any case and then maybe open another ticket for the runtime issue.

The status is slightly different from both what idefix reported and what I got yesterday (0x204507ff). Is there any cheat-sheet for these codes anywhere?

You can always look it up in the UHCI specs of course. There's also uhci_decode in the tree which decodes the status value into it's fields (jam run :uhci_decode 0x204507ff, note that the 0x7ff actually indicates a 0-byte actual length). In general the result likely won't tell you much more. The UHCI/EHCI error codes are not very specific as they are made for indicating various hardware/lowlevel issues, higher level issues will generally simply result in stalls. OHCI is only slightly better.

The general rule is that stalls can happen from time to time and should be dealt with either the generic clear_feature(ENDPOINT_HALT) or with error recovery specific to the USB device class if defined (mass storage defines the mass storage reset as an error recovery method for example, other classes do similar things).

I hope this clarifies it a bit (and also answers your private mail).

comment:16 in reply to:  15 ; Changed 9 years ago by idefix

Replying to mmlr:

Replying to Pete:

Again it seems idefix is right. I guess I didn't leave the tablet connected long enough last night. There are very many fewer errors now though. What I see -- maybe one per minute -- is:

usb error uhci 0: td (0x020e1c20) error: status: 0x214507ff; token: 0x01208969;

I don't quite understand: did these happen during normal use already before? I thought this was only about unplugging.

I think he meant that for every minute since the last time the pen was in proximity of the tablet, there will be an usb error when you unplug the tablet. That's what I see anyway.

There will always be such error output on unplug as there simply is a time between the device not being connected anymore and the stack getting the unplug event. These are totally harmless and expected.

Yes, just as you said in comment:1, but then you wouldn't see this linearity (like I said in comment:2).

It is entirely possible that the device stalls some of the transfers from time to time for various (vendor/device specific) reasons. This is largely unproblematic as long as the driver handles these cases gracefully (usb_hid at least does so). So if the device now works despite these errors I'd like to close this one. Actually since this bug report is specifically about the leaked transfers I'd like to close it in any case and then maybe open another ticket for the runtime issue.

I predict that this will still cause my computer to exhaust the USB physical memory pool, but it will take longer and during that time the pen must not be in proximity. I will test this this weekend.

comment:17 in reply to:  16 Changed 9 years ago by Pete

Replying to idefix:

Replying to mmlr:

Replying to Pete:

Again it seems idefix is right. I guess I didn't leave the tablet connected long enough last night. There are very many fewer errors now though. What I see -- maybe one per minute -- is:

usb error uhci 0: td (0x020e1c20) error: status: 0x214507ff; token: 0x01208969;

I don't quite understand: did these happen during normal use already before? I thought this was only about unplugging.

I think he meant that for every minute since the last time the pen was in proximity of the tablet, there will be an usb error when you unplug the tablet. That's what I see anyway.

Yes -- thanks for clarifying my muddy statement. (:-/) They accumulate, but only show up at unplugging.

comment:18 in reply to:  15 Changed 9 years ago by Pete

Replying to mmlr:

[....]

The status is slightly different from both what idefix reported and what I got yesterday (0x204507ff). Is there any cheat-sheet for these codes anywhere?

You can always look it up in the UHCI specs of course. There's also uhci_decode in the tree which decodes the status value into it's fields (jam run :uhci_decode 0x204507ff, note that the 0x7ff actually indicates a 0-byte actual length). In general the result likely won't tell you much more. The UHCI/EHCI error codes are not very specific as they are made for indicating various hardware/lowlevel issues, higher level issues will generally simply result in stalls. OHCI is only slightly better.

Thanks. I'll check that out.

The general rule is that stalls can happen from time to time and should be dealt with either the generic clear_feature(ENDPOINT_HALT) or with error recovery specific to the USB device class if defined (mass storage defines the mass storage reset as an error recovery method for example, other classes do similar things).

I hope this clarifies it a bit (and also answers your private mail).

I think so. (Regarding my usb_midi question in my mail, I gather I was right to think the unplug errors are harmless there.) Thanks.

comment:19 in reply to:  16 Changed 9 years ago by idefix

Replying to idefix:

Replying to mmlr:

There will always be such error output on unplug as there simply is a time between the device not being connected anymore and the stack getting the unplug event. These are totally harmless and expected.

Yes, just as you said in comment:1, but then you wouldn't see this linearity (like I said in comment:2).

I have tested this some more and for every minute since the last time that the pen was in proximation there will be four errors when I unplug the tablet. There is an upper limit however: there won't be more than ~20 errors when unplugging the tablet.

I predict that this will still cause my computer to exhaust the USB physical memory pool, but it will take longer and during that time the pen must not be in proximity. I will test this this weekend.

Because of the upper limit in usb errors shown when unplugging the tablet, I'm not entirely sure if my prediction still holds. It does however cause my cardreader to stop working when the pen is not in proximation for ~4 minutes (see comment:ticket:2344:21).

comment:20 Changed 2 months ago by waddlesplash

Component: Drivers/Mouse/USBDrivers/Input/USB-HID

component -> USB-HID (new)

Note: See TracTickets for help on using tickets.