Opened 7 years ago

Closed 7 years ago

#8987 closed bug (fixed)

ps2: initial setup of command byte failed

Reported by: x-ist
Owned by: siarzhuk
Priority: normal
Milestone: R1
Component: Drivers/Keyboard/PS2
Version: R1/Development
Keywords: ps2 keyboard touchpad OHCI
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

This is haiku-hrevr1alpha4-44597.

Tested on Acer Aspire 7551G Phenom N390 2GHz
This ticket represents a sub-issue of #8984.
The built-in keyboard and touchpad do not work.

I built a debug version of the ps2 bus manager and the ps2_hid addon to get the debug traces in syslog, which is attached.
The following lines are obviously of interest:

KERN: ps2: ps2_command result 0xffffffff
KERN: ps2: set command byte: res 0xffffffff, cmdbyte 0x44
KERN: ps2: initial setup of command byte failed
KERN: ps2: ps2_service_exit enter
KERN: ps2: ps2_service_exit done
KERN: ps2: init failed!
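
For readers of the trace above, here is a minimal sketch of how the two hex values can be read. It assumes the conventional 8042 command byte bit layout and that the `res` value is a signed 32-bit status printed as unsigned; the helper names (`AsSignedStatus`, `DecodeCommandByte`) are made up for illustration and are not taken from the ps2 bus manager.

```cpp
#include <cstdint>

// Reinterpret the raw 32-bit value from the syslog as a signed status code.
// 0xffffffff then reads as -1, i.e. a generic error value.
inline int32_t AsSignedStatus(uint32_t raw)
{
	return static_cast<int32_t>(raw);
}

// Conventional 8042 controller command byte fields (assumed layout).
struct CommandByte {
	bool keyboardInterrupt;	// bit 0: interrupt for the first port
	bool auxInterrupt;		// bit 1: interrupt for the second port
	bool systemFlag;		// bit 2: system flag
	bool keyboardClockOff;	// bit 4: first port clock disabled
	bool auxClockOff;		// bit 5: second port clock disabled
	bool translation;		// bit 6: scancode translation
};

inline CommandByte DecodeCommandByte(uint8_t value)
{
	CommandByte result;
	result.keyboardInterrupt = (value & 0x01) != 0;
	result.auxInterrupt = (value & 0x02) != 0;
	result.systemFlag = (value & 0x04) != 0;
	result.keyboardClockOff = (value & 0x10) != 0;
	result.auxClockOff = (value & 0x20) != 0;
	result.translation = (value & 0x40) != 0;
	return result;
}
```

Under these assumptions, `cmdbyte 0x44` has only the translation and system flag bits set, and `res 0xffffffff` is simply -1, so the write of the command byte itself was reported as failed.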

Attachments (8)

syslog (301.6 KB) - added by x-ist 7 years ago.
syslog with traces from debug versions of ps2 bus manager and ps2_hid
syslog_with_atheros_removed (149.2 KB) - added by x-ist 7 years ago.
syslog_with_ohci_bus_removed (235.2 KB) - added by x-ist 7 years ago.
syslog_ohci_debug (319.6 KB) - added by x-ist 7 years ago.
ints (2.7 MB) - added by x-ist 7 years ago.
interrupts
ohci.zip (367.2 KB) - added by x-ist 7 years ago.
Sample ohci bus driver (to be copied into /system/add-ons/kernel/busses/usb)
FixHandoverFromSMM.patch (2.7 KB) - added by x-ist 7 years ago.
Keeps ownership change request interrupt enabled for handover.
FixHandoverFromSMM.2.patch (3.2 KB) - added by x-ist 7 years ago.

Change History (45)

Changed 7 years ago by x-ist

Attachment: syslog added

syslog with traces from debug versions of ps2 bus manager and ps2_hid

comment:1 Changed 7 years ago by siarzhuk

Could you please try to remove atheros wifi driver from the target image and repeat your test?

comment:2 in reply to:  1 Changed 7 years ago by x-ist

Replying to siarzhuk:

Could you please try to remove atheros wifi driver from the target image and repeat your test?

Did that. The new syslog is attached. It seems atheroswifi is actually not the root of the problem. The keyboard is still not working.

Changed 7 years ago by x-ist

Attachment: syslog_with_atheros_removed added

comment:3 Changed 7 years ago by siarzhuk

For some reason it cannot read anything from the ps2 registers at all. Could you please acquire syslog files with APIC disabled, and then with SMP disabled? Thanks.

comment:4 in reply to:  3 ; Changed 7 years ago by x-ist

Keywords: OHCI added

Replying to siarzhuk:

For some reason it cannot read anything from the ps2 registers at all. Could you please acquire syslog files with APIC disabled, and then with SMP disabled? Thanks.

Nope, that didn't help, but
using boot-time on-screen debugging, the last entries shown were about USB. Then I tried to boot with usb removed from /system/add-ons/kernel/busses and .. it worked! (new syslog attached) Afterwards I narrowed it down to OHCI only, i.e. removing only the OHCI bus driver from the usb folder still leaves me with a functioning keyboard + touchpad.

It seems the OHCI bus driver is interfering here.

Changed 7 years ago by x-ist

Attachment: syslog_with_ohci_bus_removed added

comment:5 in reply to:  4 ; Changed 7 years ago by mmlr

Replying to x-ist:

Afterwards I narrowed it down to OHCI only, i.e. removing only the OHCI bus driver from the usb folder still leaves me with a functioning keyboard + touchpad.

That is most probably related to legacy emulation. The first syslog mentions the SMM (System Management Mode, i.e. firmware/BIOS) being in control at the time OHCI is initialized and it not responding to the handover request. It is possible that the reset after that leaves things in a bad state. You could try disabling "USB legacy support" or similarly named options in the BIOS and see if that results in a working system as well.

comment:6 in reply to:  5 ; Changed 7 years ago by x-ist

Replying to mmlr:

That is most probably related to legacy emulation. The first syslog mentions the SMM (System Management Mode, i.e. firmware/BIOS) being in control at the time OHCI is initialized and it not responding to the handover request. It is possible that the reset after that leaves things in a bad state.

Should I make debug builds of something, ohci maybe, to get more traces?

You could try disabling "USB legacy support" or similarly named options in the BIOS and see if that results in a working system as well.

Unfortunately that BIOS doesn't offer anything to configure in this regard.

Changed 7 years ago by x-ist

Attachment: syslog_ohci_debug added

comment:7 in reply to:  6 Changed 7 years ago by x-ist

Just added another syslog with OHCI bus and bus_manager debug traces. Here's an excerpt:

KERN: usb stack 0: adding module busses/usb/ohci
KERN: usb ohci: searching devices
KERN: usb ohci: found device at IRQ 18
KERN: usb ohci -1: constructing new OHCI host controller driver
KERN: usb ohci -1: iospace offset: 0xd0504000
KERN: add_memory_type_range(159, 0xd0504000, 0x1000, 0)
KERN: usb ohci -1: mapped operational registers: 0x81771000
KERN: usb ohci -1: version 1.0, legacy support
KERN: usb stack 0: allocating 256 bytes for USB OHCI Host Controller Communication Area
KERN: usb stack 0: area = 160, size = 4096, log = 0x81772000, phy = 0x505a000
KERN: usb ohci -1: smm is in control of the host controller
KERN: usb error ohci -1: smm does not respond. resetting...

Could it be that IRQ 18 was used by the ps2 interface previously and is then grabbed by OHCI?

comment:8 Changed 7 years ago by anevilyak

Unlikely, PS2 is generally fixed at IRQ 12 if memory serves.

Changed 7 years ago by x-ist

Attachment: ints added

interrupts

comment:9 Changed 7 years ago by x-ist

Added output of ints from kernel_debugger after booting without OHCI.

comment:10 in reply to:  5 Changed 7 years ago by x-ist

That is most probably related to legacy emulation. The first syslog mentions the SMM (System Management Mode, i.e. firmware/BIOS) being in control at the time OHCI is initialized and it not responding to the handover request.

I had a look at the FreeBSD implementation of that handover. That's where we borrowed this part of the code from, right? Actually there is a difference in how the ownership change request is performed.

FreeBSD:

/* Determine in what context we are running. */
ctl = OREAD4(sc, OHCI_CONTROL);
if (ctl & OHCI_IR) {
	/* SMM active, request change */
	DPRINTF("SMM active, request owner change\n");
	OWRITE4(sc, OHCI_COMMAND_STATUS, OHCI_OCR);
	...

Haiku:

// Determine in what context we are running (Kindly copied from FreeBSD)
uint32 control = _ReadReg(OHCI_CONTROL);
if (control & OHCI_INTERRUPT_ROUTING) {
	TRACE_ALWAYS("smm is in control of the host controller\n");
	uint32 status = _ReadReg(OHCI_COMMAND_STATUS);
	_WriteReg(OHCI_COMMAND_STATUS, status | OHCI_OWNERSHIP_CHANGE_REQUEST);

I wondered why we are OR-ing the status with OHCI_OWNERSHIP_CHANGE_REQUEST, which FreeBSD does not.
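
A small model may help with this question. It assumes that the command status register behaves as "write to set" (bits written as 1 become set, bits written as 0 are left unchanged), which is how the OHCI specification describes it as far as I can tell. `WriteToSetRegister` and the two request helpers are hypothetical names for illustration only, not code from either driver.

```cpp
#include <cstdint>

// Ownership change request bit in the command status register (bit 3).
constexpr uint32_t kOwnershipChangeRequest = 1u << 3;

// Hypothetical model of a "write to set" register: writing a 1 sets the
// bit, writing a 0 leaves the current bit untouched.
struct WriteToSetRegister {
	uint32_t value = 0;

	uint32_t Read() const { return value; }
	void Write(uint32_t bits) { value |= bits; }
};

// FreeBSD-style request: write only the request bit.
inline void RequestByPlainWrite(WriteToSetRegister& reg)
{
	reg.Write(kOwnershipChangeRequest);
}

// Haiku-style request: read the register, OR in the request bit, write back.
inline void RequestByReadModifyWrite(WriteToSetRegister& reg)
{
	uint32_t status = reg.Read();
	reg.Write(status | kOwnershipChangeRequest);
}
```

If that assumption holds, both variants leave the register in the same state, so the extra OR would be harmless but also unnecessary, and the difference between the two drivers would have to lie elsewhere in the handover sequence.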

comment:11 Changed 7 years ago by x-ist

Has a Patch: set

comment:12 Changed 7 years ago by x-ist

The patch attached makes my ps2 keyboard and touchpad work together with the additional USB keyboard + mouse attached. It definitely fixes #8987, #8984 and possibly #7897, i.e. it also fixes occasional Media server crashes during System shutdown I observed.
Patch description:

  • OHCI interrupts must not be disabled when performing the ownership change request. Otherwise the SMM does not reset the OHCI_INTERRUPT_ROUTING flag as expected. Thus disabling the OHCI interrupts is postponed until after the handover procedure.
  • Removed excess reset upon a failed handover since we do a reset anyway, regardless of the handover result.

comment:13 in reply to:  11 ; Changed 7 years ago by siarzhuk

Nice catch. Thank you!

Replying to x-ist:

From 4f5df7df8b17d5df3f5ed4f576a2446c2e68712e Mon Sep 17 00:00:00 2001
From: Yourself <user@shredder.(none)>
Date: Sun, 23 Sep 2012 11:18:37 +0000
Subject: [PATCH] Postpone disabling of OHCI interrupts to after handover from SMM.

Could you please provide your real name and e-mail? They will be required when committing this patch. Thank you.


comment:14 in reply to:  13 Changed 7 years ago by x-ist

Replying to siarzhuk:

Could you please provide your real name and e-mail? They will be required when committing this patch. Thank you.

Hmm, there's always something I forget .. :)
Just added to the patch.

comment:15 in reply to:  12 Changed 7 years ago by mmlr

Replying to x-ist:

It definitely fixes #8987, #8984 and possibly #7897, i.e. it also fixes occasional Media server crashes during System shutdown I observed.

I don't exactly see how this would be related to the media_server. Can you elaborate? I see that #8588 reports something similar, however I would then tend to say that an interrupt sharing issue existed with some audio hardware that is more related to the audio driver than OHCI.

Patch description:

  • OHCI interrupts must not be disabled when performing the ownership change request. Otherwise the SMM does not reset the OHCI_INTERRUPT_ROUTING flag as expected. Thus disabling the OHCI interrupts is postponed until after the handover procedure.

Where do you get the "must not be disabled" from? Just from your testing or is there an actual reference? The specs are rather terse about the whole handover process, so it's quite hard to know for sure what the correct environment for the handover is. So if there is a reference I'd like to have it included in the commit message, otherwise I'd like to have the commit message reworded to sound less authoritative.

As you can see in the blame list I've moved it up there in hrev41513 in an attempt to maybe help on some chipsets that got stuck at that point. I think that at least on some of these systems the real problem came from the C1E issues, for which a fix has been committed since (see #3999). The original idea was that there might be firmware that doesn't disable interrupts on handover, leading to interrupt storms right after the interrupts are routed to the OS. Please read through #8085 where a pretty similar patch has been developed and exactly that fear of random interrupt storms has been brought up by me (comments 15, 17) and confirmed by the author of the patch (comment 18). Therefore I don't think it is a good idea to commit this patch as is. As I mentioned in my first comment (15) in the other ticket, an alternative to disabling all interrupts is to disable all but the ownership change request interrupt. That would seem more correct and logical to me in either case. Please see if that alone fixes your problems as well.

  • Removed excess reset upon a failed handover since we do a reset anyway, regardless of the handover result.

The original idea was to only reset in the error case, that's where this reset came from. The general reset was introduced in hrev22625 and came from FreeBSD, where the same was or may still be done. The argument for not resetting is that it takes a lot of time to complete and happens in a boot state where nothing else can happen, hence prolongs the boot up time (and since there often are multiple OHCI controllers, it does so quite considerably). If at all possible we should try to later remove that again. Nothing speaks against adding the on-error reset back at that point of course.

comment:16 Changed 7 years ago by x-ist

Replying to mmlr:

I don't exactly see how this would be related to the media_server. Can you elaborate?

I can try at least. It's a bit like stumbling in fog for me :)

I see that #8588 reports something similar, however I would then tend to say that an interrupt sharing issue existed with some audio hardware that is more related to the audio driver than OHCI.

I didn't observe any kind of audio issues, nor did I see anything in the syslog. My observations are:

  • On my notebook the OHCI handover like it is implemented now blocks PS2 (somehow). In most cases a trip to KDL (white bar as explained in #8984) is unavoidable.
  • Removing the ps2_hid driver is sufficient to avoid the "white bar" state.
  • Starting Haiku without the USB RF receiver stick helps too.
  • However, then upon shutdown the system freezes, i.e. it stops at the dialog "Asking other processes ..." being unresponsive. Sometimes the media addon server crash dialog appears instead. The click on "Debug" freezes the system finally.

So there must be some kind of triangulation of issues concerning PS2, OHCI and media addon server.

  • OHCI interrupts must not be disabled when performing ownership change request...

Where do you get the "must not be disabled" from?

Not sure whether I understand it correctly but 6.5.8 of

ftp://ftp.compaq.com/pub/supportinformation/papers/hcir1_0a.pdf

sounds like the Host Controller Driver has to ensure that

... an interrupt is generated (unless it is masked) whenever ownership of the Host Controller is passed to and from the operating system’s Host Controller Driver and any SMM-based Host Controller Driver in the system.

As you can see in the blame list I've moved it up there in hrev41513 in an attempt to maybe help on some chipsets that got stuck at that point.

Prior to proposing the patch I found the change in hrev41513 but saw that the handover procedure then differs from the FreeBSD version, which is assumed to be reliable I believe.

Please read through #8085 where a pretty similar patch has been developed ...

That one I missed :P

... an alternative to disabling all interrupts is to disable all but the ownership change request interrupt. That would seem more correct and logical to me in either case. Please see if that alone fixes your problems as well.

Guess what .. It does! Would that be the common denominator then?

  • Removed excess reset upon a failed handover ...

If at all possible we should try to later remove that again. Nothing speaks against adding the on-error reset back at that point of course.

Then adding a ToDo here and there should be appropriate?

Changed 7 years ago by x-ist

Attachment: ohci.zip added

Sample ohci bus driver (to be copied into /system/add-ons/kernel/busses/usb)

comment:17 Changed 7 years ago by x-ist

Those who might have a similar issue could try to use the attached OHCI bus driver. This driver disables all but the ownership change request interrupt upon handover from SMM. That seems to work for the three notebooks I have around. Testing on more systems would be definitely helpful.
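
The idea behind that driver can be sketched with a mock controller. This is a simplified model under stated assumptions, not the code in ohci.zip: `MockController` and `TakeOverFromSmm` are hypothetical names, the firmware is modelled as reacting to the ownership change request only while the ownership change interrupt is still enabled (the behaviour observed in this ticket), and the master interrupt enable bit is ignored.

```cpp
#include <cstdint>

constexpr uint32_t kInterruptRouting = 1u << 8;		// set while SMM owns the controller
constexpr uint32_t kOwnershipChange = 1u << 30;		// ownership change interrupt bit
constexpr uint32_t kAllInterrupts = 0xffffffffu;

// Hypothetical stand-in for the host controller plus SMM firmware.
struct MockController {
	uint32_t control = kInterruptRouting;
	uint32_t interruptEnable = kAllInterrupts;

	void DisableInterrupts(uint32_t mask) { interruptEnable &= ~mask; }

	// Assumption: the firmware only notices the request if the ownership
	// change interrupt is still enabled, and then clears the routing bit.
	void RequestOwnershipChange()
	{
		if ((interruptEnable & kOwnershipChange) != 0)
			control &= ~kInterruptRouting;
	}
};

// Returns true if the controller was handed over to the OS.
inline bool TakeOverFromSmm(MockController& hc, bool keepOwnershipChangeEnabled)
{
	// Disable interrupts first to avoid storms before a handler exists.
	uint32_t mask = kAllInterrupts;
	if (keepOwnershipChangeEnabled)
		mask &= ~kOwnershipChange;
	hc.DisableInterrupts(mask);

	if ((hc.control & kInterruptRouting) != 0)
		hc.RequestOwnershipChange();

	return (hc.control & kInterruptRouting) == 0;
}
```

In this model, calling `TakeOverFromSmm(hc, true)` succeeds while `TakeOverFromSmm(hc, false)` leaves the routing bit set, which mirrors the "smm does not respond" case from the syslogs.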

comment:18 Changed 7 years ago by x-ist

In order to get a rough picture of how well OHCI works on different systems when all interrupts are disabled prior to handover, I googled for "haiku attachment syslog" with a constrained time frame, starting from 2011-05-15 (date of hrev41513) until now. The resulting syslogs, regardless of their actual problem description, show a quite clear trend:

Apart from those systems whose syslog contains

usb ohci: no devices found

most systems report

usb ohci -1: smm is in control of the host controller
usb error ohci -1: smm does not respond. resetting...

Before hrev41513 I didn't find syslogs where "smm does not respond". Hence I tend to believe that this change pretty much broke OHCI for many systems. Unfortunately that does not say anything about whether keeping the ownership change request interrupt enabled would work better.
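
For anyone who wants to repeat that survey on their own collection of syslogs, the classification can be sketched as below. The two search strings are the ones quoted above; the enum and function names are hypothetical, and a log that matches neither string is only reported as "no failure logged", which does not prove that the handover worked.

```cpp
#include <string>

enum class OhciHandover {
	NoController,		// "usb ohci: no devices found"
	SmmUnresponsive,	// "smm does not respond"
	NoFailureLogged		// neither message present
};

// Classify the full text of one syslog by the OHCI messages it contains.
inline OhciHandover ClassifySyslog(const std::string& text)
{
	if (text.find("smm does not respond") != std::string::npos)
		return OhciHandover::SmmUnresponsive;
	if (text.find("usb ohci: no devices found") != std::string::npos)
		return OhciHandover::NoController;
	return OhciHandover::NoFailureLogged;
}
```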

Changed 7 years ago by x-ist

Attachment: FixHandoverFromSMM.patch added

Keeps ownership change request interrupt enabled for handover.

comment:19 Changed 7 years ago by x-ist

As I don't know how we should proceed here I updated my patch which works with my system. At least it should not make things worse for others.

comment:20 Changed 7 years ago by korli

Please also check patch from ticket #8588 ( http://dev.haiku-os.org/attachment/ticket/8588/revert-hrev41513.patch ) or have your patch tested by the ticket owner.

comment:21 in reply to:  20 Changed 7 years ago by x-ist

Replying to korli:

... or have your patch tested by the ticket owner.

The reporter of #8588 (drcouzelis) confirmed the patch to work correctly.

comment:22 Changed 7 years ago by korli

@mmlr please comment on whether the patch FixHandoverFromSMM.patch can be applied as is.

comment:23 in reply to:  22 ; Changed 7 years ago by mmlr

Replying to korli:

@mmlr please comment on whether the patch FixHandoverFromSMM.patch can be applied as is.

The first comment should be rewritten as it sounds a bit "constructed" now and the "ToDo"s should be "TODO"s for highlights to work in some editors. The commit message isn't formatted properly (line lengths, 64 chars on first, 72 chars on the others, maybe indent the lines below the bullet point asterisk). Other than that the change is simple enough, seems more correct in any case and therefore should be applied.
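
The line length part of this can be checked mechanically. The following is only a sketch of such a check, using the 64 and 72 character limits mentioned above; `CommitMessageFits` is a hypothetical helper, not an existing Haiku tool, and it counts bytes rather than displayed characters.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Returns true if the first line is at most firstLimit characters long
// and every following line is at most otherLimit characters long.
inline bool CommitMessageFits(const std::string& message,
	std::size_t firstLimit = 64, std::size_t otherLimit = 72)
{
	std::istringstream stream(message);
	std::string line;
	bool first = true;
	while (std::getline(stream, line)) {
		if (line.size() > (first ? firstLimit : otherLimit))
			return false;
		first = false;
	}
	return true;
}
```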

comment:24 in reply to:  23 ; Changed 7 years ago by x-ist

Replying to mmlr:

The first comment should be rewritten as it sounds a bit "constructed" now ...

I could only try to guess what a better comment might look like. It motivates the change by means of the observations I made and your suggestion. I don't have hard arguments unfortunately. The other objections will be resolved of course.

comment:25 in reply to:  24 Changed 7 years ago by mmlr

Replying to x-ist:

Replying to mmlr:

The first comment should be rewritten as it sounds a bit "constructed" now ...

I could only try to guess what a better comment might look like. It motivates the change by means of the observations I made and your suggestion. I don't have hard arguments unfortunately. The other objections will be resolved of course.

I more meant it was a bit rough to read, not that the content was incorrect in any way. I'd suggest making it a bit more elaborate as well. Something like: "When the handover from SMM takes place, all interrupts are routed to the OS. As we don't yet have an interrupt handler installed, this may cause interrupt storms if the firmware does not disable the interrupts during handover. Therefore we disable interrupts before requesting ownership. We have to keep the ownership change interrupt enabled though, as otherwise the SMM will not be notified of the ownership change request we trigger below." More or less summing up the discussion in here so that one doesn't have to wonder next time.

Changed 7 years ago by x-ist

Attachment: FixHandoverFromSMM.2.patch added

comment:26 Changed 7 years ago by x-ist

Has a Patch: unset

Oh dear, please tell me the patch is ok now :P

comment:27 Changed 7 years ago by x-ist

I should have seen that before! There is a Linux OHCI implementation which does exactly the same thing: it enables the OC interrupt for handover. http://lxr.free-electrons.com/source/drivers/usb/host/ohci-hcd.c#L524 After all, I now have another good source to peek at in the future :)

comment:28 Changed 7 years ago by korli

Applied in hrev44689. Closing this ticket, thanks!

comment:29 Changed 7 years ago by korli

Resolution: fixed
Status: new → closed

comment:30 in reply to:  28 ; Changed 7 years ago by x-ist

Replying to korli:

Applied in hrev44689.

I'm not up to date regarding the state of the ongoing alpha4 release process and I believe we have had feature freeze already. However, how good are the chances of integrating this change into the Alpha4 branch? I could imagine that some unlucky newcomers (who happen to use OHCI that is) could be quite disappointed by a non-working keyboard/mouse without even getting to know this great OS. Additionally, problems with other system components appear then too, as we know.

comment:31 in reply to:  30 ; Changed 7 years ago by korli

Replying to x-ist:

Replying to korli:

Applied in hrev44689.

I'm not up to date regarding the state of the ongoing alpha4 release process and I believe we have had feature freeze already. However, how good are the chances of integrating this change into the Alpha4 branch? I could imagine that some unlucky newcomers (who happen to use OHCI that is) could be quite disappointed by a non-working keyboard/mouse without even getting to know this great OS. Additionally, problems with other system components appear then too, as we know.

I personally have no objection to including this change in alpha4. mmlr, what's your opinion?

comment:32 in reply to:  31 ; Changed 7 years ago by mmlr

Replying to korli:

I personally have no objection to including this change in alpha4. mmlr, what's your opinion?

Yes, please go ahead and cherry pick it over to alpha4. The change makes the behaviour more correct in any case and, as researched above, should correct a bug that affects many if not most OHCI users.

comment:33 in reply to:  32 Changed 7 years ago by x-ist

Replying to mmlr:

Yes, please go ahead and cherry pick it over to alpha4.

Thank you both!

comment:34 Changed 7 years ago by luroh

Added to R1/Alpha4 in hrevr1alpha4-44643.

comment:35 Changed 7 years ago by x-ist

Resolution: fixed
Status: closed → reopened

This is hrev44584-x86gcc2hybrid, a downloaded nightly. The symptoms reappeared.

usb ohci -1: smm is in control of the host controller
usb error ohci -1: smm does not respond. resetting...

As if the bug was never fixed. I don't see any changes in ohci though.

comment:36 Changed 7 years ago by x-ist

Well, the latest change in trunk is at hrev44869, but the latest hybrid nightly shown on haiku-files is hrev44584, which was the one I installed. Seems like the nightlies page isn't showing recent builds. Wasn't that fixed weeks ago already?

Please close this ticket.

comment:37 in reply to:  36 Changed 7 years ago by anevilyak

Resolution: fixed
Status: reopened → closed

Replying to x-ist:

Well, the latest change in trunk is at hrev44869, but the latest hybrid nightly shown on haiku-files is hrev44584, which was the one I installed. Seems like the nightlies page isn't showing recent builds. Wasn't that fixed weeks ago already?

Not an error, nightlies have been temporarily disabled due to the release of A4, they'll be back soonish.
