Opened 2 years ago

Last modified 16 months ago

#17852 new bug

VirtualBox KDL "AT Keyboard 1 watcher" on boot

Reported by: avanspector Owned by: nobody
Priority: high Milestone: Unscheduled
Component: Drivers/Input/PS2/Keyboard Version: R1/Development
Keywords: Cc: pulkomandy
Blocked By: Blocking: #17106, #17873, #18524
Platform: x86-64

Description

Both livecd and installed systems.

Attachments (2)

Снимок экрана 2022-07-27 181008.png (42.6 KB ) - added by avanspector 2 years ago.
VirtualBox_Haiku_25_01_2023_18_21_22.png (31.5 KB ) - added by gridsleep 22 months ago.
Running in Virtualbox 6.1, Windows 10 64, MSI laptop. Unable to perform dis, VM is frozen immediately after banner screen. Can only restart or shut down.

Download all attachments as: .zip

Change History (24)

comment:1 by waddlesplash, 2 years ago

Component: - GeneralDrivers/Input/PS2/Keyboard

What hrev, please?

in reply to:  1 comment:2 by avanspector, 2 years ago

Replying to waddlesplash:

What hrev, please?

56231, 56272, 56315 as i tested

in reply to:  1 comment:3 by avanspector, 2 years ago

Replying to waddlesplash:

What hrev, please?

Turns out it was crashing, because i gave the VM only 1 core. If give it 2 cores, works fine. Btw, is it necessary to be that way? Thought system is pretty light on things like this

comment:4 by waddlesplash, 2 years ago

It definitely shouldn't crash with only one core. In fact if anything I would expect a crash like this to happen with 2+ cores and not just one. That's extremely strange...

comment:5 by waddlesplash, 2 years ago

Blocking: 17873 added

comment:6 by waddlesplash, 2 years ago

Cc: pulkomandy added

There is now another report of this issue. CC'ing PulkoMandy who recently modified the PS/2 keyboard code, perhaps something was "shaken loose" there?

comment:7 by pulkomandy, 2 years ago

One thing I'm confused about.

The error says it is at address 0x864c6e8f in kernel space.

By looking at the register dump I see a register set to 0xffffffff864c6e8f (that would be a pointer to something allocated on the stack I think).

Was that pointer accidentally cast to 32 bits somewhere? Is that just a KDL display problem/quirk or is that really the root cause of this panic?

Can we get a disassembly of the code that's being run? (type "dis" at the KDL prompt).

comment:8 by waddlesplash, 2 years ago

Anyone who can reproduce this issue reliably, can you reply to PulkoMandy's request above (i.e. run "dis" at the KDL prompt and upload a new KDL log?)

by gridsleep, 22 months ago

Running in Virtualbox 6.1, Windows 10 64, MSI laptop. Unable to perform dis, VM is frozen immediately after banner screen. Can only restart or shut down.

comment:9 by pulkomandy, 22 months ago

That makes sense, since the keyboard driver isn't working. Can VirtualBox emulate a serial port and allow sending the commands there?

comment:10 by korli, 22 months ago

it looks like the crash happens at https://cgit.haiku-os.org/haiku/tree/src/add-ons/kernel/bus_managers/ps2/ps2_dev.cpp#n434

r13 contains 0x100000000 which theorically shouldn't be a problem because the low 32-bit is 0 (the first output byte index), but on x86_64 the generated code is: movzbl (%rax,%r13,1),%edi so it uses the 64-bit value leading to an address like 0x864c6e8f.

The question is why r13 comes to the value of 0x100000000. it is being set to 0xffffffffffffffff, then add one. This could simply be another VirtualBox bug. The exact VirtualBox version where this is reproduced would be interessant, while also trying with the last 6.1.x release. I couldn't find something telling in the changelog.

Last edited 22 months ago by pulkomandy (previous) (diff)

comment:11 by pulkomandy, 22 months ago

One case that would cause this and seems not unlikely to happen by accident. The setting of r13 uses a mov with sign extension:

From https://www.felixcloutier.com/x86/mov:

C7 /0 id	        MOV r/m32, imm32	MI	Valid	Valid	Move imm32 to r/m32.
REX.W + C7 /0 id	MOV r/m64, imm32	MI	Valid	N.E.	Move imm32 sign extended to 64-bits to r/m64.

As you can see, the normal version of C7 is a simple MOV. But the 64bit version with the REX.W prefix should sign-extend. This seems easy to overlook so my guess is Virtualbox somehow missed it? However I don't know how their internal works and couldn't find where they implement this instruction.

Last edited 22 months ago by pulkomandy (previous) (diff)

comment:12 by gridsleep, 22 months ago

The problem in my case seems to be due to using a USB keyboard, since three keys of my MSI laptop are dead and can only be simulated using ALT-codes. I started VirtualBox one more time and it made it to the install screen (with the USB keys still plugged in). After the installation the error came back. I unplugged the USB keyboard and Haiku is starting up properly. I will have to replace the MSI keybed which I was avoiding due to the trouble. The laptop os out of warranty so sending it to CA is out of the question. Thanks for you attention. I hope this information helps someone.

comment:13 by pulkomandy, 22 months ago

For completeness, this is discussed on https://discuss.haiku-os.org/t/error-at-start-of-virtualbox-install-keyboard-related/13029 and it happened there with Virtualbox 6.1.30.

comment:14 by korli, 22 months ago

The host CPU would also be interessant to compare. Interessant is that clang tricks and computes everything from 0 instead of -1. We could also adjust the code to have the same behavior with gcc:

diff --git a/src/add-ons/kernel/bus_managers/ps2/ps2_dev.cpp b/src/add-ons/kernel/bus_managers/ps2/ps2_dev.cpp
index 7637644427..df895c89e9 100644
--- a/src/add-ons/kernel/bus_managers/ps2/ps2_dev.cpp
+++ b/src/add-ons/kernel/bus_managers/ps2/ps2_dev.cpp
@@ -380,11 +380,10 @@ standard_command_timeout(ps2_dev* dev, uint8 cmd, const uint8* out,
        bigtime_t start;
 #endif
        int32 sem_count;
-       int i;
 
        TRACE("ps2: ps2_dev_command cmd 0x%02x, out-count %d, in-count %d, "
                "dev %s\n", cmd, out_count, in_count, dev->name);
-       for (i = 0; i < out_count; i++)
+       for (int i = 0; i < out_count; i++)
                TRACE("ps2: ps2_dev_command tx: 0x%02x\n", out[i]);
 
        res = get_sem_count(dev->result_sem, &sem_count);
@@ -402,7 +401,7 @@ standard_command_timeout(ps2_dev* dev, uint8 cmd, const uint8* out,
        dev->result_buf = in;
 
        res = B_OK;
-       for (i = -1; res == B_OK && i < out_count; i++) {
+       for (int i = 0; res == B_OK && i < out_count + 1; i++) {
 
                atomic_and(&dev->flags,
                        ~(PS2_FLAG_ACK | PS2_FLAG_NACK | PS2_FLAG_GETID | PS2_FLAG_RESEND));
@@ -422,7 +421,7 @@ standard_command_timeout(ps2_dev* dev, uint8 cmd, const uint8* out,
 
                res = ps2_wait_write();
                if (res == B_OK) {
-                       if (i == -1) {
+                       if (i == 0) {
                                if (cmd == PS2_CMD_GET_DEVICE_ID)
                                        atomic_or(&dev->flags, PS2_FLAG_CMD | PS2_FLAG_GETID);
                                else if (cmd == PS2_CMD_RESEND)
@@ -431,7 +430,7 @@ standard_command_timeout(ps2_dev* dev, uint8 cmd, const uint8* out,
                                        atomic_or(&dev->flags, PS2_FLAG_CMD);
                                ps2_write_data(cmd);
                        } else {
-                               ps2_write_data(out[i]);
+                               ps2_write_data(out[i - 1]);
                        }
                }
 
@@ -485,7 +484,7 @@ standard_command_timeout(ps2_dev* dev, uint8 cmd, const uint8* out,
 #ifdef TRACE_PS2_DEV
                        TRACE("ps2: ps2_dev_command wait for input res 0x%08" B_PRIx32 ", "
                                "wait-time %" B_PRId64 "\n", res, system_time() - start);
-                       for (i = 0; i < in_count; i++)
+                       for (int i = 0; i < in_count; i++)
                                TRACE("ps2: ps2_dev_command rx: 0x%02x\n", in[i]);
 #endif
                }

comment:15 by pulkomandy, 22 months ago

I asked on Mastodon and someone confirmed they can reproduce the problem in VirtualBox 1.6.30 but not in the latest 7.x release.

https://mastodon.tetaneutral.net/@jookia@social.tchncs.de/109757725578214293

So, yes, we can workaround it, but it's likely that this version of Virtualbox will fail in other places as well?

comment:16 by korli, 22 months ago

Looks like Virtualbox fixes alike bugs lately, ie https://www.virtualbox.org/changeset/96860/vbox/trunk/

I suppose the bug isn't trivially in other places, because normally this would run on the cpu, but in this case, it looks like the code is interpreted by virtualbox.

comment:17 by pulkomandy, 22 months ago

Yes, reading the virtualbox manual (for version 6) it says that they will revert to software emulation for example when interrupts are disabled, and a lot more for kernel code than for user code.

I was looking at the code for the MOV instruction initializing R13 but did not find anything too strange. It is implemented here: https://www.virtualbox.org/browser/vbox/trunk/src/VBox/VMM/VMMAll/IEMAllInstructionsOneByte.cpp.h?rev=96860#L6473

The access to the operand is done using IEM_OPCODE_GET_NEXT_S32_SX_U64 which can be implemented in various different ways depending on both compile time and runtime factors. All it takes is one of these variants being incorrect. I checked some of them and they look ok, relying on the C compiler to do the sign extension by doing some casts between different types.

In VBox 1.6.30 this macro was defined in IEMAll.cpp but it is not there anymore. It ended up calling functions like https://www.virtualbox.org/browser/vbox/trunk/src/VBox/VMM/VMMAll/IEMAll.cpp?rev=96860#L1419 (or the non-slow variant if possible).

Last edited 22 months ago by pulkomandy (previous) (diff)

comment:19 by waddlesplash, 22 months ago

Based on the stack trace (and the panic message confirms this), we should not be running in "interrupts disabled" mode here. So, are we in fact doing so? That sounds wrong. Alternatively, is there some other reason VirtualBox has chosen to use emulation instead of the recompiler? Maybe we can use this to identify why Haiku is so slow in VirtualBox.

comment:20 by pulkomandy, 22 months ago

Here is the full page of documentation about software virtualization. Interrupts are just one of many things that can trigger it: https://docs.oracle.com/en/virtualization/virtualbox/6.0/admin/swvirt-details.html

Note that this seems to be gone in version 7 of Virtualbox, I think the usage of hardware technologies largely replaced it.

comment:21 by waddlesplash, 16 months ago

Blocking: 17106 added

comment:22 by waddlesplash, 16 months ago

Blocking: 18524 added
Note: See TracTickets for help on using tickets.