Opened 14 months ago

Last modified 2 months ago

#18593 new bug

Autovectorization (SSE2+) causes issues in QEMU following GCC 13 upgrade

Reported by: waddlesplash
Owned by: nobody
Priority: normal
Milestone: R1/beta6
Component: System/Kernel
Version: R1/Development
Keywords:
Cc:
Blocked By: #18541, #18550, #18562
Blocking:
Platform: All

Description (last modified by waddlesplash)

This is a meta-ticket created to encompass the various symptoms which seem to have implicit SSE2 usage from GCC 13 optimizations as a common cause.

In summary:

  • QEMU/KVM: Hangs on rocket during network device initialization, "emulation failure" in console (#18541), sometimes reported as "paused" (#18562)
  • Hyper-V: GPEs, SMEP violations, READ/WRITE FAULTs in KDL, etc. (#18550)
  • VMware: READ/WRITE FAULT on KDL backtraces into userland, area contains, and a bunch of other KDL commands (seen in #17233).

The issues in QEMU/KVM do not occur in VMware and vice versa. Compiling the kernel and drivers with -mno-sse2 (i.e. leaving SSE(1) enabled for standard floating-point usage) seems to resolve the problems.
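For illustration only, here is a minimal sketch of the kind of loop that GCC may autovectorize into SSE2 instructions, together with a GCC-specific way to opt a single function out. The function names and the NO_AUTOVECTORIZE macro are hypothetical and do not come from the Haiku sources; this ticket does not say which flags hrev57286 actually changed. The command-line counterpart of the attribute is GCC's -fno-tree-vectorize option.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC-only: request that one function is not autovectorized.
   On other compilers the (hypothetical) macro expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_AUTOVECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_AUTOVECTORIZE
#endif

/* An element-wise loop like this is a typical autovectorization
   candidate: with vectorization active, the compiler may emit packed
   SSE2 integer adds instead of one scalar add per element. */
void
add_arrays(uint32_t* dst, const uint32_t* a, const uint32_t* b, size_t count)
{
	for (size_t i = 0; i < count; i++)
		dst[i] = a[i] + b[i];
}

/* Same computation, but GCC is asked to keep this function scalar. */
NO_AUTOVECTORIZE
void
add_arrays_scalar(uint32_t* dst, const uint32_t* a, const uint32_t* b,
	size_t count)
{
	for (size_t i = 0; i < count; i++)
		dst[i] = a[i] + b[i];
}
```

Both functions return identical results; only the generated instructions may differ. Comparing the output of `gcc -O3 -S` for the two is one way to see whether vector instructions were emitted.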

Change History (26)

comment:1 by waddlesplash, 14 months ago

Note: regular QEMU without KVM doesn't have any issues, it works fine. To my knowledge, none of these problems reproduce on bare metal, either.

comment:2 by waddlesplash, 14 months ago

The problem isn't related to usage of XSAVE, it appears. I disabled that in arch_cpu.cpp and the problems in VMware noted above persisted.

comment:3 by waddlesplash, 14 months ago

Disabled autovectorization in hrev57286, which should "fix" the above problems.

comment:4 by waddlesplash, 14 months ago

Blocked By: 18550 added
Description: modified (diff)

comment:5 by jmairboeck, 14 months ago

For reference: the READ/WRITE FAULTs also occur in Hyper-V the same as VMware (see the comments of the Hyper-V ticket).

comment:6 by tqh, 14 months ago

Just to make sure, does your native CPU support SSE2? It sounds like maybe you run QEMU/KVM with incompatible emulation flags?

comment:7 by waddlesplash, 14 months ago

SSE2 is required to be present on all x86_64 machines, so that won't be the problem here.

comment:8 by waddlesplash, 14 months ago

Blocked By: 17233 removed

comment:9 by tqh, 14 months ago

Right, but what CPU are you running QEMU/KVM with? The default is quite crippled, as it is a safe option for live migration of VMs, so it probably has very few features. Here is a bit of info: https://www.qemu.org/docs/master/system/i386/cpu.html

comment:10 by waddlesplash, 14 months ago

All tickets in question were tested against Haiku x86_64, I think, so it doesn't matter what CPU is selected for emulation as in order for Haiku to boot at all, there has to be SSE2.

comment:11 by tqh, 14 months ago

QEMU doesn't enable all features of the CPU unless you use -cpu host. It only enables a small subset, so I'm not sure it includes SSE2. Please try with one of the options that has more than a "baseline" x86 CPU.

comment:12 by waddlesplash, 14 months ago

Again, SSE2 is part of the base instruction set for AMD64/x86_64. It's not legal to have a CPU without them, and QEMU handles this correctly. If it didn't, we would get illegal opcode exceptions when userland started, as autovectorization is still enabled there and is used for drawing operations.

But you can also see in the syslogs in #18541 that SSE2 and all sorts of other CPU extensions are advertised, so again, that's not the problem.
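As a rough sketch of how the advertised features can be checked from inside a guest, the following reads CPUID leaf 1 and decodes the SSE2, XSAVE and OSXSAVE bits. It assumes GCC or Clang (for `<cpuid.h>` and `__get_cpuid`); the struct and function names are made up for this example and are not Haiku kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Bit positions in the registers returned by CPUID leaf 1. */
enum {
	CPUID1_EDX_SSE2 = 26,
	CPUID1_ECX_XSAVE = 26,
	CPUID1_ECX_OSXSAVE = 27
};

/* Hypothetical result type for this example. */
struct cpu_features {
	bool sse2;		/* CPU supports SSE2 */
	bool xsave;		/* CPU supports XSAVE/XRSTOR */
	bool osxsave;	/* the OS has enabled XSAVE (CR4.OSXSAVE) */
};

/* Decode the ECX/EDX values of CPUID leaf 1 (no hardware access). */
struct cpu_features
decode_leaf1(uint32_t ecx, uint32_t edx)
{
	struct cpu_features features;
	features.sse2 = ((edx >> CPUID1_EDX_SSE2) & 1) != 0;
	features.xsave = ((ecx >> CPUID1_ECX_XSAVE) & 1) != 0;
	features.osxsave = ((ecx >> CPUID1_ECX_OSXSAVE) & 1) != 0;
	return features;
}

/* Query the running CPU. Returns false when CPUID leaf 1 is not
   available, which includes every non-x86 build. */
bool
query_cpu_features(struct cpu_features* out)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return false;
	*out = decode_leaf1(ecx, edx);
	return true;
#else
	(void)out;
	return false;
#endif
}
```

On any x86_64 guest the `sse2` field should come back true, while `xsave` depends on the CPU model that the hypervisor exposes.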

comment:13 by tqh, 14 months ago

I just asked if you ran with any of the recommended settings for KVM. I'd expect that the non-recommended ones might not work that well.

comment:14 by pulkomandy, 14 months ago

Again, SSE2 is part of the base instruction set for AMD64/x86_64. It's not legal to have a CPU without them, and QEMU handles this correctly.

But that's not the only thing we use. For example, XSAVE may not be supported by the QEMU emulated CPU. It needs a Sandy Bridge machine at least. So, if the tests were made with the default QEMU settings, enabling or disabling XSAVE won't make a difference, all your tests were in fact made without XSAVE.

If, however, you enable a more modern CPU in QEMU, maybe it uses XSAVE.

And this is just one example of a CPU feature. There may be others like this. The question is not just about whether the CPU supports SSE2.

Another question: was this reported to QEMU and other VM developers? Since things work on real hardware, they may want to know about it.

comment:15 by waddlesplash, 14 months ago

I checked the syslogs for "xsave" feature.

comment:16 by jackburton, 14 months ago

I guess that Git operations locking up the userland (in QEMU/KVM) could be a symptom of the same problem?

comment:17 by waddlesplash, 14 months ago

I doubt it. That probably deserves its own ticket. Debug reports will be needed to determine whether that is a Git problem or a Haiku problem.

comment:18 by waddlesplash, 14 months ago

So, the READ/WRITE FAULT on KDL commands was probably due to stack misalignment in one of the arch_debug functions. That was fixed in hrev57304.
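Some background on why stack alignment matters here: aligned SSE loads and stores (such as movaps or movdqa) raise a general protection fault when their memory operand is not 16-byte aligned, and compilers assume the stack alignment that the ABI guarantees when they place vector temporaries on the stack. The helpers below are a small sketch of the alignment arithmetic involved. They are hypothetical and are not the change made in hrev57304.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Alignment required by aligned SSE memory operands. */
#define SSE_ALIGNMENT 16

/* True if address is a multiple of alignment (a power of two). */
bool
is_aligned(uintptr_t address, size_t alignment)
{
	return (address & (alignment - 1)) == 0;
}

/* Round address down to the previous multiple of alignment (a power
   of two), as is done when carving an aligned frame out of a stack
   that grows downwards. */
uintptr_t
align_down(uintptr_t address, size_t alignment)
{
	return address & ~(uintptr_t)(alignment - 1);
}
```

For example, a stack pointer of 0x1008 fails `is_aligned(sp, SSE_ALIGNMENT)`, and `align_down(sp, SSE_ALIGNMENT)` yields 0x1000, which is a safe base for 16-byte vector spills.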

comment:19 by volodroid, 12 months ago (in reply to: comment:18)

Replying to waddlesplash:

So, the READ/WRITE FAULT on KDL commands was probably due to stack misalignment in one of the arch_debug functions. That was fixed in hrev57304.

Does that mean there are currently no issues in VMware with SSE2+ enabled (and the bug description should be updated)? Or is additional testing needed? (I can help with that.)

comment:20 by waddlesplash, 12 months ago

Right now the problems in the ticket have been mitigated by disabling autovectorization optimizations for the whole kernel. So you have to make custom builds to even potentially encounter the problems at the moment.

comment:21 by volodroid, 12 months ago (in reply to: comment:20)

Replying to waddlesplash:

Got it. What I meant is: if the issue on VMware was caused by some other bug (the one fixed in hrev57304), then maybe we were too quick to assume a generic issue with SSE2+ code on virtual machines. But AFAIU this has not been retested on VMware with the hrev57304 fix and SSE2+ enabled, right?

Unfortunately, right now I don't have enough free time to learn how to build (and actually build) a custom build myself. Is there another possibility? Like asking someone on the forum who's familiar with building the OS to do it and share an SSE2+ enabled kernel file with me for the tests? Would that work?

comment:22 by madmax, 12 months ago (in reply to: comment:21)

Replying to volodroid:

Like asking someone on the forum who's familiar with building the OS to do it

Here is hrev57386 with hrev57286 reverted.

comment:23 by waddlesplash, 10 months ago

Priority: blocker → normal
Summary: Issues in virtual machines related to SSE2+ usage following GCC 13 upgrade → Autovectorization (SSE2+) causes issues in QEMU following GCC 13 upgrade

Adjusting ticket title and downgrading priority.

comment:24 by nephele, 6 months ago

Milestone: R1/beta5 → R1/beta6

Right now the problems in the ticket have been mitigated by disabling autovectorization optimizations for the whole kernel. So you have to make custom builds to even potentially encounter the problems at the moment.

Since this is mitigated for the moment, this can be in beta6.

comment:25 by pulkomandy, 6 months ago

Milestone: R1/beta6 → R1/beta5

It would be nice to fix it. The mitigation means we can't use modern CPU instructions in the kernel, with likely performance implications.

It's not a blocker for beta 5 but it would definitely be a good idea to take a closer look.

comment:26 by waddlesplash, 2 months ago

Milestone: R1/beta5 → R1/beta6

move remaining tickets to beta6
