Opened 14 months ago
Last modified 2 months ago
#18593 new bug
Autovectorization (SSE2+) causes issues in QEMU following GCC 13 upgrade
Reported by: | waddlesplash | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | R1/beta6 |
Component: | System/Kernel | Version: | R1/Development |
Keywords: | | Cc: | |
Blocked By: | #18541, #18550, #18562 | Blocking: | |
Platform: | All |
Description
This is a meta-ticket created to encompass the various symptoms which seem to have implicit SSE2 usage from GCC 13 optimizations as a common cause.
In summary:
- QEMU/KVM: Hangs on rocket during network device initialization, "emulation failure" in console (#18541), sometimes reports as "paused" (#18562)
- Hyper-V: GPEs, SMEP violations, READ/WRITE FAULTs in KDL, etc. (#18550)
- VMware: READ/WRITE FAULT on KDL backtraces into userland, "area contains", and a bunch of other KDL commands (seen in #17233).
The issues in QEMU/KVM do not occur in VMware and vice versa. Compiling the kernel and drivers with -mno-sse2 (i.e. leaving SSE(1) enabled for standard floating-point usage) seems to resolve the problems.
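As a rough illustration of what "implicit SSE2 usage" means here: the loop below is the kind of code GCC may autovectorize at higher optimization levels, and the predefined `__SSE2__` macro shows whether the compiler is allowed to emit SSE2 instructions for a given translation unit. This is a minimal sketch, not code from the Haiku tree; the function names `built_with_sse2` and `add_arrays` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reports whether this translation unit was compiled
 * with SSE2 code generation enabled. GCC and Clang predefine __SSE2__ when
 * SSE2 is allowed (the default on x86_64) and leave it undefined when the
 * file is built with -mno-sse2 or for a non-x86 target. */
int
built_with_sse2(void)
{
#ifdef __SSE2__
	return 1;
#else
	return 0;
#endif
}

/* A plain element-wise loop with no intrinsics. With SSE2 available and
 * vectorization enabled, the compiler is free to implement it with packed
 * integer instructions. The result is the same either way; only the
 * generated instructions differ. */
void
add_arrays(uint32_t* dest, const uint32_t* a, const uint32_t* b, size_t count)
{
	for (size_t i = 0; i < count; i++)
		dest[i] = a[i] + b[i];
}
```

Building such a file with and without -mno-sse2 and comparing the disassembly is one way to see whether vector instructions were emitted for the loop.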
Change History (26)
comment:1 by , 14 months ago
comment:2 by , 14 months ago
The problem isn't related to usage of XSAVE, it appears. I disabled that in arch_cpu.cpp and the problems in VMware noted above persisted.
comment:3 by , 14 months ago
Disabled autovectorization in hrev57286, which should "fix" the above problems.
comment:4 by , 14 months ago
Blocked By: | 18550 added |
---|---|
Description: | modified (diff) |
comment:5 by , 14 months ago
For reference: the READ/WRITE FAULTs also occur in Hyper-V the same as VMware (see the comments of the Hyper-V ticket).
comment:6 by , 14 months ago
Just to make sure, does your native CPU support SSE2? It sounds like maybe you run QEMU/KVM with incompatible emulation flags?
comment:7 by , 14 months ago
SSE2 is required to be present on all x86_64 machines, so that won't be the problem here.
comment:8 by , 14 months ago
Blocked By: | 17233 removed |
---|
comment:9 by , 14 months ago
Right, but what CPU are you running QEMU/KVM with? The default is quite crippled, as it is a safe option for live migration of VMs, so it probably has very few features. Here is a bit of info: https://www.qemu.org/docs/master/system/i386/cpu.html
comment:10 by , 14 months ago
All tickets in question were tested against Haiku x86_64, I think, so it doesn't matter what CPU is selected for emulation: in order for Haiku to boot at all, there has to be SSE2.
comment:11 by , 14 months ago
QEMU doesn't enable all features of the CPU unless you do -cpu host. It only does a small subset, so I'm not sure it does SSE2. Please try with one of the options that has more than the "baseline" x86 CPU.
comment:12 by , 14 months ago
Again, SSE2 is part of the base instruction set for AMD64/x86_64. It's not legal to have a CPU without them, and QEMU handles this correctly. If it didn't, we would get illegal opcode exceptions when userland started, as autovectorization is still enabled there and is used for drawing operations.
But you can also see in the syslogs in #18541 that SSE2 and all sorts of other CPU extensions are advertised, so again, that's not the problem.
comment:13 by , 14 months ago
I just asked if you ran with any of the recommended settings for KVM; I'd expect the non-recommended ones might not work that well.
comment:14 by , 14 months ago
Again, SSE2 is part of the base instruction set for AMD64/x86_64. It's not legal to have a CPU without them, and QEMU handles this correctly.
But that's not the only thing we use. For example, XSAVE may not be supported by the QEMU emulated CPU. It needs a Sandy Bridge machine at least. So, if the tests were made with the default QEMU settings, enabling or disabling XSAVE won't make a difference, all your tests were in fact made without XSAVE.
If, however, you enable a more modern CPU in QEMU, maybe it uses XSAVE.
And this is just one example of a CPU feature. There may be others like this. The question is not just about whether the CPU supports SSE2.
Another question: was this reported to QEMU and other VM developers? Since things work on real hardware, they may want to know about it.
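The distinction drawn in this comment (SSE2 is always present on x86_64, while XSAVE depends on the emulated CPU model) can be checked from inside a guest through CPUID leaf 1. The sketch below decodes the relevant bits; it is an illustration under stated assumptions, not Haiku's own detection code. The struct and function names are hypothetical, while `__get_cpuid` comes from the `<cpuid.h>` header shipped with GCC and Clang.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Feature bits reported by CPUID leaf 1. */
#define CPUID1_EDX_SSE2		(1u << 26)
#define CPUID1_ECX_XSAVE	(1u << 26)
#define CPUID1_ECX_OSXSAVE	(1u << 27)

/* Hypothetical container for the features discussed in this ticket. */
struct cpu_features {
	int sse2;		/* CPU supports SSE2 */
	int xsave;		/* CPU supports XSAVE/XRSTOR */
	int osxsave;	/* the OS has enabled XSAVE (CR4.OSXSAVE is set) */
};

/* Decode the feature bits from raw leaf 1 register values. */
struct cpu_features
decode_leaf1(uint32_t ecx, uint32_t edx)
{
	struct cpu_features features;
	features.sse2 = (edx & CPUID1_EDX_SSE2) != 0;
	features.xsave = (ecx & CPUID1_ECX_XSAVE) != 0;
	features.osxsave = (ecx & CPUID1_ECX_OSXSAVE) != 0;
	return features;
}

/* Query the current (possibly virtual) CPU. Returns 0 on success, or -1 if
 * leaf 1 is unavailable or the code was not built for x86. */
int
query_cpu_features(struct cpu_features* out)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;
	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return -1;
	*out = decode_leaf1(ecx, edx);
	return 0;
#else
	(void)out;
	return -1;
#endif
}
```

Running `query_cpu_features` in a guest started with the default QEMU CPU model and again with -cpu host would show whether the XSAVE bit differs between the two configurations.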
comment:16 by , 14 months ago
I guess that git operations locking up the userland (in QEMU/KVM) could be a symptom of the same problem?
comment:17 by , 14 months ago
I doubt it. That probably deserves its own ticket. Debug reports will be needed to determine whether that is a Git problem or a Haiku problem.
follow-up: 19 comment:18 by , 13 months ago
So, the READ/WRITE FAULT on KDL commands was probably due to stack misalignment in one of the arch_debug functions. That was fixed in hrev57304.
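Some background on why stack alignment matters once SSE code is involved: aligned SSE memory accesses (for example `movaps`) require 16-byte-aligned addresses, and compilers targeting x86_64 generally assume the stack is 16-byte aligned at call boundaries. The sketch below only illustrates the alignment arithmetic; it is not the change made in hrev57304, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of alignment, 0 otherwise.
 * alignment must be a non-zero power of two. */
int
is_aligned(uintptr_t addr, size_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (addr & (alignment - 1)) == 0;
}

/* Rounds addr down to the nearest multiple of alignment (a non-zero power
 * of two). The stack grows downward on x86, so aligning a stack pointer
 * means rounding down. */
uintptr_t
align_down(uintptr_t addr, size_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return addr & ~((uintptr_t)alignment - 1);
}
```

A stack pointer that fails `is_aligned(sp, 16)` at a point where compiled code expects alignment is the kind of condition that only becomes visible once the compiler starts emitting aligned vector loads and stores.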
comment:19 by , 12 months ago
Replying to waddlesplash:
So, the READ/WRITE FAULT on KDL commands was probably due to stack misalignment in one of the arch_debug functions. That was fixed in hrev57304.
Does it mean there are currently no issues in VMware with SSE2+ enabled (and the bug description should be updated)? Or is additional testing needed? (I can help with that.)
follow-up: 21 comment:20 by , 12 months ago
Right now the problems in the ticket have been mitigated by disabling autovectorization optimizations for the whole kernel. So you have to make custom builds to even potentially encounter the problems at the moment.
follow-up: 22 comment:21 by , 12 months ago
Replying to waddlesplash:
Got it. What I meant is: if the issue on VMware was caused by some other bug (the one fixed in hrev57304), then maybe we were too quick to assume a generic issue with SSE2+ code on virtual machines. But AFAIU this has not been retested on VMware with the hrev57304 fix and SSE2+ enabled, right?
Unfortunately, right now I don't have enough free time to learn how to build (and actually build) a custom build myself. Is there another possibility? Like asking someone on the forum who's familiar with building the OS to do it and share an SSE2+ enabled kernel file with me for the tests? Would that work?
comment:22 by , 12 months ago
comment:23 by , 10 months ago
Priority: | blocker → normal |
---|---|
Summary: | Issues in virtual machines related to SSE2+ usage following GCC 13 upgrade → Autovectorization (SSE2+) causes issues in QEMU following GCC 13 upgrade |
Adjusting ticket title and downgrading priority.
comment:24 by , 6 months ago
Milestone: | R1/beta5 → R1/beta6 |
---|
Right now the problems in the ticket have been mitigated by disabling autovectorization optimizations for the whole kernel. So you have to make custom builds to even potentially encounter the problems at the moment.
Since this is mitigated for the moment, this can be in beta6.
comment:25 by , 6 months ago
Milestone: | R1/beta6 → R1/beta5 |
---|
It would be nice to fix it. The mitigation means we can't use modern CPU instructions in the kernel, with likely performance implications.
It's not a blocker for beta 5 but it would definitely be a good idea to take a closer look.
Note: regular QEMU without KVM doesn't have any issues, it works fine. To my knowledge, none of these problems reproduce on bare metal, either.