Opened 11 years ago
Closed 7 years ago
#10509 closed enhancement (fixed)
Stack is not aligned
Reported by: | pulkomandy | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | Unscheduled |
Component: | System/Kernel | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | x86-64 |
Description
Our stack is aligned only on 8-byte. GCC sometimes generates code that assumes a 16-byte alignment, with instructions such as MOVDQA or MOVAPS. This leads to a "General Protection Fault" crash.
This can be worked around with -mtackrealign, but this makes the code slower. Instead, we should fix all places where our code doesn't guarantee stack alignment.
Attachments (5)
Change History (36)
by , 11 years ago
Attachment: | HaikuLauncher-43356-debug-06-02-2014-11-14-11.report added |
---|
comment:2 by , 11 years ago
Component: | System/Kernel → System |
---|---|
Owner: | changed from | to
by , 11 years ago
Attachment: | HaikuLauncher-50315-debug-06-02-2014-12-20-21.report added |
---|
comment:3 by , 11 years ago
Component: | System → System/Kernel |
---|---|
Owner: | changed from | to
The initial stack for a new thread is set up in the architecture specific kernel function arch_thread_enter_userspace()
(x86, x86-64). All that has to be done is align the value of stackTop
to 16 and offset it as necessary (note, the stack grows downward, so the address must only be decreased).
I considered doing that already, but it hadn't been clear to me what that offset should be. According to the crash report the movaps
instruction tries to access 0xffffff48(%ebp), which, I suppose, means that for x86 ebp (the stack frame pointer) should be offset by 8, which means that for the initial stack pointer (esp) esp % 16 == 12
should hold (due to ebp = esp - 4
). The main()
prologue gcc generates seems to agree:
1ad5: 8d 4c 24 04 lea 0x4(%esp),%ecx 1ad9: 83 e4 f0 and $0xfffffff0,%esp 1adc: ff 71 fc pushl -0x4(%ecx) 1adf: 55 push %ebp 1ae0: 89 e5 mov %esp,%ebp
I think the best way to implement that is by having arch_randomize_stack_pointer()
enforce that property and change all callers (namely arch_thread_enter_userspace()
and get_signal_stack()
) to supply an address that has already been adjusted to make the needed room on the stack.
comment:4 by , 11 years ago
Milestone: | Unscheduled → R1/alpha5 |
---|
comment:5 by , 11 years ago
Sorry, I'm not too familiar with this code. Modifying arch_randomize_stack_pointer
is ok. However, in arch_thread_enter_userspace
, the stackTop is decremented by12 already, as the "args" are pushed on the stack. Wouldn't that break the alignment?
In the current code, it means we get stackTop % 16 == 4
, and modifying arch_randomize_stack_pointer
to substract 4 from the aligned value would lead to stackTop % 16 == 0
after the push. Still not what we want. If I change arch_randomize_stack_pointer
so the alignment gets correct after the push, then get_signal_stack, which doesn't appear to push anything, would be misaligned.
I'm not sure how to adjust the stack size, either. Callers use thread->signal_stack_size
and thread->signal_stack_base
, and the respective user_
fields. These are allocated in create_thread_user_stack
, and may even be provided by the user. Is it needed to increase the stack size there, or is it acceptable to lose some bytes because of the alignment without adjusting it?
comment:6 by , 11 years ago
The stack size doesn't have to be adjusted. I'm attaching a quick patch. It's untested and possibly broken, but I hope it clarifies what I mean.
comment:7 by , 11 years ago
Ok, I ran the WebKit testsuite with this, and it solves the JS crashes I was getting. Applied the patch in hrev46886. Thanks for doing it!
I don't know if x86_64 needs the same change, and if the offset is the same. Leaving the ticket open until this is clarified.
follow-up: 10 comment:8 by , 11 years ago
Given that the 64 bit code looks very similar to the 32 bit code, I think it needs to be adjusted analogously. I don't know about the stack offset either. Should be verified with code that requires the alignment.
On a related note, we may also want to check whether we do/should align the kernel stacks. While the alignment shouldn't be strictly required, doing that may improve the performance.
comment:9 by , 11 years ago
Fortunately, System V ABI for x86_64 states explicitly what the alignment requirement is:
The end of the input argument area shall be aligned on a 16 byte boundary.
Apparently, that's exactly the same requirement we have for i386 (well, that's hardly surprising) and means that we need to subtract 8 from the initial stack pointer returned by arch_randomize_stack_pointer
.
comment:10 by , 11 years ago
Replying to bonefish:
Should be verified with code that requires the alignment.
Mesa has such a requirement on x86_64.
comment:11 by , 11 years ago
patch: | 0 → 1 |
---|
comment:12 by , 11 years ago
Here is a patch for x86_64. It doesn't seem to break anything, but unfortunately I couldn't check if things are better with webkit. I thought Mesa needed such an alignment but I'm not sure anymore it's the reason for the crash I met.
comment:13 by , 11 years ago
I applied the patch and tried LLVM again, but LLVM is still crashing due to unaligned stack access using movdqa. ls -l /system/kernel_x86_64 shows a date that is after I applied the patch, so I'm assuming I replaced the kernel correctly.
comment:14 by , 11 years ago
Oh, forgot to mention: Our debugger shows an %rsp that ends in 0x8 and the movdqa stores its value in (%rsp). When I look at the %rsp in _start, it ends in 0x0, according to the debugger. IMHO it should be the other way around: When you enter a function, the %rsp should end in 0x8. The first thing the prologue will do is pushing %rbq, so that after the push the alignment will end 0x0 and thus be correct.
comment:15 by , 11 years ago
Milestone: | R1/alpha5 → Unscheduled |
---|---|
Platform: | x86 → x86-64 |
The issue is fixed for x86; this is the only supported arch for R1; moving out of alpha5 milestone.
by , 9 years ago
Attachment: | 0002-x86_64-Glue-code-Keep-stack-16-byte-aligned.patch added |
---|
Glue code should not be un-aligning stack
comment:16 by , 9 years ago
Trying to use a libroot.so
built with debugging information on x86_64 will reliably reproduce this issue with a general protection fault in runtime_loader
. The victim is memset_sse, whose first line creates a variable on the stack that must be 16-byte-aligned. (When compiled with -O2
gcc places this variable in a register, so a non-debug build works.)
Applying korli's patch helps but doesn't resolve the issue, since the debugger shows the stack becomes misaligned again once runtime_loader
calls the initialization routine of the image it loads from disk.
This is because the glue code for x86_64 upsets the stack alignment by pushing a single register onto the stack before calling initialization code that eventually calls into libroot.
I've added a patch that modifies the glue code to keep the stack 16-byte-aligned. With korli's patch and this one applied the issue seems to be completely resolved; I can produce a stable nightly image (and a fully debuggable libroot.so
) though the packages the build system downloads, notably WebKit and the OpenGL renderer, will need to be rebuilt with the new glue code before everything works normally again.
comment:17 by , 9 years ago
Good analysis so far, your patch applied in hrev49731. The other patch is a bit dated, I'll have a look again before applying. I'll also try your hint at reproducing the issue.
comment:18 by , 9 years ago
I could effectively reproduce with a libroot.so with debug info on a Haiku version without the glue code patch. But once the glue code patch is applied, I get no crash, whereas the kernel patch isn't yet applied. Any ideas on what could trigger the crash without the kernel patch?
comment:19 by , 9 years ago
Not off the top of my head. I tested only with both patches applied. Maybe the glue code was the issue all along.
I'll drop your patch from my local commits and build again and see if I can find another way to trigger this.
comment:20 by , 9 years ago
So far I've failed to reproduce this problem with (only) the glue-code patch applied.
Building as much of the system as possible with debugging enabled, in the hopes of forcing greater reliance on the stack, didn't have any effect. Neither did compiling libroot with -O3 -msse
to try to get gcc to output more SSE instructions.
The result is always the same: Libraries built with the new glue code run fine; libraries built with the old cause a GPF.
I think the glue-code fix might actually have resolved this.
comment:23 by , 8 years ago
I think there still is a problem with 32-bit as well.
Simple test program:
#include <stdio.h> void pa(void* a) { printf("%x\n", a); } int main(void) { int __attribute__((aligned(16))) foo; pa(&foo); return 0; }
The address for foo should end in "0" (aligned to 16 bytes). But when built with gcc2, it ends with 4.
The code generated:
main: pushl %ebp movl %esp,%ebp subl $20,%esp pushl %ebx call .L7 .L7: popl %ebx addl $_GLOBAL_OFFSET_TABLE_+[.-.L7],%ebx addl $-12,%esp leal -4(%ebp),%eax pushl %eax call pa@PLT addl $16,%esp xorl %eax,%eax jmp .L5 .align 4 .L5: movl -24(%ebp),%ebx movl %ebp,%esp popl %ebp ret
With gcc5, the pointer is properly aligned. However, this seems to be gcc own's feature:
main: leal 4(%esp), %ecx andl $-16, %esp pushl -4(%ecx) pushl %ebp movl %esp, %ebp pushl %ebx pushl %ecx subl $16, %esp call __x86.get_pc_thunk.ax addl $_GLOBAL_OFFSET_TABLE_, %eax subl $12, %esp leal -24(%ebp), %edx pushl %edx movl %eax, %ebx call pa@PLT addl $16, %esp movl $0, %eax leal -8(%ebp), %esp popl %ecx popl %ebx popl %ebp leal -4(%ecx), %esp ret
The second instruction realigns the stack, no matter its initial aligment.
This "attribute aligned" is also what ffmpeg uses to detect if the compiler and runtime properly aligns stack variables. I have modified the stack allocation so that the simple test has a properly aligned result, but then ffmpeg still complained that its variables were misaligned. So there must be something else at play.
Possibly relevant info: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3299
comment:24 by , 8 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:25 by , 7 years ago
We are still not quite there even on 32-bit x86. Here is a revised test program:
#include <stdio.h> #include <stdlib.h> #include <signal.h> #include <OS.h> #define ALIGNED __attribute__((aligned(16))) //#define ALIGNED void pa(const char* from, void* a) { if ((uintptr_t)a & 0xF) printf("In %s, stack is NOT aligned: %x\n", from, a); else printf("In %s, stack is aligned!\n", from); } int function(void) { int ALIGNED foo; pa("function", &foo); return 0; } status_t thread(void* arg) { int ALIGNED foo; pa("thread", &foo); return B_OK; } void handler(int param) { int ALIGNED foo; pa("signal", &foo); } int main(void) { // Test from main int ALIGNED foo; pa("main", &foo); // Test from called function function(); // Test from thread { status_t rv; wait_for_thread(spawn_thread(thread, "test", B_NORMAL_PRIORITY, NULL), &rv); } // Test from signal handler { stack_t signalStack; struct sigaction action; signalStack.ss_sp = malloc(SIGSTKSZ); signalStack.ss_flags = 0; signalStack.ss_size = SIGSTKSZ; sigaltstack(&signalStack, NULL); action.sa_handler = handler; action.sa_mask = 0; action.sa_flags = SA_ONSTACK; sigaction(SIGUSR1, &action, NULL); kill(getpid(), SIGUSR1); } return 0; }
Results on a current system (the _noattr ones do not use attribute((aligned))):
[x86_gcc2] ~# ./gcc2_noattr In main, stack is NOT aligned: 72873464 In function, stack is NOT aligned: 72873424 In thread, stack is NOT aligned: 71820b44 In signal, stack is NOT aligned: 18502a34 [x86_gcc2] ~# ./gcc2-attr In main, stack is NOT aligned: 71b0f314 In function, stack is NOT aligned: 71b0f2d4 In thread, stack is NOT aligned: 722483e4 In signal, stack is NOT aligned: 183ecae4 [x86_gcc2] ~# ./gcc5_noattr In main, stack is NOT aligned: 728cdf6c In function, stack is NOT aligned: 728cdf2c In thread, stack is NOT aligned: 70eceb1c In signal, stack is NOT aligned: 185c0d3c [x86_gcc2] ~# ./gcc5-attr In main, stack is aligned! In function, stack is aligned! In thread, stack is aligned! In signal, stack is aligned!
So, this works only for gcc5, and only if asking the compiler to realign the stack.
I then made the following change:
--- a/src/system/kernel/arch/x86/32/thread.cpp +++ b/src/system/kernel/arch/x86/32/thread.cpp @@ -114,8 +114,8 @@ arch_randomize_stack_pointer(addr_t value) { STATIC_ASSERT(MAX_RANDOM_VALUE >= B_PAGE_SIZE - 1); value -= random_value() & (B_PAGE_SIZE - 1); - return (value & ~addr_t(0xf)) - 4; - // This means, result % 16 == 12, which is what esp should adhere to + return (value & ~addr_t(0xf)) - 8; + // This means, result % 16 == 8, which is what esp should adhere to // when a function is entered for the stack to be considered aligned to // 16 byte. }
Results:
/Haiku/home> ./gcc2-attr In main, stack is aligned! In function, stack is aligned! In thread, stack is aligned! In signal, stack is aligned! /Haiku/home> ./gcc2_noattr In main, stack is aligned! In function, stack is aligned! In thread, stack is aligned! In signal, stack is aligned! /Haiku/home> ./gcc5-attr In main, stack is aligned! In function, stack is aligned! In thread, stack is NOT aligned: 71f874ac In signal, stack is NOT aligned: 187c98fc /Haiku/home> ./gcc5_noattr In main, stack is NOT aligned: 72ad2e2c In function, stack is NOT aligned: 72ad2dec In thread, stack is NOT aligned: 70e06e98 In signal, stack is NOT aligned: 18fd6d98
So, this fixed the problem for gcc2 in all cases. However, gcc5 is now broken, even when asking the compiler to align the variables explicitly (for threads and signal stacks).
This seems related to the issue Simon South patched for x86_64: the 32-bit crti.S is pushing only 12 bytes to the stack (EBP, 8(EBP), then return address) before calling the next function. So, it breaks the stack alignment. Trying to fix this by further offseting the initial stack pointer fixes the main thread, but then other threads and sinal handlers become misaligned.
I tried to fix crti.S by pushing an extra value to the stack, but then launch_daemon crashed when I tried to boot the resulting image. I must have missed something?
comment:26 by , 7 years ago
There are two separate issues of alignment here:
1) The alignment of local variables on the stack. The default alignment for the various data types is defined by the compiler's ABI. If a greater alignment is required, it must be specified manually with e.g. the "aligned" attribute. Note that the "aligned" attribute is broken on gcc2 for stack variables. That's e.g. why ffmpeg always complains. It was only fixed in later gcc versions, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660 I once looked into backporting that support to gcc2, but gave up quickly, the amount of code difference is large.
The compiler is free to layout the local variables on the stack the way it wants to. There is no determinism where the first local variable of a function will end up, and in any way, when the function prologue was run, things like the previous EBP will already be on the stack anyway (moving the stack pointer). This means that PulkoMandy's test program does not show anything's wrong: it only checks if a local variable is aligned to 16. But this usually won't be the case unless you request it explicitly with the "aligned" attribute - that's not a bug.
Here's a version of the test program that will actually check the stack pointer value, and for me at least (on 32-bit), it reports everything is fine:
#include <stdio.h> #include <stdlib.h> #include <signal.h> #include <OS.h> uint32 gStackPointer; #define CHECK_STACK_ALIGN(from) \ asm volatile ("mov %%esp, %0" : "=r" (gStackPointer) :); \ if (gStackPointer & 0xF) \ printf("In %s, stack is NOT aligned: %x\n", from, gStackPointer); \ else \ printf("In %s, stack is aligned!\n", from); int function(void) { CHECK_STACK_ALIGN("function"); return 0; } status_t thread(void* arg) { CHECK_STACK_ALIGN("thread"); return B_OK; } void handler(int param) { CHECK_STACK_ALIGN("signal"); } int main(void) { CHECK_STACK_ALIGN("main"); // Test from called function function(); // Test from thread { status_t rv; wait_for_thread(spawn_thread(thread, "test", B_NORMAL_PRIORITY, NULL), &rv); } // Test from signal handler { stack_t signalStack; struct sigaction action; signalStack.ss_sp = malloc(SIGSTKSZ); signalStack.ss_flags = 0; signalStack.ss_size = SIGSTKSZ; sigaltstack(&signalStack, NULL); action.sa_handler = handler; action.sa_mask = 0; action.sa_flags = SA_ONSTACK; sigaction(SIGUSR1, &action, NULL); kill(getpid(), SIGUSR1); } return 0; }
2) The alignment of the stack pointer that is initially passed off to code compiled by gcc. The patch made by Ingo is correct IMO as it creates an alignment just like gcc itself does it in the main() prologue. Changing the "-4" to "-8" would mess up the alignment.
comment:27 by , 7 years ago
The attempt to patch crti.S (does not work, launch daemon dereferences address 0x8 or 0x10 at boot):
--- a/src/system/glue/arch/x86/crti.S +++ b/src/system/glue/arch/x86/crti.S @@ -23,6 +23,7 @@ FUNCTION(_init): pushl %ebp movl %esp, %ebp + sub 4,%esp // Keep stack aligned pushl 8(%ebp) // put image ID on the stack again call __haiku_init_before // crtbegin.o stuff comes here
I find that crti.S is worrying because it will perform the following operations:
- pushl, adds 4 bytes to the stack
- pushl, adds 4 bytes to the stack
- call, adds 4 bytes to the stack (return value)
So, this code pushes 12 bytes to the stack, and does not preserve stack alignment. However, gcc5 main prologue (I already listed that in an above comment) will compensate for this, and realign the stack. This would mean the stack is only aligned after entering main().
And indeed we can add this to the test program:
struct Foo { Foo() { CHECK_STACK_ALIGN("init"); } }; Foo init;
With this result:
In init, stack is NOT aligned: 72b40ac4 In main, stack is aligned! In function, stack is aligned! In thread, stack is aligned! In signal, stack is aligned!
(with both gcc2 and gcc5).
I conclude that:
- The alignment in randomize_stack_pointer is indeed correct and should not be changed (it works for signal stack and non-main threads)
- crti.S breaks the alignment and should be fixed
- we don't notice this too badly, because gcc will realign the stack when entering the main function, but global constructors (and possibly global destructors) are affected.
comment:28 by , 7 years ago
Made the change to crti.S in hrev51664. I had missed a $ in the code above to mark the 4 as immediate. This fixes alignment for init and fini code.
Note that this code is inlined in compiled executables, so all executables which need an aligned stack in these places should be recompiled.
comment:29 by , 7 years ago
patch: | 1 → 0 |
---|
comment:30 by , 7 years ago
patch: | 0 |
---|
Remaining patch migrated to Gerrit: https://review.haiku-os.org/39
comment:31 by , 7 years ago
patch: | → 0 |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
Patch is merged in hrev51774. Only the signal stack was still misaligned.
Crash report.