#13744 closed bug (fixed)
Kernel PANIC stack fault exception from a userland C++ app
Reported by: | phoudoin | Owned by: | nobody |
---|---|---|---|
Priority: | critical | Milestone: | Unscheduled |
Component: | System/Kernel | Version: | R1/Development |
Keywords: | kdl ss cpp | Cc: | |
Blocked By: | Blocking: | ||
Platform: | x86-64 |
Description
Consider stack_fault.c source file:
    #include <sys/select.h>

    int main(int argc, char* argv[])
    {
        fd_set errorSet;
        // let's trigger a stack segment fault
        FD_SET(-1, &errorSet);
        return 0;
    }
When compiled as a C module:
gcc -o stack_fault_c stack_fault.c
running stack_fault_c would crash - as expected - on a segmentation fault.
But when compiled as a C++ module:
g++ -o stack_fault_cpp stack_fault.c
running stack_fault_cpp triggers a KDL panic, which is neither expected nor desirable. Screenshot attached.
Running Haiku x86_64 hrev51474 under VirtualBox, 2 vCPU, 4GB of RAM.
Attachments (2)
Change History (18)
by , 7 years ago
Attachment: | VirtualBox_Haiku x86_64_20_10_2017_16_38_24.png added |
---|
comment:1 by , 7 years ago
patch: | 0 → 1 |
---|
comment:2 by , 7 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:4 by , 7 years ago
FD_SET is just accessing an array. So the code is roughly equivalent to:
    int main(int argc, char* argv[])
    {
        int foo[1];
        // let's trigger a stack segment fault
        foo[-1] |= 1;
        return 0;
    }
This will overwrite whatever is on the stack after the foo array (because the stack grows downwards). We can't make this a systematic segmentation fault, because the application is allowed to write to the stack and the array is not necessarily on a page boundary (there may be other local variables after it). What happens once the stack is corrupted is unspecified, but to understand how we end up in KDL we will have to disassemble the code and go through it step by step to find out what gets corrupted.
It seems the kernel does detect the problem but is not sure what to do (crash/kill the team, I guess?). Or maybe some important data structure stored near the stack was erased?
comment:5 by , 7 years ago
patch: | 1 → 0 |
---|
comment:6 by , 7 years ago
Due to varying signedness of the various operands in the macros, it will actually end up accessing a really large memory address, so the "foo[-1]" doesn't work to reproduce.
Here's a minimal example that reliably reproduces this KDL on x64-Haiku for me:
    #include <SupportDefs.h>

    int main()
    {
        int i[1];
        uint64 offset = 0x10000000000;
        i[offset] = 1;
        return 0;
    }
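The signedness effect mentioned above can be sketched in isolation. The macros below are hypothetical stand-ins (the `demo_` names are not taken from any real header); they only assume that the bit-index calculation divides the descriptor by a `sizeof`-based constant, which has the unsigned type `size_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for fd_set helper macros, for illustration only. */
typedef uint32_t demo_mask;
#define DEMO_NFDBITS   (sizeof(demo_mask) * 8)   /* type size_t, i.e. unsigned */
#define DEMO_INDEX(fd) ((fd) / DEMO_NFDBITS)

/* Returns the array index the macro computes for a given descriptor.
 * Because DEMO_NFDBITS is a size_t, a negative fd is converted to a
 * very large unsigned value before the division takes place. */
size_t demo_index_for(int fd)
{
    return DEMO_INDEX(fd);
}
```

With a 64-bit `size_t`, `demo_index_for(-1)` evaluates to `SIZE_MAX / 32` rather than to a small negative index, which is consistent with the access landing at a very large address instead of just below the array.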
comment:7 by , 7 years ago
Then it's just writing at essentially a random address. If there is nothing mapped there it will segfault, otherwise it will corrupt some other data. So we need to identify what's at that address and how corrupting it puts us into KDL.
comment:8 by , 7 years ago
Ok, looking a bit further, this is a problem with our x86-64 exception handling, which does not properly handle the Stack-Fault (#SS) exception.
x86-64 has 64-bit addresses, but in practice, only 48 of these bits are really usable, the rest of the address bits must be a copy of bit 47. This is the "canonical address format". Violating this rule will generate either a #GP or an #SS exception. Quoting from the Intel Architecture manual, vol. 1 ch. 3.3.7.1:
The first implementation of IA-32 processors with Intel 64 architecture supports a 48-bit linear address. This means a canonical address must have bits 63 through 48 set to zeros or ones (depending on whether bit 47 is a zero or one). [...] If a linear-memory reference is not in canonical form, the implementation should generate an exception. In most cases, a general-protection exception (#GP) is generated. However, in the case of explicit or implied stack references, a stack fault (#SS) is generated.
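As a rough illustration of the canonical-form rule quoted above, the check can be written in a few lines of C. This is only a sketch assuming 48-bit linear addresses; the function names are made up for this example, and the stack base used in the note below is an assumed value, not one measured on Haiku.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if bits 63..48 are all copies of bit 47 (48-bit linear addresses). */
bool is_canonical_48(uint64_t addr)
{
    uint64_t upper = addr >> 47;          /* bits 63..47, 17 bits in total */
    return upper == 0 || upper == 0x1FFFF;
}

/* Address of element 'offset' in an array of 4-byte ints starting at 'base'. */
uint64_t element_address(uint64_t base, uint64_t offset)
{
    return base + offset * sizeof(int32_t);
}
```

For an assumed userland stack address such as 0x00007FFF00000000, `element_address(base, 0x10000000000)` gives 0x000083FF00000000. Bit 47 is set there while bits 63 through 48 are zero, so `is_canonical_48()` returns false for it.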
We handle the case of #GP exceptions just fine. However, the code in this ticket references the stack (by doing an address calculation involving rbp), so it generates an #SS exception. Currently, we handle that one with x86_fatal_exception(), which triggers KDL.
So what we need is to properly handle the #SS exception to only terminate the application in such cases.
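The policy described above could look roughly like the following. This is a hypothetical sketch, not the attached patch and not actual kernel code: the enum and function names are invented for illustration, and only the vector numbers (12 for #SS, 13 for #GP) come from the x86 architecture.

```c
#include <stdbool.h>

/* x86 exception vector numbers. */
enum {
    VECTOR_STACK_FAULT = 12,          /* #SS */
    VECTOR_GENERAL_PROTECTION = 13    /* #GP */
};

/* Hypothetical outcomes of the decision. */
typedef enum {
    ACTION_KERNEL_PANIC,   /* enter KDL */
    ACTION_SIGNAL_TEAM     /* deliver a fatal signal to the faulting team */
} fault_action;

/* Decide what to do with a fault. Faults raised by userland code only
 * take down the offending team; faults in kernel mode remain fatal.
 * Other vectors are treated as fatal here purely to keep the sketch short. */
fault_action decide_fault_action(int vector, bool from_userland)
{
    switch (vector) {
        case VECTOR_STACK_FAULT:
        case VECTOR_GENERAL_PROTECTION:
            return from_userland ? ACTION_SIGNAL_TEAM : ACTION_KERNEL_PANIC;
        default:
            return ACTION_KERNEL_PANIC;
    }
}
```

The point of the sketch is that #SS is treated the same way as #GP: a non-canonical stack reference from an application is the application's fault, so it should not bring the whole system into KDL.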
by , 7 years ago
Attachment: | handle-stack-fault.diff added |
---|
comment:9 by , 7 years ago
patch: | 0 → 1 |
---|
comment:10 by , 7 years ago
patch: | 1 → 0 |
---|
Attached a patch. Fixes the bug - comments welcome. I'm not sure if the signal code is the right one, but it seems that none of the POSIX signals really fits here, so I tried to make a reasonable choice.
comment:11 by , 7 years ago
Linux seems to use SIGBUS. "Access to an undefined portion of a memory object." according to http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html. Ends with "Abnormal termination of the process".
comment:14 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Applied in hrev51511.
p.s. oops, forgot to set the correct git user/mail on the new machine prior to commit...
comment:15 by , 7 years ago
Any idea why the same code triggers a KDL when compiled as C++ code but not as C code? Something different in the compiler-generated code, I guess...
comment:16 by , 7 years ago
Yup, if you look at the disassembly of both, gcc generates quite different code. In C language mode, the crash-inducing instruction doesn't contain a stack reference, so it generates a general protection fault instead of a stack fault.
KDL Panic dump