Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#13744 closed bug (fixed)

Kernel PANIC stack fault exception from a userland C++ app

Reported by: phoudoin Owned by: nobody
Priority: critical Milestone: Unscheduled
Component: System/Kernel Version: R1/Development
Keywords: kdl ss cpp Cc:
Blocked By: Blocking:
Platform: x86-64


Consider stack_fault.c source file:

#include <sys/select.h>

int main(int argc, char* argv[])
  fd_set errorSet;

  // let's trigger a stack segment fault
  FD_SET(-1, &errorSet);
  return 0;

When compiled as a C module:

gcc -o stack_fault_c stack_fault.c

running stack_fault_c would crash - as expected - on a segmentation fault.

But when compiled as a C++ module:

g++ -o stack_fault_cpp stack_fault.c

running stack_fault_cpp trigger a KDL PANIC, which is neither expected or desirable. Screenshot attached.

Running Haiku x86_64 hrev51474 under VirtualBox, 2 vCPU, 4GB of RAM.

Attachments (2)

VirtualBox_Haiku x86_64_20_10_2017_16_38_24.png (121.9 KB ) - added by phoudoin 4 years ago.
KDL Panic dump
handle-stack-fault.diff (1.8 KB ) - added by jua 4 years ago.

Download all attachments as: .zip

Change History (18)

by phoudoin, 4 years ago

KDL Panic dump

comment:1 by phoudoin, 4 years ago

patch: 01

comment:2 by phoudoin, 4 years ago

Owner: changed from nobody to korli
Status: newassigned

comment:3 by korli, 4 years ago

Owner: changed from korli to nobody

Please don't change the default assignee.

comment:4 by pulkomandy, 4 years ago

FD_SET is just accesing an array. So the code is roughly equivalent to:

int main(int argc, char* argv[])
  int foo[1];

  // let's trigger a stack segment fault
  foo[-1] |= 1;
  return 0;

This will overwrite whatever is on the stack after the foo array (because the stack grows downwards). We can't make this a systematic segmentation fault, because the application is allowed to write to the stack and the array is not necessarily on a page boundary (there may be other local variables after it). What happens once the stack is corrupt is unspecified, but to understand how we end up in KDL we will have to disassemble the code and go through it step by step to find out what gets corrupt.

It seems the kernel does detect the problem but is not sure what to do (crash/kill the team, I guess?). Or maybe some important data structure stored near the stack was erased?

comment:5 by pulkomandy, 4 years ago

patch: 10

comment:6 by jua, 4 years ago

Due to varying signedness of the various operands in the macros, it will actually end up accessing a really large memory address, so the "foo[-1]" doesn't work to reproduce.

Here's a minimal example that reliably reproduces this KDL on x64-Haiku for me:

#include <SupportDefs.h>

int main()
    int i[1];
    uint64 offset = 0x10000000000;
    i[offset] = 1;
    return 0;

comment:7 by pulkomandy, 4 years ago

Then it's just writing at essentially a random address. If there is nothing mapped there it will segfault, otherwise it will corrupt some other data. So we need to identify what's at that address and how corrupting it puts us into KDL.

comment:8 by jua, 4 years ago

Ok, looking a bit further, this is a problem with our x86-64 exception handling, which does not properly handle the Stack-Fault (#SS) exception.

x86-64 has 64-bit addresses, but in practice, only 48 of these bits are really usable, the rest of the address bits must be a copy of bit 47. This is the "canonical address format". Violating this rule will generate either a #GP or an #SS exception. Quoting from the Intel Architecture manual, vol. 1 ch.

The first implementation of IA-32 processors with Intel 64 architecture supports a 48-bit linear address. This means a canonical address must have bits 63 through 48 set to zeros or ones (depending on whether bit 47 is a zero or one). [...] If a linear-memory reference is not in canonical form, the implementation should generate an exception. In most cases, a general-protection exception (#GP) is generated. However, in the case of explicit or implied stack references, a stack fault (#SS) is generated.

We handle the case of #GP exceptions just fine. However, what the code in this ticket does is referencing the stack (by doing an address calculation involving ebp), so it generates an #SS exception. Currently, we handle that one with x86_fatal_exception(), which triggers KDL.

So what we need is to properly handle the #SS exception to only terminate the application in such cases.

by jua, 4 years ago

Attachment: handle-stack-fault.diff added

comment:9 by jua, 4 years ago

patch: 01

comment:10 by jua, 4 years ago

patch: 10

Attached a patch. Fixes the bug - comments welcome. I'm not sure if the signal code is the right one, but it seems that none of the POSIX signals really fits here, so I tried to make a reasonable choice.

Last edited 4 years ago by jua (previous) (diff)

comment:11 by korli, 4 years ago

Linux seems to use SIGBUS. "Access to an undefined portion of a memory object." according to Ends with "Abnormal termination of the process".

comment:12 by jua, 4 years ago

Ok, FreeBSD uses that too, so that's a better choice.

comment:13 by phoudoin, 4 years ago

So, SIGBUS, BUS_ADRERR, then? Otherwise, the patch looks fine.

Last edited 4 years ago by phoudoin (previous) (diff)

comment:14 by jua, 4 years ago

Resolution: fixed
Status: assignedclosed

Applied in hrev51511.

p.s. oops, forgot to set git correct user/mail on new machine prior to commit...

comment:15 by phoudoin, 4 years ago

Any idea why the same code KDL when compiled as c++ code but not as C code? Something different in the compiler generated code I guess...

comment:16 by jua, 4 years ago

Yup, if you look at the disassembly of both, gcc generates quite different code. In C language mode, the crash-inducing instruction doesn't contain a stack reference, so it generates a general protection fault instead of a stack fault.

Note: See TracTickets for help on using tickets.