Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#16337 closed bug (invalid)

Segment violation on x86_64 in bbitmap

Reported by: bbjimmy Owned by: nobody
Priority: normal Milestone: Unscheduled
Component: - General Version: R1/beta2
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description (last modified by diver)

load an image to a bitmap and check its size errors with a segment violation ... only on x86_64 haiku.

hrev54397 x86_64 using yab1.7.8 execute msize.yab results in a segment violation.

hrev54397 x86_gcc2 execute msize.yab and there is no segment violation.

Attachments (3)

msize.yab (752 bytes ) - added by bbjimmy 4 years ago.
yab script using bitmap
msize-12328-debug-02-07-2020-13-40-10.report (13.7 KB ) - added by bbjimmy 4 years ago.
debug report
msize-12350-debug-02-07-2020-13-41-00.report (13.4 KB ) - added by bbjimmy 4 years ago.
LD_PRELOAD=libroot_debug.so MALLOC_DEBUG=grs25 msize

Download all attachments as: .zip

Change History (12)

by bbjimmy, 4 years ago

Attachment: msize.yab added

yab script using bitmap

comment:1 by mmlr, 4 years ago

Please include a debug report directly. This may be a yab issue and the platform difference could boil down to differences in the memory layout. Please run the application from a terminal with the guarded heap and attach a debug report if it generates a fault on x86_gcc2 as well:

LD_PRELOAD=libroot_debug.so MALLOC_DEBUG=grs25 program...

comment:2 by diver, 4 years ago

Description: modified (diff)
Summary: segment violation on x86_s4 with bbitmapSegment violation on x86_64 in bbitmap

by bbjimmy, 4 years ago

debug report

by bbjimmy, 4 years ago

LD_PRELOAD=libroot_debug.so MALLOC_DEBUG=grs25 msize

comment:3 by bbjimmy, 4 years ago

no error on x86_gcc2

comment:4 by mmlr, 4 years ago

Resolution: invalid
Status: newclosed

This isn't a Haiku bug but one in yab. As the stack trace suggests, it ends up in YabInterface::BitmapSave() where it shouldn't be at all as it is only supposed to load an image. The reason is a missing return statement in yi_BitmapLoad which leads to undefined behaviour. In this specific case, the version built for x86_64 misses the final stack pop and ret, so after the call to YabInterface::BitmapLoad() it simply continues running. The next function happens to be YabInterface::BitmapSave() which is then run with a bogus stack and crashes.

The compiler does point the issue out as a warning (even by default) but, as the function called may or may not actually return, it can't simply treat it as an error outright. In almost all cases you really do want to heed those warnings though and one should IMHO always strive for "-Wall -Wextra -Werror" to have such mistakes break the build instead of getting subtly broken binaries. The codebase has more of these return value and other issues (like calling "sprintf(stderr, ...)" which probably should have been "fprintf(stderr, ...)" instead) that would've been caught this way.

comment:5 by bbjimmy, 4 years ago

This does not explain the reason that the same code worked on previous versions of haiku and is the same code is used and works on x86_gcc2. As for our BitmapLoad, it completes prtoperly and there is no way it should flow into the two functions that follow it in the source. (BitmapSave follows BitmapGet)

comment:6 by bbjimmy, 4 years ago

Resolution: invalid
Status: closedreopened

I re-compiled the same yab source code on haiku-r1beta1-x86_64 and it works just fine. This is not a yab problem.

Last edited 4 years ago by bbjimmy (previous) (diff)

comment:7 by X512, 4 years ago

Fact that Yab is working in previous Haiku versions do not always means that it is Haiku fault. Yab can cause undefined behavior that worked fine before by accident.

comment:8 by waddlesplash, 4 years ago

Resolution: invalid
Status: reopenedclosed

mmlr pointed out above exactly what the problem is:

The reason is a missing return statement in yi_BitmapLoad which leads to undefined behaviour.

R1/beta1 may work due to its use of a different compiler. That does not change the fact that the code is wrong.

comment:9 by mmlr, 4 years ago

I've spent a fair amount of time on this and even pointed out where exactly the problem in this specific instance is. I understand that this is getting somewhat low-level, but I tried to explain that and how yab is buggy in this case and pointed out an easy way to find and fix this and some similar issues. Just dismissing that and continuing to blame the operating system isn't going to make this go away, only fixing the code will.

I don't have enough personal investment in yab to fix these issues myself, but I know that it is a used by others and consider it a nice thing to have. I'll therefore try to explain the problem some more in the hope that it leads to fixes in yab that benefit the broader community.

The fundamental problem is this: Having control flow reach the end of a non-void function without a return statement results in undefined behaviour. In such cases pretty much anything is allowed to happen, including for the compiler to not produce a function epilogue at all. The epilogue is the part that restores registers and the stack to where it needs to be and actually returns the control flow to the calling function. When it's missing, the function will quite literally not return, i.e. it will continue to run code adjacent to its code. The layout of the source code is not relevant, only the layout of the produced binary is and that one can vary depending on the toolchain, compiler flags, used libraries, etc and should be considered unpredictable.

The yab package currently available for x86_64 as well as one I built locally (using gcc-8.3.0_2019_05_24-7-x86_64.hpkg for reference) shows exactly this missing epilogue. This is what a disassembly of the relevant part looks like (produced by objdump --demangle --disassemble-all /boot/system/lib/libyab_1.7.8.so):

0000000000061ac4 <yi_BitmapLoad>:
   61ac4:       55                      push   %rbp
   61ac5:       48 89 e5                mov    %rsp,%rbp
   61ac8:       48 89 f8                mov    %rdi,%rax
   61acb:       48 89 d7                mov    %rdx,%rdi
   61ace:       48 89 f2                mov    %rsi,%rdx
   61ad1:       48 89 c6                mov    %rax,%rsi
   61ad4:       e8 87 03 fe ff          callq  41e60 <YabInterface::BitmapLoad(char const*, char const*)@plt>
   61ad9:       90                      nop

0000000000061ada <YabInterface::BitmapSave(char const*, char const*, char const*)>:
   61ada:       55                      push   %rbp
   61adb:       48 89 e5                mov    %rsp,%rbp
   61ade:       41 57                   push   %r15
   61ae0:       41 56                   push   %r14
...

It shows that no epilogue was produced, i.e. no final stack pop and no ret instruction. Therefore the program will simply continue to run and in this case it happens to have YabInterface::BitmapSave directly following which it then executes. That function has different arguments and therefore expects a different stack layout and register values and then crashes.

Compare this with a similar function that isn't missing the return statement:

0000000000061cec <yi_BitmapSave>:
   61cec:       55                      push   %rbp
   61ced:       48 89 e5                mov    %rsp,%rbp
   61cf0:       48 89 f8                mov    %rdi,%rax
   61cf3:       48 89 cf                mov    %rcx,%rdi
   61cf6:       48 89 d1                mov    %rdx,%rcx
   61cf9:       48 89 f2                mov    %rsi,%rdx
   61cfc:       48 89 c6                mov    %rax,%rsi
   61cff:       e8 1c d8 fd ff          callq  3f520 <YabInterface::BitmapSave(char const*, char const*, char const*)@plt>
   61d04:       5d                      pop    %rbp
   61d05:       c3                      retq   

As I already said above, the compiler warns about such errors. It only doesn't fail compilation because it cannot know for sure that undefined behaviour will actually occur (the inner function could for example always exit by throwing an exception or actually be meant to execute an infinite loop). But in almost all cases they are indeed just errors and they will lead to hard to understand bugs. So ignoring these compilation warnings because "they are just warnings" is not an option.

Recompiling in an even slightly different environment may alter what happens. It is possible that the layout of the binary changes so that a function happens to follow that wouldn't crash with the given stack. Obviously it's always problematic to run functions that weren't intended to run. Even if it doesn't crash, it will possibly have dangerous side effects (what if it was a function to delete some files for example?) and it most certainly will mangle the return value so the rest of the program will run with bogus data.

So my advice to build yab with -Wall -Wextra -Werror and fixing anything it points out stands. It will fix some bugs that may have been hidden by pure chance so far (or lead to so far unexplained strange behaviour) and it will increase code quality. You can always disable some of these warnings if you explicitly don't want to change the code (-Wno-unused-variable or -Wno-unused-parameter for example if you don't want to remove them). But even the "misleading indent" ones are sometimes right and even if they aren't, adjusting the code style doesn't really hurt. A quick glance showed at least one instance of a misleading indent warning that indeed looked wrong.

Note: See TracTickets for help on using tickets.