Opened 7 years ago

Closed 7 years ago

#13111 closed bug (fixed)

Possible execve bug in Haiku

Reported by: miqlas Owned by: axeld
Priority: normal Milestone: Unscheduled
Component: System/POSIX Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description (last modified by bonefish)

Noticed it in fish shell, below the bugreport:

https://github.com/fish-shell/fish-shell/issues/3645

Komment from ridiculousfish:

krader is right that execve is allocating memory, and I would tentatively argue that this is a bug in Haiku. In particular POSIX requires that execve is async-signal safe, which means it may be called from a signal handler, and thus must not call malloc internally.

Original bugreport, for the history:

Reproduction steps

write characters into your Terminal press Ctrl-C Repeat .... Profit! Fish hangs after Ctrl-C and aren't accept any input anymore, so you need to kill it. I never had any problems like this with the 2.3.x releases. (if i remember correctly.)

Attachments (5)

fish_debug_1.png (176.9 KB ) - added by miqlas 7 years ago.
Debugger-1
fish_debug_2.png (180.5 KB ) - added by miqlas 7 years ago.
Debugger-2
fork-heap-locks-handling.patch (4.6 KB ) - added by korli 7 years ago.
patch without pthread_atfork
fork-heap-locks-handling-2.patch (3.9 KB ) - added by korli 7 years ago.
path with pthread_atfork
fork-heap-locks-handling-3.patch (7.5 KB ) - added by korli 7 years ago.
updated patch

Download all attachments as: .zip

Change History (18)

by miqlas, 7 years ago

Attachment: fish_debug_1.png added

Debugger-1

by miqlas, 7 years ago

Attachment: fish_debug_2.png added

Debugger-2

comment:1 by korli, 7 years ago

Component: - GeneralSystem/POSIX

comment:2 by korli, 7 years ago

Owner: changed from nobody to axeld
Status: newassigned

Axel might have a clue.

comment:3 by bonefish, 7 years ago

Description: modified (diff)

Our malloc() implementation is actually async signal safe, by way of deferring signal delivery. I suspect the issue in this case is rather that malloc() isn't fork() safe in an multi-threaded program. If one thread calls fork() while another is currently in malloc() (or free(),...) the child process will inherit a locked mutex.

The solution is to register respective atfork() hooks for the memory allocator. They would acquire the mutex pre-fork() and release it post-fork(), thus ensuring that the child finds the memory allocator in a usable state.

in reply to:  3 comment:4 by korli, 7 years ago

Replying to bonefish:

The solution is to register respective atfork() hooks for the memory allocator. They would acquire the mutex pre-fork() and release it post-fork(), thus ensuring that the child finds the memory allocator in a usable state.

It seems to already be a TODO of yours from 2005 :) http://cgit.haiku-os.org/haiku/tree/src/system/libroot/posix/malloc/arch-specific.cpp#n115 http://cgit.haiku-os.org/haiku/commit/?id=faed617707da8b09adc50f3dc22642f7d3d8a12f

comment:5 by miqlas, 7 years ago

Reply from a fish developer:

The atfork() mechanism is inherently broken. It does not fix this issue.

The execve() function must not execute any malloc() calls since it must, by definition, be deadlock free after a fork() call. The person who provided that answer is wrong in their recommendation.

https://github.com/fish-shell/fish-shell/issues/3645#issuecomment-267249560

by korli, 7 years ago

patch without pthread_atfork

comment:6 by korli, 7 years ago

patch: 01

by korli, 7 years ago

path with pthread_atfork

comment:7 by korli, 7 years ago

2 possible patches: In the first, I have skipped the use of pthread_atfork() because we need to be sure our handler comes last, and there is at least another hook _ _reinit_pwd_backend_after_fork. On another hand, we could replace the atfork() call in arch-specifics.cpp with pthread_atfork(), assuming we're the first to call pthread_atfork().

I can't reproduce the issue with fish, so I can't validate that this help with the issue.

comment:8 by bonefish, 7 years ago

At a quick glance both patches look okay. While the pthread_atfork() solution would be a bit more elegant from a software design point of view I'd rather go with the first patch for sake of robustness -- even if the memory allocator is the first to register a hook ATM, this might change later and possibly reintroduce the issue. I would move the __heap_after_fork_child() call before __reinit_pwd_backend_after_fork(), though (the memory allocator won't ever use the user/group API, but the user/group API implementation uses memory allocations (even if the hook doesn't ATM)).

One other issue I noticed is that fork() is supposed to be async signal safe, but our implementation is not. The reason being the use of sForkLock. We should defer signals before we acquire sForkLock in __register_atfork and probably also in fork().

by korli, 7 years ago

updated patch

comment:9 by korli, 7 years ago

Thanks, I updated the first patch with your suggestions. Had to make user_thread.h C compliant. I also got rid of the atfork() call as originally done in the second patch. I hope to test deeper tomorrow, ATM I just did basic tests (qemu and real hw boot + bash fork loop).

comment:10 by korli, 7 years ago

Tested OK with a 8-threads process, each using fork simultaneously (a custom haikuporter). This was previously causing locks up on this 4-threads computer. I should commit this patch tonight when nothing comes to amend it.

comment:11 by korli, 7 years ago

Patch applied in hrev50771 with the following change: I chose to handle the fork failure and parent cases together, as it seems the parent after fork hooks should also be called in case of failure. I checked two fork() implementations, the spec also says "The parent fork handle shall be called after fork() processing completes in the parent process."

comment:12 by miqlas, 7 years ago

Just tested fish on the latest Haiku x86_64 version, i can't reproduce the hang anymore. Thnak you Guys, i need to say i haven't expected so fast solution :) Really nice job! I think you can close this ticket.

comment:13 by korli, 7 years ago

Resolution: fixed
Status: assignedclosed

Thanks for the feedback!

Note: See TracTickets for help on using tickets.