#13546 closed bug (fixed)
wait4() doesn't seem to work
Reported by: | korli | Owned by: | nobody |
---|---|---|---|
Priority: | normal | Milestone: | Unscheduled |
Component: | System/libroot.so | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description (last modified by )
wait4() located here: http://cgit.haiku-os.org/haiku/tree/src/libs/bsd/wait.c#n23
Test program:
/* * Copyright 2007, Axel Dörfler, axeld@pinc-software.de. All rights reserved. * Distributed under the terms of the MIT License. */ #include <errno.h> #include <stdio.h> #include <string.h> #include <sys/resource.h> #define _BSD_SOURCE #include <sys/wait.h> #include <unistd.h> /*! wait() should wait only once. If any argument is given, waitpid() should return an error (and errno to ECHILD), since there is no child with that process group ID. */ int child2() { sleep(2); return 2; } //! exits before child 2 int child1() { setpgrp(); // put us into a new process group pid_t child = fork(); if (child == 0) return child2(); sleep(1); return 1; } int main(int argc, char** argv) { bool waitForGroup = argc > 1; pid_t child = fork(); if (child == 0) return child1(); struct rusage usage; pid_t pid; do { int childStatus = -1; pid = wait4(-1, &childStatus, 0, &usage); printf("wait4() returned %ld (%s), child status %d\n", pid, strerror(errno), childStatus); } while (pid >= 0); return 0; }
Haiku strace output
[ 581] _kern_image_relocated(0xf6d) (409 us) [ 581] _kern_set_area_protection(0x2cce, 0x5) = 0x00000000 No error (11 us) [ 581] _kern_set_area_protection(0x2cd0, 0x5) = 0x00000000 No error (10 us) [ 581] _kern_set_area_protection(0x2cd2, 0x5) = 0x00000000 No error (23 us) [ 581] _kern_get_system_info(0x72d84998) = 0x00000000 No error (7 us) [ 581] _kern_get_system_info(0x72d84798) = 0x00000000 No error (4 us) [ 581] _kern_reserve_address_range([0x18033000], 0x7, 0x48000000) = 0x00000000 No error (7 us) [ 581] _kern_create_area("heap", 0xc6b980, 0x1, 0x40000, 0x0, 0x33) = 0x00002cd6 (15 us) [ 581] _kern_fork() = 0x00000247 (241 us) [ 581] _kern_wait_for_child(0xffffffff, 0x0, 0x72d84b64) = 0x00000247 (999999 us) [ 581] --- Child exited (Child exited) --- [ 581] _kern_get_team_usage_info(0x247, 0x0, 0x72d84bb8, 0x10) = 0x80001103 Operation on invalid team (2 us) [ 581] _kern_read_stat(0x1, (nil), false, 0x72d83390, 0x58) = 0x00000000 No error (4 us) [ 581] _kern_ioctl(0x1, TCGETA, 0x72d83338, 0x20) = 0x00000000 No error (5 us) wait4() returned -1 (No error), child status 1 [ 581] _kern_write(0x1, 0xffffffffffffffff, 0x180571b0, 0x2f) = 0x0000002f (1379 us) [ 581] _kern_exit_team(0x0) (9 us)
Linux output
wait4() returned 22104 (Success), child status 256 wait4() returned 4294967295 (No child processes), child status -1
Attachments (1)
Change History (13)
comment:1 by , 8 years ago
Description: | modified (diff) |
---|
comment:2 by , 8 years ago
comment:3 by , 8 years ago
There is a WNOWAIT flag to get the status, but not reap the dead entry. Maybe getrusage() needs to check such dead entries too.
comment:4 by , 8 years ago
I'm not sure this is possible. WNOWAIT still removes the team from the running team list (rightfully so, the team isn't running). What it does is keep it in the local list for the parent team to use wait on.
However, our gtrusage (and wait4) is implemented using get_team_usage_info, which lookups the team by PID in the running team list.
So, the wait_for_child syscall should be extended to return the rusage info (it can extract it before reaping the team). Then, wait4 can get it from there.
comment:5 by , 7 years ago
patch: | 0 → 1 |
---|
comment:6 by , 7 years ago
Here is a patch to retrieve dead team entries usage information. This adds a parameter to the wait_for_child syscall. I extended the test case to show the actual retrieved information.
Please review, maybe there is a better way to implement the feature.
comment:7 by , 7 years ago
Confirmed fixing my problem with boost_build, which works fine with this patch applied.
+1
comment:9 by , 7 years ago
comment:10 by , 7 years ago
comment:12 by , 7 years ago
Hmm, after 7th reboot input_server suddenly works again. Never seen it before on vmware. In any case, sorry for the noise.
I hit this same problem while porting boost_build.
Investigation:
wait4 calls waitpid. This works as expected. Then, it calls getrusage using the pid to get the final resource usage. However, at this point the team is already gone, and getrusage fails.
Workaround:
Do not return -1 when getrusage fails. Instead, set the rusage to some invalid value. But then we give bogus rusage data to apps in that situation, which is not good.
Proper fix:
It seems wait4 needs to be a syscall, so it can get usage just as the process terminates and before the PID becomes invalid. Alternatively, getrusage on a dead team should be allowed for some (?) time after its death.