Opened 8 years ago

Closed 7 years ago

Last modified 7 years ago

#13546 closed bug (fixed)

wait4() doesn't seem to work

Reported by: korli Owned by: nobody
Priority: normal Milestone: Unscheduled
Component: System/libroot.so Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description (last modified by korli)

wait4() located here: http://cgit.haiku-os.org/haiku/tree/src/libs/bsd/wait.c#n23

Test program:

/*
 * Copyright 2007, Axel Dörfler, axeld@pinc-software.de. All rights reserved.
 * Distributed under the terms of the MIT License.
 */


#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h> 
#define _BSD_SOURCE
#include <sys/wait.h>
#include <unistd.h>


/*!
	wait() should wait only once. If any argument is given, waitpid() should return
	an error (and errno to ECHILD), since there is no child with that process group ID.
*/


int
child2()
{
	sleep(2);
	return 2;
}


//! exits before child 2
int
child1()
{
	setpgrp();
		// put us into a new process group

	pid_t child = fork();
	if (child == 0)
		return child2();

	sleep(1);
	return 1;
}


int
main(int argc, char** argv)
{
	bool waitForGroup = argc > 1;

	pid_t child = fork();
	if (child == 0)
		return child1();

	struct rusage usage;
	pid_t pid;
	do {
		int childStatus = -1;
		pid = wait4(-1, &childStatus, 0, &usage);
		printf("wait4() returned %ld (%s), child status %d\n",
			pid, strerror(errno), childStatus);
	} while (pid >= 0);

	return 0;
}

Haiku strace output

[   581] _kern_image_relocated(0xf6d) (409 us)
[   581] _kern_set_area_protection(0x2cce, 0x5) = 0x00000000 No error (11 us)
[   581] _kern_set_area_protection(0x2cd0, 0x5) = 0x00000000 No error (10 us)
[   581] _kern_set_area_protection(0x2cd2, 0x5) = 0x00000000 No error (23 us)
[   581] _kern_get_system_info(0x72d84998) = 0x00000000 No error (7 us)
[   581] _kern_get_system_info(0x72d84798) = 0x00000000 No error (4 us)
[   581] _kern_reserve_address_range([0x18033000], 0x7, 0x48000000) = 0x00000000 No error (7 us)
[   581] _kern_create_area("heap", 0xc6b980, 0x1, 0x40000, 0x0, 0x33) = 0x00002cd6 (15 us)
[   581] _kern_fork() = 0x00000247 (241 us)
[   581] _kern_wait_for_child(0xffffffff, 0x0, 0x72d84b64) = 0x00000247 (999999 us)
[   581] --- Child exited (Child exited) ---
[   581] _kern_get_team_usage_info(0x247, 0x0, 0x72d84bb8, 0x10) = 0x80001103 Operation on invalid team (2 us)
[   581] _kern_read_stat(0x1, (nil), false, 0x72d83390, 0x58) = 0x00000000 No error (4 us)
[   581] _kern_ioctl(0x1, TCGETA, 0x72d83338, 0x20) = 0x00000000 No error (5 us)
wait4() returned -1 (No error), child status 1
[   581] _kern_write(0x1, 0xffffffffffffffff, 0x180571b0, 0x2f) = 0x0000002f (1379 us)
[   581] _kern_exit_team(0x0) (9 us)

Linux output

wait4() returned 22104 (Success), child status 256
wait4() returned 4294967295 (No child processes), child status -1

Attachments (1)

wait4.patch (7.5 KB ) - added by korli 7 years ago.
wait4 patch

Download all attachments as: .zip

Change History (13)

comment:1 by korli, 8 years ago

Description: modified (diff)

comment:2 by pulkomandy, 7 years ago

I hit this same problem while porting boost_build.

Investigation:

wait4 calls waitpid. This works as expected. Then, it calls getrusage using the pid to get the final resource usage. However, at this point the team is already gone, and getrusage fails.

Workaround:

Do not return -1 when getrusage fails. Instead, set the rusage to some invalid value. But then we give bogus rusage data to apps in that situation, which is not good.

Proper fix:

It seems wait4 needs to be a syscall, so it can get usage just as the process terminates and before the PID becomes invalid. Alternatively, getrusage on a dead team should be allowed for some (?) time after its death.

comment:3 by korli, 7 years ago

There is a WNOWAIT flag to get the status, but not reap the dead entry. Maybe getrusage() needs to check such dead entries too.

comment:4 by pulkomandy, 7 years ago

I'm not sure this is possible. WNOWAIT still removes the team from the running team list (rightfully so, the team isn't running). What it does is keep it in the local list for the parent team to use wait on.

However, our gtrusage (and wait4) is implemented using get_team_usage_info, which lookups the team by PID in the running team list.

So, the wait_for_child syscall should be extended to return the rusage info (it can extract it before reaping the team). Then, wait4 can get it from there.

by korli, 7 years ago

Attachment: wait4.patch added

wait4 patch

comment:5 by korli, 7 years ago

patch: 01

comment:6 by korli, 7 years ago

Here is a patch to retrieve dead team entries usage information. This adds a parameter to the wait_for_child syscall. I extended the test case to show the actual retrieved information.

Please review, maybe there is a better way to implement the feature.

comment:7 by pulkomandy, 7 years ago

Confirmed fixing my problem with boost_build, which works fine with this patch applied.

+1

comment:8 by korli, 7 years ago

Resolution: fixed
Status: newclosed

Applied in hrev51470, thanks!

comment:9 by diver, 7 years ago

I have a suspicion that this commit broke input_server. Both mouse and keyboard stopped working after full sync from hrev51469 to hrev51471.

comment:10 by pulkomandy, 7 years ago

That would be unexpected, as the wait4 function is not used much in Haiku code.

Are you sure it is not #12713 / #12412 or similar?

comment:11 by diver, 7 years ago

Yes, I connected via ssh and input_server is running.

comment:12 by diver, 7 years ago

Hmm, after 7th reboot input_server suddenly works again. Never seen it before on vmware. In any case, sorry for the noise.

Note: See TracTickets for help on using tickets.