Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#1755 closed bug (fixed)

APR 0.9.x configure hangs

Reported by: andreasf
Owned by: bonefish
Priority: normal
Milestone: R1
Component: - General
Version: R1/pre-alpha1
Keywords:
Cc: bonefish
Blocked By:
Blocking:
Has a Patch: no
Platform: x86

Description

Configuring apr-0.9.17 or the 0.9.x SVN branch (--with-build=i586-pc-beos) reproducibly hangs at the point where it is supposed to generate the Makefiles.

Expected behavior would be almost instant generation of the Makefiles, as on R5.

Spawned processes include sed and sort. CPU usage is < 1% for > 5 mins. While this is likely some deadlock, it also happens with SMP disabled.

Entering the kernel debugger, listing all threads and exiting results in immediate error "/bin/sort: write failed: standard output: Broken pipe" and generation of the Makefiles (and then another hang when "config.status: executing default commands"). When immediately exiting the kernel debugger without other commands, it still hangs.

Last experienced at hrev23891, with GCC 2.95.3 from Haiku site with its include dir replaced from Linux' generated/, Haiku headers and libs symlinked/copied.

Change History (14)

comment:1 Changed 11 years ago by andreasf

Checking the 1.2.x branch I got a similar hang. Interestingly, there, maximizing the Terminal window resolves the hang and makes it continue to the end without errors or further hangs.

For 0.9.x, maximizing the window at the hang leads to the same broken pipe error as when returning from kernel debugger and hangs again at the end; maximizing once more makes the script end normally without pipe error.

comment:2 Changed 11 years ago by andreasf

Make that "maximizing and restoring". After maximizing alone, still nothing happens.

comment:3 Changed 11 years ago by bonefish

That the process continues after playing with the Terminal window size is likely due to our missing automatic syscall restarts (cf. #1743). Why it hangs in the first place is a different problem. If you can reproduce the problem, you could check in the kernel debugger, where the responsible thread hangs ("sc") and -- if you've kernel tracing enabled (for syscalls at least, even better also for signals and teams) -- also print the last "traced" entries of this thread.

comment:4 Changed 11 years ago by andreasf

Any hint how to find the "responsible thread"? I'm not even sure which process, there are about five.

comment:5 in reply to: 3 Changed 11 years ago by andreasf

Cc: bonefish added

Replying to bonefish:

> If you can reproduce the problem, you could check in the kernel debugger, where the responsible thread hangs ("sc")

I do have five teams, each single-threaded, waiting for different semaphores (0x9...).

sh 9375 appears to be the configure script, and interpreting sc, it is waiting for a child process (kernel:wait_for_child). sed 9376 appears to be reading from a pipe (kernel:pipefs_read). sh 9380 appears to be writing to a pipe (kernel:pipefs_write). sh 9381 appears to be waiting for a child process (kernel:wait_for_child). sort 9382 appears to be reading from a pipe (kernel:pipefs_read).

Obviously I've shortened the symbol names and picked a meaningful one from the top of the list - if you need the full backtrace, is there a better way than a digicam?

Sounds like a reader-writer-lock problem to me.

I don't know how to interpret the sem output. However, sem for the sed semaphore (0x94c7de9c) printed [*** READ/WRITE FAULT ***] as its last line; above that it showed two triangles as the name, 0 as the id, and 1 as the owner, with count and queue both large negative numbers. All the others instead had a hexadecimal next and a negative next_id, no name, and a negative id.

> and -- if you've kernel tracing enabled (for syscalls at least, even better also for signals and teams) -- also print the last "traced" entries of this thread.

"traced" was not recognized as a command in the kernel debugger. If I need to enable this to help debug this further, please tell me how.

comment:6 in reply to: 5 Changed 11 years ago by bonefish

Replying to andreasf:

> Replying to bonefish:

>> If you can reproduce the problem, you could check in the kernel debugger, where the responsible thread hangs ("sc")

> I do have five teams, each single-threaded, waiting for different semaphores (0x9...).

> sh 9375 appears to be the configure script, and interpreting sc, it is waiting for a child process (kernel:wait_for_child). sed 9376 appears to be reading from a pipe (kernel:pipefs_read). sh 9380 appears to be writing to a pipe (kernel:pipefs_write). sh 9381 appears to be waiting for a child process (kernel:wait_for_child). sort 9382 appears to be reading from a pipe (kernel:pipefs_read).

> Obviously I've shortened the symbol names and picked a meaningful one from the top of the list - if you need the full backtrace, is there a better way than a digicam?

If you don't have a serial port and a second computer to record the serial output, then taking a picture is the only way.

> Sounds like a reader-writer-lock problem to me.

Doesn't look too bad. At least there are both pipe readers and writers. The question is why they don't make progress. Using the "team" command for each of the teams, you can also get (a part of) their command line arguments.

> I don't know how to interpret the sem output. However, sem for the sed semaphore (0x94c7de9c) printed [*** READ/WRITE FAULT ***] as its last line; above that it showed two triangles as the name, 0 as the id, and 1 as the owner, with count and queue both large negative numbers. All the others instead had a hexadecimal next and a negative next_id, no name, and a negative id.

If the "sem/cv" number listed by the "threads" command is greater than 0x80000000, then it isn't a semaphore but a condition variable (not unlikely, since the pipefs implementation does indeed use condition variables). You get information about it via the "cvar" command -- not much, since condition variables are quite simple.
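As a rough illustration of the rule above, the check fits in a few lines of Python. This is only a sketch: the function and constant names are hypothetical, and the 0x80000000 threshold is taken from the comment above, not from the kernel sources.

```python
# Threshold quoted in the comment above (assumed, not read from the kernel sources).
CVAR_THRESHOLD = 0x80000000


def classify_wait_object(value):
    """Classify the "sem/cv" column printed by the kernel debugger's "threads" command.

    Hypothetical helper: values above the threshold are treated as
    condition variables, everything else as semaphore IDs.
    """
    if value > CVAR_THRESHOLD:
        return "condition variable"  # look it up with the "cvar" command
    return "semaphore"               # look it up with the "sem" command


# The value reported for the sed thread earlier in this ticket.
print(classify_wait_object(0x94c7de9c))
```

By this rule the 0x94c7de9c value reported for sed is a condition variable, which would explain why the "sem" command printed garbage for it.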

>> and -- if you've kernel tracing enabled (for syscalls at least, even better also for signals and teams) -- also print the last "traced" entries of this thread.

> "traced" was not recognized as a command in the kernel debugger. If I need to enable this to help debug this further, please tell me how.

I recently wrote an article about the kernel debugger, including a section "Kernel Tracing" with a subsection "Enabling It":

http://www.haiku-os.org/documents/dev/welcome_to_kernel_debugging_land

comment:7 Changed 11 years ago by andreasf

Just a short note that I still get this at hrev23990, but with the new automatic syscall restart I can work around it by maximizing/minimizing, without a broken pipe error.

comment:8 Changed 11 years ago by andrewbachmann

I also ran into this issue on hrev24209, and the max/minimize seems to restart just fine. I noticed that if I do a "ps", it prints "Bad semaphore ID(-1)" for sh and sed processes. But this seems to show up all the time so perhaps it doesn't mean anything.

comment:9 Changed 11 years ago by jonas.kirilla

The error message is due to my recent change to http://dev.haiku-os.org/changeset/24023/haiku/trunk/src/bin/ps.c

Previously ps would just repeat the previous value found by the while loop for each such line; it now prints the error message instead.

I guess the -1 is due to waiting on a condition variable. http://dev.haiku-os.org/browser/haiku/trunk/src/system/kernel/condition_variable.cpp?rev=23980 (Look for -1 in PrivateConditionVariableEntry::Wait())

comment:10 in reply to: 9 Changed 11 years ago by andreasf

Replying to jonas.kirilla:

> I guess the -1 is due to waiting on a condition variable.

Yes, Ingo pointed this out above.

comment:11 in reply to: 10 Changed 11 years ago by jonas.kirilla

Replying to andreasf:

> Replying to jonas.kirilla:

>> I guess the -1 is due to waiting on a condition variable.

> Yes, Ingo pointed this out above.

Yeah, I just wanted to follow up on Andrew's observation of 'ps' output since I'm responsible for the latest change to it. I forgot to press reply (for proper quotation). I didn't mean to comment on the reported issue.

comment:12 Changed 11 years ago by bonefish

Owner: changed from axeld to bonefish
Status: changed from new to assigned

I can reproduce a problem with blocking pipes with unzip -l large.zip | less + "G" (not always, but often enough). Looking into it.
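A scripted variant of this kind of reproduction might look like the following Python sketch. It is a stand-in under stated assumptions, not the test used for the fix: the helper name pipe_roundtrip, the payload size, and the 60-second timeout are all hypothetical. It pushes a known number of bytes through a two-process pipeline and treats a timeout as a hang.

```python
import subprocess
import sys


def pipe_roundtrip(total_bytes, chunk=4096, timeout=60):
    """Send total_bytes through "producer | consumer" and return the count the consumer saw.

    Raises subprocess.TimeoutExpired if the pipeline stalls, which is how
    a blocked pipe would show up in this sketch.
    """
    producer_code = (
        "import sys\n"
        f"remaining = {total_bytes}\n"
        f"chunk = b'x' * {chunk}\n"
        "while remaining > 0:\n"
        "    n = min(remaining, len(chunk))\n"
        "    sys.stdout.buffer.write(chunk[:n])\n"
        "    remaining -= n\n"
    )
    consumer_code = (
        "import sys\n"
        "total = 0\n"
        "while True:\n"
        "    data = sys.stdin.buffer.read(65536)\n"
        "    if not data:\n"
        "        break\n"
        "    total += len(data)\n"
        "print(total)\n"
    )
    producer = subprocess.Popen(
        [sys.executable, "-c", producer_code], stdout=subprocess.PIPE
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", consumer_code],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
    )
    # Drop the parent's copy of the read end so only the consumer holds it.
    producer.stdout.close()
    out, _ = consumer.communicate(timeout=timeout)
    producer.wait(timeout=timeout)
    return int(out.decode().strip())


if __name__ == "__main__":
    print(pipe_roundtrip(1_000_000))
```

Running it in a loop with a payload well above the pipe buffer size makes an intermittent stall more likely to show up than a single run would.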

comment:13 Changed 11 years ago by bonefish

Resolution: fixed
Status: changed from assigned to closed

Fixed in hrev24701.

comment:14 Changed 11 years ago by andreasf

Finally got around to checking on this, and it no longer hangs for me. Thanks!
