Opened 14 years ago

Last modified 15 months ago

#5263 assigned bug

could not create mutex

Reported by: mmadia Owned by: nobody
Priority: normal Milestone: R1
Component: System/Kernel Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Platform: All

Description (last modified by mmadia)

hrev34979 + VM_CACHE_TRACING_STACK_TRACE 0, KDEBUG_LEVEL 0

While building @alpha-raw, Terminal would spit out "could not create mutex", at which point Vision would disconnect. This tends to occur while executing the build_haiku_image script. The jam process then needs to be forcefully killed; before it is killed, KDL cannot be entered.

syslog snippet:

KERN: vm_soft_fault: va 0x0 not covered by area in address space
KERN: vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x0, ip 0x24c284, write 1, user 1, thread 0x26eb5
KERN: vm_page_fault: thread "bfs_shell" (159413) in team "bfs_shell" (159413) tried to write address 0x0, ip 0x24c284 ("bfs_shell_seg0ro" +0x4c284)
KERN: thread_hit_serious_debug_event(): Failed to install debugger: thread: 159413: Out of memory

Replying to #5138 comment:25 bonefish:

The system has apparently run out of semaphores.

Change History (8)

comment:1 by kaliber, 14 years ago

Is it possible to solve the semaphore availability problem? I have to increase their number from 40960 to 102400 (src/build/libroot/sem.cpp), because I have to copy a lot of files in the fs_shell.

comment:2 by mmadia, 14 years ago

Description: modified (diff)

It seems I was mistakenly trying to enter KDL with a PS/2 keyboard attached to a different machine. I edited the description to remove the related text.

comment:3 by mmadia, 14 years ago

Reproduced in hrev35087 with KDEBUG_LEVEL 2, VM_CACHE_TRACING 2, VM_CACHE_TRACING_STACK_TRACE 0.

Here's a serial debugging snippet. I requested a halt upon encountering the "could not create mutex" message.

vm_soft_fault: va 0x0 not covered by area in address space
vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x0, ip 0x24c284, write 1, user 1, thread 0x8402
vm_page_fault: thread "bfs_shell" (33794) in team "bfs_shell" (33794) tried to write address 0x0, ip 0x24c284 ("bfs_shell_seg0ro" +0x4c284)
thread_hit_serious_debug_event(): Failed to install debugger: thread: 33794: Out of memory
PANIC: Keyboard Requested Halt.
Welcome to Kernel Debugging Land...
Thread 1 "idle thread 1" running on CPU 0
kdebug> info
kernel build: Jan 15 2010 14:44:58
SVN revision: 35087

cpu count: 2, active times:
  [1] 1232928659
  [2] 1246006260
pages:		641349 (753600 max)
sems:		3768 (65536 max)
ports:		308 (4096 max)
threads:	217 (4096 max)
teams:		25 (2048 max)
kdebug> avail
Available memory: 2419318784/3086745600 bytes
kdebug> thread 33794
thread "33794" (33794) doesn't exist!
kdebug> show_waste
24: 506288
48: 2796
224: 48
256: 224
total waste: 509356

comment:4 by mmadia, 14 years ago

hrev36423-r1a2-rc. Found one way to reproduce this: add an "exit 1" inside build_haiku_image; so far mine have been within extractFile(). When building a raw image, that exit 1 causes bfs_shell to stick around in memory. Once 3-5 stray processes are hanging about, jam -q @alpha-raw will echo "could not create mutex". Each time, Vision disconnects. Killing the stray bfs_shell processes with ProcessController returns the system to normal.
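
For illustration, a minimal sketch of the injected failure described above; the body of extractFile() shown here is hypothetical (the real function in build_haiku_image differs), only the placement of the exit 1 matters:

extractFile()
{
    # ... the script's real extraction commands would run here ...
    exit 1  # injected failure: the script dies mid-run, but the
            # bfs_shell process serving it is never told to quit
}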

in reply to: comment:4 ; comment:5 by bonefish, 14 years ago

Replying to kaliber:

Is it possible to solve the semaphore availability problem? I have to increase their number from 40960 to 102400 (src/build/libroot/sem.cpp), because I have to copy a lot of files in the fs_shell.

Since semaphores are a kernel resource (at least under Haiku), they need to be limited. The limit could be increased, of course, but I think it's an fs_shell bug if that many semaphores are used. The only dynamic use of semaphores is associated with vnodes, and the number of unused vnodes is limited to 16 in the fs_shell. So there's probably a leak somewhere (maybe vnode references).

Replying to mmadia:

hrev36423-r1a2-rc. Found one way to reproduce this: add an "exit 1" inside build_haiku_image; so far mine have been within extractFile(). When building a raw image, that exit 1 causes bfs_shell to stick around in memory. Once 3-5 stray processes are hanging about, jam -q @alpha-raw will echo "could not create mutex". Each time, Vision disconnects. Killing the stray bfs_shell processes with ProcessController returns the system to normal.

The fact that one process uses that many semaphores is one problem, but this is a different issue. Obviously the bfs_shell process shouldn't stick around after the script terminates. Unfortunately, with the current communication mechanisms (UNIX sockets and BeOS ports, respectively), the bfs_shell cannot know when it is no longer needed and therefore needs to be told explicitly. In the explicit "exit" cases that could be done easily, but "set -o errexit" makes it complicated. Maybe it is possible to install some generic exit callback -- I know that it is possible to install signal callbacks at least.

Alternatively, the communication mechanism could be changed, e.g. to named pipes. The shell would open the pipe, and when it quits the pipe would be closed automatically, thus indicating to the bfs_shell that it should quit.
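
A minimal sketch of that idea in shell, with hypothetical pipe and command names rather than the actual fs_shell protocol; the key point is that the reader unblocks with EOF as soon as the writing shell exits for any reason:

# server side (standing in for bfs_shell), run as its own process
mkfifo /tmp/fs_shell_ctl
while read -r command; do
    echo "handling: $command"
done < /tmp/fs_shell_ctl  # read returns EOF once all writers are gone
echo "controlling script exited -- shutting down"

# client side (standing in for build_haiku_image), a separate script
exec 3> /tmp/fs_shell_ctl  # hold the pipe open for the script's lifetime
echo "copy some-file" >&3
# whether this script ends normally, via "exit 1", or through errexit,
# fd 3 is closed on exit and the server above unblocks with EOF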

in reply to: comment:5 ; comment:6 by korli, 14 years ago

Replying to bonefish:

The fact that one process uses that many semaphores is one problem, but this is a different issue. Obviously the bfs_shell process shouldn't stick around after the script terminates. Unfortunately, with the current communication mechanisms (UNIX sockets and BeOS ports, respectively), the bfs_shell cannot know when it is no longer needed and therefore needs to be told explicitly. In the explicit "exit" cases that could be done easily, but "set -o errexit" makes it complicated. Maybe it is possible to install some generic exit callback -- I know that it is possible to install signal callbacks at least.

Something like this should do the trick:

# runs via the ERR trap whenever a command fails while
# "set -o errexit" is in effect
function cleanexit() {
    local exit_status=$?
    echo "Exiting with $exit_status"
    exit "$exit_status"
}

# call cleanexit for any command that exits with a non-zero status
trap cleanexit ERR
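
An editor's side note, not from the ticket: an EXIT trap would also cover the explicit "exit 1" cases bonefish mentions, since it fires on any script termination rather than only on failed commands:

# alternative: fire on any termination, including an explicit "exit 1"
trap cleanexit EXIT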

comment:7 by axeld, 7 years ago

Owner: changed from axeld to nobody
Status: new → assigned

comment:8 by waddlesplash, 15 months ago

Do any of the above problems still persist? At least I don't know of bfs_shell processes persisting when a script fails.
