Opened 11 months ago
Closed 2 months ago
#18828 closed bug (fixed)
Can not spawn process after a while
| Reported by: | LupusMichaelis | Owned by: | nobody |
|---|---|---|---|
| Priority: | normal | Milestone: | Unscheduled |
| Component: | - General | Version: | R1/beta4 |
| Keywords: | | Cc: | |
| Blocked By: | | Blocking: | |
| Platform: | All | | |
Description
Notes:
- I may use “team id” and “pid” interchangeably in this conversation
- I'm not sure which category to post this in; if a moderator could sort me out and explain (in private) a more suitable place, I'd be grateful
- I didn't open a Trac issue as there might already be one I didn't find; if not, please tell me and I'll open one (and tell me which module to triage it under)
I'm encountering a stability issue with HaikuOS. I'm trying to track down what's happening, so I wrote a program for that. But I'm not sure my watching app is right, so I'd like someone to give it a look and tell me if I'm wrong.
This didn't lead to any data loss.
I have HaikuOS (`R1/beta4 hrev56578+95`) deployed in a qemu virtual machine under Linux (`Linux mothra 5.10.0-23-amd64 #1 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux`).
I'm using it daily to learn the BeOS API by writing programs and sometimes attempting to compile HaikuOS's components. To simplify my life, I downloaded and compiled Byobu, which works OK as long as it's not too demanding on the Terminal app (but that's a problem for a different day). Byobu spawns a couple of processes per second to refresh its status bar.
About every week or so, HaikuOS can't spawn processes or threads anymore. The Deskbar becomes unresponsive, but `<ctrl>+<alt>+<del>` allows (sometimes) for a soft reboot, and all running apps are still usable.
Attempting to run a command from the Terminal app crashes it: it displays a message about forking failing and exits.
I observed that the team id was growing. During the past 2 months, I was writing down the uptime and the last team id with the `ps` command. Once I reached the end of my patience, I decided to monitor this growth:
[collect-ps](https://gitlab.com/LupusMichaelis/belab/-/tree/trunk/collect-ps)
Once compiled, you can use it to fetch the last team id:
```bash
./collect-ps
```
or watch the system:
```bash
./collect-ps -w &
```
This will collect the last team id and the max team id every second, and attempt to spawn a process and a thread.
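The spawn probe described above can be illustrated with a minimal, portable sketch. This is not the code of `collect-ps`: the function names (`try_spawn_process`, `try_spawn_thread`, `watch`) are hypothetical, it uses only generic process and thread creation rather than the Haiku team API, and it does not read the last or max team id as the real tool does.

```python
import subprocess
import sys
import threading
import time


def try_spawn_process():
    """Spawn a trivial child process; return its PID, or None on failure."""
    try:
        # Assumption: a no-op interpreter run stands in for "a process".
        child = subprocess.Popen([sys.executable, "-c", "pass"])
    except OSError:
        # A failing fork surfaces here as OSError or one of its subclasses.
        return None
    child.wait()
    return child.pid


def try_spawn_thread():
    """Start and join a no-op thread; return True if it could be started."""
    worker = threading.Thread(target=lambda: None)
    try:
        worker.start()
    except RuntimeError:
        # Python raises RuntimeError when no new thread can be started.
        return False
    worker.join()
    return True


def watch(samples, interval=1.0):
    """Probe `samples` times, `interval` seconds apart.

    Returns a list of (timestamp, child_pid_or_None, thread_ok) tuples.
    """
    results = []
    for index in range(samples):
        results.append((time.time(), try_spawn_process(), try_spawn_thread()))
        if index + 1 < samples:
            time.sleep(interval)
    return results
```

Calling `watch(samples=3)` returns three probes taken one second apart. A `None` PID or a `False` thread flag would mark the moment spawning stopped working.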
On my last attempt, the failure was reached at PID 25'465'541. This number doesn't look like any meaningful limit to me. I'm waiting for the next crash.
At that time, when I ran `ps` in an open Terminal app, it didn't crash but output this message:
```bash
~/workshop/belab/todo> ps
-bash: fork: Unknown Device Error (-2147432385)
-bash: cannot make pipe for command substitution: Too many open files
```
My hypothesis is that the kernel reaches an integer ceiling and overflows. But I might be wrong; maybe another resource id is exhausted.
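As a quick sanity check on the integer-ceiling idea, the failing id reported above (25'465'541) can be compared with a signed 32-bit maximum and with its neighbouring powers of two. This sketch assumes team ids are signed 32-bit integers, which the report does not confirm. The helper names are hypothetical.

```python
INT32_MAX = 2**31 - 1      # assumed ceiling for a signed 32-bit id
OBSERVED_ID = 25_465_541   # last id reported before spawning failed


def fraction_of_ceiling(value, ceiling=INT32_MAX):
    """Return how much of the id space `value` has consumed (0.0 to 1.0)."""
    return value / ceiling


def enclosing_powers_of_two(value):
    """Return the powers of two (lower, upper) with lower <= value < upper."""
    exponent = value.bit_length()
    return 2 ** (exponent - 1), 2 ** exponent


if __name__ == "__main__":
    print(f"{fraction_of_ceiling(OBSERVED_ID):.2%} of the int32 range")
    print(enclosing_powers_of_two(OBSERVED_ID))
```

Under that assumption the observed id has used a little over 1% of the range and sits between 2^24 and 2^25, away from either boundary. On these numbers alone, exhaustion of some other resource looks at least as plausible as an overflow of the id itself.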
I attach the logs I collected.
So, in my monitoring tool, I observe a few strange problems:
- The team id seems stuck for a while, even though `ps` shows higher team ids. It's as if that value was cached at some point, but I don't have a good enough knowledge of the system, and my monitoring tool might be wrong.
- Sometimes the team id reverts to a previous value.
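Both symptoms (an id that stays stuck, and an id that goes backwards) can be flagged mechanically in a series of sampled ids. The following is a minimal sketch that works on a plain list of integers. The function name, the `min_stall` threshold and the sample values are illustrative assumptions, and parsing the attached log is left out because its format is not described here.

```python
def find_anomalies(samples, min_stall=3):
    """Scan sampled ids for stalls and regressions.

    Returns (stalls, regressions):
      stalls      -- (first_index, last_index) of every run of at least
                     `min_stall` identical consecutive samples
      regressions -- indices where a sample is lower than the previous one
    """
    stalls = []
    regressions = []
    run_start = 0
    for index in range(1, len(samples) + 1):
        at_end = index == len(samples)
        if not at_end and samples[index] == samples[run_start]:
            continue  # still inside a run of identical values
        if index - run_start >= min_stall:
            stalls.append((run_start, index - 1))
        if not at_end:
            if samples[index] < samples[index - 1]:
                regressions.append(index)
            run_start = index
    return stalls, regressions


if __name__ == "__main__":
    # Made-up values for illustration only, not taken from the real log.
    print(find_anomalies([100, 101, 101, 101, 105, 103, 106]))
```

A stall on its own is not necessarily a bug: if no process was created between two samples, the last id legitimately stays the same. A regression is harder to explain that way.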
Is this a known behaviour? I didn't find anything in the Trac concerning such an issue.
Attachments (4)
Change History (10)
by , 11 months ago
Attachment: watch-pid2024-02-27.log added
comment:1 by , 11 months ago
Please retest with a nightly build (you can just change your repos and full-sync with pkgman) and see if anything is different.
comment:2 by , 11 months ago
I downloaded the nightly (hrev57609) and installed it in a new VM, then compiled and launched the monitoring tool. I'm waiting for it to crash (I'll provoke it by building HaikuOS from the VM every once in a while).
So far, the team id still gets stuck when activity is low.
comment:4 by , 2 months ago
I don't know. I'm running some tests (building a project again and again) and I'll tell you by the end of the weekend if this still occurs.
(Testing with nightly hrev58352, started the loop on 2024/11/22 at 10:34)
comment:5 by , 2 months ago
So, I let it run until now. The issue no longer seems to be triggered as before. Also, the collection of PIDs feels much more consistent.
I assume you solved the issue, good work!