Opened 11 months ago

Closed 2 months ago

#18828 closed bug (fixed)

Can not spawn process after a while

Reported by: LupusMichaelis
Owned by: nobody
Priority: normal
Milestone: Unscheduled
Component: - General
Version: R1/beta4
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

Notes:

  • I may use “team id” and “pid” interchangeably in this conversation.
  • I'm not sure which category to post this in; if a moderator can sort this out and explain (in private) a more suitable place, I'd be grateful.
  • I didn't open a Trac issue, as there might already be one I didn't find; if not, please tell me and I'll open one (and tell me which module to triage it under).

I'm encountering a stability issue with HaikuOS. I'm trying to track down what's happening, so I wrote a program for that. But I'm not sure my watching app is right, so I'd like someone to give it a look and tell me if I'm wrong.

This didn't lead to any data loss.

I'm running HaikuOS (R1/beta4 hrev56578+95) in a QEMU virtual machine under Linux (Linux mothra 5.10.0-23-amd64 #1 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux).

I use it daily to learn the BeOS API by writing programs and sometimes attempting to compile HaikuOS's components. To simplify my life, I downloaded and compiled Byobu, which works OK as long as it's not too demanding on the Terminal app (but that's a problem for a different day). Byobu spawns a couple of processes per second to refresh its status bar.

About every week or so, HaikuOS can't spawn processes or threads anymore. The Deskbar becomes unresponsive, but <ctrl>+<alt>+<del> (sometimes) allows a soft reboot, and all running apps are still usable.

Attempting to run a command from the Terminal app crashes it: it displays a message about forking failing and exits.

I observed that the team id was growing. During the past 2 months, I wrote down the uptime and the last team id reported by the ps command. Once I reached the end of my tether, I decided to monitor this growth: [collect-ps](https://gitlab.com/LupusMichaelis/belab/-/tree/trunk/collect-ps)

Once compiled, you can use it to fetch the last team id: `./collect-ps`

or watch the system: `./collect-ps -w &`

This will collect the last team id and the max team id every second, and attempt to spawn a process and a thread.
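For readers who want to reproduce the "attempt to spawn a process and a thread" part without reading the attachment, here is a minimal sketch of such a probe. It is not the code from `collect-ps.c`: the names `probe_process`, `probe_thread` and `report_probe` are hypothetical, and it assumes plain POSIX `fork()` and `pthread_create()` rather than Haiku-native calls.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body that does nothing; only the creation result matters. */
static void *noop_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/* Fork a child that exits at once and reap it.
 * Returns 0 on success, or the errno value of the failing call. */
int probe_process(void)
{
    pid_t child = fork();
    if (child < 0)
        return errno;          /* fork itself failed */
    if (child == 0)
        _exit(0);              /* child: leave immediately */

    int status = 0;
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR)    /* retry only when interrupted by a signal */
            return errno;
    }
    return 0;
}

/* Create and join a no-op thread.
 * Returns 0 on success, or the error code from pthread_create/pthread_join. */
int probe_thread(void)
{
    pthread_t thread;
    int err = pthread_create(&thread, NULL, noop_thread, NULL);
    if (err != 0)
        return err;
    return pthread_join(thread, NULL);
}

/* Run both probes once and write one line per result to `out`. */
void report_probe(FILE *out)
{
    int err = probe_process();
    fprintf(out, "process: %s\n", err == 0 ? "ok" : strerror(err));
    err = probe_thread();
    fprintf(out, "thread: %s\n", err == 0 ? "ok" : strerror(err));
}
```

Calling `report_probe(stderr)` once per second from a loop would give a log comparable in spirit to the attached `.err` file; the exact format of the real tool's output may differ.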

On my last attempt, the failure occurred at PID 25'465'541. This number doesn't look significant to me. I'm waiting for the next crash.

At that point, when I ran ps in an already open Terminal, it didn't crash and printed this message:

`~/workshop/belab/todo> ps`
`-bash: fork: Unknown Device Error (-2147432385)`
`-bash: cannot make pipe for command substitution: Too many open files`

My hypothesis is that the kernel reaches an integer ceiling and overflows. But I might be wrong; maybe another resource id is exhausted.

I have attached the logs I collected.
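As a quick sanity check on the overflow hypothesis, the arithmetic can be sketched as follows, assuming team ids are signed 32-bit integers (an assumption to verify against the system headers). The helper names `bits_needed`, `int32_headroom` and `print_headroom` are made up for this illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Number of bits needed to represent `value` (0 needs 0 bits). */
unsigned bits_needed(uint64_t value)
{
    unsigned bits = 0;
    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}

/* How many more ids remain before a signed 32-bit counter
 * (assumed width of a team id) would reach INT32_MAX. */
int64_t int32_headroom(int64_t id)
{
    return (int64_t)INT32_MAX - id;
}

/* Print the observed id next to the assumed 32-bit ceiling. */
void print_headroom(int64_t id)
{
    printf("id %" PRId64 " needs %u bits, headroom to INT32_MAX: %" PRId64 "\n",
           id, bits_needed((uint64_t)id), int32_headroom(id));
}
```

For the observed value, `print_headroom(25465541)` reports 25 bits and a headroom of 2122018106, so the id is only about 1.2% of the way to a signed 32-bit ceiling. Under that assumption, a plain overflow of the team id itself looks unlikely, which would point more towards another exhausted resource.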

So, in my monitoring tool, I observe a few strange problems:

  1. The team id seems stuck for a while, even though ps shows higher team ids. It's as if that value were cached at some point, but I don't know the system well enough to tell, and my monitoring tool might be wrong.

  2. Sometimes the team id reverts to a previous value.

Is this known behaviour? I didn't find anything in Trac about such an issue.

Attachments (4)

watch-pid2024-02-27.log (258.3 KB ) - added by LupusMichaelis 11 months ago.
watch-pid2024-02-27.log
watch-pid2024-02-27.err (600 bytes ) - added by LupusMichaelis 11 months ago.
watch-pid2024-02-27.err
watch-pid2024-02-28.log (14.4 KB ) - added by LupusMichaelis 11 months ago.
watch-pid2024-02-28.log
collect-ps.c (5.9 KB ) - added by LupusMichaelis 11 months ago.
The monitoring tool to watch team ids


Change History (10)

by LupusMichaelis, 11 months ago

Attachment: watch-pid2024-02-27.log added

watch-pid2024-02-27.log

by LupusMichaelis, 11 months ago

Attachment: watch-pid2024-02-27.err added

watch-pid2024-02-27.err

by LupusMichaelis, 11 months ago

Attachment: watch-pid2024-02-28.log added

watch-pid2024-02-28.log

by LupusMichaelis, 11 months ago

Attachment: collect-ps.c added

The monitoring tool to watch team ids

comment:1 by waddlesplash, 11 months ago

Please retest with a nightly build (you can just change your repos and full-sync with pkgman) and see if anything is different.

comment:2 by LupusMichaelis, 11 months ago

I downloaded the nightly (hrev57609) and installed it in a new VM, then compiled and launched the monitoring tool. I'm waiting for it to crash (I'll provoke it by building HaikuOS in the VM every so often).

So far, the team id still appears stuck when activity is low.

comment:3 by waddlesplash, 2 months ago

Does this still happen?

comment:4 by LupusMichaelis, 2 months ago

I don't know. I'm running some tests (building a project again and again) and I'll tell you by the end of the weekend whether this still occurs.

(Testing with nightly hrev58352; started the loop on 2024/11/22 at 10:34.)

comment:5 by LupusMichaelis, 2 months ago

So, I let it run until now. The issue no longer seems to be triggered as it was before. Also, the PID collection feels much more consistent.

I assume you solved the issue, good work!

comment:6 by waddlesplash, 2 months ago

Resolution: fixed
Status: new → closed

Thanks for testing!
