Opened 9 years ago

Last modified 3 weeks ago

#5790 assigned enhancement

Use "vmem" or a similar system for allocation of IDs

Reported by: mmlr
Owned by: nobody
Priority: normal
Milestone: Unscheduled
Component: System/Kernel
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

On the WebPositive svn/trac/build server, which is still running an hrev35294 kernel, I got this panic after an uptime of well over a month:

out of ports, but sUsedPorts is broken

Sadly I didn't have a working keyboard at that time, so I couldn't really dig into why this happened. This machine is in rather heavy use: it does a clean WebPositive build on each new revision (including the WebKit parts and all generated files), hosts the WebPositive svn and trac, serves the nightlies, and is regularly hit by ssh brute-force attacks. Due to the long uptime, some resource might have overflowed (area IDs or the like), though it shouldn't have happened quite that "quickly".

Change History (17)

comment:1 Changed 9 years ago by vooshy

At about 6:30 AM GMT I tried to svn checkout the trunk. Midway through it stopped, and the site hasn't worked since. Apologies if my use killed it.

comment:2 Changed 9 years ago by axeld

The port ID computation is actually pretty stupid, and can advance rapidly:

	// make the port_id be a multiple of the slot it's in
	if (i >= sNextPort % sMaxPorts)
		sNextPort += i - sNextPort % sMaxPorts;
	else
		sNextPort += sMaxPorts - (sNextPort % sMaxPorts - i);

Still, the code looks reasonably safe against overflows, so I think sUsedPorts is actually broken.

comment:3 in reply to:  2 ; Changed 9 years ago by mmlr

Replying to axeld:

Still, the code looks reasonably safe against overflows, so I think sUsedPorts is actually broken.

I don't see any code that'd deal with wrapping to negative port IDs though (as port_id is an int32). Also, what happens if the port ID happens to become -1, which is used to indicate a free slot? (I didn't actually try to figure out whether this can happen.) The concept of sUsedPorts looks pretty simple to me, so I don't see where it could really go wrong.
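
To make the wrap concrete, here is a minimal sketch of the update from comment:2, redone in 64-bit arithmetic so the result can be inspected without relying on signed overflow (which is undefined behavior in C++). The constant `kMaxPorts` and the helper names `AdvanceNextPort` and `AsPortID` are hypothetical; the real value of sMaxPorts is not given in this ticket.

```cpp
#include <cstdint>

// Hypothetical table size; the kernel's actual sMaxPorts is not stated here.
constexpr int64_t kMaxPorts = 4096;

// Same arithmetic as the snippet in comment:2, but in 64 bits: move the
// counter forward to the next value that is congruent to `slot` modulo
// kMaxPorts.
int64_t
AdvanceNextPort(int64_t nextPort, int64_t slot)
{
	int64_t current = nextPort % kMaxPorts;
	if (slot >= current)
		return nextPort + (slot - current);
	return nextPort + kMaxPorts - (current - slot);
}

// The value a 32-bit port_id would end up holding, assuming two's
// complement wraparound (guaranteed since C++20, the usual behavior before).
int32_t
AsPortID(int64_t value)
{
	return static_cast<int32_t>(static_cast<uint32_t>(value));
}
```

With the counter at INT32_MAX and the next free slot at index 0, the 64-bit result is one past INT32_MAX, so a 32-bit port_id holding it would be negative.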

comment:4 in reply to:  3 Changed 9 years ago by bonefish

Replying to mmlr:

I don't see any code that'd deal with wrapping to negative port IDs though (as port_id is an int32).

Indeed. The same problem exists for various other ID generating kernel services. Fixes welcome. :-)

comment:5 Changed 9 years ago by axeld

Damn it, I didn't think about it being int32... but yes, that would be a good cause for this problem.

comment:6 Changed 9 years ago by mmlr

After a bit more than ten days of uptime the port IDs are already at 544968881+, so it is quite likely that they wrapped after the 30+ days of uptime. Anything linking to libbe currently uses up the 3 default reply ports of the static BMessage initialization. Since the server is flooded by ssh brute-force attacks, which spawn sshd instances that happen to create these reply ports, this sounds like a reasonable explanation. On the one hand, the static BMessage initialization could be made lazier; on the other, we should probably come up with a way to handle the ID reuse case in general.
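
As a rough check of that estimate, the sketch below extrapolates linearly from the observed value. It assumes the counter started near zero and grows at a constant rate, which is only an approximation for a server with varying load; the function name `DaysUntilInt32Wrap` is hypothetical.

```cpp
#include <cstdint>

// Linear extrapolation: if the ID counter reached `observedID` after
// `observedDays` of uptime, how many days of uptime until it passes
// INT32_MAX? Assumes a counter starting near 0 and a constant rate.
double
DaysUntilInt32Wrap(int64_t observedID, double observedDays)
{
	double idsPerDay = observedID / observedDays;
	return INT32_MAX / idsPerDay;
}
```

For 544968881 after 10 days this gives a bit over 39 days, which is in line with a panic after well over a month of uptime.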

comment:7 Changed 9 years ago by mmlr

As Rene points out, sshd doesn't actually link to libbe, but it links to libnetwork which in turn links to libbe, so that's where this is coming from. Since wget also links to libnetwork and is run in a loop to test the availability of the trac instance, that'd be another regular port ID consumer.

comment:8 in reply to:  7 Changed 9 years ago by bonefish

Milestone: changed from R1 to R1/alpha2

Replying to mmlr:

As Rene points out, sshd doesn't actually link to libbe, but it links to libnetwork which in turn links to libbe, so that's where this is coming from.

I guess hrev28825 totally slipped by me. Apparently someone (no names :-)) reintroduced the libbe dependency after I removed it in hrev25485. Since this only concerns the private API start_watching_network() functions (which play with BMessengers), I'm very much in favor of removing the dependency again, either by simply making the functions inline (and only providing a port+token non-inline version) or by moving them to libbnetapi.

Anyway, regarding the ID overflow issues, I'm moving the ticket to the R1/alpha2 milestone to add further incentive to solve it soon. :-)

comment:9 Changed 9 years ago by mmlr

I thought about using a list of sorted "free ranges" that can be extended/joined on freeing an id and updated/removed on id allocation. Could be made generic and used for area_id as well.
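
A minimal sketch of what such a sorted free-range list could look like, written as user-space C++ with `std::map` rather than kernel data structures. The class name `FreeRangeIDAllocator` and its interface are hypothetical and not taken from any Haiku source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Keeps the free IDs as sorted, non-overlapping inclusive ranges
// (key = first ID of the range, value = last ID of the range).
class FreeRangeIDAllocator {
public:
	FreeRangeIDAllocator(int32_t first, int32_t last)
	{
		assert(first <= last);
		fFree[first] = last;
	}

	// Returns the lowest free ID, or -1 if no ID is left.
	int32_t Allocate()
	{
		if (fFree.empty())
			return -1;
		auto it = fFree.begin();
		int32_t id = it->first;
		int32_t last = it->second;
		fFree.erase(it);
		if (id < last)
			fFree[id + 1] = last;	// shrink the range from the front
		return id;
	}

	// Returns an ID to the pool, joining it with adjacent free ranges.
	void Free(int32_t id)
	{
		int32_t first = id;
		int32_t last = id;
		auto next = fFree.upper_bound(id);
		if (next != fFree.begin()) {
			auto prev = std::prev(next);
			assert(prev->second < id);	// otherwise the ID was already free
			if (prev->second + 1 == id) {
				first = prev->first;	// extend the preceding range
				fFree.erase(prev);
			}
		}
		if (next != fFree.end() && next->first - 1 == id) {
			last = next->second;		// absorb the following range
			fFree.erase(next);
		}
		fFree[first] = last;
	}

	size_t CountRanges() const { return fFree.size(); }

private:
	std::map<int32_t, int32_t> fFree;
};
```

One design consequence worth noting: because this sketch always hands out the lowest free ID, a freed ID is reused almost immediately, which may or may not be desirable for port or area IDs.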

comment:10 in reply to:  9 Changed 9 years ago by bonefish

Replying to mmlr:

I thought about using a list of sorted "free ranges" that can be extended/joined on freeing an id and updated/removed on id allocation. Could be made generic and used for area_id as well.

You might want to have a look at Bonwick's resource allocator (don't have a link at hand, but shouldn't be too hard to find -- usually in combination with the slab allocator), which was invented to do pretty much exactly that. Though, a simple solution -- like an increment + lookup loop until free spot found -- should work well enough (at least for alpha 2, where I wouldn't want to introduce larger amounts of untested code anymore), particularly in the cases where the domain (positive int32) is several orders of magnitude greater than the total count limit.
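
The simple "increment + lookup loop" variant could be sketched as follows. This is a user-space illustration only: the class name `WrappingIDAllocator`, the `maxID` parameter, and the use of `std::unordered_set` as the in-use lookup are assumptions made for the example, not the kernel's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hands out IDs from 1..maxID by incrementing a cursor, wrapping back to 1
// instead of going negative, and skipping IDs that are still in use.
class WrappingIDAllocator {
public:
	explicit WrappingIDAllocator(int32_t maxID = INT32_MAX)
		: fMaxID(maxID), fNext(1)
	{
		assert(maxID >= 1);
	}

	// Returns an unused ID, or -1 if every ID in 1..maxID is taken.
	int32_t Allocate()
	{
		if (fUsed.size() >= static_cast<size_t>(fMaxID))
			return -1;
		for (;;) {
			int32_t candidate = fNext;
			fNext = (fNext == fMaxID) ? 1 : fNext + 1;
			if (fUsed.insert(candidate).second)
				return candidate;	// was not in use until now
		}
	}

	void Free(int32_t id) { fUsed.erase(id); }

private:
	int32_t fMaxID;
	int32_t fNext;
	std::unordered_set<int32_t> fUsed;
};
```

As long as the number of live IDs is tiny compared to the ID domain, the loop almost always succeeds on the first candidate, and a freed ID is not handed out again until the cursor has wrapped around.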

comment:11 Changed 9 years ago by bonefish

Here's the paper describing the resource allocator (which is called Vmem for some reason) in chapter 4. The kernel address space management was somewhat inspired by the design.

comment:12 Changed 8 years ago by pulkomandy

Any news on this? Or should it be delayed until after alpha3?

comment:13 Changed 8 years ago by scottmc

Milestone: changed from R1/alpha3 to R1/beta1

comment:14 Changed 5 years ago by pulkomandy

The *BSD implementation of vmem (under a 2-clause BSD license): http://www.leidinger.net/FreeBSD/dox/kern/html/d8/d5d/subr__vmem_8c_source.html

comment:15 Changed 5 years ago by pulkomandy

Milestone: changed from R1/beta1 to Unscheduled
Type: changed from bug to enhancement

It seems http://cgit.haiku-os.org/haiku/diff/src/system/kernel/port.cpp?id=24df65921befcd0ad0c5c7866118f922da61cb96 changed the port ID computation and the new code handles overflows. This solves the initial problem.

I'm making this an enhancement ticket and moving it out of beta1, since it would still be better to use vmem for allocation of port, area, and process IDs (and possibly in other places).

comment:16 Changed 2 years ago by axeld

Owner: changed from axeld to nobody
Status: changed from new to assigned

comment:17 Changed 3 weeks ago by waddlesplash

Summary: changed from "panic: out of ports, but sUsedPorts is broken" to "Use "vmem" or a similar system for allocation of IDs"