Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#16106 closed bug (fixed)

[kernel] vfork: Out of memory when building haiku with plenty of RAM

Reported by: diver
Owned by: nobody
Priority: normal
Milestone: R1/beta2
Component: System/POSIX
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

This is hrev54249 x86_64.

Building Haiku fails with 1.5GB RAM and the same swap size.

Could it be that the kernel can't release cached memory or perhaps swap is not used for some reason?

haiku.hpkg: Removing and re-creating package contents dir ...
haiku.hpkg: Collecting package contents ...
haiku.hpkg: mimeset'ing package contents ...
haiku.hpkg: Creating the package ...
DownloadLocatedFile1 generated/download/be_book-2008_10_26-3-any.hpkg 
vfork: Out of memory

This is what I get when I try to build haiku with jam @release-anyboot despite having so much RAM. Jam uses 1GB and a lot of memory is used by cache but it looks like kernel is unable to release any of it to make jam finish. Or maybe it can't use swap?

I think this could be the same reason why AboutSystem fails to start with 256M:

qemu -cdrom haiku-master-hrev54249-x86_64-anyboot.iso -m 256M

Running Haiku in qemu with 256M RAM lets you boot all the way to the desktop in 7 minutes. ProcessController will show that only 190MB RAM is used and some more is used by cache. However, starting AboutSystem will fail.

Attachments (1)

0001-jam-use-load_image-on-Haiku.patch (1.2 KB ) - added by X512 4 years ago.
Jam patch.


Change History (18)

comment:1 by waddlesplash, 4 years ago

Probably there is memory used in "reservations". I don't know if we have a way to dump this (perhaps via "vmstat"?)

comment:2 by pulkomandy, 4 years ago

The two issues are probably unrelated. The design of vfork() is that it essentially makes a copy of the existing running program. All the variables, etc need to be copied. Quite often, this is very temporary because after a vfork(), the new team does an exec() and replaces all this with the new executable to run.

This is a well-known limitation of the way programs are run in UNIX. In the case of jam in particular, the whole context of jam (which is large because it stores the whole build tree in memory) has to be forked every time it wants to start a new job. So the build process is known to require a lot of RAM (2GB plus swap is not unheard of, and it has been so for a long time; I think this was already mentioned in the alpha1 release notes, even).
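As an illustration of the fork-then-exec pattern described above, here is a minimal sketch in C. The helper name run_with_fork is hypothetical and this is not jam's actual code; it only shows where a failure such as "Out of memory" would surface when the system has to account for a copy of the parent's writable memory.

```c
#define _POSIX_C_SOURCE 200809L /* harmless if the compiler already sets it */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] via fork() + exec() and return its exit
 * status, or -1 if the child could not be created or did not exit normally. */
int run_with_fork(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        /* With strict memory accounting, ENOMEM is reported here,
         * before any new program has been started. */
        fprintf(stderr, "fork: %s\n", strerror(errno));
        return -1;
    }
    if (pid == 0) {
        /* Child: replace the copied context with the new program. */
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: wait for the job to finish. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The larger the parent's writable memory, the more the system has to be able to back at the moment fork() is called, even though the child discards that copy right away in exec.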

Linux works around it by overcommitting memory and hoping for the best (that is, that the program will not try to access too many variables after forking, otherwise it would crash in an unrecoverable way). In BeOS and Haiku, this is normally handled by not using fork/exec but starting apps directly from a clean context (BRoster::Launch does this). AboutSystem is started the latter way, so it can't be a problem with vfork.

Here is the listimage for AboutSystem:

TEAM 2057 (/boot/system/apps/AboutSystem):
   ID       Text       Data  Seq#      Init# Name
--------------------------------------------------------------------------------
 5077 0x01333000 0x01353000     0          0 /boot/system/apps/AboutSystem
 5075 0x60e7e000 0x00000000     0          0 commpage
 5076 0x01688000 0x016a7000     0          0 /boot/system/runtime_loader
 5078 0x01978000 0x019aa000     0          0 /boot/system/lib/libstdc++.r4.so
 5079 0x00d43000 0x01013000     0          0 /boot/system/lib/libbe.so
 5080 0x002e8000 0x002ff000     0          0 /boot/system/lib/libtranslation.so
 5081 0x00517000 0x005f2000     0          0 /boot/system/lib/libroot.so
 5082 0x025cf000 0x03e4f000     0          0 /boot/system/lib/libicudata.so.57.2
 5083 0x01d9f000 0x02030000     0          0 /boot/system/lib/libicui18n.so.57.2
 5084 0x012b4000 0x012bd000     0          0 /boot/system/lib/libicuio.so.57.2
 5085 0x009c5000 0x00a18000     0          0 /boot/system/lib/libicule.so.57.2
 5086 0x0188c000 0x01897000     0          0 /boot/system/lib/libiculx.so.57.2
 5087 0x008dc000 0x00915000     0          0 /boot/system/lib/libicutu.so.57.2
 5088 0x00b68000 0x00cda000     0          0 /boot/system/lib/libicuuc.so.57.2
 5089 0x02151000 0x0216b000     0          0 /boot/system/lib/libz.so.1.2.11
 5090 0x010f4000 0x011eb000     0          0 /boot/system/lib/libtextencoding.so
 5092 0x0182f000 0x0183f000     0          0 /boot/system/lib/libroot-addon-icu.so
 5093 0x010c2000 0x010df000     0          0 /boot/system/add-ons/Translators/NanoSVGTranslator
 5094 0x01918000 0x01928000     0          0 /boot/system/add-ons/Translators/WonderBrushTranslator
 5095 0x020d5000 0x020e1000     0          0 /boot/system/add-ons/Translators/WebPTranslator
 5096 0x00964000 0x009b5000     0          0 /boot/system/lib/libwebp.so.7.1.0
 5097 0x0089c000 0x008a8000     0          0 /boot/system/add-ons/Translators/TIFFTranslator
 5098 0x02305000 0x0236e000     0          0 /boot/system/lib/libtiff.so.5.5.0
 5099 0x01c9b000 0x01d25000     0          0 /boot/system/lib/libjpeg.so.62.3.0
 5100 0x010ae000 0x010bb000     0          0 /boot/system/add-ons/Translators/TGATranslator
 5101 0x004c4000 0x004d1000     0          0 /boot/system/add-ons/Translators/SGITranslator
 5102 0x00785000 0x00796000     0          0 /boot/system/add-ons/Translators/RTFTranslator
 5103 0x0126d000 0x0128e000     0          0 /boot/system/add-ons/Translators/RAWTranslator
 5104 0x01628000 0x01637000     0          0 /boot/system/add-ons/Translators/PSDTranslator
 5105 0x0179e000 0x017a9000     0          0 /boot/system/add-ons/Translators/PPMTranslator
 5106 0x00754000 0x00760000     0          0 /boot/system/add-ons/Translators/PNGTranslator
 5107 0x011ee000 0x01218000     0          0 /boot/system/lib/libpng16.so.16.37.0
 5108 0x015dd000 0x015e7000     0          0 /boot/system/add-ons/Translators/PCXTranslator
 5109 0x00882000 0x00892000     0          0 /boot/system/add-ons/Translators/JPEG2000Translator
 5110 0x012c2000 0x01329000     0          0 /boot/system/lib/libjasper.so.4.0.0
 5111 0x01223000 0x01237000     0          0 /boot/system/add-ons/Translators/JPEGTranslator
 5112 0x01d90000 0x01d9b000     0          0 /boot/system/add-ons/Translators/ICNSTranslator
 5113 0x021d7000 0x021e6000     0          0 /boot/system/lib/libicns.so.1.2.0
 5114 0x006b5000 0x006fb000     0          0 /boot/system/lib/libopenjp2.so.2.1.2
 5115 0x01bd3000 0x01bdf000     0          0 /boot/system/add-ons/Translators/ICOTranslator
 5116 0x0042f000 0x00439000     0          0 /boot/system/add-ons/Translators/HVIFTranslator
 5117 0x00a68000 0x00a7c000     0          0 /boot/system/add-ons/Translators/GIFTranslator
 5118 0x020bc000 0x020c7000     0          0 /boot/system/add-ons/Translators/EXRTranslator
 5119 0x03f29000 0x041e2000     0          0 /boot/system/lib/libIlmImf-2_2.so.23.0.0
 5120 0x003b1000 0x0040e000     0          0 /boot/system/lib/libIlmImfUtil-2_2.so.23.0.0
 5121 0x01d4a000 0x01d8c000     0          0 /boot/system/lib/libHalf.so.23.0.0
 5122 0x020f0000 0x0211a000     0          0 /boot/system/lib/libIex-2_2.so.23.0.0
 5123 0x01774000 0x0177a000     0          0 /boot/system/lib/libIexMath-2_2.so.23.0.0
 5124 0x0076b000 0x00776000     0          0 /boot/system/lib/libIlmThread-2_2.so.23.0.0
 5125 0x0064a000 0x00669000     0          0 /boot/system/lib/libImath-2_2.so.23.0.0
 5126 0x0171b000 0x01727000     0          0 /boot/system/add-ons/Translators/BMPTranslator
 5127 0x020a6000 0x020b1000     0          0 /boot/system/add-ons/Translators/STXTTranslator

I already see that all translators are loaded, which seems not really needed. That would save quite a lot on both memory usage and loading time. The translation kit is used to load the logo, which we could instead convert to raw BBitmap data to not need the translation kit. Or, we could make it so that the translation kit does not load all translators when a specific format is asked for the translation (AboutSystem knows and specifies that the logo is in PNG format).

The next contributor to memory use is probably libicudata, which is 20MB on its own. There are also two versions of it, for gcc2 and gcc8. We could try to use a .dat file instead, which could be shared by the two architectures; however, I don't know if that will still work when mixing ICU versions (the format of the files has changed sometimes, so we'll have to review this for each version we want to update to if we go this way). We can also probably remove a few things from the lib/datafile to save space and memory (see http://userguide.icu-project.org/icudata).

comment:3 by X512, 4 years ago

Maybe fork can be replaced by load_image in jam?

comment:4 by diver, 4 years ago

Thanks for the detailed explanation! I see there are quite a few optimization opportunities. Should we create a ticket for each of them, maybe with a common keyword "optimization"?

comment:5 by korli, 4 years ago

Theoretically, fork should map the read-only segments of shared libraries from the same physical memory in multiple processes, and thus wouldn't require more physical memory.

comment:6 by pulkomandy, 4 years ago

Yes, for the read-only segments the data can be shared.

But for the read-write ones, it still needs to reserve some physical address space (mapped to either physical RAM, or swapfile space) so that it can do the copy-on-write as needed.

listimage does not show the read-write parts indeed. For jam that is where most of the memory space usage is. For AboutSystem, possibly not, but we would need to check.

For example, the locale kit currently loads the catalogs using fread(), so they end up in read/write space. We could save maybe a few kilobytes by moving that to read-only space (for example by making the catalogs .so files or BResources, or by loading them using mmap instead). More detailed analysis is needed to see which apps are using the RAM and how.

But in any case, with the way Haiku manages memory, it's unlikely that you can reach 100% RAM usage unless you have a quite large swap file (which allows the system to know it can always swap things out of main memory to the file if needed to make space for another running program). It's a design choice we made to tell applications that we are out of memory when they request memory, and not sometime later. This means malloc, vfork, ... can fail more easily than on Linux. Linux by default almost never returns NULL from malloc, but if you try to access the allocated memory and it turns out there is no physical memory available at that time, it will kill the process using the most RAM to free some space, leaving no chance for proper error handling. We don't think this is a suitable approach on a desktop OS. It's better to let the user decide which apps to close, and leave them a chance to first save their work to disk.
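To illustrate what "proper error handling" means for an application under this policy, here is a small sketch in C. The helper name alloc_work_buffer is hypothetical; the point is only that the failure is reported at the time of the request, where the caller can still react (for example by offering to save the user's work).

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a zeroed work buffer of the given size.
 * Returns NULL if the system refuses the request, so the caller can
 * recover instead of being killed later on first access. */
char *alloc_work_buffer(size_t size)
{
    char *buf = malloc(size);
    if (buf == NULL) {
        /* Under strict accounting this is the normal out-of-memory path. */
        fprintf(stderr, "malloc(%zu): %s\n", size, strerror(errno));
        return NULL;
    }
    memset(buf, 0, size);
    return buf;
}
```

On a system that overcommits, the NULL branch is rarely taken and the failure tends to show up later, when the pages are first touched.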

comment:7 by pulkomandy, 4 years ago

In btrev43125 jam was modified to use posix_spawnp, but only on Linux.

In hrev51418 Haiku received a posix_spawnp implementation.

It looks like we could enable use of posix_spawnp on Haiku as well then?
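For reference, launching a job with posix_spawnp could look roughly like the sketch below. The helper name run_with_spawn is hypothetical and this is not the actual jam change. Whether it saves memory compared to fork/exec depends entirely on how the C library implements posix_spawnp.

```c
#define _POSIX_C_SOURCE 200809L /* harmless if the compiler already sets it */
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] via posix_spawnp() and return its exit
 * status, or -1 if the child could not be created or did not exit normally. */
int run_with_spawn(char *const argv[])
{
    pid_t pid;

    /* No file actions and no attributes: the child inherits the defaults. */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) {
        /* posix_spawnp returns the error number instead of setting errno. */
        fprintf(stderr, "posix_spawnp: %s\n", strerror(err));
        return -1;
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```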

comment:8 by pulkomandy, 4 years ago

Please try https://review.haiku-os.org/c/buildtools/+/2798 and let us know if that's better (you have to rebuild jam using this change).

To rebuild jam from inside Haiku:

cd buildtools/jam
make
cp bin.haikux86/jam /system/non-packaged/bin

comment:9 by waddlesplash, 4 years ago

That will not help; our posix_spawn is implemented using fork().

comment:10 by X512, 4 years ago

Can posix_spawn be implemented using load_image?

comment:11 by pulkomandy, 4 years ago

Not really.

  • posix_spawn creates a child process; I think load_image creates an independent process.
  • posix_spawn allows keeping file descriptors open for the child process; load_image doesn't.
  • posix_spawn can change the process group, signal mask, and a few other things for the newly created team, which can't be done with load_image.

They could maybe be implemented using some shared code and common syscalls, but it's not as simple as one calling the other.
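To make the extra features listed above concrete, here is a sketch in C of a posix_spawn call that uses file actions and spawn attributes. The helper name spawn_redirected and the choice of redirecting stdout are assumptions made for illustration; the posix_spawn_file_actions_* and posix_spawnattr_* calls are the standard POSIX ones.

```c
#define _POSIX_C_SOURCE 200809L /* harmless if the compiler already sets it */
#include <fcntl.h>
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: spawn argv[0] with stdout redirected to out_path,
 * in its own process group and with an empty signal mask.
 * Returns the child's exit status, or -1 on failure. */
int spawn_redirected(char *const argv[], const char *out_path)
{
    posix_spawn_file_actions_t actions;
    posix_spawnattr_t attr;
    sigset_t mask;
    pid_t pid;
    int status = -1;

    if (posix_spawn_file_actions_init(&actions) != 0)
        return -1;
    if (posix_spawnattr_init(&attr) != 0) {
        posix_spawn_file_actions_destroy(&actions);
        return -1;
    }

    /* File descriptor setup: open out_path as stdout in the child. */
    posix_spawn_file_actions_addopen(&actions, STDOUT_FILENO, out_path,
        O_WRONLY | O_CREAT | O_TRUNC, 0644);

    /* Process group 0 means "a new group led by the child". */
    sigemptyset(&mask);
    posix_spawnattr_setpgroup(&attr, 0);
    posix_spawnattr_setsigmask(&attr, &mask);
    posix_spawnattr_setflags(&attr,
        POSIX_SPAWN_SETPGROUP | POSIX_SPAWN_SETSIGMASK);

    if (posix_spawnp(&pid, argv[0], &actions, &attr, argv, environ) == 0
        && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        status = WEXITSTATUS(status);
    else
        status = -1;

    posix_spawnattr_destroy(&attr);
    posix_spawn_file_actions_destroy(&actions);
    return status;
}
```

Any implementation of posix_spawn on top of another primitive has to be able to apply these file actions and attributes to the new team before the program starts running.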

comment:12 by X512, 4 years ago

load_image also creates a child process and inherits the IO context: https://git.haiku-os.org/haiku/tree/src/system/kernel/team.cpp#n1680.

comment:13 by waddlesplash, 4 years ago

I previously tried to do this: https://review.haiku-os.org/c/haiku/+/1752

korli pointed out the io_context problem then, but I did not check to see that he was actually correct. So maybe that can be revived indeed.

by X512, 4 years ago

Jam patch.

comment:14 by diver, 4 years ago

I applied the Jam patch and it seemed to improve things. I managed to build anyboot iso with 1.5GB RAM and the same swap size.

comment:15 by diver, 4 years ago

I reduced RAM to 1GB (leaving 1.GB swap on) and that worked too. Great!

comment:16 by waddlesplash, 4 years ago

Component: System/Kernel → System/POSIX
Resolution: fixed
Status: new → closed

jam changed to use posix_spawn on Haiku in btrev43157, and Haiku's posix_spawn implementation changed to use load_image in the general case in hrev54278. HaikuPorts Jam will need to be updated, however.

comment:17 by nielx, 4 years ago

Milestone: Unscheduled → R1/beta2

Assign tickets with status=closed and resolution=fixed within the R1/beta2 development window to the R1/beta2 Milestone

(final time)
