Opened 4 years ago

Closed 3 years ago

Last modified 3 years ago

#12258 closed bug (fixed)

[launch_daemon] increased boot time

Reported by: diver Owned by: axeld
Priority: normal Milestone: Unscheduled
Component: Servers/launch_daemon Version: R1/Development
Keywords: Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

hrev49474.

It looks like boot time has increased after switching to launch_daemon. I can't tell for sure but it seems to spend several seconds at the rocket icon. I have a feeling app_server now starts longer.

Attachments (7)

system_time (10.6 KB) - added by mmlr 4 years ago.
system_time binary for x86_gcc2
boot_time_logger.sh (834 bytes) - added by mmlr 4 years ago.
boot_time_logger script
LD_boot_time.txt (90 bytes) - added by vidrep 4 years ago.
PLD_boot_time.txt (90 bytes) - added by vidrep 4 years ago.
Alpha_boot_time.txt (9 bytes) - added by vidrep 4 years ago.
previous_syslog (188.0 KB) - added by Giova84 3 years ago.
syslog (127.4 KB) - added by Giova84 3 years ago.

Download all attachments as: .zip

Change History (53)

comment:1 Changed 4 years ago by luroh

Boot times definitely took a hit here in VBox, I've been meaning to file a bug and provide some metrics but admittedly laziness got the better of me.

comment:2 Changed 4 years ago by pulkomandy

I'm not sure, we stay longer on the boot screen but after that we get immediately to the desktop with Tracker and Deskbar running already here. Before, the app_server would start earlier but then there would be just the empty desktop and mouse cursor for some time.

comment:3 Changed 4 years ago by diver

In VBox it spends about 9 seconds at the rocket icon whereas it took BeOS only 6 seconds to fully boot from bootman :P So I think it's just too much. Do we have a boot profiler somewhere?

comment:4 Changed 4 years ago by pulkomandy

For the comparison to be fair, you need to build the kernel in release mode and without serial output. Not sure about boot profilers, maybe we should add timestamps to the syslog?

comment:5 Changed 4 years ago by axeld

There is an option to add timestamps to the syslog already. However, we also have a profiler for the boot process; I even made it work again before doing the launch_daemon changes. See this commit: http://cgit.haiku-os.org/haiku/commit/?id=035e3e77ed4550c9e9e5d932d02125be0d80d04c

Basically, you copy the 'build/config_headers/kernel_debug_config.h' to 'build/user_config_headers', and set SYSTEM_PROFILER to 1. From the running system you can then call profile -r to get the data. This can then be analyzed using different tools.

BTW, it's totally expected that the time spent on the rocket icon will be longer now. However, the total time to have the system booted should be reduced. If that's not the case, we should find out what's going on :-)

comment:6 Changed 4 years ago by diver

Just updated Haiku to launch_daemon in VMware Fusion. Boot time increased from 6 seconds (yeah, like in BeOS!) to 11 seconds. So there's definitely a regression :/

comment:7 Changed 4 years ago by waddlesplash

Takes 19 seconds to go from the VirtualBox logo to a fully-booted Haiku (Tracker/Deskbar running) here. Took about half that before launch_daemon. So yes, there is most certainly a regression.

comment:8 Changed 4 years ago by vidrep

Boot time for pre-launch daemon 16 seconds Boot time for post-launch daemon 25 seconds This is on real hardware, both installs sharing same hard drive on separate partitions. Compared to Windows, this is still pretty good, so not a big deal about a few extra seconds.

comment:9 Changed 4 years ago by waddlesplash

If we can at all possibly help it, I'd prefer we didn't stop at "pretty good" and try to get all the way to "great" or beyond. :)

comment:10 Changed 4 years ago by axeld

How much differs the measurement when you stop once the HD stops working?

The original Bootscript launched a number of services only after Tracker. The launch_daemon no longer does this, and massively parallelizes the boot sequence. I think 9 seconds are a huge slow down, so it'd be worth investigating whether the launch_daemon is at fault, or the lower layers of Haiku slow down that much when they have to work in parallel.

In any case, I have a short pause during the boot, and that's probably not a good thing :-)

comment:11 Changed 4 years ago by waddlesplash

and massively parallelizes the boot sequence

It should only launch as many jobs at once as there are cores -- launch, wait 200ms or so for that to cool down, etc. Right?

comment:12 in reply to:  11 Changed 4 years ago by bonefish

Replying to waddlesplash:

and massively parallelizes the boot sequence

It should only launch as many jobs at once as there are cores -- launch, wait 200ms or so for that to cool down, etc. Right?

If the boot process was CPU bound, reducing the number of active threads would make some -- though I doubt the performance hit with more active threads would be relevant. However, I think the boot process is mainly I/O bound, so the number of cores/hardware threads probably isn't a useful criterion.

comment:13 Changed 4 years ago by mmlr

At least a 5 second boot delay was fixed in hrev49561. I'm unsure what should happen with this ticket, as other concerns have been raised within the comments and not all of the increase in boot time is explained by the 5 second timeout.

comment:14 Changed 4 years ago by vidrep

I updated to hrev49591 and the boot and shutdown seemed to be faster. Just to make sure, I timed the process, and was indeed faster by about 4 seconds (comparable to pre-launch daemon times).

Last edited 4 years ago by vidrep (previous) (diff)

comment:15 Changed 4 years ago by vidrep

The previous post was for 64 bit. After updating the x86_gcc2 build to the same rev, I noted that boot and shutdown are considerably slower by several seconds.

comment:16 Changed 4 years ago by diver

Could you please time x86_gcc2 boot/shutdown time for x86_gcc2 then?

comment:17 Changed 4 years ago by vidrep

I did a series of boots for Alpha 4.1, 32 bit and 64 bit. To be consistent, I decided the system to be booted as soon as the blue screen appears. Alpha 4.1 is a consistent 12 second boot. Next up is 64 bit at 18 seconds. 32 bit took 21 seconds. 32 bit would not shut down at all on two consecutive attempts. This issue is already tracked in another ticket. Otherwise, shutdown times are comparable for all three builds.

comment:18 Changed 4 years ago by vidrep

My earlier attempt at timing were probably based on a different criteria of when the system was considered to be fully booted up. I guess to be consistent in testing from now on I'll use the criteria from the last series I.e. The system is fully booted as soon as the blue screen appears.

comment:19 Changed 4 years ago by vidrep

So, it appears that launch daemon boot times are still lagging boot script times on the order of 6 to 9 seconds.

comment:20 Changed 4 years ago by diver

Please note comment:2. Testing agains A4.1 is irrelevant for this ticket as this release happened to long ago. What's important here is testing against pre-launch_daemon build.

P.S. Axel, is it possible to return the previous behaviour of app_server and show cursor ASAP. This way the overall feeling of boot speed increases dramatically.

comment:21 Changed 4 years ago by vidrep

Ok. I'll install a pre-launch daemon build and try a series of boots using the same criteria as the last series of tests.

comment:22 Changed 4 years ago by vidrep

The results of testing pre-launch daemon (hrev49408 x86_gcc2) boot times against the most current revision (hrev49591 x86_gcc2) did yield boot times that were about 2 or 3 seconds faster. I got consistent boot times of 18 -19 seconds, compared to 21 - 22. For whatever reason, 64 bit boots faster than either of these. Out of curiosity, I'll try a comparison of 64 bit boot times tomorrow.

comment:23 in reply to:  17 Changed 4 years ago by mmlr

Replying to vidrep:

I did a series of boots for Alpha 4.1, 32 bit and 64 bit. To be consistent, I decided the system to be booted as soon as the blue screen appears.

Unfortunately this isn't a good criteria at all. The introduction of the launch_daemon made the (userland) boot process more parallel. This means more servers are started at the same time. While this may make the individual server start up more slowly (simply due to more happening in the system at the same time), the overall boot time, i.e. the time it takes until all servers are ready, may still be reduced.

The blue screen may therefore appear later because the app_server init takes longer. However at the point it does appear, all the other servers have started as well and may even be ready completing the boot process. Before the launch_daemon the app_server would be started early and the blue screen appeared early due to that, all the other servers had to wait for this to happen though and would only be started then, extending the boot process away from the blue screen appearing.

There currently is no reliable way to determine when the system is fully booted. Making the criteria "once all servers are launched" will for example strongly favor the launch_daemon against the previous boot method because it launches them all at the same time. This doesn't mean that the launched servers are actually usable at that point though (the blue screen isn't necessarily shown yet at that point for example), making this criteria bogus.

One of the last things to happen in either boot method is the UserBootScript. It's still far from ideal as in the previous method it is launched in parallel with non-essential servers but after initial volumes are mounted, while with the launch_daemon it is launched in parallel to the initial volume mounting. Still it'd probably be the easiest way to get a more or less useful number out of this by putting a symlink to AboutSystem into "/boot/home/config/settings/boot/launch" (which will cause it to be launched when the UserBootScript runs) and take a look at the uptime when it appears.

Another measure could be to wait for the system to become idle after boot (monitoring the CPU use in ProcessController for example). This still has the problem for cases where the startup of a server isn't CPU but IO bound, but in the normal case it should be reasonably accurate.

comment:24 Changed 4 years ago by pulkomandy

one can use alert "uptime" instead of starting AboutSystem, as the latter updates the time in its window. Or just wait for the HDD led to stop blinking.

comment:25 Changed 4 years ago by diver

Or even add uptime >> ~/uptime.log at the end of the UserBootScript.

comment:26 in reply to:  25 Changed 4 years ago by mmlr

Replying to diver:

Or even add uptime >> ~/uptime.log at the end of the UserBootScript.

Yes, very clever, except that "up 0 minutes" doesn't really tell you much...

As another note: As the launch_daemon starts things in parallel, it profits from multiple CPUs. When running things in a single CPU VM the results between pre- and post-launch_daemon should be similar.

I've tested this over here with a gcc2 image before and after the launch daemon in kvm and the single CPU configuration was the same (10 vs. 10 seconds) while with 4 emulated CPUs (on a quad-core host) the results with the launch_daemon were better (11 vs. 12 seconds). I attribute the slowdown with more cores to emulation overhead and slow CPU bootstrap. The numbers are so similar that it's hard to tell if the launch_daemon helps things, it certainly doesn't make things drastically worse for me though. This will likely vary with hardware/emulation software used though.

A possible explanation for the 64 bit builds being quicker (apart from the better compiler and instruction set) could also be that there just aren't all drivers present.

comment:27 Changed 4 years ago by vidrep

Until a criterion is established for what constitutes "fully booted", then any further testing is just wasted time. I'm beginning to suspect that we may not have a problem here at all. Is there some method whereby the system itself can be automatically timed during the boot process and logged? Of greater concern to myself is the intermittent "no shutdown" problem, which manifested itself in recent builds.

comment:28 Changed 4 years ago by mmlr

I've added the system_time bin command which prints system_time() (the uptime in microseconds) and a script for determining the "boot complete" point in hrev49598. I'll also attach these to this ticket (for x86_gcc2) as on older hrevs they will obviously not be available. The established criterion is "all servers (and Tracker & Deskbar) are started and respond to messages", which is probably as close as one can get to the definition of a usable system. Note that this isn't super precise either as the used waitfor has a generous 100ms delay loop when waiting, but it should be reasonably close.

To take measurements, put the system_time binary and boot_time_logger.sh script into /boot/home/config/non-packaged/bin and add the following line to /boot/home/config/settings/boot/UserBootScript:

boot_time_logger.sh >> /boot/home/Desktop/boot_time.txt

After each reboot a new line should appear in that file on the desktop telling the uptime after the script waited for a reply of all servers.

With the results we should be able to finally decide if this ticket can be closed.

A side note regarding the shutdown problem: The launch_daemon is definitely innocent in this case, as it doesn't change anything in that regard. I have observed a regular crash in the BMediaRoster looper when the media_server quits during shutdown, which is a recent regression. In my configuration this does not prevent shutdown, but it might be different depending on timing.

Changed 4 years ago by mmlr

Attachment: system_time added

system_time binary for x86_gcc2

Changed 4 years ago by mmlr

Attachment: boot_time_logger.sh added

boot_time_logger script

comment:29 in reply to:  20 ; Changed 4 years ago by axeld

Replying to diver:

P.S. Axel, is it possible to return the previous behaviour of app_server and show cursor ASAP. This way the overall feeling of boot speed increases dramatically.

Nothing really changed there; the app_server will show the blue screen once an application tries to connect to it -- and that's also when the input_server is launched. On my system the time span between I last see the rocket icon, and a completely booted system (ie. moving cursor on Tracker with all windows open) is less than a second.

One way to improve on that would be to delay the blue screen until the first window opens. I've already started doing something like that in my "background" branch on github (https://github.com/axeld/haiku/tree/background). However, it's not yet complete.

comment:30 Changed 4 years ago by vidrep

To take measurements, put the system_time binary and boot_time_logger.sh script into /boot/home/config/non-packaged/bin and add the following line to /boot/home/config/settings/boot/UserBootScript: boot_time_logger.sh >> /boot/home/Desktop/boot_time.txt

I did as instructed. However, after a reboot, the "boot_time.txt" created on the desktop is empty.

comment:31 Changed 4 years ago by diver

You probably forgot to chmod +x /boot/home/config/non-packaged/bin/system_time.

comment:32 in reply to:  31 Changed 4 years ago by vidrep

Replying to diver:

You probably forgot to chmod +x /boot/home/config/non-packaged/bin/system_time.

The solution was to chmod the boot time logger script rather than the bin. Anyway, your tip helped solve the problem. Thanks!

comment:33 Changed 4 years ago by vidrep

Attached are my test results using hrev49605 x86_gcc2 and hrev49292 x86_gcc2 (pre-launch daemon). They both have virtually identical boot times. Just as an experiment I copied the same bin and boot script to Alpha 4.1, and uncommented #application/x-vnd.haiku-package_daemon in the boot script, since the package daemon was non-existent. I attached this as well simply for sake of comparison. LD_boot_time (launch daemon boot time) PLD_boot_time (pre-launch daemon boot time) Alpha_boot_time (Alpha 4.1 boot time)

Changed 4 years ago by vidrep

Attachment: LD_boot_time.txt added

Changed 4 years ago by vidrep

Attachment: PLD_boot_time.txt added

Changed 4 years ago by vidrep

Attachment: Alpha_boot_time.txt added

comment:34 in reply to:  29 Changed 4 years ago by diver

Replying to axeld:

Nothing really changed there; the app_server will show the blue screen once an application tries to connect to it -- and that's also when the input_server is launched. On my system the time span between I last see the rocket icon, and a completely booted system (ie. moving cursor on Tracker with all windows open) is less than a second.

One way to improve on that would be to delay the blue screen until the first window opens. I've already started doing something like that in my "background" branch on github (https://github.com/axeld/haiku/tree/background). However, it's not yet complete.

Delay the blue screen? I was meaning exactly an opposite :) I meant if it was possible to show blue screen with cursor faster (as was with Boostscript). Quoting PulkoMandy: "we stay longer on the boot screen but after that we get immediately to the desktop with Tracker and Deskbar running already here. Before, the app_server would start earlier but then there would be just the empty desktop and mouse cursor for some time."

comment:35 Changed 4 years ago by pulkomandy

But that was already fixed, at least for me, in hrev49561. I get the empty blue screen again. Now most of the time spent on boot on my machine is on the "disk" icon, which I think is before launch daemon is even started. It is longer if I have more disks connected to the system (audio CD inserted, SD card, USB drives, etc); but this is not related to the launch daemon.

So I think we can close this ticket now and try to optimize the early parts of the boot process to get it back to alpha4 speed. According to vidrep's benchmarks the launch daemon only causes a 200ms increase in boot time, and I guess this is with only one CPU?

comment:36 in reply to:  35 ; Changed 4 years ago by mmlr

Resolution: fixed
Status: newclosed

Replying to pulkomandy:

So I think we can close this ticket now and try to optimize the early parts of the boot process to get it back to alpha4 speed. According to vidrep's benchmarks the launch daemon only causes a 200ms increase in boot time, and I guess this is with only one CPU?

Yes, let's close this.

Note that the alpha4 had serial debugging disabled by default, so if these tests were all with default settings, it's fully expected to have a boot time difference of several seconds. Local tests showed that boot time dropped from the above mentioned 10 to somewhat over 6 seconds when serial debugging was disabled, so it can be quite dramatic (because serial output is really slow and there's a lot of it, sometimes more and sometimes less depending on the drivers involved).

So for the sake of comparison it'd be interesting to have the boot times of a post launch_daemon build with serial debugging disabled in /boot/home/config/settings/kernel/drivers/kernel.

comment:37 in reply to:  36 Changed 4 years ago by bonefish

Replying to mmlr:

Note that the alpha4 had serial debugging disabled by default, so if these tests were all with default settings, it's fully expected to have a boot time difference of several seconds.

The comparison with alpha 4 is also not quite fair, because, as had been reported, the introduction of package management slowed down the boot several seconds. One reason for that is probably the additional serial debug output. The hope was that the packaged system would actually boot faster, due to the compressed code increasing the effective I/O throughput. However, under virtualization the host I/O caches negate that effect -- I/O will be very fast (after the first boot) while the decompression time slows down the packaged boot. And unfortunately the current compression algorithm used is relatively slow -- switching to a faster one (e.g. LZ4) would help quite a bit.

comment:38 Changed 3 years ago by Giova84

Resolution: fixed
Status: closedreopened

Hi,

I see that this ticket is closed as fixed, but I experienced an evident boot delay with hrev50067, where the launch_daemon is present. The last time that i run Haiku on my computer (Pentium dual core E5300 @ 2.60GHz - 6 GB DDR3 of ram) was about one year ago, and the boot time was fairly quick (on my system I have the same hpkgs and the same settings). But now, on hrev50067, I noticed that the boot screen will delay for about 12 seconds at the disk icon.This delay is also occurred when I reinstalled Haiku, so without any hpkgs and without startup items.
and is never happened for me the last time that I booted Haiku one year ago.

EDIT: the serial debug output is disabled in the kernel file.

I attempted to run the boot_time_logger, then I opened the boot_time log and there was only a message which stated "Failed to get a reply for application/x-vnd.Haiku-cddb_daemon"; in facts the cddb_daemon is not currently active on my system, although that is present in the file /boot/system/data/launch/system

Maybe is due to this bug ticket:12634 ?

Last edited 3 years ago by Giova84 (previous) (diff)

comment:39 in reply to:  38 Changed 3 years ago by Giova84

Replying to Giova84:

I attempted to run the boot_time_logger, then I opened the boot_time log and there was only a message which stated "Failed to get a reply for application/x-vnd.Haiku-cddb_daemon"; in facts the cddb_daemon is not currently active on my system, although that is present in the file /boot/system/data/launch/system

Well, i removed the reference to the cddb_daemon from the boot_time_logger and now I register the boot time: it takes 23342952 microseconds, which are 23,342952 seconds. Is absolutely too much compared to my previous installation from one year ago.

comment:40 Changed 3 years ago by pulkomandy

It could be something else than the launch_daemon, for example, booting is slower currently if you have a lot of things in /system/packages/administrative. Maybe try cleaning the old "state" directories there?

comment:41 Changed 3 years ago by humdinger

@Giova84: It may very well be related to the issue you reported with #12634, i.e. something AHCI related keep filling up your syslog.

comment:42 in reply to:  40 Changed 3 years ago by Giova84

Replying to pulkomandy:

It could be something else than the launch_daemon, for example, booting is slower currently if you have a lot of things in /system/packages/administrative. Maybe try cleaning the old "state" directories there?

I cleaned the old state directories but this doesn't changed anything about the boot time. However I also noticed that if I also boot Haiku hrev50067 from the live cd (in the live cd there is no old state directories :-) ) , occurs the identical issue at the boot screen: the disk icon hangs for a lot of seconds. I didn't see this behaviour before on my system.

comment:43 in reply to:  41 Changed 3 years ago by Giova84

Replying to humdinger:

@Giova84: It may very well be related to the issue you reported with #12634, i.e. something AHCI related keep filling up your syslog.

There is a way to temporarly disable the syslog? In the /boot/home/config/settings/kernel/drivers/kernel file I just disabled the serial debug output.

Changed 3 years ago by Giova84

Attachment: previous_syslog added

Changed 3 years ago by Giova84

Attachment: syslog added

comment:44 Changed 3 years ago by Giova84

However, for investigation reasons, I attached the syslog file and the previous_syslog file (the second one due to the fact that the plain syslog is spammed with the same messages).

comment:45 Changed 3 years ago by axeld

Resolution: fixed
Status: reopenedclosed

The launch_daemon is started when the rocket icon lits up.

comment:46 in reply to:  38 Changed 3 years ago by Giova84

Replying to Giova84:

Hi,

I see that this ticket is closed as fixed, but I experienced an evident boot delay with hrev50067, Maybe is due to this bug ticket:12634 ?

Well, the bug described in ticket:12634 and in ticket:12415 seems the culprit of this boot delay. As I've said in ticket:12415#comment:13 I tried with an old revision (hrev49662) and Haiku is booted fast as usual.

Note: See TracTickets for help on using tickets.