Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#9632 closed bug (fixed)

Shutdown/restart sometimes slow/never completes on latest builds

Reported by: Kev Owned by: nobody
Priority: normal Milestone: R1
Component: System Version: R1/Development
Keywords: Cc: Jens.Arm@…
Blocked By: Blocking: #9667
Has a Patch: no Platform: x86

Description

The was on hrev45413 gcc4hybrid until a couple nights ago, when I upgraded to the latest nightly, and then today upgraded again, now to hrev45444 gcc4hybrid. These last two have often, but not every time, had shutdowns proceed very slowly, with each item taking 2-3 seconds to shut down, until stopping on a seemingly random service (two examples, midi_server and cddb_daemon) for over a minute until I gave up and just chopped the power. I've never had this under Haiku or BeOS before on this machine. At least once though on hrev45444 I've had the shutdown proceed quite quickly as it normally does. Did something change between 45413 and 45444 in the shutdown sequence?

Attachments (1)

TestRechner1.png (36.6 KB ) - added by jahaiku 6 years ago.

Download all attachments as: .zip

Change History (22)

comment:1 by SeanCollins, 6 years ago

I have been afflicted with this for sometime. Sometimes I get a clean shut down, sometimes not. Subscribing.

comment:2 by Premislaus, 6 years ago

I have similar observations. Closing each of the services, sometimes takes tens of seconds. This is a random and rare. This occurs for a long time - surely from alpha3. In the syslogs I saw nothing.

comment:3 by Kev, 6 years ago

One thing that changed for me is my install method. Formerly I had been reformatting my nightly partition, then using Installer. Now I rm -r and cp -r. I don't think it should make a difference but I thought I should mention it. Is that true for either of you?

comment:4 by Kev, 6 years ago

In case hardware is a common factor:

CPU: AMD Sempron™ 3100+ Processor (64-bit) (256KB L2 cache, 1.80GHz, 1600MHz FSB) Chipset: VIA K8M800 Memory: 256MB DDR SDRAM (1 × 256MB), 400MHz (PC3200) Expandable to 2GB (expanded to 512MB) Hard Drive: 100GB (7200rpm, 2MB cache) Optical Drive: 48x CD-RW/DVD Combo Drive Media Reader: 8-in-1 digital media manager (Secure Digital (SD), Smart Media, Micro Drive, Memory Stick, Memory Stick Pro, Compact Flash, Mulitimedia Card, USB 2.0) Video: S3 Graphics Unichrome™ Pro 64MB DDR Shared memory, AGP 8x slot available for upgrade (upgraded to Radeon 9200 128 MB AGP) Sound: AC '97 Audio, Dolby 5.1 (6-channel) Network: 10/100Mbps integrated Ethernet LAN Ports/Other: 5 USB 2.0 (4 in back, 1 in Media Reader), 1 VGA external connector, 1 serial, 1 parallel, 2 PS/2, 5 audio ports (2 in front, 3 in back)

comment:5 by jahaiku, 6 years ago

Cc: Jens.Arm@… added

I have this, too. Since one or two weeks. Every time I change into the kernel debugger then there is the "w>Twitcher"-Thread running with 100%-CPU. I attach a screenshot of this.
I think this could be started at around hrev45422 with the last net or deskbar changes, but I could be wrong.

Last edited 6 years ago by jahaiku (previous) (diff)

by jahaiku, 6 years ago

Attachment: TestRechner1.png added

comment:6 by Kev, 6 years ago

The latest time it happened, it did actually complete the shutdown, but each item took 1-3 seconds.

Last edited 6 years ago by Kev (previous) (diff)

comment:7 by Kev, 6 years ago

I also noticed that sometimes even the icon for the current item takes a second to appear, and then a couple more seconds before it moves on to the next item.

My backtrace looked very similar.

Last edited 6 years ago by Kev (previous) (diff)

comment:8 by Kev, 6 years ago

Summary: Shutdown sometimes slow/never completes on latest buildsShutdown/restart sometimes slow/never completes on latest builds

Confirmed this (the slow shutdown, not the never-completing shutdown) sequence also happens for a restart command. Being unfamiliar with the shutdown process, I wasn't sure if that would make any difference, but there it is.

comment:9 by phoudoin, 6 years ago

If you can, try to temporarely replace in your hrev45444 system Deskbar app by the old one from an hrev45413 or sooner image.

If shutdown works now fine everytime, it's indeed something introduced recently, most probably related to loops optimization in Switcher.cpp (my bet).

If not, then, Houston, we got a problem... somewhere else.

comment:10 by jahaiku, 6 years ago

I can see an increased occurence of the slow shutdown, if there is some network work running in the background, like dhcp, if no cable is connected.
I installed a time ago a build from before the nfs4 commit and this worked without problems on real HW and in VBox.

in reply to:  9 comment:11 by Premislaus, 6 years ago

Replying to phoudoin:

If you can, try to temporarely replace in your hrev45444 system Deskbar app by the old one from an hrev45413 or sooner image.

If shutdown works now fine everytime, it's indeed something introduced recently, most probably related to loops optimization in Switcher.cpp (my bet).

If not, then, Houston, we got a problem... somewhere else.

https://dev.haiku-os.org/ticket/9632#comment:2

I saw it on the alpha 3 and 4, on the many nightly builds. That's all I remember.

I did not report it, because it is random, I was not able to reproduce. The only clue is that previously I had a lot hangs and crashes the system and applications available on Haiku.

comment:12 by jahaiku, 6 years ago

Some more info:

  1. If I assign a static IP it occurs rarly
  2. If I remove all network interfaces in VBox I can not reproduce it, until now
Last edited 6 years ago by jahaiku (previous) (diff)

in reply to:  9 comment:13 by Kev, 6 years ago

Replying to phoudoin:

If you can, try to temporarely replace in your hrev45444 system Deskbar app by the old one from an hrev45413 or sooner image.

If shutdown works now fine everytime, it's indeed something introduced recently, most probably related to loops optimization in Switcher.cpp (my bet).

I did this, and it seemed to fix things. Five restarts in a row that were lightning fast. That's a sequence I hadn't been getting lately--it was more like 60-80% of the time it would be slow.

in reply to:  12 comment:14 by Kev, 6 years ago

Replying to jahaiku:

Some more info:

  1. If I assign a static IP it occurs rarly
  2. If I remove all network interfaces in VBox I can not reproduce it, until now

When I read this I wondered if it would apply to me, and when shutting down before trying anything different, it needed a manual power down (stopped on power_server, with icon showing, indefinitely), and just before I did that, I happened to glance at the router, and saw that there was still some network activity happening on this wire.

I'm on real hardware, and today 1. seemed to also make it go away--5 restarts in a row (with the hrev45453 gcc4h Deskbar put back in place) were lightning-fast. I switched to DHCP and the next restart was slow. Next couple were fast, and the next one after that was slow again.

So are there two different bugs here?

comment:15 by Kev, 6 years ago

Along the "two bugs" line, I just had my Internet connection stop working (under DHCP and back on hrev45413's Deskbar but everything else hrev45453). I killed WebPositive because it was getting unresponsive (and the status bar said it was loading pages from tabs I had closed). Still couldn't get it to go. So, I went to network settings and switched to static. Bingo, everything works again, no restart required. So maybe there's a DHCP bug of some kind, and in a separate way, Deskbar does not deal with the results of that bug so well?

in reply to:  9 comment:16 by jahaiku, 6 years ago

Replying to phoudoin:

If you can, try to temporarely replace in your hrev45444 system Deskbar app by the old one from an hrev45413 or sooner image.

If shutdown works now fine everytime, it's indeed something introduced recently, most probably related to loops optimization in Switcher.cpp (my bet).

If not, then, Houston, we got a problem... somewhere else.

As in the backtrace of the "w>Twitcher" Message-Handling functions are written and I realized a increased slow shutdown count while networking things are running, I searched in the commit log for changes in the Messages. There are changes:
http://cgit.haiku-os.org/haiku/commit/?id=6de478363ee34b8d265dc2b763c031dda9e6fec8 http://cgit.haiku-os.org/haiku/commit/?id=a3a541eebd0b2ffe2e2ba30acf1c87d5ed9cf98f http://cgit.haiku-os.org/haiku/commit/?id=8163a8e0efcfe7279bf6890250fa98e44cbbcedf

Perhaps the first change with the reinit cause the Twitcher to run at 100% at shutdown?

comment:17 by jscipione, 6 years ago

Blocking: 9667 added

(In #9667) It seems that this bug is related to #9632. Deskbar settings not getting saved seems to happen always with a slow shutdown. The first image is fine, Deskbar initially creates a 0 size file on startup and then fills it out on quit. The second image shows the problem, the setting file didn't get filled out because of something related to #9632. The third image shows the setting file getting filled out correctly.

comment:18 by mmlr, 6 years ago

I've debugged this yesterday and the cause for the slow shutdown and Deskbar not quitting (and therefore not saving its settings) is an endless loop in _FindNextValidApp() (as pointed to by the attached screenshot).

What happens is that fCurrentIndex and with it startIndex becomes -1 when the last application from the list is quit (see B_SOME_APP_QUIT, which also causes CycleApp() and indirectly _FindNextValidApp() to be called). The value of fCurrentIndex is clamped to 0..max, where max is 0 as well, not finding any further valid apps (as there are none in the list anymore) and never triggering the startIndex (-1) == fCurrentIndex (0) abort condition either => endless loop.

Why this happens only now is due to a change in the content of the list. In hrev45422 TBarApp::AddTeam() was modified to not add background apps and the Deskbar itself to the list of BarTeamInfos anymore. This is significant because the subscribers (ExpandoMenuBar and Switcher) now don't get messages for these anymore either. The list of apps in the Switcher is therefore now shorter than before (missing the background apps that would usually be there), making the above situation possible. I haven't found an obvious reason why AddTeam() shouldn't filter out background apps and the Deskbar, as all subscribers seem to filter them out as well (this redundant individual filtering should now be removed though). This is just an explanation for why the situation did not arise previously.

The easiest solution would probably be to change _FindNextValidApp() to check for an empty list and return false directly in that case. Note that OKToUse() could now be removed and replaced by simple NULL checks, as the list now only includes non-background and non-Deskbar apps.

comment:19 by jscipione, 6 years ago

Thanks mmlr! Fix coming Real Soon Now™. I figured out a 100% reliable way to trigger this bug (in case it ever needs to be tested again). Open Terminal, then Vulcan Death Grip Tracker. Then issue the shutdown command from Terminal. Shutdown will proceed very slowly.

comment:20 by jscipione, 6 years ago

Resolution: fixed
Status: newclosed

Fixed in hrev45512 in large part due to mmlr's detective work.

comment:21 by jscipione, 6 years ago

Implemented the second part of the fix suggested in comment 18, eliminating OKToUse(), in hrev45524.

Note: See TracTickets for help on using tickets.