Opened 10 years ago

Closed 9 years ago

#10694 closed bug (duplicate)

Somewhat reproducible KDL: power_daemon in wait_for_objects()

Reported by: ttcoder
Owned by: axeld
Priority: low
Milestone: R1
Component: Servers/power_daemon
Version: R1/Development
Keywords:
Cc: mmlr
Blocked By: #11098
Blocking:
Platform: All

Description

Collected a lot of data over several weeks, but here are the highlights:

  • it takes just a few steps after boot-up to trigger this: open FileTypes, 4 Tracker windows and 4 Terminals in a certain order, then invoke "Restart computer".
  • this bug occurs easily on this hrev (early PM), presumably because that particular hrev has a bug where it eats up 100% CPU in kernelland, which was fixed a few days later. Maybe the extra pressure exposes an otherwise hard-to-trigger bug?
  • this bug is VERY finicky and shy -- using the exact same steps on the same hrev but on a different computer (Asus F2A55M based), I have so far completely failed to reproduce it there the way it happens here. Can post the list of steps I use if somebody wants to try them out, but my gut tells me it will be very difficult for others to see this KDL :-/
  • cold boot affinity: seems high. This KDL is easy to reproduce after a cold boot, and sometimes you can get a "series" of KDLs at each bootup for a while. But once you break your lucky streak and get one trouble-free correct reboot, the thingie is gone and won't trigger any more.
  • all in all, I'm wondering if this can be useful in relating this to other, more annoying, KDLs? I look forward to some diagnosing with haiku devs on this, to try to determine which component is at fault, and whether this kernel panic could be related to other ones that are as difficult (or more difficult) to reproduce. Or if, on the contrary, I should stop working on this one and we can archive it because it's not useful in improving current/more recent hrevs.

Attachments (2)

powerdaemon_kdl.jpg (184.5 KB) - added by ttcoder 10 years ago.
Stack trace, hrev info
powerdaemon_kdl (AMD-Asus F2A55M).jpg (230.8 KB) - added by ttcoder 10 years ago.
Exact same panic except for a few pointers, this time on an AMD system


Change History (11)

by ttcoder, 10 years ago

Attachment: powerdaemon_kdl.jpg added

Stack trace, hrev info

comment:1 by ttcoder, 10 years ago

Making the steps available for others to try this out, as I managed to reproduce this on another machine for the first time yesterday, so I was wrong in thinking nobody else could ever see this; I just hadn't tried hard enough.

Again, this pretty much requires a nightly around hrev46368, as this is the one that exposes a vulnerability in the kernel. The ones before did not have the CPU-hog code, and neither did the ones soon after. I would appreciate confirmation of my working theory from the haiku head-honchos: revisions around this one exhibit kernel instability and sometimes also userland crashes; starting with hrev46370, korli fixed the kernel thread that used 100% CPU, and it became much more difficult to expose this kernel flaw. Am I right in assuming that hrev46370 does not fix any memory-corruption bug and only fixes excessive CPU usage, hence the crashes in previous revs were possibly due to "pressure" put on the kernel? If so, that would be worth investigating, as the flaw might still be present in the current 47xxx revs, but tell me if my intuition is dead wrong again :-) (Heck, maybe somebody could even try building a current nightly from current baseline code except for a reversal of hrev46370, to see if the 100% CPU usage pressure still crashes current nightlies; there's probably a multitude of other ways to induce 100% kernelland CPU usage too!)

=======

Ingredients:

  • Tracker
  • Terminal
  • Filetypes

Recipe:

  • cold boot
  • wait for things to settle down (e.g. the Network notification window)
  • Alt-Opt-F to open the Filetypes main window
  • click on the desktop, then Alt-N, then <enter> <enter> to open the created New Folder
  • with the new folder selected, Alt-N, <enter>, <enter> again; repeat until you have 4 windows open
  • Alt-Opt-T to open a Terminal
  • Alt-N, until you have 4 terminals open
  • select the first created New Folder on the desktop, Alt-T to trash that folder. All its children windows disappear too.
  • in quick succession, click Terminal 4 and Alt-W to close that terminal, same with 3 2 and 1.
  • close Filetypes
  • Deskbar > shutdown > restart (or power off)

With some luck you'll get the attached kernel panic.

by ttcoder, 10 years ago

Exact same panic except for a few pointers, this time on an AMD system

comment:2 by korli, 10 years ago

Do I understand correctly that the crash happens on shutdown/restart only?

I'm not sure the code is correct in case the power daemon is asked to quit while the event thread waits for events: we close the file descriptors and wait for the event thread to exit. It's quite possible that there is something wrong in the kernel in this case.
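To make the shutdown ordering under discussion concrete, here is a minimal user-space sketch in POSIX C. It is not taken from the power_daemon sources: the struct, the function names and the use of pipes in place of device file descriptors are all assumptions made for illustration. It shows one conservative ordering (wake the waiting thread through a separate pipe, join it, and only then close the descriptors), as opposed to closing descriptors that another thread may still be waiting on.

```c
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for the daemon's state; none of these names
 * come from the real power_daemon. */
struct event_loop {
    int event_fd;      /* read end of a pipe standing in for a device fd */
    int event_wr;      /* write end, used here to simulate an event */
    int wake_fds[2];   /* self-pipe used only to interrupt poll() */
    pthread_t thread;
    int events_seen;   /* only read by the owner after pthread_join() */
};

static void *event_thread(void *arg)
{
    struct event_loop *loop = arg;
    struct pollfd fds[2] = {
        { .fd = loop->event_fd,    .events = POLLIN },
        { .fd = loop->wake_fds[0], .events = POLLIN },
    };

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            break;
        }
        if (fds[0].revents & POLLIN) {
            char byte;
            if (read(loop->event_fd, &byte, 1) == 1)
                loop->events_seen++;
        }
        if (fds[1].revents & POLLIN)
            break;          /* the owner asked us to quit */
    }
    return NULL;
}

static int event_loop_start(struct event_loop *loop)
{
    int event_pipe[2];

    if (pipe(event_pipe) != 0)
        return -1;
    if (pipe(loop->wake_fds) != 0) {
        close(event_pipe[0]);
        close(event_pipe[1]);
        return -1;
    }
    loop->event_fd = event_pipe[0];
    loop->event_wr = event_pipe[1];
    loop->events_seen = 0;

    if (pthread_create(&loop->thread, NULL, event_thread, loop) != 0) {
        close(loop->event_fd);
        close(loop->event_wr);
        close(loop->wake_fds[0]);
        close(loop->wake_fds[1]);
        return -1;
    }
    return 0;
}

/* Wake the thread, wait for it to exit, and only then close the fds,
 * so no descriptor is closed while the thread still waits on it. */
static int event_loop_stop(struct event_loop *loop)
{
    char byte = 'q';

    if (write(loop->wake_fds[1], &byte, 1) != 1)
        return -1;
    if (pthread_join(loop->thread, NULL) != 0)
        return -1;

    close(loop->event_fd);
    close(loop->event_wr);
    close(loop->wake_fds[0]);
    close(loop->wake_fds[1]);
    return 0;
}
```

A caller would pair event_loop_start() with event_loop_stop(). Whether the real daemon can use such an ordering depends on details not covered in this ticket, and it would not by itself fix a kernel-side problem.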

BTW, in both screenshots the right side of the screen is cropped; that part is needed to know the offset in common_wait_for_objects().

comment:3 by ttcoder, 10 years ago

Possibly, at least that's how I trigger the panic 'easily'.

Well, other than at shutdown, I also get other symptoms though, hinting at something bad occurring even before shutdown. For instance if I start using this install normally (instead of doing the above 30-second "KDL recipe"), I get strange userland crashes in Terminal, WebPositive ... Sometimes ending up with a KDL, sometimes I can reboot after the W+ crash or Terminal crash without getting a KDL. Didn't collect real data on that aspect yet though.

Looking at the original/uncropped screenshots I see the offset is 0xb6 in all cases.

The full line reads

... (+120) 8008851e <kernel_x86> common_wait_for_objects(object_wait_info*: d27738a0, int32:2 uint32: 0, int64: 100000, false) + 0xb6

in reply to:  3 comment:4 by korli, 10 years ago

Replying to ttcoder:

Possibly, at least that's how I trigger the panic 'easily'.

Well it seems the acpi_button driver doesn't notify anyone on close(). It should probably also refuse select() requests when the fd is already closed, or is it already enforced by the driver API? A bit of locking would be needed too. Same for acpi_lid.
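The three suggestions above (notify on close, refuse select() after close, add locking) can be modelled in a small user-space sketch. This is not the acpi_button driver and does not use the Haiku driver API: the cookie struct, the function names, the EBADF return value and the pthread mutex are all hypothetical stand-ins, and a real driver would use the kernel's own locking and select-notification facilities.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-open state for a button-like device. */
struct button_cookie {
    pthread_mutex_t lock;
    bool closed;             /* set once close has run */
    bool waiter_registered;  /* a select() request is pending */
    int notifications;       /* wake-ups delivered to the waiter */
};

static void cookie_init(struct button_cookie *cookie)
{
    pthread_mutex_init(&cookie->lock, NULL);
    cookie->closed = false;
    cookie->waiter_registered = false;
    cookie->notifications = 0;
}

/* Model of the select hook: refuse new waiters once the cookie is closed. */
static int cookie_select(struct button_cookie *cookie)
{
    int status = 0;

    pthread_mutex_lock(&cookie->lock);
    if (cookie->closed)
        status = EBADF;
    else
        cookie->waiter_registered = true;
    pthread_mutex_unlock(&cookie->lock);
    return status;
}

/* Model of the close hook: mark the cookie closed and wake any pending
 * waiter, under the same lock that cookie_select() takes. */
static void cookie_close(struct button_cookie *cookie)
{
    pthread_mutex_lock(&cookie->lock);
    cookie->closed = true;
    if (cookie->waiter_registered) {
        cookie->notifications++;   /* stands in for the real notification */
        cookie->waiter_registered = false;
    }
    pthread_mutex_unlock(&cookie->lock);
}
```

Taking the same lock in both hooks is what keeps a select request from slipping in between the closed check and the notification. Whether the driver API already enforces part of this is the open question raised in the comment.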

Well, other than at shutdown, I also get other symptoms though, hinting at something bad occurring even before shutdown. For instance if I start using this install normally (instead of doing the above 30-second "KDL recipe"), I get strange userland crashes in Terminal, WebPositive ... Sometimes ending up with a KDL (EDIT: at time of reboot, never before IIRC), sometimes I can reboot after the W+ crash or Terminal crash without getting a KDL. Didn't collect real data on that aspect yet though.

Here I'm unsure it's really related.

Looking at the original/uncropped screenshots I see the offset is 0xb6 in all cases.

Thanks for the info.

comment:5 by ttcoder, 10 years ago

Maybe this can be closed as no-change-required? I won't try 46368 any more, and it's the only rev that ever showed that behavior.


comment:6 by ttcoder, 10 years ago

Priority: normal → low

comment:7 by ttcoder, 9 years ago

While hrev48100 is still 'fresh' in our minds, maybe someone can give a quick glance to the common_wait_for_objects() reference here, maybe it's the same that was fixed by Michael in 48100 and this ticket is a duplicate..


comment:8 by anevilyak, 9 years ago

Cc: mmlr added

in reply to:  7 comment:9 by mmlr, 9 years ago

Blocked By: 11098 added
Resolution: duplicate
Status: new → closed

Replying to ttcoder:

While hrev48100 is still 'fresh' in our minds, maybe someone can give a quick glance to the common_wait_for_objects() reference here, maybe it's the same that was fixed by Michael in 48100 and this ticket is a duplicate..

Indeed it is. Thanks for the heads up!
