Opened 11 years ago
Closed 10 years ago
#10694 closed bug (duplicate)
Somewhat reproducible KDL: power_daemon in wait_for_objects()
Reported by: | ttcoder | Owned by: | axeld |
---|---|---|---|
Priority: | low | Milestone: | R1 |
Component: | Servers/power_daemon | Version: | R1/Development |
Keywords: | Cc: | mmlr | |
Blocked By: | #11098 | Blocking: | |
Platform: | All |
Description
Collected a lot of data over several weeks, but here are the highlights:
- it takes just a few steps after boot-up to trigger this: open FileTypes, 4 Tracker windows and 4 Terminals in a certain order, then invoke "Restart computer".
- this bug occurs easily on this hrev (early PM), presumably because that particular hrev has a bug where it eats up 100% CPU in kernelland, which was fixed a few days later. Maybe the extra pressure exposes an otherwise tricky bug?
- this bug is VERY finicky and shy -- using the exact same steps on the same hrev but on a different computer (Asus F2A55M based) I completely failed (so far) to reproduce it over there the way I do over here. Can post the list of steps I use if somebody wants to try them out, but my gut tells me it will be very difficult for others to see this KDL :-/
- cold boot affinity: seems high. This KDL is easy to reproduce after a cold boot, and sometimes you can get a "series" of KDLs at each bootup for a while. But once you break your lucky streak and get one trouble-free correct reboot, the thingie is gone and won't trigger any more.
- all in all, I'm wondering if this can be useful maybe in relating this to other, more annoying, KDLs? I look forward to some diagnosing with haiku devs on this, try to determine which component is at fault, and if this kernel panic could possibly be related to other ones, which are as difficult (or more difficult) to reproduce.. Or if to the contrary, I should stop working this one out and we can archive this because it's not useful in improving current/more recent hrevs..
Attachments (2)
Change History (11)
by , 11 years ago
Attachment: | powerdaemon_kdl.jpg added |
---|
comment:1 by , 11 years ago
Making the steps available for others to try this out, as I managed to reproduce this on another machine for the first time yesterday. So I was wrong in thinking nobody else could ever see this; I just hadn't tried hard enough.
Again, this pretty much requires a ca. 46368 nightly, as this is the one that exposes a vulnerability in the kernel. The ones before did not have the CPU-hog code, and neither did the ones soon after. I would appreciate confirmation of my working theory from the haiku head-honchos: revisions around this one exhibit kernel instability and sometimes also userland crashes; starting with hrev46370 korli fixed the 100% CPU usage kernel thread and it became much more difficult to expose this kernel flaw. Am I right in assuming that hrev46370 does not fix any memory corruption related bug, and only fixes excessive CPU usage, hence the crashes in previous revs were possibly due to "pressure" put on the kernel? That would be something worth investigating if so, as that might still be present in the current 47xxx revs, but tell me if my intuition is dead wrong again :-) (Heck, maybe somebody could even try building a current nightly with current baseline code except for a reversal of hrev46370, to see if the 100% CPU usage pressure still crashes current nightlies; there's probably a multitude of other ways to induce 100% kernelland CPU usage too!)
=======
Ingredients:
- Tracker
- Terminal
- FileTypes
Recipe:
- cold boot
- wait for things to settle down (i.e. the Network notification window)
- Alt-Opt-F to open FileTypes's main window
- click on the desktop, then Alt-N, then <enter> <enter> to open the created New Folder
- with the new folder selected, alt-N/enter/enter again; repeat until you have 4 windows open
- Alt-Opt-T to open a Terminal
- Alt-N, until you have 4 terminals open
- select the first created New Folder on the desktop, Alt-T to trash that folder. All its children windows disappear too.
- in quick succession, click Terminal 4 and Alt-W to close that terminal, same with 3 2 and 1.
- close FileTypes
- Deskbar > shutdown > restart (or power off)
With some luck you'll get the attached kernel panic.
by , 11 years ago
Attachment: | powerdaemon_kdl (AMD-Asus F2A55M).jpg added |
---|
Exact same panic except for a few pointers, this time on an AMD system
comment:2 by , 11 years ago
Do I understand correctly that the crash happens on shutdown/restart only?
I'm not sure the code is correct in the case where the power daemon is asked to quit while the event thread waits for events: we close the file descriptors and wait for the event thread to exit. It's quite possible that there is something wrong in the kernel in this case.
BTW, in both screenshots the right side of the screen is cropped; that part is needed to know the offset in common_wait_for_objects().
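For illustration only, here is a minimal sketch of a shutdown ordering that avoids closing descriptors underneath a waiting thread. It is not the power_daemon code and does not use Haiku's wait_for_objects(); it assumes a POSIX system and uses Python's select() on pipes as a stand-in. The wake-up pipe, `run_event_loop` and `shutdown_demo` are hypothetical names invented for this sketch.

```python
import os
import select
import threading


def run_event_loop(event_fd, wake_fd, log):
    """Wait on both descriptors until the wake-up pipe becomes readable."""
    while True:
        readable, _, _ = select.select([event_fd, wake_fd], [], [])
        # Drain a pending device event first so it is not lost on quit.
        if event_fd in readable:
            log.append("event: " + os.read(event_fd, 64).decode())
        if wake_fd in readable:
            log.append("quit requested")
            return


def shutdown_demo():
    event_r, event_w = os.pipe()  # stands in for the button/lid device
    wake_r, wake_w = os.pipe()    # hypothetical wake-up channel
    log = []
    worker = threading.Thread(
        target=run_event_loop, args=(event_r, wake_r, log))
    worker.start()

    os.write(event_w, b"power")   # simulate one device event

    # Quit sequence: wake the waiter, wait for it to exit, and only
    # then close the descriptors it was waiting on.
    os.write(wake_w, b"x")
    worker.join()
    for fd in (event_r, event_w, wake_r, wake_w):
        os.close(fd)
    return log
```

The point of the ordering is that no descriptor is closed while another thread may still be blocked on it; whether the daemon or the kernel should be responsible for that guarantee is exactly the open question in this comment.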
follow-up: 4 comment:3 by , 11 years ago
Possibly, at least that's how I trigger the panic 'easily'.
Well, other than at shutdown, I also get other symptoms though, hinting at something bad occurring even before shutdown. For instance if I start using this install normally (instead of doing the above 30 seconds "KDL recipe"), I get strange userland crashes in Terminal, WebPositive ... Sometimes ending up with a KDL (EDIT: at time of reboot, never before IIRC), sometimes I can reboot after the W+ crash or Terminal crash without getting a KDL. Didn't collect real data on that aspect yet though.
Looking at the original/uncropped screenshots I see the offset is 0xb6 in all cases.
The full line reads
... (+120) 8008851e <kernel_x86> common_wait_for_objects(object_wait_info*: d27738a0, int32:2 uint32: 0, int64: 100000, false) + 0xb6
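As a small aid for reading such a frame, the sketch below pulls the address, image, symbol and offset out of the line quoted above and subtracts the offset to get the start of the function. The line layout is assumed from this single transcribed example, not from any documented KDL format, and `FRAME_RE` and `parse_frame` are names made up for this sketch.

```python
import re

# The frame as transcribed in the comment above.
KDL_LINE = ("... (+120) 8008851e <kernel_x86> common_wait_for_objects("
            "object_wait_info*: d27738a0, int32:2 uint32: 0, int64: 100000, "
            "false) + 0xb6")

# Assumed layout: <address> <image> symbol(args) + 0xOFFSET
FRAME_RE = re.compile(
    r"(?P<address>[0-9a-f]{8,16})\s+<(?P<image>[^>]+)>\s+"
    r"(?P<symbol>\w+)\(.*\)\s*\+\s*(?P<offset>0x[0-9a-f]+)\s*$"
)


def parse_frame(line):
    """Return the frame fields as a dict, or None if the line does not match."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    address = int(match.group("address"), 16)
    offset = int(match.group("offset"), 16)
    return {
        "image": match.group("image"),
        "symbol": match.group("symbol"),
        "address": address,
        "offset": offset,
        # Address minus offset is where the function begins.
        "symbol_start": address - offset,
    }


frame = parse_frame(KDL_LINE)
```

With the function start known, the offset can be matched against a disassembly of the same kernel build.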
comment:4 by , 11 years ago
Replying to ttcoder:
Possibly, at least that's how I trigger the panic 'easily'.
Well it seems the acpi_button driver doesn't notify anyone on close(). It should probably also refuse select() requests when the fd is already closed, or is it already enforced by the driver API? A bit of locking would be needed too. Same for acpi_lid.
Well, other than at shutdown, I also get other symptoms though, hinting at something bad occurring even before shutdown. For instance if I start using this install normally (instead of doing the above 30 seconds "KDL recipe"), I get strange userland crashes in Terminal, WebPositive ... Sometimes ending up with a KDL (EDIT: at time of reboot, never before IIRC), sometimes I can reboot after the W+ crash or Terminal crash without getting a KDL. Didn't collect real data on that aspect yet though.
Here I'm unsure it's really related.
Looking at the original/uncropped screenshots I see the offset is 0xb6 in all cases.
Thanks for the info.
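The driver-side behaviour suggested in this comment for acpi_button (notify waiters on close(), refuse new waits once closed, with some locking) can be modelled roughly as below. This is a toy model in Python, not the Haiku driver API; `ButtonDeviceModel` and its methods are hypothetical names used only for illustration.

```python
import threading


class ButtonDeviceModel:
    """Toy model of a device that can be waited on and closed."""

    def __init__(self):
        # One condition variable guards all state and wakes waiters.
        self._cond = threading.Condition()
        self._closed = False
        self._pending = 0

    def post_event(self):
        """Queue one event; returns False if the device is closed."""
        with self._cond:
            if self._closed:
                return False
            self._pending += 1
            self._cond.notify_all()
            return True

    def wait_for_event(self, timeout=None):
        """Return "event", "closed" or "timeout"."""
        with self._cond:
            if self._closed:
                return "closed"  # refuse new waits once closed
            self._cond.wait_for(
                lambda: self._closed or self._pending > 0, timeout)
            if self._pending > 0:
                self._pending -= 1
                return "event"
            return "closed" if self._closed else "timeout"

    def close(self):
        """Mark the device closed and wake every waiter."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()
```

A waiter blocked in `wait_for_event()` returns "closed" as soon as another thread calls `close()`, instead of staying parked on an object that no longer exists.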
comment:5 by , 11 years ago
Maybe this can be closed as no-changed-req? I won't try 46368 any more, and it's the only rev that ever showed that behavior.
comment:6 by , 10 years ago
Priority: | normal → low |
---|
follow-up: 9 comment:7 by , 10 years ago
While hrev48100 is still 'fresh' in our minds, maybe someone can take a quick look at the common_wait_for_objects() reference here; maybe it's the same issue that was fixed by Michael in hrev48100 and this ticket is a duplicate.
comment:8 by , 10 years ago
Cc: | added |
---|
comment:9 by , 10 years ago
Blocked By: | 11098 added |
---|---|
Resolution: | → duplicate |
Status: | new → closed |
Stack trace, hrev info