Opened 5 years ago

Last modified 5 years ago

#10468 new bug

scheduler related KDL assert failed

Reported by: jscipione
Owned by: pdziepak
Priority: normal
Milestone: R1
Component: System/Kernel
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description (last modified by jscipione)

This KDL occurred while running the Icons screensaver overnight.

src/system/kernel/scheduler/scheduler_thread.h:268: stolenTime >= 0

KDL image attached

Attachments (4)

KDL running Icons.png (31.6 KB) - added by jscipione 5 years ago.
KDL assert failed while running Icons screensaver overnight
stolenTime.png (53.7 KB) - added by diver 5 years ago.
10468-assertions.diff (1.7 KB) - added by pdziepak 5 years ago.
additional assertion checks
syslog (121.3 KB) - added by diver 5 years ago.

Change History (16)

Changed 5 years ago by jscipione

Attachment: KDL running Icons.png added

KDL assert failed while running Icons screensaver overnight

comment:1 Changed 5 years ago by jscipione

Description: modified (diff)

comment:2 Changed 5 years ago by diver

Component: - General → System/Kernel

comment:3 Changed 5 years ago by diver

I'm also seeing this panic right after boot in vbox with 2 CPUs in hrev46759.

Changed 5 years ago by diver

Attachment: stolenTime.png added

comment:4 Changed 5 years ago by diver

Still happens in hrev47003.

Changed 5 years ago by pdziepak

Attachment: 10468-assertions.diff added

additional assertion checks

comment:5 Changed 5 years ago by pdziepak

Unfortunately, I am not able to reproduce this issue and reviewing the involved code does not show anything obviously wrong. I have attached a patch that adds numerous assertion checks to the code which, hopefully, would help to narrow down the issue.

comment:6 Changed 5 years ago by jscipione

Just a crazy thought due to the timing of the occurrence of this bug... could it be possible that the assert failed due to a time calculation that was skewed by the system clock changing due to DST?

comment:7 Changed 5 years ago by pdziepak

Not really, system_time() is monotonic (at least when invoked on the same logical processor). DST, time zones, etc. affect only the real_time_clock*() family of functions.
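
The distinction drawn in the comment above can be illustrated outside the kernel. The following is a minimal sketch in portable C++, not Haiku code: it assumes std::chrono::steady_clock as a stand-in for a monotonic source such as system_time(), and std::chrono::system_clock as a stand-in for the wall clock. The helper names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: microseconds elapsed since `start` on a monotonic
// clock. steady_clock never goes backwards, so this is never negative.
int64_t elapsed_monotonic_us(std::chrono::steady_clock::time_point start)
{
    auto now = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(
        now - start).count();
}

// Hypothetical helper: microseconds elapsed since `start` on the wall
// clock. system_clock can be adjusted (for example when the clock is set
// or corrected), so this value may be negative after an adjustment.
int64_t elapsed_wall_us(std::chrono::system_clock::time_point start)
{
    auto now = std::chrono::system_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(
        now - start).count();
}
```

An interval measured with the first helper cannot be skewed by a wall clock change, which is why a DST switch would not be expected to make a monotonic time difference go negative.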

Changed 5 years ago by diver

Attachment: syslog added

comment:8 Changed 5 years ago by diver

Applied the patch. This syslog contains several crashes in scheduler related code.

comment:9 Changed 5 years ago by diver

ping

comment:10 Changed 5 years ago by pdziepak

Some of the assertion failures in the syslog really shouldn't happen (namely the failure at timeUsed >= 0) and are very strange. However, it just occurred to me that these problems may have also been caused by normal threads running with B_IDLE_PRIORITY, as in the case of #10766.

Could you check whether the problem is still present on hrev47129?

comment:11 Changed 5 years ago by diver

So far I've reproduced the timeUsed >= 0 assertion failure in hrev47140 in vbox with 2 CPUs.

comment:12 Changed 5 years ago by jua

I'm seeing this issue as well, running hrev47380 in VirtualBox. Got both kinds, timeUsed >= 0 and stolenTime >= 0 failed asserts. I've never seen it happen when running directly on hardware, but since installing in VirtualBox a few days ago, it occurred several times already, so maybe it has to do with running in a VM... maybe something related to timing inaccuracy by the VM?

In all cases, the system was idling and not even being interacted with, the KDL was then waiting for me when I got back to it. I can provide KDL screenshots in case they are helpful (but I guess they don't show anything that's not already shown in the other attachments on this ticket).
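
To make the "timing inaccuracy" hypothesis concrete, here is a purely illustrative sketch. It does not reproduce the code in scheduler_thread.h and is not a confirmed cause of this ticket; the CpuClock type and measured_interval() are hypothetical. It assumes only that each logical CPU reads time with its own fixed offset, and shows how an interval whose start and end are sampled on different CPUs could come out negative, which is the kind of value an assert like stolenTime >= 0 would reject.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a per-CPU time source: a shared "true" time plus
// a fixed per-CPU offset (0 means perfectly synchronized).
struct CpuClock {
    int64_t offset_us;

    int64_t read(int64_t true_time_us) const
    {
        return true_time_us + offset_us;
    }
};

// Interval between a timestamp taken on startCpu and one taken later on
// endCpu. With equal offsets this equals trueEnd - trueStart; with skewed
// offsets it can be negative even though trueEnd > trueStart.
int64_t measured_interval(const CpuClock& startCpu, int64_t trueStart,
    const CpuClock& endCpu, int64_t trueEnd)
{
    return endCpu.read(trueEnd) - startCpu.read(trueStart);
}
```

For example, with one clock at offset 0 and another lagging by 50 µs, a true interval of 10 µs is measured as 10 µs when both samples come from the first clock, but as -40 µs when the second sample comes from the lagging one.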
