Opened 7 months ago

Last modified 2 months ago

#16546 assigned bug

KDL, laptop no longer boots

Reported by: luroh
Owned by: korli
Priority: high
Milestone: R1/beta3
Component: System/Kernel
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Platform: x86

Description

hrev54600, gcc2h

Laptop with i7-7500U no longer boots; this regressed somewhere between R1/beta2 and now. Sometimes it drops me into KDL, sometimes it just shows the blue background with the mouse pointer stuck in the center.

Attachments (4)

hrev54600.png (296.0 KB) - added by luroh 7 months ago.
hrev54605_syslog1.png (705.2 KB) - added by luroh 7 months ago.
hrev54605_syslog2.png (596.7 KB) - added by luroh 7 months ago.
hrev54896.png (293.8 KB) - added by luroh 3 months ago.


Change History (26)

by luroh, 7 months ago

Attachment: hrev54600.png added

comment:1 by pulkomandy, 7 months ago

Milestone: Unscheduled → R1/beta3
Priority: normal → high

comment:2 by luroh, 7 months ago

Apologies, R1/beta2 doesn't fully boot either, it hangs on the blue background with an immobile mouse pointer in the center of the screen. No KDL (tried 10 restarts).

This laptop has booted in the past but I am not sure when. Maybe something else changed at some point, it has received a few BIOS updates in the past year or so.

I guess the good news is that I now at least get dropped to KDL 50% of the time as opposed to just being stuck with a blue background.

comment:3 by korli, 7 months ago

Can you type in KDL, for instance "syslog"? There should be a line "using Intel C-states".

comment:4 by luroh, 7 months ago

Attaching syslog pictures of C-States and P-States.

by luroh, 7 months ago

Attachment: hrev54605_syslog1.png added

by luroh, 7 months ago

Attachment: hrev54605_syslog2.png added

comment:5 by korli, 7 months ago

Thanks, I can't find a reason for this to happen. We could possibly add an assert.

comment:6 by luroh, 7 months ago

Sure, no hurry. I can build straight to disk on this machine so turnaround time for testing is short, should you get any ideas.

comment:7 by pulkomandy, 3 months ago

Hi,

Please try hrev54870. The error message should be different and will give us more information about where the problem could be.

comment:8 by X512, 3 months ago

> I can't find a reason for this to happen.

Negative timeStep?

comment:9 by pulkomandy, 3 months ago

timeStep is either set to BASE_TIME_STEP (500) or BASE_TIME_STEP / 4 (125), and these are clearly non-negative.

So, there is probably something resulting in a negative result, but it's not obvious how it could happen. The other involved value is a delta of two successive current_time calls, which could only fail if the time goes back.

Or it could be that the number of CPUs changed and we are accessing the array out of range.

I don't see any reason one of these would happen, so I just logged all the values involved in the computation. We can then move our attention to the one that's not behaving as expected and investigate further.

by luroh, 3 months ago

Attachment: hrev54896.png added

comment:10 by luroh, 3 months ago

hrev54896, gcc2h:

comment:11 by pulkomandy, 3 months ago

So it's the idleTime being negative.

It's computed this way:

bigtime_t start = system_time();
// go in suspend mode and wait until we need to wakeup...
bigtime_t delta = system_time() - start;

idleTime = (idleTime + delta) / 2;

The only thing that I can imagine going wrong here is if system_time() somehow goes back in time?

Its implementation is based on rdtsc multiplied by a conversion factor to get microseconds. We are sure that the two calls to it will run on the same CPU here, so de-synchronized TSCs between two CPU cores shouldn't be a problem.
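As a rough illustration of the "rdtsc multiplied by a conversion factor" scheme, here is a minimal sketch that converts a tick count to microseconds with a 32.32 fixed-point factor. This is not the Haiku code: `make_factor`, `ticks_to_usecs_naive` and `ticks_to_usecs` are hypothetical names, and the fixed-point layout is an assumption made for the example. It shows that a single 64-bit multiply wraps once the tick count gets large, while multiplying the two 32-bit halves separately does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32.32 fixed-point factor: microseconds per tick, scaled by 2^32.
// Assumes the TSC runs faster than 1 MHz so that the factor fits in 32 bits.
static uint32_t make_factor(uint64_t tscHz)
{
	assert(tscHz > 1000000);
	return (uint32_t)((1000000ULL << 32) / tscHz);
}

// Naive version: the 64-bit product wraps once ticks * factor >= 2^64.
static int64_t ticks_to_usecs_naive(uint64_t ticks, uint32_t factor)
{
	return (int64_t)((ticks * factor) >> 32);
}

// Split version: multiply each 32-bit half of the tick count separately,
// so neither intermediate product can exceed 64 bits.
static int64_t ticks_to_usecs(uint64_t ticks, uint32_t factor)
{
	uint64_t high = ticks >> 32;
	uint64_t low = ticks & 0xffffffffULL;
	return (int64_t)(high * factor + ((low * factor) >> 32));
}
```

In this sketch the naive product wraps once the result passes 2^32 microseconds, which is roughly 71 minutes of ticks, whatever the TSC frequency. Whether anything comparable happens in the real 32-bit code path is exactly what would need checking.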

The idle time value converted to hex: 0xc00bb0b000000001. Not sure what to make of that.
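To make the hex value a bit easier to reason about, the sketch below reinterprets it as a signed 64-bit number (bigtime_t is a signed 64-bit microsecond count) and splits it into 32-bit halves. The helper names are made up for this example.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a raw 64-bit pattern as a signed value, the way a
// signed 64-bit time value would be read.
static int64_t as_signed(uint64_t raw)
{
	int64_t value;
	std::memcpy(&value, &raw, sizeof(value));
	return value;
}

// Split the pattern into its two 32-bit halves for inspection.
static uint32_t high_half(uint64_t raw)
{
	return (uint32_t)(raw >> 32);
}

static uint32_t low_half(uint64_t raw)
{
	return (uint32_t)(raw & 0xffffffffULL);
}
```

For 0xc00bb0b000000001 the sign bit is set, so the signed reading is negative; the high half is 0xc00bb0b0 and the low half is exactly 1. That may hint at two unrelated 32-bit quantities rather than a plausible time value, but that is only a guess.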

We could ignore the delta values that we find to be negative, but is that the proper fix, or is there some deeper problem at play here? Could it be a problem with the conversion factor used by system_time? It's computed by matching the rdtsc changes against the PC programmable timer, and the code in the bootloader looks like it can fail silently if it doesn't manage to compute a stable value after 20 tries. It will still give a "best guess", but that could be completely wrong, and in particular it could result in overflow of system_time computations?
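For reference, ignoring negative deltas could look roughly like the sketch below. `update_idle_time` is a hypothetical helper, not an existing kernel function, and the typedef only stands in for the real bigtime_t. It would hide the symptom without answering whether system_time() itself is misbehaving.

```cpp
#include <cstdint>

typedef int64_t bigtime_t;	// stand-in for the real type, in microseconds

// Hypothetical helper: fold a new measurement into the running average,
// skipping samples where the clock appeared to run backwards.
// Returns true if the sample was used.
static bool update_idle_time(bigtime_t& idleTime, bigtime_t start,
	bigtime_t end)
{
	bigtime_t delta = end - start;
	if (delta < 0)
		return false;	// clock went backwards, keep the previous average

	idleTime = (idleTime + delta) / 2;
	return true;
}
```

The return value lets the caller log or count rejected samples, so the underlying problem stays visible instead of being silently dropped.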

comment:12 by luroh, 2 months ago

hrev54937, gcc2h

No KDL, just blue background with mouse pointer stuck in the middle, no desktop.

comment:13 by X512, 2 months ago

> No KDL, just blue background with mouse pointer stuck in the middle, no desktop.

Can you enter KDL by keyboard (Ctrl+Alt+SysRq+D)? If you can, please type teams, press Enter, and take a photo of the screen.

in reply to:  13 comment:14 by luroh, 2 months ago

> Can you enter KDL by keyboard (Ctrl+Alt+SysRq+D)?

If someone could provide a patch to reduce it to Ctrl+Alt+D, perhaps.

According to the manual, Fn+S should emulate SysRq but it doesn't work (horrible Lenovo keyboard).

comment:15 by X512, 2 months ago

> If someone could provide a patch to reduce it to Ctrl+Alt+D, perhaps.

Print screen key should work as SysRq.


in reply to:  15 comment:16 by luroh, 2 months ago

> Print screen key should work as SysRq.

Tried that too, doesn't help.

comment:17 by pulkomandy, 2 months ago

Did you try both the 32-bit and 64-bit versions of Haiku? The system_time implementation is a bit different. If one works but not the other, that would be a likely place to check.

in reply to:  17 comment:18 by luroh, 2 months ago

> Did you try both 32 and 64bit versions of Haiku?

Can't remember but I'll give it a try, good idea.

comment:19 by luroh, 2 months ago

Yes, 64-bit works. Come to think of it, it may very well have been the case that gcc2h never worked on this machine, sorry about that.

comment:20 by pulkomandy, 2 months ago

Platform: All → x86

comment:21 by pulkomandy, 2 months ago

So I suspect something is not working as expected with the code to compute the conversion factor for system time: https://git.haiku-os.org/haiku/tree/src/system/boot/arch/x86/arch_cpu.cpp

Can you check this?

From the bootloader menu, go to debug options -> display current bootloader log.

See if either of these log lines is visible:

"needed %" B_PRIu32 " quick samples for TSC calibration\n"
"needed %" B_PRIu32 " slow samples for TSC calibration\n"

If one of these is 20 or larger, it means we didn't manage to properly find the timer frequency. As a result, everything involving system_time would be broken, including anything that tries to sleep for some number of microseconds.

If that's the case, the behavior could be different between EFI and BIOS booting, since different timers are used in each case.
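To illustrate what "needed 20 samples" would mean, here is a minimal sketch of a calibration loop that gives up after a fixed number of tries and still returns a best guess. It is not the code from arch_cpu.cpp: `calibrate`, `CalibrationResult`, the 1% tolerance and the `measure` callback are all assumptions made for this example; only the limit of 20 comes from the discussion above.

```cpp
#include <cstdint>
#include <functional>

struct CalibrationResult {
	uint64_t ticksPerSecond;	// best guess, unreliable if !stable
	uint32_t samples;			// how many samples were taken
	bool stable;				// false if the sample limit was hit
};

static const uint32_t kMaxSamples = 20;	// limit mentioned in this ticket

// Hypothetical calibration loop: keep sampling until two successive
// measurements agree within 1%, or give up after kMaxSamples.
static CalibrationResult calibrate(const std::function<uint64_t()>& measure)
{
	uint64_t previous = measure();
	uint32_t samples = 1;

	while (samples < kMaxSamples) {
		uint64_t current = measure();
		samples++;

		uint64_t diff = current > previous
			? current - previous : previous - current;
		if (diff <= current / 100)
			return {current, samples, true};

		previous = current;
	}

	// No stable value found: return the last measurement as a best guess.
	return {previous, samples, false};
}
```

With a loop like this, a sample count that reaches the limit is the only visible sign that the returned frequency should not be trusted, which is why the number in the log line matters.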

comment:22 by luroh, 2 months ago

Unfortunately no, no such log entries are visible. gcc2h hrev54950.
