Opened 4 years ago

Last modified 2 years ago

#16546 assigned bug

KDL, laptop no longer boots

Reported by: luroh
Owned by: korli
Priority: normal
Milestone: Unscheduled
Component: System/Kernel
Version: R1/Development
Keywords: boot-failure
Cc:
Blocked By:
Blocking:
Platform: x86

Description

hrev54600, gcc2h

Laptop with i7-7500U no longer boots; this regressed somewhere between R1/beta2 and now. Sometimes it drops me into KDL, sometimes it just shows the blue background with the mouse pointer stuck in the center.

Attachments (4)

hrev54600.png (296.0 KB) - added by luroh 4 years ago.
hrev54605_syslog1.png (705.2 KB) - added by luroh 4 years ago.
hrev54605_syslog2.png (596.7 KB) - added by luroh 4 years ago.
hrev54896.png (293.8 KB) - added by luroh 4 years ago.

Change History (30)

by luroh, 4 years ago

Attachment: hrev54600.png added

comment:1 by pulkomandy, 4 years ago

Milestone: Unscheduled → R1/beta3
Priority: normal → high

comment:2 by luroh, 4 years ago

Apologies, R1/beta2 doesn't fully boot either: it hangs on the blue background with an immobile mouse pointer in the center of the screen. No KDL (tried 10 restarts).

This laptop has booted in the past but I am not sure when. Maybe something else changed at some point; it has received a few BIOS updates in the past year or so.

I guess the good news is that I now at least get dropped to KDL 50% of the time as opposed to just being stuck with a blue background.

comment:3 by korli, 4 years ago

Can you type commands in KDL, for instance "syslog"? The output should contain a line "using Intel C-states".

comment:4 by luroh, 4 years ago

Attaching syslog pictures of C-States and P-States.

by luroh, 4 years ago

Attachment: hrev54605_syslog1.png added

by luroh, 4 years ago

Attachment: hrev54605_syslog2.png added

comment:5 by korli, 4 years ago

Thanks. I can't find a reason for this to happen. We could possibly add an assert.

comment:6 by luroh, 4 years ago

Sure, no hurry. I can build straight to disk on this machine so turnaround time for testing is short, should you get any ideas.

comment:7 by pulkomandy, 4 years ago

Hi,

Please try hrev54870. The error message should be different and will give us more information about where the problem could be.

comment:8 by X512, 4 years ago

> I can't find a reason for this to happen.

Negative timeStep?

comment:9 by pulkomandy, 4 years ago

timeStep is either set to BASE_TIME_STEP (500) or BASE_TIME_STEP / 4 (125), and these are clearly non-negative.

So something is probably producing a negative result, but it's not obvious how that could happen. The other value involved is the delta between two successive system_time() calls, which could only be negative if time went backwards.

Or it could be that the number of CPUs changed and we are accessing the array out of range.

I don't see any reason one of these would happen, so I just logged all the values involved in the computation. We can then move our attention to the one that's not behaving as expected and investigate further.

by luroh, 4 years ago

Attachment: hrev54896.png added

comment:10 by luroh, 4 years ago

hrev54896, gcc2h: see attached hrev54896.png.

comment:11 by pulkomandy, 4 years ago

So it's the idleTime being negative.

It's computed this way:

bigtime_t start = system_time();
// go in suspend mode and wait until we need to wakeup...
bigtime_t delta = system_time() - start;

idleTime = (idleTime + delta) / 2;

The only thing that I can imagine going wrong here is if system_time() somehow goes back in time?

Its implementation is based on rdtsc multiplied by a conversion factor to get microseconds. We are sure that the two calls will run on the same CPU here, so de-synchronized TSCs between two CPU cores shouldn't be the problem.
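To make the "rdtsc multiplied by a conversion factor" step concrete, here is a minimal sketch of such a conversion. It is not the actual Haiku code: the function names are hypothetical, and the 2^32 fixed-point scaling of the factor is an assumption, since this ticket does not state the exact format.

```cpp
#include <cstdint>

// Assumption: the factor is "microseconds per TSC tick" scaled by 2^32 so
// that it fits in 32 bits. Only meaningful for TSC frequencies above 1 MHz.
uint32_t
conversion_factor(uint64_t tscHz)
{
	return static_cast<uint32_t>((1000000ULL << 32) / tscHz);
}

// Hypothetical helper: computes (tsc * factor) >> 32 without a 128-bit
// intermediate by handling the upper and lower halves of the counter
// separately.
int64_t
ticks_to_usecs(uint64_t tsc, uint32_t factor)
{
	uint64_t high = (tsc >> 32) * factor;
	uint64_t low = (tsc & 0xffffffffULL) * factor;
	return static_cast<int64_t>(high + (low >> 32));
}
```

With a sane factor the result grows steadily with the counter. A factor that is off by some ratio scales every result by that same ratio, so a bad calibration would distort every interval derived from it.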

The idle time value converted to hex: 0xc00bb0b000000001. Not sure what to make of that.
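For reference, a small sketch of how that bit pattern reads as a signed value. `as_signed` is a hypothetical helper, and it assumes bigtime_t is a signed 64-bit integer.

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets a raw 64-bit pattern as a signed value, the way a
// bigtime_t (assumed here to be int64_t) would be read.
int64_t
as_signed(uint64_t raw)
{
	int64_t value;
	std::memcpy(&value, &raw, sizeof(value));
	return value;
}
```

Calling `as_signed(0xc00bb0b000000001ULL)` gives a negative number because bit 63 is set. The low 32 bits of the pattern are exactly 1 and the high 32 bits are 0xc00bb0b0.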

We could ignore the delta values that turn out to be negative, but is that the proper fix, or is there some deeper problem at play here? Could it be a problem with the conversion factor used by system_time()? It's computed by comparing the rdtsc changes against the PC programmable timer, and the code in the bootloader looks like it can fail silently if it doesn't manage to compute a stable value after 20 tries. It will still give a "best guess", but that could be completely wrong, and in particular it could result in overflow in the system_time() computations.
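The "ignore negative deltas" option could look roughly like the sketch below. It is only an illustration of the idea, not the actual kernel code: `update_idle_time` and its parameters are hypothetical names, and bigtime_t is assumed to be a signed 64-bit microsecond count.

```cpp
#include <cstdint>

// Assumption: bigtime_t is a signed 64-bit count of microseconds.
typedef int64_t bigtime_t;

// Hypothetical helper: folds a new idle period into the running average,
// skipping samples where the clock appears to have gone backwards.
bigtime_t
update_idle_time(bigtime_t idleTime, bigtime_t start, bigtime_t end)
{
	bigtime_t delta = end - start;
	if (delta < 0) {
		// The clock went backwards: keep the previous average unchanged.
		return idleTime;
	}
	return (idleTime + delta) / 2;
}
```

This keeps idleTime from going negative, but it only hides the symptom. If system_time() really does jump backwards, the underlying cause would still need to be found.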

comment:12 by luroh, 4 years ago

hrev54937, gcc2h

No KDL, just blue background with mouse pointer stuck in the middle, no desktop.

comment:13 by X512, 4 years ago

> No KDL, just blue background with mouse pointer stuck in the middle, no desktop.

Can you enter KDL by keyboard (Ctrl+Alt+SysRq+D)? If you can, please type "teams", press Enter, and take a photo of the screen.

in reply to:  13 comment:14 by luroh, 4 years ago

> Can you enter KDL by keyboard (Ctrl+Alt+SysRq+D)?

If someone could provide a patch to reduce it to Ctrl+Alt+D, perhaps.

According to the manual, Fn+S should emulate SysRq but it doesn't work (horrible Lenovo keyboard).

comment:15 by X512, 4 years ago

> If someone could provide a patch to reduce it to Ctrl+Alt+D, perhaps.

Print screen key should work as SysRq.

in reply to:  15 comment:16 by luroh, 4 years ago

> Print screen key should work as SysRq.

Tried that too, doesn't help.

comment:17 by pulkomandy, 4 years ago

Did you try both the 32-bit and 64-bit versions of Haiku? The system_time implementation is a bit different between them; if one works but not the other, that would be a likely place to check.

in reply to:  17 comment:18 by luroh, 4 years ago

> Did you try both 32 and 64bit versions of Haiku?

Can't remember but I'll give it a try, good idea.

comment:19 by luroh, 4 years ago

Yes, 64-bit works. Come to think of it, it may very well have been the case that gcc2h never worked on this machine, sorry about that.

comment:20 by pulkomandy, 4 years ago

Platform: All → x86

comment:21 by pulkomandy, 4 years ago

So I suspect something is not working as expected with the code to compute the conversion factor for system time: https://git.haiku-os.org/haiku/tree/src/system/boot/arch/x86/arch_cpu.cpp

Can you check this?

From the bootloader menu, go in debug options -> display current bootloader log.

Check whether either of these log lines is visible:

"needed %" B_PRIu32 " quick samples for TSC calibration\n"
"needed %" B_PRIu32 " slow samples for TSC calibration\n"

If the number in one of these lines is 20 or larger, it means we didn't manage to properly find the timer frequency. As a result, everything involving system_time would be broken, including anything that tries to sleep for some number of microseconds.

If that's the case, the behavior could be different between EFI and BIOS booting, since different timers are used in each case.
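To illustrate why a sample count of 20 signals a failed calibration, here is a minimal sketch of a bounded retry loop that reports failure explicitly instead of silently returning a best guess. It is not the bootloader's actual algorithm: the struct, the function, the sampling callback, and the 1% stability tolerance are all hypothetical. Only the limit of 20 tries comes from the discussion above.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical result type: carries the estimate plus whether it is trusted.
struct CalibrationResult {
	uint64_t ticksPerSecond;
	uint32_t samples;
	bool converged;
};

// Limit mentioned in this ticket.
const uint32_t kMaxSamples = 20;

// Hypothetical calibration loop: keeps sampling until two consecutive
// estimates agree within 1% (an assumed tolerance), or gives up after
// kMaxSamples samples and flags the result as not converged.
CalibrationResult
calibrate(const std::function<uint64_t()>& sample)
{
	uint64_t previous = sample();
	for (uint32_t taken = 1; taken < kMaxSamples; taken++) {
		uint64_t current = sample();
		uint64_t diff = current > previous
			? current - previous : previous - current;
		if (diff <= previous / 100)
			return {current, taken + 1, true};
		previous = current;
	}
	// Best guess only: callers can check 'converged' before trusting it.
	return {previous, kMaxSamples, false};
}
```

A caller could log the sample count, as the messages above do, and additionally refuse to trust or re-derive the factor when `converged` is false.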

comment:22 by luroh, 4 years ago

Unfortunately no, no such log entries are visible. gcc2h hrev54950.

comment:23 by nielx, 3 years ago

Milestone: R1/beta3 → R1/beta4

Ticket retargeted after milestone closed

comment:24 by waddlesplash, 3 years ago

Keywords: boot-failure added

comment:25 by korli, 3 years ago

Hi luroh, could you check on a current nightly? Thanks.

comment:26 by waddlesplash, 2 years ago

Milestone: R1/beta4 → Unscheduled
Priority: high → normal

No reply, bumping out of the milestone.
