Opened 3 years ago

Closed 2 years ago

#17595 closed bug (fixed)

Intel 11th gen slow performance

Reported by: lkernan Owned by: tqh
Priority: blocker Milestone: R1/beta4
Component: Drivers/ACPI Version: R1/Development
Keywords: i5-1135G7 i7-1165G7 i7-1185G7 Cc:
Blocked By: Blocking: #17594
Platform: x86-64

Description

Tonight I decided to do an install of the latest nightly on my Intel NUC 11th generation. It's the i3 model with an NVME boot drive that I installed Haiku onto.

From the moment it started the system fan went to full speed and even just moving the mouse or clicking menus was incredibly slow. Menus need multiple clicks before they respond. The task list menu showed the acpi_task item at 100%.

I saved a copy of syslog to a USB and it contains the following lines repeated continuously.

KERN: set MTRRs to:
KERN:   mtrr:  0: base: 0x40000000, size:  0x1000000, type: 0
KERN:   mtrr:  1: base: 0x60000000, size: 0x20000000, type: 0
KERN:   mtrr:  2: base: 0x80000000, size: 0x80000000, type: 0
KERN:   mtrr:  3: base: 0x603d180000, size:    0x20000, type: 0
KERN:   mtrr:  4: base: 0x4000000000, size: 0x10000000, type: 1

Attachments (17)

syslog.txt (437.7 KB ) - added by lkernan 3 years ago.
listdev.txt (3.4 KB ) - added by lkernan 3 years ago.
listusb.txt (349 bytes ) - added by lkernan 3 years ago.
syslog-noacpi.txt (212.2 KB ) - added by lkernan 3 years ago.
syslog-huawei-55969.txt (123.8 KB ) - added by kquote03 3 years ago.
pulse55969.png (40.9 KB ) - added by kquote03 3 years ago.
Pulse reporting bogus clockspeed (hrev55969)
syslog_huawei_nightly.txt (137.5 KB ) - added by kquote03 3 years ago.
latest syslog
screenshot1.png (64.8 KB ) - added by kquote03 3 years ago.
Pulse reporting 20 GHz
syslog_huawei_30522.txt (137.3 KB ) - added by kquote03 3 years ago.
screenshot30522.png (100.5 KB ) - added by kquote03 3 years ago.
syslog30522afterconnectingmouse.txt (137.6 KB ) - added by kquote03 3 years ago.
syslog-31522.txt (137.4 KB ) - added by kquote03 3 years ago.
hrev56253 Intel i7 16gb ram,png.png (410.0 KB ) - added by atomozero 3 years ago.
hrev56253 syslog (264.9 KB ) - added by atomozero 3 years ago.
listdev hrev56253 Intel i7 16gb ram.txt (3.2 KB ) - added by atomozero 3 years ago.
listusb hrev56253 Intel i7 16gb ram.txt (578 bytes ) - added by atomozero 3 years ago.
syslog 14922.txt (142.0 KB ) - added by kquote03 2 years ago.

Change History (66)

by lkernan, 3 years ago

Attachment: syslog.txt added

by lkernan, 3 years ago

Attachment: listdev.txt added

by lkernan, 3 years ago

Attachment: listusb.txt added

comment:1 by diver, 3 years ago

Component: - GeneralDrivers/ACPI
Owner: changed from nobody to tqh
Version: R1/beta3R1/Development

Are you able to boot with ACPI disabled in the bootloader? Possibly a dupe of #14784.

comment:2 by tqh, 3 years ago

Sounds like another one that decided to do locking between fw and hw differently. Which means embedded controller needs new lock handling. Surprised that Intel would do that, so maybe there is a newer standard.

Anyone interested in working on embedded controller in ACPI should take a look. Our embedded controller is similar to FreeBSD's but hasn't been synced in ages. As my test machines work fine, and Lenovo does their own non standard way. It is up for grabs for someone who wants to work on edge cases, and a lot of strange hardware.

by lkernan, 3 years ago

Attachment: syslog-noacpi.txt added

comment:3 by lkernan, 3 years ago

I've added a syslog with ACPI disabled, hopefully that helps. That certainly stops the CPU usage, but the sluggish mouse input is still going on.

comment:4 by kallisti5, 3 years ago

Interestingly, my new Framework laptop has the same issue. In my case the mouse and system moves *really* slow. I move my cursor and it slowly creeps the direction you move it.

Same result with a USB attached mouse and the internal trackpad. I checked the unhandled interrupt counts and it seems normal.

My system is based on the 11th gen i7-1165G7

comment:5 by kallisti5, 3 years ago

Framework i7-1165G7:

  • Same issue seen booting from USB and internal NVMe
  • Booting via UEFI
  • Disabling ACPI in the bootloader doesn't fix the issue
  • Disabling SMAP and SMEP doesn't fix the issue
  • Disabling SMP doesn't fix the issue
  • Disabling IO-APIC doesn't fix the issue
  • Disabling local APIC breaks boot (stalled on card icon)
  • Disabling X2APIC doesn't fix the issue

comment:6 by kallisti5, 3 years ago

Blocking: 17594 added

comment:7 by kallisti5, 3 years ago

Keywords: i5-1135G7 i7-1165G7 i7-1185G7 added

Looks like we're starting to get a flood of these issues. The "sluggish mouse" issue seems to impact all Intel 11th Gen hardware from multiple vendors.

comment:8 by kallisti5, 3 years ago

Milestone: UnscheduledR1/beta4
Platform: Allx86-64
Priority: normalblocker

comment:9 by kallisti5, 3 years ago

Summary: Intel NUC 11th gen slow performanceIntel 11th gen slow performance

comment:10 by kallisti5, 3 years ago

Witnessed in #17594 is a CPU speed of "45.59 Ghz" in pulse. That lends to estimates that this one is system timer related. We saw a similar issue on riscv64 when the system time speed was miscalculated.

I just confirmed on my framework laptop that pulse shows "41.68 Ghz"

comment:11 by kallisti5, 3 years ago

framework: sysinfo shows "Frequency: 61759.596"

comment:12 by kallisti5, 3 years ago

I disabled the intel_pstates and intel_cstates power management add-ons. No change in behaviour (technically it's worse... no mouse movement at all)

Feels like the base system timer calculation is failing. The cstates / pstates stuff might be messing with the system speed making it "better".

comment:13 by kallisti5, 3 years ago

I enabled tracing for the timer components in tree. syslog shows a long loop of:

arch_timer: arch_timer_set_hardware_timer: timeout 80295
arch_timer: arch_timer_set_hardware_timer: timeout 2000
apic: arch_smp_set_apic_timer: config 251, timeout 24571, tics/sec 570200000, tics 1140400
arch_timer: arch_timer_set_hardware_timer: timeout 80295
arch_timer: arch_timer_set_hardware_timer: timeout 2000

comment:14 by kallisti5, 3 years ago

Raising my tracing up to kernel timer.cpp...

pages and pages of...

timer_Interrupt: calling hook 0xffffffff8009db90 for event 0xffffffff801b8888
add_timer: event 0xffffffff82e24668
cancel_timer: event 0xffffffff801b8888
add_timer: event 0xffffffff801b8888
cancel_timer: event 0xffffffff801b8d88
cancel_timer: event 0xffffffff801b8888
cancel_timer: event 0xffffffff801b8608
.
.
timer_interrupt: time 14662330, cpu 5
timer_interrupt: timer 14662435, cpu 1
.
.

comment:15 by pulkomandy, 3 years ago

Are we talking about the CPU frequency measurements in https://git.haiku-os.org/haiku/tree/src/system/boot/arch/x86/arch_cpu.cpp or does this CPU freq value come from something else?

Since these CPUs are possibly the first to go past the 232Hz (4GHz) barrier, it wouldn't be surprising if there's simply an overflow somewhere?

Last edited 3 years ago by pulkomandy (previous) (diff)

comment:16 by kallisti5, 3 years ago

4Ghz is really nothing special.

  • 4,000,000,000 is 4Ghz in Hz.
  • 2,147,483,647 is the maximum 32-bit unsigned int
  • My Ryzen 9 5950X has a base clock of 3.70 Ghz, and boosts to 4.9Ghz and works just fine with our code today.
  • The i7 in question has a base clock of 2.80 GHz, and boosts to 4.7Ghz.

Maybe the gTimeConversionFactor is over 32-bits, but that doesn't make a lot of sense. https://git.haiku-os.org/haiku/tree/src/system/boot/arch/x86/arch_cpu.cpp#n233

I get a feeling Intel introduced some kind of new time conversation system and we need special code to handle it.

Last edited 3 years ago by kallisti5 (previous) (diff)

comment:18 by pulkomandy, 3 years ago

2,147,483,647 is the maximum 32-bit unsigned int

That's signed.

The max value for unsigned is 4294967296 (a bit more than 4GHz). And yes, it could be intel specific.

There are various places where we deal with frequencies. For example the CPUID instruction also returns one: Processor Frequency Information Leaf in CPUID (see https://wukl.net/asm/x86/instr/CPUID), dynamic frequency meansurements, ...

I'm trying to understand which one of these is wrong first.

(I don't have the hardware yet, I ordered a new laptop but it was not delivered today as expected...)

comment:19 by pulkomandy, 3 years ago

Confirmed that Pulse gets its value from get_cpu_info which gets it from the bootloader. So it's indeed that arch_cpu code not working as expected.

comment:20 by pulkomandy, 3 years ago

I have received my new laptop. Fujitsu U7311 with Core i7 1165G7 CPU.

No slowness problem as far as I can see, the machine is working fine. I did not run compilation jobs yet, but played a bit with Mandelbrot to stress the CPU. It's working well.

So, this is not specific to the CPU, there's something else involved.

in reply to:  20 comment:21 by korli, 3 years ago

Replying to pulkomandy:

No slowness problem as far as I can see, the machine is working fine. I did not run

with x86_64 and EFI I suppose?

comment:22 by pulkomandy, 3 years ago

Yes, should I try other setups?

in reply to:  22 comment:23 by korli, 3 years ago

No, any options in the BIOS which could be enabled/disabled?

comment:24 by pulkomandy, 3 years ago

There are various options in the bios, including for switching between performance and powersaving modes. I will make some experiments after I have finished installing the machine and transferring my data from the previous one.

comment:25 by korli, 3 years ago

it could be that this machine throttles when running the efi loader.

comment:26 by kallisti5, 3 years ago

pulkomany: What is pulse reporting for your cpu speed?

comment:27 by pulkomandy, 3 years ago

2.80GHz

comment:28 by kquote03, 3 years ago

@pulkomandy Are you able to enable CSM on your machine? The two laptops I tested were class 3 only

comment:29 by pulkomandy, 3 years ago

@pulkomandy Are you able to enable CSM on your machine? The two laptops I tested were class 3 only

I have switched "Fast Boot" off, it makes no difference.

comment:30 by lkernan, 3 years ago

I've done some more testing on my NUC with the latest nightly image.

Firstly, About System shows it as an "Intel Gen Intel® Core™ i3 1115G4 39.42 Ghz"

It's also one of the ones where Intel hasn't included CSM support in the EFI anymore. It's not possible to turn on BIOS compatibility.

One thing that I have tried was to turn off "High Precision Event Timers" in the UEFI setup. That didn't totally fix the mouse sluggishness, but it made it way closer to normal. The mouse moves better with that off, but I still have to hold the button for about 3 seconds for it to register a click in a menu.

Last edited 3 years ago by lkernan (previous) (diff)

in reply to:  30 comment:31 by korli, 3 years ago

Replying to lkernan:

I've done some more testing on my NUC with the latest nightly image.

Please check with hrev55969 once it's available (USB anyboot is preferred to get the updated efi loader).

comment:32 by kquote03, 3 years ago

korli: It seems the same behaviour is there on hrev55969 (had to manually compile)

by kquote03, 3 years ago

Attachment: syslog-huawei-55969.txt added

by kquote03, 3 years ago

Attachment: pulse55969.png added

Pulse reporting bogus clockspeed (hrev55969)

comment:33 by kallisti5, 3 years ago

This issue still exists as of hrev56087 on my Framework laptop. track pad is having timing / movement issues, and cpu is reported as 41.16 Ghz

Last edited 3 years ago by kallisti5 (previous) (diff)

in reply to:  34 comment:35 by kquote03, 3 years ago

Replying to korli:

Please check with https://haiku.movingborders.es/testbuild/Ifae8f2cea5aadc46b7591c4debc2dd247a787fb1/1/hrev56134/x86_64/ whether it helps.

Sadly the same behavior persists as in the first post (no change). Thank you nevertheless

by kquote03, 3 years ago

Attachment: syslog_huawei_nightly.txt added

latest syslog

by kquote03, 3 years ago

Attachment: screenshot1.png added

Pulse reporting 20 GHz

comment:36 by korli, 3 years ago

Thanks. Please try again with this image: https://haiku.movingborders.es/testbuild/Ifae8f2cea5aadc46b7591c4debc2dd247a787fb1/2/hrev56145/x86_64/ The EFI bootloader smp logs are enabled. Hopefully they'll end up in the syslog.

comment:37 by waddlesplash, 3 years ago

(Note that you will need to test with the full anyboot image in order to get the new logs from the bootloader.)

in reply to:  36 comment:38 by kquote03, 3 years ago

Replying to korli:

Thanks. Please try again with this image: https://haiku.movingborders.es/testbuild/Ifae8f2cea5aadc46b7591c4debc2dd247a787fb1/2/hrev56145/x86_64/ The EFI bootloader smp logs are enabled. Hopefully they'll end up in the syslog.

The same behavior as previous builds, I will upload the syslogs shortly

by kquote03, 3 years ago

Attachment: syslog_huawei_30522.txt added

by kquote03, 3 years ago

Attachment: screenshot30522.png added

by kquote03, 3 years ago

comment:39 by korli, 3 years ago

Thanks. I enabled the wrong logs. Please try again with this image: https://haiku.movingborders.es/testbuild/Ifae8f2cea5aadc46b7591c4debc2dd247a787fb1/3/hrev56145/x86_64/

in reply to:  39 comment:40 by kquote03, 3 years ago

Replying to korli:

Thanks. I enabled the wrong logs. Please try again with this image: https://haiku.movingborders.es/testbuild/Ifae8f2cea5aadc46b7591c4debc2dd247a787fb1/3/hrev56145/x86_64/

No problem, new syslog should be attached shortly.

by kquote03, 3 years ago

Attachment: syslog-31522.txt added

by atomozero, 3 years ago

by atomozero, 3 years ago

Attachment: hrev56253 syslog added

comment:42 by kquote03, 2 years ago

Hello, sorry for the late reply, it seems the builds have been erased, however I compiled the latest git with the patch 0001-kernel-libroot-apply-a-shift-on-rdtsc-on-higher-TSC-.patch

Syslog should be attached soon

by kquote03, 2 years ago

Attachment: syslog 14922.txt added

comment:43 by waddlesplash, 2 years ago

KERN: calculating apic timer conversion factor
KERN: APIC ticks/sec = 463028571

comment:44 by pulkomandy, 2 years ago

It seems for modern CPUs, Linux does not do this calculation anymore and they instead get the frequency from CPUID, see the various messages in this thread: https://lore.kernel.org/lkml/tip-2420a0b1798d7a78d1f9b395f09f3c80d92cc588@git.kernel.org/

The detected frequency of 463MHz seems strange, shouldn't this be the same as the CPU clock? Wouldn't it be useful to log the t1, t2 and count values in this function?

comment:45 by pulkomandy, 2 years ago

According to Intel docs the APIC timer should not be that fast, it should be around 20-25MHz and indeed they document that the frequency can be read from CPUID instructions if available there.

Source: section 10.5.4 of Intel architecture software developer manual, volume 3A (part 1).

So, should we try to get the frequency from there instead of measuring it?

comment:47 by kallisti5, 2 years ago

Resolution: fixed
Status: newclosed

Tested fixed via hrev56471!! Nice work Jerome! I'm going to close this one out. Please report here and reopen if the issue continues after hrev56471 for you.

comment:48 by diver, 2 years ago

Resolution: fixed
Status: closedreopened

comment:49 by diver, 2 years ago

Resolution: fixed
Status: reopenedclosed
Note: See TracTickets for help on using tickets.