Opened 7 months ago

Closed 40 hours ago

#17595 closed bug (fixed)

Intel 11th gen slow performance

Reported by: lkernan Owned by: tqh
Priority: blocker Milestone: R1/beta4
Component: Drivers/ACPI Version: R1/Development
Keywords: i5-1135G7 i7-1165G7 i7-1185G7 Cc:
Blocked By: Blocking: #17594
Platform: x86-64

Description

Tonight I decided to do an install of the latest nightly on my Intel NUC 11th generation. It's the i3 model with an NVME boot drive that I installed Haiku onto.

From the moment it started the system fan went to full speed and even just moving the mouse or clicking menus was incredibly slow. Menus need multiple clicks before they respond. The task list menu showed the acpi_task item at 100%.

I saved a copy of syslog to a USB and it contains the following lines repeated continuously.

KERN: set MTRRs to:
KERN:   mtrr:  0: base: 0x40000000, size:  0x1000000, type: 0
KERN:   mtrr:  1: base: 0x60000000, size: 0x20000000, type: 0
KERN:   mtrr:  2: base: 0x80000000, size: 0x80000000, type: 0
KERN:   mtrr:  3: base: 0x603d180000, size:    0x20000, type: 0
KERN:   mtrr:  4: base: 0x4000000000, size: 0x10000000, type: 1

Attachments (17)

syslog.txt (437.7 KB ) - added by lkernan 7 months ago.
listdev.txt (3.4 KB ) - added by lkernan 7 months ago.
listusb.txt (349 bytes ) - added by lkernan 7 months ago.
syslog-noacpi.txt (212.2 KB ) - added by lkernan 7 months ago.
syslog-huawei-55969.txt (123.8 KB ) - added by kquote03 6 months ago.
pulse55969.png (40.9 KB ) - added by kquote03 6 months ago.
Pulse reporting bogus clockspeed (hrev55969)
syslog_huawei_nightly.txt (137.5 KB ) - added by kquote03 4 months ago.
latest syslog
screenshot1.png (64.8 KB ) - added by kquote03 4 months ago.
Pulse reporting 20 GHz
syslog_huawei_30522.txt (137.3 KB ) - added by kquote03 4 months ago.
screenshot30522.png (100.5 KB ) - added by kquote03 4 months ago.
syslog30522afterconnectingmouse.txt (137.6 KB ) - added by kquote03 4 months ago.
syslog-31522.txt (137.4 KB ) - added by kquote03 4 months ago.
hrev56253 Intel i7 16gb ram,png.png (410.0 KB ) - added by atomozero 3 months ago.
hrev56253 syslog (264.9 KB ) - added by atomozero 3 months ago.
listdev hrev56253 Intel i7 16gb ram.txt (3.2 KB ) - added by atomozero 3 months ago.
listusb hrev56253 Intel i7 16gb ram.txt (578 bytes ) - added by atomozero 3 months ago.
syslog 14922.txt (142.0 KB ) - added by kquote03 11 days ago.

Change History (66)

by lkernan, 7 months ago

Attachment: syslog.txt added

by lkernan, 7 months ago

Attachment: listdev.txt added

by lkernan, 7 months ago

Attachment: listusb.txt added

comment:1 by diver, 7 months ago

Component: - GeneralDrivers/ACPI
Owner: changed from nobody to tqh
Version: R1/beta3R1/Development

Are you able to boot with ACPI disabled in the bootloader? Possibly a dupe of #14784.

comment:2 by tqh, 7 months ago

Sounds like another one that decided to do locking between fw and hw differently. Which means embedded controller needs new lock handling. Surprised that Intel would do that, so maybe there is a newer standard.

Anyone interested in working on embedded controller in ACPI should take a look. Our embedded controller is similar to FreeBSD's but hasn't been synced in ages. As my test machines work fine, and Lenovo does their own non standard way. It is up for grabs for someone who wants to work on edge cases, and a lot of strange hardware.

by lkernan, 7 months ago

Attachment: syslog-noacpi.txt added

comment:3 by lkernan, 7 months ago

I've added a syslog with ACPI disabled, hopefully that helps. That certainly stops the CPU usage, but the sluggish mouse input is still going on.

comment:4 by kallisti5, 7 months ago

Interestingly, my new Framework laptop has the same issue. In my case the mouse and system moves *really* slow. I move my cursor and it slowly creeps the direction you move it.

Same result with a USB attached mouse and the internal trackpad. I checked the unhandled interrupt counts and it seems normal.

My system is based on the 11th gen i7-1165G7

comment:5 by kallisti5, 7 months ago

Framework i7-1165G7:

  • Same issue seen booting from USB and internal NVMe
  • Booting via UEFI
  • Disabling ACPI in the bootloader doesn't fix the issue
  • Disabling SMAP and SMEP doesn't fix the issue
  • Disabling SMP doesn't fix the issue
  • Disabling IO-APIC doesn't fix the issue
  • Disabling local APIC breaks boot (stalled on card icon)
  • Disabling X2APIC doesn't fix the issue

comment:6 by kallisti5, 7 months ago

Blocking: 17594 added

comment:7 by kallisti5, 7 months ago

Keywords: i5-1135G7 i7-1165G7 i7-1185G7 added

Looks like we're starting to get a flood of these issues. The "sluggish mouse" issue seems to impact all Intel 11th Gen hardware from multiple vendors.

comment:8 by kallisti5, 7 months ago

Milestone: UnscheduledR1/beta4
Platform: Allx86-64
Priority: normalblocker

comment:9 by kallisti5, 7 months ago

Summary: Intel NUC 11th gen slow performanceIntel 11th gen slow performance

comment:10 by kallisti5, 7 months ago

Witnessed in #17594 is a CPU speed of "45.59 Ghz" in pulse. That lends to estimates that this one is system timer related. We saw a similar issue on riscv64 when the system time speed was miscalculated.

I just confirmed on my framework laptop that pulse shows "41.68 Ghz"

comment:11 by kallisti5, 7 months ago

framework: sysinfo shows "Frequency: 61759.596"

comment:12 by kallisti5, 7 months ago

I disabled the intel_pstates and intel_cstates power management add-ons. No change in behaviour (technically it's worse... no mouse movement at all)

Feels like the base system timer calculation is failing. The cstates / pstates stuff might be messing with the system speed making it "better".

comment:13 by kallisti5, 7 months ago

I enabled tracing for the timer components in tree. syslog shows a long loop of:

arch_timer: arch_timer_set_hardware_timer: timeout 80295
arch_timer: arch_timer_set_hardware_timer: timeout 2000
apic: arch_smp_set_apic_timer: config 251, timeout 24571, tics/sec 570200000, tics 1140400
arch_timer: arch_timer_set_hardware_timer: timeout 80295
arch_timer: arch_timer_set_hardware_timer: timeout 2000

comment:14 by kallisti5, 7 months ago

Raising my tracing up to kernel timer.cpp...

pages and pages of...

timer_Interrupt: calling hook 0xffffffff8009db90 for event 0xffffffff801b8888
add_timer: event 0xffffffff82e24668
cancel_timer: event 0xffffffff801b8888
add_timer: event 0xffffffff801b8888
cancel_timer: event 0xffffffff801b8d88
cancel_timer: event 0xffffffff801b8888
cancel_timer: event 0xffffffff801b8608
.
.
timer_interrupt: time 14662330, cpu 5
timer_interrupt: timer 14662435, cpu 1
.
.

comment:15 by pulkomandy, 7 months ago

Are we talking about the CPU frequency measurements in https://git.haiku-os.org/haiku/tree/src/system/boot/arch/x86/arch_cpu.cpp or does this CPU freq value come from something else?

Since these CPUs are possibly the first to go past the 232Hz (4GHz) barrier, it wouldn't be surprising if there's simply an overflow somewhere?

Last edited 7 months ago by pulkomandy (previous) (diff)

comment:16 by kallisti5, 7 months ago

4Ghz is really nothing special.

  • 4,000,000,000 is 4Ghz in Hz.
  • 2,147,483,647 is the maximum 32-bit unsigned int
  • My Ryzen 9 5950X has a base clock of 3.70 Ghz, and boosts to 4.9Ghz and works just fine with our code today.
  • The i7 in question has a base clock of 2.80 GHz, and boosts to 4.7Ghz.

Maybe the gTimeConversionFactor is over 32-bits, but that doesn't make a lot of sense. https://git.haiku-os.org/haiku/tree/src/system/boot/arch/x86/arch_cpu.cpp#n233

I get a feeling Intel introduced some kind of new time conversation system and we need special code to handle it.

Last edited 7 months ago by kallisti5 (previous) (diff)

comment:18 by pulkomandy, 7 months ago

2,147,483,647 is the maximum 32-bit unsigned int

That's signed.

The max value for unsigned is 4294967296 (a bit more than 4GHz). And yes, it could be intel specific.

There are various places where we deal with frequencies. For example the CPUID instruction also returns one: Processor Frequency Information Leaf in CPUID (see https://wukl.net/asm/x86/instr/CPUID), dynamic frequency meansurements, ...

I'm trying to understand which one of these is wrong first.

(I don't have the hardware yet, I ordered a new laptop but it was not delivered today as expected...)

comment:19 by pulkomandy, 7 months ago

Confirmed that Pulse gets its value from get_cpu_info which gets it from the bootloader. So it's indeed that arch_cpu code not working as expected.

comment:20 by pulkomandy, 6 months ago

I have received my new laptop. Fujitsu U7311 with Core i7 1165G7 CPU.

No slowness problem as far as I can see, the machine is working fine. I did not run compilation jobs yet, but played a bit with Mandelbrot to stress the CPU. It's working well.

So, this is not specific to the CPU, there's something else involved.

in reply to:  20 comment:21 by korli, 6 months ago

Replying to pulkomandy:

No slowness problem as far as I can see, the machine is working fine. I did not run

with x86_64 and EFI I suppose?

comment:22 by pulkomandy, 6 months ago

Yes, should I try other setups?

in reply to:  22 comment:23 by korli, 6 months ago

No, any options in the BIOS which could be enabled/disabled?

comment:24 by pulkomandy, 6 months ago

There are various options in the bios, including for switching between performance and powersaving modes. I will make some experiments after I have finished installing the machine and transferring my data from the previous one.

comment:25 by korli, 6 months ago

it could be that this machine throttles when running the efi loader.

comment:26 by kallisti5, 6 months ago

pulkomany: What is pulse reporting for your cpu speed?

comment:27 by pulkomandy, 6 months ago

2.80GHz

comment:28 by kquote03, 6 months ago

@pulkomandy Are you able to enable CSM on your machine? The two laptops I tested were class 3 only

comment:29 by pulkomandy, 6 months ago

@pulkomandy Are you able to enable CSM on your machine? The two laptops I tested were class 3 only

I have switched "Fast Boot" off, it makes no difference.

comment:30 by lkernan, 6 months ago

I've done some more testing on my NUC with the latest nightly image.

Firstly, About System shows it as an "Intel Gen Intel® Core™ i3 1115G4 39.42 Ghz"

It's also one of the ones where Intel hasn't included CSM support in the EFI anymore. It's not possible to turn on BIOS compatibility.

One thing that I have tried was to turn off "High Precision Event Timers" in the UEFI setup. That didn't totally fix the mouse sluggishness, but it made it way closer to normal. The mouse moves better with that off, but I still have to hold the button for about 3 seconds for it to register a click in a menu.

Last edited 6 months ago by lkernan (previous) (diff)

in reply to:  30 comment:31 by korli, 6 months ago

Replying to lkernan:

I've done some more testing on my NUC with the latest nightly image.

Please check with hrev55969 once it's available (USB anyboot is preferred to get the updated efi loader).

comment:32 by kquote03, 6 months ago

korli: It seems the same behaviour is there on hrev55969 (had to manually compile)

by kquote03, 6 months ago

Attachment: syslog-huawei-55969.txt added

by kquote03, 6 months ago

Attachment: pulse55969.png added

Pulse reporting bogus clockspeed (hrev55969)

comment:33 by kallisti5, 5 months ago

This issue still exists as of hrev56087 on my Framework laptop. track pad is having timing issues, and cpu is reported as 41.16 Ghz

Version 0, edited 5 months ago by kallisti5 (next)

in reply to:  34 comment:35 by kquote03, 4 months ago

Replying to korli:

Please check with https://haiku.movingborders.es/testbuild/Ifae8f2cea5aadc46b7591c4debc2dd247a787fb1/1/hrev56134/x86_64/ whether it helps.

Sadly the same behavior persists as in the first post (no change). Thank you nevertheless

by kquote03, 4 months ago

Attachment: syslog_huawei_nightly.txt added

latest syslog

by kquote03, 4 months ago

Attachment: screenshot1.png added

Pulse reporting 20 GHz

comment:36 by korli, 4 months ago

Thanks. Please try again with this image: https://haiku.movingborders.es/testbuild/Ifae8f2cea5aadc46b7591c4debc2dd247a787fb1/2/hrev56145/x86_64/ The EFI bootloader smp logs are enabled. Hopefully they'll end up in the syslog.

comment:37 by waddlesplash, 4 months ago

(Note that you will need to test with the full anyboot image in order to get the new logs from the bootloader.)

in reply to:  36 comment:38 by kquote03, 4 months ago

Replying to korli:

Thanks. Please try again with this image: https://haiku.movingborders.es/testbuild/Ifae8f2cea5aadc46b7591c4debc2dd247a787fb1/2/hrev56145/x86_64/ The EFI bootloader smp logs are enabled. Hopefully they'll end up in the syslog.

The same behavior as previous builds, I will upload the syslogs shortly

by kquote03, 4 months ago

Attachment: syslog_huawei_30522.txt added

by kquote03, 4 months ago

Attachment: screenshot30522.png added

by kquote03, 4 months ago

comment:39 by korli, 4 months ago

Thanks. I enabled the wrong logs. Please try again with this image: https://haiku.movingborders.es/testbuild/Ifae8f2cea5aadc46b7591c4debc2dd247a787fb1/3/hrev56145/x86_64/

in reply to:  39 comment:40 by kquote03, 4 months ago

Replying to korli:

Thanks. I enabled the wrong logs. Please try again with this image: https://haiku.movingborders.es/testbuild/Ifae8f2cea5aadc46b7591c4debc2dd247a787fb1/3/hrev56145/x86_64/

No problem, new syslog should be attached shortly.

by kquote03, 4 months ago

Attachment: syslog-31522.txt added

by atomozero, 3 months ago

by atomozero, 3 months ago

Attachment: hrev56253 syslog added

comment:42 by kquote03, 11 days ago

Hello, sorry for the late reply, it seems the builds have been erased, however I compiled the latest git with the patch 0001-kernel-libroot-apply-a-shift-on-rdtsc-on-higher-TSC-.patch

Syslog should be attached soon

by kquote03, 11 days ago

Attachment: syslog 14922.txt added

comment:43 by waddlesplash, 11 days ago

KERN: calculating apic timer conversion factor
KERN: APIC ticks/sec = 463028571

comment:44 by pulkomandy, 7 days ago

It seems for modern CPUs, Linux does not do this calculation anymore and they instead get the frequency from CPUID, see the various messages in this thread: https://lore.kernel.org/lkml/tip-2420a0b1798d7a78d1f9b395f09f3c80d92cc588@git.kernel.org/

The detected frequency of 463MHz seems strange, shouldn't this be the same as the CPU clock? Wouldn't it be useful to log the t1, t2 and count values in this function?

comment:45 by pulkomandy, 7 days ago

According to Intel docs the APIC timer should not be that fast, it should be around 20-25MHz and indeed they document that the frequency can be read from CPUID instructions if available there.

Source: section 10.5.4 of Intel architecture software developer manual, volume 3A (part 1).

So, should we try to get the frequency from there instead of measuring it?

comment:47 by kallisti5, 42 hours ago

Resolution: fixed
Status: newclosed

Tested fixed via hrev56471!! Nice work Jerome! I'm going to close this one out. Please report here and reopen if the issue continues after hrev56471 for you.

comment:48 by diver, 40 hours ago

Resolution: fixed
Status: closedreopened

comment:49 by diver, 40 hours ago

Resolution: fixed
Status: reopenedclosed
Note: See TracTickets for help on using tickets.