Opened 10 years ago

Closed 9 years ago

#3772 closed bug (fixed)

Freeze on high memory load when not limiting available memory (reproduceable)

Reported by: michael.weirauch
Owned by: axeld
Priority: normal
Milestone: R1
Component: System
Version: R1/Development
Keywords:
Cc: marcusoverhagen, imker@…
Blocked By:
Blocking:
Has a Patch: no
Platform: x86

Description

The system freezes reproducibly when compiling the Haiku tree. No KDL appears, and F12 does not enter KDL either.

Environment: Haiku gcc2 and gcc4 any rev up until now (current hrev30177)
System: ThinkPad T500 (NK13AGE), C2D T9600@2.8GHz, 4GB DDR3, 320GB SATA (WD Scorpio Black WD3200BEKT 16MB Cache), VESA

listdev: http://dev.haiku-os.org/attachment/ticket/3632/listdev-29784.txt
KDL-ints: http://dev.haiku-os.org/attachment/ticket/3632/kdl-ints-r29784-small.jpg (hda on int11 since some revs after that one)

Evaluated all of the below in different combinations/scenarios with hrev30177; the freeze remained reproducible:

  • gcc2 & gcc4
  • new ata bus_manager
  • KDEBUG_LEVEL_2 on every item in kernel_debug_config.h (small fix required for gcc4 compile in src/system/kernel/vm/vm_page.cpp)
  • INTA-INTH configured in BIOS for distributing the devices on different IRQs instead of all on IRQ 11 as seen in kdl-ints shot from above (figure further below)
  • INTA only on IRQ11 (INTB-INTH disabled) (ahci + uhci)
  • INTA only on IRQ11 (INTB-INTH disabled) (ahci only) removed busses/usb/[e,o,u]hci
  • same as above with all safe mode options enabled
  • BIOS-settings tried: SMP disabled, disabled EM64T, Intel VT, IDE DMA

INT-lines/IRQs when configured in BIOS:

Line  IRQ  Devices
INTA    3  ahci/uhci
INTB    4  hda/firewire/uhci
INTC    5  uhci
INTD    6  ehci
INTE    7  uhci/ipro1000
INTF    9  -
INTG   10  uhci
INTH   11  ehci

Running "time sync" intensively during compilation gives real times averaging around 200ms. One extreme was 5047ms, which blocked the AboutWindow's uptime display and the console output of the compile, IIRC. (At least I remember the uptime display stuttering occasionally before freeze-on-compile during the test scenarios.)

A perhaps related ticket: #3632

Getting more and more clueless on what things I might evaluate on my own. Thanks in advance for any pointers on how to (help) track this issue!

Michael

Attachments (1)

haiku-r34198-trace-mtrr-syslog.txt (186.4 KB) - added by michael.weirauch 9 years ago.
ThinkPad T500 hrev34198 TRACE_MTRR

Change History (46)

comment:1 Changed 10 years ago by axeld

Maybe your system gets warm during compilation, and the BIOS tries to start the fan, or even lower the chip frequency. This might actually have the same cause as #3632 I would think.

Hopefully enabling ACPI will already do the trick. However, we currently don't do much CPU thermal/frequency management ourselves, so please check that your system is not running hot for some reason.

comment:2 Changed 10 years ago by michael.weirauch

Thermal/EIST: I would say that there are no thermal issues with this ThinkPad. It probably has the best cooling system I've heard/seen/felt so far. The system fan never gets noisy and thermals are (subjective impression) medium during longer compilation runs on GNU/Linux or gaming on Windows. Nothing near hot or anything near the thermals of my old Samsung X20, which allowed for cooking a pile of eggs on it.

In BIOS everything is set to "Max Performance". There seems to be no frequency management done by the BIOS itself. No signs of that during runs of Haiku, too. (No slower clock ticks) At least I've never seen something like that except the system actively (as in an EIST driver/mechanism) requesting frequency changes.

hrev30266; new-ata-bm; acpi enabled: No go. System froze after ~ 4m. Will do some more testing when I get back home today. Also will recheck the BIOS settings.

comment:3 Changed 10 years ago by michael.weirauch

hrev30284; new-ata-bm; acpi

PMCPU: CPU Power Management (throttling on inactivity)
PMPCI: PCI BUS Power Management

jam -qaj2 sessions:

  • BIOS: EIST, PMCPU, PMPCI
    • freeze 12m14s
  • BIOS: EIST, PMCPU, PMPCI
    • freeze 5m30s
  • BIOS: EIST, PMCPU, PMPCI
    • compile in 29m (finished after 30m30s uptime)
    • leaving system idle after compile: freeze after 32m30s uptime

Perhaps the latest changes in bfs (hrev30221) or file_cache (hrev30276) do have some influence on the (greatly) improved uptimes. I wouldn't really say that enabling/disabling EIST, PMCPU or PMPCI - especially as when disabled, freezes appeared earlier - do have an influence. Perhaps just coincidence.

Happy to report some success, though!

comment:4 Changed 10 years ago by michael.weirauch

hrev30284; new-ata-bm; acpi

blender scons sessions:

  • BIOS: EIST, PMCPU, PMPCI
    • freeze after 8m36s
  • BIOS: EIST, PMCPU, PMPCI
    • compile in 6m51
    • leave idle: freeze after 40s
  • BIOS: EIST, PMCPU, PMPCI
    • compile in 6m53
    • leave idle: freeze after 1m

Perhaps there is some kind of power management going on as the freeze after 40s/1m might indicate a cool down of the proc and the fan spinning down. (Not audible, though.) Haven't tested stressing the system afterwards.

comment:5 Changed 10 years ago by michael.weirauch

hrev30347; new-ata-bm; acpi

I stripped down the system one by one, removing bt, hda, firewire, ipro1000, usb and the eist driver. The freeze is still reproducible, but comes at a later stage (see further below). I also tested an installation from a USB hard disk and a compilation there.

The main observation is that the system freezes either during high I/O or shortly afterwards:

  • ./configure for SDL-1.2.3: freeze shortly after config.status is written and header dep generation takes place, or shortly after the whole configure has run
  • jam -qaj2 on the Haiku tree: after ctrl+c'ing the process right in the middle and waiting a bit, the system freezes some time afterwards on inactivity (not reproducible every time)
  • freeze during random_file_actions -hrev100000 -f150000 -d100 -m128000 -v during execution

There are occasions where the whole system just works for quite a long time. I remember two days ago doing a full tree compile, browsing the net, downloading and checking out blender trees via svn, all at the same time...

I am going to test random_file_actions with the tracing options enabled as mentioned in 3808. Jamming the tree with these options unfortunately froze last night, and I fell asleep waiting for it :)

Can it be that the system gets into a freeze/deadlock due to file system corruption? The storage partition I am jam'ing on has existed for quite some time and has gone through several dozen outages/freezes since 2008-11.

comment:6 Changed 10 years ago by axeld

Even if the file system already has problems (checkfs should be able to tell you, though), the system should never freeze in such a way that you cannot enter KDL anymore.

I would still suspect ACPI related problems. Have you tried to enable APM instead, and see if that makes any difference (also in the kernel settings file)?

comment:7 Changed 10 years ago by michael.weirauch

The freezes are/were reproducible with and without acpi. I only tried acpi later on, as that also helped with #3632. Will try with apm enabled when I get back home.

Regarding file system corruption: Yes there is according to checkfs. ;)

comment:8 Changed 10 years ago by michael.weirauch

hrev30464; new-ata-bm; default image:

It seems Marcus' recent changes (hrev30443 and hrev30454) have had some impact.

  • Only one freeze on svn co of the haiku tree with acpi disabled.
  • Full tree compile with acpi enabled. (Freeze minutes later when changing font prefs in Firefox)

What is observable is that there are UI freezes lasting seconds (especially noticeable with Deskbar and ActivityMonitor living on the desktop). The mouse and windows are movable, though. Just nothing gets updated. Populating the haiku.image "froze" UI updates for 1m12s.

During heavy I/O (svn checkout + deletion of 40k files via Tracker; tree compile) there are noticeable UI update freezes, but the system recovers after up to 10 seconds.

Generally, the "scsi scheduler" kernel thread takes up full CPU periodically. (Only confirmable when UI is not frozen.)

comment:9 Changed 10 years ago by marcusoverhagen

Cc: marcusoverhagen added

comment:10 Changed 10 years ago by marcusoverhagen

Can you retest with hrev30477 ? Please make sure to change the #define mentioned in hrev30475 to 1

What is the system doing when frozen? Is F12 working? If F12 is not working, please connect a PS/2 keyboard and try again. PS/2 keyboard interrupts have a higher priority than USB.

comment:11 Changed 10 years ago by michael.weirauch

I tried last night right after your changes (hrev30475 without vm prefetch and with ATA_STACK 1). System behaves as with hrev30464 observations in my previous comment. Nevertheless this is a big improvement over the weeks/months before your changes.

  • system freeze on SDL configure
  • system freeze after ~10m of inactivity after having done a "jam -qaj2 @install4 update kernel" which involved a little compiling

When the system is frozen, it is completely frozen. No KDL, nor KDL-enter. As mentioned earlier in this thread, the trackpoint and touchpad are PS2-attached and are not reacting either on complete freeze.

comment:12 Changed 10 years ago by michael.weirauch

  • SDL configure on a distcleaned tree -> freeze after ~30s of idle when process finished
  • jam -qj2 the haiku tree (mostly present objects) -> freeze after ~30s of idle when process finished
  • jam -qaj2 the haiku tree -> freeze after 2m10s of idle when process finished

comment:13 Changed 10 years ago by michael.weirauch

hrev30772; ata-stack;

For reproduction, I went on and wrote a little script which dd's 10 images of 100MB each to disk.
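
The script itself is not attached to this ticket. As a rough illustration only, a comparable write load could be sketched in Python as follows; the function name, the file names and the fsync call are assumptions made for this sketch, not a reconstruction of the original dd script:

```python
import os


def write_images(directory, count=10, size_mb=100):
    """Write `count` zero-filled files of `size_mb` MiB each; return their paths."""
    chunk = bytes(1024 * 1024)  # one 1 MiB block of zeros, comparable to dd bs=1024k
    paths = []
    for i in range(count):
        # Hypothetical file name; the original script's naming is unknown.
        path = os.path.join(directory, f"dd{i:02d}.img")
        with open(path, "wb") as f:
            for _ in range(size_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to write the file back before continuing
        paths.append(path)
    return paths


# Example (writes about 1 GB in total):
# write_images("/path/to/work/partition")
```

Calling write_images() several times in a row on the work partition would correspond to the repeated "runs" mentioned in this comment.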

When setting all safe mode options, or just "Disable IDE-DMA" and "Disable SMP", I can run the dd writes several times without a freeze. What is reproducible is that after the images have been written, the "scsi scheduler" kernel thread uses 100% of one CPU core for about 3-5 minutes and then lets go again. During that high load of the thread a "sync" or system shutdown is not possible. The UI (Deskbar + ActivityMonitor replicants on the Desktop) is frozen for a while. The Terminal cursor blinks, and I can move windows around.

When leaving SMP enabled, I could produce a total freeze on the third run of the dd writes. (Mileage on subsequent tests might vary, I guess.)

Perhaps some other fix like the one in hrev30454 is required, as I have the ICH9M running in compat mode? (That rev, btw, helped a lot regarding uptime until freeze for me.)

comment:14 Changed 10 years ago by michael.weirauch

I just went on testing with hrev30868; gcc4; ata-bm; acpi; on the second sata disk with an installation and work partition and could reproduce the occasional UI freeze during I/O and the total freeze after building the Haiku tree.

But I never dared to test the old ide-bm after Marcus' changes (some or all related for me: hrev30443; hrev30454; hrev30475), which - if I got things right - have also been applied to the ide-bm & co after the separation of the two bus managers.

I tested clean installations of hrev30868; gcc4; ide-bm; acpi; on the second sata disk and on my primary sata disk with full Haiku builds (jam -qaj2), SDL configures + builds and the dd write tests mentioned earlier:

  • No occasional UI freezes on I/O (e.g. no Deskbar or ActivityMonitor replicant updates)
  • No system freeze during or after compiles ... read on ...

I thought I had nailed it down. While writing this text, there were about 5 minutes without heavy disk I/O after the "tests" performed above, and it suddenly froze again.

At least the UI freezes are not reproducible with the ide-bm.

Attaching "hdparm" info for the two disks in case it is of interest:

Primary:

/dev/sda:

ATA device, with non-removable media
	Model Number:       WDC WD3200BEKT-00F3T0                   
	Serial Number:      WD-WXE808PN8192
	Firmware Revision:  11.01A11
	Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5
Standards:
	Supported: 8 7 6 5 
	Likely used: 8
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors:  625142448
	device size with M = 1024*1024:      305245 MBytes
	device size with M = 1000*1000:      320072 MBytes (320 GB)
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, with device specific minimum
	R/W multiple sector transfer: Max = 16	Current = 16
	Advanced power management level: 128
	Recommended acoustic management value: 128, current value: 254
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	NOP cmd
	   *	DOWNLOAD_MICROCODE
	   *	Advanced Power Management feature set
	    	SET_MAX security extension
	    	Automatic Acoustic Management feature set
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART error logging
	   *	SMART self-test
	   *	General Purpose Logging feature set
	   *	WRITE_{DMA|MULTIPLE}_FUA_EXT
	   *	64-bit World wide name
	   *	IDLE_IMMEDIATE with UNLOAD
	   *	{READ,WRITE}_DMA_EXT_GPL commands
	   *	Segmented DOWNLOAD_MICROCODE
	   *	SATA-I signaling speed (1.5Gb/s)
	   *	SATA-II signaling speed (3.0Gb/s)
	   *	Native Command Queueing (NCQ)
	   *	Host-initiated interface power management
	   *	Phy event counters
	    	DMA Setup Auto-Activate optimization
	    	Device-initiated interface power management
	   *	Software settings preservation
	   *	SMART Command Transport (SCT) feature set
	   *	SCT Long Sector Access (AC1)
	   *	SCT LBA Segment Access (AC2)
	   *	SCT Error Recovery Control (AC3)
	   *	SCT Features Control (AC4)
	   *	SCT Data Tables (AC5)
	    	unknown 206[12] (vendor specific)
	    	unknown 206[13] (vendor specific)
Security: 
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
		frozen
	not	expired: security count
		supported: enhanced erase
	84min for SECURITY ERASE UNIT. 84min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee22157687
	NAA		: 5
	IEEE OUI	: 14ee
	Unique ID	: 22157687
Checksum: correct

Secondary:

/dev/sdb:

ATA device, with non-removable media
	Model Number:       ST9160823AS                             
	Serial Number:      5NK1DJD1
	Firmware Revision:  3.CME   
Standards:
	Supported: 7 6 5 4 
	Likely used: 8
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors:  312581808
	device size with M = 1024*1024:      152627 MBytes
	device size with M = 1000*1000:      160041 MBytes (160 GB)
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, no device specific minimum
	R/W multiple sector transfer: Max = 16	Current = 16
	Advanced power management level: 128
	Recommended acoustic management value: 254, current value: 0
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	DOWNLOAD_MICROCODE
	   *	Advanced Power Management feature set
	    	SET_MAX security extension
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART error logging
	   *	SMART self-test
	   *	General Purpose Logging feature set
	   *	IDLE_IMMEDIATE with UNLOAD
	   *	Disable Data Transfer After Error Detection
	    	Write-Read-Verify feature set
	   *	WRITE_UNCORRECTABLE_EXT command
	   *	SATA-I signaling speed (1.5Gb/s)
	   *	Native Command Queueing (NCQ)
	   *	Phy event counters
	    	Device-initiated interface power management
	   *	Software settings preservation
	   *	SMART Command Transport (SCT) feature set
Security: 
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
		frozen
	not	expired: security count
		supported: enhanced erase
	56min for SECURITY ERASE UNIT. 56min for ENHANCED SECURITY ERASE UNIT.
Checksum: correct

comment:15 Changed 10 years ago by marcusoverhagen

Please insert a panic directly after

" if (sVectors[vector].ignored_count > 9900) "

into src/system/kernel/int.c

I'm interested to see if that code path gets executed during the freeze.

comment:16 Changed 10 years ago by michael.weirauch

Nope, that code path is not taken, it seems. At least it is not taken during the occasional UI freezes (which then recover again). Do you think it's worth planting some more dprintf() and/or panic() calls in the interrupt handling code?

Btw, I think I need to invalidate my statement that the UI freezes don't happen with the ide-bm. Dunno what was different yesterday. Had them again with a fresh hrev30878 and ide-bm.

comment:17 Changed 10 years ago by bga

Michael, the UI freezes (the ones where the entire system seems to freeze and then resumes working after a few seconds) are not restricted to you. AFAIK, they happen to every single person running Haiku. I get them with a Core 2 Quad Extreme at 3.6 GHz and with 3 GB of memory. They are indeed I/O related, but I don't think they are the cause of your other problems. If I were you, I would focus on the other ones, as this is a known problem (there is a ticket for it somewhere).

comment:18 Changed 10 years ago by michael.weirauch

Hey Bruno, thanks for the info related to the occasional UI freezes. I just mention them as a side note. But as they aren't as dramatic, and as I am apparently not the only one experiencing these, I think I can forgo mentioning these until they disappear ;)

What is indeed more of a pressing issue are the total freezes, whose origin I haven't yet figured out, even with the help from the "others".

comment:19 in reply to:  17 Changed 10 years ago by bonefish

Replying to bga:

Michael, the UI freezes (the ones where the entire system seems to freeze and then resumes working after a few seconds) are not restricted to you. AFAIK, they happen to every single person running Haiku.

Haven't seen that problem on my machines yet.

comment:20 Changed 10 years ago by bga

Then you are lucky. :) It has been discussed several times already in the mailing lists (Axel even started looking into it once as it was even more visible in his EEE PC if I am not mistaken). It happens all the time when I do, for example, "svn up" in the Haiku tree or when compiling Haiku inside itself. It is easier to notice if you are playing an audio file for example but can also be noticed if you have "Show seconds" enabled in the Deskbar clock and you pay attention to it or if you run ActivityMonitor while doing what I mentioned as you will notice that sometimes it refuses to redraw itself for several seconds when heavy IO is going on.

*BUT*, this is a subject for another ticket I guess.

comment:21 Changed 10 years ago by michael.weirauch

hrev30981; ide-bm; acpi; scheduler-affine;

Following the recent "Scheduler" thread on the mailing list: commenting out the asm("hlt"); in x86/arch_cpu.cpp#arch_cpu_idle(), or replacing it with an asm("nop");, results in the system no longer freezing during or after compilation runs or other I/O-intensive tasks.

The system has been up for more than two hours, including compilations, svn up's and surfing.

comment:22 Changed 10 years ago by siarzhuk

Cc: imker@… added

comment:23 Changed 10 years ago by michael.weirauch

For keeping this one updated...

hrev32467-hrev32497;gcc4;ata-bm;acpi

For reproducing freezes I usually do now: dd if=/dev/zero of=dd.img bs=1024k count=4096

Sometimes the system freezes before reaching the 3GB (cache) memory limit, sometimes shortly after or minute(s) after the file has been written and the disk has actually written the file back. (There is heavy disk activity after the file has been created and the dd-process exited.)

On other occasions the dd write just works fine (also repeatedly) and I can continue working. But sooner or later, the system will freeze.

comment:24 Changed 10 years ago by michael.weirauch

I am inclined to say that I probably found a way of circumventing the system freezes...

I did experiment with kernel_debug_config.h swap-support and memory-limitation:

memory-limit  swap-support  swap-size  freeze
-             yes           disabled   yes
-             no            -          yes
512           no            -          no
512           yes           509        no
2560          yes           509        no
2560          no            -          no

So it seems to boil down to limiting the maximum available memory. I've done several tests, especially with the last config (though it shouldn't matter whether swap is enabled or not), which did reproducibly freeze the system before (including jam -qaj2, SDL configure and build, and dd-tests creating a 4GB image).

No system freeze with limited available memory during yesterday's test session and a quick dd-test this morning.

As a side effect, it seems the system is able to shut down and reboot correctly, where it mostly just froze right before doing so (showing the last "state" of the shutdown alert/dialog).

comment:25 Changed 10 years ago by michael.weirauch

Aforementioned tests performed with hrev32869-trunk;gcc4;ata;acpi

comment:26 Changed 10 years ago by michael.weirauch

hrev32893-hrev33027-trunk;gcc4;ata;acpi

3067MB seems to be the highest RAM limit setting at which the dd-tests cannot freeze the system.

memory-limit      freeze
-                 yes
3070              yes
3068              yes
3067              no
3066              no
3064              no
3056;3040;2944    no

In the meantime it turned out that chaotic (Trac user) has the same ThinkPad T500 series, just with another processor and another hard drive.

He also reported occasional system freezes and could reproduce the dd-test freeze.

So I built him a hrev32932 hybrid kernel limited to 2944MB RAM for testing on his r1a1 installation and he could not get the system to freeze with the dd-tests. Just the occassional *UI*-freezes. (But these are another story...)

comment:27 Changed 10 years ago by marcusoverhagen

Perhaps some memory I/O range is not usable but is used by Haiku anyway? Is the e820 table ok?

comment:28 Changed 10 years ago by michael.weirauch

Marcus, pardon my ignorance, but how can I determine what that "e820" table contains, and how can I make it available to you for verification? I haven't the slightest idea where that table resides and what it is supposed to represent ;) Thanks!

Btw, hrev33032 looks interesting. Will have to test later on...

comment:29 in reply to:  28 ; Changed 10 years ago by mmlr

Replying to michael.weirauch:

Btw, hrev33032 looks interesting. Will have to test later on...

Please try hrev33037 instead. The protection in hrev33032 didn't actually work due to overflowing.

comment:30 in reply to:  29 Changed 10 years ago by michael.weirauch

Replying to mmlr:

Replying to michael.weirauch:

Btw, hrev33032 looks interesting. Will have to test later on...

Please try hrev33037 instead. The protection in hrev33032 didn't actually work due to overflowing.

Just tested with a fresh hrev33040 and it did still freeze with the dd-test.

Let me know how I can help track down whether there is still something wrong with the overflows.

comment:31 Changed 10 years ago by michael.weirauch

hrev33064; freezes still reproducible; sorry to have to report that.

4 runs: (max cache-memory visually approximated from ActivityMonitor output)

  1. freeze after disk-activity ceased (long after dd exited)
  2. max cache-memory exceeded, freeze right after
  3. freeze some megabytes before max cache-memory
  4. freeze some megabytes before max cache-memory

comment:32 Changed 10 years ago by michael.weirauch

Summary: Freeze on Haiku tree compilation (reproduceable) → Freeze on high memory load when not limiting available memory (reproduceable)
Version: R1/pre-alpha1 → R1/Development

hrev33655; gcc4hybrid;

Still persistent. Some mtrr info. Perhaps this sheds some more light into the issue as it seems memory related. (Limiting memory to 3067MB works)

boot_mtrr_dump.diff from #4399

mtrr: 7 variable ranges
mtrr: default type: 0xc00 (uncacheable, variable enabled, fixed enabled)
mtrr: entry 0: base: 0x13c000000; length: 0x40007ff; type: 0 uncacheable
mtrr: entry 1: base: 0x0; length: 0x800007ff; type: 6 write-back
mtrr: entry 2: base: 0x80000000; length: 0x400007ff; type: 6 write-back
mtrr: entry 3: base: 0x100000000; length: 0x400007ff; type: 6 write-back
mtrr: entry 4: empty
mtrr: entry 5: empty
mtrr: entry 6: empty

/proc/mtrr from openSUSE:

reg00: base=0x13c000000 (5056MB), size=  64MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg02: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg03: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg04: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1
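
To make the layout of the openSUSE listing easier to see, it can be tallied with a few lines of Python. This is only a reading aid: the regular expression is written for exactly the five lines quoted above, and nothing here inspects real hardware.

```python
import re

# The /proc/mtrr lines exactly as quoted in this comment.
PROC_MTRR = """\
reg00: base=0x13c000000 (5056MB), size=  64MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg02: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg03: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg04: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1
"""

LINE = re.compile(
    r"reg(\d+): base=0x([0-9a-f]+) \(\s*(\d+)MB\), size=\s*(\d+)MB: ([\w-]+), count=\d+"
)


def parse_mtrr(text):
    """Return a list of (start_mb, end_mb, type) tuples."""
    ranges = []
    for match in LINE.finditer(text):
        base = int(match.group(2), 16)
        base_mb, size_mb = int(match.group(3)), int(match.group(4))
        assert base >> 20 == base_mb  # hex base and printed MB value must agree
        ranges.append((base_mb, base_mb + size_mb, match.group(5)))
    return ranges


def total_mb(ranges, kind):
    """Sum the sizes of all ranges of the given type."""
    return sum(end - start for start, end, t in ranges if t == kind)


ranges = parse_mtrr(PROC_MTRR)
for start, end, kind in sorted(ranges):
    print(f"{start:5d} - {end:5d} MB  {kind}")
print("write-back total:", total_mb(ranges, "write-back"), "MB")
```

Read this way, the write-back ranges cover 0-3072MB and 4096-5120MB, the uncachable entry overlaps the top 64MB of the latter, and no write-back range covers 3072-4096MB.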

comment:33 Changed 9 years ago by bonefish

That's the MTRR setup produced by the BIOS and since we reset all MTRRs during the boot process, this is not related. It would be interesting to see the MTRR setup we produce, though. Enable TRACE_MTRR in src/add-ons/kernel/cpu/x86/generic_x86.cpp to have it printed to the syslog. You might need to increase syslog_buffer_size in your kernel settings to prevent the output from being dropped.

comment:34 Changed 9 years ago by bonefish

hrev34197 reimplements the MTRR handling. Please retest.

comment:35 in reply to:  34 ; Changed 9 years ago by michael.weirauch

Replying to bonefish:

hrev34197 reimplements the MTRR handling. Please retest.

hrev34198-gcc4h; fresh install; Unfortunately the freeze still occurs. In 2 of 2 dd-tests, the ActivityMonitor GUI froze at max block-cache memory usage. The mouse was still movable. Then it started drawing again after about 5-10 seconds. And then the system froze completely: once after the 4GB file was written, and once right after ActivityMonitor drew again.

Btw, I am not getting any "mtrr:" output, although TRACE_MTRR is defined. If it helps, I haven't seen any early dprintf() of my little acpi_thinkpad in the early "support"-hook either on my work installation.

comment:36 in reply to:  35 Changed 9 years ago by bonefish

Replying to michael.weirauch:

Btw, I am not getting any "mtrr:" output, although TRACE_MTRR is defined. If it helps, I haven't seen any early dprintf() of my little acpi_thinkpad in the early "support"-hook either on my work installation.

Have you increased syslog_buffer_size as I've suggested? If you still get <TRUNC>/<DROP> the buffer size is still not sufficiently large.

Changed 9 years ago by michael.weirauch

ThinkPad T500 hrev34198 TRACE_MTRR

comment:37 Changed 9 years ago by michael.weirauch

Hi Ingo, sorry for the trouble! The elevated syslog_buffer_size setting which I usually have enabled when cross compiling and installing got lost on my work installation.

Attached a syslog. Additionally the boot_mttr_dump-patch output is still present at the very beginning. Hope this helps.

comment:38 Changed 9 years ago by bonefish

Our MTRR setup looks OK -- considering what it can do with the weird ranges it gets ((base 0x0, size 0xbfac6000) and (base 0xbfdff000, size 0x1000)) at least. The BIOS has a laxer setup, so I'd say your issue is not MTRR related. To be sure you could check whether the MTRR setup under Linux is also not stronger than ours.

PS: The boot_mttr_dump patch is not needed when you enable TRACE_MTRR -- the same info is printed anyway (in fact even more correctly).

comment:39 in reply to:  38 ; Changed 9 years ago by michael.weirauch

Replying to bonefish:

Our MTRR setup looks OK -- considering what it can do with the weird ranges it gets ((base 0x0, size 0xbfac6000) and (base 0xbfdff000, size 0x1000)) at least. The BIOS has a laxer setup, so I'd say your issue is not MTRR related. To be sure you could check whether the MTRR setup under Linux is also not stronger than ours.

Please see some posts above (comment:32) for the openSUSE (11.1) setup.

PS: The boot_mttr_dump patch is not needed when you enable TRACE_MTRR -- the same info is printed anyway (in fact even more correctly).

Ok, going to remove it again.

comment:40 in reply to:  39 ; Changed 9 years ago by bonefish

Replying to michael.weirauch:

Replying to bonefish:

Our MTRR setup looks OK -- considering what it can do with the weird ranges it gets ((base 0x0, size 0xbfac6000) and (base 0xbfdff000, size 0x1000)) at least. The BIOS has a laxer setup, so I'd say your issue is not MTRR related. To be sure you could check whether the MTRR setup under Linux is also not stronger than ours.

Please see some posts above (comment:32) for the openSUSE (11.1) setup.

Ah, missed that (the ticket is getting rather long :-/). Verified, the setup is laxer than ours too. So this is not MTRR related for sure.

comment:41 in reply to:  40 Changed 9 years ago by michael.weirauch

Replying to bonefish:

Ah, missed that (the ticket is getting rather long :-/). Verified, the setup is laxer than ours too. So this is not MTRR related for sure.

Thanks for having a look at this Ingo! Before I let this ticket remain idle until it is resolved one day...
I am currently running with 3067MB limit and am experiencing no freezes whatsoever. Everything is fine. The other day I switched (in BIOS) to use the Intel GMA X4500 instead of the Radeon HD3650 in order to be able to use the full screen estate (1680x1050 instead of 1400x1050 as the Radeon VBE doesn't export the full mode list). I removed the intel_extreme driver though, because it seemed to "flicker", so I went with VESA. I was very quickly able to get the system to freeze again during use of the system. (Memory was 3034MB due to RAM shared with the GPU) The question is, if this is not MTRR related, can this still be "memory management"-related in general as the use of the integrated graphics influences the memory system by using parts of it as video memory and seemed to bring back the freezeability?

comment:42 Changed 9 years ago by korli

0xbfac6000 seems to be your upper memory limit. Maybe the system tries to use this physical memory even if it seems to be mapped with map_physical_memory(). Which driver tries to map this range ?
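
For orientation, the addresses under discussion can be converted to MiB. The following Python sketch uses only the two ranges quoted in comment:38 and assumes that the "MB" memory limits from comment:26 are meant as MiB; it is plain arithmetic and not a statement about how the kernel applies the limit.

```python
MIB = 1 << 20  # bytes per MiB


def to_mib(address):
    """Convert a byte address to MiB (as a float)."""
    return address / MIB


def ceil_mib(address):
    """Smallest whole number of MiB that reaches up to `address`."""
    return -(-address // MIB)


# (base, size) pairs in bytes, as quoted in comment:38.
ranges = [(0x0, 0xBFAC6000), (0xBFDFF000, 0x1000)]

for base, size in ranges:
    end = base + size
    print(f"base {base:#x}, size {size:#x}: ends at {end:#x} = {to_mib(end):.2f} MiB")

# Smallest whole-MiB limit that still contains the entire first range.
first_range_end = ranges[0][0] + ranges[0][1]
print("first range fits below", ceil_mib(first_range_end), "MiB")
```

Under these assumptions the first range ends at about 3066.77 MiB, so 3067 is the smallest whole-MiB limit that still contains it, and the single page at 0xbfdff000 ends exactly at 3070 MiB. That 3067 coincides with the last non-freezing limit from comment:26 is an observation, not an explanation.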

comment:43 in reply to:  42 Changed 9 years ago by michael.weirauch

Replying to korli:

0xbfac6000 seems to be your upper memory limit. Maybe the system tries to use this physical memory even if it seems to be mapped with map_physical_memory(). Which driver tries to map this range ?

Could you give a hint on how to provide this info? ;)

comment:44 Changed 9 years ago by michael.weirauch

hrev35736 fixed this longstanding issue without having to limit the available system memory to 3067MB! (System now shows 3069MB due to ignoring the lower 1MB)

None of the dd-tests to provoke the system freeze actually get it to do so anymore. Great work! Bug can be closed.

comment:45 Changed 9 years ago by stippi

Resolution: fixed
Status: new → closed

Awesome, thanks for the note!
