Opened 9 years ago

Last modified 2 years ago

#5353 assigned enhancement

Improve MTRR setup

Reported by: PieterPanman Owned by: nobody
Priority: normal Milestone: R1
Component: System/Kernel Version: R1/Development
Keywords: mtrr Cc:
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

Just so that this isn't forgotten: on my laptop (with VESA), the current MTRR setup is not able to properly cover the framebuffer. Some people will probably be stuck with VESA, and there is a huge performance difference when the MTRR setup covers the framebuffer. A better way of achieving this is needed.

See bonefish's comment in #5085 for more information. Right now I just use the workaround from that ticket, which works well.

Attachments (1)

syslog (143.6 KB) - added by PieterPanman 9 years ago.
Syslog of hrev35519 with larger ringbuffer


Change History (9)

comment:1 Changed 9 years ago by bonefish

Type: bug → enhancement

That's not only VESA related. Even for cards that have driver support the graphics memory needs to be marked "write-combining".

A relatively simple solution would be to mark the whole physical address space WB via one MTRR and use the PCD and PWT bits in the PTEs to define the actual memory type. I'm not sure whether that has any disadvantages, though.
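As a rough illustration of that idea, here is a minimal sketch of how the PCD and PWT bits select the memory type of a page when the covering MTRR says WB, assuming a CPU without PAT. The bit positions are the architectural x86 ones; the function name is made up for this example and is not Haiku kernel code.

```python
# x86 page table entry flag bits (architectural positions).
PTE_PWT = 1 << 3  # page-level write-through
PTE_PCD = 1 << 4  # page-level cache disable


def effective_type_under_wb_mtrr(pte_flags):
    """Memory type of a page whose MTRR type is WB, on a CPU without PAT."""
    if pte_flags & PTE_PCD:
        return "UC"  # cache disable wins regardless of PWT
    if pte_flags & PTE_PWT:
        return "WT"
    return "WB"
```

Note that none of the four PCD/PWT combinations yields write-combining in this setup, so the framebuffer case still needs either an MTRR or PAT.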

comment:2 Changed 9 years ago by bonefish

Blocking: 5383 added

(In #5383) Here's the interesting part from the latest syslog:

KERN: add_memory_type_range(-1, 0x0, 0x1ffec000, 6)
KERN: set MTRRs to:
KERN:   mtrr:  0: base:       0x0, size: 0x20000000, type: 6
KERN:   mtrr:  1: base: 0x1fff0000, size:   0x10000, type: 0
KERN:   mtrr:  2: base: 0x1ffec000, size:    0x4000, type: 0

That's the RAM range (write-back) at 0 - 0x1ffec000.
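To make the subtractive setup concrete, here is a small sketch that resolves the memory type at an address from the three MTRRs listed above. It applies the documented overlap precedence (uncached wins over everything, write-through wins over write-back). The default type for uncovered addresses is assumed to be UC here, and `effective_type` is a hypothetical helper, not the kernel's function.

```python
UC, WC, WT, WB = 0, 1, 4, 6  # MTRR memory type encodings

# (base, size, type) triples copied from the syslog excerpt above.
MTRRS = [
    (0x00000000, 0x20000000, WB),
    (0x1fff0000, 0x00010000, UC),
    (0x1ffec000, 0x00004000, UC),
]


def effective_type(address, mtrrs=MTRRS, default=UC):
    """Resolve the memory type at `address` from overlapping variable MTRRs."""
    types = {mtype for base, size, mtype in mtrrs
             if base <= address < base + size}
    if not types:
        return default  # assumed default type, see the text above
    if UC in types:
        return UC  # uncached always wins an overlap
    if types == {WT, WB}:
        return WT  # write-through wins over write-back
    if len(types) == 1:
        return next(iter(types))
    raise ValueError("overlap with undefined result")
```

One large WB register covers 0 - 0x20000000, and the two small UC registers carve the 0x14000 bytes above 0x1ffec000 back out of it.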

KERN: add_memory_type_range(75, 0xf0000000, 0x300000, 1)
KERN: set MTRRs to:
KERN:   mtrr:  0: base:       0x0, size: 0x20000000, type: 6
KERN:   mtrr:  1: base: 0x1fff0000, size:   0x10000, type: 0
KERN:   mtrr:  2: base: 0x1ffec000, size:    0x4000, type: 0
KERN:   mtrr:  3: base: 0xf0000000, size:  0x200000, type: 1
KERN:   mtrr:  4: base: 0xf0200000, size:  0x100000, type: 1

That's a 3 MB write-combining memory range at 0xf0000000.
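The 3 MB range takes two registers because each variable MTRR must cover a power-of-two size aligned to that size. A minimal sketch of a greedy split that reproduces the two entries from the log follows; the function name is hypothetical, and the kernel's actual algorithm may well differ.

```python
def split_into_mtrr_ranges(base, size):
    """Split [base, base + size) into naturally aligned power-of-two chunks."""
    ranges = []
    while size > 0:
        # Largest power of two permitted by the alignment of base
        # (a base of 0 is aligned to everything).
        align = base & -base if base else 1 << size.bit_length()
        # Largest power of two that still fits into the remaining size.
        chunk = min(align, 1 << (size.bit_length() - 1))
        ranges.append((base, chunk))
        base += chunk
        size -= chunk
    return ranges
```

Applied to the RAM range (0x0, 0x1ffec000), the same purely additive split would need 14 registers, which is why a subtractive setup is used there.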

KERN: add_memory_type_range(1279, 0x4d00000, 0x100000, 1)
KERN: add_memory_type_range(1279, 0x4d00000, 0x100000, 1): Memory range intersects with existing one (0x0, 0x1ffec000, 6).

A 1 MB write-combining memory range at 0x4d00000, which intersects with the RAM range and is therefore ignored.
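The rejection is consistent with a plain overlap test on half-open intervals. The following is only a sketch of such a test with a made-up function name; the actual check in the kernel may be implemented differently.

```python
def ranges_intersect(base_a, size_a, base_b, size_b):
    """True if [base_a, base_a + size_a) and [base_b, base_b + size_b) overlap."""
    return base_a < base_b + size_b and base_b < base_a + size_a
```

With the values from the log, the 1 MB range at 0x4d00000 overlaps the RAM range (0x0, 0x1ffec000), while the 3 MB range at 0xf0000000 does not.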

The previous algorithm would just have used a free MTRR, which would have worked in this case. In general this is not an option, though, when a subtractive MTRR setup is used. Theoretically the existing range could be split and the MTRR setup recomputed, but since MTRR setups are often so complex that the number of MTRRs is barely sufficient (or not even that), I don't think this is a reasonable approach. As suggested in #5353 we should probably not even try to use MTRRs, but rather define memory types via the respective PTE bits.

comment:3 Changed 9 years ago by bonefish

Blocking: 5383 removed

(In #5383) Replying to rudolfc:

The driver's buffer is used for writing by the CPU only. The GPU reads from this buffer. Using MTRR-WC gives a big performance increase compared to write-through or uncached (especially in accelerated 3D, I benchmarked this once).

I was a bit surprised that WT is slower than WC, since the specification for WT says that "write-combining is allowed". Setting the frame buffer to WT instead of WC makes the graphics feel tremendously slower, so apparently the "is allowed" part doesn't mean it's actually done.

Anyway, the problem should be fixed for P6 and later in hrev35515, since overlapping ranges are now handled correctly. Please close the ticket if you can verify this.

comment:4 Changed 9 years ago by bonefish

Pieter, please give hrev35515 a try. Regardless of whether it improves the situation on your laptop, I would appreciate a new syslog from this revision. Thanks!

comment:5 Changed 9 years ago by PieterPanman

Sure, I will probably do it this weekend. Thanks for working on it.

Changed 9 years ago by PieterPanman

Attachment: syslog added

Syslog of hrev35519 with larger ringbuffer

comment:6 Changed 9 years ago by PieterPanman

I reverted everything back to normal and updated to hrev35519. It now performs smoothly, even playing videos at full screen. Syslog attached. Mouse acts a little jerky, possibly related to the ps/2 outputs in the syslog? Bugworthy?

comment:7 in reply to:  6 Changed 9 years ago by bonefish

Replying to PieterPanman:

I reverted everything back to normal and updated to hrev35519. It now performs smoothly, even playing videos at full screen. Syslog attached.

Thanks, looks nice!

Mouse acts a little jerky, possibly related to the ps/2 outputs in the syslog? Bugworthy?

Sure.

Leaving the ticket open as a general enhancement ticket. The current algorithm is much better at using the MTRRs, but it can still run out of registers. Furthermore, the Pentium doesn't have MTRRs (the memory type ranges are predefined by hardware), so not everything that should work actually works yet. The method of using the PTE flags as proposed in comment:1 would work fine for Pentium III and later, since PAT is available there, which makes all memory types selectable per page. For Pentium II/Pro some mix of MTRRs and fallback to PTE flags could be used. For Pentium the PTE flags should be used if a type stricter than the one predefined by the hardware is required.
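For the PAT case, here is a hedged sketch of how a memory type maps to PTE flags. The PAT MSR holds eight one-byte entries, and the PAT, PCD and PWT bits of a 4 KB page table entry form the index into them. The encodings and the power-on default table are the architectural ones; reprogramming slot 4 to write-combining is an arbitrary choice for this example, and the helper names are hypothetical rather than Haiku's.

```python
# PAT memory type encodings.
PAT_UC, PAT_WC, PAT_WT, PAT_WP, PAT_WB, PAT_UC_MINUS = 0, 1, 4, 5, 6, 7

PTE_PWT = 1 << 3
PTE_PCD = 1 << 4
PTE_PAT = 1 << 7  # PAT bit position in a 4 KB page table entry

# Power-on default contents of the eight PAT entries.
DEFAULT_PAT = [PAT_WB, PAT_WT, PAT_UC_MINUS, PAT_UC] * 2


def pat_msr_value(entries):
    """Pack eight PAT entries into the 64-bit MSR value, one byte per entry."""
    value = 0
    for index, memory_type in enumerate(entries):
        value |= memory_type << (8 * index)
    return value


def pte_flags_for_type(entries, memory_type):
    """PTE flag bits selecting the first PAT entry that holds memory_type."""
    index = entries.index(memory_type)  # raises ValueError if not present
    flags = 0
    if index & 1:
        flags |= PTE_PWT
    if index & 2:
        flags |= PTE_PCD
    if index & 4:
        flags |= PTE_PAT
    return flags
```

With slot 4 set to `PAT_WC`, a framebuffer page would carry only the PAT bit, and no MTRR would be consumed for it.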

comment:8 Changed 2 years ago by axeld

Owner: changed from axeld to nobody
Status: new → assigned