Opened 14 years ago

Closed 14 years ago

#5383 closed bug (fixed)

MTRR regression: AGP transfer inconsistencies

Reported by: rudolfc
Owned by: bonefish
Priority: normal
Milestone: R1
Component: - General
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description (last modified by rudolfc)

Since hrev34197, MTRR no longer works correctly on my ASUS P4 mainboard with an AGP slot.

The nvidia kernel driver maps a 1 MB buffer in main system memory using MTRR-WC. The CPU writes to this buffer, and the GPU reads it using AGP transfers. It turns out that the GPU sees an inconsistently written buffer, so the engine executes old or partially corrupted commands and eventually hangs. With PCI transfers this problem doesn't occur. If I relocate the buffer to graphics memory, the problem doesn't occur either. If I disable MTRR-WC mapping in the kernel driver, the problem remains in AGP mode.

On PCIe systems this problem doesn't exist.

hrev34190 is OK, hrev34203 is not OK. From the changes in between, I think hrev34197 must be the problem. Note: BeOS (dano/R5) also works fine on this system! I tested hrev35431: still not OK, same symptoms as hrev34203.

Please have a look at the MTRR changes in relation to the nVidia kerneldriver!

The reason the problem only now surfaces is that acceleration engines are currently not used in Haiku. In order to test you need to use acceleration commands.

Thanks in advance!

Attachments (4)

syslog_mtr_ok_r34190 (468.2 KB) - added by rudolfc 14 years ago.
syslog on OK revision hrev34190
syslog_mtr_err_r34203 (58.2 KB) - added by rudolfc 14 years ago.
syslog on error revision hrev34203
syslog_mtr_err_r35431 (58.2 KB) - added by rudolfc 14 years ago.
syslog on error revision hrev35431
syslog_mtr_ok_r35580 (59.0 KB) - added by rudolfc 14 years ago.
correctly working (?) MTRR mapping again..

Change History (15)

by rudolfc, 14 years ago

Attachment: syslog_mtr_ok_r34190 added

syslog on OK revision hrev34190

by rudolfc, 14 years ago

Attachment: syslog_mtr_err_r34203 added

syslog on error revision hrev34203

by rudolfc, 14 years ago

Attachment: syslog_mtr_err_r35431 added

syslog on error revision hrev35431

comment:1 by rudolfc, 14 years ago

Description: modified (diff)

comment:2 by bonefish, 14 years ago

Blocked By: 5353 added

Here's the interesting part from the latest syslog:

KERN: add_memory_type_range(-1, 0x0, 0x1ffec000, 6)
KERN: set MTRRs to:
KERN:   mtrr:  0: base:       0x0, size: 0x20000000, type: 6
KERN:   mtrr:  1: base: 0x1fff0000, size:   0x10000, type: 0
KERN:   mtrr:  2: base: 0x1ffec000, size:    0x4000, type: 0

That's the RAM range (write-back) at 0 - 0x1ffec000.
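The way these three registers combine into a write-back range that ends at 0x1ffec000 can be modelled in a few lines. This is a minimal sketch, not kernel code: `effective_type` is a hypothetical helper, the default type for uncovered addresses is assumed to be uncached, and only the overlap rules that x86 defines (UC wins any overlap, WT wins over WB) are handled.

```python
# x86 memory type encodings, matching the "type:" field in the syslog.
UC, WC, WT, WP, WB = 0, 1, 4, 5, 6

# The three variable-range MTRRs from the syslog excerpt: (base, size, type).
MTRRS = [
    (0x00000000, 0x20000000, WB),
    (0x1fff0000, 0x00010000, UC),
    (0x1ffec000, 0x00004000, UC),
]


def effective_type(address, mtrrs=MTRRS, default=UC):
    """Return the memory type the MTRRs yield for one physical address.

    `default` stands in for the MTRR default type; UC is an assumption here.
    """
    types = {t for base, size, t in mtrrs if base <= address < base + size}
    if not types:
        return default      # no register covers the address
    if UC in types:
        return UC           # uncached always wins an overlap
    if types == {WT, WB}:
        return WT           # write-through wins over write-back
    if len(types) == 1:
        return types.pop()
    raise ValueError("overlap of these memory types is undefined")
```

With this model, `effective_type(0x1ffebfff)` is write-back (6) and `effective_type(0x1ffec000)` is uncached (0), so the two small uncached registers trim the top of the 512 MB write-back register. The driver's buffer at 0x4d00000 falls inside the write-back part.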

KERN: add_memory_type_range(75, 0xf0000000, 0x300000, 1)
KERN: set MTRRs to:
KERN:   mtrr:  0: base:       0x0, size: 0x20000000, type: 6
KERN:   mtrr:  1: base: 0x1fff0000, size:   0x10000, type: 0
KERN:   mtrr:  2: base: 0x1ffec000, size:    0x4000, type: 0
KERN:   mtrr:  3: base: 0xf0000000, size:  0x200000, type: 1
KERN:   mtrr:  4: base: 0xf0200000, size:  0x100000, type: 1

That's a 3 MB write-combining memory range at 0xf0000000.
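Each variable-range MTRR covers a power-of-two-sized, naturally aligned block, which is why the 3 MB request shows up as two registers (2 MB + 1 MB). A minimal sketch of such a split, assuming a greedy largest-chunk-first strategy; `split_into_mtrr_chunks` is a hypothetical helper and not the kernel's actual algorithm:

```python
def split_into_mtrr_chunks(base, size):
    """Split [base, base + size) into power-of-two, naturally aligned chunks.

    Returns a list of (chunk_base, chunk_size) tuples, one per MTRR needed.
    Inputs are assumed to be page-aligned, as MTRR ranges are.
    """
    chunks = []
    while size > 0:
        # Largest power of two that still fits into the remaining size ...
        chunk = 1 << (size.bit_length() - 1)
        # ... shrunk until the current base is aligned to it.
        while base % chunk != 0:
            chunk >>= 1
        chunks.append((base, chunk))
        base += chunk
        size -= chunk
    return chunks
```

`split_into_mtrr_chunks(0xf0000000, 0x300000)` yields the same two entries as registers 3 and 4 in the log. Applied to the RAM range (0x0, 0x1ffec000), the same additive split would need 14 registers, which illustrates why the setup instead covers 512 MB with one write-back register and subtracts the top with two uncached ones.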

KERN: add_memory_type_range(1279, 0x4d00000, 0x100000, 1)
KERN: add_memory_type_range(1279, 0x4d00000, 0x100000, 1): Memory range intersects with existing one (0x0, 0x1ffec000, 6).

A one MB write-combining memory range at 0x4d00000, which intersects with the RAM range, and is therefore ignored.

The previous algorithm would simply have used a free MTRR, which would have worked in this case. Generally this is not an option, though, when a subtractive MTRR setup is used (one where a large range is trimmed by overlapping uncached ranges, as in the setup above). Theoretically the existing range could be split and the MTRR setup recomputed, but since MTRR setups are often so complex that the number of MTRRs is barely sufficient (or not even that), I don't think this is a reasonable approach. As suggested in #5353, we should probably not even try to use MTRRs, but rather define memory types via the respective PTE bits.

comment:3 by rudolfc, 14 years ago

Hi there,

Thanks for the info. I did a quick read on the PTE bits and #5353: so if you go that way, that would mean that MTRR-WC in system RAM would no longer be possible on Haiku, since the PWT bit only offers the options write-through and write-back (not write-combining)?

Since I cannot (partly) flush the RAM buffer in userspace(?), write-back isn't usable, so caching of writes can only be disabled to get it working at all?

Am I missing something? Is the driver set up correctly for this 1 MB buffer? Should I modify something here, or is this a problem that will be solved elsewhere (kernel)?

Since I am no expert here I would very much appreciate info on what I should do or what I should expect...

Thanks!

Rudolf.

in reply to:  3 comment:4 by bonefish, 14 years ago

Replying to rudolfc:

Thanks for the info. I did a quick read on the PTE bits and #5353: so if you go that way, that would mean that MTRR-WC in system RAM would no longer be possible on Haiku, since the PWT bit only offers the options write-through and write-back (not write-combining)?

Oh my, you're right. I guess I misread WT as WC. So the method would really only work with PAT support (i.e. for Pentium III and later), which allows selecting any memory type. There go my hopes to get rid of the MTRR pain for good... :-/
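To illustrate the point: with PAT, the three PTE bits PAT, PCD and PWT form an index into an eight-entry table of memory types. A minimal sketch using the architectural power-on default table; the helper names are hypothetical, and reprogramming entry 4 to WC is only an example choice, not what the kernel does.

```python
# x86 memory type encodings (UC- is the PAT-only "uncached minus" type).
UC, WC, WT, WP, WB, UC_MINUS = 0, 1, 4, 5, 6, 7

# Power-on default contents of the PAT, entries PA0..PA7.
DEFAULT_PAT = [WB, WT, UC_MINUS, UC, WB, WT, UC_MINUS, UC]


def pat_index(pat, pcd, pwt):
    """Combine the three PTE bits into a PAT entry index (0..7)."""
    return (pat << 2) | (pcd << 1) | pwt


def page_memory_type(pat, pcd, pwt, table=DEFAULT_PAT):
    """Look up the memory type a page table entry selects."""
    return table[pat_index(pat, pcd, pwt)]


# Example: a table with entry 4 (PAT=1, PCD=0, PWT=0) reprogrammed to WC.
PAT_WITH_WC = list(DEFAULT_PAT)
PAT_WITH_WC[4] = WC
```

Without PAT only PCD and PWT exist, which corresponds to the first four entries (WB, WT, UC-, UC): no combination selects write-combining. With the reprogrammed table, `page_memory_type(1, 0, 0, PAT_WITH_WC)` selects WC.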

Since I cannot (partly) flush the RAM buffer in userspace(?), write-back isn't usable, so caching of writes can only be disabled to get it working at all?

Am I missing something? Is the driver set up correctly for this 1 MB buffer? Should I modify something here, or is this a problem that will be solved elsewhere (kernel)?

This is really just a kernel problem. The driver is set up correctly. Well, that is, I don't know how things would need to work on pre-P6 processors, i.e. those that don't have MTRRs. Obviously, on those one cannot set the memory type for a RAM range to WC anyway. I haven't found any info on when the PTE caching bits were introduced.

comment:5 by bonefish, 14 years ago

Version: R1/alpha1 → R1/Development

The Architecture Compatibility chapter reveals that the PCD and PWT flags were introduced with the 486.

Regarding how to get your driver's RAM buffer working correctly: If WT is not acceptable for that, then on pre-P6 processors (i.e. the Pentium) the kernel will have to fall back to UC. No way around that. For Pentium III and later, PAT+PCD+PWT can be used in combination with a single WB MTRR. That leaves Pentium II and Pentium Pro (and supposedly the competition's equivalents). Either those are sacrificed for the sake of simplicity, also using a single WB MTRR with PCD+PWT and therefore having to resort to UC when WC is requested (which would also hold for the frame buffer!), or they continue to use MTRRs as best as possible, thus needing to be special-cased.
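The options in this comment can be summarised as a small decision function. This is a sketch of the reasoning only: `plan_memory_type`, the CPU class labels and the `keep_mtrrs_on_early_p6` flag are hypothetical names, and it assumes WT is not acceptable as a substitute for WC.

```python
def plan_memory_type(requested, cpu_class, keep_mtrrs_on_early_p6=True):
    """Return (memory type actually used, mechanism) for a RAM range.

    cpu_class is one of the hypothetical labels "pre-p6" (486, Pentium),
    "early-p6" (Pentium Pro, Pentium II) or "pat" (Pentium III and later).
    """
    if cpu_class not in ("pre-p6", "early-p6", "pat"):
        raise ValueError("unknown CPU class: %r" % (cpu_class,))
    if requested in ("WB", "WT", "UC"):
        # These three types are expressible with the PCD/PWT bits alone.
        return requested, "PTE bits"
    if requested != "WC":
        raise ValueError("unsupported memory type: %r" % (requested,))
    if cpu_class == "pat":
        return "WC", "PTE bits with PAT"
    if cpu_class == "early-p6" and keep_mtrrs_on_early_p6:
        # Special-cased: keep using MTRRs as best as possible.
        return "WC", "MTRR"
    # No way to express WC: fall back to uncached.
    return "UC", "PTE bits"
```

For example, `plan_memory_type("WC", "pre-p6")` falls back to `("UC", "PTE bits")`, while the same request on a `"pat"` CPU keeps write-combining.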

comment:6 by bonefish, 14 years ago

Owner: changed from nobody to bonefish
Status: new → in-progress

comment:7 by rudolfc, 14 years ago

The driver's buffer is used for writing by the CPU only. The GPU reads from this buffer. Using MTRR-WC gives a large acceleration performance increase compared to write-through or uncached (especially in accelerated 3D; I benchmarked this once). If MTRR-WC cannot be used for this buffer, then the buffer should be either write-through or uncached, but _not_ write-back. The same goes for 3D buffers in main memory holding graphics instructions or bitmaps (to prevent drawing artifacts). Write-back caching can only be used if the driver's accelerant can flush the buffers before marking them as active to the GPU, without a performance penalty compared to using MTRR-WC.

It's good to see you are going to work on this, thanks a lot!

Rudolf.

in reply to:  7 comment:8 by bonefish, 14 years ago

Blocked By: 5353 removed

Replying to rudolfc:

The driver's buffer is used for writing by the CPU only. The GPU reads from this buffer. Using MTRR-WC gives a large acceleration performance increase compared to write-through or uncached (especially in accelerated 3D; I benchmarked this once).

I was a bit surprised that WT is slower than WC, since the specification for WT says that "write-combining is allowed". Setting the frame buffer to WT instead of WC makes the graphics feel tremendously slower, so apparently the "is allowed" part doesn't mean it's actually done.

Anyway, the problem should be fixed for P6 and later in hrev35515, since overlapping ranges are now handled correctly. Please close the ticket, if you can verify this.

comment:9 by rudolfc, 14 years ago

Thanks for your work! I'll check and close the ticket. Might be a few days though.. :-/

Bye!

Rudolf.

comment:10 by rudolfc, 14 years ago

Hi again,

On hrev35580 it's working again, acceleration is operating correctly in AGP mode. I'll attach a syslog below.

BTW: On previous versions the nvidia kernel driver should have aborted loading because it checks for success of mapping the command buffer (if (si->dma_area < 0) fail).

Is the kernel correctly reporting failed mappings, where it should fail if it's not able to set the requested cache type?

I'm leaving the ticket open because of my question above. Feel free to close the ticket, especially after double checking if reporting is done correctly.. ;-)

Thanks again!

Rudolf.

by rudolfc, 14 years ago

Attachment: syslog_mtr_ok_r35580 added

correctly working (?) MTRR mapping again..

in reply to:  10 comment:11 by bonefish, 14 years ago

Resolution: fixed
Status: in-progress → closed

Replying to rudolfc:

On hrev35580 it's working again, acceleration is operating correctly in AGP mode. I'll attach a syslog below.

Thanks, looks beautiful. :-)

BTW: On previous versions the nvidia kernel driver should have aborted loading because it checks for success of mapping the command buffer (if (si->dma_area < 0) fail).

Is the kernel correctly reporting failed mappings, where it should fail if it's not able to set the requested cache type?

It should have worked as expected. The "Memory range intersects with existing one" output from add_memory_type_range() we see in the syslog was immediately followed by a "return B_BAD_VALUE" in that function. The return value was passed right through by arch_vm_set_memory_type(). The calling vm_map_physical_memory() checked the return value, deleted the already created area and would pass the error code back. So, yes indeed, [vm_]map_physical_memory() would fail when the memory type could not be set.

With the exception that the MTRR algorithm has changed and vm_map_physical_memory() now always sets a memory type ("uncached", if none has been specified), errors are still propagated the same way.
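The propagation path described above can be modelled with a toy stand-in for the kernel. This is a sketch only: `ToyKernel` is hypothetical, its methods merely borrow the names of the functions mentioned in this ticket and do not reproduce their real signatures, and the value of `B_BAD_VALUE` is a placeholder negative code rather than Haiku's actual constant.

```python
B_OK = 0
B_BAD_VALUE = -5  # placeholder negative code, not Haiku's real constant


class ToyKernel:
    """Toy model of the old behaviour: intersecting ranges are rejected."""

    def __init__(self):
        self.ranges = []      # (area_id, base, size, memory_type)
        self.areas = set()    # IDs of areas that currently exist
        self.next_area = 1

    def add_memory_type_range(self, area_id, base, size, memory_type):
        for _, other_base, other_size, _ in self.ranges:
            if base < other_base + other_size and other_base < base + size:
                return B_BAD_VALUE  # "Memory range intersects with existing one"
        self.ranges.append((area_id, base, size, memory_type))
        return B_OK

    def arch_vm_set_memory_type(self, area_id, base, size, memory_type):
        # Passes the status right through.
        return self.add_memory_type_range(area_id, base, size, memory_type)

    def vm_map_physical_memory(self, base, size, memory_type):
        area_id = self.next_area
        self.next_area += 1
        self.areas.add(area_id)  # the area is created first
        status = self.arch_vm_set_memory_type(area_id, base, size, memory_type)
        if status != B_OK:
            self.areas.discard(area_id)  # delete the already created area
            return status                # and hand the error back
        return area_id
```

Replaying the three requests from the failing syslog (the RAM range, the 3 MB range at 0xf0000000, then the 1 MB range at 0x4d00000) makes the last `vm_map_physical_memory()` call return a negative value and leaves no area behind, which is the condition the driver's `si->dma_area < 0` check tests for.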