Opened 15 years ago
Closed 15 years ago
#5383 closed bug (fixed)
MTRR regression: AGP transfer inconsistencies
Reported by: | rudolfc | Owned by: | bonefish |
---|---|---|---|
Priority: | normal | Milestone: | R1 |
Component: | - General | Version: | R1/Development |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Platform: | All |
Description (last modified by )
Since hrev34197, MTRR no longer works correctly on my ASUS P4 mainboard with an AGP slot.
The nvidia kernel driver maps a 1 MB buffer in main system memory using MTRR-WC. The CPU writes to this buffer, and the GPU reads it using AGP transfers. It turns out that the GPU sees an inconsistently written buffer, so the engine executes old or partially degraded commands and eventually hangs. Using PCI transfers, this problem doesn't occur. If I relocate the buffer to gfx memory, this problem doesn't occur. If I disable MTRR-WC mapping in the kernel driver, the problem remains in AGP.
On PCIe systems this problem doesn't exist.
hrev34190 is OK, hrev34203 is not OK. From the changes in between I think hrev34197 must be the problem. Note: BeOS also is OK on this system! (dano/R5). I tested hrev35431: still not OK, same symptoms as hrev34203.
Please have a look at the MTRR changes in relation to the nVidia kerneldriver!
The reason the problem only now surfaces is that acceleration engines are currently not used in Haiku. In order to test you need to use acceleration commands.
Thanks in advance!
Attachments (4)
Change History (15)
by , 15 years ago
Attachment: | syslog_mtr_ok_r34190 added |
---|
comment:1 by , 15 years ago
Description: | modified (diff) |
---|
comment:2 by , 15 years ago
Blocked By: | 5353 added |
---|
Here's the interesting part from the latest syslog:
KERN: add_memory_type_range(-1, 0x0, 0x1ffec000, 6)
KERN: set MTRRs to:
KERN: mtrr: 0: base: 0x0, size: 0x20000000, type: 6
KERN: mtrr: 1: base: 0x1fff0000, size: 0x10000, type: 0
KERN: mtrr: 2: base: 0x1ffec000, size: 0x4000, type: 0
That's the RAM range (write-back) at 0 - 0x1ffec000.
KERN: add_memory_type_range(75, 0xf0000000, 0x300000, 1)
KERN: set MTRRs to:
KERN: mtrr: 0: base: 0x0, size: 0x20000000, type: 6
KERN: mtrr: 1: base: 0x1fff0000, size: 0x10000, type: 0
KERN: mtrr: 2: base: 0x1ffec000, size: 0x4000, type: 0
KERN: mtrr: 3: base: 0xf0000000, size: 0x200000, type: 1
KERN: mtrr: 4: base: 0xf0200000, size: 0x100000, type: 1
That's a 3 MB write-combining memory range at 0xf0000000.
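The split of the 3 MB range into a 2 MB and a 1 MB register in the log above follows from the fact that a variable-range MTRR describes a naturally aligned power-of-two block. A minimal sketch of such a split, assuming a simple greedy approach (the `mtrr_chunk` type and `split_into_mtrr_chunks()` are hypothetical names for illustration, not Haiku's implementation):

```c
#include <stdint.h>

/* Hypothetical type: one naturally aligned power-of-two block. */
typedef struct {
	uint64_t base;
	uint64_t size;
} mtrr_chunk;

/* Hypothetical helper, not Haiku's code: split [base, base + size) into
 * naturally aligned power-of-two chunks. Returns the number of chunks
 * written to out[], or -1 if maxChunks is not sufficient. */
static int
split_into_mtrr_chunks(uint64_t base, uint64_t size, mtrr_chunk* out,
	int maxChunks)
{
	int count = 0;

	while (size > 0) {
		if (count == maxChunks)
			return -1;

		/* Largest power of two the current base is aligned to
		 * (lowest set bit); base 0 is aligned to everything. */
		uint64_t chunk = base != 0
			? (base & (~base + 1)) : (uint64_t)1 << 63;

		/* Shrink the chunk until it fits into the remaining size. */
		while (chunk > size)
			chunk >>= 1;

		out[count].base = base;
		out[count].size = chunk;
		count++;

		base += chunk;
		size -= chunk;
	}

	return count;
}
```

For base 0xf0000000 and size 0x300000 this yields the two blocks listed as mtrr 3 and mtrr 4 above. It does not reproduce the subtractive setup used for the RAM range, where a larger WB block is combined with UC blocks.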
KERN: add_memory_type_range(1279, 0x4d00000, 0x100000, 1)
KERN: add_memory_type_range(1279, 0x4d00000, 0x100000, 1): Memory range intersects with existing one (0x0, 0x1ffec000, 6).
A 1 MB write-combining memory range at 0x4d00000, which intersects with the RAM range and is therefore ignored.
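The rejection comes down to an interval overlap test. A minimal sketch of such a test, assuming half-open ranges given as base and size (`ranges_intersect()` is a hypothetical name, not the function used in the kernel):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if [baseA, baseA + sizeA) and
 * [baseB, baseB + sizeB) share at least one byte. Assumes that
 * base + size does not overflow. */
static bool
ranges_intersect(uint64_t baseA, uint64_t sizeA, uint64_t baseB,
	uint64_t sizeB)
{
	return baseA < baseB + sizeB && baseB < baseA + sizeA;
}
```

With the values from the syslog, the driver's buffer (0x4d00000, 0x100000) intersects the RAM range (0x0, 0x1ffec000), while the range at 0xf0000000 does not.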
The previous algorithm would just have used a free MTR register, which would have worked in this case. Generally this is not an option though, if a subtractive MTRR setup is used. Theoretically the existing range could be split and the MTRR setup be recomputed, but since oftentimes the MTRR setups are so complex that the number of MTRRs is barely sufficient (or not even that), I don't think this is a reasonable approach. As suggested in #5353 we should probably not even try to use MTRRs, but rather define memory types via the respective PTE bits.
follow-up: 4 comment:3 by , 15 years ago
Hi there,
Thanks for the info. I did a quick read on the PTE bits and #5353: so if you go that way, that would mean that MTRR-WC in system RAM would not be possible anymore on Haiku, since the PWT bit only gives the options write-through and write-back (not write-combining)?
Since I cannot flush the RAM buffer (partly) in userspace(?), write-back isn't usable, so the only way to get it working at all would be to disable caching of writes?
Am I missing something? Is the driver set up correctly for this 1 MB buffer? Should I modify something here, or is this a problem that will be solved elsewhere (kernel)?
Since I am no expert here I would very much appreciate info on what I should do or what I should expect...
Thanks!
Rudolf.
comment:4 by , 15 years ago
Replying to rudolfc:
Thanks for the info. I did a quick read on the PTE bits and #5353: so if you go that way, that would mean that MTRR-WC in system RAM would not be possible anymore on Haiku, since the PWT bit only gives the options write-through and write-back (not write-combining)?
Oh my, you're right. I guess I misread the WT for WC. So the method would really only work with PAT support (i.e. for Pentium III and later), which allows selecting any memory type. There go my hopes to get rid of the MTRR pain for good... :-/
Since I cannot flush the RAM buffer (partly) in userspace(?), write-back isn't usable, so the only way to get it working at all would be to disable caching of writes?
Am I missing something? Is the driver set up correctly for this 1 MB buffer? Should I modify something here, or is this a problem that will be solved elsewhere (kernel)?
This is really just a kernel problem. The driver is set up correctly. Well, that is, I don't know how things would need to work on pre-P6 processors, i.e. those that don't have MTRRs. Obviously on those one cannot set the memory type for a RAM range to WC anyway. I haven't found any info on when the PTE caching bits were introduced.
comment:5 by , 15 years ago
Version: | R1/alpha1 → R1/Development |
---|
The Architecture Compatibility chapter reveals that the PCD and PWT flags were introduced with the 486.
Regarding how to get your driver's RAM buffer working correctly: if WT is not acceptable for that, then on pre-P6 processors (i.e. the Pentium) the kernel will have to fall back to UC. No way around that. For Pentium III and later, PAT+PCD+PWT can be used in combination with a single WB MTRR. That leaves Pentium II and Pentium Pro (and presumably the competition's equivalents). Either those are sacrificed for the sake of simplicity, also using a single WB MTRR with PCD+PWT and therefore having to resort to UC when WC is requested (which would also hold for the frame buffer!), or they continue to use MTRRs as best as possible, thus needing to be special-cased.
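The fallback described here can be sketched as a function that picks the cache-control bits of a 4 KB page table entry. This is only an illustration under stated assumptions: the bit positions (PWT = bit 3, PCD = bit 4, PAT = bit 7) are the architectural ones, but the `mem_type` enum, `pte_cache_bits()`, and the choice of reprogramming PAT entry 4 to WC are hypothetical and not taken from Haiku.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cache-control bits of an x86 page table entry (4 KB pages). */
#define PTE_PWT (1u << 3)	/* page-level write-through */
#define PTE_PCD (1u << 4)	/* page-level cache disable */
#define PTE_PAT (1u << 7)	/* page attribute table index bit */

/* Hypothetical memory type enum for this sketch. */
typedef enum { MEM_UC, MEM_WC, MEM_WT, MEM_WB } mem_type;

/* Hypothetical helper. Assumes a single WB MTRR covers RAM and, if
 * hasPAT is true, that the kernel has reprogrammed PAT entry 4
 * (PAT=1, PCD=0, PWT=0) to WC. */
static uint32_t
pte_cache_bits(mem_type type, bool hasPAT)
{
	switch (type) {
		case MEM_WB:
			return 0;
		case MEM_WT:
			return PTE_PWT;
		case MEM_WC:
			if (hasPAT)
				return PTE_PAT;
			/* No PAT: WC cannot be expressed, fall back to UC. */
			return PTE_PCD | PTE_PWT;
		case MEM_UC:
		default:
			return PTE_PCD | PTE_PWT;
	}
}
```

Without PAT, a WC request ends up with the same bits as UC, which is the penalty mentioned above for the frame buffer as well.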
comment:6 by , 15 years ago
Owner: | changed from | to
---|---|
Status: | new → in-progress |
follow-up: 8 comment:7 by , 15 years ago
The driver's buffer is used for writing by the CPU only. The GPU reads from this buffer. Using MTRR-WC gives a big acceleration performance increase compared to write-through or uncached (especially in accelerated 3D, I benchmarked this once). If MTRR-WC cannot be done for this buffer, then the buffer should be either write-through or uncached, but _not_ write-back. The same goes for 3D buffers in main memory holding graphics instructions or bitmaps (preventing drawing artifacts). Write-back cache can only be used if the driver's accelerant can flush the buffers before marking them as active to the GPU, without a performance penalty as compared to using MTRR-WC.
It's good to see you are going to work on this, thanks a lot!
Rudolf.
comment:8 by , 15 years ago
Blocked By: | 5353 removed |
---|
Replying to rudolfc:
The driver's buffer is used for writing by the CPU only. The GPU reads from this buffer. Using MTRR-WC gives a big acceleration performance increase compared to write-through or uncached (especially in accelerated 3D, I benchmarked this once).
I was a bit surprised that WT is slower than WC, since the specification for WT says that "write-combining is allowed". Setting the frame buffer to WT instead of WC makes the graphics feel tremendously slower, so apparently the "is allowed" part doesn't mean it's actually done.
Anyway, the problem should be fixed for P6 and later in hrev35515, since overlapping ranges are now handled correctly. Please close the ticket, if you can verify this.
comment:9 by , 15 years ago
Thanks for your work! I'll check and close the ticket. Might be a few days though.. :-/
Bye!
Rudolf.
follow-up: 11 comment:10 by , 15 years ago
Hi again,
On hrev35580 it's working again, acceleration is operating correctly in AGP mode. I'll attach a syslog below.
BTW: On previous versions the nvidia kernel driver should have aborted loading because it checks for success of mapping the command buffer (if (si->dma_area < 0) fail).
Is the kernel correctly reporting failed mappings, where it should fail if it's not able to set the requested cache type?
I'm leaving the ticket open because of my question above. Feel free to close the ticket, especially after double checking if reporting is done correctly.. ;-)
Thanks again!
Rudolf.
by , 15 years ago
Attachment: | syslog_mtr_ok_r35580 added |
---|
correctly working (?) MTRR mapping again..
comment:11 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | in-progress → closed |
Replying to rudolfc:
On hrev35580 it's working again, acceleration is operating correctly in AGP mode. I'll attach a syslog below.
Thanks, looks beautiful. :-)
BTW: On previous versions the nvidia kernel driver should have aborted loading because it checks for success of mapping the command buffer (if (si->dma_area < 0) fail).
Is the kernel correctly reporting failed mappings, where it should fail if it's not able to set the requested cache type?
It should have worked as expected. The "Memory range intersects with existing one" output from add_memory_type_range() we see in the syslog was immediately followed by a "return B_BAD_VALUE" in that function. The return value was passed right through by arch_vm_set_memory_type(). The calling vm_map_physical_memory() checked the return value, deleted the already created area and would pass the error code back. So, yes indeed, [vm_]map_physical_memory() would fail, when the memory type could not be set.
With the exception that the MTRR algorithm has changed and vm_map_physical_memory() now always sets a memory type ("uncached", if none has been specified), errors are still propagated the same way.
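The propagation chain described above can be modelled in a few lines. This is a simplified stand-in, not the kernel's code: all functions carry a `_sketch` suffix, the status codes are placeholders rather than the real B_OK/B_BAD_VALUE values, and area creation is reduced to a counter.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int32_t status_t;

/* Placeholder codes for this sketch only. */
#define SKETCH_OK			0
#define SKETCH_BAD_VALUE	(-1)

/* Number of areas currently alive in this toy model. */
static int sAreasAlive = 0;

/* Stand-in for the memory type step: fails on an intersecting range. */
static status_t
set_memory_type_sketch(bool rangeIntersects)
{
	return rangeIntersects ? SKETCH_BAD_VALUE : SKETCH_OK;
}

/* Stand-in for the mapping call: creates the area first, then deletes
 * it again and returns the error if the memory type cannot be set. */
static int32_t
map_physical_memory_sketch(bool rangeIntersects)
{
	int32_t area = 42;	/* pretend area ID */
	sAreasAlive++;

	status_t status = set_memory_type_sketch(rangeIntersects);
	if (status != SKETCH_OK) {
		sAreasAlive--;	/* delete the already created area */
		return status;
	}

	return area;
}

/* Stand-in for the driver side: mirrors the "dma_area < 0" check. */
static bool
driver_init_sketch(bool rangeIntersects)
{
	int32_t dmaArea = map_physical_memory_sketch(rangeIntersects);
	return dmaArea >= 0;
}
```

In this model a failed memory type request leaves no area behind, and the driver's check sees a negative value and aborts.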
syslog on OK revision hrev34190