Opened 12 years ago

Closed 11 years ago

Last modified 6 years ago

#1950 closed bug (fixed)

Garbled screen contents with true colour mode on nvidia gf2mx

Reported by: jopadan
Owned by: axeld
Priority: normal
Milestone: R1
Component: - General
Version: R1/pre-alpha1
Keywords:
Cc: j_freeman, tigerdog
Blocked By:
Blocking:
Has a Patch: no
Platform: x86

Description

System is a single Opteron 240 on a Gigabyte K8NNXP-940 nForce3 150 chipset mainboard. Graphics Cards: Nvidia GeForce 2 MX 32MB AGP 2x Voodoo2 12MB SLI

Haiku version is current SVN trunk. The initial screen at first boot is garbled too. I don't know whether an nvidia accelerant or the VESA driver is in use; I just use the default configuration. Setting 8 and 16 bit modes works fine. 24 and 32 bit modes are garbled, and you have a very hard time trying to figure out the contents.

Attachments (2)

MTRR_serial_r24559.txt (80.6 KB ) - added by jonas.kirilla 12 years ago.
syslog (107.3 KB ) - added by jopadan 12 years ago.


Change History (28)

comment:1 by jonas.kirilla, 12 years ago

I see this too with an nVidia FX5500 card. I think hrev24494 is where it starts happening, so I suppose it's got to do with MTRR. The screen looks like the accelerant isn't working: no block fills and no block moves, everything shown leaves traces. (Even the Shutdown window leaves its mark, already at bootup!) I can provide serial output and screenshots tomorrow if desired.

by jonas.kirilla, 12 years ago

Attachment: MTRR_serial_r24559.txt added

comment:3 by jonas.kirilla, 12 years ago

Serial output attached. Part of it:

...
allocate MTRR slot 0, base = 0, length = 20000000, type=0x6
kernel debugger extension "debugger/hangman/v1": loaded
kernel debugger extension "debugger/invalidate_on_exit/v1": loaded
allocate MTRR slot 1, base = f0000000, length = 100000, type=0x1
...
loaded driver /boot/beos/system/add-ons/kernel/drivers/dev/graphics/vesa
allocate MTRR failed, it overlaps an existing MTRR slot
allocate MTRR slot 2, base = f0000000, length = 8000000, type=0x1

Same base? (f0000000)
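
To make the "same base" observation concrete, here is a minimal sketch of a half-open interval overlap test applied to the three ranges from the serial output. It is only an illustration and not the kernel's actual check; the function name `ranges_overlap` is made up for this example.

```python
def ranges_overlap(base_a, length_a, base_b, length_b):
    """Return True if [base_a, base_a + length_a) intersects
    [base_b, base_b + length_b)."""
    return base_a < base_b + length_b and base_b < base_a + length_a


# Values copied from the serial output above (all hexadecimal).
slot0 = (0x0, 0x20000000)          # type=0x6
slot1 = (0xF0000000, 0x100000)     # type=0x1
request = (0xF0000000, 0x8000000)  # the allocation that reported the overlap

overlaps_slot0 = ranges_overlap(*slot0, *request)
overlaps_slot1 = ranges_overlap(*slot1, *request)

print("overlaps slot 0:", overlaps_slot0)
print("overlaps slot 1:", overlaps_slot1)
```

With these numbers the request collides only with slot 1, the 1 MB range that starts at the same base address.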

comment:4 by korli, 12 years ago

I would need some additional information, if possible: on Linux, you should find something about "BIOS-provided physical RAM map" in /var/log/messages, especially lines beginning with "BIOS-e820". Please also provide the output of "cat /proc/mtrr" on Linux. Thanks.

comment:5 by korli, 12 years ago

Hmm, I didn't notice the two graphics cards; I don't know if this is a problem. I also noticed that the second slot (slot 1) seems to be allocated by the function frame_buffer_console_init_post_modules() in src/system/kernel/debug/frame_buffer_console.cpp. How should this case be handled?

comment:6 by jopadan, 12 years ago

I'll attach it here for you without the voodoo mtrr I think and you should keep in mind it is x86_64:

reg00: base=0x00000000 (   0MB), size=1024MB: write-back, count=1
reg01: base=0xe0000000 (3584MB), size= 128MB: write-combining, count=1

BIOS-provided physical RAM map:

BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
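
For anyone comparing these numbers with the Haiku serial log, the /proc/mtrr entries can be turned into base/end addresses with a short script. This is a rough sketch: the regular expression only covers entries whose size is given in MB, as in the two lines quoted in this comment, and the names `MTRR_LINE` and `parse_mtrr` are invented for the example.

```python
import re

# Matches entries such as:
#   reg01: base=0xe0000000 (3584MB), size= 128MB: write-combining, count=1
# Assumption: sizes are printed in MB, as in the lines quoted above.
MTRR_LINE = re.compile(
    r"reg(?P<reg>\d+): base=0x(?P<base>[0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(?P<size_mb>\d+)MB: (?P<type>[\w-]+), count=(?P<count>\d+)"
)


def parse_mtrr(text):
    """Return a list of dicts with reg, base, end (exclusive) and type."""
    entries = []
    for match in MTRR_LINE.finditer(text):
        base = int(match.group("base"), 16)
        size = int(match.group("size_mb")) * 1024 * 1024
        entries.append({
            "reg": int(match.group("reg")),
            "base": base,
            "end": base + size,
            "type": match.group("type"),
        })
    return entries


sample = """\
reg00: base=0x00000000 (   0MB), size=1024MB: write-back, count=1
reg01: base=0xe0000000 (3584MB), size= 128MB: write-combining, count=1
"""

entries = parse_mtrr(sample)
for entry in entries:
    print("reg%02d: 0x%08x - 0x%08x %s"
          % (entry["reg"], entry["base"], entry["end"], entry["type"]))
```

Because the pattern is applied with `finditer`, it works the same whether the entries arrive one per line or pasted together on a single line.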

comment:7 by rudolfc, 12 years ago

Hi,

From the pictures it's clear that the acceleration engine crashes. Since the driver simply drops all accelerated drawing commands after a few timeouts, you see what you see here, instead of the system totally hanging (gfx-wise, that is).

Maybe try PCI mode versus AGP mode (via nvidia.settings, or by disabling the AGP busmanager). If PCI mode works, then the driver/card might dislike PCI->AGP switching after it was engaged. Fix: hmmm, I don't know at this time. Is the splash icons screen being drawn accelerated? Maybe not doing that would fix it, but you could call that a workaround. AFAIK it's impossible (== not known) to do a full software-initiated hard reset of the cards: sometimes, if the acc engine hangs, a reboot is necessary to solve that.

MTRR is indeed used in the driver as well. A temporary recompile of the kernel driver with MTRR support disabled could be tried (it sits in multiple places!). If it works without MTRR, then the MTRR change is probably the problem.

Regards,

Rudolf.

in reply to:  6 ; comment:8 by korli, 12 years ago

Replying to jopadan:

I'll attach it here for you without the voodoo mtrr I think and you should keep in mind it is x86_64:

Any chance to have a serial log or syslog?

comment:9 by korli, 12 years ago

Could you check with hrev24582?

comment:10 by jonas.kirilla, 12 years ago

/proc/mtrr

reg00: base=0x00000000 (   0MB), size= 512MB: write-back, count=1
reg01: base=0xf8000000 (3968MB), size=  64MB: write-combining, count=1

/var/log/messages

 ...
Mar 25 22:02:23 kirilla kernel: [    0.000000] BIOS-provided physical RAM map:
Mar 25 22:02:23 kirilla kernel: [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
Mar 25 22:02:23 kirilla kernel: [    0.000000]  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
Mar 25 22:02:23 kirilla kernel: [    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Mar 25 22:02:23 kirilla kernel: [    0.000000]  BIOS-e820: 0000000000100000 - 000000001ffec000 (usable)
Mar 25 22:02:23 kirilla kernel: [    0.000000]  BIOS-e820: 000000001ffec000 - 000000001ffef000 (ACPI data)
Mar 25 22:02:23 kirilla kernel: [    0.000000]  BIOS-e820: 000000001ffef000 - 000000001ffff000 (reserved)
Mar 25 22:02:23 kirilla kernel: [    0.000000]  BIOS-e820: 000000001ffff000 - 0000000020000000 (ACPI NVS)
Mar 25 22:02:23 kirilla kernel: [    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
Mar 25 22:02:23 kirilla kernel: [    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
Mar 25 22:02:23 kirilla kernel: [    0.000000]  BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
Mar 25 22:02:23 kirilla kernel: [    0.000000] 0MB HIGHMEM available.
Mar 25 22:02:23 kirilla kernel: [    0.000000] 511MB LOWMEM available.
 ...
Mar 25 22:02:23 kirilla kernel: [   13.550042] Console: colour VGA+ 80x25
 ...
Mar 25 22:02:23 kirilla kernel: [   30.342591] Linux agpgart interface v0.102 (c) Dave Jones
Mar 25 22:02:23 kirilla kernel: [   30.355408] agpgart: Detected an Intel 845G Chipset.
Mar 25 22:02:23 kirilla kernel: [   30.359285] agpgart: AGP aperture is 64M @ 0xf8000000
 ...
Mar 25 22:02:23 kirilla kernel: [   32.062146] NVRM: loading NVIDIA UNIX x86 Kernel Module  100.14.19  Wed Sep 12 14:12:24 PDT 2007
 ...
Mar 25 22:02:27 kirilla kernel: [   41.341550] agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
Mar 25 22:02:27 kirilla kernel: [   41.341578] agpgart: Putting AGP V2 device at 0000:00:00.0 into 4x mode
Mar 25 22:02:27 kirilla kernel: [   41.341602] agpgart: Putting AGP V2 device at 0000:01:00.0 into 4x mode
 ...

(BTW, just to be extra clear: jonas.kirilla != jopadan. Two separate sets of hardware.)

in reply to:  8 comment:11 by jopadan, 12 years ago

Replying to korli:

Replying to jopadan:

I'll attach it here for you without the voodoo mtrr I think and you should keep in mind it is x86_64:

Any chance to have a serial log or syslog?

I've attached a syslog, but I don't see anything MTRR-related in it. Also, the Tracker seems to crash every time I try to shut down. Since I am not able to access any ext3 partitions yet, I have been unable to install the developer tools. Maybe there is BFS write support in Linux.

by jopadan, 12 years ago

Attachment: syslog added

in reply to:  description comment:12 by peat, 12 years ago

Same here (GeForce 4 MX 440). The safe mode GFX setting (1280 x 1024 x 32 with the VESA driver) works fine.

comment:13 by aliensoldier, 12 years ago

I have that bug too, on a GeForce 2 MX400 I think (0x0110).

The other day I got that bug in R5 too, but only once. It was a once-in-a-lifetime type of bug, so I'm not able to reproduce it, but I thought it might help point to part of the solution.

I was starting to play a video in VLC when I noticed I had picked the wrong one, so I stopped it just as the display was starting. From then on I had the same behavior as the Haiku startup bug and needed a reboot.

comment:14 by jonas.kirilla, 12 years ago

Screen still garbled for me with hrev24682. No visible change. I tried setting force_pci true in nvidia.settings in an earlier revision (prior to hrev24679) with no improvement.

comment:15 by rudolfc, 12 years ago

Hi there!

Is this still a problem today? Or was it solved already?

Anyhow: I tried a Haiku build from around 14 April on a system of mine and looked at the driver a bit. Indeed, it doesn't work there in 32-bit color (this colorspace is accelerated by the driver). 16, 15 and 8 bit work (they are not accelerated, due to app_server).

The acceleration engine doesn't work, BUT it does NOT crash. The hooks are called with (AFAIK) valid lists of drawing commands.

I don't understand yet what's wrong.. still investigating. (might take a week or two since I'm on holiday next week)

Bye!

Rudolf.

comment:16 by stippi, 12 years ago

The MTRR regions might still be a problem. IIRC, Korli disabled the overlap check, but the overlap might still point to a problem somewhere in the MTRR setup. Nice to hear you want to have a look, Rudolf! Much appreciated!

comment:17 by rudolfc, 12 years ago

Hi guys,

I got around to testing the driver more -- and guess what: it's in perfect working order. The fault was introduced by Axel(?) while completely rewriting the AGP busmanager. Simply disable AGP mode via nvidia.settings and reboot.

I'll try to have a look at the busmanager this weekend to see if I can nail the problem there. Unless Axel can find it quickly of course ;-)

Bye!

Rudolf.

comment:18 by rudolfc, 12 years ago

Update:

To be really sure about that AGP problem I decided to test my old driver V0.80 combined with AGP busmanager V0.02 (both on bebits).

The same behaviour applies. In other words: The rewritten AGP busmanager is probably not the problem after all! (sorry :)

So: in conclusion for now (I'll test other cards later on): the bootscreen introduced the problem? Please tell me: is the gfx driver used in any way to create this bootscreen?

It's known (by me) for nvidia that it's unwise to use the driver before AGP mode is enabled. If the driver is used for the bootscreen while AGP is still down (I don't know how this could be though since the driver's accelerant calls the manager itself, unless loading and starting the kernel driver already causes the malfunction) that might cause the problem.

The problem is in fact a communications problem with the AGP bus. Comms is partially lost. I've had this problem before (but then even worse) when I did a few things out of correct order in the AGP busmanager (diff between V0.01 and 0.02).

MTRR is not the problem BTW.

OK, back to work ;-)

Bye!

comment:19 by rudolfc, 12 years ago

Update2:

Now tested TNT1 AGP (NV04), FX5200 AGP (NV34) and MX4000 AGP (NV18). All show the same behaviour. The mainboard is an Asus P3 board with a P3-600/133MHz FSB, AGP1.0 capable.

I compared driver logs; here is what I found:

===
R5: (after creating internal modelist, before calling first setmode)

ACCELERANT_MODE_COUNT: the modelist contains 147 modes
GET_MODE_LIST: exporting the modelist created before.
SETMODE: (ENTER) initial modeflags: $8000011f
--- (after completing setting mode):
8 overlay hooks are called (except SUPPORTED_FEATURES)

Haiku:
ACCELERANT_MODE_COUNT: not called
GET_MODE_LIST: not called
9 overlay hooks are called (is illegal at this point!!)
SETMODE: (ENTER) initial modeflags: $00000000

(eof)
===

I expect there's nothing to worry about here concerning this bug, but I see another bug (in my strong opinion): overlay hooks should not be asked for if no mode was previously set!! Overlay hooks may or may not be available to the client (app_server here) depending on the mode previously set. The accelerant is by definition in an undefined state on Haiku when the overlay hooks are called for the first time!

OK, that's it for now. I'll try to trace the AGP busmanager and kerneldriver later on.

Rudolf.

comment:20 by stippi, 12 years ago

Thank you Rudolf for all your research on this! It is very useful and appreciated. I might be able to fix some of the problems you pointed out. I can confirm that the driver is not used for anything bootscreen related. It will be used only by app_server at the point where it finds out about capabilities, which should be shortly before the mode is switched and blue desktop background is rendered. The boot screen is drawn via VESA and BIOS stuff.

comment:21 by rudolfc, 12 years ago

Hi again (Stephan :)

I've placed the hard disk containing Haiku-OS in my other system (the 'old' P4-2800 533MHz FSB Asus board). This system has a GeForce2 Ti (NV15) in it. It behaves the same: that is, the acceleration engine doesn't yield any results, nor does it crash.

I've checked the syslog and saw that Haiku is doing a number of things differently (rather: extra) compared to R5/dano:

  • The EDID info block is fetched from the connected screen to determine the max (or native) resolution of this screen;
  • Haiku-OS switches to that mode using a VESA BIOS call (I guess). This is done before the switch takes place to protected mode (VESA 2.0 and 3.0 both work correctly on my systems here).

I've looked at the AGP busmanager messages: they are only there if the nvidia driver is in place: Hence the busmanager is only called by the gfx (kernel) driver as it should be (just like in R5/dano).

All in all everything looks nice and dandy :-) I'm getting the real BeOS feeling here guys! Haiku works neat.. (just a few KDLs still). And it's a bit slow in drawing yet?

OK, because the VESA EDID and setmode calls are not done on R5/dano, I decided to test with 'multiple' (OK: 2) gfx cards in the P4 system. The primary BIOS card was a PCI G200 (Matrox) and the AGP NV15 was 'not used'. Using the nvidia driver I am telling R5 and Haiku to use this one for the desktop instead of the Matrox card.

Looking at the syslog after a system boot/shutdown cycle I see Haiku fetching EDID and BIOS info from the Matrox card. The splash screen displays there and the resolution is switched OK. After the app_server starts up, the nvidia card is initialized (coldstarted, using the analog VGA connection to the screen). The Desktop comes up and the system is running OK.

The acceleration engine is still down however. Conclusion: the BIOS calls are not the problem.

So, I think I tested everything I could now (well, more or less), and the system behaves OK, just like the driver on R5/dano. On Haiku however the acceleration engine doesn't do its thing.

I think I'm looking at a compiler problem or something (I've seen it before with the 2D driver causing the 3D accelerant to no longer function: different versions of the compiler interpreted the size of a shared_info struct variable differently). Looks like I'll need to do a bughunt through shared_info once again. Or some other variable. Anyone any ideas or pointers for me maybe???

I'll continue searching when I have time again, hopefully within a week or two max.

Bye for now! Rudolf.

Last edited 6 years ago by mmadia

comment:22 by j_freeman, 12 years ago

Cc: j_freeman added

comment:23 by tigerdog, 12 years ago

Cc: tigerdog added

comment:24 by tigerdog, 12 years ago

This may be the cause of the problem reported in http://dev.haiku-os.org/ticket/2071. Rudolf, I'll be happy to test any potential changes.

comment:25 by rudolfc, 12 years ago

Hi there,

I've traced the versions where it went wrong: hrev24493 is working OK; hrev24511 is faulty.

It's a failed MTRR-WC mapping caused by overlapping regions after all: commenting out the kernel change made in hrev24494 fixes the problem. I'm now running hrev24865 in 32bit accelerated mode on my NV15.

I don't know which region is mapped first and later overlapped by my driver's request, but that deserves to be investigated. If that's legal, then overlapping regions should not be blocked, I'd say!

Can someone (kernel) have a look at this????

Bye!

Rudolf.

PS: I'm posting this finding in ticket #2071 as well.
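
For readers who need to narrow down a similar regression, a known-good and a known-bad revision like the pair above can be bisected. The sketch below is not necessarily how the revisions in this comment were traced. It assumes a single good-to-bad transition, and `works_ok` is a hypothetical stand-in for "build that revision, boot it and check the screen", hard-coded here to the result reported in this comment.

```python
def first_bad_revision(good, bad, is_good):
    """Binary search for the first revision where is_good() is False.

    Assumes `good` works, `bad` fails, and there is exactly one
    transition from working to failing in between.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid
        else:
            bad = mid
    return bad


tested = []


def works_ok(rev):
    # Hypothetical stand-in for a manual build-and-boot test.
    tested.append(rev)
    return rev < 24494


culprit = first_bad_revision(24493, 24511, works_ok)
print("first bad revision: hrev%d after %d tests" % (culprit, len(tested)))
```

Each test halves the remaining range, so the 18 revisions between the two endpoints need at most five build-and-boot cycles.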

comment:26 by rudolfc, 11 years ago

Resolution: fixed
Status: new → closed

Closing as the card works these days. (MTRR fixed somehow.)

Rudolf.
