Opened 11 years ago

Last modified 2 weeks ago

#2769 new bug

Radeon driver is Slower than VESA ...

Reported by: herdemir
Owned by: euan
Priority: normal
Milestone: R1
Component: Drivers/Graphics/radeon
Version: R1/pre-alpha1
Keywords:
Cc: mks@…
Blocked By: #7662
Blocking:
Has a Patch: no
Platform: All

Description

Recently I tried running Haiku with the VESA driver to see how it performs, and I was impressed by how fast it was (after using the (buggy?) radeon driver, most people would feel the same way). Here are some observations:

Running Haiku with the radeon driver (on a Radeon Mobility X600):

Pros:

  • Supports native resolution (1680x1050).
  • Supports Video Overlay

Cons:

  • Redrawing is worse than with the VESA driver. Example: resizing a Terminal (or any other) window causes flicker with the radeon driver, while with the VESA driver resizing is really smooth (no flicker at all).
  • Decreases system performance. Example: with the radeon driver, while playing audio or video there is a noise/glitch in the sound at random intervals (every 3 to 10 seconds). With VESA the glitch/noise was gone. (Before testing with VESA, I thought it was related to the MediaPlayer or MediaServer changes, but that seems not to be the case.) It also performs a bit slower in normal usage.

So far the VESA driver performs much better than the radeon driver at redrawing smoothly (with almost no flickering) while resizing, moving, etc.

So there is a bug or missing function(s) in the radeon driver that affects system performance really badly. With VESA, I was able to see how much better Haiku has become. But it eats too much CPU time, especially while playing videos.

Tested on:

Laptop: HP nx8220
CPU: 1.86 GHz
RAM: 2GB
Video: ATI Radeon Mobility X600
Sound: Soundmax AC97 (AD1981B)
Haiku build: hrev27719

Attachments (6)

screenshot-radeon (65.5 KB) - added by herdemir 10 years ago.
Radeon CPU usage
screenshot-vesa (58.1 KB) - added by herdemir 10 years ago.
VESA CPU usage
screenshot-radeon.png (65.5 KB) - added by herdemir 10 years ago.
screenshot-vesa.png (58.1 KB) - added by herdemir 10 years ago.
cursor_loop-ati.png (144.5 KB) - added by herdemir 10 years ago.
cursor_loop-vesa.png (140.9 KB) - added by herdemir 10 years ago.

Change History (41)

comment:1 Changed 11 years ago by diver

With nvidia cards I have the same problems, but it seems to be related to bug #1823, which would be nice to fix before alpha1.

comment:2 Changed 11 years ago by anevilyak

Some of the performance differences you see (at least with respect to flicker) are probably because VESA mode uses double buffering for the entire desktop, while accelerated drivers don't currently get that.

comment:3 Changed 11 years ago by jackburton

You can achieve more or less the same results as VESA by using a video mode with any depth except 32 bit.

comment:4 Changed 11 years ago by stippi

That would be at the expense of using any acceleration. But I implemented full double buffering combined with acceleration a little while ago. Because my Radeon system exposes another bug, I can't be sure how well that performs, though. So someone could take a critical look at it by enabling a define in src/servers/app/drawing/AccelerantHWInterface.cpp; I believe it was in SetMode().

comment:5 Changed 10 years ago by herdemir

Thanks stippi. After recent changes to app_server it's almost better than the VESA driver (no flickering, no buggy drawing, etc.). I say almost, because in some drawing parts it's still slower than the VESA driver, especially in Cortex. Open Cortex and try to move its items (System Clock, etc.) around. You will see that with the VESA driver it draws almost perfectly and smoothly, but with the radeon driver the dragging stutters. So that is just one thing that I wanted to mention.

Thanks again.

comment:6 Changed 10 years ago by stippi

I think with current hardware (i.e. CPU speed and graphics card attachment over PCIe), the main memory graphics buffer may be the better option. I could make a mode in the app_server in which the CPU does all the drawing, including the copy from main memory to the video buffer, just like in VESA. The advantage of a native graphics driver in this setup would be that a) you get to set the native resolution and b) you could still use video overlays.

Doing this technically is almost no work. Doing it in such a way that the user can configure it would be a little more work, but certainly possible.

I am in the same boat as you, with my nVidia system, I get the same effects you do and I was already considering this option.

For slower computers, especially when the link from main memory to video memory is slower, the constant back to front copying done in software and the unaccelerated moving of windows and filling of large rects may make the current setup perform better, though. So this would be an option for faster computers.
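
A minimal sketch of the setup described above, assuming a 32-bit mode: all drawing goes into a back buffer in main memory, and only the changed ("dirty") rectangle is copied to the video frame buffer. The `Buffer` and `Rect` types and `CopyBackToFront()` are hypothetical names for illustration, not app_server's actual classes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical description of a 32-bit pixel buffer (not an app_server type).
struct Buffer {
	uint8_t* bits;         // address of the first pixel
	uint32_t bytesPerRow;  // row stride in bytes (may include padding)
	uint32_t width;        // width in pixels
	uint32_t height;       // height in pixels
};

// Rectangle with inclusive pixel coordinates.
struct Rect {
	uint32_t left, top, right, bottom;
};

// Copy one dirty rectangle from the back buffer (main memory) to the
// front buffer (frame buffer). Video memory is only written, never read.
void CopyBackToFront(const Buffer& back, Buffer& front, Rect dirty)
{
	assert(back.width == front.width && back.height == front.height);
	assert(dirty.left <= dirty.right && dirty.right < back.width);
	assert(dirty.top <= dirty.bottom && dirty.bottom < back.height);

	const uint32_t bytes = (dirty.right - dirty.left + 1) * 4;
	for (uint32_t y = dirty.top; y <= dirty.bottom; y++) {
		const uint8_t* source = back.bits + y * back.bytesPerRow
			+ dirty.left * 4;
		uint8_t* dest = front.bits + y * front.bytesPerRow + dirty.left * 4;
		std::memcpy(dest, source, bytes);
	}
}
```

The two buffers may have different row strides, which is why each row is copied separately instead of with one large `memcpy`.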

comment:7 Changed 10 years ago by herdemir

After your suggestion above, I tried to measure which of the VESA and radeon drivers uses more CPU. I didn't expect VESA to beat the radeon driver: since the radeon driver uses hardware accelerated 2D drawing, I expected it to use less CPU than VESA. The test was very simple: I opened Activity Monitor and a Terminal window, then constantly moved the Terminal window around on the desktop and observed the CPU usage.

With the radeon driver: generally 51% (49% - 57%)
With the VESA driver:   generally 32% (30% - 36%)

(I did the tests at the same resolution: 1280x1024.) Screenshots will follow ...

As for your suggestion, I think you should apply it, since it would use less CPU (15%-20% less) than the current setup and give smoother drawing, IMO. Of course it would be much better if it were configurable for slower systems, since it depends on CPU speed.

Changed 10 years ago by herdemir

Attachment: screenshot-radeon added

Radeon CPU usage

Changed 10 years ago by herdemir

Attachment: screenshot-vesa added

VESA CPU usage

comment:8 Changed 10 years ago by herdemir

Sorry, I forgot to rename them. They are PNG files ...

Changed 10 years ago by herdemir

Attachment: screenshot-radeon.png added

Changed 10 years ago by herdemir

Attachment: screenshot-vesa.png added

comment:9 in reply to:  5 Changed 10 years ago by jackburton

Replying to herdemir:

Thanks stippi. After recent changes to app_server it's almost better than the VESA driver (no flickering, no buggy drawing, etc.). I say almost, because in some drawing parts it's still slower than the VESA driver, especially in Cortex. Open Cortex and try to move its items (System Clock, etc.) around. You will see that with the VESA driver it draws almost perfectly and smoothly, but with the radeon driver the dragging stutters. So that is just one thing that I wanted to mention.

That happens on my laptop too, with the intel extreme driver.

comment:10 Changed 10 years ago by herdemir

After some time, I found that the cause is in the "app_server->cursor_loop" thread. With the ATI driver, I noticed that moving the cursor around randomly consumes 20%-30% of the CPU, while with VESA it consumes almost nothing (maybe 2%-5% at most). Is there a way to disable just the cursor handling part of the ATI driver (and fall back to the VESA one)?

Changed 10 years ago by herdemir

Attachment: cursor_loop-ati.png added

Changed 10 years ago by herdemir

Attachment: cursor_loop-vesa.png added

comment:11 Changed 10 years ago by stippi

Firefox/Ubuntu just freezes when I try to attach a file. Anyway:

Index: src/servers/app/drawing/AccelerantHWInterface.cpp
===================================================================
--- src/servers/app/drawing/AccelerantHWInterface.cpp	(Revision 29397)
+++ src/servers/app/drawing/AccelerantHWInterface.cpp	(working copy)
@@ -508,7 +508,7 @@
 
 	bool tryOffscreenBackBuffer = false;
 	fOffscreenBackBuffer = false;
-#if 1
+#if 0
 	if (fVGADevice < 0 && (color_space)newMode.space == B_RGB32) {
 		// we should have an accelerated graphics driver, try
 		// to allocate a frame buffer large enough to contain
@@ -594,12 +594,18 @@
 #endif
 
 	// update acceleration hooks
+#if 0
 	fAccFillRect = (fill_rectangle)fAccelerantHook(B_FILL_RECTANGLE,
 		(void *)&fDisplayMode);
 	fAccInvertRect = (invert_rectangle)fAccelerantHook(B_INVERT_RECTANGLE,
 		(void *)&fDisplayMode);
 	fAccScreenBlit = (screen_to_screen_blit)fAccelerantHook(
 		B_SCREEN_TO_SCREEN_BLIT, (void *)&fDisplayMode);
+#else
+	fAccFillRect = NULL;
+	fAccInvertRect = NULL;
+	fAccScreenBlit = NULL;
+#endif
 
 	// in case there is no accelerated blit function, using
 	// an offscreen located backbuffer will not be beneficial!
@@ -627,6 +633,9 @@
 			&& fFrontBuffer->ColorSpace() != B_RGBA32)
 			|| fVGADevice >= 0 || fOffscreenBackBuffer)
 			doubleBuffered = true;
+#if 1
+		doubleBuffered = true;
+#endif
 
 		if (doubleBuffered) {
 			if (fOffscreenBackBuffer) {

Please apply the patch above and tell me how that feels. What it does is this:

  • Disable the use of any acceleration (fill rect, invert rect, copy region) of the driver,
  • use RAM based double buffering just like in VESA.

Video overlays should still work as before. But app_server will not read from video memory anymore, only write, since it will use an offscreen buffer in main memory for compositing.

Hope this helps, -Stephan

comment:12 Changed 10 years ago by herdemir

Thank you, stippi. This is much, much better. Native resolution, video overlay and a responsive GUI (as in VESA)!!! It is so close to perfect that I didn't want to leave Haiku, but to give feedback I had to boot into Linux. Now only the wireless stack remains before I can use Haiku (almost) daily ;)

Thanks again!

comment:13 Changed 10 years ago by jackburton

Can we close this ?

comment:14 Changed 10 years ago by bga

I am just not sure disabling the hardware acceleration hooks was a good idea. For instance, I have this relatively old Athlon 64 notebook with a Radeon Mobility 9600 and I had to revert this change to get any kind of performance out of it. I think that this may need to be revisited.

comment:15 Changed 10 years ago by axeld

Indeed, Stephan doesn't really believe it, though. I think we should have some kind of automatic way to detect old PCI/AGP cards, and continue to use acceleration for them.

comment:16 Changed 10 years ago by bga

I am pretty sure that if, as reported, disabling acceleration in any configuration makes the system faster, then it is a driver bug or some other system bug. In my specific case, for example, moving windows around without acceleration is a pain and with it, it is very smooth.

comment:17 Changed 10 years ago by stippi

I just realized that I may have some uncommitted changes. How does it work for you if you disable the acceleration, but pick a non-32-bit screen mode? Is it suddenly fast?

In any case, I certainly didn't want to give the impression that I think the current solution is perfect, or that nobody should change it. But I do not feel like working on this at all. I know for a fact that enabling the acceleration on nVidia makes the system unstable, let alone slower on my hardware. Since it works so well for me as it is (which may be due to my uncommitted changes, I will check), I am simply turning my attention to other things which do not work well for me. Anybody is free to work on anything, patches are always welcome. Something not working well for yourself is always a great motivation to do something about it, whereas something that works badly for somebody else but great for yourself is simply not a great source of motivation. It sucks, certainly, but I have only so much time a day and I am already spending pretty much all of it on Haiku.

comment:18 Changed 10 years ago by bga

Could you try committing whatever you didn't commit yet? My experience is completely different from yours. In all color modes, moving a window uses 100% of the CPU almost all the time. If I have anything else running, it gets jumpy (with acceleration enabled, this does not happen and cpu usage is well below 20%).

To make things a bit worse, some BWindowScreen programs will refuse to work if the acceleration hooks are not set as they try to get the hooks and will see they are not set.

Based on all this, I would suggest the following if you guys are ok with that:

1 - Wrap the code around a define so stippi can still set it for his machine to be the way it is now.
2 - By default, revert to have acceleration enabled.
3 - Mention the nVidia problems you, stippi, are seeing to Rudolf now that he is back developing the driver. This could get the problems you are seeing sorted out in what is probably the correct place.

Are you guys ok with that?

comment:19 Changed 10 years ago by umccullough

Yes please, can we get an easy #ifdef switch for enabling/disabling these changes? I would like to test this on a couple of my older machines to see if it makes a significant difference one way or the other.

comment:20 Changed 10 years ago by axeld

I've added such a define in hrev32183 (USE_ACCELERATION).

Once we collected some more data, I would like to have the app_server detect automatically, if the hardware would benefit from using acceleration; I'm just not sure about the criteria yet.
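
For illustration, a compile-time switch of this kind could look roughly like the following. Only the name `USE_ACCELERATION` comes from this ticket; the hook type, `DriverFillRect()`, `SelectHooks()` and `UsesAcceleration()` are hypothetical stand-ins, and the actual code in hrev32183 may well differ.

```cpp
#include <cassert>
#include <cstdint>

// Set to 1 to pass the driver's acceleration hooks through,
// or to 0 to force software drawing.
#ifndef USE_ACCELERATION
#define USE_ACCELERATION 0
#endif

// Hypothetical stand-in for an accelerant hook signature.
typedef void (*fill_rect_hook)(uint32_t color);

struct Hooks {
	fill_rect_hook fillRect;
};

// Hypothetical stand-in for a hook provided by the graphics driver.
void DriverFillRect(uint32_t /* color */) {}

// Pick the hooks according to the compile-time switch.
Hooks SelectHooks()
{
	Hooks hooks;
#if USE_ACCELERATION
	hooks.fillRect = &DriverFillRect;
#else
	hooks.fillRect = nullptr;
#endif
	return hooks;
}

// Callers fall back to software drawing when the hook is not set.
bool UsesAcceleration(const Hooks& hooks)
{
	return hooks.fillRect != nullptr;
}
```

Replacing the `#if` with a runtime boolean would be the natural next step for automatic detection, since both code paths then stay compiled in.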

comment:21 Changed 10 years ago by bga

I replied through email but, just to document it here in the bug, wouldn't making acceleration enabled by default make more sense? Otherwise we will probably not gather much data. :)

comment:22 Changed 8 years ago by scottmc

Has this issue been addressed yet? Has anyone tried it with recent Haiku builds?

comment:23 Changed 8 years ago by scottmc

Blocking: 7662 added

comment:24 Changed 7 years ago by mks

I tested this with 3 computers and different configurations. I don't know how to write this down, hope it gets clear. :)

Computers:

  • IBM Thinkpad T23, Pentium 3 Mobile 1.13 GHz, 512 MB RAM, SuperSavage IXC 16 MB (s3)
  • Lenovo Thinkpad R61i, Core 2 Duo T5450, 3 GB RAM, Intel GMA X3000 (intel_extreme)
  • Custom-built desktop, Phenom 2 X2 550, 12 GB RAM, AMD RadeonHD 5450 1 GB (radeon_hd)

Haiku configurations (all are gcc2h):

  • [A] Different recent nightlies (<20 hrevs old), USE_ACCELERATION 0, OFFSCREEN_BACK_BUFFER 0 (from haiku-files.org)
  • [B] hrev44222, USE_ACCELERATION 1, OFFSCREEN_BACK_BUFFER 0
  • [C] hrev44222, USE_ACCELERATION 1, OFFSCREEN_BACK_BUFFER 1

I am not really sure what OFFSCREEN_BACK_BUFFER is supposed to do, but I guessed that it should enable double buffering while hardware acceleration is enabled. I couldn't notice any double buffering (e.g. in StyledEdit) with this, though.

Results:

T23

  • A: Moving & resizing windows results in 100% CPU and ugly traces/artifacts
  • B & C: Works "perfectly", i.e. as good as it gets with this old machine. Low CPU usage when moving/resizing windows.

Note 1: I also have a BeOS R5 partition on this one, which uses the s3 driver, which I built from c1379d357b3737534088b8e62fe68df6db9f2468 (can't tell hrev). I *think* it works slightly better than the accelerated Haiku (B), but the difference is small.
Note 2: I could not play any video on this machine with either config, nor with BeOS R5. Video overlays never really worked well with the Linux driver for this card either.

→ Accelerated works *way* better.

R61

  • A: Moving/resizing windows is perfectly smooth as is playing video. Virtually no noticeable CPU usage on any of these actions (except when resizing a video window).
  • B: Everything works perfectly, as with A. Almost no CPU usage noticeable when moving/resizing windows and playing video.
  • C: Something is wrong; some apps seem to be rendered with a y-offset of about -20 pixels, so that e.g. the Leaf button of the Deskbar would be above the screen, but in fact gets rendered at its very bottom. This is not really usable, as I have to guess where to click.

→ Accelerated and non-accelerated feel completely alike.

Desktop

  • A: Moving/resizing windows is perfectly smooth as is playing video.
  • B & C: Usable, but moving windows lags noticeably; resizing lags less, but is still a bit worse than with A. Playing video works perfectly fine.

→ Non-accelerated works slightly better ATM.


I hope this helps a bit. Sorry for the lengthy comment. :)

comment:25 Changed 7 years ago by stippi

Thanks a lot for doing some testing! Gosh, this is so long ago, I can't even remember what OFFSCREEN_BACK_BUFFER really means. But I think you are right, it should mean the back buffer is in video card memory and copying parts of it to the front buffer happens accelerated. If memory serves, app_server allocates a frame buffer region double the height of the visible area, and then the lower half is used as back buffer.
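
As a rough sketch of the layout recalled above (one frame buffer region twice the visible height, lower half used as back buffer), the addresses could be derived as follows. `FrameBufferLayout` and the helper functions are hypothetical names, and whether app_server computes it exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of a frame buffer region that is twice as
// tall as the visible screen.
struct FrameBufferLayout {
	uint8_t* base;           // start of the whole region
	uint32_t bytesPerRow;    // row stride in bytes
	uint32_t visibleHeight;  // height of the visible mode in rows
};

// The upper half is what the screen shows.
uint8_t* FrontBufferBits(const FrameBufferLayout& layout)
{
	return layout.base;
}

// The lower half starts exactly one visible screen below the base.
uint8_t* BackBufferBits(const FrameBufferLayout& layout)
{
	return layout.base
		+ static_cast<size_t>(layout.bytesPerRow) * layout.visibleHeight;
}

// Video memory needed for both halves together.
size_t RequiredBytes(const FrameBufferLayout& layout)
{
	return static_cast<size_t>(layout.bytesPerRow) * layout.visibleHeight * 2;
}
```

For a 1680x1050 mode at 32 bits with tightly packed rows (6720 bytes per row), this comes to 14,112,000 bytes for both halves. If the height used for the back buffer offset differs from the height of the visible mode, everything would be drawn with a vertical shift.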

From re-reading the comments, I think to solve this ticket in a satisfactory way, one just needs to implement the detection mechanism. If you can't see much difference between OFFSCREEN_BACK_BUFFER 0 and 1, just ignore it and leave it off. I think it was problematic and so I disabled it. But USE_ACCELERATION seems to be beneficial in some situations; we just need that automatic enabling on slow hardware. I think a good criterion would be the memory writing (and/or reading) speed to the video card memory. Should be easy to check and implement, and then it just needs more testing on a broader range of hardware.

comment:26 in reply to:  25 ; Changed 7 years ago by mks

Replying to stippi:

If memory serves, app_server allocates a frame buffer region double the height of the visible area, and then the lower half is used as back buffer.

This would make sense in the context of what I have seen on the R61. Maybe it uses a wrong height somewhere.

I think a good criterion would be the memory writing (and/or reading) speed to the video card memory. Should be easy to check and implement, and then it just needs more testing on a broader range of hardware.

To me it seems this depends on the driver used rather than the hardware generation, i.e. intel_extreme and s3 seem to work fine, but some others, like radeon_hd, don't (yet?). Couldn't this possibly be enabled on a per-driver basis?

comment:27 in reply to:  26 Changed 7 years ago by anevilyak

Replying to mks:

To me it seems this depends on the driver used rather than the hardware generation, i.e. intel_extreme and s3 seem to work fine, but some others, like radeon_hd, don't (yet?). Couldn't this possibly be enabled on a per-driver basis?

It's actually more hw-dependent. The reason being, a lot of the more 3D-focused cards nowadays have almost nonexistent 2D acceleration, since it mostly doesn't get used any more in the mainstream OSes thanks to compositing. As a consequence, they can sometimes be slower at using 2D acceleration hooks than by just using them as a straight frame buffer.

comment:28 Changed 7 years ago by mks

OK, I see. Thanks for the clarification. I did a quick test again with OFFSCREEN_BACK_BUFFER on the intel_extreme and made a video of the "effect": http://youtu.be/gS3BCF20uGA

Should I open a bug for this, or is this functionality going to disappear anyways?

comment:29 in reply to:  25 Changed 7 years ago by mks

Replying to stippi:

From re-reading the comments, I think to solve this ticket in a satisfactory way, one just needs to implement the detection mechanism. [...] I think a good criterion would be the memory writing (and/or reading) speed to the video card memory. Should be easy to check and implement, and then it just needs more testing on a broader range of hardware.

How would one implement this? Do we need to measure the memory speed, or does the card give this kind of information? I would like to try to implement this, because it affects me pretty much on the old laptop. But I think I will need "some" assistance/guidance. Could anybody "mentor" me?

comment:30 Changed 7 years ago by stippi

Basically, the job is very easy. I can try to mentor you, but my communication may be lagging, apologies upfront.

To measure the speed to/from the graphics memory, you would need a frame buffer address. So the easiest place for the detection code should be in AccelerantHWInterface, maybe called directly from within SetMode(). You can replace the #ifdef with a call to a new method that measures the speed to the frame buffer memory and set a new boolean member of the object. That member would then be used where USE_ACCELERATION is currently just a #define.

Hope it's that simple, but I don't see why not.
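
A minimal, portable sketch of the measurement described above: time a single write pass to the frame buffer address and derive a boolean from the result. `MeasureWriteSpeed()`, `ShouldUseAcceleration()` and the threshold are hypothetical; no threshold value has been determined in this ticket. The sketch uses `std::chrono` so it runs anywhere, while inside app_server one would presumably use Haiku's own clock such as `system_time()`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Time one write pass over 'size' bytes at 'target' and return the
// throughput in bytes per second. 'target' stands in for the frame
// buffer address; for testing it can be any writable memory.
double MeasureWriteSpeed(uint8_t* target, size_t size)
{
	assert(target != nullptr && size > 0);
	std::vector<uint8_t> source(size, 0x55);

	const auto start = std::chrono::steady_clock::now();
	std::memcpy(target, source.data(), size);
	const std::chrono::duration<double> elapsed
		= std::chrono::steady_clock::now() - start;

	// Guard against a clock too coarse to see the copy at all.
	if (elapsed.count() <= 0.0)
		return std::numeric_limits<double>::infinity();
	return static_cast<double>(size) / elapsed.count();
}

// Hypothetical policy: a slow link to video memory favors acceleration,
// a fast one favors drawing in main memory and copying.
bool ShouldUseAcceleration(double bytesPerSecond,
	double thresholdBytesPerSecond)
{
	return bytesPerSecond < thresholdBytesPerSecond;
}
```

The returned boolean would take the place of the `USE_ACCELERATION` define. A single short pass keeps the delay at mode-set time small, at the cost of a noisier measurement.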

comment:31 Changed 7 years ago by stippi

BTW, with USE_ACCELERATION, do you get a flickering mouse cursor?

comment:32 Changed 7 years ago by mks

I can't notice any flickering on the T23 and I am pretty certain there wasn't any flickering on the other 2 machines.

I wanted to start by putting in some test outputs for myself to better understand, what's going on but I already have some more questions …

How can I properly enable the trace output in app_server? There is a DEBUG_DRIVER_MODULE ifdef at the top of AccelerantHWInterface, but how can I tell jam to enable this? Also, where do those trace outputs appear?

Regarding the memory speed measuring: I guess I need to do this with fFrameBufferConfig.frame_buffer(_dma?). Do I understand you correctly, that I only have to get the time needed to read/write a (few) frame(s)?

And lastly: Should I rather take my questions to the mailing list, or is it Ok right here in the ticket?

comment:33 in reply to:  32 Changed 7 years ago by stippi

Replying to mks:

I can't notice any flickering on the T23 and I am pretty certain there wasn't any flickering on the other 2 machines.

I don't remember what kind of trick I pulled to avoid flickering with a S/W cursor and no double buffering...

I wanted to start by putting in some test outputs for myself to better understand, what's going on but I already have some more questions …

How can I properly enable the trace output in app_server? There is a DEBUG_DRIVER_MODULE ifdef at the top of AccelerantHWInterface, but how can I tell jam to enable this? Also, where do those trace outputs appear?

I can only think of two easy methods to see debug output.

  • One is to run Haiku in emulation, QEMU/KVM for example, and enable serial debug output to arrive in the Terminal where you start the emulation. QEMU at least used to have an easy option for this, something like "--serial=stdout" or similar. I am relatively sure that I used this before and also simple output like "printf" arrived in the Terminal running QEMU. I don't think I needed dprintf() or stuff like that.
  • The second option is to run the "app_server test environment". The app_server supports being built as a regular Haiku application that runs inside a window. It's actually quite complicated behind the scenes, with applications having to use the right "libbe" to run inside the test app_server. To build this setup, go into src/test/servers/app and run "TARGET_PLATFORM=libbe_test jam install-test-apps". Inside that directory there is a script called "run" which you can use to launch any of the test apps in that directory within the test app_server. You can use any test app, it doesn't matter which, and that will launch the test app_server.

Thinking about it, however, the second option is probably totally useless, since in this case, the AccelerantHWInterface is not used, but the ViewHWInterface is used instead (it's running inside a BWindow/BView... not on an accelerated graphics card). So the second option is nice to test other stuff, but you probably have to use the first option.

Regarding the memory speed measuring: I guess I need to do this with fFrameBufferConfig.frame_buffer(_dma?). Do I understand you correctly, that I only have to get the time needed to read/write a (few) frame(s)?

Yes. You should be able to use the same memory address that is used for attaching the front buffer a little later in SetMode(). I wouldn't even read for long, since that will directly delay the boot process, a short test is hopefully reliable enough.

And lastly: Should I rather take my questions to the mailing list, or is it Ok right here in the ticket?

Either is fine, I guess. Sometimes communication gets more direct/easier with the mailing list, but your findings are better preserved here for the generations to come.

comment:34 Changed 7 years ago by mks

Cc: mks@… added

comment:35 Changed 2 weeks ago by waddlesplash

Blocked By: 7662 added
Blocking: 7662 removed