Original poster
Posted 2006-12-15 18:12:00
Re: FAQ-11 Performance
Performance
I'm using glDrawPixels and glReadPixels in OpenGL and seeing poor performance. What should I do?
Use BGRA. It is, and always has been, the fastest format for these calls. (In some cases RGBA is fine, and BGR is usually faster than RGB, but in general BGRA is the safest choice.)
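If your images come from a loader that produces RGBA, one way to follow this advice is to reorder the bytes once at load time instead of on every transfer. The helper below is a hypothetical sketch (the name `rgba_to_bgra_inplace` is illustrative, not part of OpenGL) that assumes tightly packed pixels with 8 bits per channel; it simply swaps the R and B bytes of each pixel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a tightly packed RGBA8 image to BGRA8 in
 * place by swapping the R and B bytes of every pixel. */
void rgba_to_bgra_inplace(uint8_t *pixels, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        uint8_t *p = pixels + 4 * i;
        uint8_t red = p[0];
        p[0] = p[2];  /* blue moves to byte 0 */
        p[2] = red;   /* red moves to byte 2; green (1) and alpha (3) stay */
    }
}
```

After the swap, the buffer can be handed to glDrawPixels with `GL_BGRA` as the format and `GL_UNSIGNED_BYTE` as the type (`GL_BGRA` is core since OpenGL 1.2 and was available earlier as `GL_BGRA_EXT`).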
The fastest readback performance you'll get is approximately 160-180 MB/s (~45 MPix/s) for RGBA/BGRA, which is the GPU hardware limit (due to PCI reads on the memory interface). These figures assume a Pentium 4 1.5 GHz-class system or faster. The readback rate doesn't change significantly with the GeForce FX family. Note that you'll get the highest throughput when you read back large areas rather than small ones.
For glDrawPixels(), performance depends on the GPU, the transfer path, and the image size, but for images larger than 128x128 we achieve ~130 MPix/s RGBA (520 MB/s) on NV28GL and ~170 MPix/s RGBA (680 MB/s) on Quadro FX.
These numbers will vary with your particular system, so consider measuring them yourself using GLperf.
Using pixel data range (NV_pixel_data_range), we can achieve writes of ~1.7 GB/s on an AGP 8x system and ~960 MB/s on an AGP 4x system. More information is available at: http://www.nvidia.com/dev_content/nvopenglspecs/GL_NV_pixel_data_range.txt.
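All of the rates above are tied together by the pixel size: with 4 bytes per RGBA/BGRA pixel, MB/s = 4 × MPix/s (45 MPix/s is 180 MB/s, 130 MPix/s is 520 MB/s). A small sketch of that arithmetic turns a quoted rate into a per-frame cost. It assumes 1 MB = 10^6 bytes (the FAQ does not state its convention), and the function names are hypothetical.

```c
/* 8 bits per channel, 4 channels (RGBA or BGRA). */
#define BYTES_PER_PIXEL_BGRA 4.0

/* Pixel rate in millions of pixels/s -> byte rate in MB/s (1 MB = 10^6 B). */
double mpix_to_mb_per_s(double mpix_per_s, double bytes_per_pixel)
{
    return mpix_per_s * bytes_per_pixel;
}

/* Time in milliseconds to move a width x height image at rate_mb_per_s. */
double transfer_ms(int width, int height, double bytes_per_pixel,
                   double rate_mb_per_s)
{
    double bytes = (double)width * (double)height * bytes_per_pixel;
    return bytes / (rate_mb_per_s * 1e6) * 1e3;
}
```

For example, reading back a 1024x768 BGRA frame (3,145,728 bytes) at 180 MB/s takes about 17.5 ms, longer than an entire 60 Hz frame (16.7 ms). Drawing the same image at 680 MB/s takes about 4.6 ms, and writing it with pixel data range at ~1.7 GB/s about 1.85 ms.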
I upgraded my application to DirectX9 and my hardware shadow maps no longer work! What's up?
We've changed the behavior of hardware shadow maps between the DirectX8 and DirectX9 interfaces. In DirectX8, you're required to scale the interpolated z component (which is compared with the value stored in the shadow map) by the bit depth of the shadow map itself. Starting with the DirectX9 interfaces, this scale is no longer required: the z value should be in the range [0..1] regardless of bit depth. We wanted to implement this new, cleaner behavior without breaking shipping applications that rely on the old behavior, so we changed it only for the new DX9 interfaces.
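The difference can be modeled as the value that ends up being compared against the shadow map. This is an illustrative sketch, not Direct3D code: `shadow_compare_z` is hypothetical, and it assumes that "scaling by the bit depth" means multiplying by the largest value the depth format can hold, 2^bits - 1 (for example 16,777,215 for a 24-bit map).

```c
/* Hypothetical model of the z value compared against the shadow map,
 * given a light-space depth in [0, 1].
 *   dx8_interfaces != 0: z must be pre-scaled to the map's integer range.
 *   dx8_interfaces == 0: DX9 interfaces, z stays in [0, 1] at any depth. */
double shadow_compare_z(double depth01, int depth_bits, int dx8_interfaces)
{
    if (!dx8_interfaces)
        return depth01;

    /* Assumption: the DX8 scale is the largest storable depth value,
     * 2^bits - 1 (e.g. 65535 for 16-bit, 16777215 for 24-bit maps). */
    double max_depth = (double)((1ULL << depth_bits) - 1ULL);
    return depth01 * max_depth;
}
```

When porting, look for wherever the DX8 code applied that scale (it may, for instance, have been folded into the transform that generates the shadow-map texture coordinates) and remove it so that z stays in [0..1].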
Render-to-texture seems to really slow down my Direct3D application, even when I do it infrequently and render very little geometry to a small surface. Is this normal?
Make sure you're not creating your texture in the D3DPOOL_MANAGED pool. The Direct3D runtime keeps a system-memory copy of every MANAGED texture so that it can restore them after a mode switch and for texture management, so rendering to one forces a readback from local video memory to system memory, dramatically hurting performance. For render targets, use D3DPOOL_DEFAULT instead.
My application is slow. How can I figure out what's causing the slowdown?
The key is to identify your application's bottleneck. There are several ways to do this:
Eliminate all file accesses. Any hard disk access will kill your frame rate. This is easy to detect: just watch your computer's hard-disk activity light.
Run the same GPU with CPUs of different speeds. If the frame rate varies, your application is CPU-limited.
Lower the AGP speed in your system BIOS. If the frame rate varies, your application is AGP bandwidth-limited.
Reduce your GPU's core clock. If the slower core clock reduces performance, your application is limited by the vertex shader, rasterization, or the fragment shader (i.e., shader-limited).
Reduce your GPU's memory clock. If the slower memory clock affects performance, your application is limited by texture or frame-buffer bandwidth (i.e., GPU bandwidth-limited).
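The experiments above amount to a simple decision procedure: change one factor at a time and see whether the frame rate moves. The sketch below records each experiment as a pair of frame rates and reports the first factor that made a difference, checking in the same order as the list. The types, names, and the tolerance used to decide whether the frame rate "varies" are assumptions for illustration; pick a tolerance larger than your run-to-run noise, and remove file accesses before taking any measurements.

```c
#include <stdbool.h>

/* Hypothetical record of one experiment: frame rate with normal
 * settings and with exactly one factor changed. */
typedef struct {
    double baseline_fps;
    double changed_fps;
} Experiment;

typedef enum {
    LIMIT_UNKNOWN,
    LIMIT_CPU,           /* a different CPU speed changed the frame rate */
    LIMIT_AGP_BANDWIDTH, /* a lower AGP speed changed it */
    LIMIT_SHADER,        /* a lower GPU core clock changed it */
    LIMIT_GPU_BANDWIDTH  /* a lower GPU memory clock changed it */
} Bottleneck;

/* "The frame rate varies" if it moved by more than `tolerance`
 * (e.g. 0.05 for 5%) relative to the baseline. */
static bool frame_rate_varies(Experiment e, double tolerance)
{
    double delta = e.changed_fps - e.baseline_fps;
    if (delta < 0.0)
        delta = -delta;
    return delta > tolerance * e.baseline_fps;
}

/* Report the first factor, in the FAQ's order, that moved the frame rate. */
Bottleneck classify_bottleneck(Experiment cpu_speed, Experiment agp_speed,
                               Experiment core_clock, Experiment mem_clock,
                               double tolerance)
{
    if (frame_rate_varies(cpu_speed, tolerance))  return LIMIT_CPU;
    if (frame_rate_varies(agp_speed, tolerance))  return LIMIT_AGP_BANDWIDTH;
    if (frame_rate_varies(core_clock, tolerance)) return LIMIT_SHADER;
    if (frame_rate_varies(mem_clock, tolerance))  return LIMIT_GPU_BANDWIDTH;
    return LIMIT_UNKNOWN;
}
```

Stopping at the first match is a simplification: different parts of a frame can be limited by different units, so if several experiments move the frame rate, each of them deserves a look.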
OK, now I know what my application's bottleneck is. How do I get rid of it and make my application run faster?
If you are CPU-limited: Try running VTune or a similar performance tool to find out where most of your time is being spent. Note that the graphics driver can itself consume significant CPU time, particularly if you are using the GPU in non-standard ways. Two common ways to lose parallelism between the CPU and the GPU are locking resources (vertex buffers or textures) and reading data back from the GPU to the CPU.
If you are AGP-bandwidth-limited: Make sure your AGP settings are maximized. Transfer less data per frame to the GPU. Today, we see very few applications that are AGP-bandwidth-limited.
If you are shader-limited: Make sure you've balanced the workload between the vertex and fragment programs. For example, calculations that can be linearly interpolated belong in the vertex shader, not in the fragment shader. Use only as much precision as you need (choose prudently among the float, half, and fixed data types). Try encoding expensive functions in textures.
If you are GPU bandwidth-limited: Try reducing the size of your textures. You may also be performing too many blending operations, which are costly.
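Encoding a function in a texture means evaluating an expensive function once, storing the results in a texture, and replacing the per-fragment math with a filtered texture fetch. The CPU-side model below illustrates the idea with a 256-entry table and linear interpolation, roughly what a 1D texture with linear filtering returns. The table size, the example function (x^32, a specular-style power term computed by repeated squaring), and all names are assumptions.

```c
#include <stddef.h>

#define LUT_SIZE 256

/* The "texture": LUT_SIZE samples of a function over [0, 1]. */
typedef struct {
    float texels[LUT_SIZE];
} Lut1D;

/* Hypothetical example function: x^32 by repeated squaring. */
float specular_pow32(float x)
{
    float y = x * x; /* x^2  */
    y *= y;          /* x^4  */
    y *= y;          /* x^8  */
    y *= y;          /* x^16 */
    y *= y;          /* x^32 */
    return y;
}

/* Evaluate f once per texel, the way you would fill the texture at load time. */
void lut_build(Lut1D *lut, float (*f)(float))
{
    for (size_t i = 0; i < LUT_SIZE; ++i)
        lut->texels[i] = f((float)i / (float)(LUT_SIZE - 1));
}

/* Clamp u to [0, 1] and blend the two nearest entries linearly. */
float lut_sample(const Lut1D *lut, float u)
{
    if (u <= 0.0f)
        return lut->texels[0];
    if (u >= 1.0f)
        return lut->texels[LUT_SIZE - 1];

    float x = u * (float)(LUT_SIZE - 1);
    size_t i = (size_t)x;                 /* lower neighbour */
    if (i >= LUT_SIZE - 1)
        return lut->texels[LUT_SIZE - 1];
    float t = x - (float)i;               /* weight of the upper neighbour */
    return lut->texels[i] + t * (lut->texels[i + 1] - lut->texels[i]);
}
```

One difference from real texturing: this table puts u = 0 and u = 1 exactly on the first and last entries, whereas OpenGL texture coordinates address texel centres at (i + 0.5) / N, so a shader using such a table would rescale its coordinate accordingly. The lookup is also only as accurate as the table's resolution and texel format allow.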
How do I time my rendering code? (How do I know how long it takes the GPU to render something?)
The wrong answer is to time the Direct3D or OpenGL calls themselves. That only measures how long the driver takes to submit the rendering request to the push buffer; the actual rendering work is done later, asynchronously. There is no direct way to measure how long the GPU takes to process a particular rendering call.
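Because the API calls return before the GPU has done the work, one rough workaround (a common profiling trick, not something this FAQ prescribes) is to drain the pipeline before starting and before stopping a wall-clock timer, so that the interval covers the GPU's execution as well. This stalls the CPU and destroys CPU/GPU parallelism, so it suits profiling runs only, and the interval also includes the cost of the synchronization itself. In the sketch below, `GpuStopwatch` and its functions are hypothetical; the `drain` callback is assumed to block until the GPU is idle (in OpenGL, a wrapper around glFinish(); in Direct3D 9, polling an event query of type D3DQUERYTYPE_EVENT until it reports completion).

```c
#include <stddef.h>
#include <time.h>

/* Must block until the GPU has executed all previously submitted work. */
typedef void (*DrainFn)(void);

typedef struct {
    DrainFn drain;  /* NULL: no draining (times submission only) */
    double start;   /* wall-clock start time in seconds */
} GpuStopwatch;

/* Wall-clock time in seconds via C11 timespec_get. */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

void stopwatch_begin(GpuStopwatch *sw, DrainFn drain)
{
    sw->drain = drain;
    if (sw->drain)
        sw->drain();            /* start from an idle GPU */
    sw->start = wall_seconds();
}

/* Returns elapsed milliseconds since stopwatch_begin. */
double stopwatch_end_ms(GpuStopwatch *sw)
{
    if (sw->drain)
        sw->drain();            /* wait until the measured work has executed */
    return (wall_seconds() - sw->start) * 1e3;
}
```

Typical use is `stopwatch_begin(&sw, finish_gl); draw_scene(); ms = stopwatch_end_ms(&sw);`, where `finish_gl` and `draw_scene` are placeholders for your own code. Passing NULL as the drain function times only command submission, which is exactly the misleading measurement described above.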