General FAQ On developer.nvidia.com (unfinished, updated 9.4). Required reading for beginners.

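♂樱♀ · posted 2006-9-2 02:35:00

This FAQ is maintained by 空明流转 (Email: wuye9036@gmail.com) of 打开视界, where the original thread lives. I (sakura@china.com) am responsible for keeping this copy fully in sync with it. The text follows: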
－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－
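Please credit the source when reposting! http://www.fopen.org

Beginners on the forum often ask the same questions, so for everyone's convenience we decided to post a rough translation of nVidia's FAQ, which mainly covers graphics hardware and APIs. NV's FAQ is comprehensive and its answers are to the point, so it makes a good reference.

My technical skills are weak and my English is worse, so please treat the original text as authoritative. If you find a mistake, please reply to the thread or PM me so I can correct it. Thanks.
The translations are not being posted in order (although the layout follows the original order), so I list the updates at the top.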

-------------------------------------------------Spliter----------------------------------------------
Latest updates:

2006-09-03 Corrected some translation errors; thanks to 言之无理 for pointing them out.
2006-09-03 General Q1 By 言之无理 (translation), 空明流转 (proofreading)
2006-09-02 Performance Updated By 空明流转
2006-09-01 Performance Q2 & Q3 By 空明流转
2006-08-31 Performance Q1 Updated By 空明流转

------------------------------------------------Spliter-----------------------------------------------
FAQ

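This FAQ collects the questions developers ask us most often, and we hope it helps. If you cannot find an answer to your problem, please email us; we read every submission carefully and add new content here regularly.
(The original of this paragraph is not reproduced, because it is also what we want to say. Mail is welcome: wuye9036@gmail.com, sakura@china.com. Thank you for your support and encouragement!)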

General

Q:My Direct3D app does not work. I think your drivers are broken. What's going on?
My Direct3D application does not run properly, and I think the driver is at fault. How do I resolve this?

A:Have you tried the following?
Have you tried the steps below?

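Run your app with the DirectX debug run-time. Does it run clean, i.e., it does not generate any errors or warnings? If you do get warnings and errors, the problem may be in your app -- try fixing all the complaints the debug run-time has.
Run your program under the DirectX debug run-time [Note: that is, use the debug version of the runtime library]. Does it run without producing any errors or warnings? If errors or warnings do appear, the problem is probably in your application; fix the program according to each reported error and warning.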

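Run your app with the reference rasterizer device. It is slow, but if it generates the same output as running on the HAL device, then your app is doing something other than what you intended.
Run your program on the Reference Rasterizer Device [Note: this option is available when creating the D3D device; rasterization is then done in software]. It is slower than the hardware device, but if its output matches the HAL output, your program is probably doing something other than what you intended. [Note: in other words, the software and hardware paths both show the same incorrect output.]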

If the reference rasterizer results differ from the HAL results, then please get in touch with us. We will fix the problem as soon as possible. However, we will need the following information:
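If the software result differs from the HAL result, please contact us. We will fix the problem as soon as possible. Please provide the following information: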

Operating system
Driver version
Graphics card

A sample application that clearly demonstrates the problem. We do not need source in general, and we are happy with reasonable complex apps (i.e., no need to narrow the problem down as long as the problem is clearly demonstrated). What is most important is that the problem is easy to reproduce, and that the application is self-contained. An application that goes straight to the problem is best (i.e., a "single-click" application that doesn't require navigation to get to the problem).
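A program that clearly demonstrates the bug. We generally do not need source code, and reasonably complex programs are welcome (that is, there is no need to narrow the problem down as long as the bug is clearly demonstrated). What matters most is that the problem is easy to reproduce and that the demo does not depend on other programs. A program that goes straight to the problem is best (for example, a "single-click" program that needs no navigation to reach the problem).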

Performance

Q:I'm using glDrawPixels and glReadPixels in OpenGL. I'm seeing poor performance. What should I do?
I use the OpenGL functions glDrawPixels and glReadPixels in my program, but performance is poor. What should I do?

A:BGRA is and always has been the fastest format to use. (There are some cases where RGBA is OK, and usually BGR is better than RGB, but in general, BGRA is the safest mode.)
BGRA is the fastest of all the color formats. (RGBA is fine in some cases, and BGR is usually better than RGB, but BGRA is the safest choice.)

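The fastest performance you'll get for a readback is approximately 160-180 MB/s (~45 MPix/s) for RGBA/BGRA which is the GPU hardware limit (due to PCI reads on the memory interface). This is with a P4 1.5GHz and above class system. The readback rate doesn't change significantly with the GeForce FX family. Note that you'll get the highest performance when you read back large areas as opposed to small ones.
For the BGRA/RGBA formats, the best readback rate you can get is about 160-180 MB/s (about 45 MPix/s), which is a hardware limit (set by the PCI transfer speed [Note: the data travels from video memory to system memory and has to cross the bus]). This figure was measured on a system with a P4 1.5GHz or faster CPU. The readback rate does not change significantly across the GeForce FX family. Note that the larger the area read at once, the higher the performance.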

For glDrawPixels(), performance depends on the GPU, path, and the texture size but for NV28GL we achieve ~130 MPix/s RGBA (520 MB/s) and for Quadro FX we achieve ~170 MPix/s RGBA (680 MB/s) for images >128x128.
For glDrawPixels(), performance depends on the GPU, the path [Note: I am not sure what this word means here; pointers from experts are welcome. Channel? Rendering path?], and the texture size. For images larger than 128x128, we achieve about 130 MPix/s (520 MB/s) with RGBA on NV28GL and about 170 MPix/s (680 MB/s) with RGBA on Quadro FX.
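
The pixel rates and byte rates quoted above are related by the pixel size. As a rough sketch, assuming 4 bytes per RGBA/BGRA pixel and MB meaning 10^6 bytes (both assumptions, as are the function names), the conversion and a per-image transfer estimate look like this:

```cpp
#include <cassert>

// Assumption: RGBA/BGRA with 8 bits per channel, i.e. 4 bytes per pixel.
constexpr double kBytesPerPixelRGBA8 = 4.0;

// Convert a pixel rate in MPix/s to a byte rate in MB/s (1 MB = 10^6 bytes).
constexpr double mpixToMB(double mpixPerSec) {
    return mpixPerSec * kBytesPerPixelRGBA8;
}

// Estimated time in milliseconds to move a width x height image
// at the given pixel rate. Ignores per-call overhead, so small
// transfers will take longer in practice than this suggests.
double transferMs(int width, int height, double mpixPerSec) {
    assert(width > 0 && height > 0 && mpixPerSec > 0.0);
    const double pixels = static_cast<double>(width) * height;
    return pixels / (mpixPerSec * 1e6) * 1e3;
}
```

With these helpers, `mpixToMB(45.0)` gives the 180 MB/s readback ceiling, and `transferMs(1024, 768, 45.0)` suggests that reading back one 1024x768 frame costs on the order of 17 ms, more than a whole frame at 60 fps.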

These numbers will vary based on your particular system so you may consider measuring these yourself using GLperf.
Because these figures vary from system to system, you can measure the actual performance yourself with the GLperf tool.

Using pixel data range on an AGP 8x system we can achieve writes at ~1.7GB/s and ~960 MB/s on an AGP 4x system. More information is available at: http://www.nvidia.com/dev_content/nvopenglspecs/GL_NV_pixel_data_range.txt
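Using pixel data range [Note: Pixel Data Range; possibly a contiguous run of pixels, pointers from experts are welcome] on AGP 8x we can reach about 1.7 GB/s for writes, and about 960 MB/s on AGP 4x. More information is available at http://www.nvidia.com/dev_content/nvopenglspecs/GL_NV_pixel_data_range.txt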

Q: I upgraded my application to DirectX9 and my hardware shadow maps no longer work! What's up?
After upgrading my program to DX9, hardware shadow maps stopped working. Why?

A: We've changed the behavior of hardware shadow maps between the DirectX8 and DirectX9 interfaces. In DirectX8, you're required to scale the interpolated z component (that will be compared with the value in the shadow map) by the bit depth of the shadow map itself. Starting with the DirectX9 interfaces, we've changed this behavior to no longer require this scale, so the z value should be in the range [0..1], regardless of bit depth. Basically, we wanted to implement this new cleaner behavior, but didn't want to break shipping apps that rely on the old behavior, so we changed it only for the new DX9 interfaces.
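The behavior of hardware shadow maps changed between the DX8 and DX9 interfaces. In DX8, you have to scale the interpolated z component (which is compared against the value in the shadow map) by the bit depth of the shadow map. Starting with DX9, this manual scaling is no longer required, so the z value should lie in [0..1] regardless of bit depth. We wanted to implement this cleaner behavior without breaking shipped programs that rely on the old one [Note: I believe this is the meaning], so we changed it only for the DX9 interfaces.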
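
The difference between the two interfaces can be illustrated with a small sketch. It assumes that "scale by the bit depth" means multiplying by the largest value the shadow map can store, 2^bits - 1; the FAQ does not spell out the exact factor, so treat that and all names below as hypothetical:

```cpp
#include <cassert>
#include <cstdint>

enum class Interface { DX8, DX9 };

// Returns the z value to feed into the hardware shadow-map comparison,
// given a normalized depth z01 in [0, 1].
// Assumption: the DX8-era scale factor is (2^bitDepth - 1).
double shadowCompareDepth(double z01, int bitDepth, Interface api) {
    assert(z01 >= 0.0 && z01 <= 1.0);
    assert(bitDepth > 0 && bitDepth <= 32);
    if (api == Interface::DX9) {
        return z01;  // DX9: stays in [0, 1] regardless of bit depth
    }
    // DX8: scale into the integer range of the shadow map.
    const double maxValue =
        static_cast<double>((std::uint64_t{1} << bitDepth) - 1);
    return z01 * maxValue;
}
```

An application ported from DX8 that still applies the scale under DX9 would compare values far outside [0..1], which is one plausible reason for shadows disappearing after the upgrade.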

Q:Render-to-texture seems to really slow down my Direct3D application, even when I do it infrequently and render very little geometry to a small surface. Is this normal?
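RTT seems to be the performance bottleneck of my D3D program; it is slow even when I use RTT rarely, or render only a little geometry to a small surface. Is this normal?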

A:Make sure you're not creating your texture in the D3DPOOL_MANAGED pool. The Direct3D runtime needs a local copy of all MANAGED textures to restore them on mode switch or for texture management, so rendering to these implies that a readback from local video memory to system memory will occur, dramatically hurting performance. For render targets, use D3DPOOL_DEFAULT instead.
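First make sure you are not creating the texture in the D3DPOOL_MANAGED pool. The D3D runtime keeps a local copy of every MANAGED texture so that it can restore them on a mode switch and manage them. Rendering to such a texture therefore implies a readback from local video memory to system memory, which hurts performance dramatically. For render-target textures, use D3DPOOL_DEFAULT instead.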

Q:My application is slow. How can I figure out what's causing the slowdown?
My program is slow. How do I find out which part is slowing it down?

A:The key is to identify your application's bottleneck. There are several ways to do this:
The key is to find the program's bottleneck. You can locate it in the following ways:

Eliminate all file accesses. Any hard disk accesses will surely kill your frame rate. This is easy enough to detect--just take a look at your computer's "hard disk in use" light.
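Eliminate all file access; any hard disk access will cut your frame rate sharply. This is easy to spot: just look at the computer's hard disk activity light. [Note: an interesting remark. It also implies that if your program triggers heavy page-file swapping, performance will suffer because of the disk activity; the disk light blinks constantly during paging too.]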

Run identical GPUs on different speed CPUs. If the frame rate varies, your application is CPU-limited.
Run your program with the same GPU on CPUs of different speeds. If the frame rate differs, your program is CPU-limited.

Decrease your AGP speed from your system BIOS. If the frame rate varies, your application is AGP bandwidth-limited.
Lower the AGP speed in the BIOS. If the frame rate changes, your program is limited by AGP bandwidth.

Reduce your GPU's core clock. If the slower core clock reduces performance, then your application is limited by the vertex shader, rasterization, or the fragment shader (i.e, shader-limited).
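Lower your GPU's core clock. If performance drops, your program is limited by the vertex shader, rasterization, or the fragment shader (shader-limited).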

Reduce your GPU's memory clock. If the slower memory clock affects performance, your application is limited by texture or frame-buffer bandwidth (GPU bandwidth-limited).
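Lower your video memory clock. If the slower memory clock affects performance, the program is limited by texture or frame-buffer bandwidth (GPU bandwidth-limited).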
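
The tests above can be summarized as a small decision procedure. The following is only a sketch: the struct, the function names, and the 5% tolerance for deciding that the frame rate "varies" are all assumptions, and the disk-access check is left out because it is judged by eye rather than by frame rate.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Frame rates (frames per second) measured while changing one factor
// at a time. All field names are hypothetical.
struct FrameRates {
    double baseline;        // unmodified system
    double differentCpu;    // same GPU, different-speed CPU
    double lowerAgpSpeed;   // AGP speed reduced in the BIOS
    double lowerCoreClock;  // GPU core clock reduced
    double lowerMemClock;   // GPU memory clock reduced
};

// True if `measured` differs from `baseline` by more than `tolerance`,
// expressed as a fraction of the baseline (assumed default: 5%).
bool frameRateChanged(double baseline, double measured,
                      double tolerance = 0.05) {
    assert(baseline > 0.0);
    return std::fabs(measured - baseline) > tolerance * baseline;
}

// Maps each test whose frame rate changed to the limit it indicates.
std::vector<std::string> findBottlenecks(const FrameRates& r) {
    std::vector<std::string> limits;
    if (frameRateChanged(r.baseline, r.differentCpu))
        limits.push_back("CPU-limited");
    if (frameRateChanged(r.baseline, r.lowerAgpSpeed))
        limits.push_back("AGP bandwidth-limited");
    if (frameRateChanged(r.baseline, r.lowerCoreClock))
        limits.push_back("shader-limited");
    if (frameRateChanged(r.baseline, r.lowerMemClock))
        limits.push_back("GPU bandwidth-limited");
    return limits;
}
```

For example, measurements of 60, 60, 59.5, 45 and 60 fps would report only "shader-limited", since only the core-clock test moved the frame rate beyond the tolerance.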

Ok, now I know what my application's bottleneck is. How do I get rid of it and make my application run faster?
So now you have found the bottleneck. How do we remove it so the program runs faster?

If you are CPU-limited: Try running VTune or a similar performance tool to find out where most of your time is being spent. Note that the graphics driver is a potential CPU consumer, particularly if you are using the GPU in non-standard ways. One common way to lose parallelism between the CPU and the GPU is locking resources (vertex buffers or textures), or reading back data from the GPU to the CPU.
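If you are CPU-limited: run VTune or a similar profiling tool to find where most of the time goes. Note that the graphics driver is itself a potential CPU consumer, especially when the GPU is used in non-standard ways. Locking resources and reading data back from the GPU to the CPU are common culprits for losing parallelism between the CPU and the GPU.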

If you are AGP-bandwidth-limited: Make sure your AGP settings are maximized. Transfer less data per frame to the GPU. Today, we see very few applications that are AGP-bandwidth-limited.
If you are AGP-bandwidth-limited: first make sure your AGP settings are at their maximum, then transfer less data to the GPU per frame. Today this situation is rare.

If you are shader-limited: Make sure you've balanced the workload between the vertex and fragment programs. For example, calculations that can be linearly interpolated belong in the vertex shader, not in the fragment shader. Use only the amount of precision that you need (choose between float, half, and fixed data types prudently). Try encoding functions in textures.
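If you are shader-limited: make sure the workload is balanced between the vertex and pixel shaders [Note: "balanced" here does not mean "equal"; it means putting the right kind and amount of work on the right shader]. For example, calculations that can be linearly interpolated belong in the vs, not the fs. Also, use only the precision you need (choose carefully among the float, half, and fixed data types). Try replacing function evaluation with texture lookups [Note: the original says "encoding functions in textures"].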

If you are GPU bandwidth-limited: Try reducing the size of your textures. You may also be performing too many blending operations, which are costly.
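If you are GPU bandwidth-limited: reduce the size of your textures. You may also be performing too many blending operations, which are expensive.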

How do I time my rendering code? (How do I know how long it takes the GPU to render something?)
How do I measure how long my rendering code takes? (How do I know how long rendering took?)

The wrong answer is to time all Direct3D or OpenGL calls. This simply times how long it takes the driver to submit the rendering request to the push-buffer. The actual rendering work is done asynchronously and later. There is no direct way for you to measure how long the GPU takes to process a particular rendering call.
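The time spent in D3D and OpenGL calls is not the rendering time. It only measures how long the driver takes to submit the rendering request and place it in the push-buffer. The actual rendering is done asynchronously, later. There is no direct way to measure how long the GPU takes to process a particular rendering call.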

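lingjingqiu · posted 2006-9-3 01:01:00

Please update; I fixed quite a few grammar errors... (embarrassed).

♂樱♀ · posted 2006-9-3 01:12:00

I saw it; working on it.

♂樱♀ · posted 2006-9-3 01:19:00

Hmm, syncing back and forth seems a bit of a hassle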
spin_lock(&lock)
//code_here
spin_unlock(&lock)

千里马肝 · posted 2006-9-26 11:36:00

Nice work!

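plantlove · posted 2007-3-9 09:08:00

Oh, nice

aovi · posted 2007-10-7 05:10:00

llm0818 · posted 2007-11-25 16:57:00

You must be the original poster, and surely very skilled~~ I posted a thread with a question in it; it is fairly long, so I will not copy it here. Could you please take a look? Sorry for the trouble~~~

llm0818 · posted 2007-11-25 18:11:00

Sorry, I solved that problem~~ No need to trouble you for now
Haha~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

tocy · posted 2008-7-15 09:25:00

Take it slowly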
