NVIDIA GPU performance optimization

secondage · posted 2004-6-9 14:45:00

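I've translated a few sections so far and will keep adding more. I only know D3D,
so the notes below are all D3D-based. Some items would take far too long to unpack, so I just translated them literally. Corrections are welcome.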

3.6.1. Double-Speed Z-Only and Stencil Rendering
The GeForce FX and GeForce 6 Series GPUs render at double speed when
rendering only depth or stencil values. To enable this special rendering mode,
you must follow the following rules:
Color writes are disabled
Note (D3D): set D3DRS_COLORWRITEENABLE to 0x0.
2x or 4x antialiasing is not enabled
Note (D3D): set D3DRS_MULTISAMPLEANTIALIAS to FALSE.
Texkill has not been applied to any fragments
Note (D3D): "fragments" corresponds to the pixel shader, so no pixel shader may use texkill.
Depth replace has not been applied to any fragments
Note (D3D): the pixel shader must not replace the depth value.
Alpha test is disabled
Note (D3D): set D3DRS_ALPHATESTENABLE to FALSE.
No color key is used in any of the active textures
No user clip planes are enabled
No floating-point render targets are in use
Note (D3D): the render target must not use a floating-point format such as D3DFMT_R16F, D3DFMT_G16R16F, or D3DFMT_A16B16G16R16F.
Pixel shaders are disabled
Render to a non-power-of-2 texture
Note: I read this as "when rendering to a texture, its dimensions must not be a power of 2".
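
The checklist above can be sketched as a small helper that an engine might run before a depth-only pass. This is only an illustration: `PassState` and `qualifiesForDoubleSpeedZ` are hypothetical names, not part of Direct3D, and the texture-size item is left out because it is not a simple on/off state.

```cpp
// Hypothetical snapshot of the state that matters for one draw pass.
// None of these names come from Direct3D; they mirror the rules listed above.
struct PassState {
    bool colorWritesEnabled   = false;  // D3DRS_COLORWRITEENABLE != 0
    bool multisampleEnabled   = false;  // 2x or 4x antialiasing active
    bool usesTexkill          = false;  // any fragment uses texkill
    bool replacesDepth        = false;  // any fragment writes its own depth
    bool alphaTestEnabled     = false;  // D3DRS_ALPHATESTENABLE
    bool usesColorKey         = false;  // color key on an active texture
    bool userClipPlanesActive = false;  // user clip planes enabled
    bool floatRenderTarget    = false;  // floating-point render target bound
    bool pixelShaderBound     = false;  // a pixel shader is set
};

// True only if every on/off rule from the list is satisfied.
bool qualifiesForDoubleSpeedZ(const PassState& s) {
    return !s.colorWritesEnabled
        && !s.multisampleEnabled
        && !s.usesTexkill
        && !s.replacesDepth
        && !s.alphaTestEnabled
        && !s.usesColorKey
        && !s.userClipPlanesActive
        && !s.floatRenderTarget
        && !s.pixelShaderBound;
}
```

A default-constructed `PassState` passes the check; flipping any single flag makes it fail, which is a convenient way to find out which state setting is costing the fast path.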

3.6.2. Early-Z Optimization
Early-z optimization (sometimes called “z-cull”) improves performance by
avoiding the rendering of occluded surfaces. If the occluded surfaces have
expensive shaders applied to them, early-z optimization can save a large amount
of computation time. To take advantage of early-z optimization, follow these
guidelines:
Don’t create triangles with holes in them (that is, avoid alpha test or texkill)
Don’t modify depth (that is, allow the GPU to use the interpolated depth
value)
Violating these rules can invalidate the data the GPU uses for early
optimization, and can disable early-z optimization until the depth buffer
is cleared again.
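
To make the saving concrete, here is a toy software model of a depth test that runs before shading. It is a sketch under simplifying assumptions (one depth value per pixel, a "less than" depth test, a far plane at 1.0); `Fragment` and `shadedWithEarlyZ` are made-up names, and real hardware culls at a coarser granularity than this.

```cpp
#include <cstddef>
#include <vector>

// One candidate fragment: which pixel it lands on and its interpolated depth.
struct Fragment {
    std::size_t pixel;
    float depth;
};

// Counts how many fragments would still run the (expensive) shader when the
// depth test is applied first. Without early-z, every fragment is shaded.
std::size_t shadedWithEarlyZ(const std::vector<Fragment>& fragments,
                             std::size_t pixelCount) {
    std::vector<float> depthBuffer(pixelCount, 1.0f);  // cleared to the far plane
    std::size_t shaded = 0;
    for (const Fragment& f : fragments) {
        if (f.depth < depthBuffer[f.pixel]) {  // visible so far: shade it
            depthBuffer[f.pixel] = f.depth;
            ++shaded;
        }
        // otherwise the fragment is occluded and skipped before shading
    }
    return shaded;
}
```

In this model the benefit depends on submission order: three overlapping fragments drawn nearest-first cost one shader run, while the same three drawn farthest-first cost three.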

QiKi · posted 2004-6-9 15:54:00

Which document is this from?

secondage · posted 2004-6-9 17:25:00

NVIDIA GPU Programming Guide

hourousha · posted 2004-6-9 19:18:00

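I've known about double-speed z/stencil and early z-culling for a long time.
Contrary to NV, ATI's R3xx gets double-speed z/stencil once FSAA 2x or 4x is turned on. Its early z-culling rules are basically the same as NV's.
You should say more about shader performance optimization; that is what deserves the most attention when developing for NV3x.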

secondage · posted 2004-6-10 10:00:00

Contrary to NV, ATI's R3xx gets double-speed z/stencil once FSAA 2x or 4x is turned on

What is the basis for this statement? I haven't seen that document.

hourousha · posted 2004-6-10 13:05:00

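secondage: Re: NVIDIA GPU performance optimization

Contrary to NV, ATI's R3xx gets double-speed z/stencil once FSAA 2x or 4x is turned on

What is the basis for this statement? I haven't seen that document.

I measured it; see the thread below. Also, on NV40 the z/stencil doubling doesn't seem to be lost with AA enabled either.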
http://bbs.gzeasy.com/index.php?showtopic=177098&st=48

secondage · posted 2004-6-10 14:28:00

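Heh, if only double-speed depth rendering could be kept while AA is on.
With today's AA methods that is impossible. The most important condition for double-speed depth rendering is that writes to the frame buffer are turned off, so that all the pixel units can process depth values. The current AA algorithms of both ATI and NV have to access the frame buffer, so I'd really like to know how you measured it.

This thread is only meant for posting translations; please open a separate thread for discussion.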

secondage · posted 2004-6-10 19:20:00

3.4.1. Choose the Lowest Pixel Shader Version That Works
Choose the lowest pixel shader version for accomplishing your goals so that
your application looks good on the widest range of hardware. It is often wise to
choose specific shaders that you will use to differentiate your content on high-end
hardware.
Note: in my own tests, on a lot of hardware pixel shader 2.x runs nearly 40% slower
than pixel shader 1.1.
3.4.2. Compile Pixel Shaders Using the ps_2_a Profile
Microsoft’s HLSL compiler (fxc.exe) adds chip-specific optimizations based
on the profile that you’re compiling for. If you’re using a GeForce FX GPU and
your shaders require ps_2_0 or higher, you should use the ps_2_a profile,
which is a superset of ps_2_0 functionality that directly corresponds to the
GeForce FX family. Compiling to the ps_2_a profile will probably give you
better performance than compiling to the generic ps_2_0 profile. Please note
that the ps_2_a profile was only available starting with the July 2003 HLSL
release.
In general, you should use the latest version of fxc (with DirectX 9.0c or
newer), since Microsoft will add smarter compilation and fix bugs with each
release. For GeForce 6 Series GPUs, simply compiling with the appropriate
profile and latest compiler is sufficient.
Note: the July 2003 HLSL compiler, the first one to support ps_2_a, ships with the
DX9 SDK Update (Summer 2003).
3.4.3. Choose the Lowest Data Precision That Works
Another factor that affects both performance and quality is the precision used
for operations and registers. The GeForce FX and GeForce 6 Series GPUs
support 32-bit and 16-bit floating point formats (called float and half,
respectively), and a 12-bit fixed-point format (called fixed). The float
type is very IEEE-like, with an s23e8 format. The half is also IEEE-like, in
an s10e5 format. The 12-bit fixed type covers a range from [-2,2) and is not
available in the ps_2_0 and higher profiles. The fixed type is available with
the ps_1_0 to ps_1_4 profiles in Direct3D, and with either the
NV_fragment_program extension or Cg in OpenGL.
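
To get a feel for what the fixed type can represent, the sketch below snaps a value onto a 12-bit grid over [-2, 2). It assumes the 4096 codes are spread uniformly over that range, which gives a step of 1/1024; the guide text quoted above states only the bit count and the range, so the uniform layout and the name `quantizeFixed12` are assumptions for illustration.

```cpp
#include <algorithm>
#include <cmath>

// Assumption: 12 bits spread uniformly over [-2, 2) -> 4 / 4096 = 1/1024 per step.
const double kFixedStep = 4.0 / 4096.0;

// Rounds x to the nearest representable step and clamps it into [-2, 2).
double quantizeFixed12(double x) {
    const double lowest  = -2.0;
    const double highest = 2.0 - kFixedStep;  // 2.0 itself is outside the range
    const double snapped = std::round(x / kFixedStep) * kFixedStep;
    return std::min(highest, std::max(lowest, snapped));
}
```

Values such as normalized colors in [0, 1] fit comfortably, while anything outside [-2, 2) is clamped, which is why the type suits low-precision color math rather than general arithmetic.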

The performance of these various types varies with precision:
The fixed type is fastest and should be used for low-precision
calculations, such as color computation.
If you need floating-point precision, the half type delivers higher
performance than the float type. Prudent use of the half type can triple
frame rates, with more than 99% of the rendered pixels within one least-significant
bit (LSB) of a fully 32-bit result in most applications!
If you need the highest possible accuracy, use the float type.
Note: "needing floating-point precision" here means needing a dynamic range beyond -2 to 2.
You can use the /Gpp flag (available in the July 2003 HLSL update) to force
everything in your shaders to half precision. After you get your shaders
working and follow the tips in this section, enable this flag to see its effect on
performance and quality. If no errors appear, leave this flag enabled. Otherwise,
manually demote to half precision when it is beneficial (/Gpp provides an
upper performance bound that you can work toward).
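
As an illustration of how the profile and the /Gpp switch fit together on the fxc command line, a build script might assemble the command roughly as follows. This is a sketch, not a build recipe: `fxcCommand` is a hypothetical helper, and the file names and entry point are placeholders. /T selects the target profile, /E the entry point, and /Fo the output file.

```cpp
#include <string>

// Builds an fxc command line that compiles a pixel shader for the ps_2_a
// profile, optionally forcing partial (half) precision with /Gpp.
std::string fxcCommand(const std::string& sourceFile,
                       const std::string& entryPoint,
                       const std::string& outputFile,
                       bool forceHalfPrecision) {
    std::string command = "fxc /T ps_2_a /E " + entryPoint;
    if (forceHalfPrecision) {
        command += " /Gpp";  // try this first, then compare image quality
    }
    command += " /Fo " + outputFile + " " + sourceFile;
    return command;
}
```

Compiling the same shader with and without the flag and comparing both the frame rate and the rendered image is the experiment the paragraph above describes.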
When you use the half or fixed types, make sure you use them for varying
parameters, uniform parameters, variables, and constants. If you’re using
assembly language with the ps_2_0 profile, use the _pp modifier to reduce the
precision of your calculations.
Many color-based operations can be performed with the fixed or half data
types without any loss of precision (for example, a tex2D*diffuseColor
operation).
In OpenGL, you can speed up shaders consisting of mostly floating-point
operations by doing operations (like dot products of normalized vectors) in
fixed-point precision.

For instance, the result of any normalize can be half-precision, as can colors.
Positions can be half-precision as well, but they may need to be scaled in the
vertex shader to make the relevant values near zero.
For instance, moving values to local tangent space, and then scaling positions
down can eliminate banding artifacts seen when very large positions are
converted to half precision.
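
The effect of half precision on colors and on large positions can be checked numerically. The sketch below approximates s10e5 by keeping 11 significant bits (10 stored plus the implicit one) and ignores the limited exponent range and denormals, so it is a rough model rather than an exact half implementation. All function names are made up for this example, `double` stands in for the full-precision result, and subtracting a local origin stands in for the move to local space.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>

// Rounds x to 11 significant bits, mimicking the mantissa of an s10e5 half.
double roundToHalfMantissa(double x) {
    if (x == 0.0) return 0.0;
    int exponent = 0;
    double mantissa = std::frexp(x, &exponent);         // |mantissa| in [0.5, 1)
    mantissa = std::round(mantissa * 2048.0) / 2048.0;  // 2^11 steps
    return std::ldexp(mantissa, exponent);
}

// Absolute error introduced by storing x at half-like precision.
double halfError(double x) {
    return std::fabs(roundToHalfMantissa(x) - x);
}

// Converts a color channel in [0, 1] to an 8-bit value.
int toByte(double channel) {
    return static_cast<int>(std::lround(channel * 255.0));
}

// Multiplies every pair of 8-bit texture and diffuse values at full and at
// half-like precision and returns the largest difference in 8-bit output.
int maxColorLsbError() {
    int worst = 0;
    for (int t = 0; t <= 255; ++t) {
        for (int d = 0; d <= 255; ++d) {
            const double tex = t / 255.0;
            const double diffuse = d / 255.0;
            const int full = toByte(tex * diffuse);
            const int half = toByte(roundToHalfMantissa(
                roundToHalfMantissa(tex) * roundToHalfMantissa(diffuse)));
            worst = std::max(worst, std::abs(full - half));
        }
    }
    return worst;
}
```

In this model a tex*diffuse product never ends up more than one 8-bit step away from the full-precision result. A position such as 5000.37 loses its entire fractional part when stored directly, whereas the same point expressed relative to a nearby origin (0.37) keeps an error on the order of 0.0001.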
