OP | Posted 2004-6-10 19:20:00
Re: NVIDIA GPU performance optimization
3.4.1. Choose the Lowest Pixel Shader Version That Works
Choose the lowest pixel shader version for accomplishing your goals so that
your application looks good on the widest range of hardware. It is often wise to
choose specific shaders that you will use to differentiate your content on
high-end hardware.
Pick the lowest pixel shader version that can do what you need (in my own tests,
on a lot of hardware pixel shader 2.x runs nearly 40% slower than pixel shader 1.1).
3.4.2. Compile Pixel Shaders Using the ps_2_a Profile
Microsoft’s HLSL compiler (fxc.exe) adds chip-specific optimizations based
on the profile that you’re compiling for. If you’re using a GeForce FX GPU and
your shaders require ps_2_0 or higher, you should use the ps_2_a profile,
which is a superset of ps_2_0 functionality that directly corresponds to the
GeForce FX family. Compiling to the ps_2_a profile will probably give you
better performance than compiling to the generic ps_2_0 profile. Please note
that the ps_2_a profile was only available starting with the July 2003 HLSL
release.
In general, you should use the latest version of fxc (with DirectX 9.0c or
newer), since Microsoft will add smarter compilation and fix bugs with each
release. For GeForce 6 Series GPUs, simply compiling with the appropriate
profile and latest compiler is sufficient.
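Microsoft's HLSL compiler (fxc.exe) adds optimizations for specific chips.
If you are on a GeForce FX series GPU and your pixel shader needs ps 2.0 or higher,
you can compile with the ps_2_a profile in fxc.
Because ps_2_a is a superset of ps_2_0, the asm it produces will likely run
faster on a GeForce FX GPU. Note that ps_2_a is only supported by HLSL compilers
from July 2003 onward (it ships with the DX9 SDK Update, Summer 2003).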
3.4.3. Choose the Lowest Data Precision That Works
Another factor that affects both performance and quality is the precision used
for operations and registers.
Another factor in the balance between performance and quality is register precision.
The GeForce FX and GeForce 6 Series GPUs
support 32-bit and 16-bit floating point formats (called float and half,
respectively), and a 12-bit fixed-point format (called fixed).
GeForce FX and GeForce 6 Series GPUs support 32-bit and 16-bit floating-point
formats (called float and half), plus a 12-bit fixed-point format (called fixed).
The float type is very IEEE-like, with an s23e8 format. The half is also
IEEE-like, in an s10e5 format. The 12-bit fixed type covers a range from [-2,2)
and is not available in the ps_2_0 and higher profiles. The fixed type is
available with the ps_1_0 to ps_1_4 profiles in Direct3D, and with either the
NV_fragment_program extension or Cg in OpenGL.
Both float and half are close to the IEEE formats: float is s23e8 and half is s10e5.
The 12-bit fixed-point format covers [-2, 2) and cannot be used in ps 2.0 or higher.
In D3D only ps 1.0 through ps 1.4 support it; in OpenGL you get it through Cg or
the NV_fragment_program extension.
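To get a feel for the three formats, here is a minimal Python sketch that rounds one value to each of them. It is only a numerical model, not shader code: `to_float32` and `to_half` use the standard `struct` module, while `to_fixed` and `FIXED_STEP` are hypothetical helpers that assume the 12 bits are spread evenly over [-2, 2) (a step of 4/4096 = 1/1024) with round-to-nearest and clamping. The guide does not spell out the hardware's rounding, so treat that part as an assumption.

```python
import struct

def to_float32(x: float) -> float:
    """Round x to the nearest 32-bit float (s23e8)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_half(x: float) -> float:
    """Round x to the nearest 16-bit float (s10e5)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Assumption: 12 bits spread evenly over [-2, 2) gives a step of 4 / 4096.
FIXED_STEP = 1.0 / 1024.0

def to_fixed(x: float) -> float:
    """Hypothetical model of 'fixed': round to the step, clamp to [-2, 2)."""
    q = round(x / FIXED_STEP) * FIXED_STEP
    return max(-2.0, min(2.0 - FIXED_STEP, q))

value = 1.2345678
as_float = to_float32(value)
as_half = to_half(value)
as_fixed = to_fixed(value)

print("float:", as_float, "error:", abs(as_float - value))
print("half: ", as_half, "error:", abs(as_half - value))
print("fixed:", as_fixed, "error:", abs(as_fixed - value))
print("fixed clamps 5.0 to:", to_fixed(5.0))
```

Under this model half and fixed have similar resolution near 1.0; the real difference is range, since fixed stops just short of 2.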
The performance of these various types varies with precision:
The fixed type is fastest and should be used for low-precision
calculations, such as color computation.
The fixed format is the fastest but also the least precise; it is generally
used for low-precision color math.
If you need floating-point precision, the half type delivers higher
performance than the float type. Prudent use of the half type can triple
frame rates, with more than 99% of the rendered pixels within one
least-significant bit (LSB) of a fully 32-bit result in most applications!
If your program needs floating-point precision (a dynamic range beyond [-2, 2)),
half gives you better performance. Used carefully, half can triple the frame rate,
and in most applications more than 99% of rendered pixels still land within one
LSB of the full 32-bit result.
If you need the highest possible accuracy, use the float type.
If you need the most accurate results, use float.
You can use the /Gpp flag (available in the July 2003 HLSL update) to force
everything in your shaders to half precision. After you get your shaders
working and follow the tips in this section, enable this flag to see its effect on
performance and quality. If no errors appear, leave this flag enabled. Otherwise,
manually demote to half precision when it is beneficial (/Gpp provides an
upper performance bound that you can work toward).
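With HLSL compilers from July 2003 onward you can use the /Gpp switch to force
everything in the shader to half. If the result does not meet your needs, demote
to half by hand where it helps.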
When you use the half or fixed types, make sure you use them for varying
parameters, uniform parameters, variables, and constants. If you’re using
assembly language with the ps_2_0 profile, use the _pp modifier to reduce the
precision of your calculations.
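When you use half or fixed, make sure varying parameters, uniform parameters,
variables, and constants all use that same format. If you are writing ps 2.0
assembly, use the _pp modifier suffix to lower the precision of your calculations.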
Many color-based operations can be performed with the fixed or half data
types without any loss of precision (for example, a tex2D*diffuseColor
operation).
Many color-based operations can run in fixed or half without losing any
precision, for example tex2D*diffuseColor.
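A rough way to check the color claim is to emulate a texture-times-diffuse multiply on 8-bit channel values, once at full precision and once with every intermediate rounded to half, then compare the 8-bit outputs. The sketch below does that in Python for all 256 x 256 input pairs. The helper names (`modulate_full`, `modulate_half`, `to_8bit`) are made up for illustration, and the 8-bit framebuffer is an assumption. It is a numerical experiment, not a proof and not shader code.

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest 16-bit float (s10e5)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_8bit(c: float) -> int:
    """Quantize a [0, 1] color channel to an 8-bit framebuffer value."""
    return int(round(c * 255.0))

def modulate_full(tex: int, diffuse: int) -> int:
    """tex * diffuse at full (Python double) precision."""
    return to_8bit((tex / 255.0) * (diffuse / 255.0))

def modulate_half(tex: int, diffuse: int) -> int:
    """Same multiply, with inputs and product rounded to half."""
    t = to_half(tex / 255.0)
    d = to_half(diffuse / 255.0)
    return to_8bit(to_half(t * d))

total = 0
exact = 0
worst = 0
for tex in range(256):
    for diffuse in range(256):
        diff = abs(modulate_full(tex, diffuse) - modulate_half(tex, diffuse))
        total += 1
        exact += (diff == 0)
        worst = max(worst, diff)

print("pairs tested:", total)
print("identical 8-bit results:", exact)
print("worst difference in 8-bit LSBs:", worst)
```

The worst-case difference is the number to watch: if it never exceeds one 8-bit step, running this operation in half is hard to tell apart from float on screen.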
In OpenGL, you can speed up shaders consisting of mostly floating-point
operations by doing operations (like dot products of normalized vectors) in
fixed-point precision.
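In OpenGL you can speed up shaders that are mostly floating-point operations by
running some of them (for example, dot products of unit vectors) in the fixed format.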
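The reason a dot product of normalized vectors suits fixed is that every input and the result stay inside [-1, 1], well within the [-2, 2) range. Here is a small Python sketch of that idea. `to_fixed` is a hypothetical model of the fixed type (step 1/1024, round-to-nearest, clamped), and the two vectors are arbitrary sample data, so this only shows the size of the rounding error under those assumptions.

```python
import math

# Assumption: 12 bits spread evenly over [-2, 2) gives a step of 4 / 4096.
FIXED_STEP = 1.0 / 1024.0

def to_fixed(x: float) -> float:
    """Hypothetical model of 'fixed': round to the step, clamp to [-2, 2)."""
    q = round(x / FIXED_STEP) * FIXED_STEP
    return max(-2.0, min(2.0 - FIXED_STEP, q))

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def dot_full(a, b) -> float:
    """Dot product at full (Python double) precision."""
    return sum(x * y for x, y in zip(a, b))

def dot_fixed(a, b) -> float:
    """Dot product with inputs, products, and the running sum kept in fixed."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_fixed(to_fixed(x) * to_fixed(y))
        acc = to_fixed(acc + product)
    return acc

normal = normalize([1.0, 2.0, 3.0])       # sample surface normal
light_dir = normalize([0.5, 1.0, 0.25])   # sample light direction

error = abs(dot_full(normal, light_dir) - dot_fixed(normal, light_dir))
print("full: ", dot_full(normal, light_dir))
print("fixed:", dot_fixed(normal, light_dir))
print("error:", error)
```

An error of a few thousandths is typically below one step of an 8-bit color channel, which is why lighting terms like N.L are reasonable candidates for fixed.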
For instance, the result of any normalize can be half-precision, as can colors.
Positions can be half-precision as well, but they may need to be scaled in the
vertex shader to make the relevant values near zero.
For instance, moving values to local tangent space, and then scaling positions
down can eliminate banding artifacts seen when very large positions are
converted to half precision.
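For example, the result of any normalize can be kept in half precision, and so
can colors. Positions also work fine in half, but you may need to scale them
toward zero in the vertex shader. For instance, moving a very large value such
as a position into local tangent space and then scaling it down gets rid of the
ugly banding that converting to half would otherwise cause.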
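The banding comes from how coarse half gets at large magnitudes: with only 10 mantissa bits, values between 512 and 1024 are spaced 0.5 apart. A minimal Python sketch of the effect follows, using the standard `struct` module to round to half. `world_x` and `local_origin` are made-up sample numbers, and the subtraction stands in for whatever re-centering the vertex shader would do at full precision.

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest 16-bit float (s10e5)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

world_x = 1000.3        # sample world-space coordinate, far from zero
local_origin = 1000.0   # hypothetical origin of the local space

# Converting the large value directly: half can only step by 0.5 here.
half_world = to_half(world_x)
err_world = abs(half_world - world_x)

# Re-center at full precision first (as a vertex shader would), then convert.
local_x = world_x - local_origin
half_local = to_half(local_x)
err_local = abs(half_local - local_x)

print("direct to half:", half_world, "error:", err_world)
print("re-centered:   ", half_local, "error:", err_local)
```

In this sample the re-centered value keeps roughly three more decimal digits, which is the difference between visible steps and a smooth gradient.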