Implementing Sin and Cos in DirectX

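On CPUs, sine and cosine instructions are typically implemented with the CORDIC algorithm. GPUs are much faster at vector arithmetic, so they usually evaluate a truncated Taylor series expansion instead. Direct3D's SINCOS instruction is an example.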

SINCOS Instruction

The SINCOS instruction computes sine and cosine, in radians. The X component of the result contains cos(x); the Y component contains sin(x).

Format

Instruction token that contains D3DSIO_SINCOS. Instruction length is 4.

Destination parameter token using the D3DSPR_TEMP register type.

First source parameter token. Requires explicit use of a replicate swizzle; that is, the X, Y, Z, or W swizzle component (or the R, G, B, or A equivalent) must be specified.

The following source tokens are used only by pixel and vertex shader versions earlier than 3_0. For pixel and vertex shader version 3_0 and later, only the first source parameter token is used.

Second source parameter token using the D3DSPR_TEMP register type.

Third source parameter token using the D3DSPR_TEMP register type.

Comments

The second and third sources could be used as temporary registers.

Source register rules:

src1.selected_channel is an angle measured in radians between -Pi and +Pi.

src2 = (−1.f/(7!*128), −1.f/(6!*64), 1.f/(4!*16), 1.f/(5!*32)).

src3 = (−1.f/(3!*8), −1.f/(2!*4), 1.f, 0.5f).

The ordering of the last two numbers in src2 and src3 is chosen specifically to accommodate pixel shader 2.0, which also has the SINCOS macro. Storing these two numbers in reversed order lets the macro expansion use one of the few custom source swizzles available in ps_2_0 (vertex shaders support arbitrary swizzles, so there is no issue there). As a result, the same custom constants can be used regardless of where sincos is used.
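To make the constants concrete, the following Python sketch evaluates them numerically. It takes the second component of src3 as −1/(2!·4) = −0.125, which is the x² coefficient of cos(x/2) in expansion (3) below (with 2!·8 the polynomial would not match (3)). The names SRC2 and SRC3 are illustrative, not part of any Direct3D header.

```python
from math import factorial

# src2 = (-1/(7!*128), -1/(6!*64), 1/(4!*16), 1/(5!*32))
SRC2 = (-1.0 / (factorial(7) * 128),
        -1.0 / (factorial(6) * 64),
        1.0 / (factorial(4) * 16),
        1.0 / (factorial(5) * 32))

# src3 = (-1/(3!*8), -1/(2!*4), 1, 0.5); the y component is assumed to be
# -1/(2!*4), the x^2 coefficient of cos(x/2)
SRC3 = (-1.0 / (factorial(3) * 8),
        -1.0 / (factorial(2) * 4),
        1.0,
        0.5)

if __name__ == "__main__":
    for name, vec in (("src2", SRC2), ("src3", SRC3)):
        print(name, ", ".join(f"{c:.10g}" for c in vec))
```

The two tuples keep the same component order as the registers, so the reversed last pair (w, z) is visible directly.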

Destination register rules:

dest.x = cos(src1.selected_channel), dest.y = sin(src1.selected_channel), dest.z is undefined after the instruction.

dest should not be the same register as src1.

Only X and Y are allowed to be in the destination write mask.

The SINCOS instruction is a macro instruction that takes eight instruction slots.


The maximum absolute error is 0.002.

Operation

The following shows the Taylor series for sin(x) and cos(x):

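$$
\begin{aligned}
\cos(x) &\approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} \\
\sin(x) &\approx x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} = x\left(1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \frac{x^6}{7!}\right)
\end{aligned}
\tag{1}
$$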
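Why the half-angle step is needed can be seen numerically. This Python sketch evaluates the truncated series (1) directly at x and compares them with `math.cos` and `math.sin`; `cos_series` and `sin_series` are illustrative names. Near 0 the truncation is very accurate, but at x = π the cosine error is on the order of 0.2.

```python
import math

def cos_series(x):
    """Truncated series (1): 1 - x^2/2! + x^4/4! - x^6/6!."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(4))

def sin_series(x):
    """Truncated series (1): x - x^3/3! + x^5/5! - x^7/7!."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(4))

if __name__ == "__main__":
    for x in (0.5, 1.5, math.pi):
        print(f"x={x:.4f}  cos err={abs(cos_series(x) - math.cos(x)):.2e}"
              f"  sin err={abs(sin_series(x) - math.sin(x)):.2e}")
```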

To increase precision, the series are evaluated at x/2, and the double-angle identities then recover cos(x) and sin(x):

$$
\begin{aligned}
\cos(x) &= 1 - 2\sin^2(x/2) \\
\sin(x) &= 2\sin(x/2)\cos(x/2)
\end{aligned}
\tag{2}
$$

Substituting x/2 for x in (1) gives:

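$$
\begin{aligned}
\cos(x/2) &\approx 1 - \frac{x^2}{2!\cdot 4} + \frac{x^4}{4!\cdot 16} - \frac{x^6}{6!\cdot 64} \\
\sin(x/2) &\approx \frac{x}{2} - \frac{x^3}{3!\cdot 8} + \frac{x^5}{5!\cdot 32} - \frac{x^7}{7!\cdot 128} \\
&= x\left(0.5 - \frac{x^2}{3!\cdot 8} + \frac{x^4}{5!\cdot 32} - \frac{x^6}{7!\cdot 128}\right)
\end{aligned}
\tag{3}
$$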

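Let's write (3) in vector form (Horner's scheme). Here a, b, c, d are 2D constant vectors: the first component holds the coefficients of sin(x/2)/x and the second those of cos(x/2), so a = (0.5, 1), b = (−1/(3!*8), −1/(2!*4)), c = (1/(5!*32), 1/(4!*16)), d = (−1/(7!*128), −1/(6!*64)):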

$$a + x^2 b + x^4 c + x^6 d = a + x^2\left(b + x^2\left(c + x^2 d\right)\right)$$
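The Horner form can be sketched in Python with 2-component tuples standing in for the 2D vectors. The coefficient vectors follow (3), with component 0 holding sin(x/2)/x and component 1 holding cos(x/2); `horner2` and `half_angle` are hypothetical helper names.

```python
import math

# Hypothetical 2D coefficient vectors from (3):
# component 0 -> sin(x/2)/x, component 1 -> cos(x/2)
A = (0.5, 1.0)
B = (-1 / (math.factorial(3) * 8), -1 / (math.factorial(2) * 4))
C = (1 / (math.factorial(5) * 32), 1 / (math.factorial(4) * 16))
D = (-1 / (math.factorial(7) * 128), -1 / (math.factorial(6) * 64))

def horner2(x2, a, b, c, d):
    """Evaluate a + x2*(b + x2*(c + x2*d)) independently for both components."""
    return tuple(ai + x2 * (bi + x2 * (ci + x2 * di))
                 for ai, bi, ci, di in zip(a, b, c, d))

def half_angle(x):
    """Return approximations of (sin(x/2), cos(x/2)) using three nested steps."""
    p = horner2(x * x, A, B, C, D)
    return x * p[0], p[1]
```

Only x² is needed inside the nesting; the single extra multiply by x turns component 0 into sin(x/2), which is exactly the split the macro below makes.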

The following shows the implementation for SINCOS:

SRC2 should be constant (-1.f/(7!*128), -1.f/(6!*64), 1.f/(4!*16), 1.f/(5!*32))

SRC3 should be constant (-1.f/(3!*8), -1.f/(2!*4), 1.f, 0.5f)

VECTOR v1 = EvalSource(SRC1);
VECTOR v2 = EvalSource(SRC2);
VECTOR v3 = EvalSource(SRC3);
VECTOR v;

MUL v.z, v1.w, v1.w ; x*x
MAD v.xy, v.z, v2.xy, v2.wz
MAD v.xy, v.xy, v.z, v3.xy
MAD v.xy, v.xy, v.z, v3.wz ; Partial sin(x/2) and final cos(x/2)
MUL v.x, v.x, v1.w ; sin(x/2)
MUL v.xy, v.xy, v.x ; compute sin(x/2)*sin(x/2) and sin(x/2)*cos(x/2)
ADD v.xy, v.xy, v.xy ; 2*sin(x/2)*sin(x/2) and 2*sin(x/2)*cos(x/2)
ADD v.x, -v.x, v3.z ; cos(x) and sin(x)

WriteResult(v, DST);
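The listing can be emulated in Python one statement per instruction. This is a sketch under two assumptions: each instruction reads all of its sources before writing (so `MUL v.xy, v.xy, v.x` uses the old v.x for both components), and src3.y is −1/(2!·4) = −0.125, the x² coefficient of cos(x/2) in (3). The names `sincos_macro`, `SRC2` and `SRC3` are hypothetical. Scanning [−π, π] gives a worst-case absolute error that can be compared with the 0.002 bound quoted above.

```python
import math

# Constants in the register layout of src2/src3 (src3.y assumed to be -1/(2!*4))
SRC2 = (-1 / (math.factorial(7) * 128), -1 / (math.factorial(6) * 64),
        1 / (math.factorial(4) * 16), 1 / (math.factorial(5) * 32))
SRC3 = (-1 / (math.factorial(3) * 8), -1 / (math.factorial(2) * 4), 1.0, 0.5)
X, Y, Z, W = range(4)

def sincos_macro(angle):
    """Emulate the SINCOS macro step by step; returns (cos(angle), sin(angle))."""
    v2, v3 = SRC2, SRC3
    x = angle                        # v1.w (replicate swizzle: every channel = angle)
    vz = x * x                       # MUL v.z, v1.w, v1.w
    vx = vz * v2[X] + v2[W]          # MAD v.xy, v.z, v2.xy, v2.wz
    vy = vz * v2[Y] + v2[Z]
    vx = vx * vz + v3[X]             # MAD v.xy, v.xy, v.z, v3.xy
    vy = vy * vz + v3[Y]
    vx = vx * vz + v3[W]             # MAD v.xy, v.xy, v.z, v3.wz
    vy = vy * vz + v3[Z]             #   vx: sin(x/2)/x, vy: cos(x/2)
    vx = vx * x                      # MUL v.x, v.x, v1.w -> sin(x/2)
    vx, vy = vx * vx, vy * vx        # MUL v.xy, v.xy, v.x (both use the old v.x)
    vx, vy = vx + vx, vy + vy        # ADD v.xy, v.xy, v.xy
    vx = -vx + v3[Z]                 # ADD v.x, -v.x, v3.z -> cos(x)
    return vx, vy

if __name__ == "__main__":
    xs = [-math.pi + 2 * math.pi * i / 10000 for i in range(10001)]
    err = max(max(abs(c - math.cos(t)), abs(s - math.sin(t)))
              for t in xs for c, s in [sincos_macro(t)])
    print(f"max abs error on [-pi, pi]: {err:.5f}")
```

With these constants the largest error shows up in sin(x) near ±π, where cos(x/2) is close to 0 and its truncation error is doubled by the final identity.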

If an application must compute SINCOS for an arbitrary angle, the angle can be mapped to the range -Pi…+Pi by using the following macro (r0.x holds the original angle):

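def c0, 3.14159265, 0.5, 6.28318531, 0.159154943 ; Pi, 0.5, 2*Pi, 1/(2*Pi)
mad r0.x, r0.x, c0.w, c0.y ; r0.x = angle/(2*Pi) + 0.5
frc r0.x, r0.x ; fractional part, in [0, 1)
mad r0.x, r0.x, c0.z, -c0.x ; r0.x = frac*2*Pi - Pi, in [-Pi, Pi)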
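The same three-instruction reduction can be sketched in Python, assuming `frc` returns v − floor(v), the fractional part in [0, 1). `C0`, `frc` and `wrap_angle` are illustrative names. Because the angle is shifted by half a period before taking the fractional part, the result lands in [−π, π) and differs from the input by a whole multiple of 2π, so sine and cosine are unchanged.

```python
import math

C0 = (math.pi, 0.5, 2 * math.pi, 1 / (2 * math.pi))  # c0 = (Pi, 0.5, 2*Pi, 1/(2*Pi))

def frc(v):
    """Fractional part as assumed for frc: v - floor(v), in [0, 1)."""
    return v - math.floor(v)

def wrap_angle(angle):
    """Map an arbitrary angle into [-pi, pi) following the mad/frc/mad sequence."""
    r = angle * C0[3] + C0[1]    # mad r0.x, r0.x, c0.w, c0.y
    r = frc(r)                   # frc r0.x, r0.x
    return r * C0[2] - C0[0]     # mad r0.x, r0.x, c0.z, -c0.x

if __name__ == "__main__":
    for a in (0.3, 10.0, -7.5, 100.0):
        print(a, "->", wrap_angle(a))
```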

Requirements

Available in Windows Vista and later versions of the Windows operating systems.
