Compute shaders

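Compute shaders are programs that run on the graphics card, outside the normal rendering pipeline. They are used for massively parallel GPU algorithms, or to accelerate parts of game rendering. Using them efficiently requires a solid understanding of GPU architecture and parallel algorithms, as well as of DirectCompute, OpenGL Compute, CUDA, or OpenCL.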

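Compute shaders in Unity closely match DX11 DirectCompute technology. They work on the following platforms:

1. Windows, with the DX11 or DX12 graphics API and a shader model 4.5 GPU;

2. macOS and iOS using the Metal graphics API;

3. Android, Linux, and Windows with the Vulkan API;

4. Modern OpenGL platforms (OpenGL 4.3 on Linux and Windows; OpenGL ES 3.1 on Android). Note that Mac OS X does not support OpenGL 4.3;

5. Modern consoles (Sony PS4 and Microsoft Xbox One).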

To check at runtime whether compute shaders are supported, use SystemInfo.supportsComputeShaders.
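As a minimal sketch of that runtime check, the component below disables itself when compute shaders are unavailable. The class name ComputeSupportCheck and the field name shader are hypothetical, and the script is assumed to be attached to a GameObject in a Unity project.

```csharp
using UnityEngine;

// Hypothetical component: guards compute work behind a capability check.
public class ComputeSupportCheck : MonoBehaviour
{
    // Assumed to be assigned in the Inspector.
    public ComputeShader shader;

    void Start()
    {
        if (!SystemInfo.supportsComputeShaders || shader == null)
        {
            Debug.LogWarning("Compute shaders are unavailable; disabling this component.");
            enabled = false;
            return;
        }

        // From here on it is safe to look up kernels and dispatch them.
    }
}
```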

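Compute shader assets

Like regular shaders, compute shaders are asset files in your project, with a .compute file extension. They are written in DX11-style HLSL. A #pragma compilation directive indicates which functions are compiled as compute shader kernels, for example: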

#pragma kernel KMain
[numthreads(, , )]
void KMain(uint2 groupId : SV_GroupID, uint2 groupThreadId : SV_GroupThreadID, uint2 dispatchThreadId : SV_DispatchThreadID)
{
    // Upper-left pixel coordinate of quad that this thread will read
    int2 threadUL = (groupThreadId << ) + (groupId << ) - ;

    // Downsample the block
    float2 offset = float2(threadUL);
    float4 p00 = _Source.SampleLevel(sampler_LinearClamp, (offset                    + ) * _Size.zw, );
    float4 p10 = _Source.SampleLevel(sampler_LinearClamp, (offset + float2(, ) + ) * _Size.zw, );
    float4 p01 = _Source.SampleLevel(sampler_LinearClamp, (offset + float2(, ) + ) * _Size.zw, );
    float4 p11 = _Source.SampleLevel(sampler_LinearClamp, (offset + float2(, ) + ) * _Size.zw, );

    // Store the  downsampled pixels in LDS
    uint destIdx = groupThreadId.x + (groupThreadId.y << u);
    Store2Pixels(destIdx     , p00, p10);
    Store2Pixels(destIdx + u, p01, p11);

    GroupMemoryBarrierWithGroupSync();

    // Horizontally blur the pixels in LDS
    uint row = groupThreadId.y << u;
    BlurHorizontally(row + (groupThreadId.x << u), row + groupThreadId.x + (groupThreadId.x & u));

    GroupMemoryBarrierWithGroupSync();

    // Vertically blur the pixels in LDS and write the result to memory
    BlurVertically(dispatchThreadId, (groupThreadId.y << u) + groupThreadId.x);
}

This indicates that the KMain function is compiled as a compute shader kernel. A simpler example:

// test.compute

#pragma kernel FillWithRed

RWTexture2D<float4> res;

[numthreads(1,1,1)]
void FillWithRed (uint3 dtid : SV_DispatchThreadID)
{
    res[dtid.xy] = float4(1,0,0,1);
}

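The language is standard DX11 HLSL. A compute shader asset file must contain at least one compute kernel that can be invoked. It can contain several; add one #pragma kernel line per kernel.

When using #pragma kernel, note that a comment of the form "// ..." on the same line as the directive causes a compilation error.

The kernel name on a #pragma kernel line can be followed by the preprocessor macros to define while compiling that kernel: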

#pragma kernel KernelOne SOME_DEFINE DEFINE_WITH_VALUE=1337
#pragma kernel KernelTwo OTHER_DEFINE

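Invoking compute shaders

In your script, define a variable of type ComputeShader and assign a reference to the asset.

For example, the ComputeShader fields can be declared in a resources class: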

public sealed class ComputeShaders
{
    public ComputeShader exposureHistogram;
    public ComputeShader lut3DBaker;
    public ComputeShader texture3dLerp;
    public ComputeShader gammaHistogram;
    public ComputeShader waveform;
    public ComputeShader vectorscope;
    public ComputeShader multiScaleAODownsample1;
    public ComputeShader multiScaleAODownsample2;
    public ComputeShader multiScaleAORender;
    public ComputeShader multiScaleAOUpsample;
    public ComputeShader gaussianDownsample;
}

In the Inspector for the resources asset, assign the compute shader assets to these fields.

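The calling code follows one pattern: set the parameters first, then invoke the shader with the ComputeShader.Dispatch method. See the ComputeShader class in the Unity scripting documentation for details. When recording into a CommandBuffer, the equivalent method is: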

public void DispatchCompute(ComputeShader computeShader, int kernelIndex, int threadGroupsX, int threadGroupsY, int threadGroupsZ);

The following method uses it to invoke a compute shader:

void PushUpsampleCommands(CommandBuffer cmd, int lowResDepth, int interleavedAO, int highResDepth, int? highResAO, RenderTargetIdentifier dest, Vector3 lowResDepthSize, Vector2 highResDepthSize, bool invert = false)
        {
            var cs = m_Resources.computeShaders.multiScaleAOUpsample;
            int kernel = cs.FindKernel(highResAO == null
                ? invert
                    ? "main_invert"
                    : "main"
                : "main_blendout");

            float stepSize =  / lowResDepthSize.x;
            float bTolerance =  - Mathf.Pow(, m_Settings.blurTolerance.value) * stepSize;
            bTolerance *= bTolerance;
            float uTolerance = Mathf.Pow(, m_Settings.upsampleTolerance.value);
            float noiseFilterWeight =  / (Mathf.Pow(, m_Settings.noiseFilterTolerance.value) + uTolerance);

            cmd.SetComputeVectorParam(cs, "InvLowResolution", new Vector2( / lowResDepthSize.x,  / lowResDepthSize.y));
            cmd.SetComputeVectorParam(cs, "InvHighResolution", new Vector2( / highResDepthSize.x,  / highResDepthSize.y));
            cmd.SetComputeVectorParam(cs, "AdditionalParams", new Vector4(noiseFilterWeight, stepSize, bTolerance, uTolerance));

            cmd.SetComputeTextureParam(cs, kernel, "LoResDB", lowResDepth);
            cmd.SetComputeTextureParam(cs, kernel, "HiResDB", highResDepth);
            cmd.SetComputeTextureParam(cs, kernel, "LoResAO1", interleavedAO);

            if (highResAO != null)
                cmd.SetComputeTextureParam(cs, kernel, "HiResAO", highResAO.Value);

            cmd.SetComputeTextureParam(cs, kernel, "AoResult", dest);

            int xcount = ((int)highResDepthSize.x + ) / ;
            int ycount = ((int)highResDepthSize.y + ) / ;
            cmd.DispatchCompute(cs, kernel, xcount, ycount, );
        }
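The xcount and ycount values passed to DispatchCompute above are thread-group counts, not pixel counts. A common way to derive such a count is to divide the number of items by the group size and round up, so that a partially filled last group is still dispatched. The helper below is a generic sketch of that rounding, not the exact formula used in the listing; the names DispatchMath and GroupCount are hypothetical.

```csharp
using System;

// Hypothetical helper for computing thread-group counts.
public static class DispatchMath
{
    // Smallest number of groups such that groups * groupSize >= itemCount.
    public static int GroupCount(int itemCount, int groupSize)
    {
        if (itemCount < 0) throw new ArgumentOutOfRangeException("itemCount");
        if (groupSize <= 0) throw new ArgumentOutOfRangeException("groupSize");
        return (itemCount + groupSize - 1) / groupSize;
    }
}
```

Because the count is rounded up, the last group may contain threads that fall outside the data, so the kernel is expected to guard against out-of-range indices.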

Closely related to compute shaders are compute buffers (the ComputeBuffer class), used as follows:

if (m_Data == null)
    m_Data = new ComputeBuffer(m_NumBins, sizeof(uint));

var compute = context.resources.computeShaders.gammaHistogram;
var cmd = context.command;
cmd.BeginSample("GammaHistogram");

// Clear the buffer on every frame as we use it to accumulate values on every frame
int kernel = compute.FindKernel("KHistogramClear");
cmd.SetComputeBufferParam(compute, kernel, "_HistogramBuffer", m_Data);
cmd.DispatchCompute(compute, kernel, Mathf.CeilToInt(m_NumBins / (float)m_ThreadGroupSizeX), 1, 1);

For other uses, see the ComputeBuffer class in the Unity scripting documentation.
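The fragment above only shows creation and binding. A sketch of the full life cycle of a compute buffer (create, initialize, bind, dispatch, read back, release) might look like the following. The class name, the bin count of 256, and the assumption that the assigned shader contains a kernel named KHistogramClear with a buffer named _HistogramBuffer are illustrative only.

```csharp
using UnityEngine;

// Hypothetical component illustrating the ComputeBuffer life cycle.
public class HistogramBufferExample : MonoBehaviour
{
    // Assumed to contain a kernel "KHistogramClear" that writes to "_HistogramBuffer".
    public ComputeShader shader;

    const int NumBins = 256; // hypothetical bin count
    ComputeBuffer data;

    void OnEnable()
    {
        // One uint per bin; the stride is given in bytes.
        data = new ComputeBuffer(NumBins, sizeof(uint));
        // Write known values, since new buffer contents are undefined.
        data.SetData(new uint[NumBins]);
    }

    void Update()
    {
        int kernel = shader.FindKernel("KHistogramClear");
        shader.SetBuffer(kernel, "_HistogramBuffer", data);

        // Query the group size declared by [numthreads] instead of hard-coding it.
        uint gx, gy, gz;
        shader.GetKernelThreadGroupSizes(kernel, out gx, out gy, out gz);
        shader.Dispatch(kernel, Mathf.CeilToInt(NumBins / (float)gx), 1, 1);
    }

    // Copies the buffer back to the CPU; this waits for the GPU, so use it sparingly.
    public uint[] ReadBins()
    {
        var bins = new uint[NumBins];
        data.GetData(bins);
        return bins;
    }

    void OnDisable()
    {
        // Compute buffers hold native GPU memory and must be released explicitly.
        if (data != null)
        {
            data.Release();
            data = null;
        }
    }
}
```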

Render textures can also be written to from a compute shader, provided random write access is enabled on them. See RenderTexture.enableRandomWrite.
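As an illustration, the sketch below creates a render texture with random write access and fills it using the FillWithRed kernel from the test.compute listing earlier. The class name and the 256x256 size are hypothetical; the thread-group size is queried from the shader rather than assumed.

```csharp
using UnityEngine;

// Hypothetical component: runs the FillWithRed kernel on a writable render texture.
public class FillWithRedExample : MonoBehaviour
{
    // Assumed: the test.compute asset, assigned in the Inspector.
    public ComputeShader shader;

    RenderTexture target;

    void Start()
    {
        target = new RenderTexture(256, 256, 0);
        // Random write access has to be enabled before the texture is created.
        target.enableRandomWrite = true;
        target.Create();

        int kernel = shader.FindKernel("FillWithRed");
        shader.SetTexture(kernel, "res", target);

        // Derive the number of groups from the kernel's declared group size.
        uint gx, gy, gz;
        shader.GetKernelThreadGroupSizes(kernel, out gx, out gy, out gz);
        int groupsX = (target.width + (int)gx - 1) / (int)gx;
        int groupsY = (target.height + (int)gy - 1) / (int)gy;
        shader.Dispatch(kernel, groupsX, groupsY, 1);
    }

    void OnDestroy()
    {
        if (target != null)
            target.Release();
    }
}
```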

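Texture samplers in compute shaders

In Unity, textures and samplers are not separate objects. To use a sampler in a compute shader, follow one of these Unity-specific rules:

1. Use the texture's name with the prefix sampler. For example, for Texture2D MyTex, declare SamplerState samplerMyTex. The sampler is then initialized with that texture's filter mode, wrap mode, and anisotropic settings.

2. Use a predefined sampler. Its name must contain Linear or Point (the filter mode) and Clamp or Repeat (the wrap mode). For example, SamplerState MyLinearClampSampler creates a sampler with Linear filter mode and Clamp wrap mode. For more information, see the documentation on SamplerState.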
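For the first rule, the sampler state comes from the texture bound on the script side. The sketch below sets those properties before binding; it assumes the compute shader declares Texture2D MyTex and SamplerState samplerMyTex, and the class name and kernel name CSMain are hypothetical.

```csharp
using UnityEngine;

// Hypothetical component: configures the texture whose settings samplerMyTex inherits.
public class SamplerBindingExample : MonoBehaviour
{
    // Assumed to declare: Texture2D MyTex; SamplerState samplerMyTex;
    public ComputeShader shader;
    public Texture2D source;
    public string kernelName = "CSMain"; // hypothetical kernel name

    void Start()
    {
        // These settings are what a sampler named samplerMyTex picks up.
        source.filterMode = FilterMode.Bilinear;
        source.wrapMode = TextureWrapMode.Clamp;
        source.anisoLevel = 1;

        int kernel = shader.FindKernel(kernelName);
        shader.SetTexture(kernel, "MyTex", source);
    }
}
```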

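Cross-platform support

As with regular shaders, Unity can translate compute shaders from HLSL into other shader languages. For the simplest cross-platform builds, write your compute shaders in HLSL. There are, however, several factors to consider.

Cross-platform best practices

DX11 supports many behaviors that other platforms (such as Metal or OpenGL ES) do not. You therefore need to make sure your shaders behave correctly on platforms with less support. Watch out for the following:

1. Out-of-bounds memory accesses. On DX11, an out-of-bounds read returns 0 and causes no problems, but platforms with less support may crash the GPU. Also watch out for DX11-specific tricks: buffer sizes that do not match a multiple of your thread group count, attempts to read neighboring data elements past the beginning or end of a buffer, and similar incompatibilities.

2. Initialize your resources. The contents of new buffers and textures are undefined. Some platforms may provide all zeros; others may provide anything, including NaNs.

3. Bind every resource your compute shader declares. Even if you know for certain that the shader does not use a resource in its current state because of branching, you still need to make sure that resource is bound.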
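The three points above can be sketched on the script side as follows. The resource names (_Input, _Output, _OptionalMask, _Count), the kernel name CSMain, and the element count are all hypothetical; the kernel is assumed to compare its thread index against _Count and return early for extra threads.

```csharp
using UnityEngine;

// Hypothetical component applying the cross-platform precautions listed above.
public class SafeBindingExample : MonoBehaviour
{
    // Assumed to declare _Input, _Output, _OptionalMask and an int _Count.
    public ComputeShader shader;
    public string kernelName = "CSMain"; // hypothetical kernel name

    const int Count = 1000; // hypothetical element count
    ComputeBuffer input;
    ComputeBuffer output;

    void Start()
    {
        input = new ComputeBuffer(Count, sizeof(float));
        output = new ComputeBuffer(Count, sizeof(float));

        // Point 2: new buffers have undefined contents, so write known values first.
        input.SetData(new float[Count]);
        output.SetData(new float[Count]);

        int kernel = shader.FindKernel(kernelName);
        shader.SetBuffer(kernel, "_Input", input);
        shader.SetBuffer(kernel, "_Output", output);

        // Point 3: bind a placeholder even if the current branch never samples it.
        shader.SetTexture(kernel, "_OptionalMask", Texture2D.whiteTexture);

        // Point 1: pass the element count so the kernel can skip out-of-range threads.
        shader.SetInt("_Count", Count);

        uint gx, gy, gz;
        shader.GetKernelThreadGroupSizes(kernel, out gx, out gy, out gz);
        shader.Dispatch(kernel, (Count + (int)gx - 1) / (int)gx, 1, 1);
    }

    void OnDestroy()
    {
        if (input != null) input.Release();
        if (output != null) output.Release();
    }
}
```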

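Platform-specific differences

Metal (iOS and tvOS platforms) does not support atomic operations on textures. Metal also does not support GetDimensions queries on buffers. If needed, pass the buffer size to the shader as a constant.
OpenGL ES 3.1 (Android, iOS, and tvOS platforms) only guarantees support for 4 compute buffers at a time. Actual implementations may support more, but when developing for OpenGL ES you should group related data into a struct rather than giving each kind of data its own buffer.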
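Both recommendations can be combined in one sketch: related per-element data is packed into a single struct buffer, and the element count is passed as a constant instead of being queried with GetDimensions. The Particle struct, the names _Particles and _ParticleCount, and the kernel name CSMain are hypothetical, and a struct with the same field layout is assumed on the shader side.

```csharp
using System.Runtime.InteropServices;
using UnityEngine;

// Hypothetical component: one struct buffer instead of several single-purpose buffers.
public class PackedBufferExample : MonoBehaviour
{
    // Hypothetical per-element data; the shader-side struct must use the same layout.
    struct Particle
    {
        public Vector3 position;
        public Vector3 velocity;
        public float life;
    }

    public ComputeShader shader;
    public string kernelName = "CSMain"; // hypothetical kernel name

    const int Count = 1024; // hypothetical element count
    ComputeBuffer particles;

    void Start()
    {
        // The stride is the size of one struct element in bytes.
        int stride = Marshal.SizeOf(typeof(Particle));
        particles = new ComputeBuffer(Count, stride);
        particles.SetData(new Particle[Count]);

        int kernel = shader.FindKernel(kernelName);
        shader.SetBuffer(kernel, "_Particles", particles);

        // Pass the length explicitly rather than relying on GetDimensions.
        shader.SetInt("_ParticleCount", particles.count);
    }

    void OnDestroy()
    {
        if (particles != null) particles.Release();
    }
}
```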

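HLSL-only and GLSL-only compute shaders

Normally, compute shader files are written in HLSL and are compiled or translated automatically for all required platforms. However, you can prevent translation to other languages, or write GLSL compute shaders by hand.

The following applies only to HLSL-only and GLSL-only compute shaders, not to cross-platform builds, because it causes the compute shader source to be excluded from some platforms.

1. Compute shaders surrounded by CGPROGRAM and ENDCG are not processed for non-HLSL platforms;

2. Compute shaders surrounded by GLSLPROGRAM and ENDGLSL are treated as GLSL source and emitted verbatim. Note that buffers in automatically translated shaders follow HLSL data layout, while manually written GLSL shaders follow GLSL layout rules.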
