
Dissecting the Unreal Rendering System (10): RHI

Contents

  • 10.1 Overview
  • 10.2 RHI Basics
    • 10.2.1 FRenderResource
    • 10.2.2 FRHIResource
    • 10.2.3 FRHICommand
    • 10.2.4 FRHICommandList
  • 10.3 RHIContext, DynamicRHI
    • 10.3.1 IRHICommandContext
    • 10.3.2 IRHICommandContextContainer
    • 10.3.3 FDynamicRHI
      • 10.3.3.1 FD3D11DynamicRHI
      • 10.3.3.2 FOpenGLDynamicRHI
      • 10.3.3.3 FD3D12DynamicRHI
      • 10.3.3.4 FVulkanDynamicRHI
      • 10.3.3.5 FMetalDynamicRHI
    • 10.3.4 RHI System Overview
  • 10.4 RHI Mechanisms
    • 10.4.1 RHI Command Execution
      • 10.4.1.1 FRHICommandListExecutor
      • 10.4.1.2 GRHICommandList
      • 10.4.1.3 D3D11 Command Execution
    • 10.4.2 ImmediateFlush
    • 10.4.3 Parallel Rendering
      • 10.4.3.1 FParallelCommandListSet
      • 10.4.3.2 QueueParallelAsyncCommandListSubmit
      • 10.4.3.3 FParallelTranslateSetupCommandList
      • 10.4.3.4 FParallelTranslateCommandList
    • 10.4.4 Pass Rendering
      • 10.4.4.1 Regular Pass Rendering
      • 10.4.4.2 Subpass Rendering
    • 10.4.5 RHI Resource Management
    • 10.4.6 Multithreaded Rendering Revisited
    • 10.4.7 RHI Console Variables
  • 10.5 Summary
    • 10.5.1 Questions for Reflection
  • Special Notes
  • References

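RHI stands for Render Hardware Interface. It is a fundamental and important module of UE's rendering system: it encapsulates the differences among graphics APIs (DirectX, OpenGL, Vulkan, Metal) and offers the Game and Renderer modules a simple, consistent set of concepts, data, resources, and interfaces, so that a single body of rendering code can run on multiple platforms.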


Layering of Game, Renderer, and RHI; the RHI is the platform-dependent part.

The original RHI was designed around the D3D11 API and comprised resource management and command interfaces.


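When the RHI thread is enabled, the RHI works hand in hand with a dedicated RHI thread, which translates the intermediate RHI commands pushed by the render thread into GPU commands for the target graphics platform. On graphics APIs that support parallelism (DX12, Vulkan, consoles), if the render thread generates the intermediate RHI commands in parallel, the RHI thread also translates them in parallel.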


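In UE4, the render thread generates intermediate commands in parallel, and the RHI thread translates them in parallel and then submits the rendering commands.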

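This article focuses on the RHI's basic concepts, types, and interfaces, how they relate to one another, and the principles and mechanisms involved. It also touches briefly on the implementation details of specific graphics APIs.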

This chapter analyzes the basic concepts and types involved in the RHI and explains their relationships and principles.

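FRenderResource represents a rendering resource on the render thread. It is managed and passed along by the render thread and serves as intermediate data between the game thread and the RHI thread. Earlier articles mentioned the concept without explaining it in detail, so it is covered here. FRenderResource is defined as follows: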

// Engine\Source\Runtime\RenderCore\Public\RenderResource.h

class RENDERCORE_API FRenderResource
{
public:
    // Iterates over all resources and invokes the callback on each.
    template<typename FunctionType>
    static void ForAllResources(const FunctionType& Function);
    static void InitRHIForAllResources();
    static void ReleaseRHIForAllResources();
    static void ChangeFeatureLevel(ERHIFeatureLevel::Type NewFeatureLevel);

    FRenderResource();
    FRenderResource(ERHIFeatureLevel::Type InFeatureLevel);
    virtual ~FRenderResource();
    
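    // The interfaces below may only be called from the render thread.

    // Initializes the dynamic RHI resources and/or RHI render target textures used by this resource.
    virtual void InitDynamicRHI() {}
    // Releases the dynamic RHI resources and/or RHI render target textures used by this resource.
    virtual void ReleaseDynamicRHI() {}

    // Initializes the RHI resources used by this resource.
    virtual void InitRHI() {}
    // Releases the RHI resources used by this resource.
    virtual void ReleaseRHI() {}

    // Initializes the resource.
    virtual void InitResource();
    // Releases the resource.
    virtual void ReleaseResource();

    // If the RHI resources have already been initialized, releases and re-initializes them.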
    void UpdateRHI();

    virtual FString GetFriendlyName() const { return TEXT("undefined"); }
    FORCEINLINE bool IsInitialized() const { return ListIndex != INDEX_NONE; }

    static void InitPreRHIResources();

private:
    // Global resource list (static).
    static TArray<FRenderResource*>& GetResourceList();
    static FThreadSafeCounter ResourceListIterationActive;

    int32 ListIndex;
    TEnumAsByte<ERHIFeatureLevel::Type> FeatureLevel;
    
    (......)
};
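
The lifecycle implied by these interfaces can be illustrated with a minimal sketch. It is not the engine implementation: `FSketchRenderResource` and `FSketchLoggingResource` are hypothetical names, the global list is a plain `std::vector`, and the order in which the hooks are called is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for FRenderResource (not engine code).
class FSketchRenderResource
{
public:
    virtual ~FSketchRenderResource() = default;

    virtual void InitDynamicRHI() {}
    virtual void ReleaseDynamicRHI() {}
    virtual void InitRHI() {}
    virtual void ReleaseRHI() {}

    // Registers the resource in the global list, then creates its RHI objects.
    void InitResource()
    {
        if (IsInitialized()) { return; }
        ListIndex = static_cast<int>(GetResourceList().size());
        GetResourceList().push_back(this);
        InitDynamicRHI();
        InitRHI();
    }

    // Destroys the RHI objects, then unregisters the resource.
    void ReleaseResource()
    {
        if (!IsInitialized()) { return; }
        ReleaseRHI();
        ReleaseDynamicRHI();
        // Leave a hole so the indices of the other resources stay valid.
        GetResourceList()[static_cast<std::size_t>(ListIndex)] = nullptr;
        ListIndex = -1;
    }

    // Releases and re-creates the RHI objects of an initialized resource.
    void UpdateRHI()
    {
        if (!IsInitialized()) { return; }
        ReleaseRHI();
        ReleaseDynamicRHI();
        InitDynamicRHI();
        InitRHI();
    }

    // Same idea as the listing: "initialized" means "present in the global list".
    bool IsInitialized() const { return ListIndex != -1; }

    static std::vector<FSketchRenderResource*>& GetResourceList()
    {
        static std::vector<FSketchRenderResource*> List;
        return List;
    }

private:
    int ListIndex = -1;
};

// Records the order in which the lifecycle hooks are called.
class FSketchLoggingResource : public FSketchRenderResource
{
public:
    std::string Log;
    void InitDynamicRHI() override { Log += "InitDynamicRHI;"; }
    void ReleaseDynamicRHI() override { Log += "ReleaseDynamicRHI;"; }
    void InitRHI() override { Log += "InitRHI;"; }
    void ReleaseRHI() override { Log += "ReleaseRHI;"; }
};

std::string RunLifecycleDemo()
{
    FSketchLoggingResource Resource;
    assert(!Resource.IsInitialized());
    Resource.InitResource();
    assert(Resource.IsInitialized());
    Resource.UpdateRHI();
    Resource.ReleaseResource();
    assert(!Resource.IsInitialized());
    return Resource.Log;
}
```

Calling `RunLifecycleDemo()` returns the recorded hook order, which makes it easy to see that `UpdateRHI()` is just a release followed by a re-initialization of an already registered resource.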
           

Below are the interfaces through which the game thread sends FRenderResource operations to the render thread:

// Initialize/update/release resources.
extern RENDERCORE_API void BeginInitResource(FRenderResource* Resource);
extern RENDERCORE_API void BeginUpdateResourceRHI(FRenderResource* Resource);
extern RENDERCORE_API void BeginReleaseResource(FRenderResource* Resource);
extern RENDERCORE_API void StartBatchedRelease();
extern RENDERCORE_API void EndBatchedRelease();
extern RENDERCORE_API void ReleaseResourceAndFlush(FRenderResource* Resource);
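
The common idea behind these functions, namely that the game thread does not touch the resource directly but enqueues an operation for the render thread to run later, can be sketched as follows. This is a single-threaded simulation under stated assumptions: `FSketchRenderCommandQueue`, `SketchBeginInitResource`, and `SketchBeginReleaseResource` are hypothetical names, and a plain `std::queue` stands in for the engine's cross-thread command mechanism.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for a render resource.
struct FSketchResource
{
    bool bInitialized = false;
    void InitResource() { bInitialized = true; }
    void ReleaseResource() { bInitialized = false; }
};

// Hypothetical render-command queue. In a real engine the producer and the
// consumer run on different threads; here both run on one thread so that
// the ordering is easy to follow.
class FSketchRenderCommandQueue
{
public:
    // "Game thread" side: record the operation, do not run it yet.
    void Enqueue(std::function<void()> Command)
    {
        Commands.push(std::move(Command));
    }

    // "Render thread" side: run all pending commands in FIFO order.
    int Drain()
    {
        int NumExecuted = 0;
        while (!Commands.empty())
        {
            std::function<void()> Command = std::move(Commands.front());
            Commands.pop();
            Command();
            ++NumExecuted;
        }
        return NumExecuted;
    }

private:
    std::queue<std::function<void()>> Commands;
};

// Deferred counterparts of InitResource/ReleaseResource.
void SketchBeginInitResource(FSketchRenderCommandQueue& Queue, FSketchResource* Resource)
{
    Queue.Enqueue([Resource]() { Resource->InitResource(); });
}

void SketchBeginReleaseResource(FSketchRenderCommandQueue& Queue, FSketchResource* Resource)
{
    Queue.Enqueue([Resource]() { Resource->ReleaseResource(); });
}

bool RunEnqueueDemo()
{
    FSketchRenderCommandQueue Queue;
    FSketchResource Resource;
    SketchBeginInitResource(Queue, &Resource);
    assert(!Resource.bInitialized); // nothing has run yet
    Queue.Drain();
    return Resource.bInitialized;
}
```

The point of the sketch is the delay between the `Begin...` call and the moment the resource actually changes state, which is why the caller must keep the resource alive until the queued command has run.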
           

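FRenderResource is only the base class. It defines the behavior of a rendering resource, while the actual data and logic are implemented by subclasses. There are many subclasses arranged in a fairly complex hierarchy; some of the important ones are defined as follows: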

// Engine\Source\Runtime\RenderCore\Public\RenderResource.h

// Texture resource.
class FTexture : public FRenderResource
{
public:
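    FTextureRHIRef        TextureRHI;         // The texture's RHI resource.
    FSamplerStateRHIRef SamplerStateRHI; // The texture's sampler state RHI resource.
    FSamplerStateRHIRef DeferredPassSamplerStateRHI; // Sampler state RHI resource for deferred passes.

    mutable double        LastRenderTime; // Time of the last render.
    FMipBiasFade        MipBiasFade;     // Mip bias used to fade in/out.
    bool                bGreyScaleFormat; // Whether the texture is grayscale.
    bool                bIgnoreGammaConversions; // Whether to ignore gamma conversions.
    bool                bSRGB;             // Whether the colors are in sRGB space.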
    
    virtual uint32 GetSizeX() const;
    virtual uint32 GetSizeY() const;
    virtual uint32 GetSizeZ() const;

    // Releases the RHI resources.
    virtual void ReleaseRHI() override
    {
        TextureRHI.SafeRelease();
        SamplerStateRHI.SafeRelease();
        DeferredPassSamplerStateRHI.SafeRelease();
    }
    virtual FString GetFriendlyName() const override { return TEXT("FTexture"); }
    
    (......)

protected:
    RENDERCORE_API static FRHISamplerState* GetOrCreateSamplerState(const FSamplerStateInitializerRHI& Initializer);
};

// Texture resource that also holds an SRV/UAV.
class FTextureWithSRV : public FTexture
{
public:
    // SRV that accesses the entire texture.
    FShaderResourceViewRHIRef ShaderResourceViewRHI;
    // UAV that accesses the entire texture.
    FUnorderedAccessViewRHIRef UnorderedAccessViewRHI;

    virtual void ReleaseRHI() override;
};

// Render resource that holds a reference to an RHI texture resource.
class RENDERCORE_API FTextureReference : public FRenderResource
{
public:
    // Reference to the texture's RHI resource.
    FTextureReferenceRHIRef    TextureReferenceRHI;

    // FRenderResource interface.
    virtual void InitRHI();
    virtual void ReleaseRHI();
    
    (......)
};

class RENDERCORE_API FVertexBuffer : public FRenderResource
{
public:
    // Reference to the vertex buffer's RHI resource.
    FVertexBufferRHIRef VertexBufferRHI;

    virtual void ReleaseRHI() override;
    
    (......);
};

class RENDERCORE_API FVertexBufferWithSRV : public FVertexBuffer
{
public:
    // SRV/UAV that access the entire buffer.
    FShaderResourceViewRHIRef ShaderResourceViewRHI;
    FUnorderedAccessViewRHIRef UnorderedAccessViewRHI;

    (......)
};

// Index buffer.
class FIndexBuffer : public FRenderResource
{
public:
    // RHI resource backing the index buffer.
    FIndexBufferRHIRef IndexBufferRHI;

    (......)
};
           

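As shown above, the subclasses of FRenderResource each wrap the corresponding RHI resource subclass, so that the render thread can pass the game thread's data and operations on to the RHI thread (or module). The following UML diagram shows part of the FRenderResource inheritance hierarchy: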

classDiagram-v2
FRHIResource <-- FRenderResource
FRenderResource <|-- FTextureReference
FRenderResource <|-- FTexture
FTexture <|-- FTextureWithSRV
FTexture <|-- FTextureResource
FTextureResource <|-- FStaticShadowDepthMap
FTextureResource <|-- FTexture2DDynamicResource
FTextureResource <|-- FTextureRenderTargetResource
FTextureRenderTargetResource <|-- FTextureRenderTarget2DResource
FTextureRenderTargetResource <|-- FTextureRenderTargetCubeResource
FRenderResource <|-- FVertexBuffer
FVertexBuffer <|-- FTangentsVertexBuffer
FVertexBuffer <|-- FVertexBufferWithSRV
FVertexBuffer <|-- FColorVertexBuffer
FVertexBuffer <|-- FPositionVertexBuffer
FVertexBuffer <|-- FSkinWeightDataVertexBuffer
FRenderResource <|-- FIndexBuffer
FIndexBuffer <|-- FDynamicMeshIndexBuffer16
FIndexBuffer <|-- FDynamicMeshIndexBuffer32
FIndexBuffer <|-- FRawIndexBuffer
FIndexBuffer <|-- FRawStaticIndexBuffer
FVertexBufferWithSRV <|-- FWhiteVertexBuffer
FVertexBufferWithSRV <|-- FEmptyVertexBuffer
class FRenderResource{
InitDynamicRHI()
ReleaseDynamicRHI()
InitRHI()
ReleaseRHI()
InitResource()
ReleaseResource()
UpdateRHI()
}
class FTexture{
FTextureRHIRef TextureRHI;
FSamplerStateRHIRef SamplerStateRHI;
}
class FTextureWithSRV{
FShaderResourceViewRHIRef ShaderResourceViewRHI;
FUnorderedAccessViewRHIRef UnorderedAccessViewRHI;
}
class FTextureReference{
FTextureReferenceRHIRef TextureReferenceRHI;
}
class FVertexBuffer{
FVertexBufferRHIRef VertexBufferRHI;
}
class FVertexBufferWithSRV
class FIndexBuffer{
FIndexBufferRHIRef IndexBufferRHI;
}


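To reiterate, the above is only part of the FRenderResource inheritance hierarchy; it cannot be drawn in full. FRenderResource has a vast hierarchy of subclasses to meet the complex and varied resource requirements of UE's rendering system.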

FRHIResource abstracts GPU-side resources and is the parent class of the many RHI resource types. It is defined as follows:

// Engine\Source\Runtime\RHI\Public\RHIResources.h

class RHI_API FRHIResource
{
public:
    FRHIResource(bool InbDoNotDeferDelete = false);
    virtual ~FRHIResource();
    
    // Reference counting of the resource.
    uint32 AddRef() const;
    uint32 Release() const
    {
        int32 NewValue = NumRefs.Decrement();
        if (NewValue == 0)
        {
            if (!DeferDelete())
            { 
                delete this;
            }
            else
            {
                // Add to the pending-delete list.
                if (FPlatformAtomics::InterlockedCompareExchange(&MarkedForDelete, 1, 0) == 0)
                {
                    PendingDeletes.Push(const_cast<FRHIResource*>(this));
                }
            }
        }
        return uint32(NewValue);
    }
    uint32 GetRefCount() const;
    
    // Static interfaces.
    static void FlushPendingDeletes(bool bFlushDeferredDeletes = false);
    static bool PlatformNeedsExtraDeletionLatency();
    static bool Bypass();

    void DoNoDeferDelete();
    // Transient resource tracking.
    void SetCommitted(bool bInCommitted);
    bool IsCommitted() const;
    bool IsValid() const;

private:
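    // Runtime flags and data.
    mutable FThreadSafeCounter NumRefs;
    mutable int32 MarkedForDelete;
    bool bDoNotDeferDelete;
    bool bCommitted;

    // Resources pending deletion.
    static TLockFreePointerListUnordered<FRHIResource, PLATFORM_CACHE_LINE_SIZE> PendingDeletes;
    // The resource currently being deleted.
    static FRHIResource* CurrentlyDeleting;

    bool DeferDelete() const;

    // Some APIs do no internal reference counting, so we must wait a few extra frames before deleting a resource to make sure the GPU is completely done with it. This avoids expensive fences, etc.
    struct ResourcesToDelete
    {
        TArray<FRHIResource*>    Resources;    // Resources to delete.
        uint32                    FrameDeleted; // Frame on which deletion was queued; used to count the frames to wait.
        
        (......)
    };

    // Queue of resources whose deletion is deferred.
    static TArray<ResourcesToDelete> DeferredDeletionQueue;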
    static uint32 CurrentFrame;
};
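
The interplay of reference counting and deferred deletion in `Release()` can be sketched in a few lines. This is a simplified illustration, not the engine code: `FSketchRHIResource` is a hypothetical name, `std::atomic` and a mutex-protected `std::vector` replace `FThreadSafeCounter` and the lock-free list, and the extra frame latency of `DeferredDeletionQueue` is omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical, simplified stand-in for FRHIResource (not engine code).
class FSketchRHIResource
{
public:
    explicit FSketchRHIResource(bool bInDoNotDeferDelete = false)
        : bDoNotDeferDelete(bInDoNotDeferDelete) { ++AliveCounter(); }
    virtual ~FSketchRHIResource() { --AliveCounter(); }

    uint32_t AddRef() const { return static_cast<uint32_t>(++NumRefs); }

    uint32_t Release() const
    {
        const int32_t NewValue = --NumRefs;
        if (NewValue == 0)
        {
            if (bDoNotDeferDelete)
            {
                delete this; // immediate deletion
            }
            else
            {
                // Queue the resource at most once, even if it is
                // re-referenced and released again before the next flush.
                int32_t Expected = 0;
                if (MarkedForDelete.compare_exchange_strong(Expected, 1))
                {
                    std::lock_guard<std::mutex> Lock(PendingMutex());
                    PendingDeletes().push_back(const_cast<FSketchRHIResource*>(this));
                }
            }
        }
        return static_cast<uint32_t>(NewValue);
    }

    // Deletes every queued resource that is still unreferenced and
    // returns how many were deleted.
    static int FlushPendingDeletes()
    {
        std::vector<FSketchRHIResource*> ToDelete;
        {
            std::lock_guard<std::mutex> Lock(PendingMutex());
            ToDelete.swap(PendingDeletes());
        }
        int NumDeleted = 0;
        for (FSketchRHIResource* Resource : ToDelete)
        {
            if (Resource->NumRefs.load() == 0)
            {
                delete Resource;
                ++NumDeleted;
            }
            else
            {
                Resource->MarkedForDelete.store(0); // resurrected while pending
            }
        }
        return NumDeleted;
    }

    static int GetAliveCount() { return AliveCounter().load(); }

private:
    static std::atomic<int>& AliveCounter() { static std::atomic<int> Count{0}; return Count; }
    static std::mutex& PendingMutex() { static std::mutex Mutex; return Mutex; }
    static std::vector<FSketchRHIResource*>& PendingDeletes()
    {
        static std::vector<FSketchRHIResource*> List;
        return List;
    }

    mutable std::atomic<int32_t> NumRefs{0};
    mutable std::atomic<int32_t> MarkedForDelete{0};
    bool bDoNotDeferDelete;
};
```

In this sketch a resource whose count drops to zero stays alive until `FlushPendingDeletes()` runs, unless it was created with `bInDoNotDeferDelete` set, in which case `Release()` deletes it on the spot.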
           

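As shown above, FRHIResource provides several facilities: reference counting, deferred deletion and its tracking, and runtime data and flags. It has a large number of subclasses, chiefly: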

// Engine\Source\Runtime\RHI\Public\RHIResources.h

// State block resources

class FRHISamplerState : public FRHIResource 
{
public:
    virtual bool IsImmutable() const { return false; }
};
class FRHIRasterizerState : public FRHIResource
{
public:
    virtual bool GetInitializer(struct FRasterizerStateInitializerRHI& Init) { return false; }
};
class FRHIDepthStencilState : public FRHIResource
{
public:
    virtual bool GetInitializer(struct FDepthStencilStateInitializerRHI& Init) { return false; }
};
class FRHIBlendState : public FRHIResource
{
public:
    virtual bool GetInitializer(class FBlendStateInitializerRHI& Init) { return false; }
};

// Shader binding resources.

typedef TArray<struct FVertexElement,TFixedAllocator<MaxVertexElementCount> > FVertexDeclarationElementList;
class FRHIVertexDeclaration : public FRHIResource
{
public:
    virtual bool GetInitializer(FVertexDeclarationElementList& Init) { return false; }
};

class FRHIBoundShaderState : public FRHIResource {};

// Shaders

class FRHIShader : public FRHIResource
{
public:
    void SetHash(FSHAHash InHash);
    FSHAHash GetHash() const;
    explicit FRHIShader(EShaderFrequency InFrequency);
    inline EShaderFrequency GetFrequency() const;

private:
    FSHAHash Hash;
    EShaderFrequency Frequency;
};

class FRHIGraphicsShader : public FRHIShader
{
public:
    explicit FRHIGraphicsShader(EShaderFrequency InFrequency) : FRHIShader(InFrequency) {}
};

class FRHIVertexShader : public FRHIGraphicsShader
{
public:
    FRHIVertexShader() : FRHIGraphicsShader(SF_Vertex) {}
};

class FRHIHullShader : public FRHIGraphicsShader
{
public:
    FRHIHullShader() : FRHIGraphicsShader(SF_Hull) {}
};

class FRHIDomainShader : public FRHIGraphicsShader
{
public:
    FRHIDomainShader() : FRHIGraphicsShader(SF_Domain) {}
};

class FRHIPixelShader : public FRHIGraphicsShader
{
public:
    FRHIPixelShader() : FRHIGraphicsShader(SF_Pixel) {}
};

class FRHIGeometryShader : public FRHIGraphicsShader
{
public:
    FRHIGeometryShader() : FRHIGraphicsShader(SF_Geometry) {}
};

class RHI_API FRHIComputeShader : public FRHIShader
{
public:
    FRHIComputeShader() : FRHIShader(SF_Compute), Stats(nullptr) {}
    
    inline void SetStats(struct FPipelineStateStats* Ptr) { Stats = Ptr; }
    void UpdateStats();
    
private:
    struct FPipelineStateStats* Stats;
};

// Pipeline states

class FRHIGraphicsPipelineState : public FRHIResource {};
class FRHIComputePipelineState : public FRHIResource {};
class FRHIRayTracingPipelineState : public FRHIResource {};

// Buffers.

class FRHIUniformBuffer : public FRHIResource
{
public:
    FRHIUniformBuffer(const FRHIUniformBufferLayout& InLayout);

    FORCEINLINE_DEBUGGABLE uint32 AddRef() const;
    FORCEINLINE_DEBUGGABLE uint32 Release() const;
    uint32 GetSize() const;
    const FRHIUniformBufferLayout& GetLayout() const;
    bool HasStaticSlot() const;

private:
    const FRHIUniformBufferLayout* Layout;
    uint32 LayoutConstantBufferSize;
};

class FRHIIndexBuffer : public FRHIResource
{
public:
    FRHIIndexBuffer(uint32 InStride,uint32 InSize,uint32 InUsage);

    uint32 GetStride() const;
    uint32 GetSize() const;
    uint32 GetUsage() const;

protected:
    FRHIIndexBuffer();

    void Swap(FRHIIndexBuffer& Other);
    void ReleaseUnderlyingResource();

private:
    uint32 Stride;
    uint32 Size;
    uint32 Usage;
};

class FRHIVertexBuffer : public FRHIResource
{
public:
    FRHIVertexBuffer(uint32 InSize,uint32 InUsage);
    uint32 GetSize() const;
    uint32 GetUsage() const;

protected:
    FRHIVertexBuffer();
    void Swap(FRHIVertexBuffer& Other);
    void ReleaseUnderlyingResource();

private:
    uint32 Size;
    // e.g. BUF_UnorderedAccess
    uint32 Usage;
};

class FRHIStructuredBuffer : public FRHIResource
{
public:
    FRHIStructuredBuffer(uint32 InStride,uint32 InSize,uint32 InUsage);

    uint32 GetStride() const;
    uint32 GetSize() const;
    uint32 GetUsage() const;

private:
    uint32 Stride;
    uint32 Size;
    uint32 Usage;
};

// Textures

class FRHITexture : public FRHIResource
{
public:
    FRHITexture(uint32 InNumMips, uint32 InNumSamples, EPixelFormat InFormat, uint32 InFlags, FLastRenderTimeContainer* InLastRenderTime, const FClearValueBinding& InClearValue);

    // Dynamic type conversion interfaces.
    virtual class FRHITexture2D* GetTexture2D();
    virtual class FRHITexture2DArray* GetTexture2DArray();
    virtual class FRHITexture3D* GetTexture3D();
    virtual class FRHITextureCube* GetTextureCube();
    virtual class FRHITextureReference* GetTextureReference();
    
    virtual FIntVector GetSizeXYZ() const = 0;
    // Gets the platform-specific native resource pointer.
    virtual void* GetNativeResource() const;
    virtual void* GetNativeShaderResourceView() const;
    // Gets the platform-specific RHI texture base class.
    virtual void* GetTextureBaseRHI();

    // Data accessors.
    uint32 GetNumMips() const;
    EPixelFormat GetFormat();
    uint32 GetFlags() const;
    uint32 GetNumSamples() const;
    bool IsMultisampled() const;    
    bool HasClearValue() const;
    FLinearColor GetClearColor() const;
    void GetDepthStencilClearValue(float& OutDepth, uint32& OutStencil) const;
    float GetDepthClearValue() const;
    uint32 GetStencilClearValue() const;
    const FClearValueBinding GetClearBinding() const;
    virtual void GetWriteMaskProperties(void*& OutData, uint32& OutSize);
        
    (......)
        
    // RHI resource info.
    FRHIResourceInfo ResourceInfo;

private:
    // Texture data.
    FClearValueBinding ClearValue;
    uint32 NumMips;
    uint32 NumSamples;
    EPixelFormat Format;
    uint32 Flags;
    FLastRenderTimeContainer& LastRenderTime;
    FLastRenderTimeContainer DefaultLastRenderTime;    
    FName TextureName;
};

// 2D RHI texture.
class FRHITexture2D : public FRHITexture
{
public:
    FRHITexture2D(uint32 InSizeX,uint32 InSizeY,uint32 InNumMips,uint32 InNumSamples,EPixelFormat InFormat,uint32 InFlags, const FClearValueBinding& InClearValue);
    
    virtual FRHITexture2D* GetTexture2D() { return this; }

    uint32 GetSizeX() const { return SizeX; }
    uint32 GetSizeY() const { return SizeY; }
    inline FIntPoint GetSizeXY() const;
    virtual FIntVector GetSizeXYZ() const override;

private:
    uint32 SizeX;
    uint32 SizeY;
};

// 2D RHI texture array.
class FRHITexture2DArray : public FRHITexture2D
{
public:
    FRHITexture2DArray(uint32 InSizeX,uint32 InSizeY,uint32 InSizeZ,uint32 InNumMips,uint32 NumSamples, EPixelFormat InFormat,uint32 InFlags, const FClearValueBinding& InClearValue);
    
    virtual FRHITexture2DArray* GetTexture2DArray() { return this; }
    virtual FRHITexture2D* GetTexture2D() { return NULL; }

    uint32 GetSizeZ() const { return SizeZ; }
    virtual FIntVector GetSizeXYZ() const final override;

private:
    uint32 SizeZ;
};

// 3D RHI texture.
class FRHITexture3D : public FRHITexture
{
public:
    FRHITexture3D(uint32 InSizeX,uint32 InSizeY,uint32 InSizeZ,uint32 InNumMips,EPixelFormat InFormat,uint32 InFlags, const FClearValueBinding& InClearValue);
    
    virtual FRHITexture3D* GetTexture3D() { return this; }
    uint32 GetSizeX() const { return SizeX; }
    uint32 GetSizeY() const { return SizeY; }
    uint32 GetSizeZ() const { return SizeZ; }
    virtual FIntVector GetSizeXYZ() const final override;

private:
    uint32 SizeX;
    uint32 SizeY;
    uint32 SizeZ;
};

// Cube RHI texture.
class FRHITextureCube : public FRHITexture
{
public:
    FRHITextureCube(uint32 InSize,uint32 InNumMips,EPixelFormat InFormat,uint32 InFlags, const FClearValueBinding& InClearValue);
    
    virtual FRHITextureCube* GetTextureCube();
    uint32 GetSize() const;
    virtual FIntVector GetSizeXYZ() const final override;

private:
    uint32 Size;
};

// Texture reference.
class FRHITextureReference : public FRHITexture
{
public:
    explicit FRHITextureReference(FLastRenderTimeContainer* InLastRenderTime);

    virtual FRHITextureReference* GetTextureReference() override { return this; }
    inline FRHITexture* GetReferencedTexture() const;
    // Sets the referenced texture.
    void SetReferencedTexture(FRHITexture* InTexture);
    virtual FIntVector GetSizeXYZ() const final override;

private:
    // The referenced texture resource.
    TRefCountPtr<FRHITexture> ReferencedTexture;
};

class FRHITextureReferenceNullImpl : public FRHITextureReference
{
public:
    FRHITextureReferenceNullImpl();

    void SetReferencedTexture(FRHITexture* InTexture)
    {
        FRHITextureReference::SetReferencedTexture(InTexture);
    }
};

// Miscellaneous resources.

// Timestamp calibration query.
class FRHITimestampCalibrationQuery : public FRHIResource
{
public:
    uint64 GPUMicroseconds = 0;
    uint64 CPUMicroseconds = 0;
};

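// GPU fence class. The granularity varies by RHI, i.e. it may only represent command-buffer granularity. RHI-specific fences derive from this class to implement real GPU->CPU fences.
// The default implementation always returns false from Poll until the frame after the fence was inserted, because not every API has a GPU/CPU sync object and one has to be faked.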
class FRHIGPUFence : public FRHIResource
{
public:
    FRHIGPUFence(FName InName) : FenceName(InName) {}
    virtual ~FRHIGPUFence() {}

    virtual void Clear() = 0;
    // Polls the fence to see whether the GPU has signaled it; returns true if so.
    virtual bool Poll() const = 0;
    // Polls a subset of the GPUs.
    virtual bool Poll(FRHIGPUMask GPUMask) const { return Poll(); }
    // Number of pending write commands.
    FThreadSafeCounter NumPendingWriteCommands;

protected:
    FName FenceName;
};

// Generic FRHIGPUFence implementation.
class RHI_API FGenericRHIGPUFence : public FRHIGPUFence
{
public:
    FGenericRHIGPUFence(FName InName);

    virtual void Clear() final override;
    virtual bool Poll() const final override;
    void WriteInternal();

private:
    uint32 InsertedFrameNumber;
};

// Render query.
class FRHIRenderQuery : public FRHIResource 
{
};

// Pooled render query.
class RHI_API FRHIPooledRenderQuery
{
    TRefCountPtr<FRHIRenderQuery> Query;
    FRHIRenderQueryPool* QueryPool = nullptr;

public:
    bool IsValid() const;
    FRHIRenderQuery* GetQuery() const;
    void ReleaseQuery();
    
    (.....)
};

// Render query pool.
class FRHIRenderQueryPool : public FRHIResource
{
public:
    virtual ~FRHIRenderQueryPool() {};
    virtual FRHIPooledRenderQuery AllocateQuery() = 0;

private:
    friend class FRHIPooledRenderQuery;
    virtual void ReleaseQuery(TRefCountPtr<FRHIRenderQuery>&& Query) = 0;
};

// Compute fence.
class FRHIComputeFence : public FRHIResource
{
public:
    FRHIComputeFence(FName InName);

    FORCEINLINE bool GetWriteEnqueued() const;
    virtual void Reset();
    virtual void WriteFence();

private:
    // True if the label has been written since it was created. Checked when queuing waits, to catch GPU hangs on the CPU at command creation time.
    bool bWriteEnqueued;
};

// Viewport.
class FRHIViewport : public FRHIResource 
{
public:
    // Gets the platform-specific native swap chain.
    virtual void* GetNativeSwapChain() const { return nullptr; }
    // Gets the native back buffer texture.
    virtual void* GetNativeBackBufferTexture() const { return nullptr; }
    // Gets the native back buffer render target.
    virtual void* GetNativeBackBufferRT() const { return nullptr; }
    // Gets the native window.
    virtual void* GetNativeWindow(void** AddParam = nullptr) const { return nullptr; }

    // Sets the FRHICustomPresent handler on the viewport.
    virtual void SetCustomPresent(class FRHICustomPresent*) {}
    virtual class FRHICustomPresent* GetCustomPresent() const { return nullptr; }

    // Ticks the viewport once per frame on the game thread.
    virtual void Tick(float DeltaTime) {}
};

// Views: UAV/SRV

class FRHIUnorderedAccessView : public FRHIResource {};
class FRHIShaderResourceView : public FRHIResource {};

// Reference type definitions for the various RHI resources.
typedef TRefCountPtr<FRHISamplerState> FSamplerStateRHIRef;
typedef TRefCountPtr<FRHIRasterizerState> FRasterizerStateRHIRef;
typedef TRefCountPtr<FRHIDepthStencilState> FDepthStencilStateRHIRef;
typedef TRefCountPtr<FRHIBlendState> FBlendStateRHIRef;
typedef TRefCountPtr<FRHIVertexDeclaration> FVertexDeclarationRHIRef;
typedef TRefCountPtr<FRHIVertexShader> FVertexShaderRHIRef;
typedef TRefCountPtr<FRHIHullShader> FHullShaderRHIRef;
typedef TRefCountPtr<FRHIDomainShader> FDomainShaderRHIRef;
typedef TRefCountPtr<FRHIPixelShader> FPixelShaderRHIRef;
typedef TRefCountPtr<FRHIGeometryShader> FGeometryShaderRHIRef;
typedef TRefCountPtr<FRHIComputeShader> FComputeShaderRHIRef;
typedef TRefCountPtr<FRHIRayTracingShader> FRayTracingShaderRHIRef;
typedef TRefCountPtr<FRHIComputeFence>    FComputeFenceRHIRef;
typedef TRefCountPtr<FRHIBoundShaderState> FBoundShaderStateRHIRef;
typedef TRefCountPtr<FRHIUniformBuffer> FUniformBufferRHIRef;
typedef TRefCountPtr<FRHIIndexBuffer> FIndexBufferRHIRef;
typedef TRefCountPtr<FRHIVertexBuffer> FVertexBufferRHIRef;
typedef TRefCountPtr<FRHIStructuredBuffer> FStructuredBufferRHIRef;
typedef TRefCountPtr<FRHITexture> FTextureRHIRef;
typedef TRefCountPtr<FRHITexture2D> FTexture2DRHIRef;
typedef TRefCountPtr<FRHITexture2DArray> FTexture2DArrayRHIRef;
typedef TRefCountPtr<FRHITexture3D> FTexture3DRHIRef;
typedef TRefCountPtr<FRHITextureCube> FTextureCubeRHIRef;
typedef TRefCountPtr<FRHITextureReference> FTextureReferenceRHIRef;
typedef TRefCountPtr<FRHIRenderQuery> FRenderQueryRHIRef;
typedef TRefCountPtr<FRHIRenderQueryPool> FRenderQueryPoolRHIRef;
typedef TRefCountPtr<FRHITimestampCalibrationQuery> FTimestampCalibrationQueryRHIRef;
typedef TRefCountPtr<FRHIGPUFence>    FGPUFenceRHIRef;
typedef TRefCountPtr<FRHIViewport> FViewportRHIRef;
typedef TRefCountPtr<FRHIUnorderedAccessView> FUnorderedAccessViewRHIRef;
typedef TRefCountPtr<FRHIShaderResourceView> FShaderResourceViewRHIRef;
typedef TRefCountPtr<FRHIGraphicsPipelineState> FGraphicsPipelineStateRHIRef;
typedef TRefCountPtr<FRHIRayTracingPipelineState> FRayTracingPipelineStateRHIRef;


// Generic staging buffer class used by FRHIGPUMemoryReadback.
class FRHIStagingBuffer : public FRHIResource
{
public:
    FRHIStagingBuffer();
    virtual ~FRHIStagingBuffer();
    virtual void *Lock(uint32 Offset, uint32 NumBytes) = 0;
    virtual void Unlock() = 0;
protected:
    bool bIsLocked;
};

class FGenericRHIStagingBuffer : public FRHIStagingBuffer
{
public:
    FGenericRHIStagingBuffer();
    ~FGenericRHIStagingBuffer();
    virtual void* Lock(uint32 Offset, uint32 NumBytes) final override;
    virtual void Unlock() final override;
    
    FVertexBufferRHIRef ShadowBuffer;
    uint32 Offset;
};

// Custom present.
class FRHICustomPresent : public FRHIResource
{
public:
    FRHICustomPresent() {}
    virtual ~FRHICustomPresent() {}
    
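    // Called when the viewport is resized.
    virtual void OnBackBufferResize() = 0;
    // Called from the render thread to see whether a native present will be requested.
    virtual bool NeedsNativePresent() = 0;
    // Called on the RHI thread to perform the custom present.
    virtual bool Present(int32& InOutSyncInterval) = 0;
    // Called on the RHI thread after Present.
    virtual void PostPresent() {};

    // Called when the rendering thread is acquired.
    virtual void OnAcquireThreadOwnership() {}
    // Called when the rendering thread is released.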
    virtual void OnReleaseThreadOwnership() {}
};
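
The "dynamic type conversion" interfaces of FRHITexture (`GetTexture2D()`, `GetTexture2DArray()`, and so on) follow a simple pattern: the base class returns a null pointer and each concrete type overrides only the accessor for itself. A minimal sketch, assuming hypothetical `FSketch...` names and only two texture kinds:

```cpp
#include <cassert>

class FSketchTexture2D;
class FSketchTexture2DArray;

// Base texture: every conversion fails unless a subclass overrides it.
class FSketchTexture
{
public:
    virtual ~FSketchTexture() = default;
    virtual FSketchTexture2D* GetTexture2D() { return nullptr; }
    virtual FSketchTexture2DArray* GetTexture2DArray() { return nullptr; }
};

class FSketchTexture2D : public FSketchTexture
{
public:
    FSketchTexture2D(unsigned InSizeX, unsigned InSizeY) : SizeX(InSizeX), SizeY(InSizeY) {}
    FSketchTexture2D* GetTexture2D() override { return this; }
    unsigned GetSizeX() const { return SizeX; }
    unsigned GetSizeY() const { return SizeY; }

private:
    unsigned SizeX;
    unsigned SizeY;
};

class FSketchTexture2DArray : public FSketchTexture2D
{
public:
    FSketchTexture2DArray(unsigned InSizeX, unsigned InSizeY, unsigned InSizeZ)
        : FSketchTexture2D(InSizeX, InSizeY), SizeZ(InSizeZ) {}
    FSketchTexture2DArray* GetTexture2DArray() override { return this; }
    // As in the listing above: although an array derives from the 2D
    // texture class, it deliberately does not answer as a plain 2D texture.
    FSketchTexture2D* GetTexture2D() override { return nullptr; }
    unsigned GetSizeZ() const { return SizeZ; }

private:
    unsigned SizeZ;
};

enum class ESketchTextureKind { Unknown, Texture2D, Texture2DArray };

// Classifies a texture without RTTI, using only the virtual accessors.
ESketchTextureKind ClassifyTexture(FSketchTexture& Texture)
{
    if (Texture.GetTexture2DArray() != nullptr) { return ESketchTextureKind::Texture2DArray; }
    if (Texture.GetTexture2D() != nullptr) { return ESketchTextureKind::Texture2D; }
    return ESketchTextureKind::Unknown;
}

bool RunDowncastDemo()
{
    FSketchTexture2DArray Array(64, 32, 4);
    assert(Array.GetTexture2D() == nullptr);
    return ClassifyTexture(Array) == ESketchTextureKind::Texture2DArray;
}
```

The design avoids `dynamic_cast` at the cost of one virtual function per convertible type, which is acceptable here because the set of texture kinds is small and fixed.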
           

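As shown above, FRHIResource has a great many kinds and subclasses, which fall into state blocks, shader bindings, shaders, pipeline states, buffers, textures, views, and miscellaneous others. Note that the listing only shows the platform-independent base types; each graphics API derives its own types from them. Take FRHIUniformBuffer as an example; its inheritance hierarchy is as follows: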

FRHIResource <|-- FRHIUniformBuffer

FRHIUniformBuffer <|-- FD3D11UniformBuffer

FRHIUniformBuffer <|-- FD3D12UniformBuffer

FRHIUniformBuffer <|-- FOpenGLUniformBuffer

FRHIUniformBuffer <|-- FVulkanUniformBuffer

FRHIUniformBuffer <|-- FMetalSuballocatedUniformBuffer

FRHIUniformBuffer <|-- FEmptyUniformBuffer

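The diagram shows the subclasses of FRHIUniformBuffer for D3D11, D3D12, OpenGL, Vulkan, Metal, and other graphics APIs, which implement the platform-specific resources and operations of a uniform buffer, plus a special empty implementation, FEmptyUniformBuffer.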
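
How a backend might use such a subclass can be sketched as follows. Everything here is hypothetical (`FSketchBackendA`, the integer "native handle", the method names); the sketch only illustrates the general pattern of handing out the platform-independent base type and casting back to the backend's own type when the resource is used. The `static_cast` relies on the assumption that a backend only ever receives resources it created itself.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Platform-independent view of a uniform buffer.
class FSketchUniformBuffer
{
public:
    explicit FSketchUniformBuffer(uint32_t InSize) : Size(InSize) {}
    virtual ~FSketchUniformBuffer() = default;
    uint32_t GetSize() const { return Size; }

private:
    uint32_t Size;
};

// Hypothetical backend type: pretends to own a native buffer that is
// identified by an integer handle.
class FSketchBackendAUniformBuffer final : public FSketchUniformBuffer
{
public:
    FSketchBackendAUniformBuffer(uint32_t InSize, int InNativeHandle)
        : FSketchUniformBuffer(InSize), NativeHandle(InNativeHandle) {}
    int NativeHandle;
};

// Hypothetical backend: creates resources as the base type and casts them
// back to its own type when they are bound.
class FSketchBackendA
{
public:
    std::unique_ptr<FSketchUniformBuffer> CreateUniformBuffer(uint32_t Size)
    {
        return std::unique_ptr<FSketchUniformBuffer>(
            new FSketchBackendAUniformBuffer(Size, NextHandle++));
    }

    // Assumption: Buffer was created by this backend, so static_cast is valid.
    int BindUniformBuffer(FSketchUniformBuffer* Buffer)
    {
        FSketchBackendAUniformBuffer* Native = static_cast<FSketchBackendAUniformBuffer*>(Buffer);
        LastBoundHandle = Native->NativeHandle;
        return LastBoundHandle;
    }

    int GetLastBoundHandle() const { return LastBoundHandle; }

private:
    int NextHandle = 100;
    int LastBoundHandle = -1;
};

bool RunBackendDemo()
{
    FSketchBackendA Backend;
    std::unique_ptr<FSketchUniformBuffer> Buffer = Backend.CreateUniformBuffer(256);
    assert(Buffer->GetSize() == 256u);
    return Backend.BindUniformBuffer(Buffer.get()) == Backend.GetLastBoundHandle();
}
```

Higher-level code only ever sees `FSketchUniformBuffer`, so it stays identical across backends, while each backend keeps its native details private to its own subclass.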

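Like FRHIUniformBuffer, the other direct and indirect subclasses of FRHIResource must also be implemented by subclasses specific to a graphics API or operating system in order to support rendering on that platform. The following UML diagram shows the most complex case, the texture resource class hierarchy: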

FRHIResource <|-- FRHITexture

FRHITexture <|-- FRHITexture2D

FRHITexture2D <|-- FRHITexture2DArray

FRHITexture <|-- FRHITexture3D

FRHITexture <|-- FRHITextureCube

FRHITexture <|-- FRHITextureReference

FRHITextureReference <|-- FRHITextureReferenceNullImpl

FRHITexture2D <|-- FMetalTexture2D

FRHITexture2D <|-- FD3D12BaseTexture2D

FRHITexture2D <|-- FOpenGLBaseTexture2D

FRHITexture2D <|-- FVulkanTexture2D

FRHITexture2D <|-- FD3D11BaseTexture2D

FRHITexture2D <|-- FEmptyTexture2D


Note that the diagram above is simplified: besides FRHITexture2D, the other texture types (such as FRHITexture2DArray, FRHITexture3D, FRHITextureCube, and FRHITextureReference) are also subclassed and implemented by each platform.

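FRHICommand is the base class of rendering commands in the RHI module. These commands are usually pushed by the render thread to the RHI thread through a command queue and executed by the RHI thread at an appropriate time. FRHICommand in turn inherits from FRHICommandBase; they are defined as follows: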

// Engine\Source\Runtime\RHI\Public\RHICommandList.h

// Base class of RHI commands.
struct FRHICommandBase
{
    // The next command (a node of the command linked list).
    FRHICommandBase* Next = nullptr;
    
    // Executes the command and then destroys it.
    virtual void ExecuteAndDestruct(FRHICommandListBase& CmdList, FRHICommandListDebugContext& DebugContext) = 0;
};

template<typename TCmd, typename NameType = FUnnamedRhiCommand>
struct FRHICommand : public FRHICommandBase
{
    // Executes the command and then destroys it.
    void ExecuteAndDestruct(FRHICommandListBase& CmdList, FRHICommandListDebugContext& Context) override final
    {
        TCmd *ThisCmd = static_cast<TCmd*>(this);
        ThisCmd->Execute(CmdList);
        ThisCmd->~TCmd();
    }
};
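
The combination of a linked-list node, CRTP dispatch, and in-place destruction can be tried out in isolation. The sketch below is a simplified model under stated assumptions: all `FSketch...`/`TSketch...` names are hypothetical, commands are allocated with `::operator new` instead of the engine's own allocator, and the debug-context parameter is dropped.

```cpp
#include <cassert>
#include <new>
#include <utility>
#include <vector>

struct FSketchCommandList;

// Linked-list node: every recorded command points at the next one.
struct FSketchCommandBase
{
    FSketchCommandBase* Next = nullptr;
    virtual void ExecuteAndDestruct(FSketchCommandList& CmdList) = 0;

protected:
    ~FSketchCommandBase() = default; // destroyed only through the CRTP layer
};

// CRTP layer: knows the concrete type, so it can call Execute() and the
// destructor without any further virtual calls.
template <typename TCmd>
struct TSketchCommand : public FSketchCommandBase
{
    void ExecuteAndDestruct(FSketchCommandList& CmdList) override final
    {
        TCmd* ThisCmd = static_cast<TCmd*>(this);
        ThisCmd->Execute(CmdList);
        ThisCmd->~TCmd();
    }
};

struct FSketchCommandList
{
    int StencilRef = 0; // state changed by the commands below
    int NumDraws = 0;

    FSketchCommandList() = default;
    FSketchCommandList(const FSketchCommandList&) = delete;
    FSketchCommandList& operator=(const FSketchCommandList&) = delete;
    ~FSketchCommandList() { FreeMemory(); }

    // Records a command: construct it in raw memory and append it to the list.
    template <typename TCmd, typename... TArgs>
    void Enqueue(TArgs&&... Args)
    {
        void* Memory = ::operator new(sizeof(TCmd));
        Allocations.push_back(Memory);
        TCmd* Cmd = new (Memory) TCmd(std::forward<TArgs>(Args)...);
        *TailLink = Cmd;
        TailLink = &Cmd->Next;
    }

    // Replays all recorded commands in order and returns how many ran.
    int Execute()
    {
        int NumExecuted = 0;
        FSketchCommandBase* Cmd = Root;
        while (Cmd != nullptr)
        {
            FSketchCommandBase* NextCmd = Cmd->Next; // read before the command destroys itself
            Cmd->ExecuteAndDestruct(*this);
            Cmd = NextCmd;
            ++NumExecuted;
        }
        FreeMemory();
        return NumExecuted;
    }

private:
    void FreeMemory()
    {
        for (void* Memory : Allocations) { ::operator delete(Memory); }
        Allocations.clear();
        Root = nullptr;
        TailLink = &Root;
    }

    FSketchCommandBase* Root = nullptr;
    FSketchCommandBase** TailLink = &Root;
    std::vector<void*> Allocations;
};

struct FSketchSetStencilRef final : public TSketchCommand<FSketchSetStencilRef>
{
    int StencilRef;
    explicit FSketchSetStencilRef(int InStencilRef) : StencilRef(InStencilRef) {}
    void Execute(FSketchCommandList& CmdList) { CmdList.StencilRef = StencilRef; }
};

struct FSketchDraw final : public TSketchCommand<FSketchDraw>
{
    void Execute(FSketchCommandList& CmdList) { ++CmdList.NumDraws; }
};

int RunCommandListDemo()
{
    FSketchCommandList List;
    List.Enqueue<FSketchSetStencilRef>(7);
    List.Enqueue<FSketchDraw>();
    assert(List.NumDraws == 0); // recording does not execute anything
    return List.Execute();
}
```

Recording and execution are fully decoupled here: `Enqueue` only stores the parameters, and nothing touches the list's state until `Execute` walks the chain, which is the property that lets one thread record while another replays.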
           

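Note that FRHICommandBase has a Next member pointing to the next node, which means FRHICommandBase is a node of a command linked list. FRHICommand has a great many subclasses, which are declared quickly through a special macro: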

// Macro that defines an RHI command subclass.
// The generated command struct inherits from FRHICommand.
#define FRHICOMMAND_MACRO(CommandName)                                \
struct PREPROCESSOR_JOIN(CommandName##String, __LINE__)                \
{                                                                    \
    static const TCHAR* TStr() { return TEXT(#CommandName); }        \
};                                                                    \
struct CommandName final : public FRHICommand<CommandName, PREPROCESSOR_JOIN(CommandName##String, __LINE__)>
           

With this macro, subclasses of FRHICommand (i.e. concrete RHI commands) can be defined quickly, for example:

FRHICOMMAND_MACRO(FRHICommandSetStencilRef)
{
    uint32 StencilRef;
    FORCEINLINE_DEBUGGABLE FRHICommandSetStencilRef(uint32 InStencilRef)
        : StencilRef(InStencilRef)
    {
    }
    RHI_API void Execute(FRHICommandListBase& CmdList);
};
           

After macro expansion, the code looks like this:

struct FRHICommandSetStencilRefString853
{
    static const TCHAR* TStr() { return TEXT("FRHICommandSetStencilRef"); }
};

// FRHICommandSetStencilRef inherits from FRHICommand.
struct FRHICommandSetStencilRef final : public FRHICommand<FRHICommandSetStencilRef, FRHICommandSetStencilRefString853>
{
    uint32 StencilRef;
    FRHICommandSetStencilRef(uint32 InStencilRef)
        : StencilRef(InStencilRef)
    {
    }
    RHI_API void Execute(FRHICommandListBase& CmdList);
};
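
The macro technique itself, generating a uniquely named helper struct that carries the command name and passing it as a template argument, can be reproduced outside the engine. In the sketch below all names (`SKETCH_JOIN`, `SKETCH_COMMAND_MACRO`, `TSketchNamedCommand`, `FSketchSetViewport`) are hypothetical, and `char` strings replace `TCHAR`/`TEXT`.

```cpp
#include <cassert>
#include <cstring>

// Two-level join so that __LINE__ is expanded before token pasting.
#define SKETCH_JOIN_INNER(A, B) A##B
#define SKETCH_JOIN(A, B) SKETCH_JOIN_INNER(A, B)

// Simplified base: exposes the name supplied by the generated helper struct.
template <typename TCmd, typename NameType>
struct TSketchNamedCommand
{
    static const char* GetName() { return NameType::TStr(); }
};

// Generates a helper struct named <CommandName>String<line> and opens the
// definition of the command struct itself; the body follows the macro call.
#define SKETCH_COMMAND_MACRO(CommandName)                                 \
struct SKETCH_JOIN(CommandName##String, __LINE__)                         \
{                                                                         \
    static const char* TStr() { return #CommandName; }                    \
};                                                                        \
struct CommandName final : public TSketchNamedCommand<CommandName, SKETCH_JOIN(CommandName##String, __LINE__)>

SKETCH_COMMAND_MACRO(FSketchSetViewport)
{
    int Width = 0;
    int Height = 0;
};

// True if the name recovered through the template matches the struct name.
bool SketchCommandNameMatches()
{
    FSketchSetViewport Command;
    assert(Command.Width == 0 && Command.Height == 0);
    return std::strcmp(FSketchSetViewport::GetName(), "FSketchSetViewport") == 0;
}
```

Because the macro invocation sits on a single source line, both uses of `__LINE__` expand to the same number, so the helper struct and the template argument refer to the same type.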
           

A great many RHI commands are declared with FRHICOMMAND_MACRO; some of them are listed below:

FRHICOMMAND_MACRO(FRHISyncFrameCommand)
FRHICOMMAND_MACRO(FRHICommandStat)
FRHICOMMAND_MACRO(FRHICommandRHIThreadFence)
FRHICOMMAND_MACRO(FRHIAsyncComputeSubmitList)
FRHICOMMAND_MACRO(FRHICommandSubmitSubList)

FRHICOMMAND_MACRO(FRHICommandWaitForAndSubmitSubListParallel)
FRHICOMMAND_MACRO(FRHICommandWaitForAndSubmitSubList)
FRHICOMMAND_MACRO(FRHICommandWaitForAndSubmitRTSubList)
FRHICOMMAND_MACRO(FRHICommandWaitForTemporalEffect)
FRHICOMMAND_MACRO(FRHICommandBroadcastTemporalEffect)
    
FRHICOMMAND_MACRO(FRHICommandBeginUpdateMultiFrameResource)
FRHICOMMAND_MACRO(FRHICommandEndUpdateMultiFrameResource)
FRHICOMMAND_MACRO(FRHICommandBeginUpdateMultiFrameUAV)
FRHICOMMAND_MACRO(FRHICommandEndUpdateMultiFrameUAV)
FRHICOMMAND_MACRO(FRHICommandSetGPUMask)

FRHICOMMAND_MACRO(FRHICommandSetStencilRef)
FRHICOMMAND_MACRO(FRHICommandSetBlendFactor)
FRHICOMMAND_MACRO(FRHICommandSetStreamSource)
FRHICOMMAND_MACRO(FRHICommandSetViewport)
FRHICOMMAND_MACRO(FRHICommandSetScissorRect)
    
FRHICOMMAND_MACRO(FRHICommandBeginRenderPass)
FRHICOMMAND_MACRO(FRHICommandEndRenderPass)
FRHICOMMAND_MACRO(FRHICommandNextSubpass)
FRHICOMMAND_MACRO(FRHICommandBeginParallelRenderPass)
FRHICOMMAND_MACRO(FRHICommandEndParallelRenderPass)
FRHICOMMAND_MACRO(FRHICommandBeginRenderSubPass)
FRHICOMMAND_MACRO(FRHICommandEndRenderSubPass)
    
FRHICOMMAND_MACRO(FRHICommandDrawPrimitive)
FRHICOMMAND_MACRO(FRHICommandDrawIndexedPrimitive)
FRHICOMMAND_MACRO(FRHICommandDrawPrimitiveIndirect)
FRHICOMMAND_MACRO(FRHICommandDrawIndexedIndirect)
FRHICOMMAND_MACRO(FRHICommandDrawIndexedPrimitiveIndirect)
    
FRHICOMMAND_MACRO(FRHICommandSetGraphicsPipelineState)
FRHICOMMAND_MACRO(FRHICommandBeginUAVOverlap)
FRHICOMMAND_MACRO(FRHICommandEndUAVOverlap)

FRHICOMMAND_MACRO(FRHICommandSetDepthBounds)
FRHICOMMAND_MACRO(FRHICommandSetShadingRate)
FRHICOMMAND_MACRO(FRHICommandSetShadingRateImage)
FRHICOMMAND_MACRO(FRHICommandClearUAVFloat)
FRHICOMMAND_MACRO(FRHICommandCopyToResolveTarget)
FRHICOMMAND_MACRO(FRHICommandCopyTexture)
FRHICOMMAND_MACRO(FRHICommandBeginTransitions)
FRHICOMMAND_MACRO(FRHICommandEndTransitions)
FRHICOMMAND_MACRO(FRHICommandResourceTransition)
FRHICOMMAND_MACRO(FRHICommandClearColorTexture)
FRHICOMMAND_MACRO(FRHICommandClearDepthStencilTexture)
FRHICOMMAND_MACRO(FRHICommandClearColorTextures)

FRHICOMMAND_MACRO(FRHICommandSetGlobalUniformBuffers)
FRHICOMMAND_MACRO(FRHICommandBuildLocalUniformBuffer)

FRHICOMMAND_MACRO(FRHICommandBeginRenderQuery)
FRHICOMMAND_MACRO(FRHICommandEndRenderQuery)
FRHICOMMAND_MACRO(FRHICommandPollOcclusionQueries)

FRHICOMMAND_MACRO(FRHICommandBeginScene)
FRHICOMMAND_MACRO(FRHICommandEndScene)
FRHICOMMAND_MACRO(FRHICommandBeginFrame)
FRHICOMMAND_MACRO(FRHICommandEndFrame)
FRHICOMMAND_MACRO(FRHICommandBeginDrawingViewport)
FRHICOMMAND_MACRO(FRHICommandEndDrawingViewport)

FRHICOMMAND_MACRO(FRHICommandInvalidateCachedState)
FRHICOMMAND_MACRO(FRHICommandDiscardRenderTargets)

FRHICOMMAND_MACRO(FRHICommandUpdateTextureReference)
FRHICOMMAND_MACRO(FRHICommandUpdateRHIResources)
FRHICOMMAND_MACRO(FRHICommandBackBufferWaitTrackingBeginFrame)
FRHICOMMAND_MACRO(FRHICommandFlushTextureCacheBOP)
FRHICOMMAND_MACRO(FRHICommandCopyBufferRegion)
FRHICOMMAND_MACRO(FRHICommandCopyBufferRegions)

FRHICOMMAND_MACRO(FClearCachedRenderingDataCommand)
FRHICOMMAND_MACRO(FClearCachedElementDataCommand)

FRHICOMMAND_MACRO(FRHICommandRayTraceOcclusion)
FRHICOMMAND_MACRO(FRHICommandRayTraceIntersection)
FRHICOMMAND_MACRO(FRHICommandRayTraceDispatch)
FRHICOMMAND_MACRO(FRHICommandSetRayTracingBindings)
FRHICOMMAND_MACRO(FRHICommandClearRayTracingBindings)
           

Besides those declared with FRHICOMMAND_MACRO, FRHICommand also has the following directly derived subclasses:

  • FRHICommandSetShaderParameter
  • FRHICommandSetShaderUniformBuffer
  • FRHICommandSetShaderTexture
  • FRHICommandSetShaderResourceViewParameter
  • FRHICommandSetUAVParameter
  • FRHICommandSetShaderSampler
  • FRHICommandSetComputeShader
  • FRHICommandSetComputePipelineState
  • FRHICommandDispatchComputeShader
  • FRHICommandDispatchIndirectComputeShader
  • FRHICommandSetAsyncComputeBudget
  • FRHICommandCopyToStagingBuffer
  • FRHICommandWriteGPUFence
  • FRHICommandSetLocalUniformBuffer
  • FRHICommandSubmitCommandsHint
  • FRHICommandPushEvent
  • FRHICommandPopEvent
  • FRHICommandBuildAccelerationStructure
  • FRHICommandBuildAccelerationStructures
  • ......

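Whether a command is derived directly or declared with FRHICOMMAND_MACRO makes no essential difference: both are subclasses of FRHICommand, and both are RHI-layer intermediate rendering commands that the render thread can operate on. FRHICOMMAND_MACRO is simply more convenient and saves some repetitive code.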
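To make that equivalence concrete, here is a minimal standalone sketch of a declaration macro and a direct derivation producing the same kind of command. It is not engine code: `COMMAND_MACRO`, `Command`, `CommandBase`, `CommandListModel`, and `RunMacroDemo` are hypothetical stand-ins, and the virtual destructor and the string log are simplifications added for illustration.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <vector>

// Hypothetical stand-in for a command list: it only records what ran.
struct CommandListModel
{
    std::vector<std::string> Log;
};

// Hypothetical base: an intrusive link plus one virtual entry point.
struct CommandBase
{
    CommandBase* Next = nullptr;
    virtual ~CommandBase() = default; // simplification for this sketch
    virtual void ExecuteAndDestruct(CommandListModel& CmdList) = 0;
};

// CRTP layer: runs the derived Execute(), then destroys the command in place.
template <typename TCmd>
struct Command : CommandBase
{
    void ExecuteAndDestruct(CommandListModel& CmdList) final
    {
        TCmd* ThisCmd = static_cast<TCmd*>(this);
        ThisCmd->Execute(CmdList);
        ThisCmd->~TCmd();
    }
};

// Simplified macro: it only hides the "struct X final : public Command<X>" part.
#define COMMAND_MACRO(CommandName) \
    struct CommandName final : public Command<CommandName>

// Declared through the macro.
COMMAND_MACRO(CommandBeginFrame)
{
    void Execute(CommandListModel& CmdList) { CmdList.Log.push_back("BeginFrame"); }
};

// Derived directly; the resulting type has the same shape.
struct CommandDrawPrimitive final : public Command<CommandDrawPrimitive>
{
    unsigned NumPrimitives;
    explicit CommandDrawPrimitive(unsigned InNumPrimitives) : NumPrimitives(InNumPrimitives) {}
    void Execute(CommandListModel& CmdList)
    {
        CmdList.Log.push_back("Draw " + std::to_string(NumPrimitives));
    }
};

// Builds a two-command chain in raw storage and executes it in order.
std::vector<std::string> RunMacroDemo()
{
    CommandListModel CmdList;
    alignas(CommandBeginFrame) unsigned char StorageA[sizeof(CommandBeginFrame)];
    alignas(CommandDrawPrimitive) unsigned char StorageB[sizeof(CommandDrawPrimitive)];
    CommandBase* First = new (StorageA) CommandBeginFrame();
    CommandBase* Second = new (StorageB) CommandDrawPrimitive(12);
    First->Next = Second;

    for (CommandBase* Cmd = First; Cmd != nullptr;)
    {
        CommandBase* Next = Cmd->Next; // read the link before the command destroys itself
        Cmd->ExecuteAndDestruct(CmdList);
        Cmd = Next;
    }
    return CmdList.Log;
}
```

Calling `RunMacroDemo()` runs both commands through the same base-class entry point, which is why code walking the chain does not need to know how each command type was declared.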

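As the lists above show, RHI commands are numerous and fall into the following main categories:

  • Setting, updating, clearing, transitioning, copying, and reading back data and resources.
  • Primitive drawing.
  • Begin and end events for Pass, SubPass, scene, ViewPort, and so on.
  • Fence, wait, and broadcast interfaces.
  • Ray tracing.
  • Slate and debugging commands.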

The core inheritance hierarchy of FRHICommand is drawn below:

FRHICommandBase <|-- FRHICommand

class FRHICommandBase {
    FRHICommandBase* Next
    ExecuteAndDestruct()
}

FRHICommand <|-- FRHICommandDrawPrimitive

FRHICommand <|-- FRHICommandWaitForAndSubmitSubList

FRHICommand <|-- FRHICommandResourceTransition

FRHICommand <|-- etc

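FRHICommandList is the RHI command queue, an object that manages and executes a group of FRHICommand instances. It and its parent classes are defined as follows: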

// Engine\Source\Runtime\RHI\Public\RHICommandList.h

// Base class of RHI command lists.
class FRHICommandListBase : public FNoncopyable
{
public:
    ~FRHICommandListBase();

    // Custom new/delete operators that recycle memory.
    void* operator new(size_t Size);
    void operator delete(void *RawMemory);

    // Flush the command queue.
    inline void Flush();
    // Whether this is the immediate command list.
    inline bool IsImmediate();
    // Whether this is the immediate async compute command list.
    inline bool IsImmediateAsyncCompute();

    // Get the amount of memory in use.
    const int32 GetUsedMemory() const;
    
    // Queue the submission of an async command list.
    void QueueAsyncCommandListSubmit(FGraphEventRef& AnyThreadCompletionEvent, class FRHICommandList* CmdList);
    // Queue the submission of parallel async command lists.
    void QueueParallelAsyncCommandListSubmit(FGraphEventRef* AnyThreadCompletionEvents, bool bIsPrepass, class FRHICommandList** CmdLists, int32* NumDrawsIfKnown, int32 Num, int32 MinDrawsPerTranslate, bool bSpewMerge);
    // Queue the submission of a render-thread command list.
    void QueueRenderThreadCommandListSubmit(FGraphEventRef& RenderThreadCompletionEvent, class FRHICommandList* CmdList);
    // Queue the submission of a command list.
    void QueueCommandListSubmit(class FRHICommandList* CmdList);
    // Add a prerequisite task for dispatch.
    void AddDispatchPrerequisite(const FGraphEventRef& Prereq);
    
    // Wait interfaces.
    void WaitForTasks(bool bKnownToBeComplete = false);
    void WaitForDispatch();
    void WaitForRHIThreadTasks();
    void HandleRTThreadTaskCompletion(const FGraphEventRef& MyCompletionGraphEvent);

    // Allocation interfaces.
    void* Alloc(int32 AllocSize, int32 Alignment);
    template <typename T>
    void* Alloc();
    template <typename T>
    const TArrayView<T> AllocArray(const TArrayView<T> InArray);
    TCHAR* AllocString(const TCHAR* Name);
    // Allocate a command.
    void* AllocCommand(int32 AllocSize, int32 Alignment);
    template <typename TCmd>
    void* AllocCommand();

    bool HasCommands() const;
    bool IsExecuting() const;
    bool IsBottomOfPipe() const;
    bool IsTopOfPipe() const;
    bool IsGraphics() const;
    bool IsAsyncCompute() const;
    // RHI pipeline: ERHIPipeline::Graphics or ERHIPipeline::AsyncCompute.
    ERHIPipeline GetPipeline() const;

    // Whether to bypass the RHI thread and execute commands synchronously.
    bool Bypass() const;

    // Exchange command lists.
    void ExchangeCmdList(FRHICommandListBase& Other);
    // Set/get the contexts.
    void SetContext(IRHICommandContext* InContext);
    IRHICommandContext& GetContext();
    void SetComputeContext(IRHIComputeContext* InComputeContext);
    IRHIComputeContext& GetComputeContext();
    void CopyContext(FRHICommandListBase& ParentCommandList);
    
    void MaybeDispatchToRHIThread();
    void MaybeDispatchToRHIThreadInner();
    
    (......)

private:
    // Head of the command linked list.
    FRHICommandBase* Root;
    // Points to the link slot where the next command will be appended (initially &Root).
    FRHICommandBase** CommandLink;
    
    bool bExecuting;
    uint32 NumCommands;
    uint32 UID;
    
    // Device (graphics) context.
    IRHICommandContext* Context;
    // Compute context.
    IRHIComputeContext* ComputeContext;
    
    FMemStackBase MemManager; 
    FGraphEventArray RTTasks;

    // Reset.
    void Reset();

public:
    enum class ERenderThreadContext
    {
        SceneRenderTargets,
        Num
    };
    
    // Render thread contexts.
    void *RenderThreadContexts[(int32)ERenderThreadContext::Num];

protected:
    //the values of this struct must be copied when the commandlist is split 
    struct FPSOContext
    {
        uint32 CachedNumSimultanousRenderTargets = 0;
        TStaticArray<FRHIRenderTargetView, MaxSimultaneousRenderTargets> CachedRenderTargets;
        FRHIDepthRenderTargetView CachedDepthStencilTarget;
        
        ESubpassHint SubpassHint = ESubpassHint::None;
        uint8 SubpassIndex = 0;
        uint8 MultiViewCount = 0;
        bool HasFragmentDensityAttachment = false;
    } PSOContext;

    // Bound shader inputs.
    FBoundShaderStateInput BoundShaderInput;
    // RHI resource of the bound compute shader.
    FRHIComputeShader* BoundComputeShaderRHI;

    // Validate that the given shader is the bound one.
    void ValidateBoundShader(FRHIVertexShader* ShaderRHI);
    void ValidateBoundShader(FRHIPixelShader* ShaderRHI);
    (......)

    void CacheActiveRenderTargets(...);
    void CacheActiveRenderTargets(const FRHIRenderPassInfo& Info);
    void IncrementSubpass();
    void ResetSubpass(ESubpassHint SubpassHint);
    
public:
    void CopyRenderThreadContexts(const FRHICommandListBase& ParentCommandList);
    void SetRenderThreadContext(void* InContext, ERenderThreadContext Slot);
    void* GetRenderThreadContext(ERenderThreadContext Slot);

    // Common data.
    struct FCommonData
    {
        class FRHICommandListBase* Parent = nullptr;

        enum class ECmdListType
        {
            Immediate = 1,
            Regular,
        };
        ECmdListType Type = ECmdListType::Regular;
        bool bInsideRenderPass = false;
        bool bInsideComputePass = false;
    };

    bool DoValidation() const;
    inline bool IsOutsideRenderPass() const;
    inline bool IsInsideRenderPass() const;
    inline bool IsInsideComputePass() const;

    FCommonData Data;
};

// Compute command list.
class FRHIComputeCommandList : public FRHICommandListBase
{
public:
    FRHIComputeCommandList(FRHIGPUMask GPUMask) : FRHICommandListBase(GPUMask) {}
    
    void* operator new(size_t Size);
    void operator delete(void *RawMemory);

    // Setting and getting shader parameters.
    inline FRHIComputeShader* GetBoundComputeShader() const;
    void SetGlobalUniformBuffers(const FUniformBufferStaticBindings& UniformBuffers);
    void SetShaderUniformBuffer(FRHIComputeShader* Shader, uint32 BaseIndex, FRHIUniformBuffer* UniformBuffer);
    void SetShaderUniformBuffer(const FComputeShaderRHIRef& Shader, uint32 BaseIndex, FRHIUniformBuffer* UniformBuffer);
    void SetShaderParameter(FRHIComputeShader* Shader, uint32 BufferIndex, uint32 BaseIndex, uint32 NumBytes, const void* NewValue);
    void SetShaderParameter(FComputeShaderRHIRef& Shader, uint32 BufferIndex, uint32 BaseIndex, uint32 NumBytes, const void* NewValue);
    void SetShaderTexture(FRHIComputeShader* Shader, uint32 TextureIndex, FRHITexture* Texture);
    void SetShaderResourceViewParameter(FRHIComputeShader* Shader, uint32 SamplerIndex, FRHIShaderResourceView* SRV);
    void SetShaderSampler(FRHIComputeShader* Shader, uint32 SamplerIndex, FRHISamplerState* State);
    void SetUAVParameter(FRHIComputeShader* Shader, uint32 UAVIndex, FRHIUnorderedAccessView* UAV);
    void SetUAVParameter(FRHIComputeShader* Shader, uint32 UAVIndex, FRHIUnorderedAccessView* UAV, uint32 InitialCount);
    void SetComputeShader(FRHIComputeShader* ComputeShader);
    void SetComputePipelineState(FComputePipelineState* ComputePipelineState, FRHIComputeShader* ComputeShader);

    void SetAsyncComputeBudget(EAsyncComputeBudget Budget);
    // Dispatch compute shaders.
    void DispatchComputeShader(uint32 ThreadGroupCountX, uint32 ThreadGroupCountY, uint32 ThreadGroupCountZ);
    void DispatchIndirectComputeShader(FRHIVertexBuffer* ArgumentBuffer, uint32 ArgumentOffset);

    // Clears.
    void ClearUAVFloat(FRHIUnorderedAccessView* UnorderedAccessViewRHI, const FVector4& Values);
    void ClearUAVUint(FRHIUnorderedAccessView* UnorderedAccessViewRHI, const FUintVector4& Values);
    
    // Resource transitions.
    void BeginTransitions(TArrayView<const FRHITransition*> Transitions);
    void EndTransitions(TArrayView<const FRHITransition*> Transitions);
    inline void Transition(TArrayView<const FRHITransitionInfo> Infos);
    void BeginTransition(const FRHITransition* Transition);
    void EndTransition(const FRHITransition* Transition);
    void Transition(const FRHITransitionInfo& Info);

    // ---- Legacy API ----

    void TransitionResource(ERHIAccess TransitionType, const FTextureRHIRef& InTexture);
    void TransitionResource(ERHIAccess TransitionType, FRHITexture* InTexture);
    inline void TransitionResources(ERHIAccess TransitionType, FRHITexture* const* InTextures, int32 NumTextures);
    void TransitionResourceArrayNoCopy(ERHIAccess TransitionType, TArray<FRHITexture*>& InTextures);
    inline void TransitionResources(ERHIAccess TransitionType, EResourceTransitionPipeline /* ignored TransitionPipeline */, FRHIUnorderedAccessView* const* InUAVs, int32 NumUAVs, FRHIComputeFence* WriteFence);
    void TransitionResource(ERHIAccess TransitionType, EResourceTransitionPipeline TransitionPipeline, FRHIUnorderedAccessView* InUAV, FRHIComputeFence* WriteFence);
    void TransitionResource(ERHIAccess TransitionType, EResourceTransitionPipeline TransitionPipeline, FRHIUnorderedAccessView* InUAV);
    void TransitionResources(ERHIAccess TransitionType, EResourceTransitionPipeline TransitionPipeline, FRHIUnorderedAccessView* const* InUAVs, int32 NumUAVs);
    void WaitComputeFence(FRHIComputeFence* WaitFence);

    void BeginUAVOverlap();
    void EndUAVOverlap();
    void BeginUAVOverlap(FRHIUnorderedAccessView* UAV);
    void EndUAVOverlap(FRHIUnorderedAccessView* UAV);
    void BeginUAVOverlap(TArrayView<FRHIUnorderedAccessView* const> UAVs);
    void EndUAVOverlap(TArrayView<FRHIUnorderedAccessView* const> UAVs);

    void PushEvent(const TCHAR* Name, FColor Color);
    void PopEvent();
    void BreakPoint();

    void SubmitCommandsHint();
    void CopyToStagingBuffer(FRHIVertexBuffer* SourceBuffer, FRHIStagingBuffer* DestinationStagingBuffer, uint32 Offset, uint32 NumBytes);

    void WriteGPUFence(FRHIGPUFence* Fence);
    void SetGPUMask(FRHIGPUMask InGPUMask);

    (......)
};

// RHI command list.
class FRHICommandList : public FRHIComputeCommandList
{
public:
    FRHICommandList(FRHIGPUMask GPUMask) : FRHIComputeCommandList(GPUMask) {}

    bool AsyncPSOCompileAllowed() const;

    void* operator new(size_t Size);
    void operator delete(void *RawMemory);
    
    // Get the bound shaders.
    inline FRHIVertexShader* GetBoundVertexShader() const;
    inline FRHIHullShader* GetBoundHullShader() const;
    inline FRHIDomainShader* GetBoundDomainShader() const;
    inline FRHIPixelShader* GetBoundPixelShader() const;
    inline FRHIGeometryShader* GetBoundGeometryShader() const;

    // Update multi-frame resources.
    void BeginUpdateMultiFrameResource(FRHITexture* Texture);
    void EndUpdateMultiFrameResource(FRHITexture* Texture);
    void BeginUpdateMultiFrameResource(FRHIUnorderedAccessView* UAV);
    void EndUpdateMultiFrameResource(FRHIUnorderedAccessView* UAV);

    // Uniform buffer interfaces.
    FLocalUniformBuffer BuildLocalUniformBuffer(const void* Contents, uint32 ContentsSize, const FRHIUniformBufferLayout& Layout);
    template <typename TRHIShader>
    void SetLocalShaderUniformBuffer(TRHIShader* Shader, uint32 BaseIndex, const FLocalUniformBuffer& UniformBuffer);
    template <typename TShaderRHI>
    void SetLocalShaderUniformBuffer(const TRefCountPtr<TShaderRHI>& Shader, uint32 BaseIndex, const FLocalUniformBuffer& UniformBuffer);
    void SetShaderUniformBuffer(FRHIGraphicsShader* Shader, uint32 BaseIndex, FRHIUniformBuffer* UniformBuffer);
    template <typename TShaderRHI>
    FORCEINLINE void SetShaderUniformBuffer(const TRefCountPtr<TShaderRHI>& Shader, uint32 BaseIndex, FRHIUniformBuffer* UniformBuffer);
    
    // Shader parameters.
    void SetShaderParameter(FRHIGraphicsShader* Shader, uint32 BufferIndex, uint32 BaseIndex, uint32 NumBytes, const void* NewValue);
    template <typename TShaderRHI>
    void SetShaderParameter(const TRefCountPtr<TShaderRHI>& Shader, uint32 BufferIndex, uint32 BaseIndex, uint32 NumBytes, const void* NewValue);
    void SetShaderTexture(FRHIGraphicsShader* Shader, uint32 TextureIndex, FRHITexture* Texture);
    template <typename TShaderRHI>
    void SetShaderTexture(const TRefCountPtr<TShaderRHI>& Shader, uint32 TextureIndex, FRHITexture* Texture);
    void SetShaderResourceViewParameter(FRHIGraphicsShader* Shader, uint32 SamplerIndex, FRHIShaderResourceView* SRV);
    template <typename TShaderRHI>
    void SetShaderResourceViewParameter(const TRefCountPtr<TShaderRHI>& Shader, uint32 SamplerIndex, FRHIShaderResourceView* SRV);
    void SetShaderSampler(FRHIGraphicsShader* Shader, uint32 SamplerIndex, FRHISamplerState* State);
    template <typename TShaderRHI>
    void SetShaderSampler(const TRefCountPtr<TShaderRHI>& Shader, uint32 SamplerIndex, FRHISamplerState* State);
    void SetUAVParameter(FRHIPixelShader* Shader, uint32 UAVIndex, FRHIUnorderedAccessView* UAV);
    void SetUAVParameter(const TRefCountPtr<FRHIPixelShader>& Shader, uint32 UAVIndex, FRHIUnorderedAccessView* UAV);
    void SetBlendFactor(const FLinearColor& BlendFactor = FLinearColor::White);
    
    // Primitive drawing.
    void DrawPrimitive(uint32 BaseVertexIndex, uint32 NumPrimitives, uint32 NumInstances);
    void DrawIndexedPrimitive(FRHIIndexBuffer* IndexBuffer, int32 BaseVertexIndex, uint32 FirstInstance, uint32 NumVertices, uint32 StartIndex, uint32 NumPrimitives, uint32 NumInstances);
    void DrawPrimitiveIndirect(FRHIVertexBuffer* ArgumentBuffer, uint32 ArgumentOffset);
    void DrawIndexedIndirect(FRHIIndexBuffer* IndexBufferRHI, FRHIStructuredBuffer* ArgumentsBufferRHI, uint32 DrawArgumentsIndex, uint32 NumInstances);
    void DrawIndexedPrimitiveIndirect(FRHIIndexBuffer* IndexBuffer, FRHIVertexBuffer* ArgumentsBuffer, uint32 ArgumentOffset);
    
    // State setting.
    void SetStreamSource(uint32 StreamIndex, FRHIVertexBuffer* VertexBuffer, uint32 Offset);
    void SetStencilRef(uint32 StencilRef);
    void SetViewport(float MinX, float MinY, float MinZ, float MaxX, float MaxY, float MaxZ);
    void SetStereoViewport(float LeftMinX, float RightMinX, float LeftMinY, float RightMinY, float MinZ, float LeftMaxX, float RightMaxX, float LeftMaxY, float RightMaxY, float MaxZ);
    void SetScissorRect(bool bEnable, uint32 MinX, uint32 MinY, uint32 MaxX, uint32 MaxY);
    void ApplyCachedRenderTargets(FGraphicsPipelineStateInitializer& GraphicsPSOInit);
    void SetGraphicsPipelineState(class FGraphicsPipelineState* GraphicsPipelineState, const FBoundShaderStateInput& ShaderInput, bool bApplyAdditionalState);
    void SetDepthBounds(float MinDepth, float MaxDepth);
    void SetShadingRate(EVRSShadingRate ShadingRate, EVRSRateCombiner Combiner);
    void SetShadingRateImage(FRHITexture* RateImageTexture, EVRSRateCombiner Combiner);
    
    // Texture copies.
    void CopyToResolveTarget(FRHITexture* SourceTextureRHI, FRHITexture* DestTextureRHI, const FResolveParams& ResolveParams);
    void CopyTexture(FRHITexture* SourceTextureRHI, FRHITexture* DestTextureRHI, const FRHICopyTextureInfo& CopyInfo);
    
    void ResummarizeHTile(FRHITexture2D* DepthTexture);
    
    // Render queries.
    void BeginRenderQuery(FRHIRenderQuery* RenderQuery);
    void EndRenderQuery(FRHIRenderQuery* RenderQuery);
    void CalibrateTimers(FRHITimestampCalibrationQuery* CalibrationQuery);
    void PollOcclusionQueries();

    /* LEGACY API */
    void TransitionResource(FExclusiveDepthStencil DepthStencilMode, FRHITexture* DepthTexture);
    void BeginRenderPass(const FRHIRenderPassInfo& InInfo, const TCHAR* Name);
    void EndRenderPass();
    void NextSubpass();

    // The interfaces below must be called on the immediate command list.
    void BeginScene();
    void EndScene();
    void BeginDrawingViewport(FRHIViewport* Viewport, FRHITexture* RenderTargetRHI);
    void EndDrawingViewport(FRHIViewport* Viewport, bool bPresent, bool bLockToVsync);
    void BeginFrame();
    void EndFrame();

    void RHIInvalidateCachedState();
    void DiscardRenderTargets(bool Depth, bool Stencil, uint32 ColorBitMask);
    
    void CopyBufferRegion(FRHIVertexBuffer* DestBuffer, uint64 DstOffset, FRHIVertexBuffer* SourceBuffer, uint64 SrcOffset, uint64 NumBytes);

    (......)
};
           

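FRHICommandListBase defines the basic data a command queue needs (the command list and the device contexts) and its basic interfaces (flushing, waiting for, enqueuing, and dispatching commands, plus memory allocation). FRHIComputeCommandList defines the compute shader interfaces, GPU resource state transitions, and the setting of some shader parameters. FRHICommandList defines the interfaces of the regular graphics pipeline, including VS, PS, and GS binding, primitive drawing, further shader parameter setting and resource state transitions, and resource creation, updating, and waiting.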
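How the Root and CommandLink fields and the context fit together can be sketched outside the engine roughly as follows. This is a simplified model under stated assumptions, not the engine's implementation: `CommandListModel`, `CommandModel`, `ContextModel`, `Enqueue`, and `RunCommandListDemo` are hypothetical names, plain `new`/`delete` stands in for the list's own memory manager, and the context only records calls instead of driving a graphics API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical context: records calls instead of talking to a graphics API.
struct ContextModel
{
    std::vector<std::string> Calls;
    void SetStencilRef(unsigned Ref) { Calls.push_back("Stencil " + std::to_string(Ref)); }
    void DrawPrimitive(unsigned Num) { Calls.push_back("Draw " + std::to_string(Num)); }
};

// Hypothetical command base with an intrusive Next link.
struct CommandModel
{
    CommandModel* Next = nullptr;
    virtual ~CommandModel() = default;
    virtual void Execute(ContextModel& Context) = 0;
};

struct StencilCommand final : CommandModel
{
    unsigned Ref;
    explicit StencilCommand(unsigned InRef) : Ref(InRef) {}
    void Execute(ContextModel& Context) override { Context.SetStencilRef(Ref); }
};

struct DrawCommand final : CommandModel
{
    unsigned NumPrimitives;
    explicit DrawCommand(unsigned InNum) : NumPrimitives(InNum) {}
    void Execute(ContextModel& Context) override { Context.DrawPrimitive(NumPrimitives); }
};

class CommandListModel
{
public:
    CommandListModel() = default;
    // Non-copyable: CommandLink points into this object.
    CommandListModel(const CommandListModel&) = delete;
    CommandListModel& operator=(const CommandListModel&) = delete;
    ~CommandListModel() { Release(); }

    // Appends in O(1): write the new command into the current tail slot,
    // then make the tail slot the new command's Next pointer.
    template <typename TCmd, typename... TArgs>
    void Enqueue(TArgs&&... Args)
    {
        TCmd* Cmd = new TCmd(std::forward<TArgs>(Args)...); // assumption: heap instead of an arena
        *CommandLink = Cmd;
        CommandLink = &Cmd->Next;
        ++NumCommands;
    }

    bool HasCommands() const { return NumCommands > 0; }

    // Replays every recorded command against the context, in recording order.
    void Execute(ContextModel& Context)
    {
        for (CommandModel* Cmd = Root; Cmd != nullptr; Cmd = Cmd->Next)
        {
            Cmd->Execute(Context);
        }
        Release();
    }

private:
    // Frees all commands and returns the list to its empty state.
    void Release()
    {
        for (CommandModel* Cmd = Root; Cmd != nullptr;)
        {
            CommandModel* Next = Cmd->Next;
            delete Cmd;
            Cmd = Next;
        }
        Root = nullptr;
        CommandLink = &Root;
        NumCommands = 0;
    }

    CommandModel* Root = nullptr;        // head of the linked list
    CommandModel** CommandLink = &Root;  // slot where the next command goes
    unsigned NumCommands = 0;
};

// Records two commands, then translates them into context calls.
std::vector<std::string> RunCommandListDemo()
{
    CommandListModel CmdList;
    CmdList.Enqueue<StencilCommand>(1u);
    CmdList.Enqueue<DrawCommand>(36u);
    ContextModel Context;
    CmdList.Execute(Context);
    return Context.Calls;
}
```

Keeping a pointer to the tail slot rather than to the tail node means the empty list needs no special case: the first append writes straight into Root.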

FRHICommandList also has several subclasses, defined as follows:

// Immediate-mode command list.
class FRHICommandListImmediate : public FRHICommandList
{
    // Command that wraps a lambda.
    template <typename LAMBDA>
    struct TRHILambdaCommand final : public FRHICommandBase
    {
        LAMBDA Lambda;

        void ExecuteAndDestruct(FRHICommandListBase& CmdList, FRHICommandListDebugContext&) override final;
    };

    FRHICommandListImmediate();
    ~FRHICommandListImmediate();
    
public:
    // Flush commands immediately.
    void ImmediateFlush(EImmediateFlushType::Type FlushType);
    // Stall the RHI thread.
    bool StallRHIThread();
    // Unstall the RHI thread.
    void UnStallRHIThread();
    // Whether the RHI thread is currently stalled.
    static bool IsStalled();

    void SetCurrentStat(TStatId Stat);

    static FGraphEventRef RenderThreadTaskFence();
    static FGraphEventArray& GetRenderThreadTaskArray();
    static void WaitOnRenderThreadTaskFence(FGraphEventRef& Fence);
    static bool AnyRenderThreadTasksOutstanding();
    FGraphEventRef RHIThreadFence(bool bSetLockFence = false);

    // Queue the given async compute command list in order with the current immediate command list.
    void QueueAsyncCompute(FRHIComputeCommandList& RHIComputeCmdList);

    bool IsBottomOfPipe();
    bool IsTopOfPipe();
    template <typename LAMBDA>
    void EnqueueLambda(LAMBDA&& Lambda);

    // Resource creation.
    FSamplerStateRHIRef CreateSamplerState(const FSamplerStateInitializerRHI& Initializer);
    FRasterizerStateRHIRef CreateRasterizerState(const FRasterizerStateInitializerRHI& Initializer);
    FDepthStencilStateRHIRef CreateDepthStencilState(const FDepthStencilStateInitializerRHI& Initializer);
    FBlendStateRHIRef CreateBlendState(const FBlendStateInitializerRHI& Initializer);
    FPixelShaderRHIRef CreatePixelShader(TArrayView<const uint8> Code, const FSHAHash& Hash);
    FVertexShaderRHIRef CreateVertexShader(TArrayView<const uint8> Code, const FSHAHash& Hash);
    FHullShaderRHIRef CreateHullShader(TArrayView<const uint8> Code, const FSHAHash& Hash);
    FDomainShaderRHIRef CreateDomainShader(TArrayView<const uint8> Code, const FSHAHash& Hash);
    FGeometryShaderRHIRef CreateGeometryShader(TArrayView<const uint8> Code, const FSHAHash& Hash);
    FComputeShaderRHIRef CreateComputeShader(TArrayView<const uint8> Code, const FSHAHash& Hash);
    FComputeFenceRHIRef CreateComputeFence(const FName& Name);
    FGPUFenceRHIRef CreateGPUFence(const FName& Name);
    FStagingBufferRHIRef CreateStagingBuffer();
    FBoundShaderStateRHIRef CreateBoundShaderState(...);
    FGraphicsPipelineStateRHIRef CreateGraphicsPipelineState(const FGraphicsPipelineStateInitializer& Initializer);
    TRefCountPtr<FRHIComputePipelineState> CreateComputePipelineState(FRHIComputeShader* ComputeShader);
    FUniformBufferRHIRef CreateUniformBuffer(...);
    FIndexBufferRHIRef CreateAndLockIndexBuffer(uint32 Stride, uint32 Size, EBufferUsageFlags InUsage, ERHIAccess InResourceState, FRHIResourceCreateInfo& CreateInfo, void*& OutDataBuffer);
    FIndexBufferRHIRef CreateAndLockIndexBuffer(uint32 Stride, uint32 Size, uint32 InUsage, FRHIResourceCreateInfo& CreateInfo, void*& OutDataBuffer);
    
    // Vertex/index buffer interfaces.
    void* LockIndexBuffer(FRHIIndexBuffer* IndexBuffer, uint32 Offset, uint32 SizeRHI, EResourceLockMode LockMode);
    void UnlockIndexBuffer(FRHIIndexBuffer* IndexBuffer);
    void* LockStagingBuffer(FRHIStagingBuffer* StagingBuffer, FRHIGPUFence* Fence, uint32 Offset, uint32 SizeRHI);
    void UnlockStagingBuffer(FRHIStagingBuffer* StagingBuffer);
    FVertexBufferRHIRef CreateAndLockVertexBuffer(uint32 Size, EBufferUsageFlags InUsage, ...);
    FVertexBufferRHIRef CreateAndLockVertexBuffer(uint32 Size, uint32 InUsage, FRHIResourceCreateInfo& CreateInfo, void*& OutDataBuffer);
    void* LockVertexBuffer(FRHIVertexBuffer* VertexBuffer, uint32 Offset, uint32 SizeRHI, EResourceLockMode LockMode);
    void UnlockVertexBuffer(FRHIVertexBuffer* VertexBuffer);
    void CopyVertexBuffer(FRHIVertexBuffer* SourceBuffer, FRHIVertexBuffer* DestBuffer);
    void* LockStructuredBuffer(FRHIStructuredBuffer* StructuredBuffer, uint32 Offset, uint32 SizeRHI, EResourceLockMode LockMode);
    void UnlockStructuredBuffer(FRHIStructuredBuffer* StructuredBuffer);
    
    // UAV/SRV creation.
    FUnorderedAccessViewRHIRef CreateUnorderedAccessView(FRHIStructuredBuffer* StructuredBuffer, bool bUseUAVCounter, bool bAppendBuffer);
    FUnorderedAccessViewRHIRef CreateUnorderedAccessView(FRHITexture* Texture, uint32 MipLevel);
    FUnorderedAccessViewRHIRef CreateUnorderedAccessView(FRHITexture* Texture, uint32 MipLevel, uint8 Format);
    FUnorderedAccessViewRHIRef CreateUnorderedAccessView(FRHIVertexBuffer* VertexBuffer, uint8 Format);
    FUnorderedAccessViewRHIRef CreateUnorderedAccessView(FRHIIndexBuffer* IndexBuffer, uint8 Format);
    FShaderResourceViewRHIRef CreateShaderResourceView(FRHIStructuredBuffer* StructuredBuffer);
    FShaderResourceViewRHIRef CreateShaderResourceView(FRHIVertexBuffer* VertexBuffer, uint32 Stride, uint8 Format);
    FShaderResourceViewRHIRef CreateShaderResourceView(const FShaderResourceViewInitializer& Initializer);
    FShaderResourceViewRHIRef CreateShaderResourceView(FRHIIndexBuffer* Buffer);
        
    uint64 CalcTexture2DPlatformSize(...);
    uint64 CalcTexture3DPlatformSize(...);
    uint64 CalcTextureCubePlatformSize(...);
    
    // Texture operations.
    void GetTextureMemoryStats(FTextureMemoryStats& OutStats);
    bool GetTextureMemoryVisualizeData(...);
    void CopySharedMips(FRHITexture2D* DestTexture2D, FRHITexture2D* SrcTexture2D);
    void TransferTexture(FRHITexture2D* Texture, FIntRect Rect, uint32 SrcGPUIndex, uint32 DestGPUIndex, bool PullData);
    void TransferTextures(const TArrayView<const FTransferTextureParams> Params);
    void GetResourceInfo(FRHITexture* Ref, FRHIResourceInfo& OutInfo);
    FShaderResourceViewRHIRef CreateShaderResourceView(FRHITexture* Texture, const FRHITextureSRVCreateInfo& CreateInfo);
    FShaderResourceViewRHIRef CreateShaderResourceView(FRHITexture* Texture, uint8 MipLevel);
    FShaderResourceViewRHIRef CreateShaderResourceView(FRHITexture* Texture, uint8 MipLevel, uint8 NumMipLevels, uint8 Format);
    FShaderResourceViewRHIRef CreateShaderResourceViewWriteMask(FRHITexture2D* Texture2DRHI);
    FShaderResourceViewRHIRef CreateShaderResourceViewFMask(FRHITexture2D* Texture2DRHI);
    uint32 ComputeMemorySize(FRHITexture* TextureRHI);
    FTexture2DRHIRef AsyncReallocateTexture2D(...);
    ETextureReallocationStatus FinalizeAsyncReallocateTexture2D(FRHITexture2D* Texture2D, bool bBlockUntilCompleted);
    ETextureReallocationStatus CancelAsyncReallocateTexture2D(FRHITexture2D* Texture2D, bool bBlockUntilCompleted);
    void* LockTexture2D(...);
    void UnlockTexture2D(FRHITexture2D* Texture, uint32 MipIndex, bool bLockWithinMiptail, bool bFlushRHIThread = true);
    void* LockTexture2DArray(...);
    void UnlockTexture2DArray(FRHITexture2DArray* Texture, uint32 TextureIndex, uint32 MipIndex, bool bLockWithinMiptail);
    void UpdateTexture2D(...);
    void UpdateFromBufferTexture2D(...);
    FUpdateTexture3DData BeginUpdateTexture3D(...);
    void EndUpdateTexture3D(FUpdateTexture3DData& UpdateData);
    void EndMultiUpdateTexture3D(TArray<FUpdateTexture3DData>& UpdateDataArray);
    void UpdateTexture3D(...);
    void* LockTextureCubeFace(...);
    void UnlockTextureCubeFace(FRHITextureCube* Texture, ...);

    // Read texture surface data.
    void ReadSurfaceData(FRHITexture* Texture, ...);
    void ReadSurfaceData(FRHITexture* Texture, ...);
    void MapStagingSurface(FRHITexture* Texture, void*& OutData, int32& OutWidth, int32& OutHeight);
    void MapStagingSurface(FRHITexture* Texture, ...);
    void UnmapStagingSurface(FRHITexture* Texture);
    void ReadSurfaceFloatData(FRHITexture* Texture, ...);
    void ReadSurfaceFloatData(FRHITexture* Texture, ...);
    void Read3DSurfaceFloatData(FRHITexture* Texture,...);
    
    // Resource state transitions on the render thread.
    void AcquireTransientResource_RenderThread(FRHITexture* Texture);
    void DiscardTransientResource_RenderThread(FRHITexture* Texture);
    void AcquireTransientResource_RenderThread(FRHIVertexBuffer* Buffer);
    void DiscardTransientResource_RenderThread(FRHIVertexBuffer* Buffer);
    void AcquireTransientResource_RenderThread(FRHIStructuredBuffer* Buffer);
    void DiscardTransientResource_RenderThread(FRHIStructuredBuffer* Buffer);
   
    // Get render query results.
    bool GetRenderQueryResult(FRHIRenderQuery* RenderQuery, ...);
    void PollRenderQueryResults();
    
    // Viewports.
    FViewportRHIRef CreateViewport(void* WindowHandle, ...);
    uint32 GetViewportNextPresentGPUIndex(FRHIViewport* Viewport);
    FTexture2DRHIRef GetViewportBackBuffer(FRHIViewport* Viewport);
    void AdvanceFrameForGetViewportBackBuffer(FRHIViewport* Viewport);
    void ResizeViewport(FRHIViewport* Viewport, ...);
    
    void AcquireThreadOwnership();
    void ReleaseThreadOwnership();
    
    // Submit commands and flush them to the GPU.
    void SubmitCommandsAndFlushGPU();
    // Execute a command list.
    void ExecuteCommandList(FRHICommandList* CmdList);
    
    // Update resources.
    void UpdateTextureReference(FRHITextureReference* TextureRef, FRHITexture* NewTexture);
    void UpdateRHIResources(FRHIResourceUpdateInfo* UpdateInfos, int32 Num, bool bNeedReleaseRefs);
    // Flush resources.
    void FlushResources();
    
    // Per-frame tick.
    void Tick(float DeltaTime);
    // Block until the GPU is idle.
    void BlockUntilGPUIdle();
    
    // Suspend/resume rendering.
    void SuspendRendering();
    void ResumeRendering();
    bool IsRenderingSuspended();
    
    // Compress/decompress data.
    bool EnqueueDecompress(uint8_t* SrcBuffer, uint8_t* DestBuffer, int CompressedSize, void* ErrorCodeBuffer);
    bool EnqueueCompress(uint8_t* SrcBuffer, uint8_t* DestBuffer, int UnCompressedSize, void* ErrorCodeBuffer);
    
    // Other interfaces.
    bool GetAvailableResolutions(FScreenResolutionArray& Resolutions, bool bIgnoreRefreshRate);
    void GetSupportedResolution(uint32& Width, uint32& Height);
    void VirtualTextureSetFirstMipInMemory(FRHITexture2D* Texture, uint32 FirstMip);
    void VirtualTextureSetFirstMipVisible(FRHITexture2D* Texture, uint32 FirstMip);

    // Get the native handles.
    void* GetNativeDevice();
    void* GetNativeInstance();
    // Get the immediate-mode command context.
    IRHICommandContext* GetDefaultContext();
    // Get a command context container.
    IRHICommandContextContainer* GetCommandContextContainer(int32 Index, int32 Num);
    
    uint32 GetGPUFrameCycles();
};

// Type that marks the recursive use of command lists inside an RHI implementation.
class FRHICommandList_RecursiveHazardous : public FRHICommandList
{
public:
    FRHICommandList_RecursiveHazardous(IRHICommandContext *Context, FRHIGPUMask InGPUMask = FRHIGPUMask::All());
};

// Helper class used inside the RHI for safer use of FRHICommandList_RecursiveHazardous.
template <typename ContextType>
class TRHICommandList_RecursiveHazardous : public FRHICommandList_RecursiveHazardous
{
    template <typename LAMBDA>
    struct TRHILambdaCommand final : public FRHICommandBase
    {
        LAMBDA Lambda;

        TRHILambdaCommand(LAMBDA&& InLambda);
        void ExecuteAndDestruct(FRHICommandListBase& CmdList, FRHICommandListDebugContext&) override final;
    };

public:
    TRHICommandList_RecursiveHazardous(ContextType *Context, FRHIGPUMask GPUMask = FRHIGPUMask::All());

    template <typename LAMBDA>
    void RunOnContext(LAMBDA&& Lambda);
};
           

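FRHICommandListImmediate wraps the immediate-mode graphics API interfaces and is used very widely throughout UE's rendering system. It additionally defines interfaces for operating on, creating, updating, and reading resources and for transitioning their state, and it adds interfaces for thread synchronization and GPU synchronization.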
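The EnqueueLambda and ImmediateFlush pair declared in the listing can be illustrated with a small model. This is only a sketch under assumptions: `ImmediateListModel`, its `bBypass` flag, and `RunImmediateDemo` are hypothetical, `std::function` stands in for the TRHILambdaCommand wrapper, and the exact condition the engine uses to decide between running a lambda right away and recording it is not shown in this section.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of an immediate command list that accepts lambdas.
class ImmediateListModel
{
public:
    explicit ImmediateListModel(bool bInBypass) : bBypass(bInBypass) {}

    // Either runs the lambda on the spot (bypass) or records it for later.
    template <typename LAMBDA>
    void EnqueueLambda(LAMBDA&& Lambda)
    {
        if (bBypass)
        {
            Lambda(*this);
        }
        else
        {
            Pending.emplace_back(std::forward<LAMBDA>(Lambda));
        }
    }

    // Executes everything recorded so far, in recording order.
    void ImmediateFlush()
    {
        std::vector<std::function<void(ImmediateListModel&)>> ToRun;
        ToRun.swap(Pending); // lambdas may enqueue more work while running
        for (auto& Cmd : ToRun)
        {
            Cmd(*this);
        }
    }

    std::vector<std::string> Log;

private:
    bool bBypass;
    std::vector<std::function<void(ImmediateListModel&)>> Pending;
};

// Shows how the bypass flag changes when the lambda's side effect lands.
std::vector<std::string> RunImmediateDemo(bool bBypass)
{
    ImmediateListModel CmdList(bBypass);
    CmdList.EnqueueLambda([](ImmediateListModel& List) { List.Log.push_back("lambda"); });
    CmdList.Log.push_back("after enqueue");
    CmdList.ImmediateFlush();
    return CmdList.Log;
}
```

With bypass enabled the lambda's effect is visible before EnqueueLambda returns; without it, the effect only appears once the list is flushed. That ordering difference is why callers reading results back need an explicit flush.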

The core inheritance hierarchy of FRHICommandList is summarized in the UML diagram below:

FNoncopyable <|-- FRHICommandListBase

class FRHICommandListBase {
    FRHICommandBase* Root
    FRHICommandBase** CommandLink
    IRHICommandContext* Context
    IRHIComputeContext* ComputeContext
    AllocCommand()
    Flush()
    WaitForXXX()
    QueueCommandListXXX()
}

FRHICommandListBase <|-- FRHIComputeCommandList

class FRHIComputeCommandList {
    DispatchComputeShader()
    DispatchIndirectComputeShader()
    SetShaderXXX()
}

FRHIComputeCommandList <|-- FRHICommandList

class FRHICommandList {
    GetBoundXXXShader()
    DrawPrimitive()
    DrawXXX()
}

FRHICommandList <|-- FRHICommandListImmediate

class FRHICommandListImmediate {
    SubmitCommandsAndFlushGPU()
    ExecuteCommandList()
    ImmediateFlush()
    FlushResources()
    Tick()
    BlockUntilGPUIdle()
    StallRHIThread()
    UnStallRHIThread()
    SuspendRendering()
    ResumeRendering()
    CreateXXX()
}

FRHICommandList <|-- FRHICommandList_RecursiveHazardous

FRHICommandList_RecursiveHazardous <|-- TRHICommandList_RecursiveHazardous

This chapter describes the concepts, types, and relationships of the RHI Context and DynamicRHI.

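IRHICommandContext is the interface class of the RHI command context and defines a set of graphics-API-related operations. On platforms that can process command lists in parallel, it is a separate object. It and the related types in its inheritance chain are defined as follows: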

// Engine\Source\Runtime\RHI\Public\RHIContext.h

// Context capable of doing compute work. Can be async compute or compute on the gfx pipe.
class IRHIComputeContext
{
public:
    virtual ~IRHIComputeContext();

    // Set/dispatch compute shaders.
    virtual void RHISetComputeShader(FRHIComputeShader* ComputeShader) = 0;
    virtual void RHISetComputePipelineState(FRHIComputePipelineState* ComputePipelineState);
    virtual void RHIDispatchComputeShader(uint32 ThreadGroupCountX, uint32 ThreadGroupCountY, uint32 ThreadGroupCountZ) = 0;
    virtual void RHIDispatchIndirectComputeShader(FRHIVertexBuffer* ArgumentBuffer, uint32 ArgumentOffset) = 0;
    virtual void RHISetAsyncComputeBudget(EAsyncComputeBudget Budget) {}
    
    // Resource transitions.
    virtual void RHIBeginTransitions(TArrayView<const FRHITransition*> Transitions) = 0;
    virtual void RHIEndTransitions(TArrayView<const FRHITransition*> Transitions) = 0;

    // UAV
    virtual void RHIClearUAVFloat(FRHIUnorderedAccessView* UnorderedAccessViewRHI, const FVector4& Values) = 0;
    virtual void RHIClearUAVUint(FRHIUnorderedAccessView* UnorderedAccessViewRHI, const FUintVector4& Values) = 0;
    virtual void RHIBeginUAVOverlap() {}
    virtual void RHIEndUAVOverlap() {}
    virtual void RHIBeginUAVOverlap(TArrayView<FRHIUnorderedAccessView* const> UAVs) {}
    virtual void RHIEndUAVOverlap(TArrayView<FRHIUnorderedAccessView* const> UAVs) {}

    // Shader parameters.
    virtual void RHISetShaderTexture(FRHIComputeShader* PixelShader, uint32 TextureIndex, FRHITexture* NewTexture) = 0;
    virtual void RHISetShaderSampler(FRHIComputeShader* ComputeShader, uint32 SamplerIndex, FRHISamplerState* NewState) = 0;
    virtual void RHISetUAVParameter(FRHIComputeShader* ComputeShader, uint32 UAVIndex, FRHIUnorderedAccessView* UAV) = 0;
    virtual void RHISetUAVParameter(FRHIComputeShader* ComputeShader, uint32 UAVIndex, FRHIUnorderedAccessView* UAV, uint32 InitialCount) = 0;
    virtual void RHISetShaderResourceViewParameter(FRHIComputeShader* ComputeShader, uint32 SamplerIndex, FRHIShaderResourceView* SRV) = 0;
    virtual void RHISetShaderUniformBuffer(FRHIComputeShader* ComputeShader, uint32 BufferIndex, FRHIUniformBuffer* Buffer) = 0;
    virtual void RHISetShaderParameter(FRHIComputeShader* ComputeShader, uint32 BufferIndex, uint32 BaseIndex, uint32 NumBytes, const void* NewValue) = 0;
    virtual void RHISetGlobalUniformBuffers(const FUniformBufferStaticBindings& InUniformBuffers);
    
    // Push/pop events.
    virtual void RHIPushEvent(const TCHAR* Name, FColor Color) = 0;
    virtual void RHIPopEvent() = 0;

    // Other interfaces.
    virtual void RHISubmitCommandsHint() = 0;
    virtual void RHIInvalidateCachedState() {}
    virtual void RHICopyToStagingBuffer(FRHIVertexBuffer* SourceBufferRHI, FRHIStagingBuffer* DestinationStagingBufferRHI, uint32 InOffset, uint32 InNumBytes);
    virtual void RHIWriteGPUFence(FRHIGPUFence* FenceRHI);
    virtual void RHISetGPUMask(FRHIGPUMask GPUMask);

    // Acceleration structures.
    virtual void RHIBuildAccelerationStructure(FRHIRayTracingGeometry* Geometry);
    virtual void RHIBuildAccelerationStructures(const TArrayView<const FAccelerationStructureBuildParams> Params);
    virtual void RHIBuildAccelerationStructure(FRHIRayTracingScene* Scene);

    // Get the compute context.
    inline IRHIComputeContext& GetLowestLevelContext() { return *this; }
    inline IRHIComputeContext& GetHighestLevelContext() { return *this; }
};

// Command context.
class IRHICommandContext : public IRHIComputeContext
{
public:
    virtual ~IRHICommandContext();

    // Compute dispatch.
    virtual void RHIDispatchComputeShader(uint32 ThreadGroupCountX, uint32 ThreadGroupCountY, uint32 ThreadGroupCountZ) = 0;
    virtual void RHIDispatchIndirectComputeShader(FRHIVertexBuffer* ArgumentBuffer, uint32 ArgumentOffset) = 0;
    
    // Render queries.
    virtual void RHIBeginRenderQuery(FRHIRenderQuery* RenderQuery) = 0;
    virtual void RHIEndRenderQuery(FRHIRenderQuery* RenderQuery) = 0;
    virtual void RHIPollOcclusionQueries();

    // Begin/end interfaces.
    virtual void RHIBeginDrawingViewport(FRHIViewport* Viewport, FRHITexture* RenderTargetRHI) = 0;
    virtual void RHIEndDrawingViewport(FRHIViewport* Viewport, bool bPresent, bool bLockToVsync) = 0;
    virtual void RHIBeginFrame() = 0;
    virtual void RHIEndFrame() = 0;
    virtual void RHIBeginScene() = 0;
    virtual void RHIEndScene() = 0;
    virtual void RHIBeginUpdateMultiFrameResource(FRHITexture* Texture);
    virtual void RHIEndUpdateMultiFrameResource(FRHITexture* Texture);
    virtual void RHIBeginUpdateMultiFrameResource(FRHIUnorderedAccessView* UAV);
    virtual void RHIEndUpdateMultiFrameResource(FRHIUnorderedAccessView* UAV);
        
    // Set data.
    virtual void RHISetStreamSource(uint32 StreamIndex, FRHIVertexBuffer* VertexBuffer, uint32 Offset) = 0;
    virtual void RHISetViewport(float MinX, float MinY, float MinZ, float MaxX, float MaxY, float MaxZ) = 0;
    virtual void RHISetStereoViewport(...);
    virtual void RHISetScissorRect(bool bEnable, uint32 MinX, uint32 MinY, uint32 MaxX, uint32 MaxY) = 0;
    virtual void RHISetGraphicsPipelineState(FRHIGraphicsPipelineState* GraphicsState, bool bApplyAdditionalState) = 0;

    // Set shader parameters.
    virtual void RHISetShaderTexture(FRHIGraphicsShader* Shader, uint32 TextureIndex, FRHITexture* NewTexture) = 0;
    virtual void RHISetShaderTexture(FRHIComputeShader* PixelShader, uint32 TextureIndex, FRHITexture* NewTexture) = 0;
    virtual void RHISetShaderSampler(FRHIComputeShader* ComputeShader, uint32 SamplerIndex, FRHISamplerState* NewState) = 0;
    virtual void RHISetShaderSampler(FRHIGraphicsShader* Shader, uint32 SamplerIndex, FRHISamplerState* NewState) = 0;
    virtual void RHISetUAVParameter(FRHIPixelShader* PixelShader, uint32 UAVIndex, FRHIUnorderedAccessView* UAV) = 0;
    virtual void RHISetUAVParameter(FRHIComputeShader* ComputeShader, uint32 UAVIndex, FRHIUnorderedAccessView* UAV) = 0;
    virtual void RHISetUAVParameter(FRHIComputeShader* ComputeShader, uint32 UAVIndex, FRHIUnorderedAccessView* UAV, uint32 InitialCount) = 0;
    virtual void RHISetShaderResourceViewParameter(FRHIComputeShader* ComputeShader, uint32 SamplerIndex, FRHIShaderResourceView* SRV) = 0;
    virtual void RHISetShaderResourceViewParameter(FRHIGraphicsShader* Shader, uint32 SamplerIndex, FRHIShaderResourceView* SRV) = 0;
    virtual void RHISetShaderUniformBuffer(FRHIGraphicsShader* Shader, uint32 BufferIndex, FRHIUniformBuffer* Buffer) = 0;
    virtual void RHISetShaderUniformBuffer(FRHIComputeShader* ComputeShader, uint32 BufferIndex, FRHIUniformBuffer* Buffer) = 0;
    virtual void RHISetShaderParameter(FRHIGraphicsShader* Shader, uint32 BufferIndex, uint32 BaseIndex, uint32 NumBytes, const void* NewValue) = 0;
    virtual void RHISetShaderParameter(FRHIComputeShader* ComputeShader, uint32 BufferIndex, uint32 BaseIndex, uint32 NumBytes, const void* NewValue) = 0;
    virtual void RHISetStencilRef(uint32 StencilRef) {}
    virtual void RHISetBlendFactor(const FLinearColor& BlendFactor) {}
    
    // Draw primitives.
    virtual void RHIDrawPrimitive(uint32 BaseVertexIndex, uint32 NumPrimitives, uint32 NumInstances) = 0;
    virtual void RHIDrawPrimitiveIndirect(FRHIVertexBuffer* ArgumentBuffer, uint32 ArgumentOffset) = 0;
    virtual void RHIDrawIndexedIndirect(FRHIIndexBuffer* IndexBufferRHI, FRHIStructuredBuffer* ArgumentsBufferRHI, int32 DrawArgumentsIndex, uint32 NumInstances) = 0;
    virtual void RHIDrawIndexedPrimitive(FRHIIndexBuffer* IndexBuffer, int32 BaseVertexIndex, uint32 FirstInstance, uint32 NumVertices, uint32 StartIndex, uint32 NumPrimitives, uint32 NumInstances) = 0;
    virtual void RHIDrawIndexedPrimitiveIndirect(FRHIIndexBuffer* IndexBuffer, FRHIVertexBuffer* ArgumentBuffer, uint32 ArgumentOffset) = 0;

    // Other interfaces.
    virtual void RHISetDepthBounds(float MinDepth, float MaxDepth) = 0;
    virtual void RHISetShadingRate(EVRSShadingRate ShadingRate, EVRSRateCombiner Combiner);
    virtual void RHISetShadingRateImage(FRHITexture* RateImageTexture, EVRSRateCombiner Combiner);
    virtual void RHISetMultipleViewports(uint32 Count, const FViewportBounds* Data) = 0;
    virtual void RHICopyToResolveTarget(FRHITexture* SourceTexture, FRHITexture* DestTexture, const FResolveParams& ResolveParams) = 0;
    virtual void RHIResummarizeHTile(FRHITexture2D* DepthTexture);
    virtual void RHICalibrateTimers();
    virtual void RHICalibrateTimers(FRHITimestampCalibrationQuery* CalibrationQuery);
    virtual void RHIDiscardRenderTargets(bool Depth, bool Stencil, uint32 ColorBitMask) {}
    
    // Textures.
    virtual void RHIUpdateTextureReference(FRHITextureReference* TextureRef, FRHITexture* NewTexture) = 0;
    virtual void RHICopyTexture(FRHITexture* SourceTexture, FRHITexture* DestTexture, const FRHICopyTextureInfo& CopyInfo);
    virtual void RHICopyBufferRegion(FRHIVertexBuffer* DestBuffer, ...);
    
    // Pass-related.
    virtual void RHIBeginRenderPass(const FRHIRenderPassInfo& InInfo, const TCHAR* InName) = 0;
    virtual void RHIEndRenderPass() = 0;
    virtual void RHINextSubpass();

    // Ray tracing.
    virtual void RHIClearRayTracingBindings(FRHIRayTracingScene* Scene);
    virtual void RHIBuildAccelerationStructures(const TArrayView<const FAccelerationStructureBuildParams> Params);
    virtual void RHIBuildAccelerationStructure(FRHIRayTracingGeometry* Geometry) final override;
    virtual void RHIBuildAccelerationStructure(FRHIRayTracingScene* Scene);
    virtual void RHIRayTraceOcclusion(FRHIRayTracingScene* Scene, ...);
    virtual void RHIRayTraceIntersection(FRHIRayTracingScene* Scene, ...);
    virtual void RHIRayTraceDispatch(FRHIRayTracingPipelineState* RayTracingPipelineState, ...);
    virtual void RHISetRayTracingHitGroups(FRHIRayTracingScene* Scene, ...);
    virtual void RHISetRayTracingHitGroup(FRHIRayTracingScene* Scene, ...);
    virtual void RHISetRayTracingCallableShader(FRHIRayTracingScene* Scene, ...);
    virtual void RHISetRayTracingMissShader(FRHIRayTracingScene* Scene, ...);
    
    (......)

protected:
    // Render pass info.
    FRHIRenderPassInfo RenderPassInfo;
};
           

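As shown above, the interface of IRHICommandContext closely mirrors and largely overlaps that of FRHICommandList. IRHICommandContext also has many subclasses:

  • IRHICommandContextPSOFallback: a command context for RHIs that do not support real graphics pipelines.
    • FNullDynamicRHI: a null (no-op) implementation of the dynamically bound RHI.
    • FOpenGLDynamicRHI: the dynamic RHI for OpenGL.
    • FD3D11DynamicRHI: the dynamic RHI for D3D11.
  • FMetalRHICommandContext: the command context for the Metal platform.
  • FD3D12CommandContextBase: the command context for D3D12.
  • FVulkanCommandListContext: the command list context for the Vulkan platform.
  • FEmptyDynamicRHI: an empty implementation of the interface implemented by the dynamically bound RHI.
  • FValidationContext: the validation context.

Among the subclasses above, some of the platform-specific ones also inherit from FDynamicRHI. IRHICommandContextPSOFallback is a special case: its subclasses are all graphics APIs that do not support parallel drawing (OpenGL, D3D11). It is defined as follows: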

class IRHICommandContextPSOFallback : public IRHICommandContext
{
public:
    // Set render states.
    virtual void RHISetBoundShaderState(FRHIBoundShaderState* BoundShaderState) = 0;
    virtual void RHISetDepthStencilState(FRHIDepthStencilState* NewState, uint32 StencilRef) = 0;
    virtual void RHISetRasterizerState(FRHIRasterizerState* NewState) = 0;
    virtual void RHISetBlendState(FRHIBlendState* NewState, const FLinearColor& BlendFactor) = 0;
    virtual void RHIEnableDepthBoundsTest(bool bEnable) = 0;
    // Pipeline state.
    virtual void RHISetGraphicsPipelineState(FRHIGraphicsPipelineState* GraphicsState, bool bApplyAdditionalState) override;
};
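
The override of RHISetGraphicsPipelineState in the fallback class suggests how a legacy API can accept a pipeline state object: the call is unpacked into the individual state setters declared above. The following is a minimal sketch of that idea only. Every type and function name ending in `Sketch` is hypothetical, the integer state IDs are placeholders, and none of it is the engine's actual implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a pipeline state object.
// The integer fields are placeholder state IDs, not real UE types.
struct FPipelineStateSketch
{
    int BoundShaderState = 0;
    int DepthStencilState = 0;
    int RasterizerState = 0;
    int BlendState = 0;
};

// Plays the role of IRHICommandContext: only the PSO entry point is modelled.
class ICommandContextSketch
{
public:
    virtual ~ICommandContextSketch() = default;
    virtual void SetGraphicsPipelineState(const FPipelineStateSketch& State) = 0;
};

// Plays the role of IRHICommandContextPSOFallback: the PSO call is unpacked
// into the fine-grained state setters that a legacy API understands.
class ICommandContextPSOFallbackSketch : public ICommandContextSketch
{
public:
    virtual void SetBoundShaderState(int State) = 0;
    virtual void SetDepthStencilState(int State) = 0;
    virtual void SetRasterizerState(int State) = 0;
    virtual void SetBlendState(int State) = 0;

    void SetGraphicsPipelineState(const FPipelineStateSketch& State) override
    {
        SetBoundShaderState(State.BoundShaderState);
        SetDepthStencilState(State.DepthStencilState);
        SetRasterizerState(State.RasterizerState);
        SetBlendState(State.BlendState);
    }
};

// A toy legacy backend that only logs the calls it receives.
class FLegacyContextSketch final : public ICommandContextPSOFallbackSketch
{
public:
    std::vector<std::string> Calls;

    void SetBoundShaderState(int State) override { Record("BoundShaderState", State); }
    void SetDepthStencilState(int State) override { Record("DepthStencilState", State); }
    void SetRasterizerState(int State) override { Record("RasterizerState", State); }
    void SetBlendState(int State) override { Record("BlendState", State); }

private:
    void Record(const char* Name, int State)
    {
        Calls.push_back(std::string(Name) + "=" + std::to_string(State));
    }
};

// Drives the toy backend through the generic interface and returns the call log.
std::vector<std::string> RunPSOFallbackExample()
{
    FPipelineStateSketch State;
    State.BoundShaderState = 1;
    State.DepthStencilState = 2;
    State.RasterizerState = 3;
    State.BlendState = 4;

    FLegacyContextSketch Context;
    ICommandContextSketch& Generic = Context; // callers only see the generic interface
    Generic.SetGraphicsPipelineState(State);
    assert(Context.Calls.size() == 4);
    return Context.Calls;
}
```

Callers written against the generic context never need to know whether the backend has native PSOs, which is the consistency the fallback layer is meant to provide.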
           

The core inheritance UML diagram of IRHICommandContext is as follows:

IRHIComputeContext <|.. IRHICommandContext
IRHICommandContext <|.. IRHICommandContextPSOFallback
IRHICommandContextPSOFallback <|-- FNullDynamicRHI
IRHICommandContextPSOFallback <|-- FOpenGLDynamicRHI
IRHICommandContextPSOFallback <|-- FD3D11DynamicRHI
IRHICommandContext <|-- FD3D12CommandContextBase
IRHICommandContext <|-- FMetalRHICommandContext
IRHICommandContext <|-- FVulkanCommandListContext
IRHICommandContext <|-- FEmptyDynamicRHI

IRHICommandContextContainer is a type that holds an IRHICommandContext object. Its definition and those of its core subclasses are as follows:

// Engine\Source\Runtime\RHI\Public\RHICommandList.h

class IRHICommandContextContainer
{
public:
    virtual ~IRHICommandContextContainer();

    // Get the IRHICommandContext instance.
    virtual IRHICommandContext* GetContext();
    virtual void SubmitAndFreeContextContainer(int32 Index, int32 Num);
    virtual void FinishContext();
};

// Engine\Source\Runtime\Apple\MetalRHI\Private\MetalContext.cpp

class FMetalCommandContextContainer : public IRHICommandContextContainer
{
    // The next FMetalRHICommandContext in the list.
    FMetalRHICommandContext* CmdContext;
    int32 Index;
    int32 Num;
    
public:
    void* operator new(size_t Size);
    void operator delete(void *RawMemory);
    
    FMetalCommandContextContainer(int32 InIndex, int32 InNum);
    virtual ~FMetalCommandContextContainer() override final;
    
    virtual IRHICommandContext* GetContext() override final;
    virtual void FinishContext() override final;
    // Submit and free itself.
    virtual void SubmitAndFreeContextContainer(int32 NewIndex, int32 NewNum) override final;
};

// Allocator for FMetalCommandContextContainer.
static TLockFreeFixedSizeAllocator<sizeof(FMetalCommandContextContainer), PLATFORM_CACHE_LINE_SIZE, FThreadSafeCounter> FMetalCommandContextContainerAllocator;

// Engine\Source\Runtime\D3D12RHI\Private\D3D12CommandContext.cpp

class FD3D12CommandContextContainer : public IRHICommandContextContainer
{
    // Adapter.
    FD3D12Adapter* Adapter;
    // Command context.
    FD3D12CommandContext* CmdContext;
    // Context redirector.
    FD3D12CommandContextRedirector* CmdContextRedirector;
    FRHIGPUMask GPUMask;

    // Command lists.
    TArray<FD3D12CommandListHandle> CommandLists;

public:
    void* operator new(size_t Size);
    void operator delete(void* RawMemory);

    FD3D12CommandContextContainer(FD3D12Adapter* InAdapter, FRHIGPUMask InGPUMask);
    virtual ~FD3D12CommandContextContainer() override;

    virtual IRHICommandContext* GetContext() override;
    virtual void FinishContext() override;
    virtual void SubmitAndFreeContextContainer(int32 Index, int32 Num) override;
};

// Engine\Source\Runtime\VulkanRHI\Private\VulkanContext.h

struct FVulkanCommandContextContainer : public IRHICommandContextContainer, public VulkanRHI::FDeviceChild
{
    // Command list context.
    FVulkanCommandListContext* CmdContext;

    FVulkanCommandContextContainer(FVulkanDevice* InDevice);

    virtual IRHICommandContext* GetContext() override final;
    virtual void FinishContext() override final;
    virtual void SubmitAndFreeContextContainer(int32 Index, int32 Num) override final;

    void* operator new(size_t Size);
    void operator delete(void* RawMemory);
};
           

IRHICommandContextContainer acts as a container that stores one command context or a group of them, in order to support submitting command lists in parallel. It is implemented only for modern graphics APIs such as D3D12, Metal and Vulkan. The full inheritance UML diagram is as follows:

class IRHICommandContextContainer {
    IRHICommandContext* GetContext()
    SubmitAndFreeContextContainer()
    FinishContext()
}
IRHICommandContextContainer <|-- FMetalCommandContextContainer
class FMetalCommandContextContainer {
    FMetalRHICommandContext* CmdContext
}
IRHICommandContextContainer <|-- FD3D12CommandContextContainer
class FD3D12CommandContextContainer {
    FD3D12Adapter* Adapter
    FD3D12CommandContext* CmdContext
    FD3D12CommandContextRedirector* CmdContextRedirector
    TArray<FD3D12CommandListHandle> CommandLists
}
IRHICommandContextContainer <|-- FVulkanCommandContextContainer
class FVulkanCommandContextContainer {
    FVulkanCommandListContext* CmdContext
}
IRHICommandContextContainer <|-- FValidationRHICommandContextContainer
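
To make the container's three-step contract (GetContext, FinishContext, SubmitAndFreeContextContainer) more concrete, here is a minimal single-threaded sketch. It assumes that each container records into its own context and that submission happens in slot order. All `Sketch` types are hypothetical, the "queue" is just a vector of strings, and real backends hand the contexts to worker threads, which this example deliberately does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical command context: records command names instead of GPU commands.
struct FContextSketch
{
    std::vector<std::string> Commands;
};

// Hypothetical stand-in for the graphics API's submission queue.
struct FQueueSketch
{
    std::vector<std::string> Submitted;
};

// Plays the role of an IRHICommandContextContainer implementation.
class FContextContainerSketch
{
public:
    explicit FContextContainerSketch(FQueueSketch& InQueue) : Queue(&InQueue) {}

    FContextSketch* GetContext() { return &Context; }
    void FinishContext() { bFinished = true; }

    // Index/Num identify this container's slot within the parallel batch.
    void SubmitAndFreeContextContainer(std::size_t Index, std::size_t Num)
    {
        assert(bFinished && Index < Num);
        for (const std::string& Command : Context.Commands)
        {
            Queue->Submitted.push_back(Command);
        }
        Context.Commands.clear();
    }

private:
    FQueueSketch* Queue;
    FContextSketch Context;
    bool bFinished = false;
};

std::vector<std::string> RunContainerExample(std::size_t Num)
{
    FQueueSketch Queue;
    std::vector<FContextContainerSketch> Containers(Num, FContextContainerSketch(Queue));

    // Recording order is deliberately reversed: in an engine each container
    // could be filled by a different worker, so no recording order is assumed.
    for (std::size_t Index = Num; Index-- > 0;)
    {
        FContextSketch* Context = Containers[Index].GetContext();
        Context->Commands.push_back("Draw" + std::to_string(Index));
        Containers[Index].FinishContext();
    }

    // Submission walks the slots in order, so the queue sees Draw0, Draw1, ...
    for (std::size_t Index = 0; Index < Num; ++Index)
    {
        Containers[Index].SubmitAndFreeContextContainer(Index, Num);
    }
    return Queue.Submitted;
}
```

The point of the Index and Num arguments in this sketch is that recording can finish in any order while the submitted stream still follows slot order.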

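FDynamicRHI is the interface implemented by the dynamically bound RHI. The interface it defines is quite similar to those of the command list and the command context. Part of it is shown below: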

class RHI_API FDynamicRHI
{
public:
    virtual ~FDynamicRHI() {}

    virtual void Init() = 0;
    virtual void PostInit() {}
    virtual void Shutdown() = 0;

    void InitPixelFormatInfo(const TArray<uint32>& PixelFormatBlockBytesIn);

    // ---- RHI interfaces ----

    // The interfaces below require FlushType: Thread safe
    virtual FSamplerStateRHIRef RHICreateSamplerState(const FSamplerStateInitializerRHI& Initializer) = 0;
    virtual FRasterizerStateRHIRef RHICreateRasterizerState(const FRasterizerStateInitializerRHI& Initializer) = 0;
    virtual FDepthStencilStateRHIRef RHICreateDepthStencilState(const FDepthStencilStateInitializerRHI& Initializer) = 0;
    virtual FBlendStateRHIRef RHICreateBlendState(const FBlendStateInitializerRHI& Initializer) = 0;

    // The interfaces below require FlushType: Wait RHI Thread
    virtual FVertexDeclarationRHIRef RHICreateVertexDeclaration(const FVertexDeclarationElementList& Elements) = 0;
    virtual FPixelShaderRHIRef RHICreatePixelShader(TArrayView<const uint8> Code, const FSHAHash& Hash) = 0;
    virtual FVertexShaderRHIRef RHICreateVertexShader(TArrayView<const uint8> Code, const FSHAHash& Hash) = 0;
    virtual FHullShaderRHIRef RHICreateHullShader(TArrayView<const uint8> Code, const FSHAHash& Hash) = 0;
    virtual FDomainShaderRHIRef RHICreateDomainShader(TArrayView<const uint8> Code, const FSHAHash& Hash) = 0;
    virtual FGeometryShaderRHIRef RHICreateGeometryShader(TArrayView<const uint8> Code, const FSHAHash& Hash) = 0;
    virtual FComputeShaderRHIRef RHICreateComputeShader(TArrayView<const uint8> Code, const FSHAHash& Hash) = 0;

     // FlushType: Must be Thread-Safe.
    virtual FRenderQueryPoolRHIRef RHICreateRenderQueryPool(ERenderQueryType QueryType, uint32 NumQueries = UINT32_MAX);
    inline FComputeFenceRHIRef RHICreateComputeFence(const FName& Name);
    
    virtual FGPUFenceRHIRef RHICreateGPUFence(const FName &Name);
    virtual void RHICreateTransition(FRHITransition* Transition, ERHIPipeline SrcPipelines, ERHIPipeline DstPipelines, ERHICreateTransitionFlags CreateFlags, TArrayView<const FRHITransitionInfo> Infos);
    virtual void RHIReleaseTransition(FRHITransition* Transition);

    // FlushType: Thread safe.    
    virtual FStagingBufferRHIRef RHICreateStagingBuffer();
    virtual void* RHILockStagingBuffer(FRHIStagingBuffer* StagingBuffer, FRHIGPUFence* Fence, uint32 Offset, uint32 SizeRHI);
    virtual void RHIUnlockStagingBuffer(FRHIStagingBuffer* StagingBuffer);
    
    // FlushType: Thread safe, but varies depending on the RHI
    virtual FBoundShaderStateRHIRef RHICreateBoundShaderState(FRHIVertexDeclaration* VertexDeclaration, FRHIVertexShader* VertexShader, FRHIHullShader* HullShader, FRHIDomainShader* DomainShader, FRHIPixelShader* PixelShader, FRHIGeometryShader* GeometryShader) = 0;
    // FlushType: Thread safe
    virtual FGraphicsPipelineStateRHIRef RHICreateGraphicsPipelineState(const FGraphicsPipelineStateInitializer& Initializer);
    
    // FlushType: Thread safe, but varies depending on the RHI
    virtual FUniformBufferRHIRef RHICreateUniformBuffer(const void* Contents, const FRHIUniformBufferLayout& Layout, EUniformBufferUsage Usage, EUniformBufferValidation Validation) = 0;
    virtual void RHIUpdateUniformBuffer(FRHIUniformBuffer* UniformBufferRHI, const void* Contents) = 0;
    
    // FlushType: Wait RHI Thread
    virtual FIndexBufferRHIRef RHICreateIndexBuffer(uint32 Stride, uint32 Size, uint32 InUsage, ERHIAccess InResourceState, FRHIResourceCreateInfo& CreateInfo) = 0;
    virtual void* RHILockIndexBuffer(FRHICommandListImmediate& RHICmdList, FRHIIndexBuffer* IndexBuffer, uint32 Offset, uint32 Size, EResourceLockMode LockMode);
    virtual void RHIUnlockIndexBuffer(FRHICommandListImmediate& RHICmdList, FRHIIndexBuffer* IndexBuffer);
    virtual void RHITransferIndexBufferUnderlyingResource(FRHIIndexBuffer* DestIndexBuffer, FRHIIndexBuffer* SrcIndexBuffer);

    // FlushType: Wait RHI Thread
    virtual FVertexBufferRHIRef RHICreateVertexBuffer(uint32 Size, uint32 InUsage, ERHIAccess InResourceState, FRHIResourceCreateInfo& CreateInfo) = 0;
    // FlushType: Flush RHI Thread
    virtual void* RHILockVertexBuffer(FRHICommandListImmediate& RHICmdList, FRHIVertexBuffer* VertexBuffer, uint32 Offset, uint32 SizeRHI, EResourceLockMode LockMode);
    virtual void RHIUnlockVertexBuffer(FRHICommandListImmediate& RHICmdList, FRHIVertexBuffer* VertexBuffer);
    // FlushType: Flush Immediate (seems dangerous)
    virtual void RHICopyVertexBuffer(FRHIVertexBuffer* SourceBuffer, FRHIVertexBuffer* DestBuffer) = 0;
    virtual void RHITransferVertexBufferUnderlyingResource(FRHIVertexBuffer* DestVertexBuffer, FRHIVertexBuffer* SrcVertexBuffer);

    // FlushType: Wait RHI Thread
    virtual FStructuredBufferRHIRef RHICreateStructuredBuffer(uint32 Stride, uint32 Size, uint32 InUsage, ERHIAccess InResourceState, FRHIResourceCreateInfo& CreateInfo) = 0;
    // FlushType: Flush RHI Thread
    virtual void* RHILockStructuredBuffer(FRHICommandListImmediate& RHICmdList, FRHIStructuredBuffer* StructuredBuffer, uint32 Offset, uint32 SizeRHI, EResourceLockMode LockMode);
    virtual void RHIUnlockStructuredBuffer(FRHICommandListImmediate& RHICmdList, FRHIStructuredBuffer* StructuredBuffer);

    // FlushType: Wait RHI Thread
    virtual FUnorderedAccessViewRHIRef RHICreateUnorderedAccessView(FRHIStructuredBuffer* StructuredBuffer, bool bUseUAVCounter, bool bAppendBuffer) = 0;
    // FlushType: Wait RHI Thread
    virtual FUnorderedAccessViewRHIRef RHICreateUnorderedAccessView(FRHITexture* Texture, uint32 MipLevel) = 0;
    // FlushType: Wait RHI Thread
    virtual FUnorderedAccessViewRHIRef RHICreateUnorderedAccessView(FRHITexture* Texture, uint32 MipLevel, uint8 Format);

    (......)

    // Per-frame RHI update; must be called from the main thread. FlushType: Thread safe
    virtual void RHITick(float DeltaTime) = 0;
    // Blocks the CPU until the GPU has finished executing and become idle. FlushType: Flush Immediate (seems wrong)
    virtual void RHIBlockUntilGPUIdle() = 0;
    // Kicks off the current frame and makes sure the GPU is actively working on it. FlushType: Flush Immediate (copied from RHIBlockUntilGPUIdle)
    virtual void RHISubmitCommandsAndFlushGPU() {};

    // Tells the RHI to get ready to be suspended.
    virtual void RHIBeginSuspendRendering() {};
    // Suspends RHI rendering and yields control to the system. FlushType: Thread safe
    virtual void RHISuspendRendering() {};
    // Resumes RHI rendering. FlushType: Thread safe
    virtual void RHIResumeRendering() {};
    // FlushType: Flush Immediate
    virtual bool RHIIsRenderingSuspended() { return false; };

    // FlushType: called from render thread when RHI thread is flushed 
    // Called once per frame, just before the deferred deletion inside FRHIResource::FlushPendingDeletes.
    virtual void RHIPerFrameRHIFlushComplete();

    // Executes a command list. FlushType: Wait RHI Thread
    virtual void RHIExecuteCommandList(FRHICommandList* CmdList) = 0;

    // FlushType: Flush RHI Thread
    virtual void* RHIGetNativeDevice() = 0;
    // FlushType: Flush RHI Thread
    virtual void* RHIGetNativeInstance() = 0;

    // Gets the command context. FlushType: Thread safe
    virtual IRHICommandContext* RHIGetDefaultContext() = 0;
    // Gets the compute context. FlushType: Thread safe
    virtual IRHIComputeContext* RHIGetDefaultAsyncComputeContext();

    // FlushType: Thread safe
    virtual class IRHICommandContextContainer* RHIGetCommandContextContainer(int32 Index, int32 Num) = 0;

    // Interfaces called directly from the render thread, to optimize RHI calls.
    virtual FVertexBufferRHIRef CreateAndLockVertexBuffer_RenderThread(class FRHICommandListImmediate& RHICmdList, uint32 Size, uint32 InUsage, ERHIAccess InResourceState, FRHIResourceCreateInfo& CreateInfo, void*& OutDataBuffer);
    virtual FIndexBufferRHIRef CreateAndLockIndexBuffer_RenderThread(class FRHICommandListImmediate& RHICmdList, uint32 Stride, uint32 Size, uint32 InUsage, ERHIAccess InResourceState, FRHIResourceCreateInfo& CreateInfo, void*& OutDataBuffer);
    
    (......)

    // Buffer Lock/Unlock
    virtual void* LockVertexBuffer_BottomOfPipe(class FRHICommandListImmediate& RHICmdList, ...);
    virtual void* LockIndexBuffer_BottomOfPipe(class FRHICommandListImmediate& RHICmdList, ...);
    
    (......)
};
           

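Only part of the interface is shown above. Some of these functions must be called from the render thread, others from the game thread. Most of them require a specific type of command flush before they are called, for example: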

class RHI_API FDynamicRHI
{
    // FlushType: Wait RHI Thread
    void RHIExecuteCommandList(FRHICommandList* CmdList);

    // FlushType: Flush Immediate
    void RHIBlockUntilGPUIdle();

    // FlushType: Thread safe 
    void RHITick(float DeltaTime);
};
           

Code that calls the interfaces above then looks like this:

class RHI_API FRHICommandListImmediate : public FRHICommandList
{
    void ExecuteCommandList(FRHICommandList* CmdList)
    {
        // Wait for the RHI thread.
        FScopedRHIThreadStaller StallRHIThread(*this);
        GDynamicRHI->RHIExecuteCommandList(CmdList);
    }
    
    void BlockUntilGPUIdle()
    {
        // Calling FDynamicRHI::RHIBlockUntilGPUIdle requires flushing the RHI first.
        ImmediateFlush(EImmediateFlushType::FlushRHIThread);  
        GDynamicRHI->RHIBlockUntilGPUIdle();
    }
    
    void Tick(float DeltaTime)
    {
        // FDynamicRHI::RHITick is thread safe, so there is no need to call ImmediateFlush or wait on an event.
        GDynamicRHI->RHITick(DeltaTime);
    }
};
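
The difference between a thread-safe call and a call that needs a flush can be illustrated with a toy command list. The sketch below assumes a single pending queue and models only two cases: a direct call that skips the queue, and a call that drains the queue before touching the RHI. `FImmediateListSketch`, `EFlushSketch` and the logged strings are hypothetical. The real engine distinguishes more flush types and uses an actual RHI thread.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flush levels; the engine defines more than these two.
enum class EFlushSketch { ThreadSafe, FlushRHIThread };

// Toy immediate command list: Pending holds recorded commands that have not
// been executed yet, Log holds everything that has reached the "RHI".
class FImmediateListSketch
{
public:
    void Enqueue(std::string Name) { Pending.push(std::move(Name)); }

    // Thread-safe style call: goes straight to the RHI without draining.
    void Tick() { Log.push_back("RHITick"); }

    // Flush style call: every pending command must execute first.
    void BlockUntilGPUIdle()
    {
        ImmediateFlush(EFlushSketch::FlushRHIThread);
        Log.push_back("RHIBlockUntilGPUIdle");
    }

    const std::vector<std::string>& GetLog() const { return Log; }

private:
    void ImmediateFlush(EFlushSketch Type)
    {
        if (Type == EFlushSketch::ThreadSafe)
        {
            return; // nothing to wait for
        }
        while (!Pending.empty())
        {
            Log.push_back(Pending.front());
            Pending.pop();
        }
        assert(Pending.empty());
    }

    std::queue<std::string> Pending;
    std::vector<std::string> Log;
};

std::vector<std::string> RunFlushExample()
{
    FImmediateListSketch List;
    List.Enqueue("DrawA");
    List.Tick();              // does not wait for DrawA
    List.Enqueue("DrawB");
    List.BlockUntilGPUIdle(); // drains DrawA and DrawB before the direct call
    return List.GetLog();
}
```

In this model the tick reaches the RHI before either draw, while the idle wait is ordered after both, which is the ordering guarantee a flush is meant to buy.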
           

Next, let's look at the definitions of FDynamicRHI's subclasses:

// Engine\Source\Runtime\Apple\MetalRHI\Private\MetalDynamicRHI.h

class FMetalDynamicRHI : public FDynamicRHI
{
public:
    FMetalDynamicRHI(ERHIFeatureLevel::Type RequestedFeatureLevel);
    ~FMetalDynamicRHI();
    
    // Set up the necessary internal resources.
    void SetupRecursiveResources();

    // FDynamicRHI interface.
    virtual void Init();
    virtual void Shutdown() {}
    virtual const TCHAR* GetName() override { return TEXT("Metal"); }
    
    virtual FSamplerStateRHIRef RHICreateSamplerState(const FSamplerStateInitializerRHI& Initializer) final override;
    virtual FRasterizerStateRHIRef RHICreateRasterizerState(const FRasterizerStateInitializerRHI& Initializer) final override;
    virtual FDepthStencilStateRHIRef RHICreateDepthStencilState(...) final override;
    
    (......)
    
private:
    // Immediate-mode context.
    FMetalRHIImmediateCommandContext ImmediateContext;
    // Async compute context.
    FMetalRHICommandContext* AsyncComputeContext;
    // Vertex declaration cache.
    TMap<uint32, FVertexDeclarationRHIRef> VertexDeclarationCache;
};

// Engine\Source\Runtime\D3D12RHI\Private\D3D12RHIPrivate.h

class FD3D12DynamicRHI : public FDynamicRHI
{
    static FD3D12DynamicRHI* SingleD3DRHI;

public:
    static D3D12RHI_API FD3D12DynamicRHI* GetD3DRHI() { return SingleD3DRHI; }

    FD3D12DynamicRHI(const TArray<TSharedPtr<FD3D12Adapter>>& ChosenAdaptersIn, bool bInPixEventEnabled);
    virtual ~FD3D12DynamicRHI();

    // FDynamicRHI interface.
    virtual void Init() override;
    virtual void PostInit() override;
    virtual void Shutdown() override;
    virtual const TCHAR* GetName() override { return TEXT("D3D12"); }

    virtual FSamplerStateRHIRef RHICreateSamplerState(const FSamplerStateInitializerRHI& Initializer) final override;
    virtual FRasterizerStateRHIRef RHICreateRasterizerState(const FRasterizerStateInitializerRHI& Initializer) final override;
    virtual FDepthStencilStateRHIRef RHICreateDepthStencilState(const FDepthStencilStateInitializerRHI& Initializer) final override;
    
    (......)
    
protected:
    // Chosen adapters.
    TArray<TSharedPtr<FD3D12Adapter>> ChosenAdapters;
    // AMD AGS utility library context.
    AGSContext* AmdAgsContext;

    // D3D12 device.
    inline FD3D12Device* GetRHIDevice(uint32 GPUIndex)
    {
        return GetAdapter().GetDevice(GPUIndex);
    }
    
    (......)
};

// Engine\Source\Runtime\EmptyRHI\Public\EmptyRHI.h

class FEmptyDynamicRHI : public FDynamicRHI, public IRHICommandContext
{
    (......)
};

// Engine\Source\Runtime\NullDrv\Public\NullRHI.h

class FNullDynamicRHI : public FDynamicRHI , public IRHICommandContextPSOFallback
{
    (......)
};


class OPENGLDRV_API FOpenGLDynamicRHI  final : public FDynamicRHI, public IRHICommandContextPSOFallback
{
public:
    FOpenGLDynamicRHI();
    ~FOpenGLDynamicRHI();

    // FDynamicRHI interface.
    virtual void Init();
    virtual void PostInit();

    virtual void Shutdown();
    virtual const TCHAR* GetName() override { return TEXT("OpenGL"); }
    
    virtual FRasterizerStateRHIRef RHICreateRasterizerState(const FRasterizerStateInitializerRHI& Initializer) final override;
    virtual FDepthStencilStateRHIRef RHICreateDepthStencilState(const FDepthStencilStateInitializerRHI& Initializer) final override;
    virtual FBlendStateRHIRef RHICreateBlendState(const FBlendStateInitializerRHI& Initializer) final override;
    
    (......)
    
private:
    // Counters.
    uint32 SceneFrameCounter;
    uint32 ResourceTableFrameCounter;

    // RHI device state, independent of the underlying OpenGL context in use.
    FOpenGLRHIState                        PendingState;
    FOpenGLStreamedVertexBufferArray    DynamicVertexBuffers;
    FOpenGLStreamedIndexBufferArray        DynamicIndexBuffers;
    FSamplerStateRHIRef                    PointSamplerState;

    // Created viewports.
    TArray<FOpenGLViewport*> Viewports;
    TRefCountPtr<FOpenGLViewport>        DrawingViewport;
    bool                                bRevertToSharedContextAfterDrawingViewport;

    // History of bound shader states.
    TGlobalResource< TBoundShaderStateHistory<10000> > BoundShaderStateHistory;

    // Per-context state caches.
    FOpenGLContextState InvalidContextState;
    FOpenGLContextState    SharedContextState;
    FOpenGLContextState    RenderingContextState;

    // Uniform buffers.
    TArray<FRHIUniformBuffer*> GlobalUniformBuffers;
    TMap<GLuint, TPair<GLenum, GLenum>> TextureMipLimits;

    // Underlying platform-specific data.
    FPlatformOpenGLDevice* PlatformDevice;

    // Query-related.
    TArray<FOpenGLRenderQuery*> Queries;
    FCriticalSection QueriesListCriticalSection;
    
    // Profiling and present data.
    FOpenGLGPUProfiler GPUProfilingData;
    FCriticalSection CustomPresentSection;
    TRefCountPtr<class FRHICustomPresent> CustomPresent;
    
    (......)
};

// Engine\Source\Runtime\RHI\Public\RHIValidation.h

class FValidationRHI : public FDynamicRHI
{
public:
    RHI_API FValidationRHI(FDynamicRHI* InRHI);
    RHI_API virtual ~FValidationRHI();

    virtual FSamplerStateRHIRef RHICreateSamplerState(const FSamplerStateInitializerRHI& Initializer) override final;
    virtual FRasterizerStateRHIRef RHICreateRasterizerState(const FRasterizerStateInitializerRHI& Initializer) override final;
    virtual FDepthStencilStateRHIRef RHICreateDepthStencilState(const FDepthStencilStateInitializerRHI& Initializer) override final;
    
    (......)
    
    // The wrapped RHI instance.
    FDynamicRHI*    RHI;
    // Owned contexts.
    TIndirectArray<IRHIComputeContext> OwnedContexts;
    // Depth-stencil state map.
    TMap<FRHIDepthStencilState*, FDepthStencilStateInitializerRHI> DepthStencilStates;
};

// Engine\Source\Runtime\VulkanRHI\Public\VulkanDynamicRHI.h

class FVulkanDynamicRHI : public FDynamicRHI
{
public:
    FVulkanDynamicRHI();
    ~FVulkanDynamicRHI();

    // FDynamicRHI interface.
    virtual void Init() final override;
    virtual void PostInit() final override;
    virtual void Shutdown() final override;
    virtual const TCHAR* GetName() final override { return TEXT("Vulkan"); }

    void InitInstance();

    virtual FSamplerStateRHIRef RHICreateSamplerState(const FSamplerStateInitializerRHI& Initializer) final override;
    virtual FRasterizerStateRHIRef RHICreateRasterizerState(const FRasterizerStateInitializerRHI& Initializer) final override;
    virtual FDepthStencilStateRHIRef RHICreateDepthStencilState(const FDepthStencilStateInitializerRHI& Initializer) final override;
    
    (......)
    
protected:
    // Instance.
    VkInstance Instance;
    TArray<const ANSICHAR*> InstanceExtensions;
    TArray<const ANSICHAR*> InstanceLayers;

    // Devices.
    TArray<FVulkanDevice*> Devices;
    FVulkanDevice* Device;

    // Viewports.
    TArray<FVulkanViewport*> Viewports;
    TRefCountPtr<FVulkanViewport> DrawingViewport;

    // Pipeline cache console commands.
    IConsoleObject* SavePipelineCacheCmd = nullptr;
    IConsoleObject* RebuildPipelineCacheCmd = nullptr;

    // Critical section.
    FCriticalSection LockBufferCS;

    // Internal interfaces.
    void CreateInstance();
    void SelectAndInitDevice();
    void InitGPU(FVulkanDevice* Device);
    void InitDevice(FVulkanDevice* Device);
    
    (......)
};

// Engine\Source\Runtime\Windows\D3D11RHI\Private\D3D11RHIPrivate.h

class D3D11RHI_API FD3D11DynamicRHI : public FDynamicRHI, public IRHICommandContextPSOFallback
{
public:
    FD3D11DynamicRHI(IDXGIFactory1* InDXGIFactory1,D3D_FEATURE_LEVEL InFeatureLevel,int32 InChosenAdapter, const DXGI_ADAPTER_DESC& ChosenDescription);
    virtual ~FD3D11DynamicRHI();

    virtual void InitD3DDevice();

    // FDynamicRHI interface.
    virtual void Init() override;
    virtual void PostInit() override;
    virtual void Shutdown() override;
    virtual const TCHAR* GetName() override { return TEXT("D3D11"); }

    // HDR display output
    virtual void EnableHDR();
    virtual void ShutdownHDR();

    virtual FSamplerStateRHIRef RHICreateSamplerState(const FSamplerStateInitializerRHI& Initializer) final override;
    virtual FRasterizerStateRHIRef RHICreateRasterizerState(const FRasterizerStateInitializerRHI& Initializer) final override;
    virtual FDepthStencilStateRHIRef RHICreateDepthStencilState(const FDepthStencilStateInitializerRHI& Initializer) final override;
    
    (......)

    ID3D11Device* GetDevice() const
    {
        return Direct3DDevice;
    }
    FD3D11DeviceContext* GetDeviceContext() const
    {
        return Direct3DDeviceIMContext;
    }
    IDXGIFactory1* GetFactory() const
    {
        return DXGIFactory1;
    }
    
protected:
    // D3D factory (interface).
    TRefCountPtr<IDXGIFactory1> DXGIFactory1;
    // D3D device.
    TRefCountPtr<FD3D11Device> Direct3DDevice;
    // Immediate context of the D3D device.
    TRefCountPtr<FD3D11DeviceContext> Direct3DDeviceIMContext;

    // Thread locks.
    FD3D11LockTracker LockTracker;
    FCriticalSection LockTrackerCS;

    // Viewports.
    TArray<FD3D11Viewport*> Viewports;
    TRefCountPtr<FD3D11Viewport> DrawingViewport;

    // AMD AGS utility library context.
    AGSContext* AmdAgsContext;

    // Render targets, UAVs, shaders and other resources.
    TRefCountPtr<ID3D11RenderTargetView> CurrentRenderTargets[D3D11_SIMULTANEOUS_RENDER_TARGET_COUNT];
    TRefCountPtr<FD3D11UnorderedAccessView> CurrentUAVs[D3D11_PS_CS_UAV_REGISTER_COUNT];
    ID3D11UnorderedAccessView* UAVBound[D3D11_PS_CS_UAV_REGISTER_COUNT];
    TRefCountPtr<ID3D11DepthStencilView> CurrentDepthStencilTarget;
    TRefCountPtr<FD3D11TextureBase> CurrentDepthTexture;
    FD3D11BaseShaderResource* CurrentResourcesBoundAsSRVs[SF_NumStandardFrequencies][D3D11_COMMONSHADER_INPUT_RESOURCE_SLOT_COUNT];
    FD3D11BaseShaderResource* CurrentResourcesBoundAsVBs[D3D11_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT];
    FD3D11BaseShaderResource* CurrentResourceBoundAsIB;
    int32 MaxBoundShaderResourcesIndex[SF_NumStandardFrequencies];
    FUniformBufferRHIRef BoundUniformBuffers[SF_NumStandardFrequencies][MAX_UNIFORM_BUFFERS_PER_SHADER_STAGE];
    uint16 DirtyUniformBuffers[SF_NumStandardFrequencies];
    TArray<FRHIUniformBuffer*> GlobalUniformBuffers;

    // Created constant buffers.
    TArray<TRefCountPtr<FD3D11ConstantBuffer> > VSConstantBuffers;
    TArray<TRefCountPtr<FD3D11ConstantBuffer> > HSConstantBuffers;
    TArray<TRefCountPtr<FD3D11ConstantBuffer> > DSConstantBuffers;
    TArray<TRefCountPtr<FD3D11ConstantBuffer> > PSConstantBuffers;
    TArray<TRefCountPtr<FD3D11ConstantBuffer> > GSConstantBuffers;
    TArray<TRefCountPtr<FD3D11ConstantBuffer> > CSConstantBuffers;

    // History of bound shader states.
    TGlobalResource< TBoundShaderStateHistory<10000> > BoundShaderStateHistory;
    FComputeShaderRHIRef CurrentComputeShader;

    (......)
};
           

Their core inheritance UML diagram is as follows:

class FDynamicRHI {
    void* RHIGetNativeDevice()
    void* RHIGetNativeInstance()
    IRHICommandContext* RHIGetDefaultContext()
    IRHIComputeContext* RHIGetDefaultAsyncComputeContext()
    IRHICommandContextContainer* RHIGetCommandContextContainer()
}
FDynamicRHI <|-- FMetalDynamicRHI
class FMetalDynamicRHI {
    FMetalRHIImmediateCommandContext ImmediateContext
    FMetalRHICommandContext* AsyncComputeContext
}
FDynamicRHI <|-- FD3D12DynamicRHI
class FD3D12DynamicRHI {
    static FD3D12DynamicRHI* SingleD3DRHI
    FD3D12Adapter* ChosenAdapters
    FD3D12Device* GetRHIDevice()
}
FDynamicRHI <|-- FD3D11DynamicRHI
class FD3D11DynamicRHI {
    IDXGIFactory1* DXGIFactory1
    FD3D11Device* Direct3DDevice
    FD3D11DeviceContext* Direct3DDeviceIMContext
}
FDynamicRHI <|-- FOpenGLDynamicRHI
class FOpenGLDynamicRHI {
    FPlatformOpenGLDevice* PlatformDevice
}
FDynamicRHI <|-- FValidationRHI
class FValidationRHI {
}
FDynamicRHI <|-- FVulkanDynamicRHI
class FVulkanDynamicRHI {
    VkInstance Instance
    FVulkanDevice* Devices
}
FDynamicRHI <|-- FEmptyDynamicRHI
FDynamicRHI <|-- FNullDynamicRHI

[Figure: image version of the FDynamicRHI inheritance diagram above.]

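Note that the legacy graphics APIs (D3D11, OpenGL) inherit not only from FDynamicRHI but also from IRHICommandContextPSOFallback, because they rely on the latter's interface to handle PSO data and behavior, which keeps PSO handling consistent between legacy and modern APIs. For the same reason, the DynamicRHI classes of the modern graphics APIs (D3D12, Vulkan, Metal) do not need to inherit from any type in the IRHICommandContext hierarchy; inheriting directly from FDynamicRHI alone is enough to handle all data and operations of the RHI layer.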
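
The fallback idea can be illustrated with a minimal sketch. All types and method names below (PipelineStateInitializer, CommandContext, CommandContextPSOFallback, LegacyContext) are simplified stand-ins invented for illustration, not the engine's real declarations. The sketch assumes the fallback layer implements the PSO-level entry point by forwarding each piece of state to the fine-grained setters that a legacy API exposes:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified PSO description (not the real UE initializer).
struct PipelineStateInitializer
{
    std::string BoundShaderState;
    std::string DepthStencilState;
    std::string RasterizerState;
    std::string BlendState;
};

// Stand-in for the context interface: callers only see the PSO-level entry point.
class CommandContext
{
public:
    virtual ~CommandContext() = default;
    virtual void SetGraphicsPipelineState(const PipelineStateInitializer& Init) = 0;
};

// Stand-in for the fallback layer: it declares the legacy per-state setters and
// implements the PSO entry point by forwarding to them one by one.
class CommandContextPSOFallback : public CommandContext
{
public:
    virtual void SetBoundShaderState(const std::string& State) = 0;
    virtual void SetDepthStencilState(const std::string& State) = 0;
    virtual void SetRasterizerState(const std::string& State) = 0;
    virtual void SetBlendState(const std::string& State) = 0;

    void SetGraphicsPipelineState(const PipelineStateInitializer& Init) override
    {
        SetBoundShaderState(Init.BoundShaderState);
        SetDepthStencilState(Init.DepthStencilState);
        SetRasterizerState(Init.RasterizerState);
        SetBlendState(Init.BlendState);
    }
};

// A legacy backend implements only the fine-grained setters.
class LegacyContext final : public CommandContextPSOFallback
{
public:
    std::vector<std::string> Calls; // log of the state changes that were issued

    void SetBoundShaderState(const std::string& State) override { Calls.push_back("Shader:" + State); }
    void SetDepthStencilState(const std::string& State) override { Calls.push_back("DepthStencil:" + State); }
    void SetRasterizerState(const std::string& State) override { Calls.push_back("Rasterizer:" + State); }
    void SetBlendState(const std::string& State) override { Calls.push_back("Blend:" + State); }
};

// Drives the legacy backend purely through the PSO-level interface.
inline std::vector<std::string> ApplyExamplePipelineState()
{
    LegacyContext Context;
    CommandContext& Interface = Context;
    Interface.SetGraphicsPipelineState({"BasePass", "DepthLess", "CullBack", "Opaque"});
    return Context.Calls;
}
```

With this arrangement the renderer can issue a single PSO-style call on every backend, while a legacy backend still receives one state change per sub-state.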

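Since the DynamicRHI classes of the modern graphics APIs (D3D12, Vulkan, Metal) do not inherit from any type in the IRHICommandContext hierarchy, how do they implement the FDynamicRHI::RHIGetDefaultContext interface? Take FD3D12DynamicRHI as an example: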

IRHICommandContext* FD3D12DynamicRHI::RHIGetDefaultContext()
{
    FD3D12Adapter& Adapter = GetAdapter();

    IRHICommandContext* DefaultCommandContext = nullptr;    
    if (GNumExplicitGPUsForRendering > 1) // Multi-GPU
    {
        DefaultCommandContext = static_cast<IRHICommandContext*>(&Adapter.GetDefaultContextRedirector());
    }
    else // Single GPU
    {
        FD3D12Device* Device = Adapter.GetDevice(0);
        DefaultCommandContext = static_cast<IRHICommandContext*>(&Device->GetDefaultCommandContext());
    }

    return DefaultCommandContext;
}
           

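In the single-GPU case the returned object is an FD3D12CommandContext; in the multi-GPU case it is an FD3D12CommandContextRedirector. Both derive from FD3D12CommandContextBase, which in turn derives from IRHICommandContext, so the static_cast to the base interface is safe in either case.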
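
As a minimal sketch of why the cast is well defined, the following mirrors the shape of that hierarchy with hypothetical, stripped-down types (ICommandContext, ContextBase, DeviceContext, ContextRedirector and Adapter are illustrative names, not engine classes). A static_cast from a derived pointer to an unambiguous public base is a standard upcast, even when multiple inheritance is involved:

```cpp
#include <cassert>
#include <string>

// Stand-in for the abstract context interface.
struct ICommandContext
{
    virtual ~ICommandContext() = default;
    virtual std::string Name() const = 0;
};

// Extra bases, mimicking the "adapter child" / "device child" mix-ins.
struct AdapterChild { int AdapterId = 0; };
struct DeviceChild { int DeviceId = 0; };

// Common base shared by the per-device context and the redirector.
struct ContextBase : ICommandContext, AdapterChild {};

// Per-device context (single-GPU path).
struct DeviceContext final : ContextBase, DeviceChild
{
    std::string Name() const override { return "DeviceContext"; }
};

// Redirector that would fan out to several per-device contexts (multi-GPU path).
struct ContextRedirector final : ContextBase
{
    std::string Name() const override { return "ContextRedirector"; }
};

struct Adapter
{
    DeviceContext DefaultContext;
    ContextRedirector DefaultRedirector;
};

// Same branching as the listing above: pick the redirector for multiple GPUs,
// otherwise the single device's default context, and upcast to the interface.
inline ICommandContext* GetDefaultContext(Adapter& InAdapter, int NumExplicitGPUs)
{
    if (NumExplicitGPUs > 1)
    {
        return static_cast<ICommandContext*>(&InAdapter.DefaultRedirector);
    }
    return static_cast<ICommandContext*>(&InAdapter.DefaultContext);
}
```

Because ICommandContext is reachable through exactly one inheritance path in both cases, the compiler adjusts the pointer as needed and the caller can use the result polymorphically.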

FD3D11DynamicRHI contains or references several core types specific to the D3D11 platform. They are defined as follows:

// Engine\Source\Runtime\Windows\D3D11RHI\Private\D3D11RHIPrivate.h

class D3D11RHI_API FD3D11DynamicRHI : public FDynamicRHI, public IRHICommandContextPSOFallback
{
    (......)

protected:
    // DXGI factory interface.
    TRefCountPtr<IDXGIFactory1> DXGIFactory1;
    // D3D device.
    TRefCountPtr<FD3D11Device> Direct3DDevice;
    // Immediate context of the D3D device.
    TRefCountPtr<FD3D11DeviceContext> Direct3DDeviceIMContext;

    // Viewports.
    TArray<FD3D11Viewport*> Viewports;
    TRefCountPtr<FD3D11Viewport> DrawingViewport;

    // AMD AGS utility library context.
    AGSContext* AmdAgsContext;

    (......)
};

// Engine\Source\Runtime\Windows\D3D11RHI\Private\Windows\D3D11RHIBasePrivate.h

typedef ID3D11DeviceContext FD3D11DeviceContext;
typedef ID3D11Device FD3D11Device;

// Engine\Source\Runtime\Windows\D3D11RHI\Public\D3D11Viewport.h

class FD3D11Viewport : public FRHIViewport
{
public:
    FD3D11Viewport(class FD3D11DynamicRHI* InD3DRHI) : D3DRHI(InD3DRHI), PresentFailCount(0), ValidState(0), FrameSyncEvent(InD3DRHI) {}
    FD3D11Viewport(class FD3D11DynamicRHI* InD3DRHI, HWND InWindowHandle, uint32 InSizeX, uint32 InSizeY, bool bInIsFullscreen, EPixelFormat InPreferredPixelFormat);
    ~FD3D11Viewport();

    virtual void Resize(uint32 InSizeX, uint32 InSizeY, bool bInIsFullscreen, EPixelFormat PreferredPixelFormat);
    void ConditionalResetSwapChain(bool bIgnoreFocus);
    void CheckHDRMonitorStatus();

    // Present the swap chain.
    bool Present(bool bLockToVsync);

    // Accessors.
    FIntPoint GetSizeXY() const;
    FD3D11Texture2D* GetBackBuffer() const;
    EColorSpaceAndEOTF GetPixelColorSpace() const;

    void WaitForFrameEventCompletion();
    void IssueFrameEvent();

    IDXGISwapChain* GetSwapChain() const;
    virtual void* GetNativeSwapChain() const override;
    virtual void* GetNativeBackBufferTexture() const override;
    virtual void* GetNativeBackBufferRT() const override;

    virtual void SetCustomPresent(FRHICustomPresent* InCustomPresent) override;
    virtual FRHICustomPresent* GetCustomPresent() const;

    virtual void* GetNativeWindow(void** AddParam = nullptr) const override;
    static FD3D11Texture2D* GetSwapChainSurface(FD3D11DynamicRHI* D3DRHI, EPixelFormat PixelFormat, uint32 SizeX, uint32 SizeY, IDXGISwapChain* SwapChain);

protected:
    // Owning dynamic RHI.
    FD3D11DynamicRHI* D3DRHI;
    // Swap chain.
    TRefCountPtr<IDXGISwapChain> SwapChain;
    // Back buffer.
    TRefCountPtr<FD3D11Texture2D> BackBuffer;

    FD3D11EventQuery FrameSyncEvent;
    FCustomPresentRHIRef CustomPresent;

    (......)
};
           

Drawn as a UML diagram, FD3D11DynamicRHI looks like this:

ID3D11DeviceContext -- FD3D11DeviceContext

ID3D11Device -- FD3D11Device

IDXGIFactory1 --* FD3D11DynamicRHI

FD3D11Device --* FD3D11DynamicRHI

FD3D11DeviceContext --* FD3D11DynamicRHI

FRHIResource <|-- FRHIViewport

FRHIViewport <|-- FD3D11Viewport

FD3D11Viewport --o FD3D11DynamicRHI

FD3D11Viewport* Viewports

The core types related to FOpenGLDynamicRHI are defined as follows:

class OPENGLDRV_API FOpenGLDynamicRHI  final : public FDynamicRHI, public IRHICommandContextPSOFallback
{
    (......)
    
private:
    // Viewports that have been created.
    TArray<FOpenGLViewport*> Viewports;
    // Underlying platform-specific data.
    FPlatformOpenGLDevice* PlatformDevice;
};

// Engine\Source\Runtime\OpenGLDrv\Public\OpenGLResources.h

class FOpenGLViewport : public FRHIViewport
{
public:
    FOpenGLViewport(class FOpenGLDynamicRHI* InOpenGLRHI,void* InWindowHandle,uint32 InSizeX,uint32 InSizeY,bool bInIsFullscreen,EPixelFormat PreferredPixelFormat);
    ~FOpenGLViewport();

    void Resize(uint32 InSizeX,uint32 InSizeY,bool bInIsFullscreen);

    // Accessors.
    FIntPoint GetSizeXY() const;
    FOpenGLTexture2D *GetBackBuffer() const;
    bool IsFullscreen( void ) const;

    void WaitForFrameEventCompletion();
    void IssueFrameEvent();
    virtual void* GetNativeWindow(void** AddParam) const override;

    struct FPlatformOpenGLContext* GetGLContext() const;
    FOpenGLDynamicRHI* GetOpenGLRHI() const;

    virtual void SetCustomPresent(FRHICustomPresent* InCustomPresent) override;
    FRHICustomPresent* GetCustomPresent() const;
    
private:
    FOpenGLDynamicRHI* OpenGLRHI;
    struct FPlatformOpenGLContext* OpenGLContext;
    uint32 SizeX;
    uint32 SizeY;
    bool bIsFullscreen;
    EPixelFormat PixelFormat;
    bool bIsValid;
    TRefCountPtr<FOpenGLTexture2D> BackBuffer;
    FOpenGLEventQuery FrameSyncEvent;
    FCustomPresentRHIRef CustomPresent;
};

// Engine\Source\Runtime\OpenGLDrv\Private\Android\AndroidOpenGL.cpp

// OpenGL device on Android.
struct FPlatformOpenGLDevice
{
    bool TargetDirty;

    void SetCurrentSharedContext();
    void SetCurrentRenderingContext();
    void SetupCurrentContext();
    void SetCurrentNULLContext();

    FPlatformOpenGLDevice();
    ~FPlatformOpenGLDevice();
    
    void Init();
    void LoadEXT();
    void Terminate();
    void ReInit();
};

// Engine\Source\Runtime\OpenGLDrv\Private\Windows\OpenGLWindows.cpp

// OpenGL device on Windows.
struct FPlatformOpenGLDevice
{
    FPlatformOpenGLContext    SharedContext;
    FPlatformOpenGLContext    RenderingContext;
    TArray<FPlatformOpenGLContext*>    ViewportContexts;
    bool                    TargetDirty;

    /** Guards against operating on viewport contexts from more than one thread at the same time. */
    FCriticalSection*        ContextUsageGuard;
};

// Engine\Source\Runtime\OpenGLDrv\Private\Lumin\LuminOpenGL.cpp

// OpenGL device on Lumin.
struct FPlatformOpenGLDevice
{
    void SetCurrentSharedContext();
    void SetCurrentRenderingContext();
    void SetCurrentNULLContext();

    FPlatformOpenGLDevice();
    ~FPlatformOpenGLDevice();
    
    void Init();
    void LoadEXT();
    void Terminate();
    void ReInit();
};

// Engine\Source\Runtime\OpenGLDrv\Private\Linux\OpenGLLinux.cpp

// OpenGL device on Linux.
struct FPlatformOpenGLDevice
{
    FPlatformOpenGLContext    SharedContext;
    FPlatformOpenGLContext    RenderingContext;
    int32                    NumUsedContexts;
    FCriticalSection*        ContextUsageGuard;
};

// Engine\Source\Runtime\OpenGLDrv\Private\Lumin\LuminGL4.cpp

// OpenGL device on Lumin (GL4).
struct FPlatformOpenGLDevice
{
    FPlatformOpenGLContext    SharedContext;
    FPlatformOpenGLContext    RenderingContext;
    TArray<FPlatformOpenGLContext*>    ViewportContexts;
    bool                    TargetDirty;
    FCriticalSection*        ContextUsageGuard;
};
           

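As shown above, the definition of the OpenGL device object differs from one operating system to another. The OpenGL context likewise depends on the operating system. Take Windows as an example: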

// Engine\Source\Runtime\OpenGLDrv\Private\Windows\OpenGLWindows.cpp

struct FPlatformOpenGLContext
{
    // Window handle.
    HWND WindowHandle;
    // Device context.
    HDC DeviceContext;
    // OpenGL context.
    HGLRC OpenGLContext;
    
    // Other state.
    bool bReleaseWindowOnDestroy;
    int32 SyncInterval;
    GLuint    ViewportFramebuffer;
    GLuint    VertexArrayObject;    // one has to be generated and set for each context (OpenGL 3.2 Core requirements)
    GLuint    BackBufferResource;
    GLenum    BackBufferTarget;
};
           

Drawn as a UML diagram, FOpenGLDynamicRHI looks like this:

FPlatformOpenGLDevice --* FOpenGLDynamicRHI

FRHIViewport <|-- FOpenGLViewport

FOpenGLViewport --o FOpenGLDynamicRHI

FPlatformOpenGLDevice o-- FPlatformOpenGLContext

FOpenGLViewport* Viewports
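
The per-platform FPlatformOpenGLDevice definitions listed earlier follow a common C++ pattern: shared code sees only a forward declaration and a handful of free functions, and each platform's source file supplies its own struct definition. The sketch below illustrates that pattern in a single file. The names PlatformDevice, PlatformCreateDevice, PlatformDestroyDevice, PlatformDescribeDevice and DynamicRHI are hypothetical, and the preprocessor branch stands in for what would normally be separate per-platform .cpp files:

```cpp
#include <cassert>
#include <string>

// --- Shared code: the device type is opaque here ---------------------------
struct PlatformDevice;
PlatformDevice* PlatformCreateDevice();
void PlatformDestroyDevice(PlatformDevice* Device);
std::string PlatformDescribeDevice(const PlatformDevice* Device);

// Stand-in for the platform-independent RHI class: it stores only a pointer,
// so it never needs to know the layout of the platform struct.
class DynamicRHI
{
public:
    DynamicRHI() : Device(PlatformCreateDevice()) {}
    ~DynamicRHI() { PlatformDestroyDevice(Device); }
    DynamicRHI(const DynamicRHI&) = delete;
    DynamicRHI& operator=(const DynamicRHI&) = delete;

    std::string Describe() const { return PlatformDescribeDevice(Device); }

private:
    PlatformDevice* Device;
};

// --- Platform code: exactly one definition is compiled per target ----------
#if defined(_WIN32)
struct PlatformDevice
{
    bool TargetDirty = true;
    int NumViewportContexts = 0; // only this variant tracks viewport contexts
};
std::string PlatformDescribeDevice(const PlatformDevice*) { return "windows"; }
#else
struct PlatformDevice
{
    bool TargetDirty = true;
};
std::string PlatformDescribeDevice(const PlatformDevice*) { return "generic"; }
#endif

PlatformDevice* PlatformCreateDevice() { return new PlatformDevice(); }
void PlatformDestroyDevice(PlatformDevice* Device) { delete Device; }
```

The benefit of this design is that the members of the device struct can vary freely per platform without touching, or even recompiling, the shared RHI class.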

The core types of FD3D12DynamicRHI are defined as follows:

// Engine\Source\Runtime\D3D12RHI\Private\D3D12RHIPrivate.h

class FD3D12DynamicRHI : public FDynamicRHI
{
    (......)
    
protected:
    // The chosen adapters.
    TArray<TSharedPtr<FD3D12Adapter>> ChosenAdapters;

    // D3D12 device.
    inline FD3D12Device* GetRHIDevice(uint32 GPUIndex)
    {
        return GetAdapter().GetDevice(GPUIndex);
    }
    
    (......)
};

// Engine\Source\Runtime\D3D12RHI\Private\D3D12Adapter.h

class FD3D12Adapter : public FNoncopyable
{
public:
    void Initialize(FD3D12DynamicRHI* RHI);
    void InitializeDevices();
    void InitializeRayTracing();
    
    // Resource creation.
    HRESULT CreateCommittedResource(...);
    HRESULT CreateBuffer(...);
    template <typename BufferType> 
    BufferType* CreateRHIBuffer(...);

    inline FD3D12CommandContextRedirector& GetDefaultContextRedirector();
    inline FD3D12CommandContextRedirector& GetDefaultAsyncComputeContextRedirector();
    FD3D12FastConstantAllocator& GetTransientUniformBufferAllocator();

    void BlockUntilIdle();
    
    (......)

protected:
    virtual void CreateRootDevice(bool bWithDebug);

    FD3D12DynamicRHI* OwningRHI;

    // An LDA setup owns a single ID3D12Device.
    TRefCountPtr<ID3D12Device> RootDevice;
    TRefCountPtr<ID3D12Device1> RootDevice1;
    
    TRefCountPtr<IDXGIAdapter> DxgiAdapter;
    
    TRefCountPtr<IDXGIFactory> DxgiFactory;
    TRefCountPtr<IDXGIFactory2> DxgiFactory2;
    
    // Each device represents one physical GPU "node".
    FD3D12Device* Devices[MAX_NUM_GPUS];
    
    FD3D12CommandContextRedirector DefaultContextRedirector;
    FD3D12CommandContextRedirector DefaultAsyncComputeContextRedirector;
    
    TArray<FD3D12Viewport*> Viewports;
    TRefCountPtr<FD3D12Viewport> DrawingViewport;

    (......)
};

// Engine\Source\Runtime\D3D12RHI\Private\D3D12RHICommon.h

class FD3D12AdapterChild
{
protected:
    FD3D12Adapter* ParentAdapter;

    (......)
};

class FD3D12DeviceChild
{
protected:
    FD3D12Device* Parent;
    
    (......)
};

// Engine\Source\Runtime\D3D12RHI\Private\D3D12Device.h

class FD3D12Device : public FD3D12SingleNodeGPUObject, public FNoncopyable, public FD3D12AdapterChild
{
public:
    TArray<FD3D12CommandListHandle> PendingCommandLists;
    
    void Initialize();
    void CreateCommandContexts();
    void InitPlatformSpecific();
    virtual void Cleanup();
    bool GetQueryData(FD3D12RenderQuery& Query, bool bWait);

    ID3D12Device* GetDevice();

    void BlockUntilIdle();
    bool IsGPUIdle();

    FD3D12SamplerState* CreateSampler(const FSamplerStateInitializerRHI& Initializer);

    (......)
    
protected:
    // CommandListManager
    FD3D12CommandListManager* CommandListManager;
    FD3D12CommandListManager* CopyCommandListManager;
    FD3D12CommandListManager* AsyncCommandListManager;
    FD3D12CommandAllocatorManager TextureStreamingCommandAllocatorManager;

    // Allocator
    FD3D12OfflineDescriptorManager RTVAllocator;
    FD3D12OfflineDescriptorManager DSVAllocator;
    FD3D12OfflineDescriptorManager SRVAllocator;
    FD3D12OfflineDescriptorManager UAVAllocator;
    FD3D12DefaultBufferAllocator DefaultBufferAllocator;

    // FD3D12CommandContext
    TArray<FD3D12CommandContext*> CommandContextArray;
    TArray<FD3D12CommandContext*> FreeCommandContexts;
    TArray<FD3D12CommandContext*> AsyncComputeContextArray;

    (......)
};

// Engine\Source\Runtime\D3D12RHI\Public\D3D12Viewport.h

class FD3D12Viewport : public FRHIViewport, public FD3D12AdapterChild
{
public:
    void Init();
    void Resize(uint32 InSizeX, uint32 InSizeY, bool bInIsFullscreen, EPixelFormat PreferredPixelFormat);

    void ConditionalResetSwapChain(bool bIgnoreFocus);
    bool Present(bool bLockToVsync);

    void WaitForFrameEventCompletion();
    bool CurrentOutputSupportsHDR() const;

    (......)
    
private:
    HWND WindowHandle;

#if D3D12_VIEWPORT_EXPOSES_SWAP_CHAIN
    TRefCountPtr<IDXGISwapChain1> SwapChain1;
    TRefCountPtr<IDXGISwapChain4> SwapChain4;
#endif

    TArray<TRefCountPtr<FD3D12Texture2D>> BackBuffers;
    TRefCountPtr<FD3D12Texture2D> DummyBackBuffer_RenderThread;
    uint32 CurrentBackBufferIndex_RHIThread;
    FD3D12Texture2D* BackBuffer_RHIThread;
    TArray<TRefCountPtr<FD3D12Texture2D>> SDRBackBuffers;
    TRefCountPtr<FD3D12Texture2D> SDRDummyBackBuffer_RenderThread;
    FD3D12Texture2D* SDRBackBuffer_RHIThread;

    bool CheckHDRSupport();
    void EnableHDR();
    void ShutdownHDR();
    
    (......)
};

// Engine\Source\Runtime\D3D12RHI\Private\D3D12CommandContext.h

class FD3D12CommandContextBase : public IRHICommandContext, public FD3D12AdapterChild
{
public:
    FD3D12CommandContextBase(class FD3D12Adapter* InParent, FRHIGPUMask InGPUMask, bool InIsDefaultContext, bool InIsAsyncComputeContext);

    void RHIBeginDrawingViewport(FRHIViewport* Viewport, FRHITexture* RenderTargetRHI) final override;
    void RHIEndDrawingViewport(FRHIViewport* Viewport, bool bPresent, bool bLockToVsync) final override;
    void RHIBeginFrame() final override;
    void RHIEndFrame() final override;

    (......)

protected:
    virtual FD3D12CommandContext* GetContext(uint32 InGPUIndex) = 0;

    FRHIGPUMask GPUMask;
    
    (......)
};

class FD3D12CommandContext : public FD3D12CommandContextBase, public FD3D12DeviceChild
{
public:
    FD3D12CommandContext(class FD3D12Device* InParent, bool InIsDefaultContext, bool InIsAsyncComputeContext);
    virtual ~FD3D12CommandContext();

    void EndFrame();
    void ConditionalObtainCommandAllocator();
    void ReleaseCommandAllocator();

    FD3D12CommandListManager& GetCommandListManager();
    void OpenCommandList();
    void CloseCommandList();

    FD3D12CommandListHandle FlushCommands(bool WaitForCompletion = false, EFlushCommandsExtraAction ExtraAction = FCEA_None);
    void Finish(TArray<FD3D12CommandListHandle>& CommandLists);

    FD3D12FastConstantAllocator ConstantsAllocator;
    FD3D12CommandListHandle CommandListHandle;
    FD3D12CommandAllocator* CommandAllocator;
    FD3D12CommandAllocatorManager CommandAllocatorManager;

    FD3D12DynamicRHI& OwningRHI;

    // State Block.
    FD3D12RenderTargetView* CurrentRenderTargets[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT];
    FD3D12DepthStencilView* CurrentDepthStencilTarget;
    FD3D12TextureBase* CurrentDepthTexture;
    uint32 NumSimultaneousRenderTargets;

    // Uniform Buffer.
    FD3D12UniformBuffer* BoundUniformBuffers[SF_NumStandardFrequencies][MAX_CBS];
    FUniformBufferRHIRef BoundUniformBufferRefs[SF_NumStandardFrequencies][MAX_CBS];
    uint16 DirtyUniformBuffers[SF_NumStandardFrequencies];

    // Constant buffers.
    FD3D12ConstantBuffer VSConstantBuffer;
    FD3D12ConstantBuffer HSConstantBuffer;
    FD3D12ConstantBuffer DSConstantBuffer;
    FD3D12ConstantBuffer PSConstantBuffer;
    FD3D12ConstantBuffer GSConstantBuffer;
    FD3D12ConstantBuffer CSConstantBuffer;

    template <class ShaderType> void SetResourcesFromTables(const ShaderType* RESTRICT);
    template <class ShaderType> uint32 SetUAVPSResourcesFromTables(const ShaderType* RESTRICT Shader);
    void CommitGraphicsResourceTables();
    void CommitComputeResourceTables(FD3D12ComputeShader* ComputeShader);
    void ValidateExclusiveDepthStencilAccess(FExclusiveDepthStencil Src) const;
    void CommitRenderTargetsAndUAVs();

    virtual void SetDepthBounds(float MinDepth, float MaxDepth);
    virtual void SetShadingRate(EVRSShadingRate ShadingRate, EVRSRateCombiner Combiner);

    (......)

protected:
    FD3D12CommandContext* GetContext(uint32 InGPUIndex) final override;
    TArray<FRHIUniformBuffer*> GlobalUniformBuffers;
};

class FD3D12CommandContextRedirector final : public FD3D12CommandContextBase
{
public:
    FD3D12CommandContextRedirector(class FD3D12Adapter* InParent, bool InIsDefaultContext, bool InIsAsyncComputeContext);

    virtual void RHISetComputeShader(FRHIComputeShader* ComputeShader) final override;
    virtual void RHISetComputePipelineState(FRHIComputePipelineState* ComputePipelineState) final override;
    virtual void RHIDispatchComputeShader(uint32 ThreadGroupCountX, uint32 ThreadGroupCountY, uint32 ThreadGroupCountZ) final override;
    
    (......)
    
private:
    FRHIGPUMask PhysicalGPUMask;
    FD3D12CommandContext* PhysicalContexts[MAX_NUM_GPUS];
};

// Engine\Source\Runtime\D3D12RHI\Private\D3D12CommandContext.cpp

class FD3D12CommandContextContainer : public IRHICommandContextContainer
{
    FD3D12Adapter* Adapter;
    FD3D12CommandContext* CmdContext;
    FD3D12CommandContextRedirector* CmdContextRedirector;
    FRHIGPUMask GPUMask;
    TArray<FD3D12CommandListHandle> CommandLists;

    (......)
};
           

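As the code above shows, D3D12 involves a large number of core types linked in a complex, multi-level chain of data structures. Their memory layout is as follows: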

[Engine]--
        |
        |-[RHI]--
                |
                |-[Adapter]-- (LDA)
                |            |
                |            |- [Device]
                |            |
                |            |- [Device]
                |
                |-[Adapter]--
                            |
                            |- [Device]--
                                        |
                                        |-[CommandContext]
                                        |
                                        |-[CommandContext]---
                                                            |
                                                            |-[StateCache]
           

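In this scheme, an FD3D12Device represents one node and belongs to one physical adapter. This structure allows a single RHI to control several different kinds of hardware setups, for example:

  • Single-GPU systems (the common case).
  • Multi-GPU systems such as LDA (Crossfire/SLI).
  • Asymmetric multi-GPU systems, such as a discrete GPU cooperating with an integrated GPU.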
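
To make the node model more concrete, here is a minimal sketch of an adapter that owns one device per GPU node and a redirector that forwards a command to every node selected by a GPU mask. It is an illustration under assumptions: NodeContext, Device, Adapter, ContextRedirector and MaxNumGPUs are hypothetical simplifications, and the forwarding loop is one plausible reading of how a redirector holding a per-GPU context array and a GPU mask would behave.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t MaxNumGPUs = 4; // hypothetical stand-in for MAX_NUM_GPUS

// One context per GPU node; it only counts the dispatches it receives.
struct NodeContext
{
    std::uint32_t GPUIndex = 0;
    int NumDispatches = 0;
    void Dispatch() { ++NumDispatches; }
};

// Stand-in for a device: one physical GPU node with a default context.
struct Device
{
    NodeContext DefaultContext;
};

// Stand-in for an adapter: owns one device per node.
class Adapter
{
public:
    explicit Adapter(std::uint32_t NumNodes) : Devices(NumNodes)
    {
        assert(NumNodes >= 1 && NumNodes <= MaxNumGPUs);
        for (std::uint32_t Index = 0; Index < NumNodes; ++Index)
        {
            Devices[Index].DefaultContext.GPUIndex = Index;
        }
    }

    Device& GetDevice(std::uint32_t Index) { return Devices.at(Index); }
    std::uint32_t NumDevices() const { return static_cast<std::uint32_t>(Devices.size()); }

private:
    std::vector<Device> Devices; // never resized, so element addresses stay valid
};

// Stand-in for a redirector: forwards each call to the nodes enabled in the mask.
class ContextRedirector
{
public:
    explicit ContextRedirector(Adapter& InAdapter)
    {
        for (std::uint32_t Index = 0; Index < InAdapter.NumDevices(); ++Index)
        {
            PhysicalContexts[Index] = &InAdapter.GetDevice(Index).DefaultContext;
        }
    }

    void SetGPUMask(std::uint32_t InMask) { GPUMask = InMask; }

    void Dispatch()
    {
        for (std::uint32_t Index = 0; Index < MaxNumGPUs; ++Index)
        {
            const bool bSelected = ((GPUMask >> Index) & 1u) != 0;
            if (bSelected && PhysicalContexts[Index] != nullptr)
            {
                PhysicalContexts[Index]->Dispatch();
            }
        }
    }

private:
    std::uint32_t GPUMask = 0;
    std::array<NodeContext*, MaxNumGPUs> PhysicalContexts{}; // unused slots stay null
};
```

On a single-GPU system the redirector is unnecessary and callers can talk to the device's context directly, which matches the branch seen earlier in RHIGetDefaultContext.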

Abstracting the core D3D12 classes into a UML diagram gives the following:

FD3D12DynamicRHI o-- FD3D12Adapter

FNoncopyable <|-- FD3D12Adapter

ID3D12Device --* FD3D12Adapter

IDXGIAdapter --* FD3D12Adapter

IDXGIFactory --* FD3D12Adapter

FD3D12Device --o FD3D12Adapter

FD3D12Viewport --o FD3D12Adapter

FNoncopyable <|-- FD3D12Device

FD3D12AdapterChild <|-- FD3D12Device

FD3D12CommandListManager --o FD3D12Device

FD3D12CommandContext --o FD3D12Device

FRHIViewport <|-- FD3D12Viewport

FD3D12AdapterChild <|-- FD3D12Viewport

FD3D12AdapterChild <|-- FD3D12CommandContextBase

FD3D12CommandContextBase <|-- FD3D12CommandContext

FD3D12DeviceChild <|-- FD3D12CommandContext

FD3D12CommandContextBase <|-- FD3D12CommandContextRedirector

FD3D12CommandContext --o FD3D12CommandContextRedirector

FD3D12Adapter <-- FD3D12CommandContextContainer

FD3D12CommandContext <-- FD3D12CommandContextContainer

FD3D12CommandContextRedirector <-- FD3D12CommandContextContainer

[Figure: image version of the D3D12 UML diagram above.]

The core classes involved in FVulkanDynamicRHI are as follows:

// Engine\Source\Runtime\VulkanRHI\Public\VulkanDynamicRHI.h

class FVulkanDynamicRHI : public FDynamicRHI
{
public:
    // FDynamicRHI interface.
    virtual void Init() final override;
    virtual void PostInit() final override;
    virtual void Shutdown() final override;
    void InitInstance();

    (......)
    
protected:
    // Vulkan instance.
    VkInstance Instance;
    
    // Devices.
    TArray<FVulkanDevice*> Devices;
    FVulkanDevice* Device;

    // Viewports.
    TArray<FVulkanViewport*> Viewports;
    
    (......)
};

// Engine\Source\Runtime\VulkanRHI\Private\VulkanDevice.h

class FVulkanDevice
{
public:
    FVulkanDevice(FVulkanDynamicRHI* InRHI, VkPhysicalDevice Gpu);
    ~FVulkanDevice();

    bool QueryGPU(int32 DeviceIndex);
    void InitGPU(int32 DeviceIndex);
    void CreateDevice();
    void PrepareForDestroy();
    void Destroy();

    void WaitUntilIdle();
    void PrepareForCPURead();
    void SubmitCommandsAndFlushGPU();

    (......)
    
private:
    void SubmitCommands(FVulkanCommandListContext* Context);

    // Vulkan logical device.
    VkDevice Device;
    // Vulkan physical device.
    VkPhysicalDevice Gpu;
    
    VkPhysicalDeviceProperties GpuProps;
    VkPhysicalDeviceFeatures PhysicalFeatures;

    // Managers.
    VulkanRHI::FDeviceMemoryManager DeviceMemoryManager;
    VulkanRHI::FMemoryManager MemoryManager;
    VulkanRHI::FDeferredDeletionQueue2 DeferredDeletionQueue;
    VulkanRHI::FStagingManager StagingManager;
    VulkanRHI::FFenceManager FenceManager;
    FVulkanDescriptorPoolsManager* DescriptorPoolsManager = nullptr;
    
    FVulkanDescriptorSetCache* DescriptorSetCache = nullptr;
    FVulkanShaderFactory ShaderFactory;

    // Queues.
    FVulkanQueue* GfxQueue;
    FVulkanQueue* ComputeQueue;
    FVulkanQueue* TransferQueue;
    FVulkanQueue* PresentQueue;

    // GPU vendor.
    EGpuVendorId VendorId = EGpuVendorId::NotQueried;

    // Command list contexts.
    FVulkanCommandListContextImmediate* ImmediateContext;
    FVulkanCommandListContext* ComputeContext;
    TArray<FVulkanCommandListContext*> CommandContexts;

    FVulkanDynamicRHI* RHI = nullptr;
    class FVulkanPipelineStateCacheManager* PipelineStateCache;
    
    (......)
};

// Engine\Source\Runtime\VulkanRHI\Private\VulkanQueue.h

class FVulkanQueue
{
public:
    FVulkanQueue(FVulkanDevice* InDevice, uint32 InFamilyIndex);
    ~FVulkanQueue();

    void Submit(FVulkanCmdBuffer* CmdBuffer, uint32 NumSignalSemaphores = 0, VkSemaphore* SignalSemaphores = nullptr);
    void Submit(FVulkanCmdBuffer* CmdBuffer, VkSemaphore SignalSemaphore);

    void GetLastSubmittedInfo(FVulkanCmdBuffer*& OutCmdBuffer, uint64& OutFenceCounter) const;

    (......)
    
private:
    // Vulkan queue.
    VkQueue Queue;
    // Queue family index.
    uint32 FamilyIndex;
    // Queue index.
    uint32 QueueIndex;
    FVulkanDevice* Device;

    // Last submitted Vulkan command buffer.
    FVulkanCmdBuffer* LastSubmittedCmdBuffer;
    uint64 LastSubmittedCmdBufferFenceCounter;
    uint64 SubmitCounter;
    mutable FCriticalSection CS;

    void UpdateLastSubmittedCommandBuffer(FVulkanCmdBuffer* CmdBuffer);
};

// Engine\Source\Runtime\VulkanRHI\Public\VulkanMemory.h

// Child object of a device.
class FDeviceChild
{
public:
    FDeviceChild(FVulkanDevice* InDevice = nullptr);
    
    (......)
    
 protected:
    FVulkanDevice* Device;
};

// Engine\Source\Runtime\VulkanRHI\Private\VulkanContext.h

class FVulkanCommandListContext : public IRHICommandContext
{
public:
    FVulkanCommandListContext(FVulkanDynamicRHI* InRHI, FVulkanDevice* InDevice, FVulkanQueue* InQueue, FVulkanCommandListContext* InImmediate);
    virtual ~FVulkanCommandListContext();

    static inline FVulkanCommandListContext& GetVulkanContext(IRHICommandContext& CmdContext);

    inline bool IsImmediate() const;

    virtual void RHISetStreamSource(uint32 StreamIndex, FRHIVertexBuffer* VertexBuffer, uint32 Offset) final override;
    virtual void RHISetViewport(float MinX, float MinY, float MinZ, float MaxX, float MaxY, float MaxZ) final override;
    virtual void RHISetScissorRect(bool bEnable, uint32 MinX, uint32 MinY, uint32 MaxX, uint32 MaxY) final override;
    
    (......)

    inline FVulkanDevice* GetDevice() const;
    void PrepareParallelFromBase(const FVulkanCommandListContext& BaseContext);

protected:
    FVulkanDynamicRHI* RHI;
    FVulkanCommandListContext* Immediate;
    FVulkanDevice* Device;
    FVulkanQueue* Queue;
    
    FVulkanUniformBufferUploader* UniformBufferUploader;
    FVulkanCommandBufferManager* CommandBufferManager;
    static FVulkanLayoutManager LayoutManager;

private:
    FVulkanGPUProfiler GpuProfiler;
    TArray<FRHIUniformBuffer*> GlobalUniformBuffers;
    
    (......)
};

// Immediate-mode command list context.
class FVulkanCommandListContextImmediate : public FVulkanCommandListContext
{
public:
    FVulkanCommandListContextImmediate(FVulkanDynamicRHI* InRHI, FVulkanDevice* InDevice, FVulkanQueue* InQueue);
};

// Command context container.
struct FVulkanCommandContextContainer : public IRHICommandContextContainer, public VulkanRHI::FDeviceChild
{
    FVulkanCommandListContext* CmdContext;

    FVulkanCommandContextContainer(FVulkanDevice* InDevice);

    virtual IRHICommandContext* GetContext() override final;
    virtual void FinishContext() override final;
    virtual void SubmitAndFreeContextContainer(int32 Index, int32 Num) override final;
    
    void* operator new(size_t Size);
    void operator delete(void* RawMemory);
    
    (......)
};

// Engine\Source\Runtime\VulkanRHI\Private\VulkanViewport.h

class FVulkanViewport : public FRHIViewport, public VulkanRHI::FDeviceChild
{
public:
    FVulkanViewport(FVulkanDynamicRHI* InRHI, FVulkanDevice* InDevice, void* InWindowHandle, uint32 InSizeX,uint32 InSizeY,bool bInIsFullscreen, EPixelFormat InPreferredPixelFormat);
    ~FVulkanViewport();

    void AdvanceBackBufferFrame(FRHICommandListImmediate& RHICmdList);
    void WaitForFrameEventCompletion();

    virtual void SetCustomPresent(FRHICustomPresent* InCustomPresent) override final;
    virtual FRHICustomPresent* GetCustomPresent() const override final;
    virtual void Tick(float DeltaTime) override final;
    bool Present(FVulkanCommandListContext* Context, FVulkanCmdBuffer* CmdBuffer, FVulkanQueue* Queue, FVulkanQueue* PresentQueue, bool bLockToVsync);

    (......)
    
protected:
    TArray<VkImage, TInlineAllocator<NUM_BUFFERS*2>> BackBufferImages;
    TArray<VulkanRHI::FSemaphore*, TInlineAllocator<NUM_BUFFERS*2>> RenderingDoneSemaphores;
    TArray<FVulkanTextureView, TInlineAllocator<NUM_BUFFERS*2>> TextureViews;
    TRefCountPtr<FVulkanBackBuffer> RHIBackBuffer;
    TRefCountPtr<FVulkanTexture2D>    RenderingBackBuffer;
    
    /** narrow-scoped section that locks access to back buffer during its recreation*/
    FCriticalSection RecreatingSwapchain;

    FVulkanDynamicRHI* RHI;
    FVulkanSwapChain* SwapChain;
    void* WindowHandle;
    VulkanRHI::FSemaphore* AcquiredSemaphore;
    FCustomPresentRHIRef CustomPresent;
    FVulkanCmdBuffer* LastFrameCommandBuffer = nullptr;
    
    (......)
};
           

Drawing the core types of the Vulkan RHI as a UML diagram gives the following:

VkInstance --* FVulkanDynamicRHI

FVulkanDevice --o FVulkanDynamicRHI

FVulkanViewport --o FVulkanDynamicRHI

FRHIResource <|-- FRHIViewport

FRHIViewport <|-- FVulkanViewport

FDeviceChild <|-- FVulkanViewport

VkDevice --* FVulkanDevice

VkPhysicalDevice --* FVulkanDevice

FVulkanQueue --o FVulkanDevice

FVulkanCommandListContext --o FVulkanDevice

FVulkanCommandListContextImmediate --* FVulkanDevice

VkQueue --* FVulkanQueue

FVulkanCommandListContext <|-- FVulkanCommandListContextImmediate

FDeviceChild <|-- FVulkanCommandContextContainer

FVulkanCommandListContext <-- FVulkanCommandContextContainer
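
The FVulkanQueue listing above keeps a pointer to the last submitted command buffer, a fence counter, a submit counter and a critical section. A minimal sketch of how such bookkeeping could work is shown below. CmdBuffer and Queue are hypothetical simplifications, std::mutex stands in for the engine's critical section, and the actual Vulkan submission call is omitted, so this shows only the thread-safe tracking, not the real implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical command buffer: only carries the counter of its fence.
struct CmdBuffer
{
    std::uint64_t FenceSignaledCounter = 0;
};

class Queue
{
public:
    // Records the submission. A real backend would hand the buffer to the
    // graphics API before updating the bookkeeping.
    void Submit(CmdBuffer* Buffer)
    {
        assert(Buffer != nullptr);
        std::lock_guard<std::mutex> Lock(Mutex);
        LastSubmittedCmdBuffer = Buffer;
        LastSubmittedFenceCounter = Buffer->FenceSignaledCounter;
        ++SubmitCounter;
    }

    // Returns a consistent snapshot of the most recent submission.
    void GetLastSubmittedInfo(CmdBuffer*& OutCmdBuffer, std::uint64_t& OutFenceCounter) const
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        OutCmdBuffer = LastSubmittedCmdBuffer;
        OutFenceCounter = LastSubmittedFenceCounter;
    }

    std::uint64_t GetSubmitCount() const
    {
        std::lock_guard<std::mutex> Lock(Mutex);
        return SubmitCounter;
    }

private:
    mutable std::mutex Mutex; // mutable so that const getters can lock it
    CmdBuffer* LastSubmittedCmdBuffer = nullptr;
    std::uint64_t LastSubmittedFenceCounter = 0;
    std::uint64_t SubmitCounter = 0;
};
```

Reading the buffer pointer and its fence counter under the same lock matters: a caller that later waits on that fence value needs both to come from the same submission.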

The core types of FMetalDynamicRHI are defined as follows:

// Engine\Source\Runtime\Apple\MetalRHI\Private\MetalDynamicRHI.h

class FMetalDynamicRHI : public FDynamicRHI
{
public:
    // FDynamicRHI interface.
    virtual void Init();
    virtual void Shutdown() {}
    
    (......)
    
private:
    // Immediate-mode context.
    FMetalRHIImmediateCommandContext ImmediateContext;
    // Async compute context.
    FMetalRHICommandContext* AsyncComputeContext;
    
    (......)
};

// Engine\Source\Runtime\Apple\MetalRHI\Public\MetalRHIContext.h

class FMetalRHICommandContext : public IRHICommandContext
{
public:
    FMetalRHICommandContext(class FMetalProfiler* InProfiler, FMetalContext* WrapContext);
    virtual ~FMetalRHICommandContext();

    virtual void RHISetComputeShader(FRHIComputeShader* ComputeShader) override;
    virtual void RHISetComputePipelineState(FRHIComputePipelineState* ComputePipelineState) override;
    virtual void RHIDispatchComputeShader(uint32 ThreadGroupCountX, uint32 ThreadGroupCountY, uint32 ThreadGroupCountZ) final override;
    
    (......)

protected:
    // Metal context.
    FMetalContext* Context;
    
    TSharedPtr<FMetalCommandBufferFence, ESPMode::ThreadSafe> CommandBufferFence;
    class FMetalProfiler* Profiler;
    FMetalBuffer PendingVertexBuffer;

    TArray<FRHIUniformBuffer*> GlobalUniformBuffers;

    (......)
};

class FMetalRHIComputeContext : public FMetalRHICommandContext
{
public:
    FMetalRHIComputeContext(class FMetalProfiler* InProfiler, FMetalContext* WrapContext);
    virtual ~FMetalRHIComputeContext();
    
    virtual void RHISetAsyncComputeBudget(EAsyncComputeBudget Budget) final override;
    virtual void RHISetComputeShader(FRHIComputeShader* ComputeShader) final override;
    virtual void RHISetComputePipelineState(FRHIComputePipelineState* ComputePipelineState) final override;
    virtual void RHISubmitCommandsHint() final override;
};

class FMetalRHIImmediateCommandContext : public FMetalRHICommandContext
{
public:
    FMetalRHIImmediateCommandContext(class FMetalProfiler* InProfiler, FMetalContext* WrapContext);

    // FRHICommandContext API accessible only on the immediate device context
    virtual void RHIBeginDrawingViewport(FRHIViewport* Viewport, FRHITexture* RenderTargetRHI) final override;
    virtual void RHIEndDrawingViewport(FRHIViewport* Viewport, bool bPresent, bool bLockToVsync) final override;
    
    (......)
};

// Engine\Source\Runtime\Apple\MetalRHI\Private\MetalContext.h

// Metal context.
class FMetalContext
{
public:
    FMetalContext(mtlpp::Device InDevice, FMetalCommandQueue& Queue, bool const bIsImmediate);
    virtual ~FMetalContext();
    
    mtlpp::Device& GetDevice();
    
    bool PrepareToDraw(uint32 PrimitiveType, EMetalIndexType IndexType = EMetalIndexType_None);
    void SetRenderPassInfo(const FRHIRenderPassInfo& RenderTargetsInfo, bool const bRestart = false);

    void SubmitCommandsHint(uint32 const bFlags = EMetalSubmitFlagsCreateCommandBuffer);
    void SubmitCommandBufferAndWait();
    void ResetRenderCommandEncoder();
    
    void DrawPrimitive(uint32 PrimitiveType, uint32 BaseVertexIndex, uint32 NumPrimitives, uint32 NumInstances);
    void DrawPrimitiveIndirect(uint32 PrimitiveType, FMetalVertexBuffer* VertexBuffer, uint32 ArgumentOffset);
    void DrawIndexedPrimitive(FMetalBuffer const& IndexBuffer, ...);
    void DrawIndexedIndirect(FMetalIndexBuffer* IndexBufferRHI, ...);
    void DrawIndexedPrimitiveIndirect(uint32 PrimitiveType, ...);
    void DrawPatches(uint32 PrimitiveType, ...);
    
    (......)

protected:
    // Underlying Metal device.
    mtlpp::Device Device;
    
    FMetalCommandQueue& CommandQueue;
    FMetalCommandList CommandList;
    
    FMetalStateCache StateCache;
    FMetalRenderPass RenderPass;
    
    dispatch_semaphore_t CommandBufferSemaphore;
    TSharedPtr<FMetalQueryBufferPool, ESPMode::ThreadSafe> QueryBuffer;
    TRefCountPtr<FMetalFence> StartFence;
    TRefCountPtr<FMetalFence> EndFence;
    
    int32 NumParallelContextsInPass;
    
    (......)
};

// Engine\Source\Runtime\Apple\MetalRHI\Private\MetalCommandQueue.h

class FMetalCommandQueue
{
public:
    FMetalCommandQueue(mtlpp::Device Device, uint32 const MaxNumCommandBuffers = 0);
    ~FMetalCommandQueue(void);
    
    mtlpp::CommandBuffer CreateCommandBuffer(void);
    void CommitCommandBuffer(mtlpp::CommandBuffer& CommandBuffer);
    void SubmitCommandBuffers(TArray<mtlpp::CommandBuffer> BufferList, uint32 Index, uint32 Count);
    FMetalFence* CreateFence(ns::String const& Label) const;
    void GetCommittedCommandBufferFences(TArray<mtlpp::CommandBufferFence>& Fences);
    
    mtlpp::Device& GetDevice(void);
    
    static mtlpp::ResourceOptions GetCompatibleResourceOptions(mtlpp::ResourceOptions Options);
    static inline bool SupportsFeature(EMetalFeatures InFeature);
    static inline bool SupportsSeparateMSAAAndResolveTarget();
    
    (......)

private:
    // Device.
    mtlpp::Device Device;
    // Command queue.
    mtlpp::CommandQueue CommandQueue;
    // Lists of command buffers (note: an array of arrays).
    TArray<TArray<mtlpp::CommandBuffer>> CommandBuffers;
    
    TLockFreePointerListLIFO<mtlpp::CommandBufferFence> CommandBufferFences;
    uint64 ParallelCommandLists;
};

// Engine\Source\Runtime\Apple\MetalRHI\Private\MetalCommandList.h

class FMetalCommandList
{
public:
    FMetalCommandList(FMetalCommandQueue& InCommandQueue, bool const bInImmediate);
    ~FMetalCommandList(void);
    
    void Commit(mtlpp::CommandBuffer& Buffer, TArray<ns::Object<mtlpp::CommandBufferHandler>> CompletionHandlers, bool const bWait, bool const bIsLastCommandBuffer);
    void Submit(uint32 Index, uint32 Count);
    
    bool IsImmediate(void) const;
    bool IsParallel(void) const;
    void SetParallelIndex(uint32 Index, uint32 Num);
    uint32 GetParallelIndex(void) const;
    uint32 GetParallelNum(void) const;

    (......)
    
private:
    // The FMetalCommandQueue this list belongs to.
    FMetalCommandQueue& CommandQueue;
    // Command buffers that have already been submitted.
    TArray<mtlpp::CommandBuffer> SubmittedBuffers;
};
           

Compared with the other modern graphics APIs, the concepts and interfaces of FMetalDynamicRHI are much simpler. Its UML diagram is as follows:

FMetalRHIImmediateCommandContext --* FMetalDynamicRHI

FMetalRHICommandContext --* FMetalDynamicRHI

FMetalRHICommandContext <|-- FMetalRHIComputeContext

FMetalRHIComputeContext <|-- FMetalRHIImmediateCommandContext

FMetalContext --* FMetalRHICommandContext

mtlpp_Device --* FMetalContext

FMetalCommandQueue --* FMetalContext

FMetalCommandList --* FMetalContext

mtlpp_CommandQueue --* FMetalCommandQueue

mtlpp_CommandBuffer --o TArray_CommandBuffer

TArray_CommandBuffer --o FMetalCommandQueue

mtlpp_CommandBuffer --o FMetalCommandList

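Sections 10.2 and 10.3 covered the basic concepts and inheritance hierarchy of the RHI system in detail, including render-layer resources, RHI-layer resources, commands, contexts, and the dynamic RHI. They also covered the concrete implementations for each mainstream graphics API and how those relate to the RHI abstraction layer.

Setting aside the implementation details of each graphics API and the many concrete RHI subclasses, the top-level concepts (RHI Context, CommandList, Command, Resource, and so on) can be summarized in the following UML diagram: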

FRHIResource <-- FRHICommand

FRHICommandBase <-- FRHICommandListBase

IRHIComputeContext <-- FRHICommandListBase

IRHICommandContext <-- IRHICommandContextContainer

The diagram below refines the one above with some of the subclasses:

FRHIResource <|-- FRHIShader

FRHIResource <|-- FRHIVertexBuffer

FRHICommand <|-- FRHICommandSetShaderParameter

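This chapter describes the mechanisms and principles by which the RHI system operates.

FRHICommandListExecutor is responsible for translating the Renderer layer's intermediate RHI commands into calls to the target platform's graphics API (or for calling that API directly). It plays a pivotal role in the RHI system and is defined as follows: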

// Engine\Source\Runtime\RHI\Public\RHICommandList.h

class RHI_API FRHICommandListExecutor
{
public:
    enum
    {
        DefaultBypass = PLATFORM_RHITHREAD_DEFAULT_BYPASS
    };
    FRHICommandListExecutor()
        : bLatchedBypass(!!DefaultBypass)
        , bLatchedUseParallelAlgorithms(false)
    {
    }
    
    // Static accessor: returns the immediate command list.
    static inline FRHICommandListImmediate& GetImmediateCommandList();
    // Static accessor: returns the immediate async-compute command list.
    static inline FRHIAsyncComputeCommandListImmediate& GetImmediateAsyncComputeCommandList();

    // Execute a command list.
    void ExecuteList(FRHICommandListBase& CmdList);
    void ExecuteList(FRHICommandListImmediate& CmdList);
    void LatchBypass();

    // Wait on an RHI thread fence.
    static void WaitOnRHIThreadFence(FGraphEventRef& Fence);

    // Whether command recording is bypassed; if so, the target platform's graphics API is called directly.
    FORCEINLINE_DEBUGGABLE bool Bypass()
    {
#if CAN_TOGGLE_COMMAND_LIST_BYPASS
        return bLatchedBypass;
#else
        return !!DefaultBypass;
#endif
    }
    // Whether to use parallel algorithms.
    FORCEINLINE_DEBUGGABLE bool UseParallelAlgorithms()
    {
#if CAN_TOGGLE_COMMAND_LIST_BYPASS
        return bLatchedUseParallelAlgorithms;
#else
        return  FApp::ShouldUseThreadingForPerformance() && !Bypass() && (GSupportsParallelRenderingTasksWithSeparateRHIThread || !IsRunningRHIInSeparateThread());
#endif
    }
    static void CheckNoOutstandingCmdLists();
    static bool IsRHIThreadActive();
    static bool IsRHIThreadCompletelyFlushed();

private:
    // Internal execution.
    void ExecuteInner(FRHICommandListBase& CmdList);
    // Internal execution that actually translates (executes) the commands.
    static void ExecuteInner_DoExecute(FRHICommandListBase& CmdList);

    bool bLatchedBypass;
    bool bLatchedUseParallelAlgorithms;
    
    // Counters used for synchronization.
    FThreadSafeCounter UIDCounter;
    FThreadSafeCounter OutstandingCmdListCount;
    
    // Immediate-mode command list.
    FRHICommandListImmediate CommandListImmediate;
    // Immediate-mode async-compute command list.
    FRHIAsyncComputeCommandListImmediate AsyncComputeCmdListImmediate;
};
           

Below are the implementations of some of the important FRHICommandListExecutor interfaces:

// Engine\Source\Runtime\RHI\Private\RHICommandList.cpp

// Check whether the RHI thread is active.
bool FRHICommandListExecutor::IsRHIThreadActive()
{
    // Whether asynchronous submission is enabled.
    bool bAsyncSubmit = CVarRHICmdAsyncRHIThreadDispatch.GetValueOnRenderThread() > 0;
    // 1. First check for an unfinished sub-list dispatch task.
    if (bAsyncSubmit)
    {
        if (RenderThreadSublistDispatchTask.GetReference() && RenderThreadSublistDispatchTask->IsComplete())
        {
            RenderThreadSublistDispatchTask = nullptr;
        }
        if (RenderThreadSublistDispatchTask.GetReference())
        {
            return true; // it might become active at any time
        }
        // otherwise we can safely look at RHIThreadTask
    }

    // 2. Then check for an unfinished RHI thread task.
    if (RHIThreadTask.GetReference() && RHIThreadTask->IsComplete())
    {
        RHIThreadTask = nullptr;
        PrevRHIThreadTask = nullptr;
    }
    return !!RHIThreadTask.GetReference();
}

// Check whether the RHI thread has been completely flushed.
bool FRHICommandListExecutor::IsRHIThreadCompletelyFlushed()
{
    if (IsRHIThreadActive() || GetImmediateCommandList().HasCommands())
    {
        return false;
    }
    if (RenderThreadSublistDispatchTask.GetReference() && RenderThreadSublistDispatchTask->IsComplete())
    {
#if NEEDS_DEBUG_INFO_ON_PRESENT_HANG
        bRenderThreadSublistDispatchTaskClearedOnRT = IsInActualRenderingThread();
        bRenderThreadSublistDispatchTaskClearedOnGT = IsInGameThread();
#endif
        RenderThreadSublistDispatchTask = nullptr;
    }
    return !RenderThreadSublistDispatchTask;
}

void FRHICommandListExecutor::ExecuteList(FRHICommandListImmediate& CmdList)
{
    {
        SCOPE_CYCLE_COUNTER(STAT_ImmedCmdListExecuteTime);
        ExecuteInner(CmdList);
    }
}

void FRHICommandListExecutor::ExecuteList(FRHICommandListBase& CmdList)
{
    // Flush the commands already pending on the immediate list before executing this command list.
    if (IsInRenderingThread() && !GetImmediateCommandList().IsExecuting())
    {
        GetImmediateCommandList().ImmediateFlush(EImmediateFlushType::DispatchToRHIThread);
    }

    // Internal execution.
    ExecuteInner(CmdList);
}

void FRHICommandListExecutor::ExecuteInner(FRHICommandListBase& CmdList)
{
    // Whether we are on the rendering thread.
    bool bIsInRenderingThread = IsInRenderingThread();
    // Whether we are on the game thread.
    bool bIsInGameThread = IsInGameThread();
    
    // A separate RHI thread is enabled.
    if (IsRunningRHIInSeparateThread())
    {
        bool bAsyncSubmit = false;
        ENamedThreads::Type RenderThread_Local = ENamedThreads::GetRenderThread_Local();
        if (bIsInRenderingThread)
        {
            if (!bIsInGameThread && !FTaskGraphInterface::Get().IsThreadProcessingTasks(RenderThread_Local))
            {
                // Process everything still pending on the local render-thread queue.
                FTaskGraphInterface::Get().ProcessThreadUntilIdle(RenderThread_Local);
            }
            // Check whether the sub-list dispatch task has completed.
            bAsyncSubmit = CVarRHICmdAsyncRHIThreadDispatch.GetValueOnRenderThread() > 0;
            if (RenderThreadSublistDispatchTask.GetReference() && RenderThreadSublistDispatchTask->IsComplete())
            {
                RenderThreadSublistDispatchTask = nullptr;
                if (bAsyncSubmit && RHIThreadTask.GetReference() && RHIThreadTask->IsComplete())
                {
                    RHIThreadTask = nullptr;
                    PrevRHIThreadTask = nullptr;
                }
            }
            // Check whether the RHI thread task has completed.
            if (!bAsyncSubmit && RHIThreadTask.GetReference() && RHIThreadTask->IsComplete())
            {
                RHIThreadTask = nullptr;
                PrevRHIThreadTask = nullptr;
            }
        }
        
        if (CVarRHICmdUseThread.GetValueOnRenderThread() > 0 && bIsInRenderingThread && !bIsInGameThread)
        {
            // Swap the command list's RT tasks into the prerequisite array.
            FRHICommandList* SwapCmdList;
            FGraphEventArray Prereq;
            Exchange(Prereq, CmdList.RTTasks); 
            {
                QUICK_SCOPE_CYCLE_COUNTER(STAT_FRHICommandListExecutor_SwapCmdLists);
                SwapCmdList = new FRHICommandList(CmdList.GetGPUMask());

                static_assert(sizeof(FRHICommandList) == sizeof(FRHICommandListImmediate), "We are memswapping FRHICommandList and FRHICommandListImmediate; they need to be swappable.");
                SwapCmdList->ExchangeCmdList(CmdList);
                CmdList.CopyContext(*SwapCmdList);
                CmdList.GPUMask = SwapCmdList->GPUMask;
                CmdList.InitialGPUMask = SwapCmdList->GPUMask;
                CmdList.PSOContext = SwapCmdList->PSOContext;
                CmdList.Data.bInsideRenderPass = SwapCmdList->Data.bInsideRenderPass;
                CmdList.Data.bInsideComputePass = SwapCmdList->Data.bInsideComputePass;
            }
            
            // Submit the tasks.
            QUICK_SCOPE_CYCLE_COUNTER(STAT_FRHICommandListExecutor_SubmitTasks);

            // Create an FDispatchRHIThreadTask with AllOutstandingTasks and RenderThreadSublistDispatchTask as its prerequisites.
            if (AllOutstandingTasks.Num() || RenderThreadSublistDispatchTask.GetReference())
            {
                Prereq.Append(AllOutstandingTasks);
                AllOutstandingTasks.Reset();
                if (RenderThreadSublistDispatchTask.GetReference())
                {
                    Prereq.Add(RenderThreadSublistDispatchTask);
                }
                RenderThreadSublistDispatchTask = TGraphTask<FDispatchRHIThreadTask>::CreateTask(&Prereq, ENamedThreads::GetRenderThread()).ConstructAndDispatchWhenReady(SwapCmdList, bAsyncSubmit);
            }
            // Otherwise create an FExecuteRHIThreadTask with RHIThreadTask as its prerequisite.
            else
            {
                if (RHIThreadTask.GetReference())
                {
                    Prereq.Add(RHIThreadTask);
                }
                PrevRHIThreadTask = RHIThreadTask;
                RHIThreadTask = TGraphTask<FExecuteRHIThreadTask>::CreateTask(&Prereq, ENamedThreads::GetRenderThread()).ConstructAndDispatchWhenReady(SwapCmdList);
            }
            
            if (CVarRHICmdForceRHIFlush.GetValueOnRenderThread() > 0 )
            {
                // Check whether the rendering thread would deadlock.
                if (FTaskGraphInterface::Get().IsThreadProcessingTasks(RenderThread_Local))
                {
                    // this is a deadlock. RT tasks must be done by now or they won't be done. We could add a third queue...
                    UE_LOG(LogRHI, Fatal, TEXT("Deadlock in FRHICommandListExecutor::ExecuteInner 2."));
                }
                
                // Wait for RenderThreadSublistDispatchTask to complete.
                if (RenderThreadSublistDispatchTask.GetReference())
                {
                    FTaskGraphInterface::Get().WaitUntilTaskCompletes(RenderThreadSublistDispatchTask, RenderThread_Local);
                    RenderThreadSublistDispatchTask = nullptr;
                }
                
                // Wait for RHIThreadTask to complete.
                while (RHIThreadTask.GetReference())
                {
                    FTaskGraphInterface::Get().WaitUntilTaskCompletes(RHIThreadTask, RenderThread_Local);
                    if (RHIThreadTask.GetReference() && RHIThreadTask->IsComplete())
                    {
                        RHIThreadTask = nullptr;
                        PrevRHIThreadTask = nullptr;
                    }
                }
            }
            
            return;
        }
        
        // Wait for RTTasks, RenderThreadSublistDispatchTask, and RHIThreadTask to complete.
        if (bIsInRenderingThread)
        {
            if (CmdList.RTTasks.Num())
            {
                if (FTaskGraphInterface::Get().IsThreadProcessingTasks(RenderThread_Local))
                {
                    UE_LOG(LogRHI, Fatal, TEXT("Deadlock in FRHICommandListExecutor::ExecuteInner (RTTasks)."));
                }
                FTaskGraphInterface::Get().WaitUntilTasksComplete(CmdList.RTTasks, RenderThread_Local);
                CmdList.RTTasks.Reset();

            }
            if (RenderThreadSublistDispatchTask.GetReference())
            {
                if (FTaskGraphInterface::Get().IsThreadProcessingTasks(RenderThread_Local))
                {
                    // this is a deadlock. RT tasks must be done by now or they won't be done. We could add a third queue...
                    UE_LOG(LogRHI, Fatal, TEXT("Deadlock in FRHICommandListExecutor::ExecuteInner (RenderThreadSublistDispatchTask)."));
                }
                FTaskGraphInterface::Get().WaitUntilTaskCompletes(RenderThreadSublistDispatchTask, RenderThread_Local);
#if NEEDS_DEBUG_INFO_ON_PRESENT_HANG
                bRenderThreadSublistDispatchTaskClearedOnRT = IsInActualRenderingThread();
                bRenderThreadSublistDispatchTaskClearedOnGT = bIsInGameThread;
#endif
                RenderThreadSublistDispatchTask = nullptr;
            }
            while (RHIThreadTask.GetReference())
            {
                if (FTaskGraphInterface::Get().IsThreadProcessingTasks(RenderThread_Local))
                {
                    // this is a deadlock. RT tasks must be done by now or they won't be done. We could add a third queue...
                    UE_LOG(LogRHI, Fatal, TEXT("Deadlock in FRHICommandListExecutor::ExecuteInner (RHIThreadTask)."));
                }
                FTaskGraphInterface::Get().WaitUntilTaskCompletes(RHIThreadTask, RenderThread_Local);
                if (RHIThreadTask.GetReference() && RHIThreadTask->IsComplete())
                {
                    RHIThreadTask = nullptr;
                    PrevRHIThreadTask = nullptr;
                }
            }
        }
    }
    // No separate RHI thread.
    else
    {
        if (bIsInRenderingThread && CmdList.RTTasks.Num())
        {
            ENamedThreads::Type RenderThread_Local = ENamedThreads::GetRenderThread_Local();
            if (FTaskGraphInterface::Get().IsThreadProcessingTasks(RenderThread_Local))
            {
                // this is a deadlock. RT tasks must be done by now or they won't be done. We could add a third queue...
                UE_LOG(LogRHI, Fatal, TEXT("Deadlock in FRHICommandListExecutor::ExecuteInner (RTTasks)."));
            }
            FTaskGraphInterface::Get().WaitUntilTasksComplete(CmdList.RTTasks, RenderThread_Local);
            CmdList.RTTasks.Reset();
        }
    }

    // Execute the commands.
    ExecuteInner_DoExecute(CmdList);
}

void FRHICommandListExecutor::ExecuteInner_DoExecute(FRHICommandListBase& CmdList)
{
    FScopeCycleCounter ScopeOuter(CmdList.ExecuteStat);

    CmdList.bExecuting = true;
    check(CmdList.Context || CmdList.ComputeContext);

    FMemMark Mark(FMemStack::Get());

    // Set the multi-GPU mask.
#if WITH_MGPU
    if (CmdList.Context != nullptr)
    {
        CmdList.Context->RHISetGPUMask(CmdList.InitialGPUMask);
    }
    if (CmdList.ComputeContext != nullptr && CmdList.ComputeContext != CmdList.Context)
    {
        CmdList.ComputeContext->RHISetGPUMask(CmdList.InitialGPUMask);
    }
#endif

    FRHICommandListDebugContext DebugContext;
    FRHICommandListIterator Iter(CmdList);
    // Collect execution stats.
#if STATS
    bool bDoStats =  CVarRHICmdCollectRHIThreadStatsFromHighLevel.GetValueOnRenderThread() > 0 && FThreadStats::IsCollectingData() && (IsInRenderingThread() || IsInRHIThread());
    if (bDoStats)
    {
        while (Iter.HasCommandsLeft())
        {
            TStatIdData const* Stat = GCurrentExecuteStat.GetRawPointer();
            FScopeCycleCounter Scope(GCurrentExecuteStat);
            while (Iter.HasCommandsLeft() && Stat == GCurrentExecuteStat.GetRawPointer())
            {
                FRHICommandBase* Cmd = Iter.NextCommand();
                Cmd->ExecuteAndDestruct(CmdList, DebugContext);
            }
        }
    }
    else
    // Emit named stat events.
#elif ENABLE_STATNAMEDEVENTS
    bool bDoStats = CVarRHICmdCollectRHIThreadStatsFromHighLevel.GetValueOnRenderThread() > 0 && GCycleStatsShouldEmitNamedEvents && (IsInRenderingThread() || IsInRHIThread());
    if (bDoStats)
    {
        while (Iter.HasCommandsLeft())
        {
            PROFILER_CHAR const* Stat = GCurrentExecuteStat.StatString;
            FScopeCycleCounter Scope(GCurrentExecuteStat);
            while (Iter.HasCommandsLeft() && Stat == GCurrentExecuteStat.StatString)
            {
                FRHICommandBase* Cmd = Iter.NextCommand();
                Cmd->ExecuteAndDestruct(CmdList, DebugContext);
            }
        }
    }
    else
#endif
    // Version without debugging or stats collection.
    {
        // Loop over all commands, executing and destructing each one.
        while (Iter.HasCommandsLeft())
        {
            FRHICommandBase* Cmd = Iter.NextCommand();
            GCurrentCommand = Cmd;
            Cmd->ExecuteAndDestruct(CmdList, DebugContext);
        }
    }
    // Reset the command list.
    CmdList.Reset();
}
           

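As the code shows, FRHICommandListExecutor handles a complex variety of tasks: it has to work out each task's prerequisites, waits, and dependencies, as well as the dependencies and waits between threads. Two important task types appear in the code above: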

// Task that dispatches work to the RHI thread.
class FDispatchRHIThreadTask
{
    FRHICommandListBase* RHICmdList; // Command list to dispatch.
    bool bRHIThread; // Whether to dispatch from the RHI thread.

public:
    FDispatchRHIThreadTask(FRHICommandListBase* InRHICmdList, bool bInRHIThread)
        : RHICmdList(InRHICmdList)
        , bRHIThread(bInRHIThread)
    {        
    }
    FORCEINLINE TStatId GetStatId() const;
    static ESubsequentsMode::Type GetSubsequentsMode() { return ESubsequentsMode::TrackSubsequents; }

    // The desired thread depends on bRHIThread and on whether a dedicated RHI thread is running.
    ENamedThreads::Type GetDesiredThread()
    {
        return bRHIThread ? (IsRunningRHIInDedicatedThread() ? ENamedThreads::RHIThread : CPrio_RHIThreadOnTaskThreads.Get()) : ENamedThreads::GetRenderThread_Local();
    }
    
    void DoTask(ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionGraphEvent)
    {
        // The prerequisite is the current RHIThreadTask.
        FGraphEventArray Prereq;
        if (RHIThreadTask.GetReference())
        {
            Prereq.Add(RHIThreadTask);
        }
        // Remember the current RHIThreadTask as PrevRHIThreadTask.
        PrevRHIThreadTask = RHIThreadTask;
        // Create an FExecuteRHIThreadTask and assign it to RHIThreadTask.
        RHIThreadTask = TGraphTask<FExecuteRHIThreadTask>::CreateTask(&Prereq, CurrentThread).ConstructAndDispatchWhenReady(RHICmdList);
    }
};

// Task that executes a command list on the RHI thread.
class FExecuteRHIThreadTask
{
    FRHICommandListBase* RHICmdList;

public:
    FExecuteRHIThreadTask(FRHICommandListBase* InRHICmdList)
        : RHICmdList(InRHICmdList)
    {
    }

    FORCEINLINE TStatId GetStatId() const;
    static ESubsequentsMode::Type GetSubsequentsMode() { return ESubsequentsMode::TrackSubsequents; }

    // Run on the dedicated RHI thread if there is one, otherwise on a task (worker) thread.
    ENamedThreads::Type GetDesiredThread()
    {
        return IsRunningRHIInDedicatedThread() ? ENamedThreads::RHIThread : CPrio_RHIThreadOnTaskThreads.Get();
    }
    
    void DoTask(ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionGraphEvent)
    {
        // Set the global GRHIThreadId.
        if (IsRunningRHIInTaskThread())
        {
            GRHIThreadId = FPlatformTLS::GetCurrentThreadId();
        }
        
        // Execute the RHI command list.
        {
            // Critical section that keeps concurrent executions from overlapping.
            FScopeLock Lock(&GRHIThreadOnTasksCritical);
            
            FRHICommandListExecutor::ExecuteInner_DoExecute(*RHICmdList);
            delete RHICmdList;
        }
        
        // Clear the global GRHIThreadId.
        if (IsRunningRHIInTaskThread())
        {
            GRHIThreadId = 0;
        }
    }
};
           

As shown above, dispatching and translating a command list may happen on the dedicated RHI thread, on the rendering thread, or on a worker thread.
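The prerequisite chaining used by these two tasks (each new FExecuteRHIThreadTask lists the previous RHIThreadTask as a prerequisite) can be illustrated outside the engine. The following is a minimal single-threaded sketch, not engine code: `MiniTaskGraph`, `CreateTask`, `ProcessUntilIdle`, and `RunChainedLists` are hypothetical names, and the scheduler deliberately tries the newest task first so that only the prerequisites keep the order correct.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, single-threaded stand-in for the TaskGraph: a task may only
// run once every one of its prerequisite tasks has completed.
class MiniTaskGraph
{
public:
    using TaskId = std::size_t;

    // Register a task together with the ids of its prerequisites.
    TaskId CreateTask(std::vector<TaskId> Prereqs, std::function<void()> Work)
    {
        for (TaskId Id : Prereqs)
        {
            assert(Id < Tasks.size()); // a prerequisite must already exist
        }
        Tasks.push_back({std::move(Prereqs), std::move(Work), false});
        return Tasks.size() - 1;
    }

    bool IsComplete(TaskId Id) const { return Tasks[Id].bComplete; }

    // Run ready tasks until none is left. The newest task is tried first on
    // purpose, so submission order is preserved by the prerequisites alone.
    void ProcessUntilIdle()
    {
        bool bRanAny = true;
        while (bRanAny)
        {
            bRanAny = false;
            for (std::size_t Index = Tasks.size(); Index-- > 0;)
            {
                FTask& Task = Tasks[Index];
                if (!Task.bComplete && IsReady(Task))
                {
                    Task.Work();
                    Task.bComplete = true;
                    bRanAny = true;
                    break; // rescan, starting again from the newest task
                }
            }
        }
    }

private:
    struct FTask
    {
        std::vector<TaskId> Prereqs;
        std::function<void()> Work;
        bool bComplete;
    };

    bool IsReady(const FTask& Task) const
    {
        for (TaskId Id : Task.Prereqs)
        {
            if (!Tasks[Id].bComplete)
            {
                return false;
            }
        }
        return true;
    }

    std::vector<FTask> Tasks;
};

// Mimics the RHIThreadTask chaining: each "execute" task names the previous
// one as its prerequisite. Returns the order in which the tasks really ran.
inline std::vector<int> RunChainedLists(int Count)
{
    MiniTaskGraph Graph;
    std::vector<int> Order;
    bool bHasPrev = false;
    MiniTaskGraph::TaskId Prev = 0; // plays the role of RHIThreadTask
    for (int Index = 0; Index < Count; ++Index)
    {
        std::vector<MiniTaskGraph::TaskId> Prereq;
        if (bHasPrev)
        {
            Prereq.push_back(Prev);
        }
        Prev = Graph.CreateTask(std::move(Prereq), [&Order, Index]() { Order.push_back(Index); });
        bHasPrev = true;
    }
    Graph.ProcessUntilIdle();
    return Order;
}
```

Calling `RunChainedLists(3)` yields the tasks in submission order even though the scheduler prefers the newest task, which is the property the real code relies on when it adds RHIThreadTask to `Prereq`.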

At first glance GRHICommandList looks like an instance of FRHICommandListBase, but its actual type is FRHICommandListExecutor. Its declaration and definition are as follows:

// Engine\Source\Runtime\RHI\Public\RHICommandList.h
extern RHI_API FRHICommandListExecutor GRHICommandList;

// Engine\Source\Runtime\RHI\Private\RHICommandList.cpp
RHI_API FRHICommandListExecutor GRHICommandList;
           

The global or static interfaces related to GRHICommandList are as follows:

FRHICommandListImmediate& FRHICommandListExecutor::GetImmediateCommandList()
{
    return GRHICommandList.CommandListImmediate;
}

FRHIAsyncComputeCommandListImmediate& FRHICommandListExecutor::GetImmediateAsyncComputeCommandList()
{
    return GRHICommandList.AsyncComputeCmdListImmediate;
}
           

UE's Renderer and RHI modules contain many uses of GRHICommandList. Here is one of them:

// Engine\Source\Runtime\Renderer\Private\DeferredShadingRenderer.cpp

void ServiceLocalQueue()
{
    FTaskGraphInterface::Get().ProcessThreadUntilIdle(ENamedThreads::GetRenderThread_Local());

    if (IsRunningRHIInSeparateThread())
    {
        FRHICommandListExecutor::GetImmediateCommandList().ImmediateFlush(EImmediateFlushType::DispatchToRHIThread);
    }
}
           

Besides GRHICommandList, the RHI command list module also involves several global task variables:

// Engine\Source\Runtime\RHI\Private\RHICommandList.cpp

static FGraphEventArray AllOutstandingTasks;
static FGraphEventArray WaitOutstandingTasks;
static FGraphEventRef RHIThreadTask;
static FGraphEventRef PrevRHIThreadTask;
static FGraphEventRef RenderThreadSublistDispatchTask;
           

The code that creates them or adds tasks to them is as follows:

void FRHICommandListBase::QueueParallelAsyncCommandListSubmit(FGraphEventRef* AnyThreadCompletionEvents, ...)
{
    (......)
    
    if (Num && IsRunningRHIInSeparateThread())
    {
        (......)
            
        // Create the FParallelTranslateSetupCommandList task.
        FGraphEventRef TranslateSetupCompletionEvent = TGraphTask<FParallelTranslateSetupCommandList>::CreateTask(&Prereq, ENamedThreads::GetRenderThread()).ConstructAndDispatchWhenReady(CmdList, &RHICmdLists[0], Num, bIsPrepass);
        QueueCommandListSubmit(CmdList);
        // Add it to AllOutstandingTasks.
        AllOutstandingTasks.Add(TranslateSetupCompletionEvent);
        
        (......)
        
        FGraphEventArray Prereq;
        FRHICommandListBase** RHICmdLists = (FRHICommandListBase**)Alloc(sizeof(FRHICommandListBase*) * (1 + Last - Start), alignof(FRHICommandListBase*));
        // Add every external AnyThreadCompletionEvents entry to the corresponding lists.
        for (int32 Index = Start; Index <= Last; Index++)
        {
            FGraphEventRef& AnyThreadCompletionEvent = AnyThreadCompletionEvents[Index];
            FRHICommandList* CmdList = CmdLists[Index];
            RHICmdLists[Index - Start] = CmdList;
            if (AnyThreadCompletionEvent.GetReference())
            {
                Prereq.Add(AnyThreadCompletionEvent);
                AllOutstandingTasks.Add(AnyThreadCompletionEvent);
                WaitOutstandingTasks.Add(AnyThreadCompletionEvent);
            }
        }
        
        (......)
        
        // Parallel translation task FParallelTranslateCommandList.
        FGraphEventRef TranslateCompletionEvent = TGraphTask<FParallelTranslateCommandList>::CreateTask(&Prereq, ENamedThreads::GetRenderThread()).ConstructAndDispatchWhenReady(&RHICmdLists[0], 1 + Last - Start, ContextContainer, bIsPrepass);
        AllOutstandingTasks.Add(TranslateCompletionEvent);
        
        (......)
    }
}
    
void FRHICommandListBase::QueueAsyncCommandListSubmit(FGraphEventRef& AnyThreadCompletionEvent, class FRHICommandList* CmdList)
{
    (......)
    
    // Handle the external task AnyThreadCompletionEvent.
    if (AnyThreadCompletionEvent.GetReference())
    {
        if (IsRunningRHIInSeparateThread())
        {
            AllOutstandingTasks.Add(AnyThreadCompletionEvent);
        }
        WaitOutstandingTasks.Add(AnyThreadCompletionEvent);
    }
    
    (......)
}
    
class FDispatchRHIThreadTask
{
    void DoTask(ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionGraphEvent)
    {
        (......)
        
        // Create the RHI thread task FExecuteRHIThreadTask.
        PrevRHIThreadTask = RHIThreadTask;
        RHIThreadTask = TGraphTask<FExecuteRHIThreadTask>::CreateTask(&Prereq, CurrentThread).ConstructAndDispatchWhenReady(RHICmdList);
    }
};
    
class FParallelTranslateSetupCommandList
{
    void DoTask(ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionGraphEvent)
    {
        (......)

        // Create the parallel translation task FParallelTranslateCommandList.
        FGraphEventRef TranslateCompletionEvent = TGraphTask<FParallelTranslateCommandList>::CreateTask(nullptr, ENamedThreads::GetRenderThread()).ConstructAndDispatchWhenReady(&RHICmdLists[Start], 1 + Last - Start, ContextContainer, bIsPrepass);
        MyCompletionGraphEvent->DontCompleteUntil(TranslateCompletionEvent);
        // Submit through the FRHICommandWaitForAndSubmitSubListParallel command on RHICmdList; the event eventually ends up in AllOutstandingTasks and WaitOutstandingTasks.
        ALLOC_COMMAND_CL(*RHICmdList, FRHICommandWaitForAndSubmitSubListParallel)(TranslateCompletionEvent, ContextContainer, EffectiveThreads, ThreadIndex++);
    }
};
    
void FRHICommandListExecutor::ExecuteInner(FRHICommandListBase& CmdList)
{
    (......)
    
    if (IsRunningRHIInSeparateThread())
    {
        (......)
        
        if (AllOutstandingTasks.Num() || RenderThreadSublistDispatchTask.GetReference())
        {
            (......)
            // Create the render-thread sub-list dispatch (submit) task FDispatchRHIThreadTask.
            RenderThreadSublistDispatchTask = TGraphTask<FDispatchRHIThreadTask>::CreateTask(&Prereq, ENamedThreads::GetRenderThread()).ConstructAndDispatchWhenReady(SwapCmdList, bAsyncSubmit);
        }
        else
        {
            (......)
            PrevRHIThreadTask = RHIThreadTask;
            // Create the RHI-thread execution (translation) task FExecuteRHIThreadTask.
            RHIThreadTask = TGraphTask<FExecuteRHIThreadTask>::CreateTask(&Prereq, ENamedThreads::GetRenderThread()).ConstructAndDispatchWhenReady(SwapCmdList);
        }
        
        (......)
    }
}
           

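To summarize the roles of these task variables:

Task variable | Executing threads | Description
AllOutstandingTasks | Render, RHI, worker | All tasks that are being processed or waiting to be processed. The task types are FParallelTranslateSetupCommandList and FParallelTranslateCommandList.
WaitOutstandingTasks | Render, RHI, worker | Tasks waiting to be processed. The task types are FParallelTranslateSetupCommandList and FParallelTranslateCommandList.
RHIThreadTask | RHI, worker | The RHI thread task currently being processed. The task type is FExecuteRHIThreadTask.
PrevRHIThreadTask | RHI, worker | The previously processed RHIThreadTask. The task type is FExecuteRHIThreadTask.
RenderThreadSublistDispatchTask | Render, RHI, worker | The task currently being dispatched (submitted). The task type is FDispatchRHIThreadTask.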

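This section examines how generic RHI commands and D3D11 commands are executed on PC in UE 4.26. D3D11 is the default RHI on PC in UE 4.26, and the key console variables have the following defaults: r.RHICmdBypass is 1 and r.RHIThread.Enable is 0.

In other words, command bypass mode is enabled and the RHI thread is disabled. In this configuration, calling an FRHICommandList interface does not generate a separate FRHICommand; the corresponding Context method is called directly. Take FRHICommandList::DrawPrimitive as an example: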

class RHI_API FRHICommandList : public FRHIComputeCommandList
{
    void DrawPrimitive(uint32 BaseVertexIndex, uint32 NumPrimitives, uint32 NumInstances)
    {
        // Bypass is 1 by default, so this branch is taken.
        if (Bypass())
        {
            // Call the corresponding method on the graphics API context directly.
            GetContext().RHIDrawPrimitive(BaseVertexIndex, NumPrimitives, NumInstances);
            return;
        }
        
        // Otherwise allocate a separate FRHICommandDrawPrimitive command.
        ALLOC_COMMAND(FRHICommandDrawPrimitive)(BaseVertexIndex, NumPrimitives, NumInstances);
    }
};
           

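Therefore, with the default PC graphics API (D3D11), r.RHICmdBypass=1, and r.RHIThread.Enable=0, FRHICommandList calls the graphics API context's interfaces directly, which amounts to calling the graphics API synchronously. The graphics API then runs on the rendering thread (if one is enabled).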
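The difference between the two modes can be illustrated with a small stand-alone sketch. It is not engine code: `MiniContext`, `MiniCommandList`, `Flush`, and `DemoBypassModes` are hypothetical names, and `std::function` objects stand in for FRHICommand instances.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a platform context (FD3D11DynamicRHI in the text).
struct MiniContext
{
    std::size_t NumDrawCalls = 0;

    void RHIDrawPrimitive(unsigned /*BaseVertexIndex*/, unsigned /*NumPrimitives*/, unsigned /*NumInstances*/)
    {
        ++NumDrawCalls; // a real context would call the graphics API here
    }
};

// Hypothetical command list supporting both modes described above.
class MiniCommandList
{
public:
    MiniCommandList(MiniContext& InContext, bool bInBypass)
        : Context(InContext), bBypass(bInBypass)
    {
    }

    void DrawPrimitive(unsigned BaseVertexIndex, unsigned NumPrimitives, unsigned NumInstances)
    {
        if (bBypass)
        {
            // Bypass on: call straight into the context, nothing is recorded.
            Context.RHIDrawPrimitive(BaseVertexIndex, NumPrimitives, NumInstances);
            return;
        }
        // Bypass off: record the call so that it can be replayed later.
        Pending.push_back([BaseVertexIndex, NumPrimitives, NumInstances](MiniContext& Ctx)
        {
            Ctx.RHIDrawPrimitive(BaseVertexIndex, NumPrimitives, NumInstances);
        });
    }

    bool HasCommands() const { return !Pending.empty(); }

    // Replay and then discard everything recorded so far.
    void Flush()
    {
        for (const auto& Command : Pending)
        {
            Command(Context);
        }
        Pending.clear();
    }

private:
    MiniContext& Context;
    bool bBypass;
    std::vector<std::function<void(MiniContext&)>> Pending;
};

// Usage: returns true when both modes behave as described in the text.
inline bool DemoBypassModes()
{
    MiniContext Direct;
    MiniCommandList BypassList(Direct, /*bInBypass=*/true);
    BypassList.DrawPrimitive(0, 2, 1);
    assert(Direct.NumDrawCalls == 1 && !BypassList.HasCommands());

    MiniContext Deferred;
    MiniCommandList RecordedList(Deferred, /*bInBypass=*/false);
    RecordedList.DrawPrimitive(0, 2, 1);
    const bool bDeferredUntilFlush = (Deferred.NumDrawCalls == 0) && RecordedList.HasCommands();
    RecordedList.Flush();
    return bDeferredUntilFlush && Deferred.NumDrawCalls == 1 && !RecordedList.HasCommands();
}
```

With bypass on, the draw reaches the context immediately; with bypass off, nothing reaches the context until `Flush` replays the recorded calls.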

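Next, set r.RHICmdBypass to 0 while keeping r.RHIThread.Enable at 0. Context methods are no longer called directly; instead, individual FRHICommand objects are generated and later executed by the FRHICommandList machinery. Taking FRHICommandList::DrawPrimitive as the example again, the code path is as follows: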

class RHI_API FRHICommandList : public FRHIComputeCommandList
{
    void DrawPrimitive(uint32 BaseVertexIndex, uint32 NumPrimitives, uint32 NumInstances)
    {
        // With r.RHICmdBypass set to 0, Bypass() returns false and this branch is skipped.
        if (Bypass())
        {
            // Call the corresponding method on the graphics API context directly.
            GetContext().RHIDrawPrimitive(BaseVertexIndex, NumPrimitives, NumInstances);
            return;
        }
        
        // Allocate a separate FRHICommandDrawPrimitive command.
        // The ALLOC_COMMAND macro calls the AllocCommand interface.
        ALLOC_COMMAND(FRHICommandDrawPrimitive)(BaseVertexIndex, NumPrimitives, NumInstances);
    }
    
    template <typename TCmd>
    void* AllocCommand()
    {
        return AllocCommand(sizeof(TCmd), alignof(TCmd));
    }
    
    void* AllocCommand(int32 AllocSize, int32 Alignment)
    {
        FRHICommandBase* Result = (FRHICommandBase*) MemManager.Alloc(AllocSize, Alignment);
        ++NumCommands;
        // CommandLink points at the previous command node's Next pointer; link the new command there.
        *CommandLink = Result;
        // Advance CommandLink to the new node's Next pointer.
        CommandLink = &Result->Next;
        return Result;
    }
};
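The CommandLink bookkeeping in AllocCommand is easier to follow in isolation. The sketch below is a simplified stand-alone illustration under stated assumptions, not the engine implementation: `MiniCommandBase`, `MiniPushCommand`, `MiniCommandList`, and `RecordAndExecute` are hypothetical names, a fixed-size byte arena replaces UE's MemManager, and commands write into a `std::vector<int>` instead of calling a graphics context.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical counterpart of FRHICommandBase: an intrusive singly linked node.
struct MiniCommandBase
{
    MiniCommandBase* Next = nullptr;
    virtual void ExecuteAndDestruct(std::vector<int>& Output) = 0;

protected:
    ~MiniCommandBase() = default; // commands destroy themselves after executing
};

// Example command: appends one value to the output when executed.
struct MiniPushCommand final : MiniCommandBase
{
    explicit MiniPushCommand(int InValue) : Value(InValue) {}

    void ExecuteAndDestruct(std::vector<int>& Output) override
    {
        Output.push_back(Value);
        this->~MiniPushCommand(); // the memory itself stays owned by the arena
    }

    int Value;
};

class MiniCommandList
{
public:
    explicit MiniCommandList(std::size_t InArenaSize = 4096)
        : Arena(new unsigned char[InArenaSize]), ArenaSize(InArenaSize)
    {
    }
    MiniCommandList(const MiniCommandList&) = delete;
    MiniCommandList& operator=(const MiniCommandList&) = delete;

    // Construct a command in the arena and append it to the linked list.
    template <typename TCmd, typename... TArgs>
    void AllocCommand(TArgs&&... Args)
    {
        void* Memory = Alloc(sizeof(TCmd), alignof(TCmd));
        MiniCommandBase* Result = new (Memory) TCmd(std::forward<TArgs>(Args)...);
        ++NumCommands;
        *CommandLink = Result;       // patch Root or the previous node's Next
        CommandLink = &Result->Next; // the next command will be linked here
    }

    int GetNumCommands() const { return NumCommands; }

    // Walk the list in recording order, then reset it for reuse.
    void Execute(std::vector<int>& Output)
    {
        MiniCommandBase* Cmd = Root;
        while (Cmd != nullptr)
        {
            MiniCommandBase* NextCmd = Cmd->Next; // read before Cmd destroys itself
            Cmd->ExecuteAndDestruct(Output);
            Cmd = NextCmd;
        }
        Root = nullptr;
        CommandLink = &Root;
        NumCommands = 0;
        Used = 0;
    }

private:
    // Bump allocation with alignment; asserts when the arena is exhausted.
    void* Alloc(std::size_t Size, std::size_t Alignment)
    {
        void* Cursor = Arena.get() + Used;
        std::size_t Space = ArenaSize - Used;
        void* Aligned = std::align(Alignment, Size, Cursor, Space);
        assert(Aligned != nullptr && "arena exhausted");
        Used = (ArenaSize - Space) + Size;
        return Aligned;
    }

    std::unique_ptr<unsigned char[]> Arena;
    std::size_t ArenaSize;
    std::size_t Used = 0;
    int NumCommands = 0;
    MiniCommandBase* Root = nullptr;
    MiniCommandBase** CommandLink = &Root;
};

// Usage: record one command per value, then execute them all.
inline std::vector<int> RecordAndExecute(const std::vector<int>& Values)
{
    MiniCommandList List;
    for (int Value : Values)
    {
        List.AllocCommand<MiniPushCommand>(Value);
    }
    std::vector<int> Output;
    List.Execute(Output);
    return Output;
}
```

Keeping a pointer to the tail's `Next` field rather than to the tail node means appending needs no special case for an empty list, which is the point of the two CommandLink assignments in AllocCommand.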
           

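Command instances allocated with ALLOC_COMMAND enter the command linked list of FRHICommandListBase, but they are not executed at that point. They wait for a suitable moment, such as FRHICommandListImmediate::ImmediateFlush. The call stack when the FRHICommandList is executed is shown below:

[Image: call stack of FRHICommandList execution]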

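The call stack shows that command execution is now more involved, with a number of intermediate steps. Again taking FRHICommandList::DrawPrimitive as the example, the call flow is as follows: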

graph TD

A[FRHICommandListImmediate::ImmediateFlush] --> B[FRHICommandListExecutor::ExecuteList]

B --> C[FRHICommandListExecutor::ExecuteInner]

C --> D[FRHICommandListExecutor::ExecuteInner_DoExecute]

D --> E[FRHICommand::ExecuteAndDestruct]

E --> F[FRHICommandDrawPrimitive::Execute]

F --> G[INTERNAL_DECORATOR]

G --> H[FD3D11DynamicRHI::RHIDrawPrimitive]

The flow above uses the INTERNAL_DECORATOR macro. It and a related macro are defined as follows:

// Engine\Source\Runtime\RHI\Public\RHICommandListCommandExecutes.inl

#define INTERNAL_DECORATOR(Method) CmdList.GetContext().Method
#define INTERNAL_DECORATOR_COMPUTE(Method) CmdList.GetComputeContext().Method
           

In effect, the macros call the command list's Context interface.
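How such a macro is used inside a command's Execute method can be sketched as follows. This is an illustrative stand-alone example, not engine code: `MiniContext`, `MiniCmdList`, `MiniCommandDrawPrimitive`, and `MINI_INTERNAL_DECORATOR` are hypothetical names modelled on the definitions above.

```cpp
#include <cassert>

// Hypothetical stand-in for a platform context.
struct MiniContext
{
    unsigned NumPrimitivesDrawn = 0;

    void RHIDrawPrimitive(unsigned /*BaseVertexIndex*/, unsigned NumPrimitives, unsigned NumInstances)
    {
        NumPrimitivesDrawn += NumPrimitives * NumInstances;
    }
};

// Hypothetical stand-in for the command list that owns the context.
struct MiniCmdList
{
    MiniContext Context;
    MiniContext& GetContext() { return Context; }
};

// Same shape as INTERNAL_DECORATOR: it relies on a variable named CmdList
// being in scope wherever the macro is expanded.
#define MINI_INTERNAL_DECORATOR(Method) CmdList.GetContext().Method

// Hypothetical stand-in for FRHICommandDrawPrimitive.
struct MiniCommandDrawPrimitive
{
    unsigned BaseVertexIndex;
    unsigned NumPrimitives;
    unsigned NumInstances;

    void Execute(MiniCmdList& CmdList)
    {
        assert(NumInstances > 0);
        // Expands to CmdList.GetContext().RHIDrawPrimitive(...).
        MINI_INTERNAL_DECORATOR(RHIDrawPrimitive)(BaseVertexIndex, NumPrimitives, NumInstances);
    }
};
```

Because the macro hard-codes the identifier `CmdList`, every Execute method that uses it has to name its command list parameter exactly that way.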

With the RHI thread disabled (r.RHIThread.Enable==0), the calls above execute on the rendering thread:

[Image: the calls executing on the rendering thread]

Next, set r.RHIThread.Enable to 1 to turn on the RHI thread. The thread that runs the commands is now the RHI thread:

[Image: the commands executing on the RHI thread]

The call stack now starts from a task launched on the TaskGraph's RHI thread:

[Image: call stack starting from the TaskGraph RHI thread]

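The command execution flow now looks like this:

graph TD

A[FRHICommandListImmediate::ImmediateFlush] --> B[FRHICommandListExecutor::ExecuteList]

B --> C[FRHICommandListExecutor::ExecuteInner]

C --> C1(FExecuteRHIThreadTask::DoTask)

C1 --> D(FRHICommandListExecutor::ExecuteInner_DoExecute)

D --> E(FRHICommand::ExecuteAndDestruct)

E --> F(FRHICommandDrawPrimitive::Execute)

F --> G(INTERNAL_DECORATOR)

G --> H(FD3D11DynamicRHI::RHIDrawPrimitive)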

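In the flowchart above, square boxes run on the rendering thread and rounded boxes run on the RHI thread. Once the RHI thread is enabled, its stats appear as well:

[Image: thread stats. Left: stats without the RHI thread; right: stats with the RHI thread enabled.]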

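The flowchart below combines the cases with Bypass and the RHI thread switched on or off (using a D3D11 DrawPrimitive call as the example):

graph TD

a1[FRHICommandList::DrawPrimitive] --> a2{Bypass?}

a2 -->|No| a4[ALLOC_COMMAND_FRHICommandDrawPrimitive]

a2 -->|Yes| a3[FD3D11DynamicRHI::RHIDrawPrimitive]

a4 --> a5[FRHICommandListBase::AllocCommand]

a5 --> a6[......]

a6 --> A[FRHICommandListImmediate::ImmediateFlush]

A --> B[FRHICommandListExecutor::ExecuteList]

B --> C[FRHICommandListExecutor::ExecuteInner]

C --> C11{RHIThreadEnabled?}

C11 -->|No| D11[FRHICommandListExecutor::ExecuteInner_DoExecute]

D11 --> E11[FRHICommand::ExecuteAndDestruct]

E11 --> F11[FRHICommandDrawPrimitive::Execute]

F11 --> G11[INTERNAL_DECORATOR_RHIDrawPrimitive]

G11 --> H11[FD3D11DynamicRHI::RHIDrawPrimitive]

C11 -->|Yes| c0(.....)

c0 -->C1(FExecuteRHIThreadTask::DoTask)

C1 --> D(FRHICommandListExecutor::ExecuteInner_DoExecute)

D --> E(FRHICommand::ExecuteAndDestruct)

E --> F(FRHICommandDrawPrimitive::Execute)

F --> G(INTERNAL_DECORATOR_RHIDrawPrimitive)

G --> H(FD3D11DynamicRHI::RHIDrawPrimitive)

In the flowchart above, square boxes run on the rendering thread and rounded boxes run on the RHI thread.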

Section 10.3.3 (FDynamicRHI) mentioned flush types (FlushType). These are the values defined by EImmediateFlushType:

// Engine\Source\Runtime\RHI\Public\RHICommandList.h

namespace EImmediateFlushType
{
    enum Type
    { 
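        WaitForOutstandingTasksOnly = 0, // Only wait for the outstanding tasks to complete.
        DispatchToRHIThread,             // Dispatch to the RHI thread.
        WaitForDispatchToRHIThread,      // Dispatch to the RHI thread and wait for the dispatch.
        FlushRHIThread,                  // Flush the RHI thread.
        FlushRHIThreadFlushResources,    // Flush the RHI thread and resources.
        FlushRHIThreadFlushResourcesFlushDeferredDeletes // Flush the RHI thread, resources, and deferred deletes.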
    };
};
           

The differences between the EImmediateFlushType values show up in the implementation of FRHICommandListImmediate::ImmediateFlush:

// Engine\Source\Runtime\RHI\Public\RHICommandList.inl

void FRHICommandListImmediate::ImmediateFlush(EImmediateFlushType::Type FlushType)
{
    switch (FlushType)
    {
    // Wait for the outstanding tasks to complete.
    case EImmediateFlushType::WaitForOutstandingTasksOnly:
        {
            WaitForTasks();
        }
        break;
    // Dispatch to the RHI thread (execute the command list).
    case EImmediateFlushType::DispatchToRHIThread:
        {
            if (HasCommands())
            {
                GRHICommandList.ExecuteList(*this);
            }
        }
        break;
    // Dispatch to the RHI thread and wait for the dispatch to finish.
    case EImmediateFlushType::WaitForDispatchToRHIThread:
        {
            if (HasCommands())
            {
                GRHICommandList.ExecuteList(*this);
            }
            WaitForDispatch();
        }
        break;
    // Flush the RHI thread.
    case EImmediateFlushType::FlushRHIThread:
        {
            // Dispatch, then wait for the dispatch to finish.
            if (HasCommands())
            {
                GRHICommandList.ExecuteList(*this);
            }
            WaitForDispatch();
            
            // Wait for the RHI thread tasks.
            if (IsRunningRHIInSeparateThread())
            {
                WaitForRHIThreadTasks();
            }
            
            // Wait for and reset the list of outstanding tasks.
            WaitForTasks(true);
        }
        break;
    case EImmediateFlushType::FlushRHIThreadFlushResources:
    case EImmediateFlushType::FlushRHIThreadFlushResourcesFlushDeferredDeletes:
        {
            if (HasCommands())
            {
                GRHICommandList.ExecuteList(*this);
            }
            WaitForDispatch();
            WaitForRHIThreadTasks();
            WaitForTasks(true);
            
            // Flush resources held by the pipeline state cache.
            PipelineStateCache::FlushResources();
            // Flush resources that are pending deletion.
            FRHIResource::FlushPendingDeletes(FlushType == EImmediateFlushType::FlushRHIThreadFlushResourcesFlushDeferredDeletes);
        }
        break;
    }
}
           

The code above relies on several interfaces for processing and waiting on tasks. They are implemented as follows:

// Wait for outstanding tasks to complete.
void FRHICommandListBase::WaitForTasks(bool bKnownToBeComplete)
{
    if (WaitOutstandingTasks.Num())
    {
        // Check whether any awaited task is still incomplete.
        bool bAny = false;
        for (int32 Index = 0; Index < WaitOutstandingTasks.Num(); Index++)
        {
            if (!WaitOutstandingTasks[Index]->IsComplete())
            {
                bAny = true;
                break;
            }
        }
        // If so, wait on them through the TaskGraph interface.
        if (bAny)
        {
            ENamedThreads::Type RenderThread_Local = ENamedThreads::GetRenderThread_Local();
            FTaskGraphInterface::Get().WaitUntilTasksComplete(WaitOutstandingTasks, RenderThread_Local);
        }
        // Reset the list of awaited tasks.
        WaitOutstandingTasks.Reset();
    }
}

// Wait for the render thread's sublist dispatch to complete.
void FRHICommandListBase::WaitForDispatch()
{
    // If RenderThreadSublistDispatchTask has completed, clear it.
    if (RenderThreadSublistDispatchTask.GetReference() && RenderThreadSublistDispatchTask->IsComplete())
    {
        RenderThreadSublistDispatchTask = nullptr;
    }
    
    // RenderThreadSublistDispatchTask is still outstanding.
    while (RenderThreadSublistDispatchTask.GetReference())
    {
        ENamedThreads::Type RenderThread_Local = ENamedThreads::GetRenderThread_Local();
        FTaskGraphInterface::Get().WaitUntilTaskCompletes(RenderThreadSublistDispatchTask, RenderThread_Local);
        if (RenderThreadSublistDispatchTask.GetReference() && RenderThreadSublistDispatchTask->IsComplete())
        {
            RenderThreadSublistDispatchTask = nullptr;
        }
    }
}

// Wait for the RHI thread tasks to complete.
void FRHICommandListBase::WaitForRHIThreadTasks()
{
    bool bAsyncSubmit = CVarRHICmdAsyncRHIThreadDispatch.GetValueOnRenderThread() > 0;
    ENamedThreads::Type RenderThread_Local = ENamedThreads::GetRenderThread_Local();
    
    // Equivalent to calling FRHICommandListBase::WaitForDispatch().
    if (bAsyncSubmit)
    {
        if (RenderThreadSublistDispatchTask.GetReference() && RenderThreadSublistDispatchTask->IsComplete())
        {
            RenderThreadSublistDispatchTask = nullptr;
        }
        while (RenderThreadSublistDispatchTask.GetReference())
        {
            if (FTaskGraphInterface::Get().IsThreadProcessingTasks(RenderThread_Local))
            {
                while (!RenderThreadSublistDispatchTask->IsComplete())
                {
                    FPlatformProcess::SleepNoStats(0);
                }
            }
            else
            {
                FTaskGraphInterface::Get().WaitUntilTaskCompletes(RenderThreadSublistDispatchTask, RenderThread_Local);
            }
            
            if (RenderThreadSublistDispatchTask.GetReference() && RenderThreadSublistDispatchTask->IsComplete())
            {
                RenderThreadSublistDispatchTask = nullptr;
            }
        }
        // now we can safely look at RHIThreadTask
    }
    
    // If the RHI thread task has completed, clear it.
    if (RHIThreadTask.GetReference() && RHIThreadTask->IsComplete())
    {
        RHIThreadTask = nullptr;
        PrevRHIThreadTask = nullptr;
    }
    
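    // If an RHI thread task is still outstanding, wait for it.
    while (RHIThreadTask.GetReference())
    {
        // If the render thread's local queue is already processing tasks, spin with sleep(0) to yield the time slice.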
        if (FTaskGraphInterface::Get().IsThreadProcessingTasks(RenderThread_Local))
        {
            while (!RHIThreadTask->IsComplete())
            {
                FPlatformProcess::SleepNoStats(0);
            }
        }
        // Otherwise, wait for the task through the TaskGraph.
        else
        {
            FTaskGraphInterface::Get().WaitUntilTaskCompletes(RHIThreadTask, RenderThread_Local);
        }
        
        // If the RHI thread task has completed, clear it.
        if (RHIThreadTask.GetReference() && RHIThreadTask->IsComplete())
        {
            RHIThreadTask = nullptr;
            PrevRHIThreadTask = nullptr;
        }
    }
}
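The flush levels form a ladder in which each level performs the steps of the previous one plus something extra. The following is a minimal standalone sketch of that ladder, derived from the switch statement quoted above. `EFlushLevel` and `DescribeFlush` are hypothetical names invented for illustration and are not engine API; `bRHIThread` stands in for IsRunningRHIInSeparateThread().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of EImmediateFlushType, for illustration only.
enum class EFlushLevel
{
    WaitForOutstandingTasksOnly,
    DispatchToRHIThread,
    WaitForDispatchToRHIThread,
    FlushRHIThread,
    FlushRHIThreadFlushResources,
    FlushRHIThreadFlushResourcesFlushDeferredDeletes
};

// Returns the ordered steps a flush level performs, following the quoted switch.
// "ExecuteList" only happens in the engine when the list has commands.
std::vector<std::string> DescribeFlush(EFlushLevel Level, bool bRHIThread)
{
    std::vector<std::string> Steps;
    if (Level == EFlushLevel::WaitForOutstandingTasksOnly)
    {
        Steps.push_back("WaitForTasks()");
        return Steps;
    }

    Steps.push_back("ExecuteList");
    if (Level == EFlushLevel::DispatchToRHIThread) { return Steps; }

    Steps.push_back("WaitForDispatch");
    if (Level == EFlushLevel::WaitForDispatchToRHIThread) { return Steps; }

    // FlushRHIThread checks for a separate RHI thread first;
    // the two resource-flushing levels call the wait unconditionally.
    const bool bFlushOnly = (Level == EFlushLevel::FlushRHIThread);
    if (!bFlushOnly || bRHIThread)
    {
        Steps.push_back("WaitForRHIThreadTasks");
    }
    Steps.push_back("WaitForTasks(true)");
    if (bFlushOnly) { return Steps; }

    Steps.push_back("FlushResources");
    const bool bDeletes = (Level == EFlushLevel::FlushRHIThreadFlushResourcesFlushDeferredDeletes);
    Steps.push_back(bDeletes ? "FlushPendingDeletes(true)" : "FlushPendingDeletes(false)");
    return Steps;
}
```

Calling `DescribeFlush(EFlushLevel::FlushRHIThread, true)` yields four steps, while the same level without an RHI thread skips the RHI thread wait.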
           

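As mentioned at the start of this article, when the RHI thread is enabled it translates the intermediate RHI commands pushed by the render thread into GPU commands for the target graphics platform. If the render thread generated those intermediate commands in parallel, the RHI thread translates them in parallel as well.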


Before covering parallel rendering and translation proper, a few basic concepts and types need to be introduced.

FParallelCommandListSet is defined as follows:

// Engine\Source\Runtime\Renderer\Private\SceneRendering.h

class FParallelCommandListSet
{
public:
    // The view this set belongs to.
    const FViewInfo& View;
    // The parent command list.
    FRHICommandListImmediate& ParentCmdList;
    // Snapshot of the scene render targets.
    FSceneRenderTargets* Snapshot;
    
    TStatId    ExecuteStat;
    int32 Width;
    int32 NumAlloc;
    int32 MinDrawsPerCommandList;
    // Whether to balance the command lists, see r.RHICmdBalanceParallelLists
    bool bBalanceCommands;
    // see r.RHICmdSpewParallelListBalance
    bool bSpewBalance;
    
    // The command lists.
    TArray<FRHICommandList*,SceneRenderingAllocator> CommandLists;
    // Synchronization events.
    TArray<FGraphEventRef,SceneRenderingAllocator> Events;
    // Number of draws in each command list, or -1 if unknown. Overestimating is better than nothing.
    TArray<int32,SceneRenderingAllocator> NumDrawsIfKnown;
    
    FParallelCommandListSet(TStatId InExecuteStat, const FViewInfo& InView, FRHICommandListImmediate& InParentCmdList, bool bInCreateSceneContext);
    virtual ~FParallelCommandListSet();

    // Get the number of parallel command lists.
    int32 NumParallelCommandLists() const;
    // Create a new parallel command list.
    FRHICommandList* NewParallelCommandList();
    // Get the prerequisite tasks.
    FORCEINLINE FGraphEventArray* GetPrereqs();
    // Add a parallel command list.
    void AddParallelCommandList(FRHICommandList* CmdList, FGraphEventRef& CompletionEvent, int32 InNumDrawsIfKnown = -1);    
    virtual void SetStateOnCommandList(FRHICommandList& CmdList) {}
    // Wait for tasks to complete.
    static void WaitForTasks();
    
protected:
    // Dispatch; must be called by subclasses.
    void Dispatch(bool bHighPriority = false);
    // Allocate a new command list.
    FRHICommandList* AllocCommandList();
    // Whether to create a scene context.
    bool bCreateSceneContext;
    
private:
    void WaitForTasksInternal();
};
           

The key FParallelCommandListSet interfaces are implemented as follows:

// Engine\Source\Runtime\Renderer\Private\SceneRendering.cpp

FRHICommandList* FParallelCommandListSet::AllocCommandList()
{
    NumAlloc++;
    return new FRHICommandList(ParentCmdList.GetGPUMask());
}

void FParallelCommandListSet::Dispatch(bool bHighPriority)
{
    ENamedThreads::Type RenderThread_Local = ENamedThreads::GetRenderThread_Local();
    if (bSpewBalance)
    {
        // Wait for the preceding tasks to complete.
        for (auto& Event : Events)
        {
            FTaskGraphInterface::Get().WaitUntilTaskCompletes(Event, RenderThread_Local);
        }
    }
    
    // Whether to translate in parallel.
    bool bActuallyDoParallelTranslate = GRHISupportsParallelRHIExecute && CommandLists.Num() >= CVarRHICmdMinCmdlistForParallelSubmit.GetValueOnRenderThread();
    if (bActuallyDoParallelTranslate)
    {
        int32 Total = 0;
        bool bIndeterminate = false;
        for (int32 Count : NumDrawsIfKnown)
        {
            // The draw count is unknown, so assume parallel translation is worthwhile.
            if (Count < 0)
            {
                bIndeterminate = true;
                break; 
            }
            Total += Count;
        }
        
        // Too few draws in total: do not translate in parallel.
        if (!bIndeterminate && Total < MinDrawsPerCommandList)
        {
            bActuallyDoParallelTranslate = false;
        }
    }

    if (bActuallyDoParallelTranslate)
    {
        // Parallel RHI execution must be supported.
        check(GRHISupportsParallelRHIExecute);
        NumAlloc -= CommandLists.Num();
        
        // Queue the parallel async command list submit on the parent command list.
        ParentCmdList.QueueParallelAsyncCommandListSubmit(&Events[0], bHighPriority, &CommandLists[0], &NumDrawsIfKnown[0], CommandLists.Num(), (MinDrawsPerCommandList * 4) / 3, bSpewBalance);
        SetStateOnCommandList(ParentCmdList);
        // End the render pass.
        ParentCmdList.EndRenderPass();
    }
    else // Non-parallel mode.
    {
        for (int32 Index = 0; Index < CommandLists.Num(); Index++)
        {
            ParentCmdList.QueueAsyncCommandListSubmit(Events[Index], CommandLists[Index]);
            NumAlloc--;
        }
    }
    
    // Reset the data.
    CommandLists.Reset();
    Snapshot = nullptr;
    Events.Reset();
    
    // Process the render thread's local queue until it is idle.
    FTaskGraphInterface::Get().ProcessThreadUntilIdle(RenderThread_Local);
}

FParallelCommandListSet::~FParallelCommandListSet()
{
    GOutstandingParallelCommandListSet = nullptr;
}

FRHICommandList* FParallelCommandListSet::NewParallelCommandList()
{
    // Create a command list.
    FRHICommandList* Result = AllocCommandList();
    Result->ExecuteStat = ExecuteStat;
    SetStateOnCommandList(*Result);
    if (bCreateSceneContext)
    {
        FSceneRenderTargets& SceneContext = FSceneRenderTargets::Get(ParentCmdList);
        // Create the scene render target snapshot.
        if (!Snapshot)
        {
            Snapshot = SceneContext.CreateSnapshot(View);
        }
        // Set the render target snapshot on the command list.
        Snapshot->SetSnapshotOnCmdList(*Result);
    }
    return Result;
}

// Add a parallel command list.
void FParallelCommandListSet::AddParallelCommandList(FRHICommandList* CmdList, FGraphEventRef& CompletionEvent, int32 InNumDrawsIfKnown)
{
    // Add the command list.
    CommandLists.Add(CmdList);
    // Add the completion event to wait on.
    Events.Add(CompletionEvent);
    // Add the draw count.
    NumDrawsIfKnown.Add(InNumDrawsIfKnown);
}

void FParallelCommandListSet::WaitForTasks()
{
    if (GOutstandingParallelCommandListSet)
    {
        GOutstandingParallelCommandListSet->WaitForTasksInternal();
    }
}

void FParallelCommandListSet::WaitForTasksInternal()
{
    // Collect the events that are still pending.
    FGraphEventArray WaitOutstandingTasks;
    for (int32 Index = 0; Index < Events.Num(); Index++)
    {
        if (!Events[Index]->IsComplete())
        {
            WaitOutstandingTasks.Add(Events[Index]);
        }
    }
    
    // If any tasks are still in flight, wait for them to complete.
    if (WaitOutstandingTasks.Num())
    {
        ENamedThreads::Type RenderThread_Local = ENamedThreads::GetRenderThread_Local();
        FTaskGraphInterface::Get().WaitUntilTasksComplete(WaitOutstandingTasks, RenderThread_Local);
    }
}
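The decision at the top of Dispatch can be isolated into a small pure function. The sketch below restates that logic under simplifying assumptions: `ShouldParallelTranslate` and its parameter names are hypothetical, the two thresholds stand in for CVarRHICmdMinCmdlistForParallelSubmit and MinDrawsPerCommandList, and the vector plays the role of NumDrawsIfKnown (one entry per command list).

```cpp
#include <cassert>
#include <vector>

// Hypothetical restatement of the bActuallyDoParallelTranslate decision.
// NumDrawsIfKnown holds one draw count per command list, or -1 when unknown.
bool ShouldParallelTranslate(bool bSupportsParallelRHIExecute,
                             int MinCmdListsForParallelSubmit,
                             int MinDrawsPerCommandList,
                             const std::vector<int>& NumDrawsIfKnown)
{
    const int NumLists = static_cast<int>(NumDrawsIfKnown.size());
    if (!bSupportsParallelRHIExecute || NumLists < MinCmdListsForParallelSubmit)
    {
        return false;
    }

    int Total = 0;
    for (int Count : NumDrawsIfKnown)
    {
        if (Count < 0)
        {
            // An unknown count means we cannot prove the work is small,
            // so parallel translation is assumed to be worthwhile.
            return true;
        }
        Total += Count;
    }

    // All counts are known: only go parallel when there are enough draws.
    return Total >= MinDrawsPerCommandList;
}
```

Note the asymmetry: a single unknown count keeps the parallel path enabled, whereas fully known counts must add up to at least the per-list minimum.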
           

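FParallelCommandListSet has the following subclasses, which implement the parallel rendering logic for different passes or situations:

  • FAnisotropyPassParallelCommandListSet: the parallel command list set for the anisotropy pass.
  • FPrePassParallelCommandListSet: the parallel command list set for the depth prepass.
  • FShadowParallelCommandListSet: the parallel command list set for shadow rendering.
  • FRDGParallelCommandListSet: the parallel command list set for the RDG system.

FPrePassParallelCommandListSet and FShadowParallelCommandListSet are examined below: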

// Engine\Source\Runtime\Renderer\Private\DepthRendering.cpp

class FPrePassParallelCommandListSet : public FParallelCommandListSet
{
public:
    FPrePassParallelCommandListSet(FRHICommandListImmediate& InParentCmdList, const FSceneRenderer& InSceneRenderer, const FViewInfo& InView, bool bInCreateSceneContext)
        : FParallelCommandListSet(GET_STATID(STAT_CLP_Prepass), InView, InParentCmdList, bInCreateSceneContext)
        , SceneRenderer(InSceneRenderer)
    {
    }

    virtual ~FPrePassParallelCommandListSet()
    {
        // Dispatch the command lists in the destructor.
        Dispatch(true);
    }

    // Set state on the command list.
    virtual void SetStateOnCommandList(FRHICommandList& CmdList) override
    {
        FParallelCommandListSet::SetStateOnCommandList(CmdList);
        FSceneRenderTargets::Get(CmdList).BeginRenderingPrePass(CmdList, false);
        SetupPrePassView(CmdList, View, &SceneRenderer);
    }

private:
    const FSceneRenderer& SceneRenderer;
};

class FShadowParallelCommandListSet : public FParallelCommandListSet
{
public:
    FShadowParallelCommandListSet(
        FRHICommandListImmediate& InParentCmdList,
        const FViewInfo& InView,
        bool bInCreateSceneContext,
        FProjectedShadowInfo& InProjectedShadowInfo,
        FBeginShadowRenderPassFunction InBeginShadowRenderPass)
        : FParallelCommandListSet(GET_STATID(STAT_CLP_Shadow), InView, InParentCmdList, bInCreateSceneContext)
        , ProjectedShadowInfo(InProjectedShadowInfo)
        , BeginShadowRenderPass(InBeginShadowRenderPass)
    {
        bBalanceCommands = false;
    }

    virtual ~FShadowParallelCommandListSet()
    {
        // Dispatch the command lists in the destructor.
        Dispatch();
    }

    virtual void SetStateOnCommandList(FRHICommandList& CmdList) override
    {
        FParallelCommandListSet::SetStateOnCommandList(CmdList);
        BeginShadowRenderPass(CmdList, false);
        ProjectedShadowInfo.SetStateForView(CmdList);
    }

private:
    // Projected shadow info.
    FProjectedShadowInfo& ProjectedShadowInfo;
    // Function that begins the shadow render pass.
    FBeginShadowRenderPassFunction BeginShadowRenderPass;
    // Shadow depth render mode.
    EShadowDepthRenderMode RenderMode;
};
           

Using these classes is fairly simple. Taking the PrePass as an example:

// Engine\Source\Runtime\Renderer\Private\DepthRendering.cpp

bool FDeferredShadingSceneRenderer::RenderPrePassViewParallel(const FViewInfo& View, FRHICommandListImmediate& ParentCmdList, TFunctionRef<void()> AfterTasksAreStarted, bool bDoPrePre)
{
    bool bDepthWasCleared = false;

    {
        // Construct the FPrePassParallelCommandListSet instance.
        FPrePassParallelCommandListSet ParallelCommandListSet(ParentCmdList, *this, View,
            CVarRHICmdFlushRenderThreadTasksPrePass.GetValueOnRenderThread() == 0 && CVarRHICmdFlushRenderThreadTasks.GetValueOnRenderThread() == 0);

        // Call FParallelMeshDrawCommandPass::DispatchDraw.
        View.ParallelMeshDrawCommandPasses[EMeshPass::DepthPass].DispatchDraw(&ParallelCommandListSet, ParentCmdList);

        if (bDoPrePre)
        {
            bDepthWasCleared = PreRenderPrePass(ParentCmdList);
        }
    }

    if (bDoPrePre)
    {
        AfterTasksAreStarted();
    }

    return bDepthWasCleared;
}

// Engine\Source\Runtime\Renderer\Private\MeshDrawCommands.cpp

void FParallelMeshDrawCommandPass::DispatchDraw(FParallelCommandListSet* ParallelCommandListSet, FRHICommandList& RHICmdList) const
{
    if (MaxNumDraws <= 0)
    {
        return;
    }

    FRHIVertexBuffer* PrimitiveIdsBuffer = PrimitiveIdVertexBufferPoolEntry.BufferRHI;
    const int32 BasePrimitiveIdsOffset = 0;

    // Parallel mode.
    if (ParallelCommandListSet)
    {
        if (TaskContext.bUseGPUScene)
        {
            // Queue a command so that the RHI thread uploads the PrimitiveIdVertexBuffer data once FMeshDrawCommandPassSetupTask has completed.
            FRHICommandListImmediate &RHICommandList = GetImmediateCommandList_ForRenderCommand();

            if (TaskEventRef.IsValid())
            {
                RHICommandList.AddDispatchPrerequisite(TaskEventRef);
            }

            RHICommandList.EnqueueLambda([
                VertexBuffer = PrimitiveIdsBuffer,
                VertexBufferData = TaskContext.PrimitiveIdBufferData, 
                VertexBufferDataSize = TaskContext.PrimitiveIdBufferDataSize,
                PrimitiveIdVertexBufferPoolEntry = PrimitiveIdVertexBufferPoolEntry](FRHICommandListImmediate& CmdList)
            {
                // Upload vertex buffer data.
                void* RESTRICT Data = (void* RESTRICT)CmdList.LockVertexBuffer(VertexBuffer, 0, VertexBufferDataSize, RLM_WriteOnly);
                FMemory::Memcpy(Data, VertexBufferData, VertexBufferDataSize);
                CmdList.UnlockVertexBuffer(VertexBuffer);

                FMemory::Free(VertexBufferData);
            });

            RHICommandList.RHIThreadFence(true);

            bPrimitiveIdBufferDataOwnedByRHIThread = true;
        }

        const ENamedThreads::Type RenderThread = ENamedThreads::GetRenderThread();

        // Gather the prerequisite tasks.
        FGraphEventArray Prereqs;
        if (ParallelCommandListSet->GetPrereqs())
        {
            Prereqs.Append(*ParallelCommandListSet->GetPrereqs());
        }
        if (TaskEventRef.IsValid())
        {
            Prereqs.Add(TaskEventRef);
        }

        // Distribute the work evenly across the available task graph workers based on NumEstimatedDraws.
        // Each task then adjusts its working range based on the FVisibleMeshDrawCommandProcessTask results.
        const int32 NumThreads = FMath::Min<int32>(FTaskGraphInterface::Get().GetNumWorkerThreads(), ParallelCommandListSet->Width);
        const int32 NumTasks = FMath::Min<int32>(NumThreads, FMath::DivideAndRoundUp(MaxNumDraws, ParallelCommandListSet->MinDrawsPerCommandList));
        const int32 NumDrawsPerTask = FMath::DivideAndRoundUp(MaxNumDraws, NumTasks);

        // Create NumTasks FRHICommandLists and add them to ParallelCommandListSet.
        for (int32 TaskIndex = 0; TaskIndex < NumTasks; TaskIndex++)
        {
            const int32 StartIndex = TaskIndex * NumDrawsPerTask;
            const int32 NumDraws = FMath::Min(NumDrawsPerTask, MaxNumDraws - StartIndex);
            checkSlow(NumDraws > 0);

            // Create the command list.
            FRHICommandList* CmdList = ParallelCommandListSet->NewParallelCommandList();

            // Create an FDrawVisibleMeshCommandsAnyThreadTask task and get its completion event.
            FGraphEventRef AnyThreadCompletionEvent = TGraphTask<FDrawVisibleMeshCommandsAnyThreadTask>::CreateTask(&Prereqs, RenderThread)
                .ConstructAndDispatchWhenReady(*CmdList, TaskContext.MeshDrawCommands, TaskContext.MinimalPipelineStatePassSet, PrimitiveIdsBuffer, BasePrimitiveIdsOffset, TaskContext.bDynamicInstancing, TaskContext.InstanceFactor, TaskIndex, NumTasks);
            // Add the command list, event, and draw count to ParallelCommandListSet.
            ParallelCommandListSet->AddParallelCommandList(CmdList, AnyThreadCompletionEvent, NumDraws);
        }
    }
    else // Non-parallel mode.
    {
        (......)
    }
}
           

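From the above, once FParallelMeshDrawCommandPass::DispatchDraw is called it creates several FRHICommandList instances, FDrawVisibleMeshCommandsAnyThreadTask tasks, and task synchronization events, and adds all of them to the lists held by ParallelCommandListSet. The command lists are then actually dispatched when ParallelCommandListSet is destructed.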
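The way DispatchDraw splits draws across tasks is plain integer arithmetic and can be tried outside the engine. The sketch below reproduces the NumThreads / NumTasks / NumDrawsPerTask computation with standard C++. `FDrawRange` and `PartitionDraws` are hypothetical names, and unlike the engine (which asserts with checkSlow) this sketch simply stops when a range would be empty.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper type: the slice of draws handled by one task.
struct FDrawRange
{
    int StartIndex;
    int NumDraws;
};

// Same rounding as FMath::DivideAndRoundUp for positive inputs.
int DivideAndRoundUp(int Dividend, int Divisor)
{
    return (Dividend + Divisor - 1) / Divisor;
}

std::vector<FDrawRange> PartitionDraws(int MaxNumDraws, int NumWorkerThreads,
                                       int Width, int MinDrawsPerCommandList)
{
    std::vector<FDrawRange> Ranges;
    if (MaxNumDraws <= 0 || NumWorkerThreads <= 0 || Width <= 0 || MinDrawsPerCommandList <= 0)
    {
        return Ranges; // Nothing to do, or invalid input for this sketch.
    }

    // Never use more tasks than workers, than the set's width,
    // or than the draw count justifies.
    const int NumThreads = std::min(NumWorkerThreads, Width);
    const int NumTasks = std::min(NumThreads, DivideAndRoundUp(MaxNumDraws, MinDrawsPerCommandList));
    const int NumDrawsPerTask = DivideAndRoundUp(MaxNumDraws, NumTasks);

    for (int TaskIndex = 0; TaskIndex < NumTasks; ++TaskIndex)
    {
        const int StartIndex = TaskIndex * NumDrawsPerTask;
        const int NumDraws = std::min(NumDrawsPerTask, MaxNumDraws - StartIndex);
        if (NumDraws <= 0)
        {
            break; // Sketch-only guard; the engine uses checkSlow(NumDraws > 0).
        }
        Ranges.push_back({StartIndex, NumDraws});
    }
    return Ranges;
}
```

For example, 10 draws with 8 workers, a width of 8, and a minimum of 4 draws per command list produce three ranges of 4, 4, and 2 draws.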

After FParallelCommandListSet::Dispatch is called as described in the previous subsection, execution enters FRHICommandListBase::QueueParallelAsyncCommandListSubmit:

void FRHICommandListBase::QueueParallelAsyncCommandListSubmit(FGraphEventRef* AnyThreadCompletionEvents, bool bIsPrepass, FRHICommandList** CmdLists, int32* NumDrawsIfKnown, int32 Num, int32 MinDrawsPerTranslate, bool bSpewMerge)
{
    if (IsRunningRHIInSeparateThread())
    {
        // Execute everything queued on the immediate command list before submitting the sublists that were built in parallel.
        FRHICommandListImmediate& ImmediateCommandList = FRHICommandListExecutor::GetImmediateCommandList();
        ImmediateCommandList.ImmediateFlush(EImmediateFlushType::DispatchToRHIThread);
        
        // Clear the fence if it has completed.
        if (RHIThreadBufferLockFence.GetReference() && RHIThreadBufferLockFence->IsComplete())
        {
            RHIThreadBufferLockFence = nullptr;
        }
    }
    
#if !UE_BUILD_SHIPPING
    // Flush beforehand so we can tell whether this parallel set broke something or whether it was something that came before.
    if (CVarRHICmdFlushOnQueueParallelSubmit.GetValueOnRenderThread())
    {
        CSV_SCOPED_TIMING_STAT(RHITFlushes, QueueParallelAsyncCommandListSubmit);
        FRHICommandListExecutor::GetImmediateCommandList().ImmediateFlush(EImmediateFlushType::FlushRHIThread);
    }
#endif

    // Only proceed if there is work and the RHI thread is running.
    if (Num && IsRunningRHIInSeparateThread())
    {
        static const auto ICVarRHICmdBalanceParallelLists = IConsoleManager::Get().FindTConsoleVariableDataInt(TEXT("r.RHICmdBalanceParallelLists"));

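        // Requires r.RHICmdBalanceParallelLists == 0, CVarRHICmdBalanceTranslatesAfterTasks > 0, GRHISupportsParallelRHIExecute == true, and deferred contexts enabled.
        // Unbalanced command list submission mode.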
        if (ICVarRHICmdBalanceParallelLists->GetValueOnRenderThread() == 0 && CVarRHICmdBalanceTranslatesAfterTasks.GetValueOnRenderThread() > 0 && GRHISupportsParallelRHIExecute && CVarRHICmdUseDeferredContexts.GetValueOnAnyThread() > 0)
        {
            // Gather the prerequisite tasks.
            FGraphEventArray Prereq;
            FRHICommandListBase** RHICmdLists = (FRHICommandListBase**)Alloc(sizeof(FRHICommandListBase*) * Num, alignof(FRHICommandListBase*));
            for (int32 Index = 0; Index < Num; Index++)
            {
                FGraphEventRef& AnyThreadCompletionEvent = AnyThreadCompletionEvents[Index];
                FRHICommandList* CmdList = CmdLists[Index];
                RHICmdLists[Index] = CmdList;
                if (AnyThreadCompletionEvent.GetReference())
                {
                    Prereq.Add(AnyThreadCompletionEvent);
                    WaitOutstandingTasks.Add(AnyThreadCompletionEvent);
                }
            }
            
            // Make sure all old buffer locks have completed before any parallel translate starts.
            if (RHIThreadBufferLockFence.GetReference())
            {
                Prereq.Add(RHIThreadBufferLockFence);
            }
            
            // Create an FRHICommandList.
            FRHICommandList* CmdList = new FRHICommandList(GetGPUMask());
            // Copy the render thread contexts.
            CmdList->CopyRenderThreadContexts(*this);
            // Create the translate setup task (FParallelTranslateSetupCommandList).
            FGraphEventRef TranslateSetupCompletionEvent = TGraphTask<FParallelTranslateSetupCommandList>::CreateTask(&Prereq, ENamedThreads::GetRenderThread()).ConstructAndDispatchWhenReady(CmdList, &RHICmdLists[0], Num, bIsPrepass);
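            // Queue the command list for submission.
            QueueCommandListSubmit(CmdList);
            // Add the translate setup event to the list.
            AllOutstandingTasks.Add(TranslateSetupCompletionEvent);
            // Keep commands queued after the async command list from being bundled with it.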
            if (IsRunningRHIInSeparateThread())
            {
                FRHICommandListExecutor::GetImmediateCommandList().ImmediateFlush(EImmediateFlushType::DispatchToRHIThread);
            }
            // Flush the commands to the RHI thread.
#if !UE_BUILD_SHIPPING
            if (CVarRHICmdFlushOnQueueParallelSubmit.GetValueOnRenderThread())
            {
                FRHICommandListExecutor::GetImmediateCommandList().ImmediateFlush(EImmediateFlushType::FlushRHIThread);
            }
#endif
            return;
        }
        
        // Balanced command list submission mode.
        IRHICommandContextContainer* ContextContainer = nullptr;
        bool bMerge = !!CVarRHICmdMergeSmallDeferredContexts.GetValueOnRenderThread();
        int32 EffectiveThreads = 0;
        int32 Start = 0;
        int32 ThreadIndex = 0;
        if (GRHISupportsParallelRHIExecute && CVarRHICmdUseDeferredContexts.GetValueOnAnyThread() > 0)
        {
            // The merge logic runs twice because the number of jobs must be known up front (this could be improved).
            while (Start < Num)
            {
                int32 Last = Start;
                int32 DrawCnt = NumDrawsIfKnown[Start];

                if (bMerge && DrawCnt >= 0)
                {
                    while (Last < Num - 1 && NumDrawsIfKnown[Last + 1] >= 0 && DrawCnt + NumDrawsIfKnown[Last + 1] <= MinDrawsPerTranslate)
                    {
                        Last++;
                        DrawCnt += NumDrawsIfKnown[Last];
                    }
                }
                check(Last >= Start);
                Start = Last + 1;
                EffectiveThreads++;
            }

            Start = 0;
            ContextContainer = RHIGetCommandContextContainer(ThreadIndex, EffectiveThreads, GetGPUMask());
        }
        
        if (ContextContainer)
        {
            // The merge pass again.
            while (Start < Num)
            {
                int32 Last = Start;
                int32 DrawCnt = NumDrawsIfKnown[Start];
                int32 TotalMem = bSpewMerge ? CmdLists[Start]->GetUsedMemory() : 0; 

                if (bMerge && DrawCnt >= 0)
                {
                    while (Last < Num - 1 && NumDrawsIfKnown[Last + 1] >= 0 && DrawCnt + NumDrawsIfKnown[Last + 1] <= MinDrawsPerTranslate)
                    {
                        Last++;
                        DrawCnt += NumDrawsIfKnown[Last];
                        TotalMem += bSpewMerge ? CmdLists[Start]->GetUsedMemory() : 0;
                    }
                }

            // The remaining logic is similar to the unbalanced mode and is omitted.
            
            (......)
                
            return;
        }
    }
    
    // Non-parallel mode.
    (......)
}
           

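From the above, parallel command list submission requires the following conditions:

  • The RHI thread is enabled, i.e. IsRunningRHIInSeparateThread() returns true.
  • The current graphics API supports parallel execution, i.e. GRHISupportsParallelRHIExecute is true.
  • Deferred contexts are enabled, i.e. CVarRHICmdUseDeferredContexts is greater than 0.

Whichever graphics API is used, a main command list (the ParentCommandList) must be specified so that its QueueParallelAsyncCommandListSubmit can be called to submit the task that sets up the command lists. The task dispatched above is FParallelTranslateSetupCommandList, which the next subsection covers.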
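The balanced path shown earlier runs its merge loop once just to count how many translate jobs it will need. That counting pass can be sketched in isolation as follows. `CountEffectiveThreads` is a hypothetical name; the loop body follows the quoted code, with a std::vector standing in for the NumDrawsIfKnown array and `bMerge` standing in for CVarRHICmdMergeSmallDeferredContexts.

```cpp
#include <cassert>
#include <vector>

// Hypothetical standalone version of the first merge pass: greedily merge
// adjacent command lists while the summed draw count stays within
// MinDrawsPerTranslate, and return how many merged groups result.
int CountEffectiveThreads(const std::vector<int>& NumDrawsIfKnown,
                          int MinDrawsPerTranslate, bool bMerge)
{
    const int Num = static_cast<int>(NumDrawsIfKnown.size());
    int EffectiveThreads = 0;
    int Start = 0;
    while (Start < Num)
    {
        int Last = Start;
        int DrawCnt = NumDrawsIfKnown[Start];

        // Lists with an unknown draw count (-1) are never merged.
        if (bMerge && DrawCnt >= 0)
        {
            while (Last < Num - 1 && NumDrawsIfKnown[Last + 1] >= 0 &&
                   DrawCnt + NumDrawsIfKnown[Last + 1] <= MinDrawsPerTranslate)
            {
                ++Last;
                DrawCnt += NumDrawsIfKnown[Last];
            }
        }
        assert(Last >= Start);
        Start = Last + 1;
        ++EffectiveThreads;
    }
    return EffectiveThreads;
}
```

With counts {10, 10, 10, 50, -1, 5} and a threshold of 32, the first three lists merge into one group and the remaining three stay separate, giving four groups; with merging disabled every list is its own group.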

FParallelTranslateSetupCommandList sets up the tasks that submit the sub command lists in parallel (or serially). It is defined as follows:

class FParallelTranslateSetupCommandList
{
    // Parent command list used to submit the sub command lists.
    FRHICommandList* RHICmdList;
    // The sub command lists to submit.
    FRHICommandListBase** RHICmdLists;
    
    int32 NumCommandLists;
    bool bIsPrepass;
    int32 MinSize;
    int32 MinCount;
    
public:
    FParallelTranslateSetupCommandList(FRHICommandList* InRHICmdList, FRHICommandListBase** InRHICmdLists, int32 InNumCommandLists, bool bInIsPrepass)
        : RHICmdList(InRHICmdList)
        , RHICmdLists(InRHICmdLists)
        , NumCommandLists(InNumCommandLists)
        , bIsPrepass(bInIsPrepass)
    {
        // Minimum size of a single sub command list (the cvar value scaled by 1024).
        MinSize = CVarRHICmdMinCmdlistSizeForParallelTranslate.GetValueOnRenderThread() * 1024;
        MinCount = CVarRHICmdMinCmdlistForParallelTranslate.GetValueOnRenderThread();
    }

    static FORCEINLINE TStatId GetStatId();
    // The desired thread.
    static FORCEINLINE ENamedThreads::Type GetDesiredThread()
    {
        return CPrio_FParallelTranslateSetupCommandList.Get();
    }
    static FORCEINLINE ESubsequentsMode::Type GetSubsequentsMode() { return ESubsequentsMode::TrackSubsequents; }

    // Run the setup task.
    void DoTask(ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionGraphEvent)
    {
        TArray<int32, TInlineAllocator<64> > Sizes;
        Sizes.Reserve(NumCommandLists);
        for (int32 Index = 0; Index < NumCommandLists; Index++)
        {
            Sizes.Add(RHICmdLists[Index]->GetUsedMemory());
        }

        int32 EffectiveThreads = 0;
        int32 Start = 0;
        // Merge sub command lists by size and count how many threads are needed.
        while (Start < NumCommandLists)
        {
            int32 Last = Start;
            int32 DrawCnt = Sizes[Start];

            while (Last < NumCommandLists - 1 && DrawCnt + Sizes[Last + 1] <= MinSize)
            {
                Last++;
                DrawCnt += Sizes[Last];
            }
            check(Last >= Start);
            Start = Last + 1;
            EffectiveThreads++;
        } 

        // If too few threads are needed, submit the sub command lists serially.
        if (EffectiveThreads < MinCount)
        {
            FGraphEventRef Nothing;
            for (int32 Index = 0; Index < NumCommandLists; Index++)
            {
                FRHICommandListBase* CmdList = RHICmdLists[Index];
                // Allocate the sub command list submit command with ALLOC_COMMAND_CL.
                ALLOC_COMMAND_CL(*RHICmdList, FRHICommandWaitForAndSubmitSubList)(Nothing, CmdList);
#if WITH_MGPU
                ALLOC_COMMAND_CL(*RHICmdList, FRHICommandSetGPUMask)(RHICmdList->GetGPUMask());
#endif
            }
        }
        // Parallel submission.
        else
        {
            Start = 0;
            int32 ThreadIndex = 0;

            // Merge command lists that are too small.
            while (Start < NumCommandLists)
            {
                int32 Last = Start;
                int32 DrawCnt = Sizes[Start];

                while (Last < NumCommandLists - 1 && DrawCnt + Sizes[Last + 1] <= MinSize)
                {
                    Last++;
                    DrawCnt += Sizes[Last];
                }

                // Get the ContextContainer.
                IRHICommandContextContainer* ContextContainer =  RHIGetCommandContextContainer(ThreadIndex, EffectiveThreads, RHICmdList->GetGPUMask());

                // Create the parallel translate task FParallelTranslateCommandList.
                FGraphEventRef TranslateCompletionEvent = TGraphTask<FParallelTranslateCommandList>::CreateTask(nullptr, ENamedThreads::GetRenderThread()).ConstructAndDispatchWhenReady(&RHICmdLists[Start], 1 + Last - Start, ContextContainer, bIsPrepass);
                // This setup task must not complete until the translate task has completed.
                MyCompletionGraphEvent->DontCompleteUntil(TranslateCompletionEvent);
                // Enqueue FRHICommandWaitForAndSubmitSubListParallel on RHICmdList.
                ALLOC_COMMAND_CL(*RHICmdList, FRHICommandWaitForAndSubmitSubListParallel)(TranslateCompletionEvent, ContextContainer, EffectiveThreads, ThreadIndex++);
                Start = Last + 1;
            }
            check(EffectiveThreads == ThreadIndex);
        }
    }
};
           

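A few points can be added about the code above:

  • If there are too few commands, and therefore too few threads are needed, the serial interface FRHICommandWaitForAndSubmitSubList is used directly.
  • In the parallel branch, RHIGetCommandContextContainer obtains the context container from the concrete RHI subclass. It is implemented only on modern graphics platforms such as D3D12, Vulkan, and Metal; all other platforms return nullptr.
  • Each thread submits 1 to N sub command lists: adjacent lists are merged as long as their combined memory footprint (GetUsedMemory) stays within MinSize, which improves the submission efficiency of each thread.
  • Each thread creates one FParallelTranslateCommandList translate task, then uses FRHICommandWaitForAndSubmitSubListParallel on RHICmdList to wait for the parallel submission of the sub command lists.
  • Note that the desired thread of FParallelTranslateSetupCommandList is determined by CPrio_FParallelTranslateSetupCommandList: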
    FAutoConsoleTaskPriority CPrio_FParallelTranslateSetupCommandList(
        // Console variable name.
        TEXT("TaskGraph.TaskPriorities.ParallelTranslateSetupCommandList"), 
        // Description.
        TEXT("Task and thread priority for FParallelTranslateSetupCommandList."),
        // Use a high-priority thread if one is available.
        ENamedThreads::HighThreadPriority,
        // Use high task priority.
        ENamedThreads::HighTaskPriority,
        // Without high-priority threads, fall back to a normal-priority thread, using high task priority instead.
        ENamedThreads::HighTaskPriority
        );
               
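    It follows that the TaskGraph system gives the translate setup task priority, but the thread that launches it is still the render thread rather than the RHI thread.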
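The size-based grouping and the serial fallback in DoTask can be summarized as a small planning function. The sketch below is a standalone approximation: `FTranslatePlan` and `PlanTranslates` are hypothetical names, `Sizes` plays the role of the per-list GetUsedMemory() values, and `MinSize` / `MinCount` correspond to the members of the same name in the class above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical result type: whether to translate in parallel, and which
// inclusive [first, last] index range of sub command lists each job covers.
struct FTranslatePlan
{
    bool bParallel = false;
    std::vector<std::pair<int, int>> Groups;
};

FTranslatePlan PlanTranslates(const std::vector<int>& Sizes, int MinSize, int MinCount)
{
    FTranslatePlan Plan;
    const int Num = static_cast<int>(Sizes.size());

    // Greedily merge adjacent lists while the running size stays within MinSize.
    int Start = 0;
    while (Start < Num)
    {
        int Last = Start;
        int Total = Sizes[Start];
        while (Last < Num - 1 && Total + Sizes[Last + 1] <= MinSize)
        {
            ++Last;
            Total += Sizes[Last];
        }
        Plan.Groups.emplace_back(Start, Last);
        Start = Last + 1;
    }

    // Too few groups: fall back to serial submission, one list at a time.
    Plan.bParallel = static_cast<int>(Plan.Groups.size()) >= MinCount;
    if (!Plan.bParallel)
    {
        Plan.Groups.clear();
        for (int Index = 0; Index < Num; ++Index)
        {
            Plan.Groups.emplace_back(Index, Index);
        }
    }
    return Plan;
}
```

In the engine the parallel branch creates one FParallelTranslateCommandList task per group, whereas this sketch only returns the grouping.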

FParallelTranslateCommandList is what actually translates the command lists. It is defined as follows:

class FParallelTranslateCommandList
{
    // Command lists to translate.
    FRHICommandListBase** RHICmdLists;
    // Number of command lists to translate.
    int32 NumCommandLists;
    // Context container.
    IRHICommandContextContainer* ContextContainer;
    // Whether this is for the depth prepass.
    bool bIsPrepass;
    
public:
    FParallelTranslateCommandList(FRHICommandListBase** InRHICmdLists, int32 InNumCommandLists, IRHICommandContextContainer* InContextContainer, bool bInIsPrepass)
        : RHICmdLists(InRHICmdLists)
        , NumCommandLists(InNumCommandLists)
        , ContextContainer(InContextContainer)
        , bIsPrepass(bInIsPrepass)
    {
        check(RHICmdLists && ContextContainer && NumCommandLists);
    }

    static FORCEINLINE TStatId GetStatId();

    // The desired thread, depending on whether this is the prepass.
    ENamedThreads::Type GetDesiredThread()
    {
        return bIsPrepass ? CPrio_FParallelTranslateCommandListPrepass.Get() : CPrio_FParallelTranslateCommandList.Get();
    }

    static ESubsequentsMode::Type GetSubsequentsMode() { return ESubsequentsMode::TrackSubsequents; }

    // Run the task.
    void DoTask(ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionGraphEvent)
    {
        IRHICommandContext* Context = ContextContainer->GetContext();
        for (int32 Index = 0; Index < NumCommandLists; Index++)
        {
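            // Set the context of the sub command list.
            RHICmdLists[Index]->SetContext(Context);
            // Delete the sub command list (its destructor flushes and executes the commands).
            delete RHICmdLists[Index];
        }
        // Finish the context.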
        ContextContainer->FinishContext();
    }
};
           

A few notes on the code above:

  • GetDesiredThread is determined by one of two console variables, depending on whether the task belongs to the prepass:
    FAutoConsoleTaskPriority CPrio_FParallelTranslateCommandListPrepass(
        TEXT("TaskGraph.TaskPriorities.ParallelTranslateCommandListPrepass"),
        TEXT("Task and thread priority for FParallelTranslateCommandList for the prepass, which we would like to get to the GPU asap."),
        ENamedThreads::NormalThreadPriority,
        ENamedThreads::HighTaskPriority
        );
    
    FAutoConsoleTaskPriority CPrio_FParallelTranslateCommandList(
        TEXT("TaskGraph.TaskPriorities.ParallelTranslateCommandList"),
        TEXT("Task and thread priority for FParallelTranslateCommandList."),
        ENamedThreads::NormalThreadPriority,
        ENamedThreads::NormalTaskPriority
        );
               
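    So the prepass uses normal-priority threads with high task priority, while other passes use normal-priority threads with normal task priority.
  • The DoTask logic is very simple: set the context on each command list, delete the command list, and finally finish the context. This raises a question, though: where does the translation actually run? After some digging, it turns out to happen in the destructor of FRHICommandListBase, through the following call chain: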
    FRHICommandListBase::~FRHICommandListBase()
    {
        // Flush the command list.
        Flush();
        GRHICommandList.OutstandingCmdListCount.Decrement();
    }
    
    void FRHICommandListBase::Flush()
    {
        // If there are commands.
        if (HasCommands())
        {
            check(!IsImmediate());
            // Execute through the global command list executor. GRHICommandList is of type FRHICommandListExecutor.
            GRHICommandList.ExecuteList(*this);
        }
    }
    
    void FRHICommandListExecutor::ExecuteList(FRHICommandListBase& CmdList)
    {
        if (IsInRenderingThread() && !GetImmediateCommandList().IsExecuting())
        {
            GetImmediateCommandList().ImmediateFlush(EImmediateFlushType::DispatchToRHIThread);
        }
    
        ExecuteInner(CmdList);
    }
    
    void FRHICommandListExecutor::ExecuteInner(FRHICommandListBase& CmdList)
    {
        (......)
    }
               
    Once execution reaches FRHICommandListExecutor::ExecuteInner, the work is handed over to FRHICommandListExecutor; the process is analyzed in section 10.4.1 (RHI command execution).
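
The "deleting the list is what executes it" idiom described above can be modelled in isolation. The following is a minimal sketch rather than engine code: `MiniCommandList` and `MiniExecutor` are hypothetical stand-ins for FRHICommandListBase and FRHICommandListExecutor, and commands are plain `std::function` objects instead of RHI commands.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for FRHICommandListExecutor: it runs every recorded
// command in order and counts how many lists it has executed.
struct MiniExecutor
{
    int ExecutedLists = 0;

    void ExecuteList(std::vector<std::function<void()>>& Commands)
    {
        for (auto& Command : Commands)
        {
            Command();
        }
        Commands.clear();
        ++ExecutedLists;
    }
};

// Hypothetical stand-in for FRHICommandListBase: recording is cheap, and the
// real work happens when the list is flushed, at the latest in the destructor.
class MiniCommandList
{
public:
    explicit MiniCommandList(MiniExecutor& InExecutor) : Executor(InExecutor) {}
    MiniCommandList(const MiniCommandList&) = delete;
    MiniCommandList& operator=(const MiniCommandList&) = delete;

    ~MiniCommandList() { Flush(); }

    void Record(std::function<void()> Command) { Commands.push_back(std::move(Command)); }
    bool HasCommands() const { return !Commands.empty(); }

    void Flush()
    {
        if (HasCommands())
        {
            Executor.ExecuteList(Commands);
        }
    }

private:
    MiniExecutor& Executor;
    std::vector<std::function<void()>> Commands;
};

// Mirrors the shape of DoTask: nothing runs while recording, and deleting the
// list is what replays the commands into the log.
inline std::string RecordThenDelete(MiniExecutor& Executor)
{
    std::string Log;
    MiniCommandList* CmdList = new MiniCommandList(Executor);
    CmdList->Record([&Log] { Log += "Draw;"; });
    CmdList->Record([&Log] { Log += "Dispatch;"; });
    assert(Log.empty());   // still only recorded
    delete CmdList;        // destructor -> Flush -> ExecuteList
    return Log;
}
```

Calling `RecordThenDelete` with a fresh `MiniExecutor` yields the two entries in recording order, and the executor reports exactly one executed list. The design point is that the owner of a child list never has to remember an explicit "execute" call: releasing the list is enough.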

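It bears repeating that true parallel rendering requires a graphics API that supports parallel submission and translation; otherwise the work simply runs as ordinary tasks on the rendering thread.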

Rendering an ordinary pass involves the following interfaces and types:

// Engine\Source\Runtime\RHI\Public\RHIResources.h

// Render pass information.
struct FRHIRenderPassInfo
{
    // Color render target information.
    struct FColorEntry
    {
        FRHITexture* RenderTarget;
        FRHITexture* ResolveTarget;
        int32 ArraySlice;
        uint8 MipIndex;
        ERenderTargetActions Action;
    };
    FColorEntry ColorRenderTargets[MaxSimultaneousRenderTargets];

    // Depth/stencil information.
    struct FDepthStencilEntry
    {
        FRHITexture* DepthStencilTarget;
        FRHITexture* ResolveTarget;
        EDepthStencilTargetActions Action;
        FExclusiveDepthStencil ExclusiveDepthStencil;
    };
    FDepthStencilEntry DepthStencilRenderTarget;

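    // Resolve parameters.
    FResolveParams ResolveParameters;

    // Some RHIs can use a texture to control the sampling and/or shading resolution of different areas.
    FTextureRHIRef FoveationTexture = nullptr;

    // Some RHIs need a hint that occlusion queries will be used in this render pass.
    uint32 NumOcclusionQueries = 0;
    bool bOcclusionQueries = false;

    // Some RHIs need to know whether this render pass will read and write the same texture while generating mips, for partial resource transitions.
    bool bGeneratingMips = false;

    // If this render pass should be multiview, the number of views required.
    uint8 MultiViewCount = 0;

    // Hint for some RHIs that this render pass will have specific subpasses.
    ESubpassHint SubpassHint = ESubpassHint::None;

    // Whether there are too many UAVs.
    bool bTooManyUAVs = false;
    bool bIsMSAA = false;

    // Constructors.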
    
    // Color, no depth, optional resolve, optional mip, optional array slice
    explicit FRHIRenderPassInfo(FRHITexture* ColorRT, ERenderTargetActions ColorAction, FRHITexture* ResolveRT = nullptr, uint32 InMipIndex = 0, int32 InArraySlice = -1);
    // Color MRTs, no depth
    explicit FRHIRenderPassInfo(int32 NumColorRTs, FRHITexture* ColorRTs[], ERenderTargetActions ColorAction);
    // Color MRTs, no depth
    explicit FRHIRenderPassInfo(int32 NumColorRTs, FRHITexture* ColorRTs[], ERenderTargetActions ColorAction, FRHITexture* ResolveTargets[]);
    // Color MRTs and depth
    explicit FRHIRenderPassInfo(int32 NumColorRTs, FRHITexture* ColorRTs[], ERenderTargetActions ColorAction, FRHITexture* DepthRT, EDepthStencilTargetActions DepthActions, FExclusiveDepthStencil InEDS = FExclusiveDepthStencil::DepthWrite_StencilWrite);
    // Color MRTs and depth
    explicit FRHIRenderPassInfo(int32 NumColorRTs, FRHITexture* ColorRTs[], ERenderTargetActions ColorAction, FRHITexture* ResolveRTs[], FRHITexture* DepthRT, EDepthStencilTargetActions DepthActions, FRHITexture* ResolveDepthRT, FExclusiveDepthStencil InEDS = FExclusiveDepthStencil::DepthWrite_StencilWrite);
    // Depth, no color
    explicit FRHIRenderPassInfo(FRHITexture* DepthRT, EDepthStencilTargetActions DepthActions, FRHITexture* ResolveDepthRT = nullptr, FExclusiveDepthStencil InEDS = FExclusiveDepthStencil::DepthWrite_StencilWrite);
    // Depth, no color, occlusion queries
    explicit FRHIRenderPassInfo(FRHITexture* DepthRT, uint32 InNumOcclusionQueries, EDepthStencilTargetActions DepthActions, FRHITexture* ResolveDepthRT = nullptr, FExclusiveDepthStencil InEDS = FExclusiveDepthStencil::DepthWrite_StencilWrite);
    // Color and depth
    explicit FRHIRenderPassInfo(FRHITexture* ColorRT, ERenderTargetActions ColorAction, FRHITexture* DepthRT, EDepthStencilTargetActions DepthActions, FExclusiveDepthStencil InEDS = FExclusiveDepthStencil::DepthWrite_StencilWrite);
    // Color and depth with resolve
    explicit FRHIRenderPassInfo(FRHITexture* ColorRT, ERenderTargetActions ColorAction, FRHITexture* ResolveColorRT,
        FRHITexture* DepthRT, EDepthStencilTargetActions DepthActions, FRHITexture* ResolveDepthRT, FExclusiveDepthStencil InEDS = FExclusiveDepthStencil::DepthWrite_StencilWrite);
    // Color and depth with resolve and optional sample density
    explicit FRHIRenderPassInfo(FRHITexture* ColorRT, ERenderTargetActions ColorAction, FRHITexture* ResolveColorRT,
        FRHITexture* DepthRT, EDepthStencilTargetActions DepthActions, FRHITexture* ResolveDepthRT, FRHITexture* InFoveationTexture, FExclusiveDepthStencil InEDS = FExclusiveDepthStencil::DepthWrite_StencilWrite);

    enum ENoRenderTargets
    {
        NoRenderTargets,
    };
    explicit FRHIRenderPassInfo(ENoRenderTargets Dummy);
    explicit FRHIRenderPassInfo();

    inline int32 GetNumColorRenderTargets() const;
    RHI_API void Validate() const;
    RHI_API void ConvertToRenderTargetsInfo(FRHISetRenderTargetsInfo& OutRTInfo) const;

    (......)
};

// Engine\Source\Runtime\RHI\Public\RHICommandList.h

class RHI_API FRHICommandList : public FRHIComputeCommandList
{
public:
    void BeginRenderPass(const FRHIRenderPassInfo& InInfo, const TCHAR* Name)
    {
        if (InInfo.bTooManyUAVs)
        {
            UE_LOG(LogRHI, Warning, TEXT("RenderPass %s has too many UAVs"));
        }
        InInfo.Validate();

        // Bypass mode: call the RHI context directly.
        if (Bypass())
        {
            GetContext().RHIBeginRenderPass(InInfo, Name);
        }
        // Otherwise allocate (record) an RHI command.
        else
        {
            TCHAR* NameCopy  = AllocString(Name);
            ALLOC_COMMAND(FRHICommandBeginRenderPass)(InInfo, NameCopy);
        }
        // Mark that we are inside a render pass.
        Data.bInsideRenderPass = true;

        // Cache the active render targets.
        CacheActiveRenderTargets(InInfo);
        // Reset the subpass state.
        ResetSubpass(InInfo.SubpassHint);
        Data.bInsideRenderPass = true;
    }

    void EndRenderPass()
    {
        // Call the RHI context directly or record an RHI command.
        if (Bypass())
        {
            GetContext().RHIEndRenderPass();
        }
        else
        {
            ALLOC_COMMAND(FRHICommandEndRenderPass)();
        }
        // Clear the inside-render-pass flag.
        Data.bInsideRenderPass = false;
        // Reset the subpass hint to None.
        ResetSubpass(ESubpassHint::None);
    }
};
           

A usage example:

void FSceneRenderer::RenderShadowDepthMaps(FRHICommandListImmediate& RHICmdList)
{
    (......)
    
    for (int32 AtlasIndex = 0; AtlasIndex < SortedShadowsForShadowDepthPass.TranslucencyShadowMapAtlases.Num(); AtlasIndex++)
    {
        const FSortedShadowMapAtlas& ShadowMapAtlas = SortedShadowsForShadowDepthPass.TranslucencyShadowMapAtlases[AtlasIndex];
        FIntPoint TargetSize = ShadowMapAtlas.RenderTargets.ColorTargets[0]->GetDesc().Extent;

        FSceneRenderTargetItem ColorTarget0 = ShadowMapAtlas.RenderTargets.ColorTargets[0]->GetRenderTargetItem();
        FSceneRenderTargetItem ColorTarget1 = ShadowMapAtlas.RenderTargets.ColorTargets[1]->GetRenderTargetItem();

        FRHITexture* RenderTargetArray[2] =
        {
            ColorTarget0.TargetableTexture,
            ColorTarget1.TargetableTexture
        };

        // Create the FRHIRenderPassInfo instance.
        FRHIRenderPassInfo RPInfo(UE_ARRAY_COUNT(RenderTargetArray), RenderTargetArray, ERenderTargetActions::Load_Store);
        TransitionRenderPassTargets(RHICmdList, RPInfo);
        // Begin the render pass.
        RHICmdList.BeginRenderPass(RPInfo, TEXT("RenderTranslucencyDepths"));
        {
            // Render the shadows.
            for (int32 ShadowIndex = 0; ShadowIndex < ShadowMapAtlas.Shadows.Num(); ShadowIndex++)
            {
                FProjectedShadowInfo* ProjectedShadowInfo = ShadowMapAtlas.Shadows[ShadowIndex];
                ProjectedShadowInfo->SetupShadowUniformBuffers(RHICmdList, Scene);
                ProjectedShadowInfo->RenderTranslucencyDepths(RHICmdList, this);
            }
        }
        // End the render pass.
        RHICmdList.EndRenderPass();

        RHICmdList.Transition(FRHITransitionInfo(ColorTarget0.TargetableTexture, ERHIAccess::Unknown, ERHIAccess::SRVMask));
        RHICmdList.Transition(FRHITransitionInfo(ColorTarget1.TargetableTexture, ERHIAccess::Unknown, ERHIAccess::SRVMask));
    }
    
    (......)
}
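
The two code paths inside BeginRenderPass and EndRenderPass shown earlier (Bypass versus ALLOC_COMMAND) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `MiniContext` and `MiniRenderCommandList` are hypothetical names, commands are stored as `std::function` objects rather than in the engine's linear command allocator, and the name copy stands in for what AllocString achieves.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for IRHICommandContext: it logs what the platform
// layer would have been asked to do.
struct MiniContext
{
    std::vector<std::string> Log;
    void RHIBeginRenderPass(const std::string& Name) { Log.push_back("Begin:" + Name); }
    void RHIEndRenderPass() { Log.push_back("End"); }
};

// Hypothetical stand-in for FRHICommandList with its two code paths.
class MiniRenderCommandList
{
public:
    MiniRenderCommandList(MiniContext& InContext, bool bInBypass)
        : Context(InContext), bBypass(bInBypass) {}

    void BeginRenderPass(const char* Name)
    {
        assert(!bInsideRenderPass);
        if (bBypass)
        {
            // Immediate path: call the context right away.
            Context.RHIBeginRenderPass(Name);
        }
        else
        {
            // Deferred path: copy the name, because the caller's buffer may be
            // gone by the time the command is executed.
            std::string NameCopy(Name);
            Commands.push_back([this, NameCopy] { Context.RHIBeginRenderPass(NameCopy); });
        }
        bInsideRenderPass = true;
    }

    void EndRenderPass()
    {
        assert(bInsideRenderPass);
        if (bBypass)
        {
            Context.RHIEndRenderPass();
        }
        else
        {
            Commands.push_back([this] { Context.RHIEndRenderPass(); });
        }
        bInsideRenderPass = false;
    }

    // Replays the recorded commands; does nothing in bypass mode.
    void Execute()
    {
        for (auto& Command : Commands) { Command(); }
        Commands.clear();
    }

    bool IsInsideRenderPass() const { return bInsideRenderPass; }

private:
    MiniContext& Context;
    bool bBypass;
    bool bInsideRenderPass = false;
    std::vector<std::function<void()>> Commands;
};
```

In bypass mode the context log fills up as soon as BeginRenderPass returns; in recording mode the log stays empty until `Execute` is called, while the inside-render-pass flag is updated at recording time in both modes.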
           

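First, a word on where subpasses come from, what they are for, and what characterizes them.

In traditional multi-pass rendering, each pass typically ends by producing a set of render textures, some of which become shader parameters that the next pass samples. This kind of texture sampling is unrestricted: it can read arbitrary neighboring pixels and use any texture filtering mode. That is flexible, but it is costly on TBR (Tile-Based Renderer) hardware: a pass that renders to a texture usually keeps its results in on-chip tile memory and writes them back to GPU memory (VRAM) when the pass ends, and that write-back is expensive in both time and power.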

Figure: Memory access model between traditional passes; transfers happen repeatedly between on-chip memory and global memory.

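Now consider a special usage pattern: a texture rendered by one pass is consumed immediately by the next pass, and the next pass only samples the data at its own pixel position, never at neighboring pixels. This is exactly the situation subpasses are designed for. The results rendered by a subpass stay in tile memory; they are not written back to VRAM when the subpass ends, but are handed to the next subpass directly from tile memory. This avoids both the write-back at the end of a traditional pass and the subsequent read from GPU memory in the next pass, which saves time and power and therefore improves performance.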

Figure: Memory access model between subpasses; everything happens on-chip.
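
To get a feel for the order of magnitude involved, here is a rough back-of-the-envelope model. All numbers are illustrative assumptions (a 1920x1080 target, three intermediate attachments, 4 bytes per pixel), and the model assumes each intermediate attachment is stored once and loaded once in the multi-pass case and never leaves tile memory in the subpass case. Real hardware adds compression, caching, and partial loads that this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative numbers only: the intermediate attachments of one frame.
struct AttachmentSetup
{
    std::uint64_t Width;
    std::uint64_t Height;
    std::uint64_t BytesPerPixel;
    std::uint64_t NumAttachments;
};

// Bytes occupied by all intermediate attachments together.
constexpr std::uint64_t AttachmentBytes(const AttachmentSetup& S)
{
    return S.Width * S.Height * S.BytesPerPixel * S.NumAttachments;
}

// Separate passes: the producing pass stores the tiles to VRAM and the
// consuming pass loads them back, so the data crosses the bus twice.
constexpr std::uint64_t MultiPassTrafficBytes(const AttachmentSetup& S)
{
    return 2 * AttachmentBytes(S);
}

// Subpasses: in this simplified model the intermediate data never leaves
// tile memory, so it causes no VRAM traffic at all.
constexpr std::uint64_t SubpassTrafficBytes(const AttachmentSetup&)
{
    return 0;
}
```

With `AttachmentSetup{1920, 1080, 4, 3}` the multi-pass variant moves 49,766,400 bytes per frame for the intermediate data, which is roughly 3 GB/s at 60 frames per second, while the subpass variant moves none under these assumptions.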

The subpass-related interfaces and types in UE are:

// Engine\Source\Runtime\RHI\Public\RHIResources.h

// Subpass hint passed to the RHI.
enum class ESubpassHint : uint8
{
    None,                    // Traditional rendering (no subpass).
    DepthReadSubpass,        // Depth-read subpass.
    DeferredShadingSubpass, // Mobile deferred shading subpass.
};

// Engine\Source\Runtime\RHI\Public\RHICommandList.h

class RHI_API FRHICommandListBase : public FNoncopyable
{
    (......)
    
protected:
    // PSO context.
    struct FPSOContext
    {
        uint32 CachedNumSimultanousRenderTargets = 0;
        TStaticArray<FRHIRenderTargetView, MaxSimultaneousRenderTargets> CachedRenderTargets;
        FRHIDepthRenderTargetView CachedDepthStencilTarget;
        
        // Subpass hint.
        ESubpassHint SubpassHint = ESubpassHint::None;
        uint8 SubpassIndex = 0;
        uint8 MultiViewCount = 0;
        bool HasFragmentDensityAttachment = false;
    } PSOContext;
};

class RHI_API FRHICommandList : public FRHIComputeCommandList
{
public:
    void BeginRenderPass(const FRHIRenderPassInfo& InInfo, const TCHAR* Name)
    {
        (......)

        CacheActiveRenderTargets(InInfo);
        // Set the subpass data.
        ResetSubpass(InInfo.SubpassHint);
        Data.bInsideRenderPass = true;
    }

    void EndRenderPass()
    {
        (......)
        
        // Reset the subpass hint to None.
        ResetSubpass(ESubpassHint::None);
    }

    // Advance to the next subpass.
    void NextSubpass()
    {
        // Call the RHI context directly or record an RHI command.
        if (Bypass())
        {
            GetContext().RHINextSubpass();
        }
        else
        {
            ALLOC_COMMAND(FRHICommandNextSubpass)();
        }
        
        // Increment the subpass index.
        IncrementSubpass();
    }
    
    // Increment the subpass index.
    void IncrementSubpass()
    {
        PSOContext.SubpassIndex++;
    }
    
    // Reset the subpass data.
    void ResetSubpass(ESubpassHint SubpassHint)
    {
        PSOContext.SubpassHint = SubpassHint;
        PSOContext.SubpassIndex = 0;
    }
};
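
The subpass bookkeeping above boils down to a hint plus a counter that is reset at the pass boundaries. The following standalone sketch isolates just that state machine; `MiniSubpassTracker` is a hypothetical name, and the real FRHICommandList additionally records or forwards the corresponding RHI commands.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the ESubpassHint values quoted above.
enum class ESubpassHint : std::uint8_t
{
    None,
    DepthReadSubpass,
    DeferredShadingSubpass,
};

// Hypothetical reduction of FRHICommandList to its subpass bookkeeping.
class MiniSubpassTracker
{
public:
    void BeginRenderPass(ESubpassHint InHint)
    {
        assert(!bInsideRenderPass);
        ResetSubpass(InHint);
        bInsideRenderPass = true;
    }

    void NextSubpass()
    {
        assert(bInsideRenderPass);
        ++SubpassIndex;
    }

    void EndRenderPass()
    {
        assert(bInsideRenderPass);
        bInsideRenderPass = false;
        ResetSubpass(ESubpassHint::None);
    }

    ESubpassHint GetSubpassHint() const { return SubpassHint; }
    std::uint8_t GetSubpassIndex() const { return SubpassIndex; }

private:
    // Every pass starts at subpass 0 with the hint it was given.
    void ResetSubpass(ESubpassHint InHint)
    {
        SubpassHint = InHint;
        SubpassIndex = 0;
    }

    ESubpassHint SubpassHint = ESubpassHint::None;
    std::uint8_t SubpassIndex = 0;
    bool bInsideRenderPass = false;
};
```

After `BeginRenderPass(ESubpassHint::DeferredShadingSubpass)` and two `NextSubpass()` calls the index is 2; `EndRenderPass()` returns the tracker to index 0 with the hint None. Keeping this state in the PSO context presumably lets pipeline state creation see which subpass a draw belongs to, although the quoted code does not show that part.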
           

UE's use of subpasses is concentrated mainly in the mobile renderer.

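The reason is that TBR hardware accounts for a steadily growing share of mobile devices, so it is both natural and reasonable for subpasses to be the first choice of the main mobile renderer.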

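Subpass rendering also raises the question of pass overlap. Allowing passes to overlap increases GPU utilization and improves rendering performance, as the following comparison shows.

Figure: Top, a subpass pipeline without overlap; bottom, a subpass pipeline with overlap.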

The RHI commands related to overlap mainly concern UAVs:

class RHI_API FRHIComputeCommandList : public FRHICommandListBase
{
    (......)
    
    void BeginUAVOverlap()
    {
        if (Bypass())
        {
            GetContext().RHIBeginUAVOverlap();
            return;
        }
        ALLOC_COMMAND(FRHICommandBeginUAVOverlap)();
    }

    void EndUAVOverlap()
    {
        if (Bypass())
        {
            GetContext().RHIEndUAVOverlap();
            return;
        }
        ALLOC_COMMAND(FRHICommandEndUAVOverlap)();
    }

    void BeginUAVOverlap(FRHIUnorderedAccessView* UAV)
    {
        FRHIUnorderedAccessView* UAVs[1] = { UAV };
        BeginUAVOverlap(MakeArrayView(UAVs, 1));
    }

    void EndUAVOverlap(FRHIUnorderedAccessView* UAV)
    {
        FRHIUnorderedAccessView* UAVs[1] = { UAV };
        EndUAVOverlap(MakeArrayView(UAVs, 1));
    }

    void BeginUAVOverlap(TArrayView<FRHIUnorderedAccessView* const> UAVs)
    {
        if (Bypass())
        {
            GetContext().RHIBeginUAVOverlap(UAVs);
            return;
        }

        const uint32 AllocSize = UAVs.Num() * sizeof(FRHIUnorderedAccessView*);
        FRHIUnorderedAccessView** InlineUAVs = (FRHIUnorderedAccessView**)Alloc(AllocSize, alignof(FRHIUnorderedAccessView*));
        FMemory::Memcpy(InlineUAVs, UAVs.GetData(), AllocSize);
        ALLOC_COMMAND(FRHICommandBeginSpecificUAVOverlap)(MakeArrayView(InlineUAVs, UAVs.Num()));
    }

    void EndUAVOverlap(TArrayView<FRHIUnorderedAccessView* const> UAVs)
    {
        if (Bypass())
        {
            GetContext().RHIEndUAVOverlap(UAVs);
            return;
        }

        const uint32 AllocSize = UAVs.Num() * sizeof(FRHIUnorderedAccessView*);
        FRHIUnorderedAccessView** InlineUAVs = (FRHIUnorderedAccessView**)Alloc(AllocSize, alignof(FRHIUnorderedAccessView*));
        FMemory::Memcpy(InlineUAVs, UAVs.GetData(), AllocSize);
        ALLOC_COMMAND(FRHICommandEndSpecificUAVOverlap)(MakeArrayView(InlineUAVs, UAVs.Num()));
    }
};
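
The array-view overloads above copy the caller's pointer array into memory owned by the command list before recording the command. The reason is lifetime: the recorded command runs later, possibly after the caller's array has gone out of scope. Here is a minimal standalone sketch of that idea; `MiniArena`, `MiniUAV`, `MiniOverlapCommand`, and `RecordOverlap` are hypothetical names, and the arena is a simplistic substitute for the command list's own allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

struct MiniUAV { int Id; };

// Hypothetical stand-in for the command list's allocator: memory handed out
// here lives as long as the arena (that is, as long as the command list).
class MiniArena
{
public:
    void* Alloc(std::size_t NumBytes)
    {
        Blocks.emplace_back(new unsigned char[NumBytes]);
        return Blocks.back().get();
    }

private:
    std::vector<std::unique_ptr<unsigned char[]>> Blocks;
};

// What a recorded "begin overlap for these UAVs" command needs to remember.
struct MiniOverlapCommand
{
    MiniUAV** UAVs;
    std::size_t Num;
};

// Copies the caller's pointer array into arena memory before recording, in
// the same spirit as the Alloc + Memcpy pair in the quoted BeginUAVOverlap.
inline MiniOverlapCommand RecordOverlap(MiniArena& Arena, MiniUAV* const* UAVs, std::size_t Num)
{
    const std::size_t AllocSize = Num * sizeof(MiniUAV*);
    MiniUAV** InlineUAVs = static_cast<MiniUAV**>(Arena.Alloc(AllocSize));
    std::memcpy(InlineUAVs, UAVs, AllocSize);
    return MiniOverlapCommand{InlineUAVs, Num};
}
```

If the caller builds a temporary two-element array, records it with `RecordOverlap`, and then overwrites or destroys that array, the returned command still refers to the original two UAVs, because it points at the arena copy rather than at the caller's storage. Only the pointer array is copied; the UAV objects themselves must still outlive the command.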
           

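Section 10.2.2 (FRHIResource) already covered the basic interface of RHI resources. FRHIResource carries its own reference count, together with interfaces to increment and decrement it: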

class RHI_API FRHIResource
{
public:
    // Increment the reference count.
    uint32 AddRef() const;
    // Decrement the reference count.
    uint32 Release() const;
    // Get the reference count.
    uint32 GetRefCount() const;
};
           

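Of course, we do not need to reference and manage FRHIResource instances and their counts by hand; instead, the TRefCountPtr template class manages RHI resources automatically:

// Reference type definitions for the various RHI resources.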
typedef TRefCountPtr<FRHISamplerState> FSamplerStateRHIRef;
typedef TRefCountPtr<FRHIRasterizerState> FRasterizerStateRHIRef;
typedef TRefCountPtr<FRHIDepthStencilState> FDepthStencilStateRHIRef;
typedef TRefCountPtr<FRHIBlendState> FBlendStateRHIRef;
typedef TRefCountPtr<FRHIVertexDeclaration> FVertexDeclarationRHIRef;
typedef TRefCountPtr<FRHIVertexShader> FVertexShaderRHIRef;
typedef TRefCountPtr<FRHIHullShader> FHullShaderRHIRef;
typedef TRefCountPtr<FRHIDomainShader> FDomainShaderRHIRef;
typedef TRefCountPtr<FRHIPixelShader> FPixelShaderRHIRef;
typedef TRefCountPtr<FRHIGeometryShader> FGeometryShaderRHIRef;
typedef TRefCountPtr<FRHIComputeShader> FComputeShaderRHIRef;
typedef TRefCountPtr<FRHIRayTracingShader> FRayTracingShaderRHIRef;
typedef TRefCountPtr<FRHIComputeFence>    FComputeFenceRHIRef;
typedef TRefCountPtr<FRHIBoundShaderState> FBoundShaderStateRHIRef;
typedef TRefCountPtr<FRHIUniformBuffer> FUniformBufferRHIRef;
typedef TRefCountPtr<FRHIIndexBuffer> FIndexBufferRHIRef;
typedef TRefCountPtr<FRHIVertexBuffer> FVertexBufferRHIRef;
typedef TRefCountPtr<FRHIStructuredBuffer> FStructuredBufferRHIRef;
typedef TRefCountPtr<FRHITexture> FTextureRHIRef;
typedef TRefCountPtr<FRHITexture2D> FTexture2DRHIRef;
typedef TRefCountPtr<FRHITexture2DArray> FTexture2DArrayRHIRef;
typedef TRefCountPtr<FRHITexture3D> FTexture3DRHIRef;
typedef TRefCountPtr<FRHITextureCube> FTextureCubeRHIRef;
typedef TRefCountPtr<FRHITextureReference> FTextureReferenceRHIRef;
typedef TRefCountPtr<FRHIRenderQuery> FRenderQueryRHIRef;
typedef TRefCountPtr<FRHIRenderQueryPool> FRenderQueryPoolRHIRef;
typedef TRefCountPtr<FRHITimestampCalibrationQuery> FTimestampCalibrationQueryRHIRef;
typedef TRefCountPtr<FRHIGPUFence>    FGPUFenceRHIRef;
typedef TRefCountPtr<FRHIViewport> FViewportRHIRef;
typedef TRefCountPtr<FRHIUnorderedAccessView> FUnorderedAccessViewRHIRef;
typedef TRefCountPtr<FRHIShaderResourceView> FShaderResourceViewRHIRef;
typedef TRefCountPtr<FRHIGraphicsPipelineState> FGraphicsPipelineStateRHIRef;
typedef TRefCountPtr<FRHIRayTracingPipelineState> FRayTracingPipelineStateRHIRef;
           

With these types, TRefCountPtr manages the reference count of RHI resources automatically, and the actual release of a resource happens in FRHIResource::Release:

class RHI_API FRHIResource
{
    uint32 Release() const
    {
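        // Decrement the count.
        int32 NewValue = NumRefs.Decrement();
        // If the count reaches 0, handle deletion of the resource.
        if (NewValue == 0)
        {
            // Not deferred: delete immediately.
            if (!DeferDelete())
            { 
                delete this;
            }
            // Deferred deletion.
            else
            {
                // Platform atomic compare-exchange: if the flag was 0, set it to 1 and add the resource to the pending-delete list.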
                if (FPlatformAtomics::InterlockedCompareExchange(&MarkedForDelete, 1, 0) == 0)
                {
                    PendingDeletes.Push(const_cast<FRHIResource*>(this));
                }
            }
        }
        
        // Return the new value.
        return uint32(NewValue);
    }
    
    bool DeferDelete() const
    {
        // Defer when the resource is not flagged as do-not-defer and either GRHINeedsExtraDeletionLatency is true or the command list is not in bypass mode.
        return !bDoNotDeferDelete && (GRHINeedsExtraDeletionLatency || !Bypass());
    }
};
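
The interplay between the smart pointer and the deferred Release path can be reproduced in a few lines. This is a minimal single-threaded sketch, not the engine implementation: `MiniResource`, `MiniRefPtr`, and the free function `FlushPendingDeletes` are hypothetical stand-ins for FRHIResource, TRefCountPtr, and FRHIResource::FlushPendingDeletes, and a `std::vector` replaces the lock-free pending list.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-in for FRHIResource: the count lives inside the object,
// and dropping to zero parks the object on a pending list instead of
// deleting it on the spot.
class MiniResource
{
public:
    explicit MiniResource(std::vector<MiniResource*>& InPendingDeletes)
        : PendingDeletes(InPendingDeletes) {}
    virtual ~MiniResource() = default;

    void AddRef() const { NumRefs.fetch_add(1); }

    void Release() const
    {
        // The mark keeps a resurrected resource from being queued twice.
        if (NumRefs.fetch_sub(1) == 1 && !bMarkedForDelete)
        {
            bMarkedForDelete = true;
            PendingDeletes.push_back(const_cast<MiniResource*>(this));
        }
    }

    int GetRefCount() const { return NumRefs.load(); }
    void ClearMarkedForDelete() const { bMarkedForDelete = false; }

private:
    mutable std::atomic<int> NumRefs{0};
    mutable bool bMarkedForDelete = false;
    std::vector<MiniResource*>& PendingDeletes;
};

// Hypothetical stand-in for TRefCountPtr: AddRef on acquire, Release on drop.
template <typename T>
class MiniRefPtr
{
public:
    MiniRefPtr() = default;
    explicit MiniRefPtr(T* InPtr) : Ptr(InPtr) { if (Ptr) { Ptr->AddRef(); } }
    MiniRefPtr(const MiniRefPtr& Other) : Ptr(Other.Ptr) { if (Ptr) { Ptr->AddRef(); } }
    MiniRefPtr& operator=(const MiniRefPtr& Other)
    {
        if (Other.Ptr) { Other.Ptr->AddRef(); }  // add first so self-assignment is safe
        if (Ptr) { Ptr->Release(); }
        Ptr = Other.Ptr;
        return *this;
    }
    ~MiniRefPtr() { if (Ptr) { Ptr->Release(); } }

    T* Get() const { return Ptr; }

private:
    T* Ptr = nullptr;
};

// Deletes every queued resource whose count is still zero; resources that
// were referenced again in the meantime are unmarked and kept alive.
inline int FlushPendingDeletes(std::vector<MiniResource*>& PendingDeletes)
{
    int NumDeleted = 0;
    for (MiniResource* Resource : PendingDeletes)
    {
        if (Resource->GetRefCount() == 0)
        {
            delete Resource;
            ++NumDeleted;
        }
        else
        {
            Resource->ClearMarkedForDelete();
        }
    }
    PendingDeletes.clear();
    return NumDeleted;
}
```

When the last `MiniRefPtr` to a resource goes out of scope, the resource lands on the pending list exactly once and is only destroyed by the next flush, which mirrors the deferred branch of Release.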
           

PendingDeletes is a static member of FRHIResource. The related data and interfaces are:

class RHI_API FRHIResource
{
public:
    FRHIResource(bool InbDoNotDeferDelete = false)
        : MarkedForDelete(0)
        , bDoNotDeferDelete(InbDoNotDeferDelete)
        , bCommitted(true)
    {
    }
    virtual ~FRHIResource() 
    {
        check(PlatformNeedsExtraDeletionLatency() || (NumRefs.GetValue() == 0 && (CurrentlyDeleting == this || bDoNotDeferDelete || Bypass()))); // this should not have any outstanding refs
    }

    // List of resources pending deletion; note that it is a lock-free, unordered pointer list.
    static TLockFreePointerListUnordered<FRHIResource, PLATFORM_CACHE_LINE_SIZE> PendingDeletes;
    // The resource currently being deleted.
    static FRHIResource* CurrentlyDeleting;
    
    // Whether the platform needs extra deletion latency.
    static bool PlatformNeedsExtraDeletionLatency()
    {
        return GRHINeedsExtraDeletionLatency && GIsRHIInitialized;
    }

    // A batch of resources to delete.
    struct ResourcesToDelete
    {
        TArray<FRHIResource*>    Resources;
        uint32                    FrameDeleted;
    };

    // Deferred deletion queue.
    static TArray<ResourcesToDelete> DeferredDeletionQueue;
    static uint32 CurrentFrame;
};

void FRHIResource::FlushPendingDeletes(bool bFlushDeferredDeletes)
{
    FRHICommandListImmediate& RHICmdList = FRHICommandListExecutor::GetImmediateCommandList();
    
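    // Before deleting RHI resources, make sure the command list has been flushed.
    RHICmdList.ImmediateFlush(EImmediateFlushType::FlushRHIThread);
    // Make sure there are no outstanding command lists.
    FRHICommandListExecutor::CheckNoOutstandingCmdLists();
    // Notify the RHI that the flush has completed.
    if (GDynamicRHI)
    {
        GDynamicRHI->RHIPerFrameRHIFlushComplete();
    }

    // Lambda that deletes a list of resources.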
    auto Delete = [](TArray<FRHIResource*>& ToDelete)
    {
        for (int32 Index = 0; Index < ToDelete.Num(); Index++)
        {
            FRHIResource* Ref = ToDelete[Index];
            check(Ref->MarkedForDelete == 1);
            if (Ref->GetRefCount() == 0) // caches can bring dead objects back to life
            {
                CurrentlyDeleting = Ref;
                delete Ref;
                CurrentlyDeleting = nullptr;
            }
            else
            {
                Ref->MarkedForDelete = 0;
                FPlatformMisc::MemoryBarrier();
            }
        }
    };

    while (1)
    {
        if (PendingDeletes.IsEmpty())
        {
            break;
        }
        
        // The platform needs extra deletion latency.
        if (PlatformNeedsExtraDeletionLatency())
        {
            const int32 Index = DeferredDeletionQueue.AddDefaulted();
            // Move the pending resources into a new batch in DeferredDeletionQueue.
            ResourcesToDelete& ResourceBatch = DeferredDeletionQueue[Index];
            ResourceBatch.FrameDeleted = CurrentFrame;
            PendingDeletes.PopAll(ResourceBatch.Resources);
        }
        // No extra latency needed: delete the whole list now.
        else
        {
            TArray<FRHIResource*> ToDelete;
            PendingDeletes.PopAll(ToDelete);
            Delete(ToDelete);
        }
    }

    const uint32 NumFramesToExpire = RHIRESOURCE_NUM_FRAMES_TO_EXPIRE;

    // Process DeferredDeletionQueue.
    if (DeferredDeletionQueue.Num())
    {
        // Flush the entire DeferredDeletionQueue.
        if (bFlushDeferredDeletes)
        {
            FRHICommandListExecutor::GetImmediateCommandList().BlockUntilGPUIdle();

            for (int32 Idx = 0; Idx < DeferredDeletionQueue.Num(); ++Idx)
            {
                ResourcesToDelete& ResourceBatch = DeferredDeletionQueue[Idx];
                Delete(ResourceBatch.Resources);
            }

            DeferredDeletionQueue.Empty();
        }
        // Delete only the batches that have expired.
        else
        {
            int32 DeletedBatchCount = 0;
            while (DeletedBatchCount < DeferredDeletionQueue.Num())
            {
                ResourcesToDelete& ResourceBatch = DeferredDeletionQueue[DeletedBatchCount];
                if (((ResourceBatch.FrameDeleted + NumFramesToExpire) < CurrentFrame) || !GIsRHIInitialized)
                {
                    Delete(ResourceBatch.Resources);
                    ++DeletedBatchCount;
                }
                else
                {
                    break;
                }
            }

            if (DeletedBatchCount)
            {
                DeferredDeletionQueue.RemoveAt(0, DeletedBatchCount);
            }
        }

        ++CurrentFrame;
    }
}
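
The extra-latency branch of FlushPendingDeletes can be hard to follow inside the full function, so here is a reduced model of just that part. It is a sketch under assumptions: `MiniDeferredDeleter` is a hypothetical class, resources are plain integer ids, "destroying" means returning the id, and the expiry distance is a constructor parameter because the value of RHIRESOURCE_NUM_FRAMES_TO_EXPIRE is not shown in the quoted code.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

class MiniDeferredDeleter
{
public:
    explicit MiniDeferredDeleter(std::uint32_t InNumFramesToExpire)
        : NumFramesToExpire(InNumFramesToExpire) {}

    // Called when a resource's reference count reaches zero.
    void MarkForDelete(int ResourceId) { PendingDeletes.push_back(ResourceId); }

    // Called once per flush; returns the ids that were actually destroyed.
    std::vector<int> Flush()
    {
        std::vector<int> Destroyed;

        // Step 1: park everything that became unreferenced since the last
        // flush in a batch stamped with the current frame.
        if (!PendingDeletes.empty())
        {
            Queue.push_back(Batch{std::move(PendingDeletes), CurrentFrame});
            PendingDeletes.clear();
        }

        // Step 2: destroy batches that are old enough. Batches are stored in
        // frame order, so stop at the first one that is still too young.
        if (!Queue.empty())
        {
            while (!Queue.empty() && Queue.front().FrameDeleted + NumFramesToExpire < CurrentFrame)
            {
                const Batch& Expired = Queue.front();
                Destroyed.insert(Destroyed.end(), Expired.Resources.begin(), Expired.Resources.end());
                Queue.pop_front();
            }
            // As in the quoted code, the frame counter only advances while
            // the queue had something in it.
            ++CurrentFrame;
        }
        return Destroyed;
    }

private:
    struct Batch
    {
        std::vector<int> Resources;
        std::uint32_t FrameDeleted;
    };

    std::vector<int> PendingDeletes;
    std::deque<Batch> Queue;
    std::uint32_t NumFramesToExpire;
    std::uint32_t CurrentFrame = 0;
};
```

With an expiry distance of 3, a resource marked before the first flush survives four flushes and is destroyed on the fifth. The delay gives a GPU that may still be reading the resource several frames of slack before the memory goes away, at the cost of holding memory a little longer.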
           

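Note, however, that the FRHIResource destructor itself does not release any RHI resource; that is normally done in the destructors of its platform-specific subclasses. Take FD3D11UniformBuffer as an example: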

// Engine\Source\Runtime\Windows\D3D11RHI\Public\D3D11Resources.h

class FD3D11UniformBuffer : public FRHIUniformBuffer
{
public:
    // The D3D11 constant buffer resource.
    TRefCountPtr<ID3D11Buffer> Resource;
    // Resource table holding the referenced RHI resources.
    TArray<TRefCountPtr<FRHIResource> > ResourceTable;

    FD3D11UniformBuffer(class FD3D11DynamicRHI* InD3D11RHI, const FRHIUniformBufferLayout& InLayout, ID3D11Buffer* InResource,const FRingAllocation& InRingAllocation);
    virtual ~FD3D11UniformBuffer();

    (......)
};

// Engine\Source\Runtime\Windows\D3D11RHI\Private\D3D11UniformBuffer.cpp

FD3D11UniformBuffer::~FD3D11UniformBuffer()
{
    if (!RingAllocation.IsValid() && Resource != nullptr)
    {
        D3D11_BUFFER_DESC Desc;
        Resource->GetDesc(&Desc);

        // Return this uniform buffer to the free pool.
        if (Desc.CPUAccessFlags == D3D11_CPU_ACCESS_WRITE && Desc.Usage == D3D11_USAGE_DYNAMIC)
        {
            FPooledUniformBuffer NewEntry;
            NewEntry.Buffer = Resource;
            NewEntry.FrameFreed = GFrameNumberRenderThread;
            NewEntry.CreatedSize = Desc.ByteWidth;

            // Add to this frame's array of free uniform buffers
            const int32 SafeFrameIndex = (GFrameNumberRenderThread - 1) % NumSafeFrames;
            const uint32 BucketIndex = GetPoolBucketIndex(Desc.ByteWidth);
            int32 LastNum = SafeUniformBufferPools[SafeFrameIndex][BucketIndex].Num();
            SafeUniformBufferPools[SafeFrameIndex][BucketIndex].Add(NewEntry);

            FPlatformMisc::MemoryBarrier(); // check for unwanted concurrency
        }
    }
}
           

The analysis above shows that RHI resources are released mainly in FlushPendingDeletes, which is called from the following places:

// Engine\Source\Runtime\RenderCore\Private\RenderingThread.cpp

void FlushPendingDeleteRHIResources_RenderThread()
{
    if (!IsRunningRHIInSeparateThread())
    {
        FRHIResource::FlushPendingDeletes();
    }
}

// Engine\Source\Runtime\RHI\Private\RHICommandList.cpp

void FRHICommandListExecutor::LatchBypass()
{
#if CAN_TOGGLE_COMMAND_LIST_BYPASS
    if (IsRunningRHIInSeparateThread())
    {
        (......)
    }
    else
    {
        (......)

        if (NewBypass && !bLatchedBypass)
        {
            FRHIResource::FlushPendingDeletes();
        }
    }
#endif
    
    (......)
}

// Engine\Source\Runtime\RHI\Public\RHICommandList.inl

void FRHICommandListImmediate::ImmediateFlush(EImmediateFlushType::Type FlushType)
{
    switch (FlushType)
    {
    (......)
            
    case EImmediateFlushType::FlushRHIThreadFlushResources:
    case EImmediateFlushType::FlushRHIThreadFlushResourcesFlushDeferredDeletes:
        {
            (......)
            
            PipelineStateCache::FlushResources();
            FRHIResource::FlushPendingDeletes(FlushType == EImmediateFlushType::FlushRHIThreadFlushResourcesFlushDeferredDeletes);
        }
        break;
    (......)
    }
}
           

Those are the main FlushPendingDeletes call sites in the RHI abstraction layer, but the following platform-specific interfaces call it as well:

  • FD3D12Adapter::Cleanup()
  • FD3D12Device::Cleanup()
  • FVulkanDevice::Destroy()
  • FVulkanDynamicRHI::Shutdown()
  • FD3D11DynamicRHI::CleanupD3DDevice()

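Part (02) of this series, on multithreaded rendering, already covered UE's multithreading architecture and rendering mechanism in detail; this section adds a few supplementary notes.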

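UE's rendering flow involves up to four kinds of workers: the game thread, the rendering thread, the RHI thread, and the GPU (including the driver).

The game thread drives the whole engine, supplying all the source data and events that drive the rendering thread and the RHI thread. The game thread runs at most one frame ahead of the rendering thread. More precisely, if the rendering thread has not finished frame N by the time the game thread finishes its Tick for frame N+1, the game thread is blocked by the rendering thread. Conversely, if the game thread is overloaded and fails to send events and data to the rendering thread in time, the rendering thread stalls.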
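
The one-frame-ahead rule can be checked with a tiny deterministic timeline model. This is a deliberately simplified sketch: it only models the game thread and the rendering thread, assumes constant per-frame costs in milliseconds, and ignores the RHI thread and the GPU. `FrameTimes` and `SimulateFrames` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct FrameTimes
{
    double GameEnd;    // when the game thread finishes the frame, sync included
    double RenderEnd;  // when the rendering thread finishes the same frame
};

// The game thread may work on frame N+1 while the rendering thread works on
// frame N, but it cannot finish frame N+1 before rendering of frame N is done.
inline std::vector<FrameTimes> SimulateFrames(int NumFrames, double GameCost, double RenderCost)
{
    std::vector<FrameTimes> Frames;
    double PrevGameEnd = 0.0;
    double PrevRenderEnd = 0.0;
    for (int Frame = 0; Frame < NumFrames; ++Frame)
    {
        // Game thread: tick, then wait for the previous frame's rendering.
        const double GameEnd = std::max(PrevGameEnd + GameCost, PrevRenderEnd);
        // Rendering thread: needs this frame's data and must be free itself.
        const double RenderStart = std::max(GameEnd, PrevRenderEnd);
        const double RenderEnd = RenderStart + RenderCost;
        Frames.push_back(FrameTimes{GameEnd, RenderEnd});
        PrevGameEnd = GameEnd;
        PrevRenderEnd = RenderEnd;
    }
    return Frames;
}
```

With a 5 ms game tick and a 10 ms render cost, the game thread settles into finishing a frame every 10 ms, so it spends half of each frame waiting on the rendering thread. With the costs swapped, frames still take 10 ms, but now the rendering thread is the one that idles for 5 ms per frame while waiting for data.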

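The rendering thread generates the intermediate RHI commands and dispatches and flushes them to the RHI thread at appropriate times. A stall on the rendering thread can therefore stall the RHI thread as well.

The RHI thread dispatches (optionally), translates, and submits commands. The last step of rendering is a SwapBuffer, which has to wait for the GPU to finish its rendering work, so a busy GPU can also stall the RHI thread.

Unlike the game thread, the rendering thread, the RHI thread, and the GPU all have gaps in their work. In other words, the timing with which the game thread hands over rendering tasks affects how densely the rendering work is packed and how long rendering takes; submitting many small batches wastes rendering efficiency.

The code in the preceding sections also shows that the RHI system involves a great many console variables. Some of them are listed below for debugging and for tuning RHI rendering behavior or efficiency: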

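| Name | Description |
| --- | --- |
| r.RHI.Name | Shows the name of the current RHI, for example D3D11. |
| r.RHICmdAsyncRHIThreadDispatch | Experimental. Whether to perform the RHI dispatch asynchronously. Gets data flushed to the RHI thread sooner and avoids a stall at the end of the frame. |
| r.RHICmdBalanceParallelLists | Enables preprocessing of the draw list to try to balance the load across command lists. 0: off, 1: on, 2: experimental, uses the previous frame's results (does nothing in split screen and similar cases). |
| r.RHICmdBalanceTranslatesAfterTasks | Experimental. Balances the parallel translates after the render tasks are complete. Minimizes the number of deferred contexts, but adds latency to starting the translates. |
| r.RHICmdBufferWriteLocks | Only relevant with an RHI thread. Debugging option for diagnosing buffer lock problems. |
| r.RHICmdBypass | Whether to bypass the RHI command list and send RHI commands immediately. 0: disabled (requires multithreaded rendering), 1: enabled. |
| r.RHICmdCollectRHIThreadStatsFromHighLevel | Pushes stats on the RHI thread while executing, so that you can tell which high-level pass they came from. Has an adverse effect on frame rate. On by default. |
| r.RHICmdFlushOnQueueParallelSubmit | Waits for parallel command lists to complete immediately after submission. For issue diagnosis. Only applies to some RHIs. |
| r.RHICmdFlushRenderThreadTasks | If true, flushes the render thread tasks on every call. For issue diagnosis. This is a master switch for more granular cvars. |
| r.RHICmdForceRHIFlush | Forces a flush for every task sent to the RHI thread. For issue diagnosis. |
| r.RHICmdMergeSmallDeferredContexts | Merges small parallel translate tasks, based on r.RHICmdMinDrawsPerParallelCmdList. |
| r.RHICmdUseDeferredContexts | Uses deferred contexts to execute command lists in parallel. Only applies to some RHIs. |
| r.RHICmdUseParallelAlgorithms | True to use parallel algorithms. Ignored if r.RHICmdBypass is 1. |
| r.RHICmdUseThread | Uses the RHI thread. For issue diagnosis. |
| r.RHICmdWidth | Controls the task granularity of a great number of things in the parallel renderer. |
| r.RHIThread.Enable | Enables or disables the RHI thread and determines whether the RHI work runs on a dedicated thread. |
| RHI.GPUHitchThreshold | Threshold (in milliseconds) for detecting hitches on the GPU. |
| RHI.MaximumFrameLatency | Number of frames that can be queued for rendering. |
| RHI.SyncThreshold | Number of consecutive "fast" frames before vsync is enabled. |
| RHI.TargetRefreshRate | If non-zero, the display never updates more often than the target refresh rate (in Hz). |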

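Note that only some of the RHI-related variables are listed above; many more exist. The full set of commands can be viewed from the following menu:

[Figure: screenshot of the menu listing the console commands]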

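This article covered the basic concepts, types, and mechanisms of UE's RHI system. Hopefully, after working through it, readers will no longer find the RHI unfamiliar and will be able to master, apply, and extend it with ease.

As usual, here are a few questions to help deepen your understanding of UE's RHI system:

  • What types of RHI resources are there? How do they relate to and differ from the resources of the rendering layer? How does the rendering system delete RHI resources?
  • What are the main types of RHI commands? What are the execution mechanism and flow of a command list?
  • Briefly describe the relationship between the RHI contexts and DynamicRHI. Briefly describe the architecture of the D3D11 implementation.
  • How do UE's threads relate to one another? What factors can cause them to stall?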

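  • Thanks to the authors of all the references. Some images come from the references and the web; they will be removed on request in case of infringement.
  • This series is the author's original work and is published only on 部落格園 (cnblogs). Sharing the link is welcome, but reposting without permission is not allowed!
  • The series is ongoing; see the content outline for the complete table of contents.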

  • Unreal Engine Source
  • Rendering and Graphics
  • Materials
  • Graphics Programming
  • Graphics Programming Overview
  • UE4渲染子產品分析
  • UE4 Render System Sheet
  • 【UE4 Renderer】<03> PipelineBase
  • Scalability for All: Unreal Engine 4 with Intel
  • 自下而上反思Shader的執行機制(兼Vulkan學習總結)
  • Learning DirectX 12 – Lesson 1 – Initialize DirectX 12
  • Learning DirectX 12 – Lesson 2 – Rendering
  • Learning DirectX 12 – Lesson 3 – Framework
  • Learning DirectX 12 – Lesson 4 – Textures
  • UE進階性能剖析技術之RHI
  • 進擊的 Vulkan 移動開發之 Command Buffer
  • BRINGING UNREAL ENGINE 4 TO OPENGL Nick Penwarden Epic Games
  • Subpass 初步
  • Best Practice for Mobile
