KinectFusion-ios: Real-time 3D Reconstruction on Mobile (2): Algorithm Invocation and Algorithm Architecture

Source code: https://github.com/sjy234sjy234/KinectFusion-ios

Original paper: "KinectFusion: Real-time dense surface mapping and tracking."

This post covers how to call the KinectFusion-ios interface and the overall architecture of the KinectFusion algorithm.

1. Algorithm Invocation Example

As introduced in the previous post, the sample code that drives the algorithm lives in ViewController.mm. Reading the depth frame stream from depth.bin is ordinary iOS development work and will not be covered here. This section focuses on how KinectFusion is invoked, namely how the FusionProcessor class is initialized and called.

First, the FusionProcessor initialization code in the viewDidLoad method:

self.fusionProcessor = [FusionProcessor shareFusionProcessorWithContext: _metalContext];
[self.fusionProcessor setRenderBackColor: {24.0 / 255, 31.0 / 255, 50.0 / 255, 1}];
simd::float4 cube = {-107.080887, -96.241348, -566.015991, 223.474106};
[self.fusionProcessor setupTsdfParameterWithCube: cube];

Line 1 initializes the class, line 2 sets the background color used for rendering, and lines 3-4 set up a cube (x, y, z, w). To explain: the KinectFusion algorithm must be initialized with a cubic bounding box. The cube in the code is a precomputed bounding box around a human face, which can be obtained with computer-vision methods (not covered here). (x, y, z) is one vertex of the cube and w is its edge length; together they determine a cubic bounding box in 3D space.
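To make the (x, y, z, w) convention concrete, here is a minimal sketch. It is not project code: the helper name, the minimum-corner reading of (x, y, z), the millimeter units, and the 256-voxel grid resolution are all assumptions for illustration.

#import <Foundation/Foundation.h>
#import <simd/simd.h>

// Hypothetical illustration only: derive the min/max corners and voxel size
// of a TSDF grid from a (x, y, z, w) cube, assuming (x, y, z) is the minimum
// corner, w the edge length (presumably millimeters), and 256 voxels per axis.
static void logVolumeBounds(simd::float4 cube)
{
    simd::float3 minCorner = {cube.x, cube.y, cube.z};
    simd::float3 maxCorner = minCorner + cube.w;   // all three edges have length w
    float voxelSize = cube.w / 256.0f;             // assumed grid resolution
    NSLog(@"volume min (%.1f, %.1f, %.1f), max (%.1f, %.1f, %.1f), voxel size %.2f",
          minCorner.x, minCorner.y, minCorner.z,
          maxCorner.x, maxCorner.y, maxCorner.z, voxelSize);
}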

Next is the callback that runs as the depth data stream is read:

- (void)stream:(NSStream *)stream handleEvent:(NSStreamEvent)eventCode {
    switch(eventCode) {
        case NSStreamEventHasBytesAvailable:
        {
            //read one frame at a time from depth.bin; each frame is one disparity map of 640 x 480 float16 values,
            //from which depth is easily derived: depth = 1.0 / disparity;
            int frameLen = PORTRAIT_WIDTH * PORTRAIT_HEIGHT * 2;
            uint8_t* buf = new uint8_t[frameLen];
            NSInteger len = 0;
            len = [(NSInputStream *)stream read:buf maxLength:frameLen];
            if(len == frameLen)
            {
                BOOL isFusionOK = [self.fusionProcessor processDisparityData:buf withIndex:m_fusionFrameIndex withTsdfUpdate: YES];
                if(isFusionOK)
                {
                    id<MTLTexture> textureAfterFusion=[self.fusionProcessor getColorTexture];
                    [self.scanningRenderer render: textureAfterFusion];
                    m_fusionFrameIndex++;
                }
                else
                {
                    NSLog(@"Fusion Failed");
                }
            }
            delete[] buf;
            break;
        }
        default:
            if(m_fusionFrameIndex > 0)
            {
                m_isFusionComplete = YES;
            }
    }
}

Lines 7-10 read one frame of depth data from the stream; a single frame is 480 x 640 x 2 bytes. Line 13 performs one single-frame KinectFusion processing call; lines 16-17 render the currently reconstructed model; line 18 increments the frame index m_fusionFrameIndex.
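As the comment in the code notes, depth.bin stores disparity frames as 16-bit floats, and depth is simply the reciprocal of disparity. Purely for illustration, here is a CPU-side sketch of that conversion; in the project it actually runs on the GPU (in the FuDisparityToDepth module that appears later), and clang's __fp16 half-precision type is assumed.

// Illustrative CPU-side conversion only; in the project this runs on the GPU.
// Assumes clang's __fp16 half-precision type for the raw disparity values.
static void disparityToDepth(const uint8_t *frameBytes, float *depthOut, int pixelCount)
{
    const __fp16 *disparity = (const __fp16 *)frameBytes;
    for (int i = 0; i < pixelCount; ++i)
    {
        float d = (float)disparity[i];
        depthOut[i] = (d > 0.0f) ? (1.0f / d) : 0.0f;   // depth = 1.0 / disparity
    }
}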

Below is the operation that resets the FusionProcessor. It is also the only button callback in the app, and it starts a completely new scan:

- (IBAction)onResetScan:(id)sender {
    if(m_isFusionComplete)
    {
        m_fusionFrameIndex = 0;
        m_isFusionComplete = NO;
        simd::float4 cube = {-107.080887, -96.241348, -566.015991, 223.474106};
        [self.fusionProcessor setupTsdfParameterWithCube: cube];
        [self setUpStreamForFile: self.streamPath];
    }
}

        第4行中,幀的序列号m_fusionFrameIndex初始化時必須為0,FusionProcessor會根據m_fusionFrameIndex的值是否為0,自動完成初始化,無需顯式調用初始化方法。然後,第6-7行,重新設定新的立方體cube,就可以進入一次全新的掃描。

2. Algorithm Architecture Overview

From the standpoint of the implementation, I have drawn a flowchart of the KinectFusion algorithm architecture:

[Figure: flowchart of the KinectFusion algorithm architecture]

In the flowchart, round boxes represent data and rectangular boxes represent processing modules. As it shows, the algorithm is built from four processing modules: data preparation, TSDF, ICP, and Marching Cubes.

As covered in an earlier post, the project's FusionProcessor folder contains the KinectFusion source code. All four processing modules live in its FusionComputer subfolder, and the four subfolders under FusionComputer correspond one-to-one to the module implementations: FuPreProcess, FuTsdfFusioner, FuICPMatrix, and FuMarchingCube. Finally, FusionProcessor.mm, the file for the FusionProcessor class itself, sits directly in the FusionProcessor folder; it implements the entire pipeline in the flowchart above, organizing the individual modules into the complete KinectFusion algorithm.

Note that the flowchart should be read alongside the source code; it is not necessarily complete, so do not read too much into it.

First, the FusionProcessor class exposes two main public interfaces:

- (BOOL) processDisparityData: (uint8_t *)disparityPixelBuffer withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate;

- (BOOL) processDisparityPixelBuffer: (CVPixelBufferRef)disparityPixelBuffer withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate;

The first declaration is the call that already appeared in the invocation example above: it takes each new depth frame as a raw byte stream and updates the reconstructed 3D scene. The second is the real-time entry point for depth data of type CVPixelBufferRef obtained directly from an iPhone X. The two perform exactly the same work and differ only in the format of the incoming data stream. Here are their implementations:

- (BOOL) processDisparityData: (uint8_t *)disparityData withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate
{
    id<MTLBuffer> inDisparityMapBuffer = [_metalContext.device newBufferWithBytes:disparityData
                                                                           length:PORTRAIT_WIDTH*PORTRAIT_HEIGHT*2
                                                                          options:MTLResourceOptionCPUCacheModeDefault];
    return [self processFrame: inDisparityMapBuffer withIndex: fusionFrameIndex withTsdfUpdate: isTsdfUpdate];
}

- (BOOL) processDisparityPixelBuffer: (CVPixelBufferRef)disparityPixelBuffer withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate
{
    id<MTLBuffer> inDisparityMapBuffer = [_metalContext bufferWithF16PixelBuffer: disparityPixelBuffer];
    return [self processFrame: inDisparityMapBuffer withIndex: fusionFrameIndex withTsdfUpdate: isTsdfUpdate];
}

As you can see, the two methods only do data marshalling: they convert the depth data into a uniform MTLBuffer type and then both call the private processFrame method, which carries out KinectFusion's single-frame processing.
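For reference, here is a sketch of what an adapter like bufferWithF16PixelBuffer: might do; this is an assumption about the project's MetalContext, not its actual code.

#import <CoreVideo/CoreVideo.h>
#import <Metal/Metal.h>

// Hypothetical sketch of copying a half-float CVPixelBuffer into an MTLBuffer;
// the project's real implementation is MetalContext's bufferWithF16PixelBuffer:.
static id<MTLBuffer> bufferFromF16PixelBuffer(id<MTLDevice> device, CVPixelBufferRef pixelBuffer)
{
    CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
    const void *base = CVPixelBufferGetBaseAddress(pixelBuffer);
    size_t length = CVPixelBufferGetBytesPerRow(pixelBuffer) * CVPixelBufferGetHeight(pixelBuffer);
    id<MTLBuffer> buffer = [device newBufferWithBytes: base
                                               length: length
                                              options: MTLResourceCPUCacheModeDefaultCache];
    CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
    return buffer;
}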

processFrame is the core processing function of the KinectFusion algorithm architecture. Its contents are as follows:

- (BOOL) processFrame: (id<MTLBuffer>) inDisparityMapBuffer withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate
{
    //pre-process
    [_fuMeshToTexture drawPoints: m_mCubeExtractPointBuffer normals: m_mCubeExtractNormalBuffer intoColorTexture: m_colorTexture andDepthTexture: m_depthTexture withTransform: m_projectionTransform * m_globalToFrameTransform];
    [_fuTextureToDepth compute: m_depthTexture intoTexture: m_preDepthMapPyramid[0] with: m_cameraNDC2Depth];
    [_fuDisparityToDepth compute: inDisparityMapBuffer intoDepthMapBuffer: m_currentDepthMapPyramid[0]];
    for(int level=1;level<PYRAMID_LEVEL;++level)
    {
        [_fuPyramidDepthMap compute: m_currentDepthMapPyramid[level - 1] intoDepthMapBuffer: m_currentDepthMapPyramid[level] withLevel: level];
        [_fuPyramidDepthMap compute: m_preDepthMapPyramid[level - 1] intoDepthMapBuffer: m_preDepthMapPyramid[level] withLevel: level];
    }
    for(int level=0;level<PYRAMID_LEVEL;++level)
    {
        [_fuDepthToVertex compute: m_currentDepthMapPyramid[level] intoVertexMapBuffer: m_currentVertexMapPyramid[level] withLevel: level andIntrinsicUVD2XYZ: m_intrinsicUVD2XYZ[level]];
        [_fuVertexToNormal compute: m_currentVertexMapPyramid[level] intoNormalMapBuffer: m_currentNormalMapPyramid[level] withLevel: level];
        [_fuDepthToVertex compute: m_preDepthMapPyramid[level] intoVertexMapBuffer: m_preVertexMapPyramid[level] withLevel: level andIntrinsicUVD2XYZ: m_intrinsicUVD2XYZ[level]];
        [_fuVertexToNormal compute: m_preVertexMapPyramid[level] intoNormalMapBuffer: m_preNormalMapPyramid[level] withLevel: level];
    }
    
    //icp
    if(fusionFrameIndex<=0)
    {
        //first frame, no icp
        NSLog(@"first frame, fusion reset");
        [self reset];
    }
    else
    {
        //icp iteration
        BOOL isSolvable=YES;
        simd::float3x3 currentF2gRotate;
        simd::float3 currentF2gTranslate;
        simd::float3x3 preF2gRotate;
        simd::float3 preF2gTranslate;
        simd::float3x3 preG2fRotate;
        simd::float3 preG2fTranslate;
        matrix_transform_extract(m_frameToGlobalTransform,currentF2gRotate,currentF2gTranslate);
        matrix_transform_extract(m_frameToGlobalTransform, preF2gRotate, preF2gTranslate);
        matrix_transform_extract(m_globalToFrameTransform, preG2fRotate, preG2fTranslate);
        for(int level=PYRAMID_LEVEL-1;level>=0;--level)
        {
            uint iteratorNumber=ICPIteratorNumber[level];
            for(int it=0;it<iteratorNumber;++it)
            {
                uint occupiedPixelNumber = [_fuICPPrepareMatrix computeCurrentVMap:m_currentVertexMapPyramid[level] andCurrentNMap:m_currentNormalMapPyramid[level] andPreVMap:m_preVertexMapPyramid[level] andPreNMap:m_preNormalMapPyramid[level] intoLMatrix:m_icpLeftMatrixPyramid[level] andRMatrix:m_icpRightMatrixPyramid[level] withCurrentR:currentF2gRotate andCurrentT:currentF2gTranslate andPreF2gR:preF2gRotate andPreF2gT:preF2gTranslate andPreG2fR:preG2fRotate andPreG2fT:preG2fTranslate  andThreshold:m_icpThreshold andIntrinsicXYZ2UVD:m_intrinsicXYZ2UVD[level] withLevel:level];
                if(occupiedPixelNumber==0)
                {
                    isSolvable=NO;
                }
                if(isSolvable)
                {
                    [_fuICPReduceMatrix computeLeftMatrix:m_icpLeftMatrixPyramid[level] andRightmatrix:m_icpRightMatrixPyramid[level] intoLeftReduce:m_icpLeftReduceBuffer andRightReduce:m_icpRightReduceBuffer withLevel:level andOccupiedNumber:occupiedPixelNumber];
                    float result[6];
                    isSolvable=matrix_float6x6_solve((float*)m_icpLeftReduceBuffer.contents, (float*)m_icpRightReduceBuffer.contents, result);
                    if(isSolvable)
                    {
                        simd::float3x3 rotateIncrement=matrix_float3x3_rotation(result[0], result[1], result[2]);
                        simd::float3 translateIncrement={result[3], result[4], result[5]};
                        currentF2gRotate=rotateIncrement*currentF2gRotate;
                        currentF2gTranslate=rotateIncrement*currentF2gTranslate+translateIncrement;
                    }
                }
            }
            if(!isSolvable)
            {
                break;
            }
        }
        if(isSolvable)
        {
            matrix_transform_compose(m_frameToGlobalTransform, currentF2gRotate, currentF2gTranslate);
            m_globalToFrameTransform=simd::inverse(m_frameToGlobalTransform);
        }
        else
        {
            NSLog(@"lost frame");
            return NO;
        }
    }
    
    if(isTsdfUpdate||fusionFrameIndex<=0)
    {
        //tsdf fusion updater
        [_fuTsdfFusioner compute:m_currentDepthMapPyramid[0] intoTsdfVertexBuffer:m_tsdfVertexBuffer withIntrinsicXYZ2UVD:m_intrinsicXYZ2UVD[0] andTsdfParameter:m_tsdfParameter andTransform:m_globalToFrameTransform];
        
        //marching cube
        int activeVoxelNumber = [_fuMCubeTraverse compute:m_tsdfVertexBuffer intoActiveVoxelInfo:m_mCubeActiveVoxelInfoBuffer withMCubeParameter:m_mCubeParameter];
        if(activeVoxelNumber==0)
        {
            NSLog(@"alert: no active voxel");
            m_mCubeExtractPointBuffer = nil;
            m_mCubeExtractNormalBuffer = nil;
            return NO;
        }
        else if(activeVoxelNumber>=m_mCubeParameter.maxActiveNumber)
        {
            NSLog(@"alert: too mush active voxels");
            m_mCubeExtractPointBuffer = nil;
            m_mCubeExtractNormalBuffer = nil;
            return NO;
        }
        else
        {
            void *baseAddress=m_mCubeActiveVoxelInfoBuffer.contents;
            ActiveVoxelInfo *activeVoxelInfo=(ActiveVoxelInfo*)baseAddress;
            for(int i=1;i<activeVoxelNumber;++i)
            {
                activeVoxelInfo[i].vertexNumber=activeVoxelInfo[i-1].vertexNumber+activeVoxelInfo[i].vertexNumber;
            }
            uint totalVertexNumber=activeVoxelInfo[activeVoxelNumber-1].vertexNumber;
            m_mCubeExtractPointBuffer = [_metalContext.device newBufferWithLength: 3 * totalVertexNumber * sizeof(float) options:MTLResourceOptionCPUCacheModeDefault];
            m_mCubeExtractNormalBuffer = [_metalContext.device newBufferWithLength: 3 * totalVertexNumber * sizeof(float) options:MTLResourceOptionCPUCacheModeDefault];
            [_fuMCubeExtract compute: m_mCubeActiveVoxelInfoBuffer andTsdfVertexBuffer: m_tsdfVertexBuffer withActiveVoxelNumber: activeVoxelNumber andTsdfParameter: m_tsdfParameter andMCubeParameter: m_mCubeParameter andOutMCubeExtractPointBufferT: m_mCubeExtractPointBuffer andOutMCubeExtractNormalBuffer: m_mCubeExtractNormalBuffer];
        }
    }
    
    return YES;
}

I have marked with comments where each of the four processing modules is invoked. Lines 3-18 are the data-preparation module, lines 20-79 the ICP module, lines 83-84 the TSDF module, and lines 86-114 the Marching Cubes module.

This method contains a fair amount of code, so it will not all be unpacked here at once. In follow-up posts I will take the four modules one by one and explain each module's principles and implementation in more detail.