Source code by the author: https://github.com/sjy234sjy234/KinectFusion-ios
Original paper: "KinectFusion: Real-time dense surface mapping and tracking."
This post covers how to call the interfaces of KinectFusion-ios and gives an overview of the KinectFusion algorithm architecture.
1. Algorithm Invocation Example
As covered in the previous post, the invocation example code lives mainly in the ViewController.mm file. The part that reads the depth-frame stream from depth.bin is iOS development plumbing and is not expanded on here. This section explains the KinectFusion invocation example, namely how the FusionProcessor class is initialized and called.
First, the FusionProcessor initialization code in the viewDidLoad method:
self.fusionProcessor = [FusionProcessor shareFusionProcessorWithContext: _metalContext];
[self.fusionProcessor setRenderBackColor: {24.0 / 255, 31.0 / 255, 50.0 / 255, 1}];
simd::float4 cube = {-107.080887, -96.241348, -566.015991, 223.474106};
[self.fusionProcessor setupTsdfParameterWithCube: cube];
The first line initializes the class, the second sets the background color used when rendering, and the third and fourth lines define a cube as (x, y, z, w). To explain: the KinectFusion algorithm needs an initial cubic bounding box, and the cube in this code is a pre-determined bounding box around a face, which can be obtained with computer-vision methods (not covered here). (x, y, z) is one vertex of the cube and w is its edge length, which together pin down a cubic bounding box in 3D space.
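As a small illustration of this parameterization (the helper below is my own, not part of the project), the cube's two opposite corners can be recovered like this:
#include <simd/simd.h>

// Illustrative helper (not in the repository): expand the (x, y, z, w) cube
// parameter into its minimum and maximum corners.
static inline void cubeCorners(simd::float4 cube,
                               simd::float3 &minCorner,
                               simd::float3 &maxCorner)
{
    minCorner = simd_make_float3(cube.x, cube.y, cube.z);   // the given vertex
    maxCorner = minCorner + cube.w;                         // edge length added on each axis
}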
Next comes the callback that fires as the depth data stream is read:
- (void)stream:(NSStream *)stream handleEvent:(NSStreamEvent)eventCode {
    switch(eventCode) {
        case NSStreamEventHasBytesAvailable:
        {
            //read every frame from depth.bin, which contains one single disparity frame of 640 x 480 x float16,
            //we can easily derive depth from disparity: depth = 1.0 / disparity;
            int frameLen = PORTRAIT_WIDTH * PORTRAIT_HEIGHT * 2;
            uint8_t* buf = new uint8_t[frameLen];
            unsigned int len = 0;
            len = [(NSInputStream *)stream read:buf maxLength:frameLen];
            if(len == frameLen)
            {
                BOOL isFusionOK = [self.fusionProcessor processDisparityData:buf withIndex:m_fusionFrameIndex withTsdfUpdate: YES];
                if(isFusionOK)
                {
                    id<MTLTexture> textureAfterFusion = [self.fusionProcessor getColorTexture];
                    [self.scanningRenderer render: textureAfterFusion];
                    m_fusionFrameIndex++;
                }
                else
                {
                    NSLog(@"Fusion Failed");
                }
            }
            delete[] buf;   // buf was allocated with new[], so it must be released with delete[]
            break;
        }
        default:
            if(m_fusionFrameIndex > 0)
            {
                m_isFusionComplete = YES;
            }
    }
}
Inside the NSStreamEventHasBytesAvailable case, the stream read pulls one frame of depth data; a single frame is 480 x 640 x 2 bytes. The processDisparityData:withIndex:withTsdfUpdate: call performs one KinectFusion single-frame update, the getColorTexture / render calls display the model reconstructed so far, and m_fusionFrameIndex is then incremented.
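As the comment in the code notes, each stored frame is half-float disparity, and depth is simply its reciprocal. The project performs this conversion on the GPU (in the data-preparation module); purely for intuition, an equivalent CPU-side conversion might look like the following sketch (my own illustration, assuming clang's __fp16 half type on arm64):
// Illustrative only: convert one 640 x 480 half-float disparity frame to depth
// via depth = 1.0 / disparity. The real conversion happens on the GPU.
static void disparityToDepthCPU(const uint8_t *frameBytes, float *depthOut, int pixelCount)
{
    const __fp16 *disparity = (const __fp16 *)frameBytes;
    for (int i = 0; i < pixelCount; ++i) {
        float d = (float)disparity[i];
        depthOut[i] = (d > 0.0f) ? (1.0f / d) : 0.0f;   // guard against invalid (zero) disparity
    }
}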
Below is the operation that resets the FusionProcessor. It is the only button callback in the app and starts a brand-new scan:
- (IBAction)onResetScan:(id)sender {
    if(m_isFusionComplete)
    {
        m_fusionFrameIndex = 0;
        m_isFusionComplete = NO;
        simd::float4 cube = {-107.080887, -96.241348, -566.015991, 223.474106};
        [self.fusionProcessor setupTsdfParameterWithCube: cube];
        [self setUpStreamForFile: self.streamPath];
    }
}
The frame index m_fusionFrameIndex must be reset to 0 here: FusionProcessor uses whether m_fusionFrameIndex is 0 to decide when to initialize itself, so no explicit initialization call is needed. Then a new cube is set via setupTsdfParameterWithCube:, and a completely new scan can begin.
2. Algorithm Architecture Overview
Here, from the perspective of the implementation, is a flowchart of the KinectFusion algorithm architecture:
In the flowchart, circular boxes denote data and rectangular boxes denote processing modules. As it shows, the algorithm is built from four processing modules: data preparation, TSDF, ICP, and Marching Cubes.
As described in an earlier post, the FusionProcessor folder of the project contains the KinectFusion source code. All four processing modules live in its FusionComputer subfolder, whose four subfolders map one-to-one to the module implementations: FuPreProcess, FuTsdfFusioner, FuICPMatrix, FuMarchingCube. Finally, FusionProcessor.mm, the file for the FusionProcessor class, sits directly in the FusionProcessor folder; it implements the whole pipeline of the flowchart and assembles the modules into the complete KinectFusion algorithm.
Note that the flowchart is meant to be read alongside the source code; it is not necessarily exhaustive, so do not read too much into it.
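Before diving into the actual code, the per-frame flow inside FusionProcessor can be summarized by the following simplified skeleton. This is purely illustrative: the helper names (prepareDataWith:, trackCameraPose, integrateTsdf, extractSurface) are mine and do not exist in the project, which instead inlines all of this in processFrame.
// Illustrative skeleton of one KinectFusion frame (hypothetical helpers; see processFrame below for the real code).
- (BOOL)processFrameSketch:(id<MTLBuffer>)inDisparityMapBuffer withIndex:(int)fusionFrameIndex
{
    // 1. Data preparation: render the current model into a predicted depth map, convert the
    //    incoming disparity to depth, build pyramids, and derive vertex/normal maps.
    [self prepareDataWith:inDisparityMapBuffer];

    // 2. ICP: frame 0 just resets; later frames refine the camera pose coarse-to-fine.
    if (fusionFrameIndex <= 0) {
        [self reset];
    } else if (![self trackCameraPose]) {
        return NO;   // tracking lost
    }

    // 3. TSDF: integrate the new depth map into the volume using the estimated pose.
    [self integrateTsdf];

    // 4. Marching Cubes: extract the surface points/normals, which also serve as the
    //    model prediction for the next frame's data-preparation step.
    return [self extractSurface];
}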
First, the FusionProcessor class exposes two main public interfaces:
- (BOOL) processDisparityData: (uint8_t *)disparityPixelBuffer withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate;
- (BOOL) processDisparityPixelBuffer: (CVPixelBufferRef)disparityPixelBuffer withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate;
The first method is the one used in the invocation example above: each new depth frame is passed in as a raw byte stream to update the reconstructed 3D scene. The second is the real-time entry point that takes a CVPixelBufferRef, the depth data type obtained directly from the iPhone X. The two perform exactly the same work and differ only in the format of the incoming data; their implementations are as follows.
- (BOOL) processDisparityData: (uint8_t *)disparityData withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate
{
    id<MTLBuffer> inDisparityMapBuffer = [_metalContext.device newBufferWithBytes:disparityData
                                                                            length:PORTRAIT_WIDTH*PORTRAIT_HEIGHT*2
                                                                           options:MTLResourceOptionCPUCacheModeDefault];
    return [self processFrame: inDisparityMapBuffer withIndex: fusionFrameIndex withTsdfUpdate: isTsdfUpdate];
}

- (BOOL) processDisparityPixelBuffer: (CVPixelBufferRef)disparityPixelBuffer withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate
{
    id<MTLBuffer> inDisparityMapBuffer = [_metalContext bufferWithF16PixelBuffer: disparityPixelBuffer];
    return [self processFrame: inDisparityMapBuffer withIndex: fusionFrameIndex withTsdfUpdate: isTsdfUpdate];
}
As you can see, the two methods only do data marshalling: the depth data is converted into an MTLBuffer and then handed to the same private processFrame method, which implements the single-frame KinectFusion processing.
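As a rough sketch of how the CVPixelBufferRef entry point could be driven in real time on an iPhone X (this delegate method is my own illustration, not code from the repository; it assumes an AVCaptureSession with an AVCaptureDepthDataOutput has already been configured, and that self.fusionProcessor and m_fusionFrameIndex are set up as in ViewController.mm):
// Illustrative only: feeding live TrueDepth frames into the CVPixelBufferRef interface.
// Requires #import <AVFoundation/AVFoundation.h> and conformance to AVCaptureDepthDataOutputDelegate.
- (void)depthDataOutput:(AVCaptureDepthDataOutput *)output
     didOutputDepthData:(AVDepthData *)depthData
              timestamp:(CMTime)timestamp
             connection:(AVCaptureConnection *)connection
{
    // Make sure the data is half-float disparity, which is what the processor expects.
    AVDepthData *disparityData =
        [depthData depthDataByConvertingToDepthDataType:kCVPixelFormatType_DisparityFloat16];
    CVPixelBufferRef pixelBuffer = disparityData.depthDataMap;
    BOOL isFusionOK = [self.fusionProcessor processDisparityPixelBuffer:pixelBuffer
                                                              withIndex:m_fusionFrameIndex
                                                         withTsdfUpdate:YES];
    if (isFusionOK) {
        m_fusionFrameIndex++;
    }
}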
processFrame is the core processing function of the KinectFusion pipeline. Its contents are as follows:
- (BOOL) processFrame: (id<MTLBuffer>) inDisparityMapBuffer withIndex: (int) fusionFrameIndex withTsdfUpdate:(BOOL)isTsdfUpdate
{
    //pre-process
    [_fuMeshToTexture drawPoints: m_mCubeExtractPointBuffer normals: m_mCubeExtractNormalBuffer intoColorTexture: m_colorTexture andDepthTexture: m_depthTexture withTransform: m_projectionTransform * m_globalToFrameTransform];
    [_fuTextureToDepth compute: m_depthTexture intoTexture: m_preDepthMapPyramid[0] with: m_cameraNDC2Depth];
    [_fuDisparityToDepth compute: inDisparityMapBuffer intoDepthMapBuffer: m_currentDepthMapPyramid[0]];
    for(int level = 1; level < PYRAMID_LEVEL; ++level)
    {
        [_fuPyramidDepthMap compute: m_currentDepthMapPyramid[level - 1] intoDepthMapBuffer: m_currentDepthMapPyramid[level] withLevel: level];
        [_fuPyramidDepthMap compute: m_preDepthMapPyramid[level - 1] intoDepthMapBuffer: m_preDepthMapPyramid[level] withLevel: level];
    }
    for(int level = 0; level < PYRAMID_LEVEL; ++level)
    {
        [_fuDepthToVertex compute: m_currentDepthMapPyramid[level] intoVertexMapBuffer: m_currentVertexMapPyramid[level] withLevel: level andIntrinsicUVD2XYZ: m_intrinsicUVD2XYZ[level]];
        [_fuVertexToNormal compute: m_currentVertexMapPyramid[level] intoNormalMapBuffer: m_currentNormalMapPyramid[level] withLevel: level];
        [_fuDepthToVertex compute: m_preDepthMapPyramid[level] intoVertexMapBuffer: m_preVertexMapPyramid[level] withLevel: level andIntrinsicUVD2XYZ: m_intrinsicUVD2XYZ[level]];
        [_fuVertexToNormal compute: m_preVertexMapPyramid[level] intoNormalMapBuffer: m_preNormalMapPyramid[level] withLevel: level];
    }
    //icp
    if(fusionFrameIndex <= 0)
    {
        //first frame, no icp
        NSLog(@"first frame, fusion reset");
        [self reset];
    }
    else
    {
        //icp iteration
        BOOL isSolvable = YES;
        simd::float3x3 currentF2gRotate;
        simd::float3 currentF2gTranslate;
        simd::float3x3 preF2gRotate;
        simd::float3 preF2gTranslate;
        simd::float3x3 preG2fRotate;
        simd::float3 preG2fTranslate;
        matrix_transform_extract(m_frameToGlobalTransform, currentF2gRotate, currentF2gTranslate);
        matrix_transform_extract(m_frameToGlobalTransform, preF2gRotate, preF2gTranslate);
        matrix_transform_extract(m_globalToFrameTransform, preG2fRotate, preG2fTranslate);
        for(int level = PYRAMID_LEVEL - 1; level >= 0; --level)
        {
            uint iteratorNumber = ICPIteratorNumber[level];
            for(int it = 0; it < iteratorNumber; ++it)
            {
                uint occupiedPixelNumber = [_fuICPPrepareMatrix computeCurrentVMap:m_currentVertexMapPyramid[level] andCurrentNMap:m_currentNormalMapPyramid[level] andPreVMap:m_preVertexMapPyramid[level] andPreNMap:m_preNormalMapPyramid[level] intoLMatrix:m_icpLeftMatrixPyramid[level] andRMatrix:m_icpRightMatrixPyramid[level] withCurrentR:currentF2gRotate andCurrentT:currentF2gTranslate andPreF2gR:preF2gRotate andPreF2gT:preF2gTranslate andPreG2fR:preG2fRotate andPreG2fT:preG2fTranslate andThreshold:m_icpThreshold andIntrinsicXYZ2UVD:m_intrinsicXYZ2UVD[level] withLevel:level];
                if(occupiedPixelNumber == 0)
                {
                    isSolvable = NO;
                }
                if(isSolvable)
                {
                    [_fuICPReduceMatrix computeLeftMatrix:m_icpLeftMatrixPyramid[level] andRightmatrix:m_icpRightMatrixPyramid[level] intoLeftReduce:m_icpLeftReduceBuffer andRightReduce:m_icpRightReduceBuffer withLevel:level andOccupiedNumber:occupiedPixelNumber];
                    float result[6];
                    isSolvable = matrix_float6x6_solve((float*)m_icpLeftReduceBuffer.contents, (float*)m_icpRightReduceBuffer.contents, result);
                    if(isSolvable)
                    {
                        simd::float3x3 rotateIncrement = matrix_float3x3_rotation(result[0], result[1], result[2]);
                        simd::float3 translateIncrement = {result[3], result[4], result[5]};
                        currentF2gRotate = rotateIncrement * currentF2gRotate;
                        currentF2gTranslate = rotateIncrement * currentF2gTranslate + translateIncrement;
                    }
                }
            }
            if(!isSolvable)
            {
                break;
            }
        }
        if(isSolvable)
        {
            matrix_transform_compose(m_frameToGlobalTransform, currentF2gRotate, currentF2gTranslate);
            m_globalToFrameTransform = simd::inverse(m_frameToGlobalTransform);
        }
        else
        {
            NSLog(@"lost frame");
            return NO;
        }
    }
    if(isTsdfUpdate || fusionFrameIndex <= 0)
    {
        //tsdf fusion updater
        [_fuTsdfFusioner compute:m_currentDepthMapPyramid[0] intoTsdfVertexBuffer:m_tsdfVertexBuffer withIntrinsicXYZ2UVD:m_intrinsicXYZ2UVD[0] andTsdfParameter:m_tsdfParameter andTransform:m_globalToFrameTransform];
        //marching cube
        int activeVoxelNumber = [_fuMCubeTraverse compute:m_tsdfVertexBuffer intoActiveVoxelInfo:m_mCubeActiveVoxelInfoBuffer withMCubeParameter:m_mCubeParameter];
        if(activeVoxelNumber == 0)
        {
            NSLog(@"alert: no active voxel");
            m_mCubeExtractPointBuffer = nil;
            m_mCubeExtractNormalBuffer = nil;
            return NO;
        }
        else if(activeVoxelNumber >= m_mCubeParameter.maxActiveNumber)
        {
            NSLog(@"alert: too mush active voxels");
            m_mCubeExtractPointBuffer = nil;
            m_mCubeExtractNormalBuffer = nil;
            return NO;
        }
        else
        {
            void *baseAddress = m_mCubeActiveVoxelInfoBuffer.contents;
            ActiveVoxelInfo *activeVoxelInfo = (ActiveVoxelInfo*)baseAddress;
            for(int i = 1; i < activeVoxelNumber; ++i)
            {
                activeVoxelInfo[i].vertexNumber = activeVoxelInfo[i-1].vertexNumber + activeVoxelInfo[i].vertexNumber;
            }
            uint totalVertexNumber = activeVoxelInfo[activeVoxelNumber-1].vertexNumber;
            m_mCubeExtractPointBuffer = [_metalContext.device newBufferWithLength: 3 * totalVertexNumber * sizeof(float) options:MTLResourceOptionCPUCacheModeDefault];
            m_mCubeExtractNormalBuffer = [_metalContext.device newBufferWithLength: 3 * totalVertexNumber * sizeof(float) options:MTLResourceOptionCPUCacheModeDefault];
            [_fuMCubeExtract compute: m_mCubeActiveVoxelInfoBuffer andTsdfVertexBuffer: m_tsdfVertexBuffer withActiveVoxelNumber: activeVoxelNumber andTsdfParameter: m_tsdfParameter andMCubeParameter: m_mCubeParameter andOutMCubeExtractPointBufferT: m_mCubeExtractPointBuffer andOutMCubeExtractNormalBuffer: m_mCubeExtractNormalBuffer];
        }
    }
    return YES;
}
The call sites of the four processing modules are marked by the comments in the code: the block under //pre-process is the data-preparation module, the block under //icp is the ICP module, the //tsdf fusion updater call is the TSDF module, and everything from //marching cube onward is the Marching Cubes module.
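As a brief preview of the ICP module (a detailed post will follow): as in the KinectFusion paper, each iteration minimizes a point-to-plane error between the current frame's vertex map and the predicted surface,

    E(R, t) = sum_i ( (R * v_i + t - p_i) . n_i )^2,

where v_i is a vertex from the current frame and p_i, n_i are the corresponding predicted vertex and normal. Linearizing R for small rotations leaves six unknowns (three rotation angles and three translation components), which is the 6x6 system that _fuICPPrepareMatrix and _fuICPReduceMatrix assemble and matrix_float6x6_solve solves; result[0..2] and result[3..5] in the code are the rotation and translation increments applied to the current pose.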
There is a lot of code in this method, so it will not all be unpacked here at once. In follow-up posts, I will cover the four modules in four separate parts and explain the principles and implementation of each in more detail.