Depth Estimation: Depth Map Prediction; Predicting Depth, Surface Normals and Semantic Labels

2023-05-21 03:11:37

Depth Map Prediction

Depth Map Prediction from a Single Image using a Multi-Scale Deep Network

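The paper predicts the depth map of a single image with a multi-scale deep network. "Multi-scale" here really means two scales, one coarse and one fine, and they correspond to the two parts of the network shown in the figure below. The coarse network first predicts the global depth layout of the whole scene, and the fine network then refines it locally.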

[Figure: the coarse (global) and fine (local) networks]

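In the Coarse network, every layer except the last uses ReLU activations; the last layer is a linear output. Dropout is applied at layer Coarse 6 to prevent overfitting. The Fine network is fully convolutional.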

The paper defines a scale-invariant mean squared error (in log space) as follows:

$$D(y, y^*) = \frac{1}{2n}\sum_{i=1}^{n}\left(\log y_i - \log y_i^* + \alpha(y, y^*)\right)^2$$

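Here $y$ is the predicted depth map, $y^*$ is the ground truth, and $n$ is the number of pixels. The output map is not the same size as the input image, so $y^*$ must likewise be resized to the size of $y$. The offset $\alpha$ is defined as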

$$\alpha(y, y^*) = \frac{1}{n}\sum_i \left(\log y_i^* - \log y_i\right)$$

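So $\alpha$ is the mean log-space difference between the ground truth and the prediction: it is the global offset that minimizes the error for a given pair $(y, y^*)$, and $e^{\alpha}$ is the scale that best aligns $y$ with the ground truth. Looking back at $D(y, y^*)$, inside the square the first part, $\log y_i - \log y_i^*$, is the per-pixel error and the second part, $\alpha$, is the mean error, so $D$ accounts for both the individual pixel error and the average pixel error. Why not put a weight $\beta \in [0, 1]$ on the second part? The training loss below introduces exactly such a weight, $\lambda$. Multiplying every pixel of $y$ by the same constant $c > 0$ adds $\log c$ to each $\log y_i$ and shifts $\alpha$ by $-\log c$; the two changes cancel, so $D$ is unchanged. In other words, every scalar multiple of $y$ has the same error, which is why the metric is scale-invariant (original: "All scalar multiples of y have the same error, hence the scale invariance.").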
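To make the role of $\alpha$ concrete, here is a minimal pure-Python sketch. It assumes the depth maps are flattened into lists of positive floats, and the function names and toy values are hypothetical. It checks numerically that $\alpha$ is the global log offset with the smallest squared log error, and that multiplying $y$ by $e^{\alpha}$ removes the mean log error.

```python
import math


def alpha(y, y_star):
    """alpha(y, y*) = (1/n) * sum_i (log y*_i - log y_i)."""
    n = len(y)
    return sum(math.log(t) - math.log(p) for p, t in zip(y, y_star)) / n


def shifted_log_error(y, y_star, a):
    """Mean of (log y_i - log y*_i + a)^2 for a global log-space offset a."""
    n = len(y)
    return sum((math.log(p) - math.log(t) + a) ** 2 for p, t in zip(y, y_star)) / n


y = [1.0, 2.0, 4.0]        # toy prediction (hypothetical values)
y_star = [2.0, 3.0, 10.0]  # toy ground truth (hypothetical values)

a = alpha(y, y_star)
# The error is a parabola in the offset, with its minimum at a = alpha.
print(shifted_log_error(y, y_star, a) <= shifted_log_error(y, y_star, a + 0.1))
# Rescaling y by e^alpha leaves no mean log error.
aligned = [p * math.exp(a) for p in y]
print(abs(alpha(aligned, y_star)) < 1e-12)
```

Scaling $y$ by a constant $c$ only moves $\alpha$ by $-\log c$, which is the cancellation that makes $D$ scale-invariant.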

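Let $d_i = \log y_i - \log y_i^*$. Then $\alpha(y, y^*) = -\frac{1}{n}\sum_i d_i$, so each squared term in $D$ is $d_i$ minus the mean of the $d_i$. Substituting into $D(y, y^*)$ and expanding gives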

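$$D(y, y^*) = \frac{1}{2n}\sum_i d_i^2 - \frac{1}{2n^2}\left(\sum_i d_i\right)^2$$

which, up to the constant factor 1/2, is $\frac{1}{n}\sum_i d_i^2 - \frac{1}{n^2}\left(\sum_i d_i\right)^2$, the form the training loss builds on.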
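The expansion can be checked numerically. This is a minimal pure-Python sketch, assuming flattened depth maps given as lists of positive floats (function names and values are hypothetical). It computes $D$ from its definition with the $\frac{1}{2n}$ normalization, compares it with the expression $\frac{1}{n}\sum_i d_i^2 - \frac{1}{n^2}(\sum_i d_i)^2$ (which is exactly twice $D$), and confirms that rescaling the prediction leaves $D$ unchanged.

```python
import math


def si_error(y, y_star):
    """Scale-invariant error from the definition:
    D = 1/(2n) * sum_i (log y_i - log y*_i + alpha)^2, alpha = mean(log y* - log y)."""
    n = len(y)
    a = sum(math.log(t) - math.log(p) for p, t in zip(y, y_star)) / n
    return sum((math.log(p) - math.log(t) + a) ** 2 for p, t in zip(y, y_star)) / (2 * n)


def expanded_form(y, y_star):
    """(1/n) * sum d_i^2 - (1/n^2) * (sum d_i)^2, with d_i = log y_i - log y*_i."""
    n = len(y)
    d = [math.log(p) - math.log(t) for p, t in zip(y, y_star)]
    return sum(di * di for di in d) / n - sum(d) ** 2 / (n * n)


y = [1.0, 2.0, 4.0, 0.5]         # toy prediction (hypothetical values)
y_star = [2.0, 3.0, 10.0, 1.0]   # toy ground truth (hypothetical values)

# Definition and expansion agree up to the factor 1/2.
print(math.isclose(si_error(y, y_star), 0.5 * expanded_form(y, y_star)))
# Multiplying the whole prediction by 7 does not change the error.
print(math.isclose(si_error([7.0 * p for p in y], y_star), si_error(y, y_star)))
```

The closed form only needs the two running sums $\sum_i d_i$ and $\sum_i d_i^2$, so it can be computed in a single pass over the pixels.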

Training loss

$$L(y, y^*) = \frac{1}{n}\sum_i d_i^2 - \frac{\lambda}{n^2}\left(\sum_i d_i\right)^2, \quad \lambda \in [0, 1]$$
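A minimal pure-Python sketch of this loss follows, assuming flattened depth maps given as lists of positive floats (`training_loss` is a hypothetical name, not the authors' code). With $\lambda = 0$ the loss is the ordinary mean squared error in log space. With $\lambda = 1$ it is fully scale-invariant (equal to $2D$ under the $\frac{1}{2n}$ definition above), so a prediction that is off only by a global scale factor costs nothing. The original paper reports using $\lambda = 0.5$.

```python
import math


def training_loss(y, y_star, lam):
    """L = (1/n) * sum d_i^2 - (lam / n^2) * (sum d_i)^2, d_i = log y_i - log y*_i."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    n = len(y)
    d = [math.log(p) - math.log(t) for p, t in zip(y, y_star)]
    return sum(di * di for di in d) / n - lam * sum(d) ** 2 / (n * n)


# A prediction that is exactly twice the ground truth everywhere:
# every d_i equals log 2, so the loss reduces to (1 - lam) * (log 2)^2.
y_star = [1.0, 2.5, 4.0]
y = [2.0 * t for t in y_star]
for lam in (0.0, 0.5, 1.0):
    print(lam, training_loss(y, y_star, lam))
```

A pure global scale error is penalized fully at $\lambda = 0$, not at all at $\lambda = 1$, and halfway at $\lambda = 0.5$. A value below 1 therefore keeps some pressure on the absolute scale while still favoring correct relative depth.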

Data Augmentation

Scale
Rotation
Translation
Color
Flips
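The list above only names the augmentations. Below is a minimal pure-Python sketch of three of them (scale, color, flip) on toy nested-list images. It makes several assumptions: the ranges $s \in [1, 1.5]$ and per-channel color gains in $[0.8, 1.2]$ are chosen for illustration, nearest-neighbour resizing is a simplification, and rotation and translation (random cropping) are omitted. The one non-obvious rule is for scale. Enlarging the image by $s$ makes objects look as if they were $s$ times closer, so the depths are divided by $s$ to stay consistent. Color changes affect only the RGB image, while geometric transforms such as flips must be applied to the image and the depth map together.

```python
import random


def resize_nearest(grid, s):
    """Nearest-neighbour resize of an H x W grid (list of rows) by factor s."""
    h, w = len(grid), len(grid[0])
    nh, nw = max(1, round(h * s)), max(1, round(w * s))
    return [[grid[min(h - 1, int(r / s))][min(w - 1, int(c / s))]
             for c in range(nw)] for r in range(nh)]


def hflip(grid):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in grid]


def augment(image, depth, rng):
    """Randomly scale, color-jitter and flip an (RGB image, depth map) pair.

    image: H x W list of [r, g, b]; depth: H x W list of positive floats.
    Parameter ranges are illustrative assumptions.
    """
    # Scale: enlarge both maps by s, and divide depths by s
    # (a bigger image looks like the scene moved s times closer).
    s = rng.uniform(1.0, 1.5)
    image = resize_nearest(image, s)
    depth = [[d / s for d in row] for row in resize_nearest(depth, s)]

    # Color: multiply each RGB channel by its own gain; depth is unaffected.
    gains = [rng.uniform(0.8, 1.2) for _ in range(3)]
    image = [[[px[k] * gains[k] for k in range(3)] for px in row] for row in image]

    # Flip: same horizontal flip for image and depth, with probability 0.5.
    if rng.random() < 0.5:
        image, depth = hflip(image), hflip(depth)
    return image, depth, s


rng = random.Random(0)
image = [[[1.0, 1.0, 1.0] for _ in range(4)] for _ in range(3)]
depth = [[2.0 for _ in range(4)] for _ in range(3)]
aug_image, aug_depth, s = augment(image, depth, rng)
print(len(aug_depth), len(aug_depth[0]), s)
```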

Predicting Depth, Surface Normals and Semantic Labels

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture

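This paper is an upgraded version of Depth Map Prediction from a Single Image using a Multi-Scale Deep Network. First, it adds more convolutional layers, making the model deeper. Second, it adds a third scale that raises the output resolution (the output is half the size of the input). Finally, the outputs of scales 1 and 2 are replaced by multi-channel feature maps. This single base architecture (see the figure below) is used to predict depth maps, surface normal maps, and semantic labels separately.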

[Figure: the multi-scale architecture shared by depth, surface normal and semantic label prediction]

The model is trained with SGD in two phases: scales 1 and 2 are trained first; then the output of scale 2 is upsampled and scale 3 is trained.


The coarse network (Scale 1) plays an important role in improving pixel accuracy.

