Depth Estimation: Depth Map Prediction; Predicting Depth, Surface Normals and Semantic Labels

2023-05-21 03:11:37

Depth Map Prediction

Depth Map Prediction from a Single Image using a Multi-Scale Deep Network

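This paper predicts the depth map of a single image with a multi-scale deep network. "Multi-scale" here means two scales, one coarse and one fine, which correspond to the two parts of the network (see the architecture figure in the paper). The coarse network first predicts the overall depth of the scene, and the fine network then refines it locally.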


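In the coarse network, every layer uses ReLU activations except the last one, which is a linear output. Dropout is applied at Coarse 6 to prevent overfitting. The fine network consists of convolutional layers only.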

The scale-invariant mean squared error is defined as follows.

$$D(y, y^*) = \frac{1}{2n} \sum_{i=1}^{n} \left( \log y_i - \log y_i^* + \alpha(y, y^*) \right)^2$$

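Here $y$ is the predicted depth map and $y^*$ is the ground truth. The output map does not have the same size as the input image, so $y^*$ must also be resized to match $y$. In addition,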

$$\alpha(y, y^*) = \frac{1}{n} \sum_{i} \left( \log y_i^* - \log y_i \right)$$

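So $\alpha$ is the mean log difference between the ground truth and the prediction. Looking back at $D(y, y^*)$, the first term inside the square is the per-pixel log error and the second is the mean error, so the measure accounts for both the individual pixel error and the average error. Why not weight the second term by a factor $\beta \in [0, 1]$? (The training loss later in these notes introduces such a weight, $\lambda$.) As the paper puts it: "All scalar multiples of y have the same error, hence the scale invariance." In other words, multiplying the prediction $y$ by any positive constant leaves $D$ unchanged.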
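The scale invariance can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the depth maps are flattened into plain Python lists of positive values, and the function name `scale_invariant_error` and the sample numbers are made up for illustration.

```python
import math

def scale_invariant_error(pred, target):
    """D(y, y*) in its definition form, for flat sequences of positive depths."""
    n = len(pred)
    # alpha is the mean of (log y*_i - log y_i) over all pixels
    alpha = sum(math.log(t) - math.log(p) for p, t in zip(pred, target)) / n
    # average of the squared, mean-corrected log differences, with the 1/(2n) factor
    return sum((math.log(p) - math.log(t) + alpha) ** 2
               for p, t in zip(pred, target)) / (2 * n)

# Hypothetical three-pixel "depth maps"
pred = [1.0, 2.0, 4.0]
target = [1.5, 2.5, 3.0]

base = scale_invariant_error(pred, target)
# Multiplying the whole prediction by a constant should not change the error
scaled = scale_invariant_error([3.0 * p for p in pred], target)
print(base, scaled)
```

Both printed values should agree up to floating-point rounding, because a global scale adds the same constant to every log difference and $\alpha$ subtracts it again.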

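Let $d_i = \log y_i - \log y_i^*$. Substituting into $D(y, y^*)$ gives, up to the constant factor of $1/2$ in the definition above,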

$$D(y, y^*) = \frac{1}{n} \sum_{i} d_i^2 - \frac{1}{n^2} \left( \sum_{i} d_i \right)^2$$
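The intermediate algebra is not spelled out in these notes. A sketch of it, written with the $1/n$ normalization of the expanded form and using the shorthand $\bar d$ (introduced here, not in the original) for the mean of the $d_i$, might look like this:

$$
\begin{aligned}
\frac{1}{n}\sum_i \left(d_i + \alpha\right)^2
&= \frac{1}{n}\sum_i \left(d_i - \bar d\right)^2, \qquad \bar d = \frac{1}{n}\sum_j d_j = -\alpha \\
&= \frac{1}{n}\sum_i d_i^2 - 2\,\bar d \cdot \frac{1}{n}\sum_i d_i + \bar d^{\,2} \\
&= \frac{1}{n}\sum_i d_i^2 - \bar d^{\,2} \\
&= \frac{1}{n}\sum_i d_i^2 - \frac{1}{n^2}\Big(\sum_i d_i\Big)^2 .
\end{aligned}
$$

The result is the variance of the log differences: a constant offset in log space, which is what a global scale factor on $y$ produces, is not penalized.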

Training loss

$$L(y, y^*) = \frac{1}{n} \sum_{i} d_i^2 - \frac{\lambda}{n^2} \left( \sum_{i} d_i \right)^2, \qquad \lambda \in [0, 1]$$
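To see what $\lambda$ does, the loss can be evaluated at its two extremes. This is a minimal sketch under the same assumptions as the formula (flat sequences of positive depths); the function name `training_loss`, the default `lam=0.5`, and the sample values are illustrative choices, not taken from these notes.

```python
import math

def training_loss(pred, target, lam=0.5):
    """L(y, y*) = (1/n) * sum(d_i^2) - (lam / n^2) * (sum(d_i))^2, d_i = log y_i - log y*_i."""
    n = len(pred)
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    return sum(x * x for x in d) / n - lam * sum(d) ** 2 / n ** 2

# Hypothetical three-pixel "depth maps"
pred = [1.0, 2.0, 4.0]
target = [1.5, 2.5, 3.0]
scaled_pred = [3.0 * p for p in pred]

# lam = 0: plain mean squared error in log space, sensitive to a global scale
print(training_loss(pred, target, lam=0.0), training_loss(scaled_pred, target, lam=0.0))
# lam = 1: the scale-invariant error, unchanged by a global scale
print(training_loss(pred, target, lam=1.0), training_loss(scaled_pred, target, lam=1.0))
```

Values of `lam` between 0 and 1 trade off the two behaviours: the global scale still matters, but less than it would under a plain log-space squared error.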

Data Augmentation

Scale
Rotation
Translation
Color
Flips
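The list above only names the transforms. As one hedged illustration, the sketch below shows a random horizontal flip; the point it makes is that a geometric transform has to be applied to the image and its depth map together so the pixels stay aligned. The function name, the nested-list image representation, and the probability `p` are assumptions made for this example.

```python
import random

def random_horizontal_flip(image, depth, rng, p=0.5):
    """Flip image and depth left-to-right together with probability p."""
    if rng.random() < p:
        # Reverse each row of both arrays so pixel i still matches depth i
        image = [row[::-1] for row in image]
        depth = [row[::-1] for row in depth]
    return image, depth

rng = random.Random(0)  # seeded generator for reproducibility
image = [[10, 20, 30], [40, 50, 60]]          # hypothetical 2x3 grayscale image
depth = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # hypothetical matching depth map

# p=1.0 forces the flip so the effect is visible
flipped_image, flipped_depth = random_horizontal_flip(image, depth, rng, p=1.0)
print(flipped_image)
print(flipped_depth)
```

Color changes, by contrast, would touch only the image, since they do not alter scene geometry.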

Predicting Depth, Surface Normals and Semantic Labels

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture

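This paper is an updated version of Depth Map Prediction from a Single Image using a Multi-Scale Deep Network. First, it adds more convolutional layers (a deeper model). Second, it adds a third scale that raises the output resolution (the output size is 0.5 of the input). Finally, multi-channel feature maps replace the output predictions of scales 1 and 2. With this design (see the architecture figure in the paper), a single base architecture is used to predict depth maps, surface normal maps, and semantic labels.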


Training (using SGD) is split into two phases. Scales 1 and 2 are trained first; the output of scale 2 is then upsampled and scale 3 is trained.


The coarse network (Scale 1) plays an important role in improving pixel accuracy.
