[SSD] SSD: Single Shot MultiBox Detector

ECCV 2016

Parts of these notes are excerpted from the article "SSD詳解" (SSD Explained).

Table of Contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Method
- 4.1 Matching strategy
- 4.2 Training objective
- 4.3 Choosing scales and aspect ratios for default boxes
- 4.4 Hard negative mining
- 4.5 Data augmentation
5 Experimental Results
- 5.1 Datasets
- 5.2 PASCAL VOC 2007
- 5.3 Model analysis
- 5.4 PASCAL VOC2012
- 5.5 COCO
- 5.6 Data Augmentation for Small Object Accuracy
- 5.7 Inference time
6 Conclusion (own)

1 Background and Motivation

Current state-of-the-art (SOTA) object detectors (two-stage) are all based on the following pipeline:

hypothesize bounding boxes
resample pixels or features for each box
apply a high-quality classifier

Although accurate, these approaches have been too computationally intensive for embedded systems and, even with high-end hardware, too slow for real-time applications.

The authors drop the second step of the standard pipeline (resample pixels or features for each box) and design SSD (Single Shot MultiBox Detector), a detector that is both accurate and fast.

2 Related Work

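Traditional methods: DPM, Selective Search (SS)

R-CNN greatly improved accuracy over traditional methods, but it is slow!

Speed-ups: SPPnet and Fast R-CNN optimize the speed of R-CNN
Better hypothesized bounding boxes: MultiBox, Faster R-CNN
Skipping the proposal step: OverFeat, YOLO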

3 Advantages / Contributions

SSD is a single-stage method that improves on YOLO: by predicting from default boxes on multiple feature maps, it raises detection accuracy while also increasing speed, outperforming both YOLO and Faster R-CNN.

This results in a significant improvement in speed for high-accuracy detection (59 FPS with mAP 74.3% on VOC2007 test, vs. Faster R-CNN 7 FPS with mAP 73.2% or YOLO 45 FPS with mAP 63.4%).

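Pros:

Runs faster than YOLO and is more accurate than Faster R-CNN (under certain conditions, namely for large objects in sparse scenes).

Cons:

The min_size, max_size and aspect_ratio of the prior boxes must be set by hand. The base sizes and shapes of the default boxes cannot be learned directly and have to be designed manually, and since every feature layer uses default boxes of different sizes and shapes, tuning depends heavily on experience. (By contrast, YOLOv2 uses clustering to find most of its anchor box shapes, an idea that could be applied directly to SSD.)

Although SSD follows a pyramid feature hierarchy, its recall on small objects is still mediocre and nowhere near crushing Faster R-CNN. Possibly this is because SSD detects small objects with the low-level conv4_3 feature, which has passed through few convolutional layers, so its features may be insufficiently extracted.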

4 Method

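Multi-scale feature maps for detection (cf. FPN)
Convolutional predictors for detection (rather than FC layers as in YOLO)
Default boxes and aspect ratios (i.e., anchors)

For each of the $k$ default boxes at a location, the predictor outputs $c$ class scores and 4 offsets, yielding $(c+4)kmn$ outputs for an $m \times n$ feature map.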
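As a quick check of the $(c+4)kmn$ count, the sketch below plugs in the SSD300 prediction layers listed in step 2) of the pipeline (feature-map sides 38, 19, 10, 5, 3, 1, taken from the paper's SSD300 architecture, with 4, 6, 6, 6, 4, 4 boxes per location). Taking $c = 21$ (20 VOC classes plus background) is an assumption for illustration, and the helper names are hypothetical.

```python
# Sketch: count SSD300 prediction outputs with (c + 4) * k * m * n per layer.
# Feature-map sides follow the paper's SSD300 architecture; c = 21 is assumed.
SSD300_LAYERS = [
    # (layer name, feature-map side m = n, default boxes per location k)
    ("conv4_3", 38, 4),
    ("conv7", 19, 6),
    ("conv8_2", 10, 6),
    ("conv9_2", 5, 6),
    ("conv10_2", 3, 4),
    ("conv11_2", 1, 4),
]


def head_outputs(c, k, m, n):
    """Outputs of one conv predictor: c scores + 4 offsets for each of k boxes at m*n locations."""
    return (c + 4) * k * m * n


def summarize(layers, num_classes):
    """Print per-layer counts and return (total default boxes, total outputs)."""
    total_boxes = 0
    total_outputs = 0
    for name, side, k in layers:
        boxes = k * side * side
        outputs = head_outputs(num_classes, k, side, side)
        total_boxes += boxes
        total_outputs += outputs
        print(f"{name:<9} {side}x{side}  k={k}  boxes={boxes}  outputs={outputs}")
    return total_boxes, total_outputs


if __name__ == "__main__":
    boxes, outputs = summarize(SSD300_LAYERS, num_classes=21)
    print("default boxes:", boxes)    # 8732
    print("total outputs:", outputs)  # 218300 = 25 * 8732
```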

[Figure: SSD network architecture]

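SSD pipeline:

1) Feed an input image (300x300) into a pretrained classification network to obtain feature maps of different sizes; the standard VGG16 network is modified as follows:

Convert VGG16's FC6 and FC7 into convolutional layers (Conv6 and Conv7 in the figure above);
Remove all Dropout layers and the FC8 layer;
Add the atrous (hole) algorithm, i.e., dilated convolution;
Change Pool5 from 2x2-S2 to 3x3-S1;

2) Extract the feature maps of Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2 and Conv11_2, construct 4, 6, 6, 6, 4, 4 default boxes of different scales and aspect ratios, respectively, at every location of these maps, and run localization and classification on each;

3) Combine the predicted boxes from all feature maps and apply NMS (non-maximum suppression) to suppress overlapping or incorrect boxes, producing the final set of boxes (the detection result).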
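Step 3) can be sketched as a greedy per-class NMS in plain Python. This is a minimal illustration, not the authors' code: boxes are assumed to be (xmin, ymin, xmax, ymax) tuples, the thresholds (confidence 0.01, jaccard overlap 0.45) follow the values reported in the SSD paper, and the paper's extra cap of 200 detections per image is left out.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45, score_threshold=0.01):
    """Greedy NMS for one class: returns indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest from highest to lowest score.
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU ~0.68
```

In a full detector this runs once per class over the decoded boxes from all six feature maps.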

4.1 Matching strategy

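This refers to matching default boxes to ground-truth (GT) boxes.

The matching criterion is jaccard overlap, i.e., IoU; pairs with IoU greater than 0.5 are matched.

First, match each ground truth to the default box with which it has the highest IoU, which guarantees that every ground truth has at least one matched default box;
Then SSD tries to match each remaining unmatched default box to any ground truth: as long as their jaccard overlap exceeds the threshold (0.5 for SSD300), they are considered a match;
Default boxes matched to a ground truth are positives; unmatched default boxes are negatives.

In short, one ground truth may correspond to several positive default boxes, instead of only the single highest-IoU default box as in MultiBox. All other default boxes are negatives (every default box is either a positive or a negative).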
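A minimal sketch of the two matching rules, assuming boxes as (xmin, ymin, xmax, ymax) tuples. `match` is a hypothetical helper, and ties (two ground truths sharing the same best default box) are resolved naively by letting the last one win.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(defaults, gts, threshold=0.5):
    """Assign each default box a GT index, or -1 for a negative (background) box."""
    matched = [-1] * len(defaults)
    best = [0.0] * len(defaults)
    # Threshold rule: a default box whose best IoU with some GT exceeds the threshold is positive.
    for i, d in enumerate(defaults):
        for j, g in enumerate(gts):
            overlap = iou(d, g)
            if overlap > threshold and overlap > best[i]:
                best[i] = overlap
                matched[i] = j
    # Best-match rule: every GT claims its highest-IoU default box, even below the threshold.
    # Applied last so these forced matches cannot be overwritten.
    for j, g in enumerate(gts):
        top = max(range(len(defaults)), key=lambda i: iou(defaults[i], g))
        matched[top] = j
    return matched


if __name__ == "__main__":
    defaults = [(0, 0, 10, 10), (0, 0, 9, 10), (50, 50, 60, 60), (100, 100, 110, 110)]
    gts = [(0, 0, 10, 10), (55, 55, 65, 65)]
    print(match(defaults, gts))  # [0, 0, 1, -1]
```

Here GT 0 gets two positives (IoU 1.0 and 0.9), while GT 1 only overlaps a default box with IoU of about 0.14 and still gets it through the best-match rule.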

4.2 Training objective

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

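$N$ is the number of matched default boxes; if $N = 0$, the loss is set to 0
$\alpha = 1$ (the weight of the localization loss, chosen by cross-validation)

$L_{loc}$ is the smooth L1 loss between the predicted box and the ground-truth box, as in Fast R-CNN.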

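$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\left(l_i^m - \hat{g}_j^m\right)$$

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$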

$x_{ij}^p \in \{1, 0\}$ indicates whether the $i$-th default box is matched to the $j$-th ground-truth box of category $p$
$l$ is the predicted box; the four regressed offsets are for the center $(cx, cy)$, the width $w$ and the height $h$
$d$ is the default box
$g$ is the ground-truth box

$L_{conf}$, the confidence loss, is a softmax loss over the class confidences $c$.
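The regression targets $\hat{g}$ and the per-box smooth L1 term can be sketched as follows, with boxes as (cx, cy, w, h) tuples. Function names are illustrative. Many public implementations additionally divide these targets by fixed "variance" factors; that detail is not part of the paper's formula and is omitted here.

```python
import math


def encode(gt, default):
    """Regression targets g_hat for one matched pair; boxes are (cx, cy, w, h)."""
    gcx, gcy, gw, gh = gt
    dcx, dcy, dw, dh = default
    return (
        (gcx - dcx) / dw,   # center offsets normalized by the default box size
        (gcy - dcy) / dh,
        math.log(gw / dw),  # log-space width / height ratios
        math.log(gh / dh),
    )


def decode(offsets, default):
    """Inverse of encode: turn predicted offsets l back into a (cx, cy, w, h) box."""
    tcx, tcy, tw, th = offsets
    dcx, dcy, dw, dh = default
    return (dcx + tcx * dw, dcy + tcy * dh, dw * math.exp(tw), dh * math.exp(th))


def smooth_l1(x):
    """Quadratic near zero, linear beyond |x| = 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def loc_loss_one(pred_offsets, gt, default):
    """Smooth L1 summed over cx, cy, w, h for a single positive default box."""
    target = encode(gt, default)
    return sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target))


if __name__ == "__main__":
    d = (0.5, 0.5, 0.2, 0.2)
    g = (0.6, 0.5, 0.4, 0.2)
    print(encode(g, d))                      # approx (0.5, 0.0, 0.693, 0.0)
    print(loc_loss_one((0, 0, 0, 0), g, d))  # loss for predicting "no offset"
```

Normalizing by the default box size keeps the targets on a similar range for small and large boxes, which is why the same regressor can serve every feature map.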

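$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_i^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right), \quad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}$$

The localization loss is computed only for matched default boxes, while the classification loss is computed for both matched and unmatched default boxes (unmatched ones are scored against the background class 0).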
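A sketch of the confidence loss and of the overall normalization by $N$, assuming raw class logits per default box with class 0 as background. The function names are illustrative, and the matching and the choice of negatives are taken as given.

```python
import math


def log_softmax(logits):
    """Numerically stable log of the softmax probabilities c_hat."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def conf_loss(logits, labels, positives, negatives):
    """Softmax cross-entropy: positives vs. their matched class p, negatives vs. background (class 0)."""
    loss = 0.0
    for i in positives:
        loss -= log_softmax(logits[i])[labels[i]]
    for i in negatives:
        loss -= log_softmax(logits[i])[0]
    return loss


def total_loss(l_conf, l_loc, num_pos, alpha=1.0):
    """L = (L_conf + alpha * L_loc) / N, defined as 0 when N == 0."""
    if num_pos == 0:
        return 0.0
    return (l_conf + alpha * l_loc) / num_pos


if __name__ == "__main__":
    logits = [[0.1, 2.0, -1.0], [3.0, 0.2, 0.1], [0.0, 0.5, 0.4]]
    labels = [1, 0, 0]  # box 0 matched to class 1; the others are background
    l_conf = conf_loss(logits, labels, positives=[0], negatives=[1, 2])
    print(total_loss(l_conf, l_loc=0.37, num_pos=1))
```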

4.3 Choosing scales and aspect ratios for default boxes

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$

$s_{min} = 0.2$: the lowest feature map has a scale of 0.2 × the input image
$s_{max} = 0.9$: the highest feature map has a scale of 0.9 × the input image
$m$: the number of feature maps used for prediction
$a_r \in \{1, 2, 3, \frac{1}{2}, \frac{1}{3}\}$: the aspect ratios of the default boxes
$w_k^a = s_k\sqrt{a_r}$: the width of a default box
$h_k^a = s_k/\sqrt{a_r}$: the height of a default box
For $a_r = 1$, the authors also add a default box of scale $s_k' = \sqrt{s_k s_{k+1}}$, resulting in 6 default boxes per feature map location
$\left(\frac{i+0.5}{|f_k|}, \frac{j+0.5}{|f_k|}\right)$ is the center of a default box, where $|f_k|$ is the size of the $k$-th square feature map and $i, j \in [0, |f_k|)$
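The scale formula and the default box shapes can be sketched as follows, with boxes as (cx, cy, w, h) relative to the image. These notes do not say which $s_{m+1}$ to use for the last map, so `s_next` is left to the caller (an assumption); for SSD300 the caller would also override conv4_3 with scale 0.1 and pass only $a_r \in \{1, 2, \frac{1}{2}\}$ on the 4-box layers.

```python
import math


def scales(m, s_min=0.2, s_max=0.9):
    """s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1) for k = 1..m."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_boxes(fk, s_k, s_next, aspect_ratios=(1.0, 2.0, 3.0, 1 / 2, 1 / 3)):
    """Default boxes (cx, cy, w, h), relative to the image, for one fk x fk feature map."""
    shapes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in aspect_ratios]
    extra = math.sqrt(s_k * s_next)  # extra a_r = 1 box with scale s_k'
    shapes.append((extra, extra))
    boxes = []
    for j in range(fk):      # row index -> y
        for i in range(fk):  # column index -> x
            cx, cy = (i + 0.5) / fk, (j + 0.5) / fk
            boxes.extend((cx, cy, w, h) for w, h in shapes)
    return boxes


if __name__ == "__main__":
    s = scales(6)
    print([round(v, 2) for v in s])           # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
    print(len(default_boxes(3, s[0], s[1])))  # 3 * 3 * 6 = 54
```

Every shape for a given $s_k$ has the same area $s_k^2$; only the aspect ratio changes, which is how one scale covers tall, square and wide objects.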

SSD300 model:

The default boxes on conv4_3 have a scale of 0.1.

conv4_3, conv10_2 and conv11_2 use only 4 default boxes per location (the aspect ratios 3 and 1/3 are dropped).

SSD512 model:

We add an extra conv12_2 for prediction, set $s_{min}$ to 0.15, and use 0.07 on conv4_3.

4.4 Hard negative mining

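Several feature maps make predictions, and each location on each map carries up to 6 default boxes; this dense coverage inevitably leads to a significant imbalance between the positive and negative training examples.

The authors mine the top negative examples (sorted by confidence loss) so that the ratio of positives to negatives stays at about 1:3.

Negative default boxes usually far outnumber positive ones. If training samples were drawn at random, the network would overemphasize negatives (since a negative is much more likely to be drawn), which makes the loss unstable. The numbers of positives and negatives therefore need to be balanced, and the usual method is hard negative mining: sort the negative default boxes by confidence loss, train on those with the highest loss (the ones the network most confidently mistakes for objects), and keep positive : negative = 1 : 3, which gives better results. Without such control, the sampled examples could easily all be negatives (asking the network to find true objects among negatives only, which obviously does not work), and the network's performance would suffer.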
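A minimal sketch of the selection step, assuming the per-box confidence losses have already been computed. `hard_negative_mining` is a hypothetical name, and returning no negatives when there are no positives is a simplification (implementations handle that case differently).

```python
def hard_negative_mining(conf_losses, is_positive, neg_pos_ratio=3):
    """Indices of the negatives with the largest confidence loss, at most neg_pos_ratio * #positives."""
    num_pos = sum(is_positive)
    negatives = [i for i, p in enumerate(is_positive) if not p]
    # Hardest negatives first: highest loss means most confidently misclassified.
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return negatives[: neg_pos_ratio * num_pos]


if __name__ == "__main__":
    losses = [0.1, 2.0, 0.5, 3.0, 0.2, 1.0]
    positive = [True, False, False, False, False, False]
    print(hard_negative_mining(losses, positive))  # [3, 1, 5]
```

The returned indices are the `negatives` that enter the confidence loss; all remaining negatives contribute nothing for this image.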

4.5 Data augmentation

1) First, sample a patch, using one of the following options:

Use the entire original input image.
Sample a patch so that the minimum jaccard overlap with the objects is 0.1, 0.3, 0.5, 0.7, or 0.9.
Randomly sample a patch.

The size of each sampled patch is [0.1, 1] of the original image size, and its aspect ratio is between 1/2 and 2.

We keep the overlapped part of the ground truth box if its center is in the sampled patch.

2) Resize the patch to a fixed size, then:

Horizontally flip with probability 0.5;
Apply some photometric distortions.
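The sampling and flipping steps can be sketched on normalized (xmin, ymin, xmax, ymax) boxes. Assumptions, also labeled in the comments: the patch "size" is read as a fraction of the image side, "minimum jaccard overlap" is read as every object reaching the threshold, and photometric distortions are omitted. All function names are illustrative.

```python
import random


def iou(a, b):
    """Jaccard overlap of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def clip01(v):
    return min(1.0, max(0.0, v))


def sample_patch(boxes, rng, max_trials=50):
    """Pick one sampling option; return (patch, boxes kept in patch coordinates)."""
    # None = whole image; 0.0 = unconstrained random patch; others = minimum jaccard overlap.
    options = [None, 0.1, 0.3, 0.5, 0.7, 0.9, 0.0]
    while True:
        min_iou = rng.choice(options)
        if min_iou is None or not boxes:
            return (0.0, 0.0, 1.0, 1.0), list(boxes)
        for _ in range(max_trials):
            scale = rng.uniform(0.1, 1.0)  # assumption: fraction of the image side
            ratio = rng.uniform(0.5, 2.0)  # patch aspect ratio w / h
            w, h = scale * ratio ** 0.5, scale / ratio ** 0.5
            if w > 1.0 or h > 1.0:
                continue
            x0, y0 = rng.uniform(0.0, 1.0 - w), rng.uniform(0.0, 1.0 - h)
            patch = (x0, y0, x0 + w, y0 + h)
            # Assumption: every object must overlap the patch by at least min_iou.
            if min(iou(patch, b) for b in boxes) < min_iou:
                continue
            kept = []
            for b in boxes:
                cx, cy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
                if patch[0] <= cx <= patch[2] and patch[1] <= cy <= patch[3]:
                    # Keep the overlapped part, re-expressed in the patch's [0, 1] frame.
                    kept.append((
                        clip01((max(b[0], patch[0]) - x0) / w),
                        clip01((max(b[1], patch[1]) - y0) / h),
                        clip01((min(b[2], patch[2]) - x0) / w),
                        clip01((min(b[3], patch[3]) - y0) / h),
                    ))
            if kept:
                return patch, kept


def hflip(boxes):
    """Mirror normalized (xmin, ymin, xmax, ymax) boxes horizontally."""
    return [(1.0 - x1, y0, 1.0 - x0, y1) for x0, y0, x1, y1 in boxes]


def augment(boxes, seed=None):
    rng = random.Random(seed)
    patch, kept = sample_patch(boxes, rng)
    # Resizing the patch to a fixed size leaves normalized box coordinates unchanged.
    if rng.random() < 0.5:
        kept = hflip(kept)
    return patch, kept


if __name__ == "__main__":
    print(augment([(0.2, 0.2, 0.6, 0.7)], seed=1))
```

If no patch satisfies the drawn constraint within `max_trials`, the loop simply draws another option, so the whole-image option eventually ends it.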

5 Experimental Results

The base network (backbone) is VGG16.

5.1 Datasets

PASCAL VOC 2007
PASCAL VOC 2012
COCO

5.2 PASCAL VOC 2007

[Figure: effect of BBox Area (left) and Aspect Ratio (right) on VOC2007 test, SSD300 and SSD512]

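Left: each curve changes quite a lot as the BBox Area on the horizontal axis goes XS->S->M->L->XL (the trend for the larger-input SSD512, top left, is somewhat gentler than for the smaller-input SSD300, bottom left).

Right: the curves change much less across the Aspect Ratio categories XT->T->M->W->XW (SSD itself already uses default boxes of multiple scales and aspect ratios).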


5.3 Model analysis

[Table: effects of various design choices and components on SSD performance]

1) Data augmentation is crucial

Sampling strategy (similar to YOLO)

mAP goes from 65.5 to 74.3

2) More default box shapes is better

Removing the 1/3 and 3 aspect ratios drops mAP by 0.6 points; further removing 1/2 and 2 drops it by another 2.1 points

3) Atrous is faster


That is, dilated convolution; it makes the network about 20% faster

4) Multiple output layers at different resolutions is better

[Table: effects of using multiple output layers]

The comparison also covers ignoring boxes which are on the image boundary, the strategy used in Faster R-CNN.
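The design choices behind "Atrous is faster" can be checked with a little arithmetic: switching pool5 to 3x3-s1 keeps the 19x19 map, a dilated 3x3 fc6 still covers a wide span, and the subsampled conv6 has far fewer weights than VGG's fc6. The dilation of 6 and the 1024 conv6 channels are assumptions taken from the commonly used SSD300 configuration; the helper names are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one axis for a conv/pool layer (floor rounding)."""
    effective = dilation * (kernel - 1) + 1  # span covered by a dilated kernel
    return (size + 2 * padding - effective) // stride + 1


def weight_count(kernel, in_ch, out_ch):
    """Number of weights in a kernel x kernel conv (biases ignored)."""
    return kernel * kernel * in_ch * out_ch


if __name__ == "__main__":
    # pool5 on the 19x19 conv5_3 map of SSD300
    print(conv_out(19, 2, stride=2))             # 9 with floor rounding (Caffe pooling rounds up: 10)
    print(conv_out(19, 3, stride=1, padding=1))  # 19: 3x3-s1 keeps the resolution
    # fc6 rewritten as a 3x3 conv with dilation 6 (assumed setting)
    print(conv_out(19, 3, padding=6, dilation=6))  # 19, each output covering a 13x13 span
    print(weight_count(7, 512, 4096))  # 102760448: VGG fc6 viewed as a 7x7 conv
    print(weight_count(3, 512, 1024))  # 4718592: subsampled 3x3x1024 conv6
```

Without the holes, a 3x3 kernel on the un-downsampled map would see a much smaller context than the original fc6 did; dilation recovers that context without adding weights.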

5.4 PASCAL VOC2012

[Table: PASCAL VOC2012 test detection results]

5.5 COCO

[Table: COCO test-dev2015 detection results]

5.6 Data Augmentation for Small Object Accuracy

From Figure 4 we can see that the classification task for small objects is relatively hard for SSD (it is sensitive to changes in input scale).

The sampling strategy described earlier amounts to a zoom-in operation and can generate many larger training examples!

To create more small training examples, the authors implement a zoom-out operation ("we first randomly place an image on a canvas of 16× of the original image size filled with mean values before we do any random crop operation").
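The geometry of the zoom-out step can be sketched as follows: 16× the area corresponds to at most 4× per side. Drawing the side ratio uniformly from [1, 4] is an assumption, `zoom_out` is a hypothetical helper, and only the canvas size, the offset and the shifted pixel boxes are computed; the mean-filled pixels themselves are not.

```python
import random


def zoom_out(img_w, img_h, boxes, rng, max_ratio=4.0):
    """Place an image on a larger canvas; return canvas size, offset and shifted pixel boxes."""
    ratio = rng.uniform(1.0, max_ratio)  # side ratio; 4x per side = 16x area
    canvas_w, canvas_h = int(img_w * ratio), int(img_h * ratio)
    left = rng.randint(0, canvas_w - img_w)  # random position of the image on the canvas
    top = rng.randint(0, canvas_h - img_h)
    shifted = [(x0 + left, y0 + top, x1 + left, y1 + top) for x0, y0, x1, y1 in boxes]
    return (canvas_w, canvas_h), (left, top), shifted


if __name__ == "__main__":
    rng = random.Random(0)
    print(zoom_out(300, 200, [(10, 20, 110, 120)], rng))
```

After this step the random crop is applied to the whole canvas, and once the result is resized back to 300x300 the objects end up smaller relative to the input, which is what creates the extra small training examples.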

[Figure: results with the zoom-out data augmentation]

Comparing the left plots top and bottom, the lower end of the curves (small objects) improves considerably.

5.7 Inference time

[Table: speed, number of boxes and input resolution of each method on PASCAL VOC2007 test]

Why SSD is fast


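As shown above, Faster R-CNN with a 1000x600 input produces 6000 boxes, SSD300 with a 300x300 input produces 8732 boxes, and SSD512 with a 512x512 input produces 24564 boxes. Now imagine SSD also running at 1000x600: how many boxes would it produce? Probably a very large number! Yet SSD claims to be much faster than Faster R-CNN, YOLO and similar algorithms, so let us analyze why.

Reason 1: SSD is a single-stage network that produces its results in one stage, whereas Faster R-CNN is a two-stage network; although Faster R-CNN has far fewer boxes, it needs many forward and backward passes (during training) and alternates training between two networks;

Reason 2: Faster R-CNN has to train not only an RPN but also a Fast R-CNN head, whereas SSD is essentially an optimized RPN with no subsequent detection stage, and the forward pass of that second stage alone costs a lot of time;

Reason 3: although YOLO looks simpler than SSD, it contains large FC layers, and conv layers have far fewer parameters than FC layers; YOLO's way of obtaining candidate boxes is also relatively time-consuming;

Reason 4: SSD modifies the VGG architecture by replacing its FC layers with conv layers, which greatly improves speed, because VGG's FC layers are computationally heavy and have many parameters to evaluate in the forward pass;

Reason 5: it uses the atrous algorithm, which the paper explicitly states gives about a 20% speed-up;

Reason 6: SSD fixes the input size, resizing images of any size to 300x300 or 512x512; compared with Faster R-CNN, this alone saves a lot of computation at the input, never mind the later stages, so no wonder it is fast!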
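The box counts quoted above follow from summing side × side × k over the prediction layers. The SSD512 layout below (seven layers ending at conv12_2, with 4, 6, 6, 6, 6, 4, 4 boxes per location) is an assumption, but it reproduces the 24564 figure, just as the SSD300 layout gives 8732.

```python
# (feature-map side, default boxes per location) for each prediction layer.
# The SSD512 layout is assumed; SSD300 follows the layers listed in these notes.
CONFIGS = {
    "SSD300": [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)],
    "SSD512": [(64, 4), (32, 6), (16, 6), (8, 6), (4, 6), (2, 4), (1, 4)],
}


def count_boxes(layers):
    """Total default boxes = sum over layers of side * side * k."""
    return sum(side * side * k for side, k in layers)


if __name__ == "__main__":
    for name, layers in CONFIGS.items():
        print(name, count_boxes(layers))  # SSD300 8732, SSD512 24564
```

Most of the boxes come from the largest map (38x38 or 64x64), so the count grows roughly with input area, which is why the 1000x600 thought experiment above would give a very large number.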

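6 Conclusion (own)

Default boxes, haha, are just anchors
The sampling strategy used in data augmentation alone adds 8.8 points, impressive
To the best of our knowledge
Whether SSD is good depends on your application and requirements; the detector that truly fits your scenario has to be validated by you. For example, if your scenes are dense and contain many small objects, I strongly recommend Faster R-CNN, which can also be sped up by optimizing for the specific network; if your application has strict speed requirements, SSD should certainly be considered first. As for results on benchmark test sets, they can differ greatly from real data, so an algorithm's performance needs further evaluation.
SSD mainly improves on YOLO in three ways:
- multi-scale feature maps
- convolutional predictors for detection (not FC)
- preset prior boxes.
Why does SSD use a Permute layer?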
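One likely answer, assuming Caffe's N×C×H×W blob layout: a localization head stores its 4k values per location along the channel axis, so flattening its output directly interleaves different locations. Permuting to N×H×W×C first makes all values of one location contiguous, so a reshape yields one row per default box in the same location-major order as the default boxes, and the outputs of feature maps with different sizes can then be concatenated. The sketch below illustrates this with nested lists for a single image; the function names are hypothetical.

```python
def flatten_chw(t):
    """Flatten a C x H x W nested list in (C, H, W) order, i.e. without a permute."""
    C, H, W = len(t), len(t[0]), len(t[0][0])
    return [t[c][h][w] for c in range(C) for h in range(H) for w in range(W)]


def flatten_hwc(t):
    """Permute to H x W x C, then flatten: all channels of one location are contiguous."""
    C, H, W = len(t), len(t[0]), len(t[0][0])
    return [t[c][h][w] for h in range(H) for w in range(W) for c in range(C)]


def group(flat, n=4):
    """Split a flat list into consecutive n-tuples (one tuple per default box)."""
    return [tuple(flat[i:i + n]) for i in range(0, len(flat), n)]


if __name__ == "__main__":
    # Hypothetical localization head: k = 1 box per location, 4 offsets, on a 1 x 2 map.
    t = [[[f"loc{w}_off{c}" for w in range(2)]] for c in range(4)]
    print(group(flatten_chw(t))[0])  # ('loc0_off0', 'loc1_off0', 'loc0_off1', 'loc1_off1'): mixed
    print(group(flatten_hwc(t))[0])  # ('loc0_off0', 'loc0_off1', 'loc0_off2', 'loc0_off3'): one box
```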
