Object Detection: R-CNN
- Paper Outline
- Main Idea
- Pipeline Diagram
- Pipeline Description
- Key Mathematical Formulas
- Key Points
- References (R-CNN)
Paper Outline
Title: Rich feature hierarchies for accurate object detection and semantic segmentation
Abstract
1 Introduction
2 Object detection with R-CNN
2.1 Module design
2.2 Test-time detection
2.3 Training
2.4 Results on PASCAL VOC 2010-12
2.5 Results on ILSVRC2013 detection
3 Visualization, ablation, and modes of error
3.1 Visualizing learned features
3.2 Ablation studies
3.3 Network architectures
3.4 Detection error analysis
3.5 Bounding-box regression
3.6 Qualitative results
4 The ILSVRC2013 detection dataset
4.1 Dataset overview
4.2 Region proposals
4.3 Training data
4.4 Validation and evaluation
4.5 Ablation study
4.6 Relationship to OverFeat
5 Semantic segmentation
6 Conclusion
Appendix
A. Object proposal transformations
B. Positive vs. negative examples and softmax
C. Bounding-box regression
D. Additional feature visualizations
E. Per-category segmentation results
F. Analysis of cross-dataset redundancy
G. Document changelog
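Main Idea
The approach mirrors how people look for objects. To decide what an object is (classification) and where it is (localization), a person scans different regions of the whole image. The authors therefore generate many region proposals for each image, then decide for each one what it contains (classification) and in which direction its box should be adjusted (bounding-box regression).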
Pipeline Diagram
Pipeline Description
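(1) Take an input image.
(2) Extract about 2000 region proposals using selective search.
(3) Warp each proposal to a fixed size (227×227 in the paper) and feed it to the CNN. Each region proposal yields exactly one feature vector (4096-dimensional in the paper), a one-to-one mapping.
(4) Classify each proposal's feature vector with class-specific linear SVMs.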
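Steps (2) to (4) can be sketched in NumPy as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the proposals are hard-coded instead of coming from selective search, and the CNN is replaced by a hypothetical random linear projection `toy_cnn_features`. It shows the one-to-one mapping from warped proposal to feature vector, and that scoring every proposal with class-specific linear SVMs is a single matrix product (the paper notes this feature matrix is typically 2000×4096). The warp is a plain nearest-neighbour stretch and omits the context padding the paper adds around each box.

```python
import numpy as np

def warp_region(image, box, size=227):
    # box = (x1, y1, x2, y2) in inclusive pixel coordinates (assumed convention).
    x1, y1, x2, y2 = box
    crop = image[y1:y2 + 1, x1:x2 + 1]
    h, w = crop.shape[:2]
    # Nearest-neighbour index maps: stretch to size x size, ignoring aspect ratio.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return crop[rows][:, cols]

def toy_cnn_features(warped, proj):
    # HYPOTHETICAL stand-in for the CNN: a fixed linear projection of the pixels.
    return warped.reshape(-1).astype(np.float64) @ proj

def score_proposals(features, svm_w, svm_b):
    # One matrix product scores every proposal against every class:
    # (num_proposals x feat_dim) @ (feat_dim x num_classes).
    return features @ svm_w + svm_b

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
proposals = [(0, 0, 99, 49), (50, 60, 249, 259), (10, 10, 399, 299)]  # toy boxes
feat_dim, num_classes, size = 16, 3, 227
proj = rng.standard_normal((size * size * 3, feat_dim)) * 1e-3

# One feature vector per proposal (one-to-one).
features = np.stack([toy_cnn_features(warp_region(image, b, size), proj)
                     for b in proposals])
svm_w = rng.standard_normal((feat_dim, num_classes))
svm_b = np.zeros(num_classes)
scores = score_proposals(features, svm_w, svm_b)
print(features.shape, scores.shape)  # (3, 16) (3, 3)
```

Batching the features into one matrix is what makes the SVM step cheap compared with running the CNN on every warped proposal.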
Key Mathematical Formulas
Computing the predicted bounding box from the predicted regression values
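As we read Appendix C of the paper, a proposal $P$ is given by its centre $(P_x, P_y)$, width $P_w$ and height $P_h$. The regressor outputs four values $d_x(P), d_y(P), d_w(P), d_h(P)$, and the predicted box $\hat{G}$ is:

```latex
\begin{aligned}
\hat{G}_x &= P_w\, d_x(P) + P_x \\
\hat{G}_y &= P_h\, d_y(P) + P_y \\
\hat{G}_w &= P_w \exp\big(d_w(P)\big) \\
\hat{G}_h &= P_h \exp\big(d_h(P)\big)
\end{aligned}
```

The first two lines are scale-invariant translations of the box centre; the last two are log-space scalings of the width and height.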
Regression objective (loss) function
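In the paper each $d_\star(P)$, with $\star \in \{x, y, w, h\}$, is a linear function of the proposal's pool5 features $\phi_5(P)$, i.e. $d_\star(P) = \mathbf{w}_\star^{\mathsf T}\phi_5(P)$. The weights are learned by regularized least squares (ridge regression) against the targets $t_\star^i$ of the $N$ training pairs:

```latex
\mathbf{w}_\star = \operatorname*{argmin}_{\hat{\mathbf{w}}_\star}
  \sum_{i=1}^{N} \big(t_\star^i - \hat{\mathbf{w}}_\star^{\mathsf T}\phi_5(P^i)\big)^2
  + \lambda \,\lVert \hat{\mathbf{w}}_\star \rVert^2
```

The paper sets $\lambda = 1000$ using a validation set.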
Bounding-box regression training targets (labels)
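For a training pair of proposal $P$ and ground-truth box $G$, the targets in the paper are $t_x = (G_x - P_x)/P_w$, $t_y = (G_y - P_y)/P_h$, $t_w = \log(G_w/P_w)$, $t_h = \log(G_h/P_h)$, and a proposal is only used if its IoU with its assigned ground-truth box exceeds 0.6. The NumPy sketch below encodes these targets, inverts them with the prediction formulas, and fits one coordinate's regressor by closed-form ridge regression. It assumes boxes are stored as (centre x, centre y, width, height) rows; the function names and toy boxes are illustrative, not taken from the authors' code.

```python
import numpy as np

def encode_targets(P, G):
    """P, G: (N, 4) arrays of (x, y, w, h) with (x, y) the box centre.
    Returns regression targets (t_x, t_y, t_w, t_h) per row."""
    tx = (G[:, 0] - P[:, 0]) / P[:, 2]
    ty = (G[:, 1] - P[:, 1]) / P[:, 3]
    tw = np.log(G[:, 2] / P[:, 2])
    th = np.log(G[:, 3] / P[:, 3])
    return np.stack([tx, ty, tw, th], axis=1)

def decode_boxes(P, d):
    """Apply predicted offsets d = (d_x, d_y, d_w, d_h) to proposals P."""
    gx = P[:, 2] * d[:, 0] + P[:, 0]
    gy = P[:, 3] * d[:, 1] + P[:, 1]
    gw = P[:, 2] * np.exp(d[:, 2])
    gh = P[:, 3] * np.exp(d[:, 3])
    return np.stack([gx, gy, gw, gh], axis=1)

def fit_ridge(phi, t, lam=1000.0):
    """Closed-form ridge regression for one coordinate:
    w = (phi^T phi + lam * I)^-1 phi^T t, with phi of shape (N, D), t of shape (N,).
    lam=1000 mirrors the value reported in the paper."""
    D = phi.shape[1]
    return np.linalg.solve(phi.T @ phi + lam * np.eye(D), phi.T @ t)

P = np.array([[50.0, 40.0, 20.0, 10.0], [100.0, 80.0, 30.0, 60.0]])
G = np.array([[52.0, 38.0, 25.0, 12.0], [95.0, 85.0, 28.0, 66.0]])
t = encode_targets(P, G)
# Feeding the exact targets back through the prediction formulas recovers G.
print(np.allclose(decode_boxes(P, t), G))  # True
```

Encoding and decoding are exact inverses, which is why a regressor that predicts $t_\star$ well also places the box well.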
Key Points
(1) At test time, we score each proposal and predict its new detection window only once. In principle, we could iterate this procedure (i.e., re-score the newly predicted bounding box, and then predict a new bounding box from it, and so on). However, we found that iterating does not improve results.
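So the authors did try iterative bounding-box regression, but it did not improve results. Many later works have pursued this direction; the most successful so far is Cascade R-CNN: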
Cascade R-CNN: Delving into High Quality Object Detection
(2) It is worth noting that OverFeat has a significant speed advantage over R-CNN: it is about 9x faster, based on a figure of 2 seconds per image quoted from [34]. This speed comes from the fact that OverFeat's sliding windows (i.e., region proposals) are not warped at the image level and therefore computation can be easily shared between overlapping windows. Sharing is implemented by running the entire network in a convolutional fashion over arbitrary-sized inputs. Speeding up R-CNN should be possible in a variety of ways and remains as future work.
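The authors also point out that OverFeat, which uses sliding windows as its initial proposals, is about 9× faster than R-CNN because its windows are not warped at the image level, so computation can be shared between overlapping windows. They leave speeding up R-CNN as future work, and sure enough, later detectors went in this direction.
For raw speed, look at the one-stage detectors YOLO, SSD, and RetinaNet. Among two-stage detectors, Faster R-CNN also uses sliding windows as initial proposals, except that they are called anchors: each sliding-window position on the feature map corresponds to several anchors (9 by default: 3 scales × 3 aspect ratios).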
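To make the sliding-window/anchor idea concrete, here is a minimal NumPy sketch of Faster R-CNN-style anchor generation: every position of a feature map with stride `stride` gets one anchor per (scale, aspect ratio) pair. The defaults (stride 16, anchor areas 128², 256², 512², ratios 0.5, 1, 2) follow the usual Faster R-CNN setup, but the function name `make_anchors`, placing anchor centres at cell centres, and the (x1, y1, x2, y2) output layout are our own assumptions rather than that paper's code.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * len(scales) * len(ratios), 4) anchors as (x1, y1, x2, y2)."""
    # Base shapes centred at the origin: area s^2 and aspect ratio r = h / w
    # give w = s / sqrt(r), h = s * sqrt(r).
    base = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)                               # (k, 4)
    # Image-space centre of each sliding-window position (assumed at cell centres).
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                        # each (feat_h, feat_w)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    # Broadcasting gives every position all k base anchors.
    return (shifts + base[None, :, :]).reshape(-1, 4)

anchors = make_anchors(feat_h=2, feat_w=3)
print(anchors.shape)  # (54, 4): 6 positions x 9 anchors
```

Because the anchors are fixed in advance and all positions share one convolutional pass over the image, no per-window warping is needed, which is exactly the kind of sharing the OverFeat quote describes.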
References (R-CNN)
[1]: Rich feature hierarchies for accurate object detection and semantic segmentation