CVPR 2018 Object Detection

Cascade R-CNN: Delving into High Quality Object Detection

In object detection, an intersection over union (IoU) threshold is required to define positives and negatives. An object detector trained with a low IoU threshold, e.g., 0.5, usually produces noisy detections. However, detection performance tends to degrade as the IoU threshold increases. Two main factors are responsible for this: 1) overfitting during training, due to exponentially vanishing positive samples, and 2) an inference-time mismatch between the IoUs for which the detector is optimal and those of the input hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, is proposed to address these problems. It consists of a sequence of detectors trained with increasing IoU thresholds, which are sequentially more selective against close false positives. The detectors are trained stage by stage, leveraging the observation that the output of a detector is a good distribution for training the next, higher-quality detector. The resampling of progressively improved hypotheses guarantees that all detectors have a positive set of examples of equivalent size, reducing the overfitting problem. The same cascade procedure is applied at inference, enabling a closer match between the hypotheses and the detector quality of each stage. A simple implementation of the Cascade R-CNN is shown to surpass all single-model object detectors on the challenging COCO dataset. Experiments also show that the Cascade R-CNN is widely applicable across detector architectures, achieving consistent gains independently of the baseline detector strength. The code will be made available at https://github.com/zhaoweicai/cascade-rcnn.

In object detection, an intersection-over-union (IoU) threshold is needed to define positive and negative samples. A detector trained with a low IoU threshold such as 0.5 usually produces noisy detections, yet performance tends to degrade as the threshold is raised. Two factors are behind this: 1) overfitting during training, because positive samples vanish exponentially at higher thresholds, and 2) a mismatch at inference time between the IoU for which the detector is optimal and the IoU of the input hypotheses. To address these problems, a multi-stage detection architecture, the Cascade R-CNN, is proposed. It consists of a sequence of detectors trained with increasing IoU thresholds, each stage more selective against close false positives. The detectors are trained stage by stage, exploiting the observation that the output of one detector forms a good distribution for training the next, higher-quality one. Resampling the progressively improved hypotheses guarantees that every detector sees a positive set of comparable size, which reduces overfitting. The same cascade is applied at inference, so the hypotheses better match the quality of the detector at each stage. A simple implementation of the Cascade R-CNN surpasses all single-model object detectors on the challenging COCO dataset, and experiments show it applies broadly across detector architectures, with consistent gains regardless of the strength of the baseline detector. The code will be released at https://github.com/zhaoweicai/cascade-rcnn.
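
To make the stage-wise resampling concrete, here is a minimal sketch of how one stage's positives could be labeled at its IoU threshold, using the 0.5/0.6/0.7 thresholds the paper reports; the `pairwise_iou` helper, the (x1, y1, x2, y2) box format, and the surrounding training loop are illustrative assumptions, not the authors' released code.

```python
import torch

def pairwise_iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Pairwise IoU between two sets of (x1, y1, x2, y2) boxes."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.max(a[:, None, :2], b[None, :, :2])  # intersection top-left
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def stage_positives(proposals, gt_boxes, iou_threshold):
    """A proposal is positive at this stage if it overlaps some ground-truth
    box by at least the stage's IoU threshold."""
    best_iou, _ = pairwise_iou(proposals, gt_boxes).max(dim=1)
    return best_iou >= iou_threshold

# Each stage's regressor improves the boxes before the next, stricter stage
# re-labels them, which is what keeps the positive sets of comparable size.
stage_thresholds = [0.5, 0.6, 0.7]
```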

Relation Network for Object Detection

Although it has been widely believed for years that modeling relations between objects would help object recognition, there has been no evidence that the idea works in the deep learning era. All state-of-the-art object detection systems still rely on recognizing object instances individually, without exploiting their relations during learning.

This work proposes an object relation module. It processes a set of objects simultaneously through interaction between their appearance feature and geometry, thus allowing modeling of their relations. It is lightweight and in-place. It does not require additional supervision and is easy to embed in existing networks. It is shown to be effective at improving the object recognition and duplicate removal steps in the modern object detection pipeline, verifying the efficacy of modeling object relations in CNN-based detection. It gives rise to the first fully end-to-end object detector.

Although it has long been believed that modeling the relations between objects would help object recognition, there has been no evidence that the idea works in the deep learning era. All state-of-the-art object detection systems still recognize object instances individually, without exploiting their relations during learning.

This work proposes an object relation module. It processes a set of objects simultaneously through interaction between their appearance features and their geometry, which allows their relations to be modeled. The module is lightweight and in-place: it requires no additional supervision and is easy to embed into existing networks. It proves effective at improving both the object recognition and the duplicate removal steps of the modern detection pipeline, verifying the value of modeling object relations in CNN-based detection, and it yields the first fully end-to-end object detector.
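
As a rough illustration of what such a module can look like, the sketch below runs scaled dot-product attention over per-object appearance features and adds a learned geometry term to the attention logits; the feature dimension, the 4-D pairwise geometry encoding, and the single-head design are simplifying assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationSketch(nn.Module):
    """Single-head relation step: appearance attention biased by geometry."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.geo = nn.Linear(4, 1)  # toy stand-in for the geometry embedding

    def forward(self, feats: torch.Tensor, geometry: torch.Tensor) -> torch.Tensor:
        # feats: (N, dim) appearance features for N objects;
        # geometry: (N, N, 4) pairwise relative box encodings.
        appearance = self.q(feats) @ self.k(feats).t() / feats.size(1) ** 0.5
        geo_weight = F.relu(self.geo(geometry)).squeeze(-1)  # (N, N)
        attn = torch.softmax(appearance + torch.log(geo_weight.clamp(min=1e-6)), dim=1)
        return feats + attn @ self.v(feats)  # residual add keeps it in-place
```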

An Analysis of Scale Invariance in Object Detection-SNIP

An analysis of different techniques for recognizing and detecting objects under extreme scale variation is presented. Scale-specific and scale-invariant designs of detectors are compared by training them with different configurations of input data. To examine whether upsampling images is necessary for detecting small objects, we evaluate the performance of different network architectures for classifying small objects on ImageNet. Based on this analysis, we propose a deep end-to-end trainable Image Pyramid Network for object detection that operates on the same image scales during training and inference. Since small and large objects are difficult to recognize at smaller and larger scales, respectively, we present a novel training scheme called Scale Normalization for Image Pyramids (SNIP) which selectively back-propagates the gradients of object instances of different sizes as a function of the image scale. On the COCO dataset, our single model performance is 45.7% and an ensemble of 3 networks obtains an mAP of 48.3%. We use ImageNet-1000 pre-trained models and only train with bounding box supervision. Our submission won the Best Student Entry in the COCO 2017 challenge. Code will be made available at http://bit.ly/2yXVg4c.

Different techniques for recognizing and detecting objects under extreme scale variation are analyzed. Scale-specific and scale-invariant detector designs are compared by training them on different configurations of the input data. To examine whether upsampling images is necessary for detecting small objects, the performance of different network architectures for classifying small objects on ImageNet is evaluated. Based on this analysis, a deep, end-to-end trainable Image Pyramid Network for object detection is proposed that operates on the same image scales during training and inference. Since small objects are hard to recognize at smaller scales and large objects at larger ones, a novel training scheme called Scale Normalization for Image Pyramids (SNIP) is presented, which selectively back-propagates the gradients of object instances of different sizes as a function of the image scale. On the COCO dataset, the single-model performance is 45.7%, and an ensemble of 3 networks obtains an mAP of 48.3%, using ImageNet-1000 pre-trained models and training only with bounding-box supervision. The submission won the Best Student Entry in the COCO 2017 challenge. Code will be made available at http://bit.ly/2yXVg4c.
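
The selective back-propagation at the heart of the scheme can be sketched as a per-instance loss mask: at each image scale, only instances whose size falls inside that scale's valid range contribute gradients. The size measure and the range values below are made-up placeholders.

```python
import torch

def snip_loss(per_instance_loss: torch.Tensor,
              instance_sizes: torch.Tensor,
              valid_range: tuple) -> torch.Tensor:
    """Zero out the loss of instances whose size at this image scale lies
    outside the scale's valid range, so their gradients never flow back."""
    lo, hi = valid_range
    mask = (instance_sizes >= lo) & (instance_sizes <= hi)
    return (per_instance_loss * mask).sum()

# Hypothetical usage: at a low-resolution scale, only larger instances train.
loss = snip_loss(torch.rand(6, requires_grad=True),
                 torch.tensor([12.0, 40.0, 90.0, 160.0, 320.0, 700.0]),
                 valid_range=(120.0, float("inf")))
loss.backward()
```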

R-FCN-3000 at 30fps: Decoupling Detection and Classification

We present R-FCN-3000, a large-scale real-time object detector in which objectness detection and classification are decoupled. To obtain the detection score for an RoI, we multiply the objectness score with the fine-grained classification score. Our approach is a modification of the R-FCN architecture in which position-sensitive filters are shared across different object classes for performing localization. For fine-grained classification, these position-sensitive filters are not needed. R-FCN-3000 obtains an mAP of 34.9% on the ImageNet detection dataset and outperforms YOLO9000 by 18% while processing 30 images per second. We also show that the objectness learned by R-FCN-3000 generalizes to novel classes, and that performance increases with the number of training object classes, supporting the hypothesis that it is possible to learn a universal objectness detector. Code will be made available.

R-FCN-3000, a large-scale real-time object detector in which objectness detection and classification are decoupled, is presented. To obtain the detection score for an RoI, the objectness score is multiplied by the fine-grained classification score. The approach modifies the R-FCN architecture so that position-sensitive filters are shared across different object classes for localization; for fine-grained classification, these position-sensitive filters are not needed. R-FCN-3000 obtains an mAP of 34.9% on the ImageNet detection dataset and outperforms YOLO9000 by 18% while processing 30 images per second. The objectness learned by R-FCN-3000 is also shown to generalize to novel classes, and performance increases with the number of training classes, supporting the hypothesis that a universal objectness detector can be learned. Code will be made available.
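
The decoupled scoring itself is a one-liner; the sketch below assumes a sigmoid objectness head and a softmax over the fine-grained classes, which is one plausible reading of the decoupling rather than the paper's exact parameterization.

```python
import torch

def detection_scores(objectness_logits: torch.Tensor,
                     class_logits: torch.Tensor) -> torch.Tensor:
    """Per-class detection score for each RoI: P(object) * P(class | object)."""
    p_obj = torch.sigmoid(objectness_logits)     # (num_rois, 1)
    p_cls = torch.softmax(class_logits, dim=-1)  # (num_rois, num_classes)
    return p_obj * p_cls

# e.g. scoring 4 RoIs over 3000 fine-grained classes:
scores = detection_scores(torch.randn(4, 1), torch.randn(4, 3000))
```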

Single-Shot Refinement Neural Network for Object Detection

For object detection, the two-stage approach (e.g., Faster R-CNN) has been achieving the highest accuracy, whereas the one-stage approach (e.g., SSD) has the advantage of high efficiency. To inherit the merits of both while overcoming their disadvantages, in this paper, we propose a novel single-shot based detector, called RefineDet, that achieves better accuracy than two-stage methods and maintains efficiency comparable to one-stage methods. RefineDet consists of two inter-connected modules, namely, the anchor refinement module and the object detection module. Specifically, the former aims to (1) filter out negative anchors to reduce search space for the classifier, and (2) coarsely adjust the locations and sizes of anchors to provide better initialization for the subsequent regressor. The latter module takes the refined anchors as the input from the former to further improve the regression and predict multi-class labels. Meanwhile, we design a transfer connection block to transfer the features in the anchor refinement module to predict locations, sizes and class labels of objects in the object detection module. The multi-task loss function enables us to train the whole network in an end-to-end way. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO demonstrate that RefineDet achieves state-of-the-art detection accuracy with high efficiency. Code is available at https://github.com/sfzhang15/RefineDet.

For object detection, the two-stage approach (e.g., Faster R-CNN) has achieved the highest accuracy, while the one-stage approach (e.g., SSD) has the advantage of high efficiency. To inherit the merits of both while overcoming their disadvantages, this paper proposes a novel single-shot detector called RefineDet, which achieves better accuracy than two-stage methods while keeping the efficiency of one-stage methods. RefineDet consists of two inter-connected modules: the anchor refinement module and the object detection module. The former aims to (1) filter out negative anchors to reduce the search space for the classifier, and (2) coarsely adjust the locations and sizes of anchors to better initialize the subsequent regressor. The latter takes the refined anchors from the former as input to further improve the regression and predict multi-class labels. Meanwhile, a transfer connection block is designed to pass the features of the anchor refinement module on to the object detection module for predicting the locations, sizes, and class labels of objects. The multi-task loss function allows the whole network to be trained end to end. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO demonstrate that RefineDet achieves state-of-the-art detection accuracy with high efficiency. Code is available at https://github.com/sfzhang15/RefineDet.
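
The refinement step reduces to: drop anchors the first module is nearly certain are background, then shift and rescale the rest by its predicted offsets before the detection module sees them. The sketch below assumes (cx, cy, w, h) anchors with a Faster R-CNN-style delta encoding; the 0.99 negative-confidence cutoff follows the paper's reported setting.

```python
import torch

def refine_anchors(anchors: torch.Tensor,
                   neg_conf: torch.Tensor,
                   deltas: torch.Tensor,
                   theta: float = 0.99):
    """anchors: (N, 4) as (cx, cy, w, h); neg_conf: (N,) background
    probability from the anchor refinement module; deltas: (N, 4) offsets."""
    keep = neg_conf < theta                           # filter easy negatives
    refined = anchors.clone()
    refined[:, :2] += deltas[:, :2] * anchors[:, 2:]  # move the centers
    refined[:, 2:] *= torch.exp(deltas[:, 2:])        # rescale width/height
    return refined[keep], keep  # refined anchors fed to the detection module
```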

MegDet: A Large Mini-Batch Object Detector

The development of object detection in the era of deep learning, from R-CNN [11] and Fast/Faster R-CNN [10, 31] to the recent Mask R-CNN [14] and RetinaNet [24], has mainly come from novel networks, new frameworks, or loss designs. However, mini-batch size, a key factor in the training of deep neural networks, has not been well studied for object detection. In this paper, we propose a Large Mini-Batch Object Detector (MegDet) to enable training with a large mini-batch size of up to 256, so that we can effectively utilize up to 128 GPUs to significantly shorten the training time. Technically, we suggest a warmup learning rate policy and Cross-GPU Batch Normalization, which together allow us to successfully train a large mini-batch detector in much less time (e.g., from 33 hours to 4 hours), and achieve even better accuracy. The MegDet is the backbone of our submission (mmAP 52.5%) to the COCO 2017 Challenge, where we won 1st place in the Detection task.

Progress in object detection in the deep learning era, from R-CNN [11] and Fast/Faster R-CNN [10, 31] to the recent Mask R-CNN [14] and RetinaNet [24], has come mainly from novel networks, new frameworks, or loss designs. However, mini-batch size, a key factor in training deep neural networks, has not been well studied for object detection. This paper proposes a Large Mini-Batch Object Detector (MegDet) that enables training with a mini-batch size of up to 256, so that up to 128 GPUs can be used to shorten training time significantly. Technically, a warmup learning-rate policy and Cross-GPU Batch Normalization are suggested, which together allow a large mini-batch detector to be trained in far less time (e.g., from 33 hours down to 4) with even better accuracy. MegDet is the backbone of the submission (mmAP 52.5%) to the COCO 2017 Challenge that won 1st place in the Detection task.
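
A minimal sketch of the two training ingredients, assuming a linear warmup schedule (the exact ramp and constants are not spelled out here) and using PyTorch's stock synchronized batch norm as a stand-in for Cross-GPU Batch Normalization:

```python
import torch

def warmup_lr(step: int, warmup_steps: int,
              base_lr: float, start_lr: float) -> float:
    """Ramp the learning rate linearly from start_lr to the full
    large-batch base_lr over the first warmup_steps iterations."""
    if step >= warmup_steps:
        return base_lr
    return start_lr + (base_lr - start_lr) * step / warmup_steps

# Batch statistics computed across GPUs, as Cross-GPU BN requires, are
# available off the shelf in current PyTorch (inside a distributed job):
# model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
```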

Dynamic Zoom-in Network for Fast Object Detection in Large Images

We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects with varied sizes appear in high-resolution images. Detection progresses in a coarse-to-fine manner, first on a down-sampled version of the image and then on a sequence of higher-resolution regions identified as likely to improve the detection accuracy. Built upon reinforcement learning, our approach consists of a model (R-net) that uses coarse detection results to predict the potential accuracy gain of analyzing a region at a higher resolution, and another model (Q-net) that sequentially selects regions to zoom in on. Experiments on the Caltech Pedestrians dataset show that our approach reduces the number of processed pixels by over 50% without a drop in detection accuracy. The merits of our approach become more significant on a high-resolution test set collected from the YFCC100M dataset, where our approach maintains high detection performance while reducing the number of processed pixels by about 70% and the detection time by over 50%.

A generic framework is introduced that reduces the computational cost of object detection while retaining accuracy in scenarios where objects of varied sizes appear in high-resolution images. Detection proceeds in a coarse-to-fine manner: first on a down-sampled version of the image, then on a sequence of higher-resolution regions identified as likely to improve detection accuracy. Built on reinforcement learning, the approach consists of one model (R-net) that uses the coarse detection results to predict the potential accuracy gain of analyzing a region at higher resolution, and another model (Q-net) that sequentially selects the regions to zoom in on. Experiments on the Caltech Pedestrians dataset show that the approach reduces the number of processed pixels by over 50% with no drop in detection accuracy. The merits of the approach are even more significant on a high-resolution test set collected from the YFCC100M dataset, where it maintains high detection performance while cutting the number of processed pixels by about 70% and the detection time by over 50%.
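
The control flow reduces to a coarse pass followed by a budgeted sequence of zooms. In the sketch below, a greedy pick over predicted gains stands in for the learned Q-net policy, and every callable is a hypothetical placeholder for the corresponding component.

```python
from typing import Callable, List, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height) in full-res pixels

def coarse_to_fine_detect(image,
                          coarse_detect: Callable,    # detector on the downsampled image
                          predict_gain: Callable,     # R-net stand-in: region -> expected gain
                          high_res_detect: Callable,  # detector on a full-resolution crop
                          candidates: List[Region],
                          budget: int = 3) -> list:
    """Detect coarsely, then zoom into the most promising regions."""
    detections = coarse_detect(image)
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        # Greedy stand-in for the Q-net, which the paper trains to make
        # this choice sequentially via reinforcement learning.
        best = max(remaining, key=lambda r: predict_gain(r, detections))
        remaining.remove(best)
        detections = detections + high_res_detect(image, best)
    return detections
```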
