
Paper reading: Fast R-CNN


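  • 1、Paper overview
  • 2、Drawbacks of R-CNN and SPPnet
  • 3、Why SPPnet cannot update the parameters before the SPP layer
  • 4、Multi-task loss
  • 5、Truncated SVD for faster detection
  • 6、Which layers to fine-tune? (which layer to start fine-tuning from for detection)
  • 7、Does multi-task training help?
  • References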




1、Paper overview

In this paper, we streamline the training process for state-of-the-art ConvNet-based object detectors [9, 11]. We propose a single-stage training algorithm that jointly learns to classify object proposals and refine their spatial locations.

The resulting method can train a very deep detection network (VGG16 [20]) 9× faster than R-CNN [9] and 3× faster than SPPnet [11]. At runtime, the detection network processes images in 0.3s (excluding object proposal time) while achieving top accuracy on PASCAL VOC 2012 [7] with a mAP of 66% (vs. 62% for R-CNN).

The Fast R-CNN method has several advantages:

1. Higher detection quality (mAP) than R-CNN, SPPnet
2. Training is single-stage, using a multi-task loss
3. Training can update all network layers



2、Drawbacks of R-CNN and SPPnet

R-CNN, however, has notable drawbacks:

1. Training is a multi-stage pipeline. R-CNN first fine-tunes a ConvNet on object proposals using log loss. Then, it fits SVMs to ConvNet features. These SVMs act as object detectors, replacing the softmax classifier learnt by fine-tuning. In the third training stage, bounding-box regressors are learned.

2. Training is expensive in space and time. For SVM and bounding-box regressor training, features are extracted from each object proposal in each image and written to disk. With very deep networks, such as VGG16, this process takes 2.5 GPU-days for the 5k images of the VOC07 trainval set. These features require hundreds of gigabytes of storage.

3. Object detection is slow. At test-time, features are extracted from each object proposal in each test image. Detection with VGG16 takes 47s / image (on a GPU).


SPPnet also has notable drawbacks.

Like R-CNN, training is a multi-stage pipeline that involves extracting features, fine-tuning a network with log loss, training SVMs, and finally fitting bounding-box regressors. Features are also written to disk. But unlike R-CNN, the fine-tuning algorithm proposed in [11] cannot update the convolutional layers that precede the spatial pyramid pooling. Unsurprisingly, this limitation (fixed convolutional layers) limits the accuracy of very deep networks. (The parameters of the convolutional layers before the SPP layer cannot be updated, which limits performance.)


3、Why SPPnet cannot update the parameters before the SPP layer

The root cause is that back-propagation through the SPP layer is highly inefficient when each training sample (i.e. RoI) comes from a different image, which is exactly how R-CNN and SPPnet networks are trained. The inefficiency stems from the fact that each RoI may have a very large receptive field, often spanning the entire input image. Since the forward pass must process the entire receptive field, the training inputs are large (often the entire image).


Now for the key point: the paper proposes a new RoI sampling strategy for training:

We propose a more efficient training method that takes advantage of feature sharing during training. In Fast R-CNN training, stochastic gradient descent (SGD) mini-batches are sampled hierarchically, first by sampling N images and then by sampling R/N RoIs from each image.

Critically, RoIs from the same image share computation and memory in the forward and backward passes. Making N small decreases mini-batch computation. For example, when using N = 2 and R = 128, the proposed training scheme is roughly 64× faster than sampling one RoI from 128 different images (i.e., the R-CNN and SPPnet strategy).

One concern over this strategy is it may cause slow training convergence because RoIs from the same image are correlated. This concern does not appear to be a practical issue and we achieve good results with N = 2 and R = 128 using fewer SGD iterations than R-CNN.
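The hierarchical sampling described above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the function names (`sample_minibatch`, `conv_forward_passes`), the toy proposal data, and the fallback to sampling with replacement are all assumptions made for the example, and the foreground/background balancing used in real training is left out. It shows where the "roughly 64×" figure comes from: with N = 2 and R = 128, the convolutional layers run on 2 images instead of 128.

```python
import random


def sample_minibatch(rois_by_image, n_images=2, rois_per_batch=128, rng=random):
    """Hierarchical sampling: pick N images first, then R/N RoIs from each."""
    per_image = rois_per_batch // n_images
    image_ids = rng.sample(sorted(rois_by_image), n_images)
    batch = []
    for image_id in image_ids:
        candidates = rois_by_image[image_id]
        if len(candidates) >= per_image:
            chosen = rng.sample(candidates, per_image)
        else:
            # Assumption for this sketch: reuse RoIs if an image has too few.
            chosen = [rng.choice(candidates) for _ in range(per_image)]
        batch.extend((image_id, roi) for roi in chosen)
    return batch


def conv_forward_passes(batch):
    """Full-image conv passes needed when RoIs of one image share features."""
    return len({image_id for image_id, _ in batch})


# Toy data (hypothetical): 200 images with 300 box proposals each.
rng = random.Random(0)
rois_by_image = {
    i: [(x, x, x + 10, x + 10) for x in range(300)] for i in range(200)
}

batch = sample_minibatch(rois_by_image, n_images=2, rois_per_batch=128, rng=rng)
shared = conv_forward_passes(batch)  # images processed with feature sharing
unshared = len(batch)                # one image per RoI (R-CNN / SPPnet style)
print(unshared / shared)             # ratio of full-image conv passes
```

The ratio only counts full-image convolutional passes, which is why the paper states the speedup as "roughly" 64×.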

4、Multi-task loss


Here the encoding of t_i and v_i follows R-CNN: the network predicts offsets, not the coordinates themselves.
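To make the offset encoding and the loss concrete, here is a small Python sketch. It assumes the standard R-CNN parameterization (center shifts scaled by the proposal size, log-space width and height ratios) and the loss form from the Fast R-CNN paper, L = L_cls + λ[u ≥ 1] L_loc with a smooth L1 localization term. The function names, the (x1, y1, x2, y2) box format, and the convention that class 0 is background are choices made for this example.

```python
import math


def encode_offsets(proposal, gt):
    """Regression targets v = (vx, vy, vw, vh) for (x1, y1, x2, y2) boxes."""
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    px, py = proposal[0] + 0.5 * pw, proposal[1] + 0.5 * ph
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gx, gy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    # Center shifts are scale-invariant; sizes are compared in log space.
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def smooth_l1(x):
    """Quadratic near zero, linear elsewhere (less sensitive to outliers)."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def multi_task_loss(class_probs, true_class, pred_offsets, target_offsets,
                    lam=1.0):
    """L = L_cls + lam * [u >= 1] * L_loc, with class 0 as background."""
    loss_cls = -math.log(class_probs[true_class])
    if true_class == 0:
        # Background RoIs have no ground-truth box, so L_loc is skipped.
        return loss_cls
    loss_loc = sum(smooth_l1(t - v)
                   for t, v in zip(pred_offsets, target_offsets))
    return loss_cls + lam * loss_loc


# A ground-truth box shifted right by half the proposal width.
v = encode_offsets((0, 0, 10, 10), (5, 0, 15, 10))
print(v)  # (0.5, 0.0, 0.0, 0.0)
print(multi_task_loss([0.5, 0.5], 1, (0.0, 0.0, 0.0, 0.0), v))
```

Here `pred_offsets` stands for the offsets predicted for the true class u, so only that class's regressor contributes to the loss.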

5、Truncated SVD for faster detection
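As a brief reminder of the idea behind this heading, the following is a sketch of the factorization as described in the Fast R-CNN paper. The symbols (W for a fully connected layer's u × v weight matrix, t for the number of singular values kept) follow the paper's notation as I recall it and should be read as assumptions rather than a derivation from these notes.

```latex
% Keep only the t largest singular values of the u x v weight matrix W
W \approx U \, \Sigma_t \, V^{T},
\qquad U \in \mathbb{R}^{u \times t},\;
\Sigma_t \in \mathbb{R}^{t \times t},\;
V \in \mathbb{R}^{v \times t}

% Parameter count drops when t is much smaller than min(u, v)
u v \;\longrightarrow\; t\,(u + v)
```

In practice the single layer is replaced by two fully connected layers with no non-linearity between them: the first uses Σ_t Vᵀ as its weights (no bias) and the second uses U (with the original biases). This matters at detection time because the fully connected layers are evaluated once per RoI.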


6、Which layers to fine-tune? (which layer to start fine-tuning from for detection)




7、Does multi-task training help?



References

1、Fast R-CNN

2、Fast R-CNN study summary
