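Special Layers for Object Detection: the ROIPooling Layer

What ROI Pooling Means

As the name suggests, RoI Pooling is a kind of pooling layer, one that pools over RoIs. Its defining property is that the size of the input feature map is not fixed, while the size of the output feature map is fixed.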

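What is an RoI?
RoI is short for Region of Interest and refers to a "box on the feature map":
1) In Fast RCNN, an RoI is the projection onto the feature map of a candidate box produced by Selective Search, as shown in Figure 1;
2) In Faster RCNN, the candidate boxes are generated by the RPN, and each candidate box is then projected onto the feature map to obtain the RoIs.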

Figure 1: Overall structure of Fast RCNN

The RPN usually outputs more than one rectangular box, so pooling here is performed over multiple RoIs.

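Inputs to ROI Pooling

The input consists of two parts:

1. Feature map: the feature map shown in Figure 1. In Fast RCNN it sits immediately before RoI Pooling; in Faster RCNN it is the feature map shared with the RPN, commonly called "share_conv";

2. rois: in Fast RCNN these are the output of Selective Search; in Faster RCNN they are the output of the RPN, a set of rectangular candidate boxes, each of shape 1x5x1x1 (4 coordinates plus a batch index). Note that the coordinates are expressed relative to the original image (the very first input to the network), not relative to the feature map.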
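The five-value RoI layout and the change of reference frame can be sketched as follows. This is a minimal illustration, not code from Caffe: `FeatureBox` and `MapRoiToFeatureMap` are hypothetical names, and the scale of 1/16 used in the usage note is an assumption that corresponds to a backbone with a total stride of 16.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: an RoI expressed in feature-map cells.
struct FeatureBox {
  int batch_index;                     // which image of the batch the RoI belongs to
  int start_w, start_h, end_w, end_h;  // inclusive cell coordinates
};

// roi holds the five values described above: [batch_index, x1, y1, x2, y2],
// with coordinates given in pixels of the original input image.
// spatial_scale is feature-map size / input-image size (assumed, e.g. 1/16).
inline FeatureBox MapRoiToFeatureMap(const float roi[5], float spatial_scale) {
  FeatureBox box;
  box.batch_index = static_cast<int>(roi[0]);
  box.start_w = static_cast<int>(std::round(roi[1] * spatial_scale));
  box.start_h = static_cast<int>(std::round(roi[2] * spatial_scale));
  box.end_w = static_cast<int>(std::round(roi[3] * spatial_scale));
  box.end_h = static_cast<int>(std::round(roi[4] * spatial_scale));
  return box;
}
```

With a scale of 1/16, an RoI of `{0, 32, 64, 319, 255}` in image pixels lands on feature-map cells x in [2, 20] and y in [4, 16].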

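Output of ROI Pooling

The output is batch vectors, where batch equals the number of RoIs and each vector has size channel * w * h. RoI Pooling therefore maps box rectangles of different sizes onto rectangles of one fixed size (w * h).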
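To make the output size concrete, here is a small sketch of the shape arithmetic. The helper names are hypothetical, and the numbers in the usage note (128 RoIs, 512 channels, a 7 x 7 pooled grid) are assumed example values rather than something stated in this article.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: output shape in NCHW order, where N is the number of RoIs.
inline std::array<int, 4> PooledOutputShape(int num_rois, int channels,
                                            int pooled_h, int pooled_w) {
  std::array<int, 4> shape = {{num_rois, channels, pooled_h, pooled_w}};
  return shape;
}

// Hypothetical helper: length of the flattened vector produced for one RoI,
// i.e. channel * w * h.
inline std::size_t PooledVectorLength(int channels, int pooled_h, int pooled_w) {
  return static_cast<std::size_t>(channels) * pooled_h * pooled_w;
}
```

For the assumed values above, the output shape is 128 x 512 x 7 x 7 and each RoI yields a vector of 512 * 7 * 7 = 25088 elements, regardless of how large the RoI was on the feature map.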

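The ROI Pooling Process

First, the coordinates in each roi are mapped onto the feature map. The mapping rule is simple: divide each coordinate by the ratio of the input image size to the feature map size. Once the box coordinates on the feature map are known, pooling produces the output. Because the input images vary in size, the pooling used here is similar to SPP pooling: for each output element, compute the region of the feature map that it corresponds to, then take the max or the average over that region.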
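The "region that each output element corresponds to" can be sketched for one axis as below. `BinRange` is a hypothetical helper written for this illustration; it uses the floor/ceil rule for bin boundaries and clamps the result to the feature map, which is an assumption about the details the paragraph leaves open.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>

// Returns the half-open range [start, end) of feature-map rows (or columns)
// covered by output bin p along one axis.
//   roi_start   : first cell of the RoI on the feature map
//   roi_size    : number of cells the RoI spans along this axis
//   pooled_size : number of output bins along this axis
//   limit       : feature-map size along this axis, used for clamping
inline std::pair<int, int> BinRange(int p, int roi_start, int roi_size,
                                    int pooled_size, int limit) {
  const double bin_size = static_cast<double>(roi_size) / pooled_size;
  int start = static_cast<int>(std::floor(p * bin_size)) + roi_start;
  int end = static_cast<int>(std::ceil((p + 1) * bin_size)) + roi_start;
  // Clamp to the feature map; a range with end <= start is empty.
  start = std::min(std::max(start, 0), limit);
  end = std::min(std::max(end, 0), limit);
  return std::make_pair(start, end);
}
```

For an RoI that starts at cell 4 and spans 5 cells, pooled into 2 bins, the bins cover [4, 7) and [6, 9). Because 5 is not divisible by 2, neighbouring bins overlap by one cell, which is expected with the floor/ceil rule.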

Source Code Walkthrough of Caffe ROI Pooling

1. LayerSetUp

template <typename Dtype>
void ROIPoolingLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  ROIPoolingParameter roi_pool_param = this->layer_param_.roi_pooling_param();
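  // Height of the feature map after pooling
  pooled_height_ = roi_pool_param.pooled_h();
  // Width of the feature map after pooling
  pooled_width_ = roi_pool_param.pooled_w();
  // Scale between the input image and the feature map fed to this layer,
  // i.e. feature-map size / input-image size (coordinates are multiplied by it)
  spatial_scale_ = roi_pool_param.spatial_scale();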
}

2. Reshape

template <typename Dtype>
void ROIPoolingLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  // Number of channels of the input feature map
  channels_ = bottom[0]->channels();
  // Height of the input feature map
  height_ = bottom[0]->height();
  // Width of the input feature map
  width_ = bottom[0]->width();
  // Output shape is NCHW: N = number of RoIs, C = channels_,
  // H = pooled_height_, W = pooled_width_
  top[0]->Reshape(bottom[1]->num(), channels_, pooled_height_,
      pooled_width_);
  // max_idx_ has the same shape as top
  max_idx_.Reshape(bottom[1]->num(), channels_, pooled_height_,
      pooled_width_);
}

3. Forward

template <typename Dtype>
void ROIPoolingLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  // The input has two parts: data (bottom[0]) and rois (bottom[1])
  const Dtype* bottom_data = bottom[0]->cpu_data();
  const Dtype* bottom_rois = bottom[1]->cpu_data();
  // Number of ROIs
  int num_rois = bottom[1]->num();
  int batch_size = bottom[0]->num();
  int top_count = top[0]->count();
  Dtype* top_data = top[0]->mutable_cpu_data();
  caffe_set(top_count, Dtype(-FLT_MAX), top_data);
  int* argmax_data = max_idx_.mutable_cpu_data();
  caffe_set(top_count, -1, argmax_data);

  // For each ROI R = [batch_index x1 y1 x2 y2]: max pool over R
  for (int n = 0; n < num_rois; ++n) {
    int roi_batch_ind = bottom_rois[0];
    // Map the original-image coordinates onto the feature map
    int roi_start_w = round(bottom_rois[1] * spatial_scale_);
    int roi_start_h = round(bottom_rois[2] * spatial_scale_);
    int roi_end_w = round(bottom_rois[3] * spatial_scale_);
    int roi_end_h = round(bottom_rois[4] * spatial_scale_);
    // Size of this RoI on the feature map (at least 1 x 1)
    int roi_height = max(roi_end_h - roi_start_h + 1, 1);
    int roi_width = max(roi_end_w - roi_start_w + 1, 1);
    // Size of the region on the pre-pooling feature map that corresponds
    // to one value of the pooled feature map.
    // Note: RoI sizes differ, so this is recomputed for every RoI
    const Dtype bin_size_h = static_cast<Dtype>(roi_height)
                             / static_cast<Dtype>(pooled_height_);
    const Dtype bin_size_w = static_cast<Dtype>(roi_width)
                             / static_cast<Dtype>(pooled_width_);
    // Locate the feature map of the image this RoI belongs to;
    // if the input data has batch size 1, roi_batch_ind = 0
    const Dtype* batch_data = bottom_data + bottom[0]->offset(roi_batch_ind);
    // Pooling is done per channel, so loop over the channels
    for (int c = 0; c < channels_; ++c) {
      // Visit every position of the output and compute its value
      for (int ph = 0; ph < pooled_height_; ++ph) {
        for (int pw = 0; pw < pooled_width_; ++pw) {
          // Compute pooling region for this output unit:
          //  start (included) = floor(ph * roi_height / pooled_height_)
          //  end (excluded) = ceil((ph + 1) * roi_height / pooled_height_)
          // Region of the input that one output point corresponds to:
          // [hstart, wstart, hend, wend], relative to the RoI origin
          int hstart = static_cast<int>(floor(static_cast<Dtype>(ph)
                                              * bin_size_h));
          int hend = static_cast<int>(ceil(static_cast<Dtype>(ph + 1)
                                           * bin_size_h));
          int wstart = static_cast<int>(floor(static_cast<Dtype>(pw)
                                              * bin_size_w));
          int wend = static_cast<int>(ceil(static_cast<Dtype>(pw + 1)
                                           * bin_size_w));
          // Translate the region to the RoI's position on the feature map
          // and clamp it to the feature-map bounds
          hstart = min(max(hstart + roi_start_h, 0), height_);
          hend = min(max(hend + roi_start_h, 0), height_);
          wstart = min(max(wstart + roi_start_w, 0), width_);
          wend = min(max(wend + roi_start_w, 0), width_);
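          // Check whether the mapped rectangle is empty (invalid)
          bool is_empty = (hend <= hstart) || (wend <= wstart);
          // pool_index is the position in the output of the value being computed
          const int pool_index = ph * pooled_width_ + pw;
          // If the rectangle is empty, set the output value to 0 and the
          // index of the maximum in the input region to -1
          if (is_empty) {
            top_data[pool_index] = 0;
            argmax_data[pool_index] = -1;
          }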
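          // Scan the input region that corresponds to this output value
          for (int h = hstart; h < hend; ++h) {
            for (int w = wstart; w < wend; ++w) {
              // Position in the input
              const int index = h * width_ + w;
              // Find the maximum of the region and store it at the
              // corresponding output position; also record its index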
              if (batch_data[index] > top_data[pool_index]) {
                top_data[pool_index] = batch_data[index];
                argmax_data[pool_index] = index;
              }
            }
          }
        }
      }
      // Increment all data pointers by one channel
      batch_data += bottom[0]->offset(0, 1);
      top_data += top[0]->offset(0, 1);
      argmax_data += max_idx_.offset(0, 1);
    }
    // Increment ROI data pointer
    bottom_rois += bottom[1]->offset(1);
  }
}
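To try the forward pass without building Caffe, the same loop structure can be reduced to a single channel and a single RoI. The sketch below is an illustration under stated assumptions, not the Caffe API: `PooledRoi` and `RoiMaxPoolSingleChannel` are hypothetical names, the RoI is passed already mapped to feature-map cells, and `float` stands in for `Dtype`.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>
#include <vector>

// Hypothetical result type: pooled maxima plus the input index of each maximum.
struct PooledRoi {
  std::vector<float> values;  // pooled_h * pooled_w maxima, row-major
  std::vector<int> argmax;    // flat index into the feature map, or -1 if empty
};

// Max-pools one RoI of a single-channel, row-major feature map.
// RoI corners are inclusive feature-map cell coordinates.
inline PooledRoi RoiMaxPoolSingleChannel(
    const std::vector<float>& feature, int height, int width,
    int roi_start_h, int roi_start_w, int roi_end_h, int roi_end_w,
    int pooled_h, int pooled_w) {
  const int roi_height = std::max(roi_end_h - roi_start_h + 1, 1);
  const int roi_width = std::max(roi_end_w - roi_start_w + 1, 1);
  const float bin_h = static_cast<float>(roi_height) / pooled_h;
  const float bin_w = static_cast<float>(roi_width) / pooled_w;

  PooledRoi out;
  out.values.assign(pooled_h * pooled_w, -FLT_MAX);
  out.argmax.assign(pooled_h * pooled_w, -1);
  for (int ph = 0; ph < pooled_h; ++ph) {
    for (int pw = 0; pw < pooled_w; ++pw) {
      // Bin boundaries: floor for the start, ceil for the (excluded) end.
      int hstart = static_cast<int>(std::floor(ph * bin_h)) + roi_start_h;
      int hend = static_cast<int>(std::ceil((ph + 1) * bin_h)) + roi_start_h;
      int wstart = static_cast<int>(std::floor(pw * bin_w)) + roi_start_w;
      int wend = static_cast<int>(std::ceil((pw + 1) * bin_w)) + roi_start_w;
      // Clamp the bin to the feature map.
      hstart = std::min(std::max(hstart, 0), height);
      hend = std::min(std::max(hend, 0), height);
      wstart = std::min(std::max(wstart, 0), width);
      wend = std::min(std::max(wend, 0), width);

      const int pool_index = ph * pooled_w + pw;
      if (hend <= hstart || wend <= wstart) {
        out.values[pool_index] = 0.0f;  // empty bin: output 0, argmax stays -1
        continue;
      }
      for (int h = hstart; h < hend; ++h) {
        for (int w = wstart; w < wend; ++w) {
          const int index = h * width + w;
          if (feature[index] > out.values[pool_index]) {
            out.values[pool_index] = feature[index];
            out.argmax[pool_index] = index;
          }
        }
      }
    }
  }
  return out;
}
```

On a 4 x 4 feature map holding the values 0 to 15 in row-major order, an RoI that covers the whole map pooled to 2 x 2 gives the values 5, 7, 13, 15, and the recorded argmax indices are the same numbers. The stored indices are what a backward pass would use to route gradients to the winning inputs.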
