
Paper Reading -- CVPR 2018 -- Video Object Segmentation -- 1

Motion-Guided Cascaded Refinement Network for Video Object Segmentation


Research background:

  1. As consecutive frames in a video show strong spatio-temporal correlation, motion estimation, e.g., optical flow and pixel trajectories, is essential for video object segmentation. Although many conventional motion-based methods achieve good performance, motion estimation itself is a difficult task.
  2. Deep CNNs have achieved very good performance on the static-image object segmentation task. Following this pipeline, some works finetune a deep CNN on the first-frame annotation, which gives the network a memory of the target object. However, appearance changes or background similarity may confuse these methods.

    Problems to solve:

    Previous work does not fully exploit motion information, and methods that rely on first-frame finetuning cannot adapt well to changes in the target object's appearance.

Motivation and proposed approach

This work aims to jointly utilize the spatio-temporal information in motion cues and the superior learning ability of deep CNNs.

The proposed method consists of two parts: optical-flow-based moving object segmentation and a Cascaded Refinement Network. The former learns to extract a coarse segmentation of the target object from the optical flow input. The latter takes the coarse segmentation as guidance and generates an accurate segmentation.

Proposed approach:

The object segmentation of the current frame is estimated directly from the optical flow and the previous frame's mask; this estimate is treated as a guidance map, and a refinement network produces a fine segmentation mask based on the current frame. The guidance map is estimated with a classical active-contour method. The refinement network takes the form of a U-Net, with the guidance map influencing the network as attention.
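The guidance-as-attention idea can be sketched minimally (a hypothetical illustration of the mechanism, not the paper's implementation; the shapes and the simple multiplicative form are my assumptions):

```python
import numpy as np

def attend_with_guidance(features, guidance):
    """Modulate CNN features with a coarse guidance map (attention).

    features: (C, H, W) feature map of the current frame
    guidance: (H, W) coarse segmentation in [0, 1] derived from optical flow
    Returns the features re-weighted so likely object regions are emphasized.
    """
    return features * guidance[None, :, :]  # broadcast the map over channels

# Toy example: 2-channel 4x4 features, guidance highlighting the left half.
feats = np.ones((2, 4, 4))
guide = np.zeros((4, 4))
guide[:, :2] = 1.0
out = attend_with_guidance(feats, guide)
```

The decoder of the U-Net would then operate on the modulated features, so the coarse mask steers where the refinement network focuses.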

Strengths

  1. Extracting the coarse segmentation mask directly from optical flow cleverly couples optical flow with the segmentation task.
  2. The refinement net uses the guidance map for correction, which is efficient.

Potential weaknesses

  1. FlowNetV2 is used for optical flow estimation, so the method will not be very fast.
  2. If the object disappears and then reappears, or undergoes severe occlusion or deformation, optical flow estimation will fail badly. How does the method work in such cases?

SeedNet: Automatic Seed Generation with Deep Reinforcement Learning for Robust Interactive Segmentation

Skim; motivation

For interactive image object segmentation, starting from a user-provided pair of foreground and background points, a reinforcement learning algorithm automatically generates further foreground/background points to progressively improve the segmentation.

Motivation: model the automatic seed generation problem as a Markov decision process and optimize it with a deep Q-network.
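The MDP view can be sketched as a toy skeleton (the class name, state/action encoding, and the stub quality score below are illustrative assumptions, not SeedNet's actual design): the state is the current seed set, an action places one new seed, and the reward is the resulting gain in segmentation quality.

```python
class SeedMDP:
    """Toy MDP for automatic seed generation (illustrative only).

    State : the current set of (x, y, label) seed points.
    Action: place one new seed; label 1 = foreground, 0 = background.
    Reward: improvement of a segmentation-quality score (here a stub).
    """
    def __init__(self, segment_score):
        self.segment_score = segment_score  # callable: seeds -> quality in [0, 1]
        self.seeds = []

    def step(self, x, y, label):
        before = self.segment_score(self.seeds)
        self.seeds.append((x, y, label))
        after = self.segment_score(self.seeds)
        return after - before  # reward = quality gain from the new seed

# Stub scorer: quality grows with the number of seeds, saturating at 1.
mdp = SeedMDP(lambda seeds: min(1.0, 0.2 * len(seeds)))
r1 = mdp.step(10, 20, 1)
r2 = mdp.step(30, 40, 0)
```

A deep Q-network would then learn to pick the action (seed position and label) with the highest expected cumulative reward instead of stepping greedily.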

Analysis of Hand Segmentation in the Wild

Skim; motivation

Studies the detection and segmentation of hands in first-person videos by finetuning a state-of-the-art semantic segmentation model, RefineNet. Two datasets, EgoYouTube-Hands and HandOverFace, are annotated. Finally, incorporating the hand cues into an activity recognition task brings a sizable performance gain. (Of limited reference value for my work.)

Motion Segmentation by Exploiting Complementary Geometric Models

Skim; motivation

For the motion segmentation task, the fundamental matrix or a homography alone runs into problems; this paper proposes a multi-view spectral clustering framework that jointly exploits fundamental matrices and homographies to solve motion segmentation. (Of limited reference value for my work.)

Weakly Supervised Instance Segmentation using Class Peak Response


Research background

  1. For the weakly supervised object segmentation task, typical existing works first convert a pre-trained classification network into an FCN so as to generate class response maps.
  2. Although a response map can indicate the important regions the network uses to identify an image class, it cannot distinguish multiple instances belonging to the same category.

Motivation and proposed approach

The researchers observe that local maxima, i.e., peaks, in a class response map typically correspond to strong visual cues residing inside an instance. They back-propagate these peaks and map them to highly informative regions of each object instance, e.g., instance boundaries. These peak response maps specify both the spatial layout and the fine-detailed boundaries of each instance. Thus, even with off-the-shelf methods, the proposed approach can extract instance masks.

Strengths

  1. The peak response map cleverly converts image-level labels into instance-level cues, making instance segmentation feasible in the weakly supervised setting.

    Potential weaknesses

    (unclear for now)

Questions

  1. How are peak responses back-propagated to obtain the highly informative regions of each object instance? Why does this yield clear boundaries rather than whole object regions?
  2. How is the number of objects of the same category in an image determined? By setting a threshold?

Inspirations

  1. Using class labels as supervision only covers the 20 Pascal or 80 COCO categories; how should objects of other categories (e.g., objects in videos) be handled?
  2. For multi-object video object segmentation, could response maps be used to localize the objects?

Detailed analysis:

  1. How is peak stimulation generated?

    Step 1: For the network (this work uses FC-ResNet50), remove the global pooling layer and replace the fully connected layers with convolutional layers, converting it into a fully convolutional network. A forward pass then produces a class response map that preserves spatial resolution.

    Step 2: The class response map is $M \in \mathbb{R}^{C\times H\times W}$, and we want a class-wise confidence score $s \in \mathbb{R}^{C}$. Using a sampling kernel $G^{c} \in \mathbb{R}^{H\times W}$, the score can be computed by convolution:

    $$s^{c} = M^{c} * G^{c} = \frac{1}{N^{c}} \sum_{k=1}^{N^{c}} M^{c}_{i_{k}, j_{k}}$$

    That is, averaging the $N^{c}$ local maxima gives the score for this class.

    During back-propagation, the gradient is computed as

    $$\delta^{c} = \frac{\partial L}{\partial M^{c}} = \frac{\partial L}{\partial s^{c}} \times \frac{\partial s^{c}}{\partial M^{c}} = \frac{1}{N^{c}} \frac{\partial L}{\partial s^{c}} G^{c}$$

    Conventional methods back-propagate the loss by sampling densely over the whole receptive field. This work instead restricts sampling to a sparse set of locations within the receptive field, mainly potential positives and hard negatives, which improves learning.

  2. How is peak back-propagation performed?

    The authors liken peak back-propagation to a walker that starts at the top layer and randomly walks toward the bottom layer. For a convolutional layer with input $U$ and output $V$, the visiting probability is computed as:

    (The visiting-probability equations were rendered as images in the original post and are not reproduced here.)

    The remaining layers behave like ordinary CNN layers and support back-propagation; starting from the class peak responses, this yields the fine-detailed visual cues residing inside each object.
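Step 1 above (classifier-to-FCN conversion) can be sketched in numpy: applying the classifier's FC weights at every spatial location is exactly a 1x1 convolution, turning a per-image score vector into a class response map (the shapes below are hypothetical examples):

```python
import numpy as np

def fc_as_1x1_conv(features, fc_weights):
    """Apply FC weights at every spatial location, i.e. a 1x1 convolution.

    features  : (C_in, H, W)  convolutional feature map
    fc_weights: (C_out, C_in) weights of the original classifier's FC layer
    Returns a class response map of shape (C_out, H, W).
    """
    return np.einsum('oc,chw->ohw', fc_weights, features)

rng = np.random.default_rng(0)
feats = rng.standard_normal((512, 14, 14))   # e.g. last conv features
w_fc = rng.standard_normal((20, 512))        # e.g. 20 Pascal classes
M = fc_as_1x1_conv(feats, w_fc)              # class response map, 20 x 14 x 14
```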
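Step 2 and its gradient can be sketched together as a toy single-map version (simplified: real implementations operate on batched tensors, and restricting peaks to positive responses is my simplification):

```python
import numpy as np

def peak_stimulation(Mc, win=1):
    """Forward: average the local maxima (peaks) of one class response map.

    Mc : (H, W) response map M^c of class c
    win: half-size of the local window (win=1 -> 3x3 neighborhood)
    Returns (s^c, list of peak coordinates (i_k, j_k)).
    """
    H, W = Mc.shape
    peaks = []
    for i in range(H):
        for j in range(W):
            if Mc[i, j] <= 0:      # simplification: positive responses only
                continue
            i0, i1 = max(0, i - win), min(H, i + win + 1)
            j0, j1 = max(0, j - win), min(W, j + win + 1)
            if Mc[i, j] == Mc[i0:i1, j0:j1].max():
                peaks.append((i, j))
    score = np.mean([Mc[i, j] for i, j in peaks])  # s^c = (1/N^c) sum_k M^c_{ik,jk}
    return score, peaks

def peak_stimulation_backward(shape, peaks, dL_ds):
    """Backward: delta^c = (1/N^c) * (dL/ds^c) * G^c, nonzero only at peaks."""
    delta = np.zeros(shape)
    for i, j in peaks:
        delta[i, j] = dL_ds / len(peaks)   # sparse sampling at the N^c peaks
    return delta

Mc = np.zeros((5, 5))
Mc[1, 1], Mc[3, 3] = 4.0, 2.0          # two isolated peaks
s, pks = peak_stimulation(Mc)           # s = (4 + 2) / 2 = 3
d = peak_stimulation_backward(Mc.shape, pks, dL_ds=1.0)
```

The backward map makes the sparsity explicit: the loss gradient reaches only the $N^{c}$ peak locations, each scaled by $1/N^{c}$, instead of the whole receptive field.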
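Since the exact visiting-probability equations appear only as images in the original post, the following is a rough reading of the walker analogy, not the paper's exact formulation (the proportionality to bottom activations times positive weights, and the normalization, are my assumptions): a walker at a top unit moves to a bottom unit with probability proportional to the bottom activation times the positive part of the connecting weight.

```python
import numpy as np

def walker_step(P_top, W, U):
    """One top-down step of the 'walker' back-propagation (toy version).

    P_top: (n_out,) visiting probabilities at the top layer
    W    : (n_out, n_in) connection weights between the two layers
    U    : (n_in,) bottom activations (post-ReLU, i.e. non-negative)
    A walker at top unit p moves to bottom unit i with probability
    proportional to U[i] * max(W[p, i], 0); returns the bottom
    visiting-probability distribution.
    """
    Wp = np.maximum(W, 0.0)               # keep only excitatory connections
    T = U[None, :] * Wp                   # unnormalized transitions, (n_out, n_in)
    T = T / T.sum(axis=1, keepdims=True)  # normalize each top unit's row
    return P_top @ T                      # marginalize over top units

P_top = np.array([1.0, 0.0])                      # walker starts at one peak unit
W = np.array([[2.0, 1.0, -3.0], [0.5, 0.5, 0.5]])
U = np.array([1.0, 1.0, 1.0])
P_bot = walker_step(P_top, W, U)
```

Repeating such steps layer by layer down to the input would concentrate probability on the pixels most responsible for the peak, which matches the description of obtaining fine-detailed cues inside each object.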
