Paper Reading -- CVPR 2018 -- Video Object Segmentation -- 1

Motion-Guided Cascaded Refinement Network for Video Object Segmentation

Research background:

  1. As consecutive frames in a video exhibit strong spatio-temporal correlation, motion estimation, e.g., optical flow and pixel trajectories, is essential for video object segmentation. Although many conventional motion-based methods achieve good performance, motion estimation itself is a difficult task.
  2. Deep CNNs have achieved very good performance on the static-image object segmentation task. Following this pipeline, some works fine-tune a deep CNN on the first-frame annotation, giving the network a memory of the target object. However, appearance changes or background similarity may confuse these methods.

    Problems to address:

    Previous work did not make full use of motion information, and methods that rely on first-frame fine-tuning cannot adapt well to changes in the target object's appearance.

Motivation and proposed approach

This work aims to jointly utilize the spatio-temporal information in motion cues and the superior learning ability of deep CNNs.

The proposed method consists of two parts: optical-flow-based moving object segmentation and a Cascaded Refinement Network. The former learns to extract a coarse segmentation of the target object from the optical flow input; the latter takes the coarse segmentation as guidance and generates an accurate segmentation.

Proposed method:

The object segmentation of the current frame is estimated directly from the optical flow and the previous frame's mask. This estimate serves as a guidance map, and a refinement network then produces a fine segmentation mask from the current frame. The guidance map is estimated with a conventional active-contour method. The Refinement Network has a U-Net structure, and the guidance map influences the network in an attention-like manner.
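To make the two-stage pipeline concrete, below is a minimal PyTorch-style sketch of one inference step. It is not the authors' implementation: the module structure, the attention-style fusion, and the `coarse_segmenter` stand-in for the motion/active-contour stage are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class RefinementNet(nn.Module):
    """Hypothetical stand-in for the U-Net-style refinement network.

    The coarse guidance map modulates the image features in an
    attention-like manner before a fine mask is decoded.
    """
    def __init__(self, feat_ch=32):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        self.decode = nn.Conv2d(feat_ch, 1, 3, padding=1)

    def forward(self, frame, guidance):
        feats = self.encode(frame)               # appearance features from the current frame
        attn = torch.sigmoid(guidance)           # coarse mask used as soft attention
        return torch.sigmoid(self.decode(feats * attn))


def segment_frame(frame, prev_mask, flow, coarse_segmenter, refine_net):
    """One inference step: motion-based coarse mask, then appearance-based refinement.

    `coarse_segmenter` stands in for the motion stage (e.g. warping the previous
    mask with optical flow and evolving an active contour); its signature here
    is an assumption for illustration only.
    """
    guidance = coarse_segmenter(flow, prev_mask)   # coarse segmentation from motion cues
    return refine_net(frame, guidance)             # fine mask from the current frame
```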

Strengths

  1. Extracting a coarse segmentation mask directly from optical flow neatly couples optical flow with the segmentation task.
  2. The Refinement Net only refines the result given the guidance map, which makes it efficient.

Potential weaknesses

  1. Optical flow is estimated with FlowNetV2, so the method will not be very fast.
  2. If the object disappears and then reappears, or undergoes severe occlusion or deformation, optical flow estimation degrades badly; how does the method work in those cases?

SeedNet: Automatic Seed Generation with Deep Reinforcement Learning for Robust Interactive Segmentation

Skimmed: motivation

For interactive image object segmentation, starting from a pair of user-supplied foreground and background points, a reinforcement learning algorithm automatically generates additional foreground/background seed points to progressively improve the segmentation.

Motivation: model automatic seed generation as a Markov decision process and optimize it with a deep Q-network.
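As a toy rendering of this MDP framing (the state, action, and reward definitions below are my assumptions for exposition, not SeedNet's exact design):

```python
import numpy as np

def iou(pred, gt, thr=0.5):
    """Intersection-over-union between a soft prediction and a binary mask."""
    p, g = pred > thr, gt > 0.5
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union > 0 else 1.0

def env_step(state, action, segment_fn, gt_mask):
    """One MDP step: place a new foreground/background seed and score the gain.

    state  : (current segmentation map, list of seed points placed so far)
    action : (row, col, label) of the next seed, label being 'fg' or 'bg'
    reward : IoU improvement after re-running the base interactive segmenter
    """
    seg, seeds = state
    seeds = seeds + [action]
    new_seg = segment_fn(seeds)            # base interactive segmenter (assumed given)
    reward = iou(new_seg, gt_mask) - iou(seg, gt_mask)
    return (new_seg, seeds), reward
```

A deep Q-network is then trained to pick the seed (location and foreground/background label) with the highest expected cumulative reward.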

Analysis of Hand Segmentation in the Wild

Skimmed: motivation

Studies the detection and segmentation of hands in first-person videos by fine-tuning a state-of-the-art semantic segmentation model, RefineNet. Two datasets, EgoYouTube-Hands and HandOverFace, are annotated. Finally, incorporating hand cues into an activity recognition task yields a sizeable performance gain. (Of limited relevance as a reference for this work.)

Motion Segmentation by Exploiting Complementary Geometric Models

Skimmed: motivation

For the motion segmentation task, relying on the fundamental matrix or a homography alone runs into problems. This paper proposes a multi-view spectral clustering framework that considers the fundamental matrix and homography models jointly to solve motion segmentation. (Of limited relevance as a reference for this work.)

Weakly Supervised Instance Segmentation using Class Peak Response

Research background

  1. For the weakly supervised object segmentation task, typical existing approaches first convert a pre-trained classification network into an FCN so as to generate class response maps.
  2. Although a response map indicates the regions the network relies on to identify an image-level class, it cannot distinguish multiple instances belonging to the same category.

Motivation and proposed approach

The researchers observe that local maxima, i.e., peaks, in a class response map typically correspond to strong visual cues residing inside an instance. They back-propagate these peaks to map them to highly informative regions of each object instance, e.g., instance boundaries. The resulting peak response maps specify both the spatial layout and the fine-detailed boundaries of each instance. Thus, with the help of off-the-shelf methods, the proposed approach can extract instance masks.

Strengths

  1. Peak response maps cleverly turn image-level annotations into instance-level cues, making instance segmentation feasible under weak supervision.

Potential weaknesses

(unclear for now)

Open questions

  1. How are peak responses back-propagated to obtain highly informative regions of each object instance? Why does this produce clear boundaries rather than whole object regions?
  2. How is the number of objects of the same category in an image determined? By thresholding?

Takeaways

  1. Using class labels as supervision limits the method to the 20 Pascal or 80 COCO categories; how should objects of other categories (e.g., objects in videos) be handled?
  2. For video object segmentation with multiple objects, could response maps be used to localize the objects?

Detailed analysis:

  1. How is peak stimulation generated?

    Step 1: For the backbone network (FC-ResNet50 in this work), remove the global pooling layer and replace the fully connected layer with a convolution, turning the network into a fully convolutional network. A forward pass then produces class response maps that preserve spatial resolution.

    Step 2: Given the class response map $M \in R^{C\times H\times W}$, we want a class-wise confidence score $s \in R^{C}$. Using a sampling kernel $G^{c} \in R^{H\times W}$, the class-wise confidence score is obtained by convolution:

    $$s^{c} = M^{c} * G^{c} = \frac{1}{N^{c}} \sum_{k=1}^{N^{c}} M^{c}_{i_{k}, j_{k}}$$

    That is, the score of class $c$ is the average of its $N^{c}$ local maxima (peak values).

    During back-propagation, the gradient is computed as

    $$\delta^{c} = \frac{\partial L}{\partial M^{c}} = \frac{\partial L}{\partial s^{c}} \times \frac{\partial s^{c}}{\partial M^{c}} = \frac{1}{N^{c}} \frac{\partial L}{\partial s^{c}} G^{c}$$

    Conventional approaches back-propagate the loss by sampling densely over the whole receptive field. The procedure above instead constrains back-propagation to sparse samples within the receptive field, mainly potential positives and hard negatives, which improves learning (see the NumPy sketch after this list).

  2. How is peak back-propagation performed?

    The authors describe peak back-propagation as a walker that starts at the top layer and moves stochastically toward the bottom layer. For a convolutional layer with input $U$ and output $V$, the visiting probability is computed as:

    [The visiting-probability formulas appear only as images in the original post; a reconstruction from memory follows this list.]
    The remaining layers back-propagate as in an ordinary CNN, so the class peak responses can be propagated back to fine-detailed visual cues residing inside each object.
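As referenced in item 1 above, here is a minimal NumPy sketch of peak stimulation; the peak definition (a fixed local-maximum window) and the gradient routing are simplified assumptions, not the paper's exact implementation.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def peak_stimulation(response_map, win=3):
    """Score one class from the peaks of its response map M^c.

    Returns s^c (the mean of the N^c peak values) and a mask that routes the
    gradient dL/ds^c back to the peak locations only, i.e. sparse rather than
    dense sampling over the receptive field.
    """
    is_peak = response_map == maximum_filter(response_map, size=win)
    n_peaks = int(is_peak.sum())                          # N^c
    score = response_map[is_peak].mean()                  # s^c = (1/N^c) sum_k M^c_{i_k, j_k}
    grad_mask = is_peak.astype(float) / max(n_peaks, 1)   # dL/dM^c = (1/N^c) dL/ds^c at peaks
    return score, grad_mask
```

Note that this treats every location equal to its windowed maximum as a peak (plateaus included), which is a simplification of the paper's peak selection.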
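For item 2, a reconstruction from memory of the paper's probabilistic winner-take-all walk (the notation may not match the paper exactly): the step through a convolutional layer with input $U$ and output $V$ is roughly

$$P(U_{ij}) = \sum_{p,q} P\left(U_{ij} \mid V_{i-p,\,j-q}\right) P\left(V_{i-p,\,j-q}\right), \qquad P\left(U_{ij} \mid V_{i-p,\,j-q}\right) = Z_{i-p,\,j-q}\, \hat{U}_{ij}\, W^{+}_{pq},$$

where $\hat{U} = \mathrm{ReLU}(U)$ is the bottom-up relevance of the input, $W^{+}$ keeps only the non-negative convolution weights, and $Z_{i-p,\,j-q}$ normalizes the transition probabilities at each output location so that they sum to one.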
