
Paper Reading: FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

Contents

      • 1. Paper Overview
      • 2. An Alternative Approach to Optical Flow Prediction with CNNs
      • 3. Where the Idea of Refining Flow Estimates by Stacking Networks Comes From
      • 4. FlyingThings3D (Things3D) dataset
      • 5. The order of presenting training data with different properties matters.
      • 6. Stacking Two Identical Networks (FlowNetS)
      • 7. Stacking Different Architectures (FlowNetC+FlowNetS, or FlowNetS with 3/8 of the Channels)
      • 8. Training FlowNet2-CSS Seems Quite Involved
      • 9. Small Displacements Are Handled Well by Traditional Methods
      • 10. Adding a Network for Small-Displacement Scenes
      • 11. Comparison of Results Across Models

1. Paper Overview

This paper is the evolution of FlowNet. Since FlowNet was the pioneering work on CNN-based optical flow estimation, it naturally had many shortcomings, and FlowNet 2.0 improves on it in three respects:

  • (1) Data: the training data is extended with FlyingThings3D and with ChairsSDHom, a dataset focused on small displacements. The paper also shows experimentally that the order in which datasets with different properties are presented strongly affects model quality: learning the simpler dataset first and the harder one afterwards is the sensible order.
  • (2) Network architecture: the model is enlarged by stacking FlowNetS networks on top of FlowNetC, so the flow estimate is refined by the stacked sub-networks. The paper also introduces warping: the intermediate flow estimate is used to warp the second image (or its features) back toward the first, and the remaining difference is refined further, progressively shrinking the gap between prediction and ground truth. This resembles the idea behind GBDT in machine learning; in traditional optical flow algorithms (e.g., the actual implementation of DIS), the iterative refinement loop embodies the same gap-shrinking idea. A sketch of the warping operation follows this list.
  • (3) Small displacements: a dedicated network architecture is designed for small-displacement scenes, together with the synthetic small-displacement dataset ChairsSDHom.
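As a concrete illustration, here is a minimal sketch of such a flow-based warping operation written with PyTorch's grid_sample; the function name and exact wiring are my own, not the paper's code, and it assumes flow channel 0 is the horizontal displacement:

```python
import torch
import torch.nn.functional as F

def warp(img2, flow):
    """Warp the second frame back toward the first using the estimated flow.

    img2: (B, C, H, W) second frame; flow: (B, 2, H, W) flow from frame 1 to
    frame 2 (channel 0 = x displacement). Samples img2 at (x + u, y + v), so
    the result should approximate frame 1 where the flow is correct.
    """
    b, _, h, w = flow.shape
    # Build a pixel-coordinate grid and displace it by the flow.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(flow.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow                            # (B, 2, H, W)
    # grid_sample expects coordinates normalized to [-1, 1].
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)        # (B, H, W, 2)
    return F.grid_sample(img2, grid_norm, align_corners=True)
```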

First, we evaluate the influence of dataset schedules. Interestingly, the more sophisticated training data provided by Mayer et al. [18] leads to inferior results if used in isolation. However, a learning schedule consisting of multiple datasets improves results significantly. In this scope, we also found that the FlowNet version with an explicit correlation layer outperforms the version without such layer. This is in contrast to the results reported in Dosovitskiy et al. [10].

As a second contribution, we introduce a warping operation and show how stacking multiple networks using this operation can significantly improve the results. By varying the depth of the stack and the size of individual components we obtain many network variants with different size and runtime. This allows us to control the trade-off between accuracy and computational resources. We provide networks for the spectrum between 8fps and 140fps.

Finally, we focus on small, subpixel motion and real-world data. To this end, we created a special training dataset and a specialized network. We show that the architecture trained with this dataset performs well on small motions typical for real-world videos. To reach optimal performance on arbitrary displacements, we add a network that learns to fuse the former stacked network with the small displacement network in an optimal manner.

2. An Alternative Approach to Optical Flow Prediction with CNNs

An alternative approach to learning-based optical flow estimation is to use CNNs to match image patches. Thewlis et al. [29] formulate Deep Matching [31] as a CNN and optimize it end-to-end. Gadot & Wolf [12] and Bailer et al. [3] learn image patch descriptors using Siamese network architectures. These methods can reach good accuracy, but require exhaustive matching of patches. Thus, they are restrictively slow for most practical applications. Moreover, methods based on (small) patches are inherently unable to use the larger whole-image context.

Patch matching with CNNs has two problems: (1) it is too slow; (2) because it operates on local patches, it cannot exploit whole-image context (which, in my view, means the potential of CNNs is not fully exploited). A sketch of the Siamese-descriptor idea follows.
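To make the alternative concrete, here is a minimal Siamese patch-descriptor sketch; the architecture and names are illustrative, not the actual networks of [12] or [3]:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchDescriptor(nn.Module):
    """Tiny Siamese branch: maps an image patch to an L2-normalized descriptor."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, patch):                      # (B, 3, h, w)
        feat = self.net(patch).flatten(1)          # (B, 64)
        return F.normalize(self.fc(feat), dim=1)   # unit-length descriptor

# Matching two patches reduces to a dot product of their descriptors; exhaustive
# matching repeats this for every candidate pair, which is why these methods are
# slow and see only local patches rather than the whole image.
desc = PatchDescriptor()
p1, p2 = torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32)
score = (desc(p1) * desc(p2)).sum(dim=1)  # cosine similarity
```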

3. Where the Idea of Refining Flow Estimates by Stacking Networks Comes From

CNNs trained for per-pixel prediction tasks often produce noisy or blurry results. As a remedy, off-the-shelf optimization can be applied to the network predictions (e.g., optical flow can be postprocessed with a variational approach [10]). In some cases, this refinement can be approximated by neural networks: Chen & Pock [9] formulate their reaction diffusion model as a CNN and apply it to image denoising, deblocking and superresolution. Recently, it has been shown that similar refinement can be obtained by stacking several CNNs on top of each other. This led to improved results in human pose estimation [17, 8] and semantic instance segmentation [22]. In this paper we adapt the idea of stacking networks to optical flow estimation. Our network architecture includes warping layers that…

Indeed, per-pixel prediction with CNNs does tend to produce noisy results and blurry boundaries. A schematic sketch of the stacking idea follows.
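A schematic sketch of stacked refinement with warping, reusing the warp helper from Section 1's sketch; the exact channel wiring in FlowNet2 differs, so treat this as the idea rather than the implementation:

```python
import torch

def stacked_flow(nets, img1, img2, warp):
    """Iteratively refine flow: each later network sees the images, the current
    flow, the warped second image, and the brightness error, and predicts the
    remaining flow increment (residual/boosting-style refinement)."""
    flow = nets[0](torch.cat([img1, img2], dim=1))
    for net in nets[1:]:
        img2_w = warp(img2, flow)                             # compensate estimated motion
        err = torch.norm(img1 - img2_w, dim=1, keepdim=True)  # brightness error
        inp = torch.cat([img1, img2, img2_w, flow, err], dim=1)
        flow = flow + net(inp)                                # add the predicted increment
    return flow
```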

4. FlyingThings3D (Things3D) dataset

The FlyingThings3D (Things3D) dataset proposed by Mayer et al. [18] can be seen as a three-dimensional version of Chairs: 22k renderings of random scenes show 3D models from the ShapeNet dataset [23] moving in front of static 3D backgrounds. In contrast to Chairs, the images show true 3D motion and lighting effects and there is more variety among the object models.

5. The order of presenting training data with different properties matters.


Also: FlowNetC outperforms FlowNetS, overturning the conclusion from FlowNet v1. A sketch of such a multi-stage dataset schedule follows.
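A sketch of what a Chairs→Things3D schedule might look like in training code. The learning rates and iteration counts only roughly follow the spirit of the paper's S_long/S_fine schedules, and train_step and the loaders are assumed helpers:

```python
from itertools import cycle

def train_schedule(model, chairs_loader, things3d_loader, optimizer, train_step):
    """Present the easier dataset first, then fine-tune on the harder one
    at a lower learning rate (the Chairs -> Things3D idea)."""
    stages = [
        (1e-4, chairs_loader, 1_200_000),   # stage 1: FlyingChairs (S_long-style)
        (1e-5, things3d_loader, 500_000),   # stage 2: Things3D (S_fine-style)
    ]
    for lr, loader, iters in stages:
        for group in optimizer.param_groups:
            group["lr"] = lr
        # cycle() re-feeds batches when the loader is exhausted (fine for a sketch).
        for _, batch in zip(range(iters), cycle(loader)):
            train_step(model, batch, optimizer)
```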

6. Stacking Two Identical Networks (FlowNetS)

We make the following observations: (1) Just stacking networks without warping yields better results on Chairs, but worse on Sintel; the stacked network is over-fitting. (2) Stacking with warping always improves results. (3) Adding an intermediate loss after Net1 is advantageous when training the stacked network end-to-end. (4) The best results are obtained by keeping the first network fixed and only training the second network after the warping operation.

Clearly, since the stacked network is twice as big as the single network, over-fitting is an issue. The positive effect of flow refinement after warping can counteract this problem, yet the best of both is obtained when the stacked networks are trained one after the other, since this avoids over-fitting while having the benefit of flow refinement.

When stacking two identical networks (FlowNetS), over-fitting comes easily. Adding warping, or training the networks one after the other (freezing the first network's parameters while updating the second), both mitigate over-fitting to some degree; a sketch of the freeze-then-train scheme follows below.

The paper also makes another point: previous work trained on FlyingChairs and tested on Sintel, but to check whether the model over-fits, the authors additionally evaluated on held-out FlyingChairs data.
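A minimal sketch of observation (4): freeze Net1, warp with its flow, and train only Net2. The helper names (warp, loss_fn) are assumptions from the earlier sketches, and the optimizer is assumed to be built over net2's parameters only:

```python
import torch

def train_second_net(net1, net2, warp, loader, optimizer, loss_fn):
    """Keep the first network fixed; train only the second one after warping."""
    for p in net1.parameters():
        p.requires_grad = False              # freeze the first network
    net1.eval()
    for img1, img2, flow_gt in loader:
        with torch.no_grad():
            flow1 = net1(torch.cat([img1, img2], dim=1))
        img2_w = warp(img2, flow1)           # warp with the intermediate flow
        inp = torch.cat([img1, img2, img2_w, flow1], dim=1)
        flow2 = flow1 + net2(inp)            # net2 predicts the residual flow
        loss = loss_fn(flow2, flow_gt)       # e.g. average endpoint error (EPE)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```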

7. Stacking Different Architectures (FlowNetC+FlowNetS, or FlowNetS with 3/8 of the Channels)

A lowercase "s" denotes a FlowNetS with 3/8 of the channel count.


Two small networks can be both faster and more accurate than one large network. A sketch of the channel-scaling trick follows.
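The "small" variants simply scale every layer's channel count by a constant factor. A sketch under that reading; the 3/8 factor is from the paper, but the layer list and channel counts here are illustrative:

```python
import torch.nn as nn

def scaled_channels(base_channels, factor=3 / 8):
    """Shrink a network by taking a fraction of each layer's channels."""
    return [max(1, int(round(c * factor))) for c in base_channels]

# First encoder stages of a FlowNetS-like net (channel counts illustrative):
base = [64, 128, 256, 256, 512]
small = scaled_channels(base)   # -> [24, 48, 96, 96, 192]
layers, in_ch = [], 6           # two stacked RGB frames as input
for out_ch in small:
    layers += [nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1), nn.ReLU()]
    in_ch = out_ch
encoder = nn.Sequential(*layers)
```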

8. Training FlowNet2-CSS Seems Quite Involved

As also done in [17, 9], we therefore add networks with different weights to the stack. Compared to identical weights, stacking networks with different weights increases the memory footprint, but does not increase the runtime. In this case the top networks are not constrained to a general improvement of their input, but can perform different tasks at different stages, and the stack can be trained in smaller pieces by fixing existing networks and adding new networks one-by-one. We do so by using the Chairs→Things3D schedule from Section 3 for every new network and the best configuration with warping from Section 4.1. Furthermore, we experiment with different network sizes and alternatively use FlowNetS or FlowNetC as a bootstrapping network. We use FlowNetC only in case of the bootstrap network, as the input to the next network is too diverse to be properly handled by the Siamese structure of FlowNetC. Smaller size versions of the networks were created by taking only a fraction of the number of channels for every layer in the network. Figure 4 shows the network accuracy and runtime for different network sizes of a single FlowNetS. Factor 3/8 yields a good trade-off between speed and accuracy when aiming for faster networks.

9. Small Displacements Are Handled Well by Traditional Methods

While the original FlowNet [10] performed well on the Sintel benchmark, limitations in real-world applications have become apparent. In particular, the network cannot reliably estimate small motions (see Figure 1). This is counter-intuitive, since small motions are easier for traditional methods, and there is no obvious reason why networks should not reach the same performance in this setting. Thus, we compared the training data to the UCF101 dataset [25] as one example of real-world data. While Chairs are similar to Sintel, UCF101 is fundamentally different (we refer to our supplemental material for the analysis): Sintel is an action movie and as such contains many fast movements that are difficult for traditional methods, while the displacements we see in the UCF101 dataset are much smaller, mostly smaller than 1 pixel. Thus, we created a dataset in the visual style of Chairs but with very small displacements and a displacement histogram much more like UCF101. We also added cases with a background that is homogeneous or just consists of color gradients. We call this dataset ChairsSDHom.

So the authors attribute the problem to the data, and synthesize a dedicated small-displacement dataset, ChairsSDHom, whose displacement histogram is matched to UCF101. A sketch of such a histogram comparison follows.
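The dataset comparison boils down to histograms of per-pixel displacement magnitude. A sketch of how one might compute them, assuming flow fields stored as (H, W, 2) NumPy arrays; the bin edges are my choice, not the paper's:

```python
import numpy as np

def displacement_histogram(flows,
                           bins=(0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, np.inf)):
    """Normalized histogram of per-pixel flow magnitudes over a list of flows."""
    bins = np.asarray(bins)
    counts = np.zeros(len(bins) - 1)
    for flow in flows:
        mag = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
        hist, _ = np.histogram(mag, bins=bins)
        counts += hist
    return counts / counts.sum()   # fraction of pixels per magnitude bin

# UCF101-like data would put most of its mass in the sub-pixel bins,
# while Sintel puts substantial mass on large displacements.
```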

We fine-tuned our FlowNet2-CSS network for smaller displacements by further training the whole network stack on a mixture of Things3D and ChairsSDHom and by applying a non-linearity to the error to down-weight large displacements. We denote this network by FlowNet2-CSS-ft-sd. This improves results on small displacements and we found that this particular mixture does not sacrifice performance on large displacements. However, in case of subpixel motion, noise still remains a problem and we conjecture that the FlowNet architecture might in general not be perfect for such motion.

FlowNet2-CSS-ft-sd only means fine-tuning on small-displacement data; the model architecture is unchanged. One plausible form of the error non-linearity is sketched below.

Although this improves accuracy, it is still not enough, so the architecture itself also has to change.
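The paper only says "a non-linearity to the error" and defers details to its supplemental material; one plausible form is a sub-linear power of the per-pixel endpoint error. The exponent and epsilon below are my assumptions, not the paper's values:

```python
import torch

def downweighted_epe(flow_pred, flow_gt, q=0.4, eps=0.01):
    """Sub-linear penalty on the endpoint error: large errors (typically at
    large displacements) contribute relatively less to the gradient, so
    small-displacement accuracy dominates. q and eps are assumed values."""
    epe = torch.norm(flow_pred - flow_gt, dim=1)   # per-pixel endpoint error
    return ((epe + eps) ** q).mean()
```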

10. Adding a Network for Small-Displacement Scenes


Therefore, we slightly modified the original FlowNetS architecture and removed the stride 2 in the first layer. We made the beginning of the network deeper by exchanging the 7×7 and 5×5 kernels in the beginning with multiple 3×3 kernels. Because noise tends to be a problem with small displacements, we add convolutions between the upconvolutions to obtain smoother estimates like in [18]. We denote the resulting architecture by FlowNet2-SD; see Figure 2.

Note: the modifications are made on top of FlowNetS; a sketch of the changed opening layers follows below.

Note: in small-displacement scenes, noise is a big problem!
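A sketch of the described change to the opening layers: no stride 2 in the first layer, and the large kernels replaced by several 3×3 convolutions. Channel counts and where the stride eventually lands are illustrative assumptions:

```python
import torch.nn as nn

# FlowNetS-style opening: one large 7x7 kernel with stride 2.
flownet_s_head = nn.Sequential(
    nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3), nn.LeakyReLU(0.1),
)

# FlowNet2-SD-style opening: stride 1 in the first layer, and a deeper
# beginning built from stacked 3x3 kernels (downsampling deferred).
flownet2_sd_head = nn.Sequential(
    nn.Conv2d(6, 64, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.1),
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.1),
    nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1), nn.LeakyReLU(0.1),
)
```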

With FlowNet2-SD in place, the two branches need to be fused, yielding the final architecture shown in Figure 2.

Finally, we created a small network that fuses FlowNet2-CSS-ft-sd and FlowNet2-SD (see Figure 2). The fusion network receives the flows, the flow magnitudes and the errors in brightness after warping as input. It contracts the resolution twice by a factor of 2 and expands again. Contrary to the original FlowNet architecture it expands to the full resolution (I don't fully understand this part). We find that this produces crisp motion boundaries (so this fusion method gives better motion boundaries?) and performs well on small as well as on large displacements. We denote the final network as FlowNet2. A sketch of how these fusion inputs can be assembled follows.
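A sketch assembling the fusion network's inputs as listed in the quote (the two flows, their magnitudes, and the brightness errors after warping), reusing the warp helper from Section 1's sketch; the channel ordering is my own:

```python
import torch

def fusion_inputs(img1, img2, flow_css, flow_sd, warp):
    """Stack the fusion net's inputs: both flows, their magnitudes, and the
    brightness errors after warping img2 with each flow."""
    feats = [flow_css, flow_sd]
    for flow in (flow_css, flow_sd):
        feats.append(torch.norm(flow, dim=1, keepdim=True))           # flow magnitude
        img2_w = warp(img2, flow)
        feats.append(torch.norm(img1 - img2_w, dim=1, keepdim=True))  # brightness error
    return torch.cat(feats, dim=1)  # (B, 8, H, W) input to the fusion network
```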

11. Comparison of Results Across Models

[Figure omitted: benchmark comparison of the different FlowNet2 variants]

