

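I have recently been reading about DPCNN, a text classification model proposed by Tencent that borrows features and architecture from both Text CNN and ResNet. It applies CNNs to text, but I had never been clear on the concept of convolution itself, so I used a train ride to read some related blog posts.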


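       For images, convolution is arguably a revolutionary operation; at its core it is a method for extracting image features. Why not just brute-force the problem with fully connected layers? First, if every feature has to weight every pixel, the model overfits easily and has far too many parameters. So my initial understanding of convolution was that replacing fully connected layers with convolutional layers effectively reduces the number of parameters (the same idea as parameter sharing in RNNs). Applying one convolution kernel to one channel of an image yields a feature map of that image under that kernel; a layer usually has many kernels and therefore produces many feature maps. Convolution was already present in classical digital image processing, where we constructed features by hand, that is, we specified exactly what the kernel looks like. The Sobel kernel used for edge detection is one example: the feature map obtained by applying a Sobel kernel shows the edges of an image, the places where brightness changes abruptly. But what is a convolution kernel, in essence?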
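To make the Sobel example concrete, here is a minimal sketch in plain Python. The function name `conv2d_valid` and the toy image are my own illustrative choices, not from any library. Note that, like most deep learning frameworks, it slides the kernel without flipping it (strictly speaking, cross-correlation), and it uses no padding.

```python
# Horizontal-gradient Sobel kernel: responds to vertical edges.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and return the feature map.

    The kernel is not flipped, so this is cross-correlation, which is what
    CNN layers usually compute under the name "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the 3x3 patch whose top-left corner is (i, j).
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 5x6 image (hypothetical data): dark left half, bright right half.
image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]
feature_map = conv2d_valid(image, SOBEL_X)

for row in feature_map:
    print(row)  # each row is [0, 36, 36, 0]
```

The feature map is zero in the flat regions and large only where the window straddles the dark-to-bright boundary, which is exactly the "abrupt brightness change" the kernel was hand-designed to detect. A CNN keeps the same sliding-window computation but learns the nine kernel weights instead of fixing them.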







       Frequencies for images

       The Fourier domain for images

       The convolution theorem

Due to the convolution theorem, we can imagine that convolutional nets operate on images in the Fourier domain and from the images above we now know that images in that domain contain a lot of information about orientation. Thus convolutional nets should be better than traditional algorithms when it comes to rotated images and this is indeed the case (although convolutional nets are still very bad at this when we compare them to human vision).
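The convolution theorem says that convolving two signals in the spatial domain is equivalent to multiplying their Fourier transforms pointwise. As a small numerical check, the sketch below compares a direct circular convolution with the Fourier route on a 1D signal. It uses a naive O(n²) DFT written with `cmath` so that it is self-contained; the helper names `dft`, `idft`, and `circular_convolve` and the sample data are assumptions made for illustration, and in practice one would use an FFT library instead.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a 1D sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def circular_convolve(x, h):
    """Direct circular convolution: y[t] = sum_m x[m] * h[(t - m) mod n]."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


# Hypothetical sample data: a short ramp and a "difference" kernel.
signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
kernel = [1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Route 1: convolve directly in the spatial domain.
direct = circular_convolve(signal, kernel)

# Route 2: transform, multiply pointwise, transform back.
product = [a * b for a, b in zip(dft(signal), dft(kernel))]
via_fourier = [value.real for value in idft(product)]

max_error = max(abs(a - b) for a, b in zip(direct, via_fourier))
print(direct)            # [1.0, 1.0, 1.0, 1.0, -4.0, 0.0, 0.0, 0.0]
print(max_error < 1e-9)  # True
```

The two routes agree up to floating-point error. The equivalence holds for circular convolution; to reproduce an ordinary (linear) convolution this way, both sequences have to be zero-padded first, which is why the sample signal above ends in zeros.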

       The reason why the convolution operation is often described as a filtering operation, and why convolution kernels are often named filters will be apparent from the next example, which is very close to convolution.

       If we transform the original image with a Fourier transform and then multiply it by a circle padded by zeros (zeros=black) in the Fourier domain, we filter out all high frequency values (they will be set to zero, due to the zero padded values). Note that the filtered image still has the same striped pattern, but its quality is much worse now. This is roughly how JPEG compression works (although it uses a different but similar transform, the discrete cosine transform): we transform the image, keep only certain frequencies, and transform back to the spatial image domain. The compression ratio would be the size of the black area to the size of the circle in this example.
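The filtering step described above can be sketched on a tiny synthetic image. This is only an illustration under my own assumptions: the image is square, the helpers `dft`, `dft2`, and `low_pass` are hypothetical names, the transform is a naive DFT rather than a fast one, and the "image" is a coarse stripe pattern with a fine pixel-level pattern added on top. Masking everything outside a circle around the zero frequency should remove the fine pattern and keep the stripes.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive 1D DFT; with inverse=True, the inverse transform."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [value / n for value in out] if inverse else out


def dft2(image, inverse=False):
    """2D DFT computed as 1D DFTs over the rows, then over the columns."""
    rows = [dft(row, inverse) for row in image]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def low_pass(image, radius):
    """Zero every frequency outside a circle of `radius` around frequency 0.

    Assumes a square image. Index k and index n - k are the frequencies
    +k and -k, so the distance to zero frequency is min(k, n - k).
    """
    n = len(image)
    spectrum = dft2(image)
    for u in range(n):
        for v in range(n):
            fu, fv = min(u, n - u), min(v, n - v)
            if fu * fu + fv * fv > radius * radius:
                spectrum[u][v] = 0
    return [[value.real for value in row] for row in dft2(spectrum, inverse=True)]


N = 8
# Coarse vertical stripes: one full cosine period across the image width.
stripes = [[math.cos(2 * math.pi * j / N) for j in range(N)] for _ in range(N)]
# Add a fine pattern that alternates sign every pixel (the highest frequency).
detailed = [[stripes[i][j] + 0.5 * (-1) ** j for j in range(N)] for i in range(N)]

filtered = low_pass(detailed, radius=2)
max_error = max(
    abs(filtered[i][j] - stripes[i][j]) for i in range(N) for j in range(N)
)
print(max_error < 1e-9)  # True: only the coarse stripes survive
```

Here the filter happens to recover the stripes exactly because the fine pattern lives entirely outside the circle. On a real photograph the discarded frequencies carry genuine detail, which is why the filtered image in the example above looks blurrier.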


  1. https://timdettmers.com/2015/03/26/convolution-deep-learning/
  2. http://www.hankcs.com/ml/understanding-the-convolution-in-deep-learning.html
