The difference between the add layer and the concatenate layer in ShuffleNet

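I have recently been studying a lightweight network, ShuffleNet V1. While reading the paper I did not understand the add and concat layers in its model, so after looking through some material I summarized what I found below.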

Mainstream lightweight CNN networks

ShuffleNet V1 and ShuffleNet V2;
MobileNet V1 and MobileNet V2;
Xception
SqueezeNet

These models have been among the more popular networks since 2016. Their papers are worth reading and their code is worth running.

Reference link: https://blog.csdn.net/u014451076/article/details/80162924

The difference between the add layer and the concat layer

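In plain terms: the add layer sums its inputs element by element, so the number of dimensions (channels) stays the same while the information carried in each one increases. The concat layer joins its inputs end to end, so the number of dimensions grows while the information in each one stays as it was.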

An example:

add:

a = [[1,2], [3, 4]]
b = [[11,12], [13, 14]]
c = add(a, b)  # c = [[12,14], [16, 18]]  add stands for the add layer here: the values are summed element by element

concat:

a = [[1,2], [3, 4]]
b = [[11,12], [13, 14]]
c = concat(a, b)  # c = [[1,2], [3, 4], [11,12], [13, 14]]  concat stands for the concat layer here: the inputs are joined end to end, so the joined axis grows
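The toy example above can be reproduced with NumPy. This is only a minimal sketch: `np.add` and `np.concatenate` stand in for the add and concat layers, and the variable names are made up for illustration. Real layers work on batched tensors, but the arithmetic is the same.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[11, 12], [13, 14]])

# add: element-wise sum, the shape stays (2, 2)
added = np.add(a, b)

# concat: the arrays are joined along one axis, so that axis grows
stacked_rows = np.concatenate([a, b], axis=0)  # shape (4, 2)
stacked_cols = np.concatenate([a, b], axis=1)  # shape (2, 4)

print(added.tolist())         # [[12, 14], [16, 18]]
print(stacked_rows.tolist())  # [[1, 2], [3, 4], [11, 12], [13, 14]]
print(stacked_cols.tolist())  # [[1, 2, 11, 12], [3, 4, 13, 14]]
```

Joining along axis 0 gives the result shown in the concat example, and joining along axis 1 gives the other possible layout.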

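DenseNet and Inception mostly use the concatenate operation, while ResNet mostly uses add. ResNet sums values, so the channel count does not change; DenseNet merges channels. One way to look at it: with add, the amount of information under each feature describing the image increases, but the number of dimensions describing the image does not grow. Only the information within each dimension increases, which tends to help the final image classification. With concatenate, the channels are merged: the number of features describing the image increases, while the information under each individual feature does not.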
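The effect on channel counts can be sketched with a little arithmetic. The helper names and the numbers used here (64 input channels, 4 layers, 32 new channels per layer) are made up for illustration and are not taken from either paper's configuration.

```python
def channels_after_add(in_channels, num_layers):
    # x + F(x): F(x) must match the channel count of x,
    # so the count is the same no matter how many layers are stacked
    return in_channels


def channels_after_concat(in_channels, num_layers, new_channels_per_layer):
    # concat([x, F(x)]): every layer appends its own feature maps
    channels = in_channels
    for _ in range(num_layers):
        channels += new_channels_per_layer
    return channels


print(channels_after_add(64, 4))         # 64
print(channels_after_concat(64, 4, 32))  # 192
```

With add the width of the network is fixed by the input, while with concat it grows linearly with depth, which is why concat-based designs have to keep the number of channels added per layer small.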

At the code level, ResNet uses the add operation throughout, while DenseNet uses concatenate.

This is a useful guide when we design our own network architectures.

Looking at the Keras source code, the add operation is:

def _merge_function(self, inputs):
    output = inputs[0]
    for i in range(1, len(inputs)):
        output += inputs[i]
    return output

It simply performs a summation. For example:

import keras
 
input1 = keras.layers.Input(shape=(16,))
x1 = keras.layers.Dense(8, activation='relu')(input1)
input2 = keras.layers.Input(shape=(32,))
x2 = keras.layers.Dense(8, activation='relu')(input2)
added = keras.layers.add([x1, x2])
 
out = keras.layers.Dense(4)(added)
model = keras.models.Model(inputs=[input1, input2], outputs=out)
model.summary()

The printed model structure is:

Layer (type)          Output Shape   Param #   Connected to
==================================================================
input_1 (InputLayer)  (None, 16)     0
input_2 (InputLayer)  (None, 32)     0
dense_1 (Dense)       (None, 8)      136       input_1[0][0]
dense_2 (Dense)       (None, 8)      264       input_2[0][0]
add_1 (Add)           (None, 8)      0         dense_1[0][0]
                                               dense_2[0][0]
dense_3 (Dense)       (None, 4)      36        add_1[0][0]
==================================================================
Total params: 436
Trainable params: 436
Non-trainable params: 0

This is easy to understand: the add layer is a merge operation that sits after dense_1 and dense_2, and it has no trainable parameters.
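The parameter counts in the summary can be checked by hand. The sketch below uses a hypothetical helper, `dense_params`, which counts one weight per input-unit pair plus one bias per unit. The last part is an assumption for comparison only: it estimates what the final layer would cost if the two branches were concatenated instead of added.

```python
def dense_params(in_features, units):
    # One weight per (input, unit) pair plus one bias per unit
    return in_features * units + units


dense_1 = dense_params(16, 8)  # 136
dense_2 = dense_params(32, 8)  # 264

# add: both branches have width 8, and their sum also has width 8
dense_3_after_add = dense_params(8, 4)  # 36
total_with_add = dense_1 + dense_2 + dense_3_after_add  # 436

# Hypothetical variant: concatenating the branches gives width 8 + 8 = 16,
# so the Dense(4) layer that follows would need more weights
dense_3_after_concat = dense_params(8 + 8, 4)  # 68
total_with_concat = dense_1 + dense_2 + dense_3_after_concat  # 468

print(total_with_add, total_with_concat)  # 436 468
```

The merge layer itself contributes nothing in either case. The difference comes entirely from the wider input seen by the next layer.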

The concatenate operation is, by comparison, a little harder to understand.

if py_all([is_sparse(x) for x in tensors]):
    return tf.sparse_concat(axis, tensors)
else:
    return tf.concat([to_dense(x) for x in tensors], axis)

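The Keras source shows two branches: if every input is sparse it returns tf.sparse_concat, otherwise it returns tf.concat. That makes things fairly clear.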

An example of the concat operation:

t1 = [[1, 2, 3], [4, 5, 6]]
t2 = [[7, 8, 9], [10, 11, 12]]
tf.concat([t1, t2], 0) ==> [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
tf.concat([t1, t2], 1) ==> [[1, 2, 3, 7, 8, 9], [4, 5, 6, 10, 11, 12]]

tensor t3 with shape [2, 3]
tensor t4 with shape [2, 3]
tf.shape(tf.concat([t3, t4], 0)) ==> [4, 3]
tf.shape(tf.concat([t3, t4], 1)) ==> [2, 6]

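In effect, concat joins tensors along one chosen dimension. With axis=0 the tensors are joined along the first dimension, so rows are stacked and the row count grows; with axis=1 they are joined along the second dimension, so the column count grows. In a CNN, the concat layer joins two tensors along the channel dimension. The other branch, sparse_concat, is the same kind of joining applied to sparse matrices and is also easy to understand.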
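For convolutional feature maps the same rule applies with more axes. The sketch below assumes a channels-last layout (batch, height, width, channels), which is the default data format in Keras, and uses NumPy arrays in place of real layer outputs. The shapes are arbitrary.

```python
import numpy as np

# Two feature maps with the same batch and spatial size but different channel counts
x = np.zeros((1, 28, 28, 24))
y = np.ones((1, 28, 28, 40))

# Joining along the last axis merges the channels: 24 + 40 = 64
merged = np.concatenate([x, y], axis=-1)
print(merged.shape)  # (1, 28, 28, 64)

# Element-wise add needs matching shapes. Here the channel counts differ,
# so NumPy cannot broadcast the two operands together
try:
    x + y
    add_ok = True
except ValueError:
    add_ok = False
print(add_ok)  # False
```

This is the practical constraint behind the two operations: concat only needs the non-joined axes to agree, while add needs every axis to agree.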
