
CVPR 2020 (ReID): Relation-Aware Global Attention for Person Re-identification (Personal Paper-Reading Notes)

The paper link and GitHub URL come first; if anything here infringes, please contact me and I will remove it.

The paper's main contribution is a Relation-Aware Global Attention (RGA) module, which captures global structural information for attention learning.

Contributions:

  1. The paper learns the attention for each feature node from a global view of the pairwise relations between features. The authors argue that global-scope relations carry valuable structural (clustering-like) information, and use a learned function (a convolutional network) to mine semantic knowledge from these relations and obtain the attention.
  2. The paper designs a Relation-aware Global Attention (RGA) module that represents the global-scope relations with two convolutional layers (1x1 kernels) and derives attention from them. The idea is applied along both the spatial dimension (RGA-S) and the channel dimension (RGA-C).

Key techniques (corresponding to Section 3 of the paper)

3.2 Spatial Relation-Aware Global Attention

Input: a feature map $X \in \mathbb{R}^{C \times H \times W}$.

Output: a spatial attention map of size $H \times W$.

From the input feature map we obtain $N = H \times W$ feature nodes, where each feature node $x_i$ has dimension $C$, i.e. $X = \{x_1, x_2, \dots, x_N\}$ with $x_i \in \mathbb{R}^{C}$. These $N$ nodes are regarded as the $N$ vertices of a graph $G$.

The pairwise relation $r_{i,j}$ from node $i$ to node $j$ is defined as a dot-product affinity in an embedding space:

$$r_{i,j} = f_s(x_i, x_j) = \theta_s(x_i)^{\mathsf{T}} \, \phi_s(x_j)$$

where $\theta_s$ and $\phi_s$ are two embedding functions (1x1 conv + BN + ReLU). Reflected in the code:

# Embedding functions for modeling relations
if self.use_spatial:
    self.theta_spatial = nn.Sequential(
        nn.Conv2d(in_channels=self.in_channel, out_channels=self.inter_channel,
                  kernel_size=1, stride=1, padding=0, bias=False),
        nn.BatchNorm2d(self.inter_channel),
        nn.ReLU()
    )
    self.phi_spatial = nn.Sequential(
        nn.Conv2d(in_channels=self.in_channel, out_channels=self.inter_channel,
                  kernel_size=1, stride=1, padding=0, bias=False),
        nn.BatchNorm2d(self.inter_channel),
        nn.ReLU()
    )

Similarly, the pairwise relation $r_{j,i}$ from node $j$ to node $i$ can be obtained.

The paper uses the pair $(r_{i,j}, r_{j,i})$ to describe the bidirectional relation between nodes $i$ and $j$, and the affinity matrix $R_s \in \mathbb{R}^{N \times N}$ to represent the pairwise relations among all nodes.
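
In the forward pass the affinity matrix can be computed as below; this is a sketch that assumes the module definitions above and follows the variable naming of the official repository (x is the input feature map of shape (b, C, H, W)):

# Sketch: compute the N x N affinity matrix, N = h*w (assumes import torch)
b, c, h, w = x.size()
theta_xs = self.theta_spatial(x)                     # (b, inter_channel, h, w)
phi_xs = self.phi_spatial(x)                         # (b, inter_channel, h, w)
theta_xs = theta_xs.view(b, self.inter_channel, -1)  # flatten positions: (b, inter_channel, N)
theta_xs = theta_xs.permute(0, 2, 1)                 # (b, N, inter_channel)
phi_xs = phi_xs.view(b, self.inter_channel, -1)      # (b, inter_channel, N)
Gs = torch.matmul(theta_xs, phi_xs)                  # (b, N, N), entry (i, j) is r_{i,j}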

For the $i$-th feature node, the paper stacks its pairwise relations with all nodes in a fixed order of node ids $1, \dots, N$, giving the relation vector $r_i = [R_s(i, :), R_s(:, i)] \in \mathbb{R}^{2N}$; see Figure 3(a) of the paper for an illustration, and the sketch below for the tensor form.
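
A sketch of the stacking in tensor form, continuing the naming above: row $i$ and column $i$ of Gs are reshaped into per-position channel vectors and concatenated.

Gs_in = Gs.permute(0, 2, 1).view(b, h * w, h, w)  # position i carries row i of Gs (r_{i,k} for all k)
Gs_out = Gs.view(b, h * w, h, w)                  # position i carries column i of Gs (r_{k,i} for all k)
Gs_joint = torch.cat((Gs_in, Gs_out), 1)          # (b, 2N, h, w): relation vector r_i at each position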


To learn the attention for the $i$-th node, the paper concatenates the feature $x_i$ itself with its relation vector, so as to exploit both the global-scope structural information and the local original information.

Since $x_i$ and the relation vector do not lie in the same feature domain, they are first transformed by the following embeddings and then concatenated to obtain

$$\hat{y}_i = [\mathrm{pool}_c(\psi_s(x_i)),\ \varphi_s(r_i)]$$

where $\psi_s$ and $\varphi_s$ are the embedding functions (1x1 conv + BN + ReLU) for the feature $x_i$ itself and for the global relations $r_i$, respectively, and $\mathrm{pool}_c$ denotes average pooling along the channel dimension. Reflected in the code:

# Embedding functions for original features
if self.use_spatial:
    self.gx_spatial = nn.Sequential(
        nn.Conv2d(in_channels=self.in_channel, out_channels=self.inter_channel,
                  kernel_size=1, stride=1, padding=0, bias=False),
        nn.BatchNorm2d(self.inter_channel),
        nn.ReLU()
    )
# ...
# Embedding functions for relation features
if self.use_spatial:
    self.gg_spatial = nn.Sequential(
        nn.Conv2d(in_channels=self.in_spatial * 2, out_channels=self.inter_spatial,
                  kernel_size=1, stride=1, padding=0, bias=False),
        nn.BatchNorm2d(self.inter_spatial),
        nn.ReLU()
    )
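In the forward pass these two embeddings are combined as sketched below (continuing the naming above; torch.mean implements the channel pooling $\mathrm{pool}_c$):

Gs_joint = self.gg_spatial(Gs_joint)          # embed the 2N-dim relation vectors
g_xs = self.gx_spatial(x)                     # embed the original features
g_xs = torch.mean(g_xs, dim=1, keepdim=True)  # pool_c: average over channels -> 1 channel
ys = torch.cat((g_xs, Gs_joint), 1)           # y_i = [pool_c(psi(x_i)), varphi(r_i)]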

A learned model then mines valuable knowledge from $\hat{y}_i$ to produce the attention value

$$a_i = \mathrm{Sigmoid}\big(W_2\,\mathrm{ReLU}(W_1 \hat{y}_i)\big)$$

where $W_1$ and $W_2$ are 1x1 convolution operations followed by BN; $W_1$ reduces the dimension by a ratio factor, and $W_2$ reduces the channel dimension to 1. Reflected in the code:

# Networks for learning attention weights
if self.use_spatial:
    num_channel_s = 1 + self.inter_spatial
    self.W_spatial = nn.Sequential(
        nn.Conv2d(in_channels=num_channel_s, out_channels=num_channel_s // down_ratio,
                  kernel_size=1, stride=1, padding=0, bias=False),
        nn.BatchNorm2d(num_channel_s // down_ratio),
        nn.ReLU(),
        nn.Conv2d(in_channels=num_channel_s // down_ratio, out_channels=1,
                  kernel_size=1, stride=1, padding=0, bias=False),
        nn.BatchNorm2d(1)
    )
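A sketch of how the resulting attention map is applied, continuing the naming above (the sigmoid from the formula lives in the forward pass, not inside W_spatial):

W_ys = self.W_spatial(ys)                   # (b, 1, h, w) attention logits
out = torch.sigmoid(W_ys).expand_as(x) * x  # broadcast over channels and re-weight the features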

3.3 Channel Relation-Aware Global Attention

Input: a feature map $X \in \mathbb{R}^{C \times H \times W}$.

Output: a channel attention vector of size $C$.

From the input feature map we obtain $C$ feature nodes, where each feature node $x_i$ has dimension $H \times W$, i.e. $X = \{x_1, x_2, \dots, x_C\}$ with $x_i \in \mathbb{R}^{H \times W}$. These $C$ nodes are regarded as the $C$ vertices of a graph $G$.

The pairwise relation $r_{i,j}$ from node $i$ to node $j$ is again defined as a dot-product affinity in an embedding space.

where $\theta_c$ and $\phi_c$ are defined in the same way as in the spatial branch; a sketch of the channel-branch forward pass follows.


3.4 Analysis and Discussion

RGA vs. CBAM: CBAM uses a 7x7 convolution kernel plus a sigmoid activation to decide the attention at each spatial feature position, so only the 7x7 = 49 neighboring feature nodes around a position are used to compute its attention. RGA-S instead jointly exploits the feature nodes at all spatial positions to decide the attention values globally, using only 1x1 convolution kernels; a minimal sketch of the CBAM side of the comparison is given below.
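
This sketch of CBAM-style spatial attention is not from the RGA repository; it is included only to make the local-7x7 vs. global contrast concrete:

import torch
import torch.nn as nn

# Minimal CBAM-style spatial attention, for comparison with RGA-S.
class CBAMSpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        # a single local 7x7 conv decides each position's attention
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg_pool = torch.mean(x, dim=1, keepdim=True)    # (b, 1, h, w)
        max_pool, _ = torch.max(x, dim=1, keepdim=True)  # (b, 1, h, w)
        attn = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return attn * x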

RGA vs. NL and SNL: for a target feature position, NL computes a weighted sum of the features at all source positions, using the pairwise relations as the weights, and adds the aggregated result back to the original feature for refinement. NL thus uses relations in only one fixed way, as weights for feature synthesis, and lacks adaptation specific to the target position. RGA instead mines knowledge from the global-scope structural information in the relations through a learned modeling function (convolutions); a minimal sketch of the NL-style aggregation is given below.
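
A minimal sketch of the NL-style aggregation being contrasted (a generic illustration, not code from either repository):

import torch

def non_local_aggregate(affinity: torch.Tensor, g_x: torch.Tensor) -> torch.Tensor:
    # affinity: (N, N) pairwise relations; g_x: (N, d) embedded source features
    weights = torch.softmax(affinity, dim=-1)  # relations serve only as mixing weights
    return torch.matmul(weights, g_x)          # weighted sum over source positions
# the aggregated features are then projected and added back to the input (residual)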

4.2 Experimental results


5 Conclusion

The paper proposes a simple yet effective Relation-Aware Global Attention module that models global-scope structural information and infers attention with a learned model. For each feature position, the pairwise relations between its feature and all other features are stacked, and the feature itself is concatenated with them to infer the attention at that position.

Personal summary:

In my view this paper really has one core idea, the RGA-S and RGA-C modules. Their advantage is that they can be plugged into different backbones as a standalone attention module, and the experimental results look quite favorable. What puzzles me is that the baseline already reaches very high accuracy: baselines in other papers do not reach 94.2%, and Luo's reid-strong-baseline only reaches 94.5%. I even suspect that the tricks used in this baseline contribute substantially to the final accuracy. I will update this post if I find time to dig further.
