
【GCN-RS】Deep GCN with Hybrid Normalization for Accurate and Diverse Recommendation (DLP-KDD'21)

Deep Graph Convolutional Networks with Hybrid Normalization for Accurate and Diverse Recommendation (DLP-KDD’21)

In one sentence: this paper builds on LR-GCCF and LightGCN and combines left normalization (assign equal weight to every neighbor, as in PinSAGE) with symmetric normalization (assign smaller weights to high-degree neighbors, as in LightGCN).

Abstract

The abstract summarizes the paper completely:

Existing GCN-based recommendation models peak at shallow depths, so they never exploit high-order signals. Moreover, when aggregating neighbor signals, existing GCN models apply one fixed normalization rule to every node: either all neighbors are equally important, as in PinSAGE, or importance is assigned by degree, as in LightGCN.

But a single normalization rule cannot possibly fit every node, so the paper proposes a new model: Deep Graph Convolutional Network with Hybrid Normalization (DGCN-HN).

First, it designs a residual connection and a holistic connection **(a hybrid of LR-GCCF and LightGCN)** to counter over-smoothing, which makes deep GCNs (8 layers) trainable.

It then designs a hybrid normalization layer, which mixes the two normalizations through a simplified attention mechanism **(vanilla attention didn't work)**.

The experiments also show that this approach benefits low-degree users.

Intro

The intro stresses once more that current GCN-based recommenders are shallow and use a fixed normalization rule, and that applying the same normalization rule everywhere leads to suboptimal solutions.

The paper illustrates these drawbacks with an example:

  • A shallow GCN cannot exploit high-order signals, yet recommending a drone to U2, who likes electronics, requires exactly such signals.
  • With left normalization applied to every node, the highly popular iPhone heavily distorts U1's interest: U1 bought an iPhone only out of necessity and is not actually interested in electronics, since U1 mostly buys paper stationery.
  • With symmetric normalization applied to every node, the highly popular iPhone can no longer contribute the "electronics" interest, so the interests of U2 and U3 are hard to capture.

Model


The residual part simply adds a residual term to LightGCN's propagation:

$$
\begin{aligned}
\mathbf{h}_{u}^{(l+1)} &= \sum_{v \in N_{u}} \tilde{A}_{uv}\, \mathbf{h}_{v}^{(l)} + \mathbf{h}_{u}^{(l)}, \quad \mathbf{h}_{u}^{(0)} = \mathbf{e}_{u} \\
\mathbf{h}_{v}^{(l+1)} &= \sum_{u \in N_{v}} \tilde{A}_{vu}\, \mathbf{h}_{u}^{(l)} + \mathbf{h}_{v}^{(l)}, \quad \mathbf{h}_{v}^{(0)} = \mathbf{e}_{v}
\end{aligned}
$$

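As a minimal PyTorch sketch of this propagation (my own illustration, not the authors' code), assuming `adj_norm` already holds the normalized adjacency $\tilde{A}$ of the joint user-item graph as a sparse tensor:

```python
import torch

def residual_propagate(adj_norm: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    """One propagation layer: neighbor aggregation plus a residual term.

    adj_norm: (N, N) sparse normalized adjacency over users and items.
    h:        (N, d) stacked node embeddings h^(l); returns h^(l+1).
    """
    # sum_{v in N_u} A~_{uv} h_v  +  h_u
    return torch.sparse.mm(adj_norm, h) + h
```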
$\tilde{A}_{uv}$ is the hybrid normalization, a mix of left normalization and symmetric normalization. The obvious baseline is to average the two; one step further is a weighted average, i.e., attention:

$$
\begin{aligned}
\mathbf{h}_{N_{u}, LN}^{(l+1)} &= \sum_{v \in N_{u}} \frac{1}{\left|N_{u}\right|} \mathbf{h}_{v}^{(l)} \\
\mathbf{h}_{N_{u}, SN}^{(l+1)} &= \sum_{v \in N_{u}} \frac{1}{\sqrt{\left|N_{u}\right|} \sqrt{\left|N_{v}\right|}} \mathbf{h}_{v}^{(l)} \\
\mathbf{h}_{u}^{(l+1)} &= \mathbf{h}_{u}^{(l)} + \alpha_{u, LN}^{(l+1)} \mathbf{h}_{N_{u}, LN}^{(l+1)} + \alpha_{u, SN}^{(l+1)} \mathbf{h}_{N_{u}, SN}^{(l+1)}
\end{aligned}
$$

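A hedged sketch of the two aggregations and their weighted mix, assuming a dense binary interaction matrix `R` and attention weights computed elsewhere (all names here are mine):

```python
import torch

def hybrid_aggregate(R, h_items, h_users, alpha_ln, alpha_sn):
    """Mix left- and symmetric-normalized neighbor aggregations for users.

    R:                  (n_users, n_items) binary interaction matrix.
    h_items:            (n_items, d) item embeddings h_v^(l).
    h_users:            (n_users, d) user embeddings h_u^(l).
    alpha_ln, alpha_sn: (n_users, 1) attention weights per normalization.
    """
    deg_u = R.sum(dim=1, keepdim=True).clamp(min=1)        # |N_u|, shape (n_users, 1)
    deg_v = R.sum(dim=0, keepdim=True).clamp(min=1)        # |N_v|, shape (1, n_items)
    h_ln = (R / deg_u) @ h_items                           # left normalization
    h_sn = (R / (deg_u.sqrt() * deg_v.sqrt())) @ h_items   # symmetric normalization
    return h_users + alpha_ln * h_ln + alpha_sn * h_sn     # residual + weighted mix
```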
How are the attention scores computed? The authors first designed a full attention layer:

$$
z_{u, *}^{(l+1)} = \mathbf{W}_{1}^{(l)} \sigma\left(\mathbf{W}_{2}^{(l)}\left(\mathbf{h}_{N_{u}, *}^{(l+1)} + \mathbf{h}_{N_{u}, *}^{(l+1)} \odot \mathbf{h}_{u}^{(l)}\right)\right)
$$

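A hypothetical PyTorch module for this original scorer (the paper does not spell out the implementation; the choice of sigmoid as the activation here is an assumption):

```python
import torch
import torch.nn as nn

class AttentionScore(nn.Module):
    """Two-layer attention scorer z = W1 * sigma(W2 * (h_N + h_N * h_u)).
    This is the variant the authors later discard for a simpler one."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w2 = nn.Linear(dim, hidden, bias=False)
        self.w1 = nn.Linear(hidden, 1, bias=False)

    def forward(self, h_agg: torch.Tensor, h_user: torch.Tensor) -> torch.Tensor:
        # h_agg: (n_users, d) aggregated messages; h_user: (n_users, d)
        return self.w1(torch.sigmoid(self.w2(h_agg + h_agg * h_user)))
```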
In practice this would not train and failed to converge, so they replaced it with a simpler one:

$$
z_{u, *}^{(l+1)} = \operatorname{ave}\left(\mathbf{h}_{N_{u}, *}^{(l+1)} + \mathbf{h}_{N_{u}, *}^{(l+1)} \odot \mathbf{h}_{u}^{(l)}\right)
$$

$$
\alpha_{u, *}^{(l)} = \frac{\exp\left(z_{u, *}^{(l)}\right)}{\sum_{k \in \{LN, SN\}} \exp\left(z_{u, k}^{(l)}\right)}
$$

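Combining the simplified scorer with the softmax, a sketch (continuing the hypothetical names used above):

```python
import torch

def attention_weights(h_ln, h_sn, h_users):
    """Simplified attention: z_* = ave(h_N,* + h_N,* * h_u), then a softmax
    over the two scores yields alpha_LN and alpha_SN per user."""
    z_ln = (h_ln + h_ln * h_users).mean(dim=1, keepdim=True)  # (n_users, 1)
    z_sn = (h_sn + h_sn * h_users).mean(dim=1, keepdim=True)
    alpha = torch.softmax(torch.cat([z_ln, z_sn], dim=1), dim=1)
    return alpha[:, 0:1], alpha[:, 1:2]  # alpha_LN, alpha_SN
```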
The loss is the standard BPR loss, with a negative sampling ratio of 1 (one negative per positive).
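For completeness, a generic BPR-loss sketch (the standard formulation, not code from the paper); with a sampling ratio of 1, `neg_emb` holds one sampled negative item per positive pair:

```python
import torch
import torch.nn.functional as F

def bpr_loss(user_emb, pos_emb, neg_emb):
    """BPR: push each observed (user, positive item) score above the score
    of one sampled negative item for the same user."""
    pos_score = (user_emb * pos_emb).sum(dim=1)
    neg_score = (user_emb * neg_emb).sum(dim=1)
    return -F.logsigmoid(pos_score - neg_score).mean()
```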

Experiments


The proposed model can be trained up to 8 layers.


Measured by Recall, the combination of left normalization and symmetric normalization performs best.


Analyzing the results, the paper finds larger gains for low-degree users.


I suspect this is over-interpreting an expected effect: low-degree users start from a low recall, so the small base makes the relative improvement look large.
