[Paper Notes] KDD2020: Handling Information Loss of Graph Neural Networks for Session-based Recommendation

Contents

  • PROBLEM
    • 1 lossy session encoding
    • 2 ineffective long-range dependency capturing
  • SOLUTION
    • Converting Sessions to Graphs
      • S2MG that converts sessions to EOP multigraphs
      • S2SG that converts sessions to shortcut graphs
      • Framework
  • PRELIMINARIES
    • LESSR
      • Edge-Order Preserving Aggregation (EOPA) Layer
      • Shortcut Graph Attention (SGAT) Layer
      • Stacking Multiple Layers
      • Generating Session Embedding
      • Prediction and Training
  • EXPERIMENTS
    • Performance Comparisons
    • Ablation Studies

PROBLEM

This paper identifies two information loss problems in GNN-based session-based recommendation (SBR), summarized in the table below.

| Information loss problem | Cause | Solution |
| --- | --- | --- |
| lossy session encoding | (a) lossy encoding from sessions to graphs; (b) permutation-invariant aggregation during message passing | an edge-order preserving aggregation layer based on GRU |
| ineffective long-range dependency capturing | limited number of layers | a shortcut graph attention layer |

1 lossy session encoding

As shown in Figure 1, two different sessions are converted to the same graph.

[Figure 1: two different sessions that are converted to the same graph]

2 ineffective long-range dependency capturing

Each GNN layer only captures 1-hop relations. Because of overfitting and over-smoothing, the optimal number of GNN layers is usually no more than 3, so at most 3-hop relations can be captured.

SOLUTION

$$\mathrm{lossless\ encoding\ scheme}
\begin{cases}
\mathrm{edge\text{-}order\ preserving\ multigraph} \\
\mathrm{shortcut\ graph}
\end{cases}$$

Converting Sessions to Graphs

S2MG that converts sessions to EOP multigraphs

[Figure 3: (b) the EOP multigraph and (c) the shortcut graph built from a session]

As shown in Figure 3(b), preserving edge order means preserving the order of the incoming edges at each node $v$. For the session $[v_1, v_2, v_3, v_3, v_2, v_2, v_4]$ and node $v_2$, the first incoming edge is $v_1 \rightarrow v_2$, the second is $v_3 \rightarrow v_2$, and the third is $v_2 \rightarrow v_2$.
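To make S2MG concrete, here is a minimal Python sketch, assuming a session is given as a list of item IDs; the function name and the dict-of-lists edge representation are illustrative choices of this note, not the paper's released code.

```python
def session_to_eop_multigraph(session):
    """Return (nodes, in_edges), where in_edges[v] is the ordered list of
    source nodes of v's incoming edges (duplicates kept)."""
    nodes = list(dict.fromkeys(session))        # unique items, first-seen order
    in_edges = {v: [] for v in nodes}
    for src, dst in zip(session, session[1:]):  # consecutive item pairs
        in_edges[dst].append(src)               # arrival order = edge order
    return nodes, in_edges

# The worked example above: session [v1, v2, v3, v3, v2, v2, v4]
nodes, in_edges = session_to_eop_multigraph(["v1", "v2", "v3", "v3", "v2", "v2", "v4"])
print(in_edges["v2"])  # ['v1', 'v3', 'v2'], the ordered in-edges of v2
```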

S2SG that converts sessions to shortcut graphs

As shown in Figure 3(c), the shortcut graph connects every ordered pair of positions in the session: for any $v_a, v_b$ with $a < b$, there is an edge $v_a \rightarrow v_b$.
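A matching sketch for S2SG under the same assumptions (sessions as lists of item IDs, names hypothetical); since positions $a < b$ may hold the same item, repeated items naturally produce self-loops.

```python
def session_to_shortcut_graph(session):
    """Return the deduplicated edge set {(v_a, v_b) : a < b}."""
    edges = set()
    for a in range(len(session)):
        for b in range(a + 1, len(session)):
            edges.add((session[a], session[b]))
    return edges

edges = session_to_shortcut_graph(["v1", "v2", "v3", "v3", "v2", "v2", "v4"])
print(("v2", "v2") in edges)  # True: repeated items yield self-loops
```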

Framework

[Figure: overall framework of the proposed model (LESSR)]

PRELIMINARIES

Given a graph $G=(V,E)$ with node set $V$ and edge set $E$, each GNN layer propagates information along edges to update the node representations:

$$\begin{aligned}
\bm{x}_i^{(l+1)} &= f_{upd}^{(l)}\big(\bm{x}_i^{(l)},\ \text{agg}_i^{(l)}\big) \\
\text{agg}_i^{(l)} &= f_{agg}^{(l)}\big(\{ f_{msg}^{(l)}(\bm{x}_i^{(l)}, \bm{x}_j^{(l)}) : (j,i) \in E_{in}(i) \}\big)
\end{aligned} \tag{1,2}$$

$\bm{x}_i^{(l)}$ is the representation of node $i$ at layer $l$, and $E_{in}(i)$ is the set of incoming edges of node $i$. The message function $f_{msg}^{(l)}$ computes the message propagated from a neighbor node to the target node, the aggregation function $f_{agg}^{(l)}$ aggregates the messages sent to the target node, and the update function $f_{upd}^{(l)}$ computes the new representation of the target node from its current representation and the aggregated message.
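The paper leaves $f_{msg}$, $f_{agg}$, and $f_{upd}$ abstract, so the following PyTorch sketch instantiates Eqs. (1)-(2) with one common choice, linear messages and sum aggregation; the function name and dict-based edge lists are illustrative.

```python
import torch

def message_passing_layer(x, in_edges, W_msg, W_upd):
    """x: (num_nodes, d) node features; in_edges[i]: list of source nodes j
    with (j, i) in E_in(i); W_msg: (d, d); W_upd: (2d, d)."""
    out = []
    for i in range(x.size(0)):
        msgs = [x[j] @ W_msg for j in in_edges[i]]      # f_msg: linear
        agg = (torch.stack(msgs).sum(dim=0) if msgs
               else torch.zeros(W_msg.size(1)))         # f_agg: sum
        out.append(torch.cat([x[i], agg]) @ W_upd)      # f_upd: linear
    return torch.stack(out)
```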

Let $L$ be the number of GNN layers. After $L$ steps of message passing through the $L$ layers, the final node representations capture the graph's structural information and $L$-hop relations. The readout function $f_{out}$ then generates a graph-level representation $\bm{h}_G$ by aggregating the node representations of the last layer:

$$\bm{h}_G = f_{out}\big(\{\bm{x}_i^{(L)} : i \in V\}\big) \tag{3}$$

LESSR

Edge-Order Preserving Aggregation (EOPA) Layer

A GRU is used to aggregate information from neighboring nodes. Let $\text{OE}_{in}(i)=[(j_1,i),(j_2,i),\cdots,(j_{d_i},i)]$ be the ordered list of the edge set $E_{in}(i)$, where $d_i$ is the in-degree of node $i$. Information is aggregated from the neighbors as follows:

$$\begin{aligned}
\text{agg}_i^{(l)} &= \bm{h}_{d_i}^{(l)} \\
\bm{h}_k^{(l)} &= \text{GRU}^{(l)}\big(f_{msg}^{(l)}(\bm{x}_i^{(l)}, \bm{x}_{j_k}^{(l)}),\ \bm{h}_{k-1}^{(l)}\big)
\end{aligned} \tag{4,5}$$

$\{\bm{h}_k^{(l)} : 0 \le k \le d_i\}$ are the hidden states of the GRU, $f_{msg}^{(l)}$ can be any message function that computes the message propagated from node $j_k$ to node $i$, and $\bm{h}_0^{(l)}$ is set to the zero vector.

Using simple linear transformations as the message and update functions, the final EOPA layer is defined as:

$$\begin{aligned}
\bm{x}_i^{(l+1)} &= \bm{W}_{upd}^{(l)}\big(\bm{x}_i^{(l)} \,\Vert\, \bm{h}_{d_i}^{(l)}\big) \\
\bm{h}_k^{(l)} &= \text{GRU}^{(l)}\big(\bm{W}_{msg}^{(l)}\bm{x}_{j_k}^{(l)},\ \bm{h}_{k-1}^{(l)}\big)
\end{aligned} \tag{6,7}$$

$\bm{W}_{upd}^{(l)} \in \mathbb{R}^{d \times 2d}$ and $\bm{W}_{msg}^{(l)} \in \mathbb{R}^{d \times d}$ are learnable parameters.
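A simplified PyTorch sketch of the EOPA layer following Eqs. (6)-(7). The per-node Python loops are for clarity only; a practical implementation would batch them, but the GRU run along each node's ordered in-edge list is what the equations describe.

```python
import torch
import torch.nn as nn

class EOPA(nn.Module):
    """Edge-order preserving aggregation (Eqs. 6-7), simplified."""
    def __init__(self, d):
        super().__init__()
        self.W_msg = nn.Linear(d, d, bias=False)
        self.W_upd = nn.Linear(2 * d, d, bias=False)
        self.gru = nn.GRUCell(d, d)  # runs along a node's ordered in-edges

    def forward(self, x, ordered_in_edges):
        """x: (num_nodes, d); ordered_in_edges[i]: indices j_1..j_{d_i}."""
        out = []
        for i in range(x.size(0)):
            h = torch.zeros(1, x.size(1))                       # h_0 = 0
            for j in ordered_in_edges[i]:                       # edge order kept
                h = self.gru(self.W_msg(x[j]).unsqueeze(0), h)  # Eq. 7
            out.append(self.W_upd(torch.cat([x[i], h[0]])))     # Eq. 6
        return torch.stack(out)
```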

Shortcut Graph Attention (SGAT) Layer

$$\begin{aligned}
\bm{x}_i^{(l+1)} &= \sum_{(j,i) \in E_{in}(i)} \alpha_{ij}^{(l)} \bm{W}_{val}^{(l)} \bm{x}_j^{(l)} \\
\bm{\alpha}_i^{(l)} &= \text{softmax}(\bm{e}_i^{(l)}) \\
e_{ij}^{(l)} &= (\bm{p}^{(l)})^T \sigma\big(\bm{W}_{key}^{(l)}\bm{x}_i^{(l)} + \bm{W}_{qry}^{(l)}\bm{x}_j^{(l)} + \bm{b}^{(l)}\big)
\end{aligned} \tag{8,9,10}$$

$E_{in}(i)$ is the set of incoming edges of node $i$ in the shortcut graph; $\bm{p}^{(l)}, \bm{b}^{(l)} \in \mathbb{R}^d$ and $\bm{W}_{key}^{(l)}, \bm{W}_{qry}^{(l)}, \bm{W}_{val}^{(l)} \in \mathbb{R}^{d \times d}$ are learnable parameters.
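A loop-based PyTorch sketch of the SGAT layer per Eqs. (8)-(10); taking $\sigma$ to be the sigmoid function is an assumption of this sketch, as is the parameter initialization.

```python
import torch
import torch.nn as nn

class SGAT(nn.Module):
    """Shortcut graph attention (Eqs. 8-10), simplified; sigma = sigmoid."""
    def __init__(self, d):
        super().__init__()
        self.W_key = nn.Linear(d, d, bias=False)
        self.W_qry = nn.Linear(d, d, bias=False)
        self.W_val = nn.Linear(d, d, bias=False)
        self.b = nn.Parameter(torch.zeros(d))
        self.p = nn.Parameter(torch.randn(d))

    def forward(self, x, in_neighbors):
        """x: (num_nodes, d); in_neighbors[i]: nodes j with (j, i) an edge
        of the shortcut graph."""
        out = []
        for i in range(x.size(0)):
            js = in_neighbors[i]
            e = torch.stack([self.p @ torch.sigmoid(
                    self.W_key(x[i]) + self.W_qry(x[j]) + self.b)
                for j in js])                               # Eq. 10
            alpha = torch.softmax(e, dim=0)                 # Eq. 9
            out.append(sum(a * self.W_val(x[j])
                           for a, j in zip(alpha, js)))     # Eq. 8
        return torch.stack(out)
```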

Stacking Multiple Layers

EOPA and SGAT layers are stacked in an interleaved fashion, which has the following advantages:

a. By interleaving EOPA and SGAT layers, information lost by an SGAT layer can still be recovered by the EOPA layers that follow it, so the SGAT layers only need to focus on capturing long-range dependencies.

b. Each type of layer can effectively exploit the features captured by the other type.

To further facilitate feature reuse, dense connections are introduced: the input of each layer consists of the output features of all preceding layers.
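One possible wiring of the dense connections, assuming the $k$-th layer is sized to accept input of width $(k+1)d$ and to emit width $d$; the helper below illustrates the connectivity pattern only, not the authors' implementation.

```python
import torch

def forward_with_dense_connections(x0, layers, layer_args):
    """x0: (num_nodes, d) initial embeddings; layers: alternating
    EOPA/SGAT-style modules, where layer k must accept width (k + 1) * d
    and emit width d; layer_args: per-layer edge structure (ordered
    in-edges for EOPA, shortcut-graph neighbors for SGAT)."""
    feats = [x0]                                 # everything computed so far
    for layer, args in zip(layers, layer_args):  # EOPA, SGAT, EOPA, ...
        inp = torch.cat(feats, dim=1)            # dense connection
        feats.append(layer(inp, args))
    return feats[-1]                             # last-layer representations
```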

Generating Session Embedding

Let $\bm{x}_{last}^{(L)}$ denote the representation of the last item of the session. The graph-level representation $\bm{h}_G$ is defined as:

$$\begin{aligned}
\bm{h}_G &= \sum_{i \in V} \beta_i \bm{x}_i^{(L)} \\
\bm{\beta} &= \text{softmax}(\bm{\epsilon}) \\
\epsilon_i &= \bm{q}^T \sigma\big(\bm{W}_1 \bm{x}_i^{(L)} + \bm{W}_2 \bm{x}_{last}^{(L)} + \bm{r}\big)
\end{aligned} \tag{11,12,13}$$

$\bm{q}, \bm{r} \in \mathbb{R}^d$ and $\bm{W}_1, \bm{W}_2 \in \mathbb{R}^{d \times d}$ are learnable parameters.

The global preference of the current session is $\bm{s}_g = \bm{h}_G$ and the local preference is $\bm{s}_l = \bm{x}_{last}^{(L)}$; the final session embedding is then:

$$\bm{s}_h = \bm{W}_h(\bm{s}_g \Vert \bm{s}_l) \tag{14}$$

$\bm{W}_h \in \mathbb{R}^{d \times 2d}$ is a learnable parameter.
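A sketch of the readout of Eqs. (11)-(14) as one module; $\sigma$ is again assumed to be sigmoid, and all class and parameter names are this note's own.

```python
import torch
import torch.nn as nn

class SessionReadout(nn.Module):
    """Attention readout and final session embedding (Eqs. 11-14)."""
    def __init__(self, d):
        super().__init__()
        self.W1 = nn.Linear(d, d, bias=False)
        self.W2 = nn.Linear(d, d, bias=False)
        self.r = nn.Parameter(torch.zeros(d))
        self.q = nn.Parameter(torch.randn(d))
        self.W_h = nn.Linear(2 * d, d, bias=False)

    def forward(self, x, x_last):
        """x: (num_nodes, d) last-layer node features; x_last: (d,) the
        representation of the session's last item."""
        eps = torch.stack([self.q @ torch.sigmoid(
                self.W1(xi) + self.W2(x_last) + self.r) for xi in x])  # Eq. 13
        beta = torch.softmax(eps, dim=0)                               # Eq. 12
        h_G = (beta.unsqueeze(1) * x).sum(dim=0)                       # Eq. 11
        return self.W_h(torch.cat([h_G, x_last]))                      # Eq. 14
```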

Prediction and Training

For each item $i \in I$, a score is computed from its embedding $\bm{v}_i$ and the session embedding; $\hat{y}_i$ denotes the probability that the next item is $i$:

$$\begin{aligned}
z_i &= \bm{s}_h^T \bm{v}_i \\
\hat{y}_i &= \frac{\exp(z_i)}{\sum_{j \in I} \exp(z_j)}
\end{aligned} \tag{15,16}$$

The model is trained with the cross-entropy loss:

$$\mathcal{L}(\bm{y}, \hat{\bm{y}}) = -\bm{y}^T \log \hat{\bm{y}}$$
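Because Eq. (16) is a softmax over inner-product scores and the objective is cross-entropy, the whole training head reduces to a standard softmax-cross-entropy; a minimal sketch, with names of this note's own choosing:

```python
import torch
import torch.nn.functional as F

def next_item_loss(s_h, item_embeddings, target):
    """s_h: (d,) session embedding; item_embeddings: (num_items, d);
    target: index of the ground-truth next item."""
    z = item_embeddings @ s_h  # Eq. 15: z_i = s_h^T v_i
    # F.cross_entropy fuses the softmax of Eq. 16 with -y^T log y_hat
    return F.cross_entropy(z.unsqueeze(0), torch.tensor([target]))
```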

EXPERIMENTS

Performance Comparisons

[Performance comparison results omitted]

Ablation Studies

[Ablation study results omitted]
