GraphSage 算法原理介紹與源碼淺析

文章目錄

GraphSage 算法原理介紹與源碼淺析

前言
廣而告之
文章資訊
核心觀點
核心觀點解讀
源碼分析

鄰居采樣

采樣方法

鄰居聚合

聚合方法

MeanAggregator
GCNAggregator
MaxPoolingAggregator

總結

前言

最近在做 Graph 相關的工作, 兩年前做過一段時間, 想不到兜兜轉轉又回到最初的起點~???????????? 工作繼續穩步推進, 同時打算複習下基礎算法. 論文也忒多了, 一段時間沒看, 已經跟不上了 ????????????

這裡插句題外話, 之前我寫的一些部落格, 代碼分析的太過細節了, 我自己平時翻看的時候, 都會直接将瑣碎的東西給略過. 從這一行為可以看出, 之前部落格中記錄了太多備援的内容, 不僅在記錄時浪費了時間, 更給後續查閱帶來了一些阻礙. 鑒于此, 以後做代碼分析打算盡力隻分析源碼的核心部分, 再加上部分感興趣的内容.

廣而告之

可以在微信中搜尋 “珍妮的算法之路” 或者 “world4458” 關注我的微信公衆号；另外可以看看知乎專欄 PoorMemory-機器學習, 以後文章也會發在知乎專欄中；

文章資訊

論文标題: Inductive Representation Learning on Large Graphs
論文位址:https://arxiv.org/pdf/1706.02216.pdf
代碼位址:https://github.com/williamleif/GraphSAGE
發表時間: NIPS, 2017
論文作者: William L. Hamilton, Rex Ying, Jure Leskovec
作者機關: Stanford

補充: 在全面了解 PinSage 文章中詳細介紹了 PinSage 算法, GraphSage 算法是 PinSage 的理論基礎, 而 PinSage 包含了很多工程上的實踐經驗, 兩者可以結合起來看看.

核心觀點

GraphSage (Graph SAmple and aggreGatE) 屬于 Inductive learning 算法, 它學習一種聚合函數, 通過聚合節點鄰居的特征資訊來學習目标節點本身的 embedding 表達. 從它的名字中可以看出算法的核心步驟分别是鄰居采樣以及特征聚合; GraphSage 就是我們通常意義的機器學習任務, 對于未知的節點具有泛化能力, 它和 Transductive Learning 算法 (如 GCN, DeepWalk, 在固定的圖結構上學習節點的 embedding) 不同的是, Transductive Learning 算法在圖中加入新節點後, 需要将模型重新訓練.

核心觀點解讀

在介紹具體的算法之前, 先簡要對比一下 Inductive learning 與 Transductive learning. 關于它們的詳細介紹推薦閱讀文章 Inductive vs. Transductive Learning.

其中:

Inductive learning is the same as what we commonly know as traditional supervised learning. We build and train a machine learning model based on a labelled training dataset we already have. Then we use this trained model to predict the labels of a testing dataset which we have never encountered before.

In contrast to inductive learning, transductive learning techniques have observed all the data beforehand, both the training and testing datasets. We learn from the already observed training dataset and then predict the labels of the testing dataset. Even though we do not know the labels of the testing datasets, we can make use of the patterns and additional information present in this data during the learning process.

The main difference is that during transductive learning, you have already encountered both the training and testing datasets when training the model. However, inductive learning encounters only the training data when training the model and applies the learned model on a dataset which it has never seen before.

Transduction does not build a predictive model. If a new data point is added to the testing dataset, then we will have to re-run the algorithm from the beginning, train the model and then use it to predict the labels. On the other hand, inductive learning builds a predictive model. When you encounter new data points, there is no need to re-run the algorithm from the beginning.

(Inductive Learning 被翻譯為歸納式學習, Transductive Leanring 為直推式學習. 說實話, 這兩個翻譯把我整迷糊了, 從來沒有記住過, 但是上面的英文釋義卻非常好記, 不容易忘????????????)

GraphSage 屬于 Inductive learning 算法, 它學習一種聚合函數, 通過聚合節點鄰居的特征資訊來學習目标節點本身的 embedding 表達. 它的主要步驟就記錄在它的名字中: Sample 與 Aggregate. 其中 Sample 階段通過随機采樣擷取多跳鄰居; Aggregate 階段聚合鄰居節點特征生成目标節點自身的 embedding. 以聚合 2 跳鄰居為例, 它将首先聚合 2 跳鄰居的特征生成 1 跳鄰居的 embedding, 之後再聚合 1 跳鄰居的 embedding 來生成節點本身的 embedding. 由于生成 1 跳鄰居 embedding 時, 已經包含了 2 跳鄰居的特征資訊, 此時目标節點也将獲得 2 跳鄰居的特征資訊. 論文中的圖示形象地展示了這一過程:

生成完目标節點的 embedding 後, 可以提供給下遊的機器學習系統做諸如節點分類的預估任務.

GraphSage 的前向傳播算法如下圖:

第一個 for 循環針對層數進行周遊, 第二個 for 循環用于周遊 Graph 中的所有節點, 針對每個節點 , 對鄰居進行采樣得到 , 并通過對鄰居節點的 embedding 進行聚合, 得到 , 再将它與目标節點目前的 embedding 進行拼接, 經過非線性變換後賦給 , 進而完成目标節點的一次更新. 當外層的 for 循環 () 周遊結束時, 節點将完成

在具體代碼實作時, 實際上采用的是 minibatch 的形式, 論文 Appendix A 進行了介紹, 待會在源碼分析中也将進行描述.

源碼分析

本次分析的代碼位于 https://github.com/williamleif/GraphSAGE, 是官方開源的 TensorFlow 版本.

GraphSage 的核心在于 Sample 和 Aggregate. 由于訓練模型時, 我們一般采用 minibatch 的方式進行訓練, 是以在論文的 Appendix A 中, 還給出了一份 minibatch 版本的僞代碼, 如下:

其中代碼 1 ~ 7 行表示對鄰居進行采樣, 而 8 ~ 15 行表示鄰居聚合.

在 GraphSage 的代碼中, 鄰居采樣以及聚合代碼均位于 https://github.com/williamleif/GraphSAGE/blob/master/graphsage/models.py 檔案中, 在進行介紹之前, 需要解釋一個會令人困惑的點. 作者對于 Graph 中每層節點的采樣個數設定如下:

flags.DEFINE_integer('samples_1', 25, 'number of samples in layer 1')
flags.DEFINE_integer('samples_2', 10, 'number of samples in layer 2')

實際上表達的含義如下圖:

注意圖中的

layer 1

層采樣的節點數為 10, 而

layer 2

層采樣的節點數為

, 剛好和代碼中的定義相反. 關于這一點作者在 Appendix A 中介紹僞代碼的下方進行了說明, 而且注意到上面僞代碼的第一行, 令 , 一開始就将目标節點指派給 , 采樣的時候是按的順序進行周遊 (僞代碼第 2 行), 而聚合時則是按的順序進行周遊 (僞代碼第 9 行).

看源碼時如果不注意這一點, 容易有些困惑. 為了友善介紹, 後續我就拿具體的數字, 比如 10, 25 之類的來說明代碼含義, 這樣可以快速判斷目前在 Graph 中的第幾層.

鄰居采樣

GraphSage 鄰居采樣代碼定義如下:

def sample(self, inputs, layer_infos, batch_size=None):
    """ Sample neighbors to be the supportive fields for multi-layer convolutions.

    Args:
        inputs: batch inputs
        batch_size: the number of inputs (different for batch inputs and negative samples).
    """
    
    if batch_size is None:
        batch_size = self.batch_size
    samples = [inputs]
    # size of convolution support at each layer per node
    support_size = 1
    support_sizes = [support_size]
    for k in range(len(layer_infos)):
        t = len(layer_infos) - k - 1
        support_size *= layer_infos[t].num_samples
        sampler = layer_infos[t].neigh_sampler
        node = sampler((samples[k], layer_infos[t].num_samples))
        samples.append(tf.reshape(node, [support_size * batch_size,]))
        support_sizes.append(support_size)
    return samples,

閱讀時不需要太在意實作細節 (比如與

inputs : 大小為 [B,] 的 Tensor, 表示目标節點的 ID;
layer_infos : 假設 Graph 深度為, 那麼 layer_infos 的大小為, 儲存 Graph 中每一層的相關資訊, 比如采樣的鄰居數 num_samples , 采樣方法 neigh_sampler 等.

由于從目标節點開始采樣, 采樣結束後:

samples 儲存 3 個 Tensor, 大小為: [Tensor(B*1,), Tensor(B*10,), Tensor(B*250,)] , 表示 Graph 中每一層的節點 id
support_sizes 為 [1, 10, 250] , 表示對每一個目标節點, 它在 Graph 中每一層的鄰居個數.

采樣方法

sampler

定義在: https://github.com/williamleif/GraphSAGE/blob/master/graphsage/neigh_samplers.py, 代碼如下:

class UniformNeighborSampler(Layer):
    """
    Uniformly samples neighbors.
    Assumes that adj lists are padded with random re-sampling
    """
    def __init__(self, adj_info, **kwargs):
        super(UniformNeighborSampler, self).__init__(**kwargs)
        ## 假設 Graph 中的節點個數為 N, 設定 Graph 的最大出度為 max_degreee, 
        ## 那麼 adj_info 的大小為 [N + 1, max_degree],
        ## 它為鄰接矩陣, 記錄每個節點的鄰居對應的 node_id 
        self.adj_info = adj_info

    def _call(self, inputs):
        ## 設 ids 大小為 [B,]
        ids, num_samples = inputs
        ## 擷取 ids 對應的鄰居節點, adj_lists 為 [B, max_degree]
        adj_lists = tf.nn.embedding_lookup(self.adj_info, ids) 
        ## 對鄰居進行采樣, 由于 tf.random_shuffle 是在 axis=0 的次元
        ## 上進行 shuffle, 是以先對 adj_lists 做個 transpose 的操作, 
        ## shuffle 結束後再變換回來. 最後用 tf.slice 選出 num_samples 個鄰居
        adj_lists = tf.transpose(tf.random_shuffle(tf.transpose(adj_lists)))
        adj_lists = tf.slice(adj_lists, [0,0], [-1, num_samples])
        return

對其進行調用時傳入

inputs

, 包含目标節點的

ids

以及采樣個數

num_samples

, 最後傳回大小為

[B, num_samples]

的 Tensor. 另外注意鄰接矩陣的生成也包括放回采樣以及不放回采樣, 具體見作者源碼, 這裡不過多介紹. (詳見 https://github.com/williamleif/GraphSAGE/blob/master/graphsage/minibatch.py#L76 中

construct_adj

函數)

鄰居聚合

鄰居聚合代碼位于: https://github.com/williamleif/GraphSAGE/blob/master/graphsage/models.py, 定義如下 (隻保留了核心的代碼):

def aggregate(self, samples, input_features, dims, num_samples, support_sizes, batch_size=None,
            aggregators=None, name=None, concat=False, model_size="small"):

    ## hidden 為 [Tensor(B, 1, E), Tensor(B, 10, E), Tensor(B, 250, E)]
    hidden = [tf.nn.embedding_lookup(input_features, node_samples) for node_samples in samples]

    for layer in range(len(num_samples)):
        ## ...... 
        # hidden representation at current layer for all support nodes that are various hops away
        next_hidden = []
        # as layer increases, the number of support nodes needed decreases
        for hop in range(len(num_samples) - layer):
            dim_mult = 2 if concat and (layer != 0) else 1
            neigh_dims = [batch_size * support_sizes[hop], 
                          num_samples[len(num_samples) - hop - 1], 
                          dim_mult*dims[layer]]
            h = aggregator((hidden[hop],
                            tf.reshape(hidden[hop + 1], neigh_dims)))
            next_hidden.append(h)
        hidden = next_hidden
    return hidden[0],

由于之前提到的, 作者在采樣的時候是按的順序進行周遊 (僞代碼第 2 行), 而聚合時則是按

聚合方法

關于聚合方法, 主要定義在: https://github.com/williamleif/GraphSAGE/blob/master/graphsage/aggregators.py

MeanAggregator

定義如下:

class MeanAggregator(Layer):
    """
    Aggregates via mean followed by matmul and non-linearity.
    """

    def __init__(self, input_dim, output_dim, neigh_input_dim=None,
            dropout=0., bias=False, act=tf.nn.relu, 
            name=None, concat=False, **kwargs):
        super(MeanAggregator, self).__init__(**kwargs)
    ## ......

    def _call(self, inputs):
        ## self_vecs: [B, E]
        ## neigh_vecs: [B, H, E], H 為鄰居節點的個數
        self_vecs, neigh_vecs = inputs

        neigh_vecs = tf.nn.dropout(neigh_vecs, 1-self.dropout)
        self_vecs = tf.nn.dropout(self_vecs, 1-self.dropout)
        neigh_means = tf.reduce_mean(neigh_vecs, axis=1)
       
        # [nodes] x [out_dim]
        from_neighs = tf.matmul(neigh_means, self.vars['neigh_weights'])

        from_self = tf.matmul(self_vecs, self.vars["self_weights"])
         
        if not self.concat:
            output = tf.add_n([from_self, from_neighs])
        else:
            output = tf.concat([from_self, from_neighs], axis=1)

        # bias
        if self.bias:
            output += self.vars['bias']
       
        return self.act(output)

對

neigh_vecs

鄰居節點的 embedding 進行 mean pooling 後, 再和目标節點本身的 embedding 進行相加或者拼接.

GCNAggregator

class GCNAggregator(Layer):
    """
    Aggregates via mean followed by matmul and non-linearity.
    Same matmul parameters are used self vector and neighbor vectors.
    """

    def __init__(self, input_dim, output_dim, neigh_input_dim=None,
            dropout=0., bias=False, act=tf.nn.relu, name=None, concat=False, **kwargs):
        super(GCNAggregator, self).__init__(**kwargs)

    ## ......

    def _call(self, inputs):
        ## self_vecs: [B, E]
        ## neigh_vecs: [B, H, E], H 為鄰居節點的個數
        self_vecs, neigh_vecs = inputs

        neigh_vecs = tf.nn.dropout(neigh_vecs, 1-self.dropout)
        self_vecs = tf.nn.dropout(self_vecs, 1-self.dropout)
        means = tf.reduce_mean(tf.concat([neigh_vecs, 
            tf.expand_dims(self_vecs, axis=1)], axis=1), axis=1)
       
        # [nodes] x [out_dim]
        output = tf.matmul(means, self.vars['weights'])

        # bias
        if self.bias:
            output += self.vars['bias']
       
        return self.act(output)

先使用

tf.expand_dims(self_vecs, axis=1)

展開成

[B, 1, E]

的形式, 再和

neigh_vecs

進行 concat, 最後整體求 mean;

MaxPoolingAggregator

class MaxPoolingAggregator(Layer):
    """ Aggregates via max-pooling over MLP functions.
    """
    def __init__(self, input_dim, output_dim, model_size="small", neigh_input_dim=None,
            dropout=0., bias=False, act=tf.nn.relu, name=None, concat=False, **kwargs):
        super(MaxPoolingAggregator, self).__init__(**kwargs)

    ## ............
    
    def _call(self, inputs):
        self_vecs, neigh_vecs = inputs
        neigh_h = neigh_vecs

        dims = tf.shape(neigh_h)
        batch_size = dims[0]
        num_neighbors = dims[1]
        # [nodes * sampled neighbors] x [hidden_dim]
        h_reshaped = tf.reshape(neigh_h, (batch_size * num_neighbors, self.neigh_input_dim))

        for l in self.mlp_layers:
            h_reshaped = l(h_reshaped)
        neigh_h = tf.reshape(h_reshaped, (batch_size, num_neighbors, self.hidden_dim))
        neigh_h = tf.reduce_max(neigh_h, axis=1)
        
        from_neighs = tf.matmul(neigh_h, self.vars['neigh_weights'])
        from_self = tf.matmul(self_vecs, self.vars["self_weights"])
        
        if not self.concat:
            output = tf.add_n([from_self, from_neighs])
        else:
            output = tf.concat([from_self, from_neighs], axis=1)

        # bias
        if self.bias:
            output += self.vars['bias']
       
        return self.act(output)

代碼中使用

neigh_h = tf.reduce_max(neigh_h, axis=1)

對鄰居 embedding 進行聚合.

GraphSage 算法原理介紹與源碼淺析