
DeepMind combines category theory and abstract algebra to discover the connection between GNN and DP

Report from Machine Heart

Machine Heart Editorial Department

How should the relationship between graph neural networks (GNNs) and dynamic programming (DP) be described? DeepMind's researchers derive a general integral transform diagram demonstrating that there is an intricate connection between GNNs and DP, one that goes far beyond the initial observations about individual algorithms such as Bellman-Ford.

In recent years, advances in neural algorithmic reasoning with graph neural networks (GNNs) have relied on the concept of algorithmic alignment. Broadly speaking, if the components of a neural network align well with the target algorithm, the network will learn to perform the reasoning task more easily (in terms of sample complexity). Specifically, GNNs are believed to align with dynamic programming (DP), a general problem-solving strategy in which many polynomial-time algorithms can be expressed. But has this alignment actually been proven and theoretically quantified?

The researchers from DeepMind use category theory and abstract algebra to show that there is an intricate connection between GNNs and DP that goes far beyond the initial observations about individual algorithms such as Bellman-Ford. Knowing this connection makes it easy to verify several earlier findings in the literature, and DeepMind hopes it will serve as a basis for building more powerfully algorithmically aligned GNNs.

Address of the paper: https://arxiv.org/pdf/2203.15544.pdf

In summary, the study uses category theory and abstract algebra to make the GNN-DP connection explicit, where previously it was only available for specific examples. The study derives a general-purpose integral transform diagram (based on standard categorical concepts such as pullbacks, pushforwards, and commutative monoids) and discusses why it is versatile enough to support both GNN and DP computations. With this diagram, a great deal of previous work can be unified immediately, simply by manipulating one arrow of the integral transform.

Graph neural networks are dynamic programmers

The difficulty of connecting GNNs to DP

Establishing a strict correspondence between neural networks and DP is difficult because their computations differ greatly. Neural networks are built on real-valued linear algebra, while DP is often a generalization of path-finding problems and usually takes place over objects such as the tropical semiring (N∪{∞}, min, +), which in mathematics are usually treated as degenerations of Euclidean space. GNNs and DP therefore cannot be connected directly by a simple equation.

However, if we fix an arbitrary latent space R and make as few assumptions about it as possible, we can observe that for both GNNs and DP, much of the behavior we care about arises from looking at functions S → R, where S is a finite set. For GNNs, R can be thought of as a space of real-valued vectors; for DP, R can be thought of as the tropical numbers. DeepMind's main objects of study are therefore the category of finite sets and R-valued quantities defined on them. A category here means a collection of objects (here, all finite sets) together with composable arrows (functions between finite sets).
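As a toy illustration (not taken from the paper, with hypothetical variable names), the same "functions S → R" view covers both settings; only the algebra on R changes:

```python
import math

# A finite set S of nodes; node "features" are just a function S -> R.
S = {"a", "b", "c"}

# GNN-style features: S -> R^2, real-valued vectors.
gnn_features = {"a": [0.1, -0.3], "b": [1.2, 0.0], "c": [-0.7, 2.4]}

# DP-style values: S -> (N ∪ {∞}, min, +), e.g. tentative shortest-path distances.
dp_values = {"a": 0, "b": math.inf, "c": math.inf}

# In both cases the object of interest is a function S -> R; GNNs use real
# linear algebra on R, while this DP example uses (min, +) on extended naturals.
```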

To draw the GNN-DP connection, one first needs to design an abstract object that captures both the message passing/aggregation phase of a GNN (Equation 1) and the scoring/recombination phase of DP (Equation 2).
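The article does not reproduce Equations 1 and 2. For orientation, the standard schematic forms of a GNN message-passing update and a DP recurrence look roughly as follows; this is a sketch in generic notation, not the paper's exact equations:

```latex
% Schematic GNN message passing / aggregation (generic form of "Equation 1"):
h_v \;=\; \psi\Big(x_v,\; \bigoplus_{u \in \mathcal{N}(v)} \phi(x_u, x_v)\Big)

% Schematic DP scoring / recombination (generic form of "Equation 2"):
f(v) \;=\; \bigoplus_{u} \mathrm{score}\big(f(u),\, u,\, v\big)
```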

DeepMind builds the integral transform by composing transformations of the input features in a way that depends as little as possible on the specific choice of R. In doing so, DeepMind obtains a computational diagram that applies to both GNNs and DP (each with its own choice of R), an approach that lets the study focus on aligning the components of these diagrams as closely as possible.

Integral transforms, pullbacks and pushforwards

The most general setting for an integral transform is a diagram called a span:

V <--s-- E --t--> W

Here V, E, W are arbitrary finite sets and s, t are arbitrary functions. Note that when W = V, this is exactly the data of a directed multigraph structure on (V, E), where s(e) is interpreted as the source of edge e and t(e) as its target.
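A minimal concrete reading of this definition, with hypothetical names: a directed multigraph stored as an abstract edge set together with its source and target functions.

```python
# A span V <--s-- E --t--> W with W = V is exactly a directed multigraph:
# edges are abstract identifiers, and s/t map each edge to its source/target node.

V = {"a", "b", "c"}
E = {"e1", "e2", "e3", "e4"}

s = {"e1": "a", "e2": "a", "e3": "b", "e4": "c"}  # source of each edge
t = {"e1": "b", "e2": "c", "e3": "c", "e4": "b"}  # target of each edge

# Parallel edges and self-loops are allowed, since E is just a set of edge names.
```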

Two operations need to be described here: the pullback s^* along s, and the pushforward t_* along t. Performing a pullback on data defined on V yields data defined on E. The pushforward should then take data defined on E and produce data defined on W, although, as we will see, it actually produces a bag of values that still needs to be aggregated. Conveniently, this process aligns naturally with the aggregation step of GNNs and the recombination step of DP.

The data consists of a function f : V → R, for which the pullback is easy to define: s^*f := f ∘ s (map each edge to its sender node, then look up that node's feature in f).

The pushforward, however, is problematic, because t points in the wrong direction for function composition. To obtain a correct arrow, we need the preimage t^{-1} : W → P(E), which takes values in the power set of E and is defined as t^{-1}(w) = {e ∈ E | t(e) = w}.
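A sketch in plain Python (hypothetical helper names, not the paper's code) of these two operations: the pullback precomposes node data with s, while the preimage of t collects, for each node, the edges arriving there, which is what makes the pushforward well defined as a bag of values.

```python
def pullback(f, s):
    """s^* f : E -> R, defined by e |-> f(s(e)): node data pulled back onto edges."""
    return {e: f[v] for e, v in s.items()}

def preimage(t, W):
    """t^{-1} : W -> P(E), defined by w |-> {e in E : t(e) = w}."""
    pre = {w: set() for w in W}
    for e, w in t.items():
        pre[w].add(e)
    return pre

def pushforward(m, t, W):
    """t_* m : w |-> bag (here, a list) of messages m(e) with t(e) = w."""
    pre = preimage(t, W)
    return {w: [m[e] for e in pre[w]] for w in W}

# Tiny usage on the two-edge span a --e1--> b, a --e2--> b:
f = {"a": 3, "b": 7}
s = {"e1": "a", "e2": "a"}
t = {"e1": "b", "e2": "b"}
print(pullback(f, s))                              # {'e1': 3, 'e2': 3}
print(pushforward(pullback(f, s), t, {"a", "b"}))  # b receives the bag [3, 3]; a the empty bag
```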

All that is missing to complete the transform is an aggregator ⊕. As we will see in the next section, specifying a well-behaved aggregator is the same as equipping R with a commutative monoid structure.
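As the text notes, a well-behaved aggregator amounts to a commutative monoid on R: an associative, commutative operation with an identity element, so that aggregating an unordered bag of messages is unambiguous. A hedged sketch of two standard choices:

```python
import math
from functools import reduce

# A commutative monoid on R is a pair (⊕, identity); the order of aggregation must not matter.
sum_monoid = (lambda x, y: x + y, 0.0)           # typical GNN aggregation
min_monoid = (lambda x, y: min(x, y), math.inf)  # tropical / shortest-path aggregation

def aggregate(monoid, bag):
    """Fold an unordered bag of messages with the monoid operation, starting from its identity."""
    op, identity = monoid
    return reduce(op, bag, identity)

print(aggregate(sum_monoid, [1.0, 2.0, 3.0]))  # 6.0
print(aggregate(min_monoid, [4, 7, 2]))        # 2
```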

The four arrows of the integral transform

As mentioned earlier, a directed graph G = (V, E) is equivalent to the span V <--s-- E --t--> V, where s and t give the sending and receiving node of each edge, respectively.

Starting from some input features f : V → R, the integral transform of f can now be described using all the objects introduced above. Initially, the study restricts attention to the case where the message sent along an edge e ∈ E depends only on the sender node, not the receiver.

The study first defines a kernel transformation k : [E, R] → [E, R], which performs some computation on edge features; in the Bellman-Ford case, for example, it adds the sender distance d_{s(e)} (previously pulled back onto the edge by s^*) to the edge weight w_{s(e)t(e)}. Here DeepMind uses [E, R] as shorthand for the set of functions E → R. Using the kernel, the following diagram can be completed:

[V, R] --s^*--> [E, R] --k--> [E, R] --t_*--> [V, bags of R] --⊕--> [V, R]

These four arrows make up the overall transform: start from the node features, perform computations on the edges, and finally aggregate the edge messages at the receivers. In the process, the initial node features (which are elements of [V, R]) are updated.

s^* is the pullback arrow defined earlier: as mentioned above, DeepMind precomposes the node features with the source function, i.e., (s^*f)(e) := f(s(e)). The kernel is then applied to the resulting edge features, integrating the sender's features with any provided edge features, such as edge weights.
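As a concrete, hypothetical instance of such a kernel for the Bellman-Ford case mentioned above, the message on each edge is simply the pulled-back sender distance plus the edge weight:

```python
def bellman_ford_kernel(pulled_back_dist, edge_weight):
    """A kernel k : [E, R] -> [E, R] for Bellman-Ford: add each edge's weight
    to the sender distance that was pulled back onto that edge by s^*."""
    return {e: pulled_back_dist[e] + edge_weight[e] for e in pulled_back_dist}

# Example: on edge "e1" the sender's distance is 0.0 and the weight is 1.0,
# so the edge message becomes 1.0.
print(bellman_ford_kernel({"e1": 0.0}, {"e1": 1.0}))  # {'e1': 1.0}
```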

Applying the kernel yields the edge messages m : E → R. These messages now need to be delivered to the receiving nodes, for which DeepMind uses the pushforward. As mentioned earlier, it is defined over the preimages of t and interpreted as a formal sum: intuitively, (t_* m)(v) is the bag of values arriving at v.

Finally, DeepMind applies the previously defined aggregator ⊕ pointwise at each receiving node to compute the updated feature of every node.

The integral transform designed above describes both GNNs (Equation 1) and DP (Equation 2), which allows a GNN architecture to be algorithmically aligned with a target DP algorithm. Perhaps the clearest example is the last arrow, the aggregator ⊕. If the aggregation function chosen for the GNN matches the one used by the target algorithm, this should immediately improve sample complexity and generalization. In fact, this recovers one of the earliest lines of research in algorithmic reasoning: deploying aggregators that align GNNs with the problem at hand. Any investment in analyzing GNNs and DP in this way can pay off handsomely when organizing existing research and proposing future work.
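To tie the four arrows together, here is a self-contained, hedged sketch (hypothetical code, not DeepMind's implementation) of one Bellman-Ford relaxation step written as pullback → kernel → pushforward → aggregation, with min as the aggregator:

```python
import math
from functools import reduce

# One Bellman-Ford relaxation step as the four arrows of the integral transform:
# pullback (s^*), kernel (k), pushforward (t_*), aggregation (⊕ = min).

V = {"a", "b", "c"}
E = {"e1", "e2", "e3"}
s = {"e1": "a", "e2": "a", "e3": "b"}          # sender of each edge
t = {"e1": "b", "e2": "c", "e3": "c"}          # receiver of each edge
w = {"e1": 1.0, "e2": 4.0, "e3": 1.0}          # edge weights
d = {"a": 0.0, "b": math.inf, "c": math.inf}   # current distances, f : V -> R

# 1) Pullback s^*: copy each sender's distance onto its outgoing edges.
pulled = {e: d[s[e]] for e in E}

# 2) Kernel k: add the edge weight, producing edge messages m : E -> R.
m = {e: pulled[e] + w[e] for e in E}

# 3) Pushforward t_*: group messages into a bag at each receiving node.
bags = {v: [m[e] for e in E if t[e] == v] for v in V}

# 4) Aggregation ⊕ = min, seeded with the old distance (Bellman-Ford keeps it
#    if no incoming message is shorter).
new_d = {v: reduce(min, bags[v], d[v]) for v in V}

print(new_d)  # after one step: a -> 0.0, b -> 1.0, c -> 4.0
```

Swapping the min aggregator for a sum (and the distance-plus-weight kernel for a learned message function) turns the same four-arrow diagram into a standard sum-aggregation GNN layer, which is the sense in which the choice of the last arrow governs algorithmic alignment.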

For more information, please refer to the original text of the paper.
