Use large language models to promote comprehensive graph learning ability

Work on large language models is currently in full swing, and their excellent ability to understand textual context and to encode and decode text has attracted much attention. This talk explores whether the powerful capabilities of large models can be used directly to better solve problems on graph data and graph tasks.

The main content includes the following five aspects:

1. Why use large language models for graph learning

2. An overview of the current state of graph learning with large language models

3. Large language models promote unified graph learning across domains and tasks

4. Potential research directions

5. Q&A session

Speaker|Jiang Zhuoren, researcher of the "Hundred Talents Program" of Zhejiang University

Edited and organized|Diao Zhen

Content proofreading|Li Yao

Produced by DataFun

01

Why use large language models for graph learning

Let us first explore why large language models can be used for graph learning. We examine the feasibility from two directions: the capabilities of large language models, and the characteristics of graph data itself.

1. The ability of large language models

From early statistical language models, to neural network language models, to pre-trained language models, and now to large language models, the models' abilities, especially their ability to encode text, have been continuously enhanced, and so has their ability to handle downstream tasks. In particular, large language models have shown breakthrough performance on some reasoning tasks: the continued scaling of language models and their rich in-context learning ability have, to a certain extent, made up for the weakness of earlier language models in reasoning.

Some researchers observed that earlier language-model paradigms asked the model to output the result directly, ignoring the intermediate thought process, whereas when humans solve mathematical or reasoning problems, they usually write out the solution step by step before arriving at the final answer. It is therefore natural to think that if the model first outputs the intermediate reasoning steps and then derives the answer from those steps, its reasoning performance can improve. To this end, Google proposed Chain-of-Thought prompting: in addition to the problem itself, the prompt also gives the model solution ideas and steps for similar problems, so that the model outputs not only the final result but also the intermediate steps, which improves its reasoning ability.
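
As a concrete illustration, here is a minimal sketch of what a chain-of-thought prompt might look like; the wording is not taken from the Google paper, and `call_llm` is a hypothetical placeholder for whatever LLM API is available.

```python
# A minimal sketch of chain-of-thought prompting (illustrative wording, not
# the exact prompt from the Google paper). `call_llm` is a hypothetical
# helper standing in for an actual LLM API call.

FEW_SHOT_EXAMPLE = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def cot_prompt(question: str) -> str:
    # Prepend a worked example whose answer spells out the intermediate
    # steps, then ask the model to reason step by step on the new question.
    return FEW_SHOT_EXAMPLE + f"Q: {question}\nA: Let's think step by step."

def call_llm(prompt: str) -> str:  # placeholder, not a real API
    raise NotImplementedError

if __name__ == "__main__":
    print(cot_prompt("A cafeteria had 23 apples. It used 20 and bought 6 more. "
                     "How many apples does it have?"))
```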

Large language models show great potential on this kind of problem and can perform a lot of interesting reasoning. Naturally, we wonder whether they can also reason about graph tasks.

2. Characteristics of graph data

Next, we look at the question from another perspective, namely the characteristics of graph data itself, to see why large language models can be applied to graph learning.

We find that many graph tasks are closely related to text. For example, in a common academic scenario, scholars, papers, and conferences naturally exist in the form of a graph, with various relationships among them.

In addition, some graph data can be combined with textual information to better represent their properties or characteristics. For example, a molecule is a natural graph structure, and we can use text to express its properties, such as "a benzene ring is toxic" or "a water molecule is non-toxic". These textual descriptions are very helpful for understanding the properties of graphs, or for prediction and reasoning tasks based on graphs. Therefore, the text-reasoning ability of large language models can help us solve some graph-related tasks, and in fact there is already some work on using large language models to assist graph learning.

02

An overview of the current state of graph learning with large language models

Next, we introduce the current state of large language models for graph learning from two aspects: first the different graph-data application scenarios, and then the different roles that large language models play in graph tasks.

1. Different graph data application scenarios

Graph data can be broadly divided into three categories. The first is graphs without textual information, called Pure Graphs, such as traffic networks. The second is graphs that are paired with related text, called Text-Paired Graphs, such as protein or molecular structures paired with textual descriptions. The third is graphs whose nodes themselves contain rich textual information, called Text-Attributed Graphs, such as social media networks and academic citation networks. Let's look at how large language models can be used for each of these three types of graph data.

(1) Pure Graph

The first type, the Pure Graph, has no textual information, or at least no semantically rich text, and carries only topology; examples include traffic networks and power transmission networks. The main tasks on this kind of graph are classical discrete-mathematics problems on graphs, such as analyzing connectivity, finding the shortest path between two nodes, Hamiltonian paths, and other topological problems.

Such questions can be transformed into a question-and-answer format. Text is used to state how many nodes the graph has, which edges connect which nodes, how many edges there are, and so on; one can then ask the large model whether there is an edge between node 1 and node 4, pose a connectivity question, or ask for the shortest path. For an undirected graph, the prompt can list the nodes, the edges, and the weights on the edges, so that the large language model can solve these problems on the graph directly.
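
A minimal sketch of this serialization step is shown below; the prompt wording and the use of NetworkX are illustrative choices, not details from the works discussed in this talk.

```python
# A minimal sketch of turning a pure graph into the kind of natural-language
# question described above (connectivity / shortest-path queries) so that an
# LLM can answer it directly. Prompt wording is illustrative.
import networkx as nx

def graph_to_prompt(g: nx.Graph, source: int, target: int) -> str:
    nodes = ", ".join(str(n) for n in g.nodes)
    edges = "; ".join(
        f"({u}, {v}) with weight {d.get('weight', 1)}"
        for u, v, d in g.edges(data=True)
    )
    return (
        f"In an undirected graph, the nodes are {nodes}. "
        f"The edges are: {edges}. "
        f"Is there a path between node {source} and node {target}? "
        f"If so, what is the shortest path and its total weight?"
    )

g = nx.Graph()
g.add_weighted_edges_from([(1, 2, 3), (2, 4, 1), (1, 3, 5)])
print(graph_to_prompt(g, 1, 4))
```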

One effort published in Advances in Neural Information Processing Systems 2024 uses exactly this strategy to solve problems directly on Pure Graphs. As the comparison data in the figure above show, directly applying a large model still reflects a certain ability to understand graphs compared with random guessing, and after combining techniques such as chain-of-thought or self-consistency it can achieve good results on some tasks.

(2) Text-Paired Graph

For the Text-Paired Graph, here is a work from ICML 2023: chemical molecules are encoded as graphs, the accompanying text is encoded by a text encoder, and contrastive learning is used for training. The enhanced textual information strengthens the representation of chemical formulas, which is then used to solve tasks specific to chemical molecules.

The figure above shows some test data, where green marks the best results. In general, the added textual information provides a good boost to these tasks on graphs.

(3) Text-Attributed Graph

The third is a work from EACL 2024 on Text-Attributed Graphs, i.e., graphs with rich textual information. This work converts some of the structural information on the original graph into textual input for the large model, which is then asked to perform inference. The difference from the Pure Graph setting is that each node is treated as a token of the language model, so the node itself can be encoded using the large language model's structural information plus its text representation. As a result, besides the output of the graph reasoning task, the framework also produces node representations.

Experimental results show that on graph datasets with rich textual information, encoding node features and then letting the large model understand the graph structure outperforms traditional graph neural network models. The parameters of the large model are actually trained in this approach, so in principle the large model perceives some knowledge of the graph itself.

For different graph data, language models are applied in different ways.

2. The different roles of large language models in graph tasks

Large language models can play three roles in graph tasks: Enhancer/Encoder, Predictor, and Aligner. We will introduce them one by one.

(1) LLM as Enhancer/Encoder

When the LLM acts as an Enhancer or Encoder, the large language model plays an auxiliary role, supporting the GNN that completes the final graph inference; the understanding of the graph's topology is still handled by a traditional graph neural network.

This approach can be divided into two types. The first is explanation-based: a large language model with suitable prompts enhances the input text to obtain an explanation, which is taken as an enhanced feature; the text is then encoded, fed into the graph structure, and a graph neural network completes the further reasoning. The second is embedding-based: the large model directly embeds the node's text attributes, and the resulting representations are fed to the graph neural network for inference.
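
The embedding-based variant can be sketched as follows, under the assumption that a pretrained sentence encoder stands in for the large language model; the library and model names are illustrative choices, not those used in the works cited here.

```python
# A minimal sketch of the embedding-based enhancer: node texts are encoded
# once by a text encoder, and the resulting vectors become the GNN's input
# features. Model and library choices are illustrative assumptions.
import torch
from sentence_transformers import SentenceTransformer
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

node_texts = [
    "Paper on graph neural networks for molecules.",
    "Paper on contrastive pre-training of language models.",
    "Paper on message passing for protein structures.",
]
encoder = SentenceTransformer("all-MiniLM-L6-v2")
x = torch.tensor(encoder.encode(node_texts), dtype=torch.float)  # [num_nodes, dim]

edge_index = torch.tensor([[0, 1, 2], [1, 2, 0]], dtype=torch.long)  # toy edges
data = Data(x=x, edge_index=edge_index)

conv = GCNConv(x.size(1), 64)              # the GNN handles the structural reasoning
node_repr = conv(data.x, data.edge_index)  # [num_nodes, 64]
```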

A work at ICLR 2024 is the quintessential explanation-based approach. It works with titles, abstracts, predictions, and explanations: the text attributes of each node, such as its title and abstract, are wrapped in a custom prompt to query the large language model (GPT-3.5), which generates a prediction of which domain the article belongs to and an explanation of why that prediction was made. With this richer textual information, the original text together with the prediction and explanation is used to fine-tune a language model and converted into a richer text representation of the node; this node representation is then fed into the GNN for better prediction. In this way, the graph neural network achieves better inference results on the graph task.
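
A rough sketch of such a prediction-and-explanation prompt is given below; the exact wording used in the ICLR 2024 work is not reproduced here.

```python
# A rough sketch of an explanation-based prompt (illustrative; the actual
# prompt in the cited work may differ). The LLM's prediction and explanation
# are later concatenated with the original text and used to fine-tune an
# encoder that produces node features for the GNN.
def build_explanation_prompt(title: str, abstract: str, categories: list[str]) -> str:
    return (
        f"Title: {title}\n"
        f"Abstract: {abstract}\n"
        f"Question: Which of the following categories does this paper belong to: "
        f"{', '.join(categories)}?\n"
        f"Give your prediction and explain your reasoning."
    )
```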

This method proved to be effective.

The University of Michigan and Baidu, along with several other teams, published a joint work at KDD that belongs to the embedding-based approach. In the figure above, yellow marks the best result, green the second, and pink the third. When using a large language model to solve graph tasks, they found that embeddings from a fine-tuned large language model perform poorly when the number of labels is not large enough.

There is also a case where traditional pre-trained embeddings combined with a GNN are better under a high label ratio, that is, when there are more labels. As the figure above shows, most of the best results come from sentence-embedding models. This is an interesting phenomenon: these pre-trained text encoders add no structural information, yet on tasks that require structural information for reasoning they surpass language models fine-tuned with graph structure information.

The implication of this finding is that simple fine-tuning, such as letting a large language model look at the structure of a graph during the fine-tuning phase, may not be the best approach. We need to find better ways to integrate the structural information of the graph, which is a problem worth further study and exploration.

(2) LLM as Predictor

The second role is the large language model as a predictor. As the name suggests, the large language model directly serves as the protagonist of the graph inference task, rather than merely helping us generate better text representations. There are again two approaches. The first is flatten-based: the graph structure is expressed in text, the text carrying the structural information is fed to the large language model together with a prompt, and the model then performs graph inference. The second is GNN-based: a GNN first produces representations of the original graph and its nodes, and these representations are fed to the large model together with the text representation so that the large model can do the inference.

Peking University, Microsoft, and several university teams published a 2023 work called GPT4Graph, which essentially converts the structure of a graph into a textual representation. It uses structured markup and scripting languages to describe a graph and feeds this directly to the large model, letting it generate some context, combine that context with the original input, and output the final result.

The experimental results show that different ways of describing the graph have a significant impact on the results. For example, using the GML format works better for size detection, i.e., determining the size of the graph, while on the degree-counting task an edge list representation gives better results. In addition, role prompting improves the model's performance because it lets the large language model pay more attention to specific relationships, or to the special attributes and roles of certain nodes, so that the model extracts more valuable information and makes better predictions.
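
To make the contrast concrete, here is the same toy graph described in two of the textual formats mentioned above (the toy graph itself is invented for illustration):

```python
# The same toy graph in two textual formats. GML (Graph Modelling Language)
# is verbose but explicit about structure, while an edge list is compact;
# per the results above, which format works best depends on the task.
gml_description = """
graph [
  node [ id 1 label "A" ]
  node [ id 2 label "B" ]
  node [ id 3 label "C" ]
  edge [ source 1 target 2 ]
  edge [ source 2 target 3 ]
]
"""

edge_list_description = "Edges: (A, B), (B, C)"
```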

Another work, from 2023 by the HKU and Baidu teams, is called GraphGPT and is essentially a GNN-based approach. A GNN first encodes the graph structure information, and the resulting node embeddings are inserted into the text representation and fed to the large model for graph inference. For the language model to understand the graph embeddings directly, the graph embeddings must be aligned with the text embeddings in order to obtain good results.
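
The core mechanical idea can be sketched as below: a learned projection maps GNN node embeddings into the language model's hidden dimension so they can sit alongside ordinary text tokens. The dimensions and module choices are illustrative assumptions, not the actual GraphGPT architecture, and the alignment training is omitted.

```python
# A minimal sketch of the GNN-based predictor idea: node embeddings from a
# graph encoder are projected into the language model's embedding space so
# they can be placed alongside ordinary text tokens. Dimensions are illustrative.
import torch
import torch.nn as nn

class GraphTokenProjector(nn.Module):
    def __init__(self, gnn_dim: int = 128, llm_dim: int = 4096):
        super().__init__()
        # maps graph-encoder outputs into the LLM's hidden size
        self.proj = nn.Linear(gnn_dim, llm_dim)

    def forward(self, node_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(node_emb)  # [num_nodes, llm_dim]

node_emb = torch.randn(5, 128)           # embeddings from some pretrained GNN
graph_tokens = GraphTokenProjector()(node_emb)
text_tokens = torch.randn(12, 4096)      # embeddings of the textual prompt
llm_input = torch.cat([graph_tokens, text_tokens], dim=0)  # sequence fed to the LLM
```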

Experimental results demonstrate the effectiveness of the proposed method.

(3) LLM as Aligner

The third role, LLM as Aligner, is more complex than the first two. When encoding data in the text modality, additional textual information is used as a supervision signal to promote the learning of the graph neural network itself, so that structural information and textual information are learned interactively; alternatively, the text-learning network and the graph-learning network can be directly intertwined, and the GNN can also be used for knowledge distillation.

One mainstream method is contrastive learning, which uses the correspondence between node representations and text representations to construct contrastive samples, training the graph neural network and the large language model simultaneously to achieve alignment.
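
A minimal sketch of such an alignment objective is shown below, assuming matched (node, text) pairs are available; the encoders themselves (a GNN and a language model) are omitted, and the symmetric InfoNCE form is a standard choice rather than the specific loss of any paper cited here.

```python
# A minimal sketch of contrastive graph-text alignment: matched (node, text)
# pairs are pulled together and mismatched pairs pushed apart with a
# symmetric InfoNCE-style loss.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(node_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    node_emb = F.normalize(node_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = node_emb @ text_emb.t() / temperature   # [N, N] similarity matrix
    labels = torch.arange(node_emb.size(0))          # i-th node matches i-th text
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2

loss = contrastive_alignment_loss(torch.randn(8, 64), torch.randn(8, 64))
```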

Another approach is distillation. In a work by the University of Minnesota, Amazon, UIUC, and CMU, a shared text encoding function is updated with a teacher GNN and other text-agnostic information so as to capture the textual information enhanced in unlabeled text. This work essentially distills topological information into the text representation architecture through the teacher GNN, so that the learned, enhanced text representations contain part of the topological information.
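
The distillation idea can be sketched very simply, assuming a frozen teacher GNN provides structure-aware node embeddings and a student text encoder is trained to reproduce them from node text alone; the loss below is a generic choice and the details differ from the cited work.

```python
# A minimal sketch of GNN-to-text distillation: the teacher GNN's embeddings
# are fixed targets, and the student text encoder learns to match them, so
# topological knowledge ends up baked into the text representations.
import torch
import torch.nn.functional as F

def distillation_loss(student_text_emb: torch.Tensor,
                      teacher_gnn_emb: torch.Tensor) -> torch.Tensor:
    # teacher embeddings are treated as fixed (detached) targets
    return F.mse_loss(student_text_emb, teacher_gnn_emb.detach())

loss = distillation_loss(torch.randn(8, 64, requires_grad=True), torch.randn(8, 64))
```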

From the above introduction, we can see that current research mainly designs specific methods for different types of graph data to solve specific problems, such as problems on molecular graphs or on text-rich academic networks. In the future, we hope to turn large language models into a unified framework and use them directly to achieve unified graph learning across domains and tasks.

03

Large language models facilitate unified graph learning across domains and tasks

First, let's take a look at what was done before the advent of large language models.

Take a work from KDD'20 as an example: it uses graph contrastive coding, training the architecture with a self-supervised graph neural network to capture common topological characteristics across networks, thereby enabling cross-dataset training. The limitation of this kind of work is that node features in different domains cannot be aligned, so the feature representations of nodes are often abandoned and only structural features are used to initialize node features.

Our team also published a work on graph pre-training at this year's AAAI, using text autoencoders and motif auto-discovery mechanisms to facilitate knowledge transfer between datasets. However, we must admit that this type of method cannot directly achieve cross-dataset transfer in the cross-domain situation where the text features of the two domains cannot be aligned.

After large language models were proposed, they offered a new avenue to try. This is a 2023 work by the University of Washington, JD.com, and Peking University called One for All. In this work, the input is a graph containing rich textual information together with a description of the task; both are embedded into a unified space by a large language model and transformed into a prompt graph with a unified task representation, which is then fed to a single GNN for adaptive prediction of downstream tasks.

This work solves the problem that node features of graph data from different domains cannot be aligned in cross-domain tasks: all features are converted into text and then encoded with a large language model, so that no matter which domain the data belongs to, it can be encoded in textual form.

To bridge different tasks, the structure shown in the figure above is designed, introducing the concept of task nodes, i.e., nodes of interest. These nodes also use text to represent the task, so the task itself can be converted into a text representation. This makes different tasks compatible and transforms them all into the link prediction problem shown in the figure.
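
A conceptual sketch of this reduction, simplified relative to the actual One for All construction, is shown below: each class is represented by a textual "task node", and node classification becomes scoring links between the node of interest and the class nodes.

```python
# A conceptual sketch (simplified, assumptions of this write-up rather than
# the paper's exact design): node classification cast as link prediction
# between a node of interest and textual class nodes, all embedded by the
# same text encoder.
import torch

def classify_as_link_prediction(noi_emb: torch.Tensor,
                                class_node_embs: torch.Tensor) -> int:
    # score one candidate edge per class node; the highest-scoring link wins
    scores = class_node_embs @ noi_emb
    return int(scores.argmax())

noi_emb = torch.randn(64)             # embedding of the node of interest
class_node_embs = torch.randn(4, 64)  # embeddings of 4 textual class descriptions
predicted_class = classify_as_link_prediction(noi_emb, class_node_embs)
```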

This work has achieved relatively good results on graph data. At the same time, it was also observed that using all datasets for text pre-training can make knowledge transfer better.

The other is a work called UniGraph from the National University of Singapore. Building on One for All, it pre-trains large language models and GNNs and proposes a variety of further improvements, such as few-shot and zero-shot settings. Its essence is still to use large language models to encode textual features and thereby enable cross-dataset training. The handling of tasks is similar: whereas One for All directly uses task nodes, UniGraph defines a task subgraph, which essentially generalizes the task into the expressible range of text, unifying node-level, edge-level, and graph-level tasks.

This work performs better than One for All, partly thanks to some prompting tricks. Another possible reason is that if there is indeed large incompatibility between graph datasets, with noise or adversarial conflicts, using too much data for pre-training can reduce the large language model's inference ability on graph data.

Finally, there is a work from Fuzhou University and Emory University this year. Its biggest difference from the previous two works is that the GNN is removed and the large language model is directly trained on graph tasks through instruction tuning. There are three clear steps: first describe the graph with text, then generate instructions, and finally do the tuning.
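
A rough sketch of what one instruction-tuning record for a graph task might look like is shown below; the field names, template, and example content are invented for illustration and are not the templates used in the cited work.

```python
# A rough, invented example of an instruction-tuning record for a graph task:
# the graph neighborhood is described in text (step 1), wrapped in an
# instruction (step 2), and the pair is used for tuning (step 3).
instruction_record = {
    "instruction": "Classify the target paper into one of: cs.LG, cs.CL, cs.CV.",
    "input": (
        "Target paper: 'Contrastive pre-training for molecular graphs'. "
        "Neighbors cited by the target: 'Graph neural networks survey'; "
        "'Self-supervised learning on graphs'."
    ),
    "output": "cs.LG",
}
```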

After instruction tuning based on the language model LLaMA-7B, the results exceed those of traditional graph models. Note that the datasets here are all Text-Attributed Graphs, that is, graph data containing a large amount of rich text; the effect on Pure Graphs and Text-Paired Graphs remains unclear.

04

Potential research directions

Existing work, especially this year's work, has taken many solid steps toward using large language models to promote unified graph learning across domains and tasks, and has achieved some positive results. What other directions can be explored? The first thing we need to think about is whether the current approach of fine-tuning and prompt engineering really allows the large model to learn the topological information in the graph.

A work by the University of Michigan and UIUC used a large model to directly perform inference on graph tasks. After extensive experiments, it was found that for large models, having the context is enough; it does not matter much how the internal structure of that context is described or whether the graph is represented "correctly". Compared with a linearized representation, the results do not differ greatly. The so-called linearized representation simply takes the text of the neighborhood as input and lets the large language model reason over it, as opposed to a very standardized, rigorous representation of the graph.

In addition, if one randomly perturbs the input, for example shuffling the structure, removing some second-order neighbors, or putting the first-order neighbors in a random order, the inference results do not differ much. So what information does a large language model rely on to do graph inference under this paradigm? It may not be understanding the graph's topology in the way we imagined.

It was also observed that the large language model's predictions are better when the neighborhood nodes and the ground truth to be predicted are relatively homogeneous. Again the same question arises: did the large language model learn from the text, or did it really understand the structure?

The researchers also found that large language models can benefit from structural information when the target node does not contain enough text for the model to make a reasonable prediction. This shows that large language models can still make use of structural information; the question is how to elicit this ability.

Future research mainly involves two questions. The first is whether there exist common structural features that benefit graph learning across all domains, how to learn such features, and whether large language models can learn them. The second is how to make large language models attend to complex topological features rather than relying only on their ability to reason over textual context. These questions are worth continuing to explore in the future.

05

Q&A session

Q1: Is it possible to introduce some better textual representations for Pure Graph?

A1: For Pure Graphs, introducing better textual representations is certainly helpful for improving performance, especially along the path of using large language models to solve inference problems, because the strength of large language models lies in their textual capabilities.

Q2: What is the significance of cross-domain unified graph learning?

A2: From the perspective of scholars working on graph neural networks and graph mining, we previously often had to develop a graph neural network for each specific field, facing problems of data misalignment and task misalignment. If there is a unified paradigm and architecture, it is, on the one hand, a strong research tool for researchers; on the other hand, it also gives users a unified paradigm, so it benefits both sides. Judging from current research trends, general approaches to unification are attracting more and more attention, because we always hope to find an ultimate, good way to solve as many problems as possible.

Q3: How should one choose an approach in combination with business needs?

A3: The sharing mentioned three different kinds of graphs and three roles of large language models. When applying large language models, you should choose the most suitable path according to your own technical conditions and accumulated expertise. The focus is on being able to solve the problem.

That's all for this sharing, thank you.
