Professor Wang Jie of the University of Science and Technology of China: Knowledge graph reasoning technology based on representation learning

Author | Victor

Editor | Twilight

Knowledge graphs contain rich human prior knowledge and have important academic value and broad application prospects. As a core technology in the field, knowledge graph reasoning can greatly expand the boundaries of existing knowledge and effectively assist humans in intelligent decision-making.

On December 17, 2021, Wang Jie, professor and doctoral supervisor at the University of Science and Technology of China and recipient of the National Science Fund for Excellent Young Scholars, gave a report entitled "Knowledge Graph Reasoning Techniques Based on Representation Learning: From Simple to Complex Reasoning" at the CNCC 2021 forum "Knowledge as Meaning, Graph as Shape: Knowledge Reasoning Based on Graph Machine Learning".

In the report, Wang Jie drew on recent research trends and application scenarios of knowledge graphs, focusing on progressively more demanding reasoning settings: from reasoning over a single graph to reasoning that incorporates external information, and from structured input to natural language input. He introduced recent progress in knowledge graph reasoning based on representation learning, and concluded by looking ahead to the challenges and opportunities facing the future development of the technology.

For example, he noted: "The datasets in wide use today cannot accurately reflect real-world scenarios, and current model evaluation basically adopts the closed-world assumption, which does not match real application scenarios and causes results that should be correct to be judged wrong... Existing knowledge graphs deal only with textual information, and the future trend is to extend to multimodal information. Building a multimodal knowledge graph relies on collecting data of multiple modalities, where the key question is..."

The following is the full text of the speech, edited and revised by AI Technology Review:

Today's speech is entitled "Knowledge Graph Reasoning Technology Based on Representation Learning: From Simple Reasoning to Complex Reasoning", and it is divided into four parts: Background Introduction, Simple Reasoning, Complex Reasoning, and Future Prospects.

In essence, a knowledge graph is a large-scale semantic network, a knowledge base that describes the entities of the objective world and the relationships between them. As shown in the lower-left corner of the figure, each node represents a person and each edge represents a relationship between people. In a computer, a knowledge graph is stored as triples, each consisting of a head entity, a relation, and a tail entity.

We always want knowledge graphs to be large, because scale effects bring qualitative changes in application performance. Knowledge graphs fall into two categories: general knowledge graphs, which are encyclopedic knowledge bases covering general domains, and domain knowledge graphs, which are industry knowledge bases for specific fields.

A general knowledge graph has wider coverage, but its knowledge hierarchy is shallower, its granularity coarser, and its accuracy lower. A domain knowledge graph is the opposite: its coverage is narrower, targeting only a specific field, but the depth and precision of the knowledge it contains are usually of higher standard and quality.

Knowledge graphs can be traced back to the expert systems of the 1960s, which relied mainly on expert knowledge and were built manually, at considerable cost. After years of development, knowledge graph construction has gradually shifted toward automation; the Semantic Web proposed in 1998 and Linked Data proposed in 2006 are key milestones in this move toward "automation".

In 2012, Google released its Knowledge Graph and applied it to search; this was the first time the term "knowledge graph" was explicitly proposed. Today, the knowledge graphs built by Google, Baidu, and others already contain triples on the hundred-billion scale, relying on automatic knowledge acquisition technology driven by big data.

The knowledge graph is a typical cross-disciplinary technical field involving many elements: storage, querying, construction, acquisition, reasoning, fusion, question answering, analysis, and so on. Among these, reasoning is the core technique and task.

On the one hand, the storage, querying, construction, and acquisition of knowledge graphs serve not only to describe the objective world and summarize human prior knowledge, but more importantly, to support reasoning over the knowledge graph.

On the other hand, many knowledge graph techniques and tasks involve deep semantic understanding. For example, fusion requires reasoning to align entities across different knowledge graphs; question answering requires reasoning to expand the semantics of questions; and analysis requires reasoning to further mine the information in graph data.

Therefore, any task that involves deep semantic understanding involves a process of reasoning. The goal of knowledge graph reasoning is to use the relationships and facts that already exist in the graph to infer unknown relationships and facts; in other words, to derive a new, unknown judgment from one or more known judgments.

There are two forms of reasoning over knowledge graphs: rule-based reasoning and reasoning based on representation learning. Rule-based reasoning refers to deductive reasoning based on ontological logic; for example, if A belongs to B and B belongs to C, then A belongs to C. Although this kind of reasoning is highly interpretable and accurate, the rules must be written out explicitly in advance, so it is not flexible enough in practical applications. When large-scale data are involved, statistical methods can be used to induce rules from the data, which is known as inductive reasoning.
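
To make the deductive case concrete, here is a minimal sketch (my own illustration, not from the talk) of applying a single transitivity rule to a toy set of triples; the relation name is_a and the example facts are assumptions for illustration:

```python
def apply_transitivity(triples):
    # Repeatedly apply the rule "A is_a B, B is_a C  =>  A is_a C"
    # until no new fact can be derived.
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(facts):
            for (b2, r2, c) in list(facts):
                if r1 == r2 == "is_a" and b == b2 and (a, "is_a", c) not in facts:
                    facts.add((a, "is_a", c))  # new fact derived by the rule
                    changed = True
    return facts

triples = [("lion", "is_a", "feline"), ("feline", "is_a", "mammal")]
print(apply_transitivity(triples))
# the result now also contains ("lion", "is_a", "mammal")
```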

Reasoning based on representation learning maps entities and the relations between them into a vector space, and then models logical relationships through operations in that space. This approach makes it easy to capture implicit information, but sacrifices interpretability.

Consider how reasoning based on representation learning works. The slide shows two triples: (China, capital, Beijing) and (United States, capital, Washington). When they are mapped into a vector space, the difference between the vectors of China and Beijing turns out to be close to the difference between the vectors of the United States and Washington.

We then define a function that, once a triple is mapped into the vector space, makes the vector representation of the head entity plus the relation as close as possible to the vector representation of the tail entity. The function f(h, r, t) in the figure can serve either as a loss function or as a scoring function.

The scoring function can be read as a confidence that the triple is true; in the example in the lower-right corner of the figure, the score (confidence) is used to determine that "the capital of the United Kingdom is London".
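
As an illustration, here is a minimal sketch (my own, not from the talk) of a translation-style scoring function f(h, r, t) = -||h + r - t|| and how it ranks candidate tail entities; the toy embeddings are made-up values:

```python
import numpy as np

# Toy entity/relation embeddings (illustrative values only).
emb = {
    "UK":      np.array([1.0, 0.0]),
    "London":  np.array([1.2, 0.9]),
    "Paris":   np.array([0.1, 0.8]),
    "capital": np.array([0.2, 1.0]),
}

def score(h, r, t):
    # Higher score = higher confidence that the triple (h, r, t) holds.
    return -np.linalg.norm(emb[h] + emb[r] - emb[t])

# Rank candidate tails for the query (UK, capital, ?).
candidates = ["London", "Paris"]
print(sorted(candidates, key=lambda t: score("UK", "capital", t), reverse=True))
# With these toy vectors, "London" scores higher than "Paris".
```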

Further, depending on the input, knowledge graph reasoning based on representation learning can be divided into two categories: simple reasoning and complex reasoning. Simple reasoning is essentially link prediction: inferring the relationship between two given entities from the existing entities and relations in the graph, where the difficulty lies in understanding the semantics of those existing entities and relations.

Complex reasoning has more complex inputs than simple reasoning. Depending on the input, the difficulties include:

Modeling the semantic structure between relations, when the entities or relations in the query did not appear during training (inductive reasoning).

Modeling complex structured queries composed of several first-order logic operations (multi-step reasoning).

Modeling unstructured queries, where the input contains natural, spoken language (natural language questions).

1

Recent advances in simple reasoning

Intuitively, simple reasoning works like this: given a head entity and a tail entity, we want to fill in the relation between them so that the completed triple holds as plausibly as possible.

Take a character knowledge graph as an example: we know from the training data that "Jiang Ying's husband is Qian Xuesen" and "Jiang Ying's father is Jiang Baili"; what, then, is the relationship between Qian Xuesen and Jiang Baili? To better solve this link prediction problem, we need to model the key properties of entities and relations in the knowledge graph.

There are three such properties: semantic approximation, semantic layering, and semantic fusion. For example, tigers are mammals, and tigers and lions are semantically similar, so we can infer that lions are mammals; lions belong to felines and felines belong to mammals, so by semantic layering we can likewise infer that lions are mammals; semantic fusion refers to combining the knowledge graph with unstructured text descriptions, thereby capturing the latent semantics of entities.

Semantic approximation

For semantic approximation, the classical methods are knowledge graph embedding models based on tensor decomposition, such as CP, RESCAL, and ComplEx. What these methods have in common is that the probability of a triple being true is defined through an inner product. Their problem is shown in the figure (right): entities with similar semantics can end up with very different, disjoint representations in the vector space.
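
As a reference point, here is a minimal sketch (my own, assuming a CP-style factorization) of such an inner-product score, where the plausibility of (h, r, t) is the trilinear product of their embeddings:

```python
import numpy as np

def cp_score(h_vec, r_vec, t_vec):
    # CP-style trilinear inner product: sum_i h_i * r_i * t_i.
    # A larger value means the triple is considered more plausible.
    return float(np.sum(h_vec * r_vec * t_vec))

h = np.array([0.5, 1.0, -0.3])
r = np.array([1.0, 0.2,  0.4])
t = np.array([0.6, 0.9, -0.1])
print(cp_score(h, r, t))
```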

To address this shortcoming, we proposed a regularizer for tensor-decomposition-based knowledge graph embedding models. The idea is to make semantically similar entities have representations whose inner product is as large as possible and whose distance is as small as possible. As shown in the figure (left), in addition to wanting the tail entity's vector to fall on the yellow dotted line, we also want it to fall inside the ellipse (the red area) as much as possible.

How? We add a regularization term derived from a dual, distance-based model: the score is rewritten using the squared 2-norm of a vector difference. Expanding this squared norm shows that it contains the original inner product together with two squared 2-norm terms. Removing the inner product, which is exactly the original score, leaves the duality-induced regularizer: the squared norm of the relation-transformed head entity plus the squared norm of the tail entity.
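
In symbols, a hedged reconstruction of the derivation just described, writing h∘r for the relation-transformed head entity (the notation on the slide may differ):

```latex
% Distance-based (dual) score, expanded:
\| h \circ r - t \|_2^2
  = \| h \circ r \|_2^2 + \| t \|_2^2 - 2\,\langle h \circ r,\, t \rangle
% The inner product <h o r, t> is the original tensor-decomposition score,
% so the remaining squared-norm terms give the duality-induced regularizer:
\mathrm{Reg}(h, r, t) = \| h \circ r \|_2^2 + \| t \|_2^2
```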

Experiments show that the duality-induced regularizer effectively encourages semantically similar entities to have similar representations, and it also significantly improves the inference performance of existing models. In addition, it can be shown to upper-bound the tensor nuclear 2-norm, a tensor generalization of the trace-norm (nuclear-norm) regularizer used in matrix factorization problems.

Semantic layering

Semantic layering is widespread; for example, "palm trees are trees" and "Beijing is located in China". Here "tree" is at a higher level and "palm tree" at a lower level; "China" is at a higher level and "Beijing" at a lower level. Classified semantically, entity pairs can be at different levels of the hierarchy, such as "mammal" and "dog" or "moving" and "running", or at the same level, such as "rose" and "peony" or "truck" and "passenger car".

There are two traditional lines of work on modeling semantic hierarchy. The first uses external hierarchical information to assist modeling: in some specific datasets, the entities and relations themselves carry hierarchy information. This helps capture entity semantics, but it does not distinguish well between entities at different levels, and, most critically, not all datasets come with additional hierarchical information.

The second type of method mainly considers the semantic hierarchy of relations: a relation is abstracted into a combination of several sub-relations at different levels, thereby modeling the hierarchy. However, this type of method requires extra clustering operations on the relation representations, and its drawback is that it cannot fully automatically learn hierarchy-aware semantic information from the knowledge graph.

To model the semantic hierarchy of a knowledge graph, we can model the hierarchy as a tree structure, as shown in the figure (left). The depth of a node in the tree reflects its level: nodes closer to the root are at higher levels, and different nodes at the same depth are at the same level.

Further, the tree structure can be modeled with polar coordinates. A polar coordinate has two parts: the radial coordinate reflects the distance from a point to the origin, and the angular coordinate distinguishes different positions on the same concentric circle. The distance to the origin can therefore be viewed as the distance to the root node, so the radial coordinate captures an entity's level while the angular coordinate distinguishes entities at the same level. In short, mapping entities into a polar coordinate system lets us model the semantic hierarchy with two parts: the modulus and the angle (phase).

To model the relations between entities, the relation between the moduli of two entities is modeled as a scaling transformation: the modulus of the head entity is multiplied by the relation's scaling factor to obtain the modulus of the tail entity. The relation between the angles is modeled as a rotation: the angle of the head entity is rotated by a relation-specific angle to obtain the angle of the tail entity. This kind of modeling can then be expressed as the distance function shown in the figure (right).
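
The exact formula on the slide is not reproduced here; a hedged sketch consistent with the description, following the published HAKE-style formulation (modulus part for levels, phase part for entities at the same level, with a weighting hyperparameter λ), is:

```latex
% Modulus part: scaling between levels; phase part: rotation within a level.
d_r(h, t) = \| h_m \circ r_m - t_m \|_2
            + \lambda \, \big\| \sin\!\big( (h_p + r_p - t_p)/2 \big) \big\|_1
% h_m, r_m, t_m : modulus parts;   h_p, r_p, t_p : phase (angle) parts.
```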

Experimentally, this kind of method can effectively distinguish the semantic levels of entities. For example, in the figure, pairs at different levels (such as "CS" and "AI", or "D" and "C") and pairs at the same level (such as "ask" and "index") are clearly separated. Moreover, when head and tail entities are at the same level, experiments show they can still be distinguished by their angles. On single-step inference benchmarks, this method significantly surpasses other methods in inference performance and has been described by peers as "the best performing model among geometry-based methods".

Semantic fusion

Semantic fusion requires combining the graph with text descriptions, involving both structured and unstructured data; this area is still being explored. The existing trend is to move from knowledge embedding to knowledge injection. The former refers to traditional KGE models, which obtain knowledge only from the structured knowledge graph and cannot make full use of the huge volume of text data.

Knowledge injection means that KGE models are trained jointly with pre-trained language models so as to handle unstructured data effectively. The disadvantage is the high computational cost caused by the huge number of parameters in pre-trained models; the cost can even be so large that joint training becomes infeasible.

To solve this problem, we proposed Hetero-Learner, an efficient learner that fuses heterogeneous knowledge: it embeds the graph structure and the text descriptions into vectors and stitches these vectors together organically. Experiments show that it achieves SOTA results on Wikidata5M with only 3.6% of the parameters of the comparable model KEPLER.

To further improve performance, and inspired by human cognitive reasoning, we proposed Hetero-Reasoner. The model "simulates" how humans reason: it first makes judgments and inferences based on the meaning of the objects being reasoned about and the connections between them (corresponding to the Hetero-Learner); it then induces abstract logical rules from observed phenomena to assist reasoning (corresponding to the Rule Miner); and it finally recalls existing knowledge to reinforce confidence in its judgments (corresponding to the Knowledge Distiller). Overall, the model consists of three modules, the heterogeneous learner, the rule miner, and the knowledge distiller, and can effectively combine structured knowledge graph data and unstructured text data for inference.

With this approach, our team won third place in the "Link Prediction" track of the recent KDD Cup 2021 large-scale knowledge graph competition, becoming the only team in the top three with members from universities.

2

Recent advances in complex reasoning

Complex reasoning mainly covers inductive reasoning, multi-step reasoning, and natural language queries.

Inductive reasoning and simple reasoning are similar in that both are link prediction tasks, but in inductive reasoning the entities in the test set do not overlap with those in the training set, so the difficulty lies in how to transfer or generalize knowledge from the training set to the test set.

At the heart of inductive reasoning is learning the semantic structure of relations. For example, the characters in the knowledge graph on the left (from Dream of the Red Chamber) and those on the right do not overlap, but the relations in the two graphs share common features: both follow the same patterns among the relations mother, father, and husband, which can be extracted and transferred.

The classical approach to this kind of modeling is inductive reasoning based on rule learning, which statistically summarizes the relation structures that frequently occur in the knowledge graph.

We designed a different inductive scheme: first, the relations of the original graph are turned into nodes, generating a new graph in which an edge between two relations represents the connection pattern of two adjacent relations; a graph neural network is then trained on this relation-as-node graph to extract the relevant features.
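
A minimal sketch (my own reading of the construction, not the authors' code) of turning triples into a graph whose nodes are relations, connecting two relations whenever they touch a common entity; the toy triples are illustrative assumptions:

```python
from collections import defaultdict
from itertools import combinations

# Toy triples (head, relation, tail); illustrative data only.
triples = [
    ("JiangYing", "husband", "QianXuesen"),
    ("JiangYing", "father",  "JiangBaili"),
    ("QianXuesen", "father_in_law", "JiangBaili"),
]

# Collect, for every entity, the relations it participates in.
relations_at_entity = defaultdict(set)
for h, r, t in triples:
    relations_at_entity[h].add(r)
    relations_at_entity[t].add(r)

# Build the relation graph: nodes are relations; an edge links two
# relations that are adjacent (share at least one entity) in the original graph.
relation_edges = set()
for rels in relations_at_entity.values():
    for r1, r2 in combinations(sorted(rels), 2):
        relation_edges.add((r1, r2))

print(sorted(relation_edges))
# e.g. ("father", "husband") appears because both touch the entity JiangYing
```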

As shown in the figure, this method significantly surpasses other methods in inductive inference performance, with improvements of roughly 5 and in some cases even 10 points over existing methods.

Multi-step reasoning

The form of complex reasoning whose input is a complex structured query is multi-step reasoning. Consider the query "list the presidents of universities in Anhui Province that are 211 but not 985 universities". This task can be solved by the traditional method of constructing a computational graph over the knowledge graph, but that approach runs into problems such as the diversity of query structures and logical operations such as negation, leading to very high computational complexity.

Another example: answering "universities in the eastern provinces of China" by traversing the knowledge graph starting from the node "China" causes the number of intermediate entities to grow exponentially as the reasoning steps progress. To solve this problem, we proposed methods based on representation learning that perform the reasoning in a suitable vector space.

Multi-step reasoning based on representation learning has two key steps: first, define the vector space; second, define the inference operations in that space.

Specifically, entities and sets of entities are first mapped into the vector space, with sets represented by geometric objects or probability distributions; the answer is then obtained by similarity comparison in the vector space, avoiding huge computational overhead. The inference operations are defined as transformations between entity sets: "and" corresponds to the intersection of entity sets, "or" to their union, and "not" to the complement of an entity set.

Thus, in a multi-step reasoning model based on representation learning, given the structure of the query, the final query representation is obtained by composing these logical operations, and the answer is then found from the distance between entity representations and the query representation.
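
A minimal, simplified sketch (my own; real models use learned geometric or probabilistic operators) of answering a conjunctive query by composing a relation projection and an intersection in vector space, then ranking entities by distance. All embeddings, entity names, and relation names are illustrative assumptions:

```python
import numpy as np

# Toy embeddings (illustrative values only).
entity_emb = {
    "USTC":     np.array([1.0, 1.0]),
    "Tsinghua": np.array([0.0, 1.0]),
    "Anhui":    np.array([0.4, 0.9]),
    "211":      np.array([0.8, 0.2]),
}
relation_emb = {
    "located_in_inv": np.array([0.5, 0.0]),   # Anhui -> universities in Anhui
    "member_of_inv":  np.array([0.1, 0.8]),   # 211 -> universities in the 211 project
}

def project(anchor, relation):
    # Relation projection: move the anchor representation along the relation.
    return entity_emb[anchor] + relation_emb[relation]

def intersect(query_vecs):
    # Simplified "and" operator: average the branch representations.
    return np.mean(query_vecs, axis=0)

# Query: universities located in Anhui AND belonging to the 211 project.
q = intersect([project("Anhui", "located_in_inv"), project("211", "member_of_inv")])

# Answer by nearest-neighbour search over entity embeddings.
candidates = ["USTC", "Tsinghua"]
print(min(candidates, key=lambda e: np.linalg.norm(entity_emb[e] - q)))
```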

In general, the answer to a query is a set of entities, so the query representation is essentially a representation of an entity set, and how that set is represented becomes very important. The traditional approach represents a query as a "box"; boxes support some logical operations, but they have difficulty modeling "not".

We proposed ConE, which embeds queries into a space of two-dimensional cones: an entity is a cone with aperture 0 and a set is a cone with a non-zero aperture. Because of the closure properties of cones, the operations "and", "or", and "not" are easy to perform. This work currently outperforms other methods significantly in multi-hop inference performance.
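
To see why negation is natural here, a geometric observation (my own framing, not necessarily the paper's exact operator): a two-dimensional sector cone can be described by an axis angle and an aperture, and its complement on the circle is again a sector cone:

```latex
% A sector cone with axis angle \theta and aperture \alpha \in [0, 2\pi]:
C(\theta, \alpha) = \{\, \phi : |\phi - \theta| \le \alpha / 2 \,\} \pmod{2\pi}
% Its complement on the circle is again a sector cone:
\neg\, C(\theta, \alpha) = C(\theta + \pi,\; 2\pi - \alpha)
% An entity corresponds to the degenerate case \alpha = 0.
```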

Natural language queries

The difficulty of natural language queries lies in modeling unstructured questions: the task is to take a natural language question as input (unlike structured queries) and produce the answer through multi-hop reasoning over the knowledge graph. However, as the number of hops increases, the number of candidate entities grows exponentially. Existing GNN-based methods prune subgraphs to reduce the number of candidates, but at the expense of recall of the correct answer.

To this end, inspired by theories of human cognition, we proposed a two-stage approach. The first stage corresponds to System 1 (unconscious, intuitive, fast thinking): quick screening, scoring candidates by query-answer semantic matching. The second stage corresponds to System 2 (conscious, logical, slow thinking): scoring the reasoning paths through a Bayesian network.

For the question "Who are the editors of John Derek's films?", the results of applying our approach, as shown in the figure, retain relatively few candidate entities while keeping high confidence. Further experiments show that our method significantly outperforms previous SOTA methods on multi-hop datasets.

3

Future outlook

Besides methods based on representation learning, knowledge graph reasoning also has rule-based approaches. Although representation learning methods model the latent semantic information in the knowledge graph better than rule-based reasoning, rule-based reasoning tends to be more popular in real-world applications, because it is highly precise and explainable. The goal of the academic community should therefore be to make representation-learning reasoning models perform comparably to rule-based reasoning models in real scenarios.

On the other hand, academic model evaluation should be more comprehensive and effective, so as to guide model design toward the needs of real scenarios. Let me discuss this in terms of both datasets and metrics.

First, the datasets in wide use today cannot accurately reflect real-world scenarios. Existing model evaluation basically adopts the closed-world assumption, which treats any triple absent from the knowledge graph as false; this clearly does not match real application scenarios, and it causes results that should be correct to be judged wrong. How "candidate datasets" can objectively reflect model performance therefore needs further exploration.

Furthermore, the currently widely used metrics do not fully assess a model's strengths and weaknesses. For example, the higher a model ranks the correct triples in the test set, the better it scores on these metrics, but this view is not comprehensive. In addition, under the closed-world assumption, some models that should perform better may still score poorly on these metrics.
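
The talk does not name these metrics; assuming they are the standard ranking metrics MRR and Hits@k, a minimal sketch of how they are computed is:

```python
def ranking_metrics(ranks, k=10):
    # ranks: rank of the correct entity for each test triple (1 = best).
    # MRR: mean reciprocal rank; Hits@k: fraction of cases ranked within the top k.
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits_at_k = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits_at_k

print(ranking_metrics([1, 3, 12, 2], k=10))  # (0.479..., 0.75)
```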

Existing knowledge graphs handle only textual information; the future trend is to extend to multimodal information. Building a multimodal knowledge graph relies on collecting data of multiple modalities, and the key question is how to align data across modalities. It also requires a high-performance database to store multimodal data, and domestic enterprises have already begun to tackle these key problems.

Combining knowledge graphs with pre-trained language models is another coming trend. Pre-trained language models are mature, but they do not perform satisfactorily on domain-specific knowledge or common sense. How to use the knowledge graph to enhance pre-trained language models, or how to use pre-trained language models to reason better over the knowledge graph, is a direction that deserves focus next.

Finally, I also look forward to combining knowledge graphs with dialogue scenarios. Representing the dialogue state as a temporal knowledge graph can track the state of the dialogue and its changes more completely than traditional key-value structures.
