
A new milestone in NLP! Tsinghua Yao Class graduates release KEAR: the first commonsense QA system to surpass humans

Reported by XinZhiyuan

Editor: LRS

[New Zhiyuan Introduction] Soon it may no longer be true that humans understand common sense better than AI! Recently, Tsinghua Yao Class graduates on Xuedong Huang's team at Microsoft released a new system, KEAR, which swept the major commonsense QA leaderboards and surpassed human performance on commonsense QA for the first time. It even handles common sense in languages other than English!

One long-standing criticism of AI models is that they learn only by rote: they can make predictions from the given training samples but cannot answer questions that require a little common sense.

For example, if you ask GPT-3: how many eyes does the sun have?

It will tell you without hesitation: one eye, of course!


Commonsense information is rarely stated explicitly in the input text, so a model that lacks common sense can only produce answers that completely miss the point.

To address these commonsense errors, researchers used ConceptNet to build CommonsenseQA, a dataset designed specifically for commonsense questions, in which models must grasp common sense to answer correctly.


Each question has five candidate answers, two of which are distractors, which makes the task even harder for AI models.

For example, given the question: What is a treat that your dog will enjoy?

The candidate answers might be salad, petted, affection, bone, and lots of attention. From interacting with dogs, people know that most dogs like to eat bones, and can therefore infer that your dog would also prefer the bone among the candidates; the AI model, however, has no such knowledge.
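To make the setup concrete, here is a hypothetical item in the style of the dataset; the field names are illustrative, not CommonsenseQA's exact schema:

```python
# A hypothetical CommonsenseQA-style item, paraphrasing the article's example.
# Field names are illustrative, not the dataset's exact schema.
example = {
    "question": "What is a treat that your dog will enjoy?",
    "choices": ["salad", "petted", "affection", "bone", "lots of attention"],
    "answer": "bone",  # picking this needs commonsense about dogs, not just the question text
}
```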

So to answer such questions correctly, a model must know how to use external knowledge.

The authors of CommonsenseQA then tested BERT-LARGE, the model sweeping the leaderboards at the time, and the results were dismal: an accuracy of only 55.9%, while human accuracy reached 88.9%.


Three years later, a Chinese team at Microsoft recently published a paper proposing KEAR (Knowledgeable External Attention for commonsense Reasoning), a system that lifts performance on CommonsenseQA to a new height: an accuracy of 89.4%, surpassing humans for the first time and marking a milestone for commonsense reasoning in AI.


Unlike traditional AI models that must be trained on large-scale data, the paper proposes an external attention mechanism that augments the Transformer architecture: external knowledge is integrated into the prediction process, reducing the model's need for huge parameter counts and making AI systems more democratized, that is, lowering the barrier to AI research. You no longer have to buy piles of graphics cards from NVIDIA's Jensen Huang to reach SOTA performance.

Concretely, when the KEAR model answers "What is a treat that your dog will enjoy?", it first retrieves the ConceptNet relation "dog — desires — petted, affection, bone, lots of attention", which rules out the wrong answer salad.

KEAR then retrieves the definition of bone from Wiktionary: a composite material making up the skeleton of most vertebrates.

It also retrieves a related example from the CommonsenseQA training data: "What do dogs like to eat? — bones."

After concatenating the retrieved knowledge with the original input, KEAR feeds the result to a DeBERTa model, which can finally deduce the correct answer: bone!
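As a rough illustration, this retrieve-concatenate-predict pipeline might look like the following sketch, using the Hugging Face transformers library. The retrieval functions are stand-ins, and the multiple-choice head here is untrained (KEAR fine-tunes on CommonsenseQA); this is a sketch of the idea, not the authors' actual code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

def retrieve_conceptnet(question: str) -> str:
    # Stand-in for a real ConceptNet lookup (returns the article's example relation).
    return "dog desires petted, affection, bone, lots of attention"

def retrieve_dictionary(choice: str) -> str:
    # Stand-in for a Wiktionary definition lookup.
    defs = {"bone": "a composite material making up the skeleton of most vertebrates"}
    return defs.get(choice, "")

question = "What is a treat that your dog will enjoy?"
choices = ["salad", "petted", "affection", "bone", "lots of attention"]

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
model = AutoModelForMultipleChoice.from_pretrained("microsoft/deberta-v3-large")

# Build one sequence per choice: the question paired with the choice plus
# retrieved knowledge. The concatenation is plain text, so no change to the
# model architecture is needed.
knowledge = retrieve_conceptnet(question)
firsts = [question] * len(choices)
seconds = [f"{c}. {knowledge}. {retrieve_dictionary(c)}" for c in choices]
enc = tokenizer(firsts, seconds, padding=True, return_tensors="pt")
batch = {k: v.unsqueeze(0) for k, v in enc.items()}  # (1, num_choices, seq_len)

with torch.no_grad():
    logits = model(**batch).logits  # (1, num_choices)
print("predicted:", choices[logits.argmax(-1).item()])
```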


As you can see, a question that is trivial for humans requires a great deal of external information before an AI model can answer it correctly.

Since CommonsenseQA covers commonsense QA only in English, the paper also explores whether this approach to commonsense reasoning holds in other languages.

The researchers first translate a non-English question into English, retrieve knowledge from English corpora, then translate the retrieved knowledge text back into the source language and feed it through the external attention mechanism to obtain the answer: translate-retrieve-translate (TRT).
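A minimal sketch of this TRT loop, with stand-in functions in place of the real translation and retrieval systems (none of these names come from the paper):

```python
def translate(text: str, src: str, tgt: str) -> str:
    # Stand-in for a machine translation call; identity here so the sketch runs.
    return text

def retrieve_english_knowledge(question_en: str) -> str:
    # Stand-in for retrieval over English sources (ConceptNet, Wiktionary, corpus).
    return "dogs typically enjoy bones"

def trt_knowledge(question: str, lang: str) -> str:
    question_en = translate(question, src=lang, tgt="en")   # translate
    knowledge_en = retrieve_english_knowledge(question_en)  # retrieve
    return translate(knowledge_en, src="en", tgt=lang)      # translate back

# The returned knowledge is then concatenated with the original question and
# fed through the external-attention model in the source language.
print(trt_knowledge("¿Qué le gusta comer a tu perro?", lang="es"))
```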

The result: KEAR also took first place on both tasks of the X-CSR benchmark, X-CODAH and X-CSQA.


Beyond self-attention

To this day, most AI models rely on a self-attention mechanism over the source text, trained by feeding the model massive amounts of data so that it memorizes the input text.

While Transformers work well, the drawbacks are also obvious:

Time and space complexity are too high, demanding many graphics cards and much video memory.

When data is insufficient, the Transformer does not perform well enough.

Moreover, the Transformer is essentially a black-box model: it cannot understand and reason over text the way a human does, and it matters to know why an AI makes a given prediction. KEAR draws on knowledge graphs, dictionaries, and the common knowledge in publicly available machine learning data, so it can, to some extent, surface the source of an answer and the model's reasoning process.


The implementation of external attention is also very simple: concatenate the input and the knowledge into a new input, then pass the combined sequence through the self-attention mechanism as H0.

The sources of knowledge include the ConceptNet knowledge graph, dictionaries, and the training data itself.

The main difference between self-attention and external attention, then, is whether the input comes only from the input text. By supplying the model with relevant background and knowledge from different sources, including knowledge graphs, dictionaries, corpora, and the outputs of other language models, and letting the model attend to the input and the external knowledge simultaneously, the effect of introducing external knowledge is achieved.

The external information is stored in symbolic form, such as plain text or knowledge graph entries, improving the Transformer's ability to understand language.
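In PyTorch terms, the mechanism can be sketched like this; the dimensions and the single attention layer are illustrative assumptions, not KEAR's actual configuration:

```python
import torch
import torch.nn as nn

d_model, n_heads = 64, 4  # illustrative sizes only
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

x = torch.randn(1, 16, d_model)          # embedded input-text tokens
knowledge = torch.randn(1, 24, d_model)  # embedded external-knowledge tokens

# The concatenation is the entire mechanism: H0 = [input; knowledge],
# and ordinary self-attention then attends over both at once.
h0 = torch.cat([x, knowledge], dim=1)
out, _ = attn(h0, h0, h0)
print(out.shape)  # torch.Size([1, 40, 64])
```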


And because KEAR concatenates input and knowledge as plain text, it requires no change to the Transformer model structure, making it easy for existing systems to adopt external attention.

Because knowledge in the world changes dynamically, another benefit of external attention is that users can easily update the knowledge source to change the model's predictive output.

By introducing the latest common sense, for example by feeding a continuously updated knowledge graph into the model, the model's decision-making process becomes more transparent and interpretable.

Jointly optimizing multiple modules and introducing external attention over knowledge bases are also core directions for quality improvement in Microsoft's AI Cognitive Services.

About the authors

The first author, Yichong Xu, earned his bachelor's degree from Tsinghua University's Yao Class and his Ph.D. from Carnegie Mellon University, focusing on interactive machine learning, natural language processing, and deep learning. He is currently a senior researcher in Microsoft's Cognitive Services Research Group.


Chenguang Zhu is a principal research manager in Microsoft's Cognitive Services Research Group. He leads the Knowledge and Language team, working on R&D in text summarization, knowledge graphs, and task-oriented dialogue. He received his Ph.D. in Computer Science and M.S. in Statistics from Stanford University in 2016, and his B.S. in Computer Science from Tsinghua University's Yao Class.


Xuedong Huang leads Microsoft's AI Cognitive Services engineering and research teams. He is an IEEE/ACM Fellow, Microsoft's first Chinese Technical Fellow, Microsoft's chief speech scientist, and CTO of the Cognitive Services team in Microsoft's Cloud and AI division. He holds a bachelor's degree from Hunan University, a master's degree from Tsinghua University, and a doctorate from the University of Edinburgh.


Resources:

https://arxiv.org/abs/2112.03254
