Four first places in a row! AI common sense reasoning moves 3% closer to humans

Reporting by XinZhiyuan

Editors: Haokun, Taozi

[XinZhiyuan Introduction] Another step toward making machines think like people?

Recently, the 16th International Workshop on Semantic Evaluation (SemEval 2022) came to an end.

The iFLYTEK State Key Laboratory of Cognitive Intelligence led teams that fought through a crowded field and took first place in three major tasks in a row.

Just a few days earlier, the lab also set a new world record on the common sense reasoning challenge CommonsenseQA 2.0 with 76.06% accuracy, nearly 3 percentage points ahead of second place.

So, how difficult are these so-called challenges?

A new world record in common sense reasoning

Common sense reasoning means drawing on what one already knows, such as scientific facts and social conventions, and combining it with a specific context to infer the answer to a question.

For human beings, using "common sense" to solve problems is itself "common sense".

However, for today's reading-comprehension AI, if the answer is not in the source text, the model is basically left in the dark.

It is very hard for these models to use common sense to deduce the answer to a question, and the problem urgently needs solving.

Against this backdrop, the CommonsenseQA 2.0 (CSQA2) international common sense reasoning benchmark was created under the leadership of the Allen Institute for Artificial Intelligence.

It has attracted many top international institutions, including Google, the Allen Institute for AI, and the University of Washington, to take on the challenge.

In a nutshell, CSQA2 is a binary classification dataset of 14,343 questions, split into training, development, and test sets. Each question asks whether a common sense statement is true or false.

The questions in version 1.0 were built from knowledge triples in the existing common sense knowledge base ConceptNet, which lets a machine consult that reference directly when working on the task.

CommonsenseQA 1.0 task example

Subsequently, the Allen Institute for Artificial Intelligence launched version 2.0, upgrading the challenge to true/false questions, which are significantly harder than the multiple-choice questions of 1.0.

The new reasoning questions not only range over a huge space of possibilities, but most of them also cannot be covered by existing knowledge bases. In addition, the evaluation data was built through repeated rounds of an adversarial human-versus-machine game.

If a mainstream medium-sized pre-trained model is asked to answer, its accuracy reaches only 55%, slightly above the level of random guessing.

Previously, the best method used the 175-billion-parameter GPT-3 model to generate knowledge for the CommonsenseQA 2.0 questions and then fused it with a T5-based model, raising accuracy to 73%.
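To make the comparison with random guessing concrete, the following is a minimal Python sketch of how accuracy on true/false questions can be computed and compared with the 50% chance level. The toy labels and predictions are invented for illustration and are not CSQA2 data, and the function names are hypothetical rather than part of any official evaluation script.

```python
from typing import Sequence


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of true/false answers that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def margin_over_chance(acc: float, num_classes: int = 2) -> float:
    """How far an accuracy sits above uniform random guessing."""
    return acc - 1.0 / num_classes


# Toy data, invented for illustration (not real CSQA2 items).
gold = [True, False, False, True, True, False, True, False]
pred = [True, False, True, True, False, False, True, True]

acc = accuracy(pred, gold)
print(f"toy accuracy: {acc:.3f}")
print(f"toy margin over chance: {margin_over_chance(acc) * 100:+.1f} points")

# The 55% figure quoted in the article, expressed as points above chance.
print(f"medium-sized model: {margin_over_chance(0.55) * 100:+.1f} points")
```

On a two-way task, uniform guessing is expected to score 50%, so a 55% model is only five points above chance.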

CommonsenseQA 2.0 task example

iFLYTEK, participating for the first time, proposed the ACROSS model (Automatic Commonsense Reasoning on Semantic Spaces), which effectively integrates external knowledge within a unified semantic space, significantly mitigates the problems of very large pre-trained models, and achieved 76% accuracy on the CommonsenseQA 2.0 task.

Borrowing from the way humans solve problems, the ACROSS model first gathers a large amount of relevant information from knowledge bases and the Internet, and then integrates it in a unified semantic space. As a result, very large pre-trained models receive stronger knowledge inputs, enabling accurate common sense inference.

However, this result is still far below the human level of 94.1%, which shows that common sense reasoning still poses great challenges and leaves much room for improvement.

Three titles in multilingual language understanding

SemEval 2022, where the team took three titles, is organized by SIGLEX, a special interest group of the Association for Computational Linguistics (ACL), and is now in its 16th edition.

Participants include leading universities and well-known companies from China and abroad, such as Dartmouth College and the University of Sheffield, representing the international state of the art.

When the competition closed, iFLYTEK and its partner teams had taken first place in three sub-tracks: Task 8, Task 2 (Subtask A, one-shot), and Task 11.

News similarity ratings

In the multilingual news similarity evaluation task, the HIT-iFLYTEK Joint Laboratory (HFL), jointly established by iFLYTEK and Harbin Institute of Technology, took first place by a significant margin.

Multilingual news similarity assessment task

So what exactly is news similarity?

Take, for example, the two highly similar news articles below.

First, teams need to pick out the main elements the texts share and analyze them one by one, such as geographic information, narrative technique, substance, tone, time, and style. They then judge the overall similarity of the two articles on a scale of 1 to 4.

The competition covers 10 languages, namely Arabic, German, English, Spanish, French, Italian, Polish, Russian, Turkish and Chinese.

Compared with ordinary text comparison, the competition emphasizes cross-language comprehension. Beyond writing style and narrative style, a system also has to grasp the specific events the articles describe.

In other words, once AI has mastered this skill, it can identify whether news reports on overseas websites are biased or distorted, and so help prevent the spread of false and harmful information.

Idiomatic language recognition

The second title won by the HIT-iFLYTEK team was in idiomaticity detection.

"Idiomatic expressions" are actually quite easy to understand.

For example, in the saying "Speak of Cao Cao, and Cao Cao arrives", does "Cao Cao" really refer to Cao Cao himself?

But don't forget, this is a multilingual challenge.

Take an English example, such as "big fish" in the following two sentences.

In the first sentence, it clearly refers to a large fish; the sentence means "when you catch a big fish from the net, it is best to hold its waist".

The "big fish" in the second sentence has to be understood as an important person, a "big shot", for the sentence to make sense.

The challenge of "multilingual idiomatic language recognition" is whether the model can judge, from the context and the phrase itself, whether a usage is idiomatic or literal.

Multilingual Idiomatic Recognition Task (Sub-Track)

To accomplish this task, the model needs cross-lingual analysis and comprehension so that it can distinguish the different meanings the same phrase takes in different sentences.

It also needs to transfer what it has learned across languages, so that it can handle test languages that never appeared in the training set.

A model that performs well in this challenge can, once deployed, effectively identify the intent behind idiomatic expressions in everyday writing and translation work and greatly improve the accuracy of the content. It is, in short, quite practical.

Complex named entity recognition

There is also a very difficult task that sounds complicated from its name alone: the Multilingual Complex Named Entity Recognition task (MultiCoNER).

Let's first break down the name MultiCoNER: Multi is short for multilingual, Co stands for complex, and NER is Named Entity Recognition, which refers to recognizing entities with specific meaning in text, mainly personal names, place names, organization names, and other proper nouns.

How difficult is this task?

For example, apart from losing in the first round to [the organization], it is now four consecutive wins. (Rafael van der Vaart) [PER], (Gonzalo Higuaín) [PER] and (Arjen Robben) [PER] did well.

Specifically, the task comprises 11 monolingual named entity evaluation tracks and two multilingual unified-modeling tracks. The data is derived from Wikidata, which is huge in volume and has great application value.

Teams need to accurately predict the category labels of different language entities in text data in a single language and a mixture of multiple languages, while using only one model for the whole process.

In this regard, the joint team of the University of Science and Technology of China and iFLYTEK reached the top with F1 results of 92.9%, 81.6% and 84.2% respectively on the multi-language mixed, Chinese and Bengali tracks.

Multilingual Complex Named Entity Recognition Task (Mixed List)
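To illustrate what an F1 score measures in named entity recognition, here is a minimal Python sketch of entity-level F1 computed from BIO tags. It assumes exact-match spans and micro-averaging; the article does not say how the competition averaged its scores, so this is not the official scorer. The tag sequences are invented for illustration, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence such as B-PER, I-PER, O."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact-match entity spans (an assumption)."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # a span counts only if boundaries and type match
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy sentence: "Rafael van der Vaart and Arjen Robben did well"
gold_tags = [["B-PER", "I-PER", "I-PER", "I-PER", "O", "B-PER", "I-PER", "O", "O"]]
# The hypothetical prediction cuts the second name short, so that span is wrong.
pred_tags = [["B-PER", "I-PER", "I-PER", "I-PER", "O", "B-PER", "O", "O", "O"]]

print(f"entity-level F1: {entity_f1(gold_tags, pred_tags):.2f}")
```

Because a span must match in both boundaries and type, a partially recognized name earns no credit, which is part of what makes long or complex entities hard.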

However, when it comes to technology, we should look not only at results achieved in the exam room, but also at real-world ability.

So has it been put to use?

Indeed it has: at this year's Beijing Winter Olympics, iFLYTEK put its skills on full display.

As the "official exclusive supplier of automatic speech conversion and translation", the company helped deliver a "barrier-free communication" sports event for all spectators.

Even iFLYTEK's virtual volunteer "Aijia" (i+) became a crowd favorite on and off the field.

She can not only answer questions in real time about the schedule and events, as well as nearby transportation, culture, and tourism, but more importantly, she can communicate face-to-face with athletes from all over the world in multiple languages.

"Aijia" is an automated avatar production solution created by iFLYTEK using a number of core technologies such as speech recognition, speech synthesis, lip drive, face drive, and body movement drive.

This allows virtual humans not only to speak Mandarin but also to support 31 languages and dialects, making them true polyglots.

"Aijia" can hold face-to-face, real-time conversations about Winter Olympic events and schedules, challenge you to a Winter Olympics trivia showdown, and handle questions about nearby transportation, culture, and tourism with ease.

In addition, in various industry artificial intelligence applications in education, medical care, justice and other scenarios, multilingual voice interaction systems will play an important role.

A first-person view of a "fan encounter" with Gu Ailing

After years of technical accumulation, in addition to Chinese and English, iFLYTEK currently has speech recognition capabilities in 69 other languages, of which 35 languages have an accuracy rate of more than 90%.

It has deployed overseas sites in Singapore, Russia, India, Japan and other countries, and will continue to provide speech and language services such as speech recognition, speech synthesis, machine translation, and image and text recognition for developers at home and abroad.

How effective are these applications? For that, we have to look at the data.

On April 21, iFLYTEK released its 2021 annual report.

During the reporting period, the company achieved revenue of more than 18.3 billion yuan, an increase of 40.61% year-on-year. Net profit attributable to the parent company after deducting non-recurring items was 979 million yuan, an increase of 27.54% year-on-year. Both the scale and the efficiency of operations continued to grow.

Among them, the revenue of the smart education business was 6.007 billion yuan, an increase of 49.47% year-on-year, and the revenue of the open platform and consumer business was 4.687 billion yuan, an increase of 52.19% year-on-year. The company's core "base area" businesses have taken root and maintained rapid growth.

Bear in mind that a company sustains its revenue not only through profit but, more importantly, through investment in research and development.
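As a quick sanity check on the year-on-year figures, the sketch below backs out the implied prior-year values from the reported amounts and growth rates. It uses only the numbers quoted above; the results are approximations (revenue is given only as "more than 18.3 billion yuan") and are not figures taken from the annual report itself. The function name is hypothetical.

```python
def prior_year_value(current: float, yoy_growth_pct: float) -> float:
    """Back out last year's value from this year's value and the YoY growth rate.

    Uses current = prior * (1 + growth), so prior = current / (1 + growth).
    """
    if yoy_growth_pct <= -100.0:
        raise ValueError("growth rate must be greater than -100%")
    return current / (1.0 + yoy_growth_pct / 100.0)


# Figures as quoted in the article, in billions of yuan.
reported = {
    "revenue": (18.3, 40.61),
    "net profit after non-recurring items": (0.979, 27.54),
}

for name, (value_2021, growth_pct) in reported.items():
    implied_2020 = prior_year_value(value_2021, growth_pct)
    print(f"{name}: 2021 = {value_2021:.3f}, implied 2020 = {implied_2020:.3f}")
```

The same division applies to any "grew by X% to Y" statement in the report summary.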

In 2021, iFLYTEK's R&D investment continued to grow to RMB2.936 billion, an increase of 21.50% year-on-year.

In addition, it is worth noting that the iFLYTEK open platform has shown rapid growth in the number of developers and revenue.

In 2021, iFLYTEK's open platform revenue reached RMB2.988 billion, an increase of 55.6% year-on-year. The number of developer teams increased by 66% to 2.93 million.

The open platform has made 449 AI capabilities and solutions available to the outside world, and focuses on empowering 18 industry sectors such as finance, agriculture, and energy.

In addition, iFLYTEK released its "Open Platform 2.0 Strategy", which brings industry leaders together to build a baseline foundation for each industry, opens up scenarios to gather developers' creativity, and works with industry leaders and developers to build an artificial intelligence industry ecosystem.

Next stop: where to?

iFLYTEK has already drawn up its battle plan for the next few years.

At the beginning of 2022, Liu Qingfeng, chairman of iFLYTEK, announced the launch of the "iFLYTEK Super Brain 2030 Plan" to enable artificial intelligence to understand knowledge, be good at learning, and evolve, and to let robots enter every family.

China now faces a serious aging problem: the population over 60 years old will exceed 300 million, which has become an urgent problem to solve.

The company has come up with a bold idea of bringing robots into every home by 2030 to take on the problem of old-age care.

The Super Brain 2030 plan will unfold in three phases:

Phase I 2022-2023.

iFLYTEK will launch its first nurturable pet robot, which can accompany children as they grow, teach them, and accompany the elderly on walks and runs. It will also release a family of professional virtual humans equipped with knowledge of industries and professional fields such as education, healthcare, finance, and customer service, and able to keep evolving.

Phase II 2023-2025.

Exoskeleton robots enter the home. With adaptive motor functions, they can not only help people with disabilities or older adults with limited mobility to walk independently, but also carry out texture assessment and movement judgment, and can actively compensate for human behavior. iFLYTEK will also release a family of companion virtual humans that can keep the elderly company and offer warm emotional exchange.

Phase III 2025-2030.

Knowledgeable, learning companion robots enter the home, and digital virtual humans learn and evolve on their own. Beyond offering family-like companionship, the AI also needs the ability to interact and move. Through breakthroughs in artificial intelligence technology and integrated software and hardware capabilities, iFLYTEK hopes to truly help people better cope with the future in areas of essential need.

It is precisely these years of technical accumulation and systematic innovation that let iFLYTEK build momentum and rely on its own strength to help solve the problem of an aging society.

In the future, iFLYTEK will continue to carry out technological innovation at the source of artificial intelligence, realize more innovative applications of artificial intelligence, and build a beautiful new world of artificial intelligence.

The future is worth looking forward to...
