Xinhua News Agency: iFLYTEK Xinghuo rated China's "smartest" large model

China Fortune Network News: In China's domestic "war of a thousand models", which large model is the smartest? The latest large model evaluation report from MIT Technology Review China provides an answer.

The report shows that in a test and blind evaluation of 600 questions across 8 first-level categories, iFLYTEK Xinghuo Cognitive Model V2.0 ranked first in score rate in 6 categories. It topped the evaluation with a score of 81.5 points (out of 100), winning the title of "smartest" domestic large model.

Figure: Comprehensive score rate of large model evaluation

Figure: Radar chart of the capabilities of the 4 large models

MIT Technology Review China comprehensively tested the capabilities of large models along the dimensions of R&D and commercialization capability, external attitudes, and development trends, aiming to identify the "smartest" domestic large model. "iFLYTEK Xinghuo (Spark)", "Baidu Wenxin Yiyan", "SenseTime SenseChat" and "Alibaba Tongyi Qianwen" were selected as representatives of Chinese large model platforms for a systematic, scientific evaluation.

The test set used in this evaluation contains 600 questions covering 8 first-level categories (language, mathematics, comprehensive science, comprehensive liberal arts, logical thinking, programming ability, comprehensive knowledge, and safety), 126 second-level categories, and 290 third-level labels, and was optimized for the richness and diversity of its questions.

To balance quantitative and qualitative evaluation, four question types were used: "single choice", "multiple choice", "fill in the blank" and "short answer", with 145, 138, 136 and 181 questions respectively. The evaluation system relies on blind scoring to assess the intelligence of domestic large models objectively.

Language is the basic ability of the "smartest" large model. The language evaluation covers 61 secondary categories, including dialogue understanding, multilingualism, irony, ancient poetry comprehension, text generation, key point summarization, sentiment analysis, and semantic judgment, and the questions are mainly short answer. The results show that iFLYTEK Xinghuo ranked first with a score rate of 85.73%, significantly higher than the average.

Figure: Score rate of language-specific assessments

Mathematics is an indispensable evaluation dimension for the "smartest" large model. This evaluation covers 9 secondary categories, including algebra, geometry, equation solving, complex mathematics, and statistics, and the questions are mainly choice questions.

Here iFLYTEK Xinghuo ranked first with a score rate of 77.75%, much higher than the average score rate of 56%, while the other platforms scored at roughly the same level as one another. According to the report, given that large models are generally "bad at mathematics", iFLYTEK Xinghuo's result is quite rare. Its lead in mathematics also shows in the secondary category results: it ranked first in 77.8% of the secondary categories, far ahead of the other platforms, and a preliminary reading is that it is strong in geometry and scenario applications.

Figure: Mathematics special evaluation score rate

Comprehensive science is an indispensable "hardcore" component that reflects how "smart" a large model is. This evaluation covers 5 secondary categories: table question answering, chemistry, biology, physics, and medicine. The questions are mainly single choice and short answer.

In the results, iFLYTEK Xinghuo ranked first with a score rate of 78.50%. It also ranked first in 80% of the secondary categories under comprehensive science, with chemistry and biology standing out.

Figure: Score rate of science comprehensive assessment

Logical thinking is another important marker of the "smartest" large model. This evaluation places more questions on logical reasoning and chain of thought, and covers 19 secondary categories, including analogy, common sense reasoning, spatial orientation, deductive reasoning, logical fallacy detection, and causal reasoning. The question types are fairly evenly distributed, with fill-in-the-blank questions the most numerous and multiple-choice questions the fewest.

On the logical thinking questions, iFLYTEK Xinghuo ranked first with a score rate of 81.25%, significantly higher than the average of 72.6%. It also ranked first in 63.2% of the secondary categories under logical thinking. Logical thinking is important for large models to truly understand the physical world.

Figure: Logical thinking assessment score rate

Programming is a relatively advanced ability for large models. This evaluation covers 6 secondary categories: ASCII, ASCII code recognition, Python, code, code correction, and computer. The Python category mainly evaluates code generation ability and accuracy through short answer questions, while the others are tested with objective questions.

The results show that iFLYTEK Xinghuo's score rate of 80% is significantly higher than the average of 71%, while the other platforms scored at roughly the same level as one another. Notably, on the widely watched short answer questions that require generating code, iFLYTEK Xinghuo's score rate reached 82%, much higher than the other platforms.

Figure: Programming ability assessment comprehensive score rate

Comprehensive knowledge is a more difficult evaluation dimension that also places high demands on how "smart" a large model is. The topics involved are more complex, covering 13 secondary categories, including encyclopedia Q&A, common sense, scientific knowledge, factual Q&A, work skills, and riddles, and the questions are mainly multiple choice.

In the comprehensive knowledge evaluation, iFLYTEK Xinghuo ranked first with a score rate of 80.61% and ranked first in 84.6% of the secondary categories, a preliminary indication of its "excellence" in encyclopedia Q&A and in history and the humanities.

Figure: Comprehensive knowledge assessment score rate

The report pointed out that in this round of large model evaluation, iFLYTEK Xinghuo took first place with a score of 81.5 points, becoming the "smartest" domestic large model.

iFLYTEK Xinghuo ranked first in score rate in 6 first-level categories: programming ability, comprehensive science, logical thinking, mathematics, language, and comprehensive knowledge. Its performance in this evaluation was well rounded, and especially strong in code generation, mathematics, science, and logic, making it the "smartest science student" this time.

By question type, iFLYTEK Xinghuo ranked first on the subjective short answer questions with a score rate of 83.98%, and first on the objective questions with a score rate of 75.7%, performing well on both subjective and objective question types.

In addition, on August 12, in the "Artificial Intelligence Large Model Experience Report 2.0" released by the China Enterprise Development Research Center of Xinhua News Agency, iFLYTEK Xinghuo V1.5 ranked first among mainstream domestic large models with a total score of 1013 points, and placed first in two of the four major evaluation dimensions: the IQ index and the tool efficiency improvement index.

On August 15, iFLYTEK Xinghuo Cognitive Big Model V2.0 was released as scheduled, with further breakthroughs in code capabilities and multimodal capabilities. Alongside these technical advances, the applications and products built on the core capabilities of iFLYTEK Spark V2.0 are becoming more numerous: iFlyCode 1.0, an intelligent coding assistant that helps programmers work efficiently; iFLYTEK Smart 2.0, which can create videos; an educational digital dock application development assistant for easily building light applications; a Spark teacher assistant that helps teachers design teaching activities and generate courseware with one click; and Spark Language Partner 2.0 for English learners' speaking practice. The iFLYTEK AI Learning Machine has also been upgraded with an AI 1-to-1 intelligent programming assistant and an AI 1-to-1 creative painting partner. In addition, iFLYTEK and Huawei jointly released the Spark all-in-one machine, giving every enterprise the opportunity to build its own large model.

It is reported that MIT Technology Review is a technology commercialization think tank wholly owned by MIT. MIT Technology Review was launched in China in 2016 and is exclusively operated by DeepTech, which runs its media, research, publishing and conference business in China. (Bai Fei)

Source: Xinhua News Agency client
