Xinhua News Agency: iFLYTEK Xinghuo rated China's "smartest" large model

author:iFLYTEK

China Fortune Network: Amid China's "war of a thousand models", which large model is the smartest? The latest large model evaluation report from MIT Technology Review China provides an answer.

The report shows that, in a blind evaluation on 600 questions across 8 first-level categories, iFLYTEK Xinghuo Cognitive Model V2.0 ranked first by score rate in 6 categories. It topped the evaluation with an overall score of 81.5 out of 100 and won the title of "smartest" domestic large model.

Figure: Comprehensive score rate of large model evaluation

Figure: Radar chart of the capabilities of the 4 large models

MIT Technology Review China examined large models comprehensively along the dimensions of R&D and commercialization capability, public attitudes, and development trends, aiming to identify the "smartest" domestic large model. "iFLYTEK Spark", "Baidu Wenxin Yiyan", "SenseTime SenseChat" and "Alibaba Tongyi Qianwen" were selected as representative Chinese model platforms for a systematic evaluation.

The test set used in this evaluation contains 600 questions covering 8 first-level categories (language, mathematics, comprehensive science, comprehensive liberal arts, logical thinking, programming ability, comprehensive knowledge, and safety), 126 second-level categories, and 290 third-level labels. It was optimized for the richness and diversity of its questions.

To balance quantitative and qualitative assessment, the test uses four question types: "single choice", "multiple choice", "fill in the blank" and "short answer", with 145, 138, 136 and 181 questions respectively. The evaluation is conducted blind so that the intelligence of domestic large models can be assessed objectively.
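
As a quick sanity check on the figures above, the following minimal Python sketch adds up the four question-type counts and computes each type's share of the test set. The counts are taken from the report as quoted here; the variable names are illustrative assumptions and not part of the evaluation itself.

```python
# Question counts per type, as quoted from the report.
question_counts = {
    "single choice": 145,
    "multiple choice": 138,
    "fill in the blank": 136,
    "short answer": 181,
}

# Total number of questions in the test set.
total = sum(question_counts.values())

# Share of each question type in the full test set, in percent (one decimal).
shares = {name: round(100 * n / total, 1) for name, n in question_counts.items()}

print(f"total questions: {total}")
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The counts add up to the stated 600 questions, and short-answer questions form the largest group at roughly 30% of the test set.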

Language is a basic ability for the "smartest" large model. The language evaluation covers 61 second-level categories, including dialogue understanding, multilingualism, irony, classical poetry comprehension, text generation, key-point summarization, sentiment analysis and semantic judgment, and consists mainly of short-answer questions. iFLYTEK Xinghuo ranked first with a score rate of 85.73%, significantly higher than the average.

Figure: Score rate of language-specific assessments

Mathematics is an indispensable evaluation dimension for the "smartest" large model. This evaluation covers 9 second-level categories, including algebra, geometry, equation solving, complex mathematics and statistics, and consists mainly of choice questions.

iFLYTEK Xinghuo ranked first with a score rate of 77.75%, well above the average of 56%, while the other platforms' score rates were broadly similar to one another. According to the report, given that large models are generally "bad at mathematics", iFLYTEK Xinghuo's result is quite rare. Its lead in mathematics also shows in the second-level results: it had the top score rate in 77.8% of the second-level categories, far ahead of the other platforms, and the preliminary judgment is that it is strong in geometry and scenario applications.

Figure: Mathematics special evaluation score rate

Comprehensive science is an indispensable "hardcore" measure of how "smart" a large model is. This evaluation covers 5 second-level categories (table question answering, chemistry, biology, physics, and medicine) and consists mainly of single-choice and short-answer questions.

iFLYTEK Xinghuo ranked first with a score rate of 78.50%. It also had the top score in 80% of the second-level categories under comprehensive science, with chemistry and biology standing out.

Figure: Score rate of science comprehensive assessment

Logical thinking is another important mark of the "smartest" large model. This evaluation places more questions on logical reasoning and chains of thought, and covers 19 second-level categories, including analogy, common-sense reasoning, spatial orientation, deductive reasoning, logical fallacy detection and causal reasoning. The question types are fairly evenly distributed, with fill-in-the-blank questions the most numerous and multiple-choice questions the fewest.

On the logical thinking questions, iFLYTEK Xinghuo ranked first with a score rate of 81.25%, significantly higher than the average of 72.6%. It also ranked first in 63.2% of the second-level logical thinking categories. Logical thinking matters a great deal if large models are to truly understand the physical world.

Figure: Logical thinking assessment score rate

Programming is a relatively high-level ability for large models. This evaluation covers 6 second-level categories: ASCII, ASCII code recognition, Python, code, code correction, and computer. The Python category mainly assesses the models' code generation ability and accuracy through short-answer questions, while the other categories use objective questions.

The results show that iFLYTEK Xinghuo's score rate of 80% is significantly higher than the average of 71%, while the other platforms' score rates are broadly similar to one another. Notably, on the short-answer code generation questions that many people care about, iFLYTEK Xinghuo's score rate reaches 82%, well ahead of the other platforms.

Figure: Programming ability assessment comprehensive score rate

Comprehensive knowledge is a more difficult evaluation dimension that also places high demands on how "smart" a large model is. Its topics are more complex and span 13 second-level categories, including encyclopedia question answering, common sense, scientific knowledge, factual question answering, work skills and riddles. The questions are mainly multiple choice.

In the comprehensive knowledge evaluation, iFLYTEK Xinghuo ranked first with a score rate of 80.61% and ranked first in 84.6% of the second-level categories, a preliminary sign of its strength in encyclopedia question answering and in history and the humanities.
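
The article gives the share of second-level categories in which iFLYTEK Xinghuo ranked first, but not the underlying counts. As a rough sketch, assuming each reported percentage is a rounded ratio of categories won to categories tested, the following Python snippet recovers the whole-number count that fits each percentage best. The function name `implied_wins` and the inferred counts are illustrative assumptions, not figures from the report.

```python
# Number of second-level categories per first-level category, as stated in the article.
subcategory_counts = {
    "mathematics": 9,
    "science": 5,
    "logical thinking": 19,
    "comprehensive knowledge": 13,
}

# Reported share (percent) of second-level categories in which iFLYTEK Xinghuo ranked first.
reported_first_place_share = {
    "mathematics": 77.8,
    "science": 80.0,
    "logical thinking": 63.2,
    "comprehensive knowledge": 84.6,
}


def implied_wins(n_subcategories, reported_share):
    """Return the whole number of wins whose share is closest to the reported percentage.

    Assumption: the reported share is wins / n_subcategories, rounded to one decimal.
    """
    return min(
        range(n_subcategories + 1),
        key=lambda wins: abs(100 * wins / n_subcategories - reported_share),
    )


# Inferred (not reported) win counts for each first-level category.
implied = {
    name: implied_wins(n, reported_first_place_share[name])
    for name, n in subcategory_counts.items()
}

for name, wins in implied.items():
    print(f"{name}: {wins} of {subcategory_counts[name]} second-level categories")
```

Under that assumption, the percentages correspond to 7 of 9 mathematics categories, 4 of 5 science categories, 12 of 19 logical thinking categories, and 11 of 13 comprehensive knowledge categories.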

Figure: Comprehensive knowledge assessment score rate

The report pointed out that in this round of large model evaluation, iFLYTEK Xinghuo took first place with a score of 81.5 points, becoming the "smartest" domestic large model.

iFLYTEK Xinghuo ranked first by score rate in 6 first-level categories: programming ability, comprehensive science, logical thinking, mathematics, language, and comprehensive knowledge. Its performance in this evaluation was well rounded and especially strong in code generation, mathematics, science and logic, making it the "smartest science student" of this round.

By question type, iFLYTEK Xinghuo ranked first on the subjective short-answer questions with a score rate of 83.98% and first on the objective questions with a score rate of 75.7%, performing well on both subjective and objective question types.

In addition, on August 12, in the "Artificial Intelligence Large Model Experience Report 2.0" released by the China Enterprise Development Research Center of Xinhua News Agency, iFLYTEK Xinghuo V1.5 ranked first among mainstream domestic large models with a total score of 1013 points. It also took first place in two of the four major evaluation dimensions: the IQ index and the tool efficiency improvement index.

On August 15, iFLYTEK Xinghuo Cognitive Large Model V2.0 was released as scheduled, with further breakthroughs in code and multimodal capabilities. Alongside these technical advances, a growing range of applications and products is built on the core capabilities of iFLYTEK Spark V2.0: iFlyCode 1.0, an intelligent coding assistant that helps programmers work efficiently; iFLYTEK Smart 2.0, which supports video creation; an educational digital dock application development assistant for easily building lightweight applications; a Spark teacher assistant that helps teachers design teaching activities and generate courseware with one click; and Spark Language Partner 2.0 for English learners' speaking practice. The iFLYTEK AI Learning Machine has also been upgraded with an AI 1-to-1 intelligent programming assistant and an AI 1-to-1 creative painting partner. In addition, iFLYTEK and Huawei jointly released the Spark all-in-one machine, giving every enterprise the opportunity to build its own large model.

It is reported that MIT Technology Review is a technology commercialization think tank wholly owned by MIT. MIT Technology Review was launched in China in 2016 and is exclusively operated by DeepTech, which runs its media, research, publishing and conference business in China. (Bai Fei)

Source: Xinhua News Agency client
