Zhiyuan Index CUGE released, giving large AI models a new evaluation benchmark

Large AI models are on the rise, and evaluation benchmarks have become a weather vane for their development. At a recent open day on frontier technologies, held by the natural language processing (hereinafter "NLP") major research direction of the Beijing Zhiyuan Artificial Intelligence Research Institute (hereinafter "Zhiyuan Research Institute"), the Zhiyuan Index, a new benchmark for Chinese language understanding and generation, was released.

In recent years, evaluation benchmarks such as the English-language benchmark GLUE have become an important yardstick for measuring the progress of large models in language intelligence, attracting wide attention from academia and industry. However, GLUE measures only language understanding and ignores other important language skills such as language generation, multilingualism, and mathematical reasoning. It also reports only per-dataset scores and an overall score, and the overall score is easily dominated by a small number of datasets.

Moving from flat to hierarchical organization and from single-dimensional to multi-dimensional evaluation, the Zhiyuan Index aims to design a new "exam paper" for large models that assesses their abilities comprehensively.

In terms of benchmark framework, the Zhiyuan Index departs from the traditional flat organization of commonly used datasets. Drawing on human language examination syllabi and the current state of NLP research, it selects and organizes datasets in a hierarchical framework of language capability, task, and dataset. It covers 7 important language capabilities, 17 mainstream NLP tasks, and 19 representative datasets, aiming for comprehensive and balanced coverage that avoids over-emphasizing any one area.

In terms of scoring strategy, the Zhiyuan Index builds on the hierarchical benchmark framework to report model performance at several levels: dataset, task, and language capability. This greatly strengthens its ability to show how models differ along different dimensions of language intelligence.
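The difference between a flat overall score and scores reported at each level of the hierarchy can be illustrated with a minimal Python sketch. It assumes a plain arithmetic mean at every level; the article does not describe the index's actual aggregation or normalization, so this is not the Zhiyuan Index's scoring method. The capability, task, and dataset names and all numbers are hypothetical.

```python
from statistics import mean

# Hypothetical scores, organized as capability -> task -> dataset -> score.
# Names and numbers are invented for illustration only.
scores = {
    "understanding": {
        "classification": {"dataset_a": 90.0, "dataset_b": 80.0},
        "reading_comprehension": {"dataset_c": 70.0},
    },
    "generation": {
        "summarization": {"dataset_d": 40.0},
    },
}


def task_scores(hierarchy):
    """Average the dataset scores within each task."""
    return {
        capability: {task: mean(datasets.values()) for task, datasets in tasks.items()}
        for capability, tasks in hierarchy.items()
    }


def capability_scores(hierarchy):
    """Average the task scores within each language capability."""
    return {
        capability: mean(tasks.values())
        for capability, tasks in task_scores(hierarchy).items()
    }


def overall_score(hierarchy):
    """Average the capability scores, so each capability counts equally."""
    return mean(capability_scores(hierarchy).values())


def flat_score(hierarchy):
    """Flat alternative: average every dataset score, ignoring the hierarchy."""
    return mean(
        score
        for tasks in hierarchy.values()
        for datasets in tasks.values()
        for score in datasets.values()
    )


if __name__ == "__main__":
    print("task level:      ", task_scores(scores))
    print("capability level:", capability_scores(scores))
    print("overall:         ", overall_score(scores))
    print("flat average:    ", flat_score(scores))
```

In this toy example, three of the four datasets belong to one capability, so the flat average is pulled toward that capability's results, while the hierarchical average weights the two capabilities equally and the per-level scores make the weaker capability visible.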

To promote joint development and sharing of the Zhiyuan Index and make it easier to use, the event also launched an online evaluation platform and an open leaderboard. The leaderboard supports several display modes, including a comprehensive list, a simplified list, and single-dataset lists, so that users can quickly grasp the characteristics and latest progress of models and datasets from multiple angles.

The release is only a starting point, and further development will require an ecosystem built jointly. Liu Zhiyuan, associate professor at Tsinghua University, Zhiyuan young scientist, and a core member of the team building the Zhiyuan Index, said: "Building on the single-dataset leaderboards, the Zhiyuan Index will regularly absorb the latest high-quality datasets. At the same time, we will rely on the strength of Zhiyuan Research Institute and the Zhiyuan community to establish a user-oriented feedback and discussion mechanism for datasets and evaluation results, build a community for high-quality Chinese datasets, and promote the development of Chinese natural language processing."

With the support of Zhiyuan Research Institute, the team of scholars in the natural language processing major research direction is actively exploring a new paradigm for the field. With natural language at the core, the team aims to significantly improve the semantic understanding and generation of Chinese through the dual drive of big data and rich knowledge, and through interaction with cross-modal information.

In terms of real-world applications, the "multimodal Beijing Tourism Knowledge Graph" built by the team of Professor Li Juanzi of Tsinghua University can provide data support for functions such as route planning and scenic spot information queries, and can plan travel itineraries for tourists.

Sun Maosong, professor at Tsinghua University and chief scientist for natural language processing at Zhiyuan Research Institute, believes that NLP technologies are already applied in speech recognition, machine translation, simultaneous interpretation, and other areas, and that the next step will be deeper applications.

The Zhiyuan Index is reported to be supported by the Beijing Zhiyuan Artificial Intelligence Research Institute. The participating institutions are Tsinghua University, Peking University, Renmin University, the Chinese Academy of Sciences, Beijing Language and Culture University, Fudan University, Harbin Institute of Technology, Shanghai Jiao Tong University, Soochow University, Dalian University of Technology, Shanxi University, and Jingdong Research Institute.

Image source: Zhiyuan Research Institute
