Tsinghua University Professor Sun Maosong and colleagues establish "Zhiyuan Index," a benchmark for evaluating machine Chinese language proficiency

2021-12-31 18:54:23

On December 30 in Beijing, a team led by Sun Maosong, a professor at Tsinghua University, released CUGE, a benchmark for evaluating machine Chinese language proficiency. Photo by Song Jia

BEIJING, Dec. 31 (Xinhua) -- At the Frontier Technology Open Day held by the Beijing Zhiyuan Artificial Intelligence Research Institute for its major research direction in natural language processing (NLP), a team led by Professor Sun Maosong of Tsinghua University released the "Zhiyuan Index" (CUGE), a benchmark for evaluating machine Chinese language proficiency.

Team representatives told the media on the 31st that in the era of large artificial intelligence models, evaluation benchmarks have become a bellwether for large model development. Moving from a flat design to a systematic one, and from a single dimension to multiple dimensions, the Zhiyuan Index aims to design a new "examination paper" for large models that assesses their overall ability comprehensively.

Specifically, the "Zhiyuan Index" draws on human language test syllabuses and the current state of NLP research to select and organize data sets in a hierarchical "language ability-task-dataset" framework, covering 7 important language abilities, 17 mainstream NLP tasks and 19 representative data sets. In terms of scoring strategy, the "Zhiyuan Index" provides model performance scores at different levels.

To promote joint development and sharing of the "Zhiyuan Index" and make it easier to use, the team also released an online evaluation platform and a public leaderboard, and said that it will "regularly absorb the latest excellent data sets" and "establish a mechanism for users to give feedback on and discuss data sets and evaluation results, and build a community for high-quality Chinese data sets".

Li Yuming, a professor at Beijing Language and Culture University and former deputy director of the State Language and Writing Work Committee, believes that these measures will advance Chinese information processing and help the Chinese language play a greater role in human society.

Dai Qionghai, academician of the Chinese Academy of Engineering and chairman of the Chinese Association for Artificial Intelligence, also said that the results jointly achieved by Professor Sun Maosong and the Zhiyuan NLP scholars are of great significance to the development of Chinese information processing and of Chinese-language intelligence more broadly.

In addition to the "Zhiyuan Index", the open day also featured interim reports on research results such as "Problems and Countermeasures in Natural Language Processing Evaluation", "Towards a Universal Continuous Knowledge Base", and "Text Paraphrase Generation", covering more than ten key NLP research topics such as pre-trained models, knowledge computing, human-computer dialogue, and text generation.

According to reports, with the support of the Zhiyuan Research Institute, the team of scholars in the major research direction of natural language processing has been actively exploring a new landscape for the field. In terms of practical applications, the "multi-modal Beijing Tourism Knowledge Graph" built by the team of Professor Li Juanzi of Tsinghua University can provide data support for functions such as route planning and scenic spot information queries, and can also plan travel itineraries for tourists.

In terms of large pre-trained models, to overcome the high computational cost, demanding hardware requirements, and difficult application adaptation of pre-trained language models (PLMs), Liu Zhiyuan, an associate professor at Tsinghua University, proposed an efficient full-process computing framework for PLMs and, based on this framework, built CPM-2, a very large-scale pre-trained language model centered on Chinese.

As a representative and innovative research institute, the Zhiyuan Research Institute strives to build a computing and data platform for future research by building a collaborative community. In April 2019, the institute launched the "Zhiyuan Scholars Program", which supports scholars in exploring freely in research directions such as the mathematical foundations of artificial intelligence, the cognitive and neural foundations of artificial intelligence, machine learning, and natural language processing. The program also encourages young talent to take on major responsibilities and play leading roles. (End)
