A data roadmap for large model training is here! Shenzhen Data Exchange releases its first batch of multimodal industry datasets for model training

Author: Southern Metropolis Daily

The shift of AI models from unimodal to multimodal has set off a new round of competition in artificial intelligence. On April 9, at the first exchange seminar on the integrated development of "data, computing and network" and the Guangdong-Hong Kong-Macao Greater Bay Area (Nanshan and Qianhai) Computing Service Alliance, the Shenzhen Data Exchange released a roadmap for large model training data and, together with the Open Computing Alliance, released 500 multimodal datasets for vertical industries.

Wang Wuyue, head of the artificial intelligence industry of Shenzhen Data Exchange, delivered a keynote speech.

Shenzhen Data Exchange releases roadmap for large model training data

The seminar brought together government leaders, business leaders and industry experts for in-depth discussion of cooperation and innovation across the data, computing power and network industry chains in the Guangdong-Hong Kong-Macao Greater Bay Area. It aims to build a high-level dialogue platform for resource sharing, complementary strengths, technology transfer and application innovation, to raise the Bay Area's digital economy to a new level, and to support the construction of an artificial intelligence training ground in the Greater Bay Area.

At the meeting, Wang Wuyue, head of the artificial intelligence industry of Shenzhen Data Exchange, delivered a keynote speech and presented the exchange's industry practice in supporting the construction of an artificial intelligence training ground in the Guangdong-Hong Kong-Macao Greater Bay Area. Wang Wuyue said that in the era of the digital economy, "data element ×" and "artificial intelligence +" have become the two wheels driving new quality productivity, jointly leading economic and social development. As an efficient "multiplication" factor, data can be combined with different industries to significantly raise their production efficiency or capacity for innovation, while artificial intelligence brings incremental improvement and optimization to traditional industries through "addition". The two are intertwined and form the two wings of the digital economy's development.

Wang Wuyue introduced the development concept of data-centric AI and released a roadmap for large model training data. The Shenzhen Data Exchange will provide targeted data sources for each stage of large model application (training, inference and tuning), so that domestic large model manufacturers can "find the data". At present, the Open Computing Alliance & Open Islands Large Model SIG has achieved three results in the two-wheel-drive development of "data element ×" and "artificial intelligence +": releasing the large model training data map of the Guangdong-Hong Kong-Macao Greater Bay Area, providing open source tools for end-to-end cross-modal data mining, and building agents for discovering the value of data resources.

Roadmap for large model training data.

The first batch of high-quality training datasets for large AI models was announced

At the same time, the Shenzhen Data Exchange also issued the first "Data Application Scenario and Potential Value Analysis Report", written through human-machine collaboration by Dataexchange Data Brokerage (Shenzhen) Co., Ltd. using the above-mentioned agents. The company is the first enterprise in the country with data brokerage as its main business and in its registered name, holds the first data broker registration certificate issued by a data exchange in the country, and is among the first batch of pilot data broker companies in Qianhai, Shenzhen.

Drawing on the emergent abilities of large language models and on techniques such as chain-of-thought (CoT) prompting, retrieval-augmented generation (RAG) and few-shot prompting, the "data gold miner" agent quickly and thoroughly mined more than 4,000 ungoverned, complex fields held by a data vendor, identified and listed 32 data application scenarios and value realization channels, and improved efficiency by 90%.

At the meeting, the Shenzhen Data Exchange and the Open Computing Alliance jointly released the first batch of 500 high-quality training datasets for large AI models. The datasets were provided by 37 data vendors and cover 12 "data element ×" fields, 3 overseas data vendors, and 7 data modalities (text, image, audio, video, multimodal, 3D and GIS). For the first time, datasets from data vendors such as China Meteorological Administration, China National Knowledge Network, GTCOM, Wanbang Tonghe, Weimeng Data (Sina Weibo), Qianhai Data, Haitian AAC, Tors, Datatang, Wisdom Buds, NetZhi Tianyuan, Baichuan Data and Shenxin Technology have been brought together for use as large model training material. Most of them are expected to be making their national debut.

Map of large model training data in the Guangdong-Hong Kong-Macao Greater Bay Area.

Trusted circulation channels for high-quality datasets disclosed

In addition, the Shenzhen Data Exchange disclosed its trusted circulation channels for high-quality datasets. These comprise five steps: drawing a map of large model training data resources to give data transactions clear navigation; having data vendors formally join the Shenzhen Data Exchange, which provides solid platform support; conducting credible quality evaluation of data vendors' data to ensure its accuracy and reliability; carrying out compliance review and product listing to safeguard the secure circulation of data; and circulating and trading data elements to realize the value of the data.

According to reports, these steps together form a complete and credible path for high-quality datasets from aggregation to transaction. In the future, the Shenzhen Data Exchange will give full play to the demonstration effect of the "dual zone" drive, "dual zone" superposition and "double reform", adhere to innovation and leadership, provide fuel for domestic large model manufacturers, and work with them to build vertical-industry large model data applications in the key action areas of "data element ×".

Producer: Nandu Big Data Research Institute

Written by: Nandu reporter Yuan Jiongxian (photos provided by Shenzhen Data Exchange)
