
Huawei becomes a "shovel seller" in the AI gold rush; "genius youth" reveals the "secret weapon"

Author: Southern Metropolis Daily

On July 14, Huawei released two new products for the era of large AI models, providing storage solutions for foundation model training, industry model training, and segment-scenario model training and inference, to better unleash new AI momentum.


Zhou Yuefeng, President of Huawei's Data Storage Product Line, released a new AI storage product

Just seven days earlier, at Huawei Developer Conference 2023 (Cloud), Huawei had released Pangu Models 3.0, a series of industry-oriented large models. The series offers foundation models with 10 billion, 38 billion, 71 billion, and 100 billion parameters to meet the diverse needs of different scenarios, latencies, and response speeds, covering the knowledge Q&A, copywriting, and code-generation capabilities of NLP large models, as well as the image-generation and image-understanding capabilities of multimodal large models.

Since the debut of ChatGPT in November 2022, emerging technologies represented by pre-trained large models have accelerated the development of a new generation of artificial intelligence, setting off a global arms race in large AI models. In this race, besides launching its own series of industry-oriented large models, Huawei has also positioned itself as a "shovel seller" in the AI gold rush.

In the AI gold rush, Huawei becomes a "shovel seller"

The era of large AI models rests on three elements: computing power, algorithms, and data. According to Zhou Yuefeng, President of Huawei's Data Storage Product Line, data, and data quality in particular, determine the height of AI intelligence. Developing the artificial intelligence industry requires attaching importance to recording data and information digitally.

In Zhou Yuefeng's observation, foreign large models such as ChatGPT train more efficiently and easily, chiefly because far more English-language data than Chinese-language data was recorded during the digitization stage. Mainland China has built a large number of data centers, with relatively abundant computing power but relatively little storage capacity, and a great deal of high-value information has gone unrecorded, which will constrain the high-quality development of the mainland's AI industry over the long run.

For enterprises, developing and deploying large-model applications also brings four major challenges rooted in data storage. Zhou Yuefeng pointed out that, first, data preparation takes a long time: sources are scattered, aggregation is slow, and preprocessing 100 TB of data takes about ten days. Second, multimodal large models train on massive volumes of text and images, yet loading today's huge numbers of small files runs at under 100 MB/s, so training sets load inefficiently. Third, large-model parameters are tuned frequently and training platforms are unstable: training is interrupted on average about every two days, requiring a checkpoint mechanism to resume, and failure recovery takes more than a day. Finally, large models have high deployment thresholds, complicated system construction, and difficult resource scheduling, with GPU utilization typically below 40%.
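The checkpoint-and-resume mechanism mentioned above can be illustrated with a minimal sketch. The file name and the dummy training loop here are our own illustrative assumptions; real large-model training would checkpoint model shards and optimizer state through a framework rather than `pickle`:

```python
import os
import pickle

CKPT_PATH = "model.ckpt"  # hypothetical checkpoint file name


def save_checkpoint(step, weights, path=CKPT_PATH):
    """Persist training state so an interrupted run can resume from here."""
    with open(path, "wb") as f:
        pickle.dump({"step": step, "weights": weights}, f)


def load_checkpoint(path=CKPT_PATH):
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "weights": [0.0]}


def train(total_steps=10, ckpt_every=3):
    """Dummy training loop: resumes from the last checkpoint if one exists."""
    state = load_checkpoint()
    for step in range(state["step"], total_steps):
        # Stand-in for a real gradient update.
        state["weights"] = [w + 0.1 for w in state["weights"]]
        if (step + 1) % ckpt_every == 0:
            save_checkpoint(step + 1, state["weights"])
    return state
```

If the process dies mid-run, the next invocation of `train()` restarts from the most recent checkpoint instead of step 0, which is why recovery time scales with checkpoint frequency.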

At the July 14 launch, responding to AI trends in the large-model era, Huawei unveiled the OceanStor A310 deep learning data lake storage and the FusionCube A3000 training/inference hyperconverged appliance, targeting large-model applications across different industries and scenarios.

Zhou Yuefeng introduced the OceanStor A310 as built for data in the intelligent era, supporting the full storage workflow from data aggregation and preprocessing to model training and inference. The OceanStor A310 supports 96 flash drives with a bandwidth of 400 GB/s, enough to transfer more than 200 high-definition movies per second. It delivers 12 million IOPS and scales out to as many as 4,096 nodes. Near-data preprocessing via near-memory computing reduces data movement and improves preprocessing efficiency by 30%.
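The movie comparison is back-of-envelope arithmetic. Assuming roughly 2 GB per high-definition movie (our assumption, not a figure from Huawei):

```python
# Sanity-check the "200 movies per second" claim from the quoted specs.
BANDWIDTH_GB_PER_S = 400   # OceanStor A310 claimed bandwidth, GB/s
HD_MOVIE_SIZE_GB = 2       # assumed size of one HD movie, GB

movies_per_second = BANDWIDTH_GB_PER_S / HD_MOVIE_SIZE_GB
print(movies_per_second)  # 200.0
```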

The FusionCube A3000 training/inference hyperconverged appliance, aimed at industry-scale large-model training and inference scenarios, integrates OceanStor A300 high-performance storage nodes, training/inference nodes, switching equipment, AI platform software, and management and O&M software, giving large-model enterprises a one-stop deployment and delivery experience.

Zhou Yuefeng noted that the appliance's training/inference nodes and storage nodes can scale out independently to match models of different sizes. Meanwhile, FusionCube A3000 runs multiple model training and inference tasks in high-performance containers that share GPUs, raising resource utilization from 40% to more than 70%.

Zhou Yuefeng, President of Huawei's Data Storage Product Line, said: "In the era of large models, data determines the height of AI intelligence. As the carrier of data, data storage has become key infrastructure for AI large models. Huawei will continue to innovate, provide diversified solutions and products for the era of large AI models, and work with partners to promote AI empowerment across industries."

Huawei "genius youth" reveals the "secret weapon"

Training and applying large models involves massive amounts of data, and large-model vendors care especially about how to keep that data flowing securely. At the same launch event, Zhang Ji, a member of Huawei's "Genius Youth" program and chief scientist of data storage at Huawei's Zurich Research Institute, offered an in-depth interpretation. He said that compared with the deep learning boom of previous years, the defining feature of AI large models is the sheer growth in data volume. To a large extent, high-quality data determines the upper limit of what AI large models can achieve, while algorithms and computing power only approach that limit.

Zhang Ji explained that data storage is the first line of defense for data security, and securely aggregating data from different locations and different nodes into one place poses great challenges for enterprises. Huawei is therefore researching a "data cabin" technology that moves data together with its associated credentials, privacy, and permission information during transfer. When the data arrives at the aggregation point, it can be executed and protected securely inside the cabin, achieving end-to-end data security.

At present, Huawei is piloting the "data cabin" with customers such as China CITIC Bank and Cloud Guizhou, and it hopes the technology will enable the secure flow of high-value data across industries. Zhou Yuefeng believes: "Only when data, including that of AI large models, can flow securely can we achieve long-term sustainable development."

Beyond secure data flow, large-model vendors are also broadly concerned with the cost of deploying AI large models, which hinges on how quickly data can be connected to the models and how efficiently storage, training, and inference can be completed.

Zhang Ji said that enterprises wanting rapid access to AI large models have two paths. One is secondary training on top of a foundation model, which consumes substantial GPU resources and drives costs very high; above all, it demands dedicated staff to maintain vertical domain knowledge, which is time-consuming and labor-intensive. The second, building on the idea that "everything is a vector," is Huawei's research into vector storage for AI large models. Vector storage works like plug-in storage for AI: it vectorizes, stores, and retrieves an enterprise's latest vertical-domain data, greatly lowering the barrier to adopting and using AI large models.
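The retrieval pattern behind vector storage can be sketched in a few lines. This is an illustration of the general technique, not Huawei's implementation: a minimal in-memory store where a toy bag-of-words count stands in for a real embedding model, and documents are ranked by cosine similarity to the query:

```python
import math


def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse (dict) vectors."""
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class VectorStore:
    """Minimal in-memory vector store: index documents, retrieve by similarity."""

    def __init__(self):
        self.docs = []  # list of (text, vector) pairs

    def add(self, text):
        self.docs.append((text, embed(text)))

    def search(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        qv = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

In a production system the embedding would come from a neural model and the exhaustive `sorted` scan would be replaced by an approximate nearest-neighbor index, which is exactly the billion-scale fuzzy-search problem discussed next.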

For example, to enable fast fuzzy search and clustering across billions of vectors in vector storage, Huawei's Zurich Research Institute and the Huawei HiSilicon hardware team have jointly innovated to accelerate vector retrieval through near-memory computing and hardware-software co-design.

Zhang Ji added that Huawei is also drawing on local resources to pursue industry-academia-research cooperation with top European universities. The hope is that, amid the shift toward data-centric architectures, algorithm-architecture co-design, in-memory computing, and new data storage formats will truly unlock the value of data, offload computation from GPU and CPU servers, and save the energy wasted on unnecessary data movement, ultimately driving the rapid development of a new data paradigm.

Zhou Yuefeng said that the "data cabin" and vector storage technology are the "secret weapons" Huawei is developing for the era of AI large models.

Written by: Cheng Yang, Nanduwan Finance Agency reporter