
Large model training drives computing power demand; development of domestic GPUs accelerates

Author: China Business News

Our reporter Tan Lun reported from Beijing

More than four months after the debut of ChatGPT, GPUs, regarded as the computing power infrastructure for training large AI models, are seeing a surge in demand in China.

According to public reports, Chinese Internet giants have been buying GPUs in large quantities from global GPU leader NVIDIA since the beginning of this year, with ByteDance's purchases alone exceeding 1 billion US dollars, close to NVIDIA's total revenue from commercial GPU sales in China in 2022.

Behind the surge in orders, demand for large model training is considered the main driver. According to incomplete statistics from Minsheng Securities, at least 30 large-model products had been released in China by the end of April this year, involving a range of technology companies: traditional Internet giants such as BAT (Baidu, Alibaba, Tencent), domestic AI unicorns such as SenseTime and iFLYTEK, and startups founded by former corporate executives such as Sogou founder Wang Xiaochuan.

A domestic AI company told the China Business News reporter that, taking the current mainstream GPT-3.5 model behind ChatGPT as an example, OpenAI has deployed nearly 30,000 GPUs as a computing cluster, and iterating to new versions may require the number of GPUs to grow exponentially. "Domestic foundation models will have lower computing power requirements, but because so many companies are competing, the total demand for GPUs will be very large," the person said.

Against this backdrop, the GPU sector is rapidly becoming a hot industry. IC Insights data show that the global GPU chip market grew by more than 20% between 2015 and 2021, exceeding 22 billion US dollars in 2021. Industry think tank Jiazi Lightyear predicts that China's AI chip market will reach 55.7 billion yuan in 2023, with GPUs accounting for about 90% of it.

NVIDIA has no rivals for now

GPU stands for Graphics Processing Unit. It first appeared as the core chip of the computer graphics card, dedicated to complex image-data computation. As demand for this functionality evolved, GPUs gradually became indispensable in computer graphics rendering and game graphics processing.

"Compared to CPUs, GPUs have more independent computing units, that is, 'core count'." Luo Guozhao, director of the China laboratory of CHIP Global Test Center, told reporters that therefore, compared with the CPU that undertakes a large number of system management, control and program execution functions, the role of the GPU is more simple, that is, computing, so it is particularly suitable as a basic core device to provide AI computing power.

Public information shows that, like the broader semiconductor industry chain, the GPU sector covers complete upstream and downstream links, from IP and design to the software ecosystem and applications. Among these, the design link is considered the one that determines GPU capability. The main global GPU design companies are the two giants NVIDIA and AMD, and NVIDIA in particular holds a market share of more than 90% in AI training chips.

The reporter learned that an AI training GPU is a GPU designed specifically for the task of training large AI models. Training typically involves large-scale matrix computation in deep neural networks, and these compute-intensive tasks can be accelerated by the GPU's parallel computing power. AI training GPUs typically offer higher computing power and larger memory capacity to support training on large-scale models and datasets.
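As a minimal illustration of this parallelism (a hypothetical sketch, not drawn from the article), the CUDA program below assigns each element of a matrix product to its own GPU thread, so a single matrix operation of the kind used in neural-network training is spread across thousands of cores at once:

```cuda
// Hypothetical sketch: a naive CUDA matrix multiply C = A * B,
// with one GPU thread computing one output element.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void matmul(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // this thread's output row
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's output column
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}

int main() {
    const int N = 1024;                        // hypothetical matrix size
    size_t bytes = N * N * sizeof(float);
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);              // unified memory for brevity
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < N * N; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    dim3 block(16, 16);                        // 256 threads per block
    dim3 grid((N + 15) / 16, (N + 15) / 16);   // enough blocks to cover the matrix
    matmul<<<grid, block>>>(A, B, C, N);       // over one million threads in flight
    cudaDeviceSynchronize();

    printf("C[0] = %f (expected %f)\n", C[0], 2.0f * N);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

A CPU with a few dozen cores must loop over these output elements largely in sequence; the GPU launches them all as independent threads, which is the property the paragraph above refers to.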

"Several of NVIDIA's dedicated chips are currently the best performance computing power platforms for large model training, and there are no competitors." Semiconductor industry analyst Ji Wei told reporters that this has also caused the world's large model manufacturers to buy NVIDIA special GPUs for data training, not only Chinese manufacturers.

Ji Wei told reporters that, in terms of industrial health, a market dominated by a single large company is not a good ecosystem, but large model training is a special case because no strong competitor has yet emerged. The reporter noted that demand for NVIDIA's dedicated GPUs also helped push its market value above 1 trillion US dollars a few days ago, making it the first chip company to reach that milestone.

China's industrial chain catches up

Although Chinese manufacturers do not yet have the capability to enter the dedicated GPU market for AI large model training, the momentum created by the huge demand for computing power has put China's semiconductor industry on a path of catching up.

General-purpose GPUs were the earliest area of deployment. Driven by their cloud computing businesses, the four major domestic Internet giants have launched general-purpose GPU and AI chips in successive years: in 2019, Alibaba took the lead by releasing the AI chip Hanguang 800; in 2020, Baidu produced its first-generation AI chip Kunlun; in 2021, Tencent released the video processing chip "Canghai" and the AI chip "Zixiao"; and in July 2022, ByteDance officially confirmed its own chip development plans, covering video platforms, information and entertainment applications, and other fields.

Ji Wei told reporters that, unlike the high-performance GPUs used for AI large model training, general-purpose GPUs are used for a wider range of tasks; they also offer highly parallel computing and large numbers of compute cores, can process large amounts of data simultaneously, and provide high computing performance and throughput. At present, the general-purpose GPUs launched by domestic manufacturers are used mainly in the inference stage of large models.

"Although the current general-purpose GPU cannot replace the AI training GPU in terms of performance, the gap between the two is also narrowing, and the general-purpose GPU can also be used for AI training tasks and achieve relatively high performance through appropriate software and optimization." Ji Wei said that Baidu CEO Robin Li revealed at the Yabuli China Entrepreneurs Forum in March this year that Kunlun chips are now very suitable for reasoning of large models and will be suitable for training in the future.

At the same time, a number of rising companies have emerged on the domestic dedicated GPU track, such as Fengyuan Technology, Tianshu Zhixin, and Moore Threads. Taking Moore Threads as an example, its mass-produced full-featured GPU chips "Sudi" and "Chunxiao" can simultaneously handle graphics rendering, video encoding and decoding, AI compute acceleration, and physical simulation and scientific computing.

"In the past two years, the overall GPU industry in mainland China has made great progress, and there is still a gap in horizontal comparison, but compared with ourselves, there have been many innovative emerging manufacturers, which we should see." Luo Guozhao said.

The challenges of localization need to be solved

With manufacturers entering the field, China's GPU industry has embarked on a long journey, and the challenges emerging along the way have become problems the entire industry must face as it develops.

In the industry's view, the localization rate remains the primary problem the mainland GPU industry needs to solve. According to official data, China's chip self-sufficiency rate was only about 30% in 2019, and the localization rate of programmable logic devices, including GPUs, was only 1%.

"The promotion of grand strategies such as East Data and West Computing means that domestic cloud servers will become digital infrastructure like networks in the future." As the core device of current cloud servers, the importance of GPUs is self-evident. Ji Wei said that in 2020, the mainland issued Several Policies to Promote the High-quality Development of the Integrated Circuit Industry and the Software Industry in the New Era, which once again emphasized the importance of providing China's chip self-sufficiency rate.

Against this backdrop, starting from the software architecture is also considered key to the rise of domestic GPUs. A CITIC Securities research report pointed out that, taking NVIDIA's proprietary software stack CUDA as an example, its closed-source nature and rapid updates make it difficult for latecomers to achieve full compatibility through instruction translation or similar approaches; even partial compatibility brings a large performance loss, leaving them continually behind NVIDIA in cost-effectiveness.
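To illustrate why this lock-in arises (a hypothetical sketch, not part of the CITIC Securities report): training code is commonly written directly against NVIDIA-only interfaces such as the CUDA runtime and the cuBLAS library, so a non-NVIDIA GPU must either reimplement these libraries or translate every call, which is where compatibility gaps and performance losses appear.

```cuda
// Hypothetical sketch: a single dense-layer forward pass written against
// CUDA-only APIs (cudaMalloc, cuBLAS). Build with: nvcc sketch.cu -lcublas
#include <cublas_v2.h>
#include <cuda_runtime.h>

void dense_layer_forward(cublasHandle_t handle,
                         const float* W, const float* X, float* Y,
                         int m, int n, int k) {
    const float alpha = 1.0f, beta = 0.0f;
    // Y = W * X, executed by NVIDIA's closed-source cuBLAS kernels.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k, &alpha, W, m, X, k, &beta, Y, m);
}

int main() {
    cublasHandle_t handle;
    cublasCreate(&handle);                           // CUDA-specific runtime setup
    int m = 512, n = 256, k = 512;                   // hypothetical layer dimensions
    float *W, *X, *Y;
    cudaMalloc((void**)&W, m * k * sizeof(float));   // CUDA-specific memory management
    cudaMalloc((void**)&X, k * n * sizeof(float));   // (buffers left uninitialized;
    cudaMalloc((void**)&Y, m * n * sizeof(float));   //  the sketch only shows the API dependence)
    dense_layer_forward(handle, W, X, Y, m, n, k);
    cudaDeviceSynchronize();
    cudaFree(W); cudaFree(X); cudaFree(Y);
    cublasDestroy(handle);
    return 0;
}
```

Every call in this sketch targets NVIDIA's stack; a latecomer offering a drop-in alternative must match both the interfaces and the performance of these heavily optimized, closed-source libraries, which is the compatibility problem the report describes.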

At an industry conference held a few days ago, Zou Yi, vice president of Tianshu Zhixin, also made it clear that to seize the industrial opportunity created by the rise of AI large models, local GPU companies need to target the computing power requirements of large models from the bottom up, and architecture is one of the starting points.

In this regard, the reporter noted that the domestic company Zhongtianxing launched the domestic GPU architecture "Sirius" a few days ago. According to Deng Yangdong, co-founder and chief architect of Zhongtianxing, the "Sirius" architecture is entirely independently developed: from the derivation of mathematical formulas through architecture design, algorithm modeling, principle verification, hardware implementation, and driver development, every link followed forward design. The core IP is fully independent and controllable, the company holds complete intellectual property rights for the graphics GPU, and it has applied for hundreds of patents and copyrights. The chip design verification for the "Sirius" architecture was reportedly completed in 2019, the architecture was officially launched in 2021, and chips based on it are expected to enter mass production this year.

A CITIC Securities research report pointed out that, against a backdrop of external uncertainties and accelerating domestic independent innovation, domestic GPU manufacturers are expected to rise faster. With strong policy support, the impact of international technology trade policies, improving product performance from domestic manufacturers, and a gradually maturing ecosystem, domestic GPU leaders are facing a key window of development opportunity.

(Editor: Zhang Jingchao, Proofreader: Yan Jingning)