A spark of domestic GPUs has just been kindled: who can break through the computing power blockade?

Author | leaf

Source | Parity

Introduction: One thing is certain: as science and technology advance, China's GPU computing power industry can and will shake off the "chokehold" problem, so that Chinese AI companies can rely on Chinese GPUs to win this computing power war.

NVIDIA (NASDAQ: NVDA) recently hosted GTC, its annual technology conference for developers, at its headquarters in California. There, NVIDIA founder Jensen Huang unveiled the company's latest AI-related software and hardware and repeated the phrase "AI's iPhone moment" three times in his keynote.

To date, NVIDIA's GPUs provide the underlying computing power for the vast majority of AI systems around the world, and OpenAI, the company behind ChatGPT, trained the GPT-3 language model on some 10,000 NVIDIA GPUs, stunning the world.

So why did Jensen Huang, whose company supplies most of the world's high-end AI computing power, call the changes brought by ChatGPT "the iPhone moment of artificial intelligence" at this conference? And behind that moment, which basic hardware and which companies are carrying the AI industry forward?

Taking the NVIDIA conference as its starting point, this article surveys the state of China's domestic GPU companies and then explains why computing power is so scarce, to show readers where the GPU industry stands today and where it is heading.

Chinese manufacturers that are beginning to emerge

As the undisputed leader in computing hardware, NVIDIA draws the attention of cutting-edge technologists to its GTC conference every year. In 2023, the breakout year of ChatGPT, the event received markedly more exposure, given NVIDIA's role as a major supplier of AI hardware.

At this conference, NVIDIA laid out its ChatGPT-related strategy across training, inference, cloud services and other dimensions.

In AI training, NVIDIA keeps pushing computing power upward, enabling breakthroughs in large models.

Building on the parallel nature of GPU computing and its early bets on AI, NVIDIA holds an absolute advantage in AI training; it treats the AI track as a priority and keeps raising the computing power of its AI hardware.

Moreover, as the computing demands of GPT-class large models grow, global technology giants have deployed or will soon deploy NVIDIA's H100: Meta has rolled out the H100-based Grand Teton AI supercomputer for internal teams, and OpenAI will use H100s on its Azure supercomputer to power its ongoing AI research.

Source: NVIDIA official website

At GTC 2023, NVIDIA showed how the H100, built on the Hopper architecture with its built-in Transformer Engine, is optimized for developing, training and deploying generative AI, large language models (LLMs) and recommendation systems: using FP8 precision, it delivers faster LLM training and inference than the previous-generation A100 and helps simplify AI development.
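
To make the FP8 idea concrete, here is a minimal sketch, assuming a PyTorch environment with NVIDIA's open-source Transformer Engine library installed on Hopper-class hardware; the layer sizes and scaling recipe below are illustrative choices, not NVIDIA's reference configuration.

```python
# Minimal sketch: running a Transformer-style layer in FP8 via Transformer Engine.
# Assumes PyTorch + transformer_engine are installed and an H100-class GPU is available.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# FP8 "delayed scaling" recipe; HYBRID uses E4M3 for forward passes, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()            # illustrative layer width
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)                                            # matmul runs on FP8 Tensor Cores
```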

Source: NVIDIA official website, CICC Research Department

In AI inference, NVIDIA is also accelerating AI video, image generation, large language model deployment, and recommendation systems.

At GTC 2023, NVIDIA launched a new GPU inference platform: built around accelerating AI video, image generation, large language model deployment and recommendation systems, it forms a product family of four configurations sharing one architecture and one software stack.

Source: NVIDIA official website, CICC Research Department

Among them, the H100 NVL has drawn wide market attention. NVIDIA said at GTC 2023 that the product pairs two GPUs over NVLink, may run up to ten times faster than the current A100, can handle the 175-billion-parameter GPT-3 model, and supports expansion in commodity PCIe servers, making it well suited to large language models.

Huang said that compared with the HGX A100, currently the only system able to process ChatGPT in real time, a standard server fitted with four pairs of H100s linked by dual NVLink is up to ten times faster and can cut the processing cost of large language models by an order of magnitude. He also called the NVIDIA DGX H100 a blueprint for customers worldwide building AI infrastructure.
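
A rough back-of-envelope calculation helps explain why a paired-GPU configuration matters for a 175-billion-parameter model. The memory figure used below (about 94 GB of HBM per H100 NVL GPU) is a commonly cited specification assumed here for illustration, not taken from the article.

```python
# Back-of-envelope: weight memory for a 175B-parameter model at different precisions,
# versus the aggregate HBM of a dual-GPU H100 NVL pair (assumed ~94 GB per GPU).
params = 175e9
bytes_per_param = {"FP32": 4, "FP16/BF16": 2, "FP8": 1}
hbm_per_gpu_gb = 94                       # assumed H100 NVL memory per GPU
pair_hbm_gb = 2 * hbm_per_gpu_gb

for fmt, nbytes in bytes_per_param.items():
    weights_gb = params * nbytes / 1e9
    fits = "fits" if weights_gb <= pair_hbm_gb else "does not fit"
    print(f"{fmt:10s}: ~{weights_gb:,.0f} GB of weights -> {fits} in {pair_hbm_gb} GB")

# Weights alone at FP16 (~350 GB) exceed even the paired memory; lower precision plus
# NVLink-pooled HBM is what makes serving GPT-3-scale models on one box plausible.
```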

Huang added that ChatGPT is only the first breakout application of artificial intelligence, merely a starting point. As the AI wave arrives, companies around the world will pile into hardware for large AI models; in the short term, however, NVIDIA holds a clear technical advantage and will keep pressing it.

There is no doubt that NVIDIA's latest hardware is the biggest boon for AI companies trying to solve their computing power problem, but for Chinese companies it is not good news.

In August 2022, US regulators banned the sale of NVIDIA's A100 and H100 GPUs to Chinese companies on national security grounds, aiming to slow the training of domestic AI models and delay the development of Chinese AI through a "chokehold".

In the long run, developing and deploying large models is an inevitable trend, and behind the training and deployment of every large model stand tens of thousands of GPUs. As applications in this area develop and spread, market demand for general-purpose GPUs will see explosive growth.

The first is Jingjiawei, whose core technical team comes from the National University of Defense Technology. The company's business grew out of graphics display-and-control module chips for the military, and it has kept investing in R&D for its traditional business of independently developed graphics display-and-control GPU chips.

In its subsequent development, supported by national special funds and its own push into chip R&D, the company's GPU business gradually branched out, penetrated the civilian market, and expanded rapidly across the "8+N" industries.

To date, the company is the first domestic enterprise to have successfully developed a domestic GPU and brought it to large-scale engineering application, and the only listed company in China with fully independent GPU R&D capability and industrialization. With 267 patents, it stands at the forefront of the graphics display-and-control field.

The company's products are also moving from the "usable" stage to the "easy to use" stage.

According to Jingjiawei's 2021 announcement, its JM9-series graphics processing chip supports OpenGL 4.0, HDMI 2.0 and other interfaces, as well as H.265 4K 60 fps video decoding.

It has a core frequency of at least 1.5 GHz, 8 GB of video memory, and floating-point performance of about 1.5 TFLOPS, roughly on par with the NVIDIA GeForce GTX 1050.
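
As a rough sanity check on the "~1.5 TFLOPS, similar to a GTX 1050" comparison, peak FP32 throughput is usually estimated as 2 x shader cores x clock (a fused multiply-add counts as two floating-point operations). The sketch below plugs in commonly listed GTX 1050 figures, assumed here for illustration; the JM9 core count is not disclosed in the announcement, so its 1.5 TFLOPS figure is taken as stated.

```python
# Rough peak-FP32 estimate: 2 FLOPs per shader core per cycle (fused multiply-add).
def peak_fp32_tflops(shader_cores: int, clock_ghz: float) -> float:
    return 2 * shader_cores * clock_ghz / 1e3

# Commonly listed GeForce GTX 1050 figures (assumed): 640 CUDA cores, ~1.455 GHz boost.
gtx1050 = peak_fp32_tflops(640, 1.455)
print(f"GTX 1050 peak FP32: ~{gtx1050:.2f} TFLOPS")   # ~1.86 TFLOPS

# Jingjiawei's announcement gives ~1.5 TFLOPS for the JM9 directly, which puts it in the
# same rough class as the GTX 1050.
```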

Source: Zhongguancun Online, Core Parameters, Nvidia official website, company announcement, Pacific Securities Research Institute

In communication with the company, staff said that the earlier JM7 series was offered in multiple versions and shipped according to customers' needs and price tolerance, which made it a great success. The JM9 series is still in customer negotiations, and the company believes it will help push global graphics card prices down.

They also said the company is benchmarking against products that overseas competitors released a few years ago; once margins fall below a certain level, overseas firms tend to abandon that segment voluntarily. The company will start at the relatively low end and, as its technology advances, gradually close the gap with Intel and AMD.

Although a huge gap remains between Jingjiawei's products and cutting-edge international GPUs, as a GPU fully developed in-house by a Chinese company, built from the ground up with independent intellectual property, it marks a big step toward domestic self-reliance and has become a "spark of hope" for China's computing power.

Next is Haiguang Information (SH:688041), a company driven by both CPUs and DCUs that benefits deeply from localization.

Founded in 2014, Haiguang Information mainly engages in the R&D, design and sale of high-end processors for servers, workstations and other computing and storage devices, with two current product lines: the Haiguang general-purpose processor (CPU) and the Haiguang coprocessor (DCU).

Among them, the DCU, a product focused on general-purpose computing that chiefly supplies AI computing power, has become a new growth engine for the company.

Haiguang Information entered the DCU field in 2018. Committed to independent R&D, it has mastered core technologies such as high-end coprocessor microarchitecture design and, on that basis, launched DCU products with strong computing power and high-speed parallel data processing, whose performance is broadly competitive with comparable international mainstream products.

Comparing the company's Shensuan No. 1 with the high-end GPUs of the leading vendors, NVIDIA's A100 and AMD's MI100, in typical application scenarios the single-chip metrics of Shensuan No. 1 basically reach the level of comparable high-end international products.

Compared with the NVIDIA A100 currently used by mainstream international AI companies, a single Haiguang DCU chip can reach roughly 70% of its performance, although the interconnect performance of the company's DCU products still leaves considerable room for improvement.
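
To put the "~70% of an A100" figure in context, a quick calculation against commonly cited A100 peak numbers (assumed here: 19.5 TFLOPS FP32 and 312 TFLOPS dense FP16 Tensor Core) gives a ballpark for what a single DCU would need to deliver; these are illustrative estimates, not published Haiguang specifications.

```python
# Illustrative only: what "~70% of an A100" would mean against commonly cited A100 peaks.
a100_peaks_tflops = {"FP32": 19.5, "FP16 Tensor (dense)": 312.0}   # assumed A100 figures
ratio = 0.70                                                        # figure from the article

for metric, a100 in a100_peaks_tflops.items():
    print(f"{metric:20s}: A100 ~{a100:6.1f} TFLOPS -> ~70% level ~{ratio * a100:6.1f} TFLOPS")
```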

Source: Company Prospectus, Ping An Securities Research Institute

Beyond hardware, Haiguang Information has also set up its software stack specifically to break the CUDA ecosystem's lock-in: the Haiguang DCU coprocessor is fully compatible with the ROCm GPU computing ecosystem, and because ROCm is highly similar to CUDA, CUDA users can migrate to the ROCm platform quickly and at low cost.

As a result, the Haiguang DCU coprocessor adapts well to mainstream international commercial computing software and AI software, and enjoys a rich software and hardware ecosystem.

In addition, Haiguang has actively participated in open-source software projects to accelerate adoption of its DCU products, and has achieved compatibility with mainstream GPGPU development platforms.
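
One practical illustration of this kind of compatibility, assuming a ROCm build of PyTorch (the ecosystem the article says the DCU is compatible with): code written against the familiar torch.cuda interface runs unchanged, because ROCm builds of PyTorch expose HIP devices through the same API. The snippet below is generic PyTorch; nothing in it is Haiguang-specific.

```python
# Device-agnostic PyTorch: the same code path serves CUDA GPUs and ROCm/HIP devices,
# since ROCm builds of PyTorch expose HIP through the torch.cuda namespace.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
backend = "HIP/ROCm" if torch.version.hip else "CUDA" if torch.version.cuda else "CPU"
print("Backend:", backend)

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(32, 1024, device=device)
y = model(x)          # runs on whichever accelerator the installed build targets
print(y.shape)
```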

In recent years, thanks to the efforts of many domestic startups, breakthroughs in GPU hardware have come one after another, but for now mainland GPU manufacturers still lag far behind international leaders such as NVIDIA.

For Chinese GPU companies, then, building a solid domestic "Plan B" first and seeking further development from there may be the right direction.

What is certain is that, as science and technology advance, China's GPU computing power industry can and will shake off the "chokehold" problem, so that Chinese AI companies can rely on Chinese GPUs to win this computing power war.

Computing power, why is it so scarce?

The previous sections covered NVIDIA flexing its computing power muscles and the domestic GPU catch-up. So how much computing power do AI companies actually need today? And why has NVIDIA, the leading "AI chip" stock, been able to soar 83% in under four months?

From the demand side, the parameter counts of AI models have grown exponentially from one generation to the next.

Take GPT-3.5 as an example: as a large language model it has an enormous number of parameters. Although OpenAI has not published figures for the GPT-3.5 model behind ChatGPT, parameter counts have multiplied with each new model.

Source: OpenAI official website, Essence Securities Research Center

Beyond the growth in parameters, GPT-4, the next generation after ChatGPT's model, also expands application scenarios through multimodality.

As a multimodal large model (accepting image and text input and generating text), GPT-4 solves difficult problems more accurately than GPT-3.5, has broader general knowledge and stronger problem-solving ability, and its text-processing capacity reaches eight times ChatGPT's upper limit.

Source: OpenAI "GPT-4 Technical Report", GF Securities Development Research Center

With the maturing of artificial intelligence naturally comes the computing power demand behind it: OpenAI estimates that, to keep producing research breakthroughs, the computing resources required double every 3 to 4 months, hence the explosive rise in demand for computing power.

While AI companies' demand has risen sharply, growth in the supply of computing power has gradually slowed.

In the semiconductor industry there has long been a saying: "At constant cost, the number of components that fit on an integrated circuit doubles every 18 to 24 months, and performance doubles with it. In other words, the computer performance a dollar buys more than doubles every 18 to 24 months. This law describes the pace at which information technology advances."

This is what we know as Moore's Law. The most intuitive evidence is all around you: roughly every two years, your computer or phone faces obsolescence; with today's smartphones in particular, a mainstream configuration has you preparing for a replacement within about two years.

However, as semiconductor processes keep shrinking, leakage and heat caused by short-channel effects and quantum tunneling have become ever more serious; Moore's Law, which is ultimately about economic efficiency, has slowed by the day and is even approaching its end.

In other words, even if demand stayed flat, computing power infrastructure would already need to keep growing in quantity; and demand for computing power is in fact exploding at an exponential rate.

The growth in computing power demanded by AI model training is therefore badly mismatched with Moore's Law, which is bound to drive rapid growth in demand for computing power infrastructure. This is also the fundamental reason why GPU hardware companies such as NVIDIA are chased by capital: they hold the golden key to the AI era.
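
The scale of that mismatch is easy to quantify using the two doubling periods mentioned above (18 to 24 months for Moore's Law, 3 to 4 months for AI compute demand per OpenAI's estimate); the sketch below simply compounds both over the same horizon, using the midpoints of those ranges.

```python
# Compound two doubling rates over the same horizon to show the mismatch.
def growth_factor(months: float, doubling_period_months: float) -> float:
    return 2 ** (months / doubling_period_months)

horizon = 24  # months
moore = growth_factor(horizon, 21)        # Moore's Law, midpoint of 18-24 months
ai_demand = growth_factor(horizon, 3.5)   # AI compute demand, midpoint of 3-4 months

print(f"Over {horizon} months: hardware ~{moore:.1f}x, AI demand ~{ai_demand:.0f}x, "
      f"gap ~{ai_demand / moore:.0f}x")
# Roughly: hardware ~2.2x versus demand ~116x, i.e. a ~50x shortfall in just two years.
```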

Brief summary

Perhaps, as Jensen Huang said, the "iPhone moment of artificial intelligence" has arrived, and the road to the next era already stretches out before the world.

Yet many of the most cutting-edge AI companies are still fretting over their "admission ticket" to the AI era: the high price of computing power and perpetually out-of-stock high-end GPUs have become their biggest weakness.

Clearly, in the future digital economy, computing power and other infrastructure will be the first arena of competition between AI companies and even between countries. As the well-known investment firm a16z put it in its assessment of ChatGPT, "infrastructure providers may be the biggest winners, capturing the most wealth."

For new technologies, AI included, to reach ordinary households and be deployed and applied at scale after a breakthrough, secure, fast, reliable and high-performance computing power is indispensable. One could even say that it is growth in computing power that truly drives the growth of the digital economy.

For Chinese enterprises, the short-term blockade may be a predicament, but seen from another angle it may well be an opportunity: high-end Chinese firms represented by Jingjiawei and Haiguang Information will fight their way through layer upon layer of blockade and, with excellent products, drive forward China's rolling digital tide.

Resources:

1. "The Wave of Global AI Big Models is Surging, Computing Power Chips Are Expected to Usher in Explosive Demand", Orient Wealth Securities;

2. "Into the "Core" Era Series Depth No. 60: "AI Computing Power GPU" - AI Industrialization Accelerates, and the Intelligent Era Has Begin", Huajin Securities;

3. "ChatGPT Demand Measurement and Related Analysis of GPU Computing Power", CITIC Securities;

4. "AI Computing Power Industry Chain Sorting - Technology Iteration Promotes Bottleneck Breakthrough, AIGC Scenarios Drive Computing Power Demand Increase", Essence Securities
