
GPU industry analysis

Author: Explore the Sea of Stars with Hard Technology

GPU is short for Graphics Processing Unit. It is the core chip on a graphics card, a concept first proposed by NVIDIA in 1999. Its main role is to handle graphics processing and parallel computing, making it the "brain" of the graphics card and one of the core products in the chip field. With the advent of ChatGPT, more and more people have become aware of the important role GPUs play in fields such as commercial computing and artificial intelligence.

The following analysis of the GPU industry covers the concept, advantages, core functions, development history, and market situation of GPUs, along with the GPU industry chain, key companies, and the competitive landscape. I hope it helps you understand the GPU industry.

1. Industry overview.

(1) Definition of GPU.

1. GPU generally refers to the graphics processor, also known as the display core, visual processor, or display chip. It is a microprocessor that specializes in image- and graphics-related computing on personal computers, workstations, game consoles, and some mobile devices (such as tablets and smartphones).

2. The GPU is the processor of the graphics card. The full name of the graphics card is the display adapter; its main function is to assist the CPU with image processing, and it works by processing the image signals issued by the CPU and transmitting the result to the display. A graphics card consists mainly of the motherboard interface, the display interface, the processor, and memory; the GPU is its processor.

(2) The main difference between GPU and CPU.

1. The original GPU was dedicated solely to graphics processing and was built into graphics cards. It works by first generating 3D geometry, mapping that geometry to the corresponding pixels, then computing the final color of each pixel and producing the output. The process is generally divided into five steps: vertex processing, rasterization, texture mapping, pixel processing, and output.

2. The GPU has powerful parallel computing capabilities. Although each core has a small cache and simple logic, capable of only a limited set of logical operations, the GPU's many-core architecture gives it enormous aggregate compute, making it naturally suited to parallel processing of dense data and large-scale concurrent computing. GPUs are therefore increasingly used in scenarios that require massive concurrency, such as AI training.
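To make the contrast concrete, here is a minimal CUDA sketch (illustrative, not from the original text): a SAXPY kernel in which each of n elements is updated by its own lightweight thread, replacing the CPU's sequential loop.

```cuda
// Minimal sketch: SAXPY (y = a*x + y) with one GPU thread per element.
// On a CPU this would be a sequential loop over n elements.
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overrun
        y[i] = a * x[i] + y[i];                     // one element per thread
}

// Launch: enough 256-thread blocks to cover all n elements, e.g.
// saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```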

3. The CPU is the central processing unit, the computing and control core of a computer. Its main function is to interpret computer instructions and process data for software. The CPU is a general-purpose chip with a low-latency design, composed of an arithmetic logic unit (ALU), a control unit (CU), and a number of registers and caches. With its many functional modules, it has strong scheduling, management, and coordination capabilities, the widest range of applications, and the highest flexibility.

(3) On the application side, GPUs are subdivided into PC GPUs, server GPUs, intelligent-driving GPUs, and mobile GPUs.

1. PC GPUs are further divided into discrete and integrated graphics. A discrete graphics card is a GPU separate from the CPU, with its own dedicated memory and power supply not shared with the CPU, so it delivers higher performance but consumes more power and generates more heat. Discrete graphics cards are common in desktops, laptops, and small form factor PCs; the main manufacturers are NVIDIA and AMD. Integrated graphics is a GPU built into the processor that shares system memory with the CPU, so it consumes less power, generates less heat, and performs below discrete graphics. Processors with integrated graphics are typically found in smaller form factor systems such as laptops; the main manufacturers are Intel and AMD.

2. Server GPUs are typically used in deep learning, scientific computing, video encoding and decoding, and other scenarios. The main manufacturers are NVIDIA and AMD, with NVIDIA dominant.

3. Autonomous driving. GPUs are commonly used for in-vehicle AI inference of autonomous driving algorithms, with NVIDIA dominant.

(4) Core functions of GPU.

1. Powerful graphics rendering capabilities. With its strong parallel computing power, the GPU has become the dedicated processor for image rendering in personal computers. Graphics rendering is implemented in five stages: vertex shading, shape assembly, rasterization, texture filling and shading, and testing and blending.

The GPU's rendering process has six steps: first, 3D image information enters the computer and the GPU reads the vertex data describing the model's geometry; second, the stream processors build the overall skeleton of the 3D model (vertex processing); third, the rasterization units convert the vector graphics into a series of pixels (rasterization); fourth, the texture mapping units fill in texture; fifth, the stream processors compute and process each pixel (shading); sixth, the rasterization units perform testing and blending. At this point a complete GPU rendering pass is finished.
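As a rough illustration of why the shading step parallelizes so well, the hypothetical CUDA kernel below assigns one thread per pixel and computes a simple Lambertian diffuse term; real driver pipelines are of course far more elaborate.

```cuda
// Sketch only: one thread per pixel computes an independent N.L diffuse
// term, showing how per-pixel shading maps onto massive parallelism.
#include <cuda_runtime.h>

__global__ void shade(const float3 *normals, float3 lightDir,  // lightDir assumed normalized
                      uchar4 *pixels, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int idx = y * width + x;
    float3 n = normals[idx];
    // Dot product N.L, clamped to zero for back-facing surfaces
    float d = fmaxf(0.f, n.x * lightDir.x + n.y * lightDir.y + n.z * lightDir.z);
    unsigned char c = (unsigned char)(d * 255.f);
    pixels[idx] = make_uchar4(c, c, c, 255);     // grayscale diffuse output
}
```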

2. Wide-ranging general-purpose computing capabilities

In 2003, the concept of general-purpose computing on GPUs was first proposed: GPGPU (General-Purpose computing on GPU), i.e., using the GPU's computing power for broader scientific computing outside the field of graphics processing. By stripping out some of the hardware dedicated to graphics acceleration and optimizing the traditional GPU for high-performance parallel computing, the GPGPU was born.

Because its parallel processing architecture suits a wide range of intelligent-computing scenarios, the GPGPU is widely used in artificial intelligence, high-performance computing, and data analysis.
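A sketch of such a non-graphics workload (illustrative, assuming 256-thread blocks): a dot product, the core operation of linear algebra and neural network layers, computed with a shared-memory tree reduction.

```cuda
// Sketch: dot product as a GPGPU workload. Each 256-thread block reduces
// its partial products in shared memory; one atomicAdd per block then
// accumulates into the global result. Launch with 256 threads per block.
#include <cuda_runtime.h>

__global__ void dot(const float *x, const float *y, float *result, int n) {
    __shared__ float partial[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? x[i] * y[i] : 0.f;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {   // tree reduction
        if (threadIdx.x < s)
            partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) atomicAdd(result, partial[0]);
}
```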

(5) Common GPU data formats and their application scenarios.

Fixed-point and floating-point representations are the two data formats commonly used in computers. Fixed-point representation has a fixed decimal point position and a relatively limited value range; INT8 and INT16 are the fixed-point formats commonly used in GPUs, mostly in the inference stage of deep learning.

The "floating-point representation" includes the symbol bit, the order code part, and the mantissa part. The sign bit determines the positive and negative value of the value, the order part determines the range of value representation, and the mantissa part determines the accuracy of value representation. The numerical representation range and expression accuracy of FP64 (double precision), FP32 (single precision), and FP16 (half precision) decreased sequentially, and the computing efficiency increased sequentially.

In addition, there are other floating-point formats such as TF32 and BF16, which keep the exponent width but truncate the mantissa, sacrificing numerical precision in exchange for a larger representable range and higher computing efficiency; they are widely used in deep learning.
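A small hypothetical CUDA kernel showing these formats in practice: FP32 inputs rounded to FP16 and truncated to BF16 via CUDA's intrinsic conversion functions, then widened back to FP32 so the precision loss can be inspected.

```cuda
// Sketch: converting between precisions with CUDA's intrinsic types.
// FP16 (__half) keeps ~3 decimal digits of precision; BF16 (__nv_bfloat16)
// keeps the full FP32 exponent range but truncates the mantissa,
// trading accuracy for range as described above.
#include <cuda_fp16.h>
#include <cuda_bf16.h>

__global__ void precision_demo(const float *in, float *out_fp16,
                               float *out_bf16, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    __half h        = __float2half(in[i]);       // round to FP16
    __nv_bfloat16 b = __float2bfloat16(in[i]);   // truncate mantissa to BF16
    out_fp16[i] = __half2float(h);               // widen back to inspect error
    out_bf16[i] = __bfloat162float(b);
}
```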

(6) The application programming interface (API) is the bridge between the GPU and application software.

The API is the GPU's application programming interface, connecting GPU hardware to application programs and efficiently executing rendering functions such as vertex processing and pixel shading. In the early days, the lack of common interface standards forced engineers to program specific hardware for specific platforms, a huge workload. With the birth of APIs and deepening system optimization, GPU APIs now mediate between high-level languages, graphics drivers, and the underlying assembly, improving the efficiency and flexibility of development.

GPU APIs fall mainly into two camps: Microsoft's DirectX and the Khronos Group's open standards. The former provides a complete multimedia solution with outstanding 3D rendering but runs only on Windows. The latter's OpenGL has broader hardware support and is widely used in professional graphics fields such as CAD, game development, and virtual reality.

(7) The CUDA architecture generalizes GPU parallel computing.

CUDA (Compute Unified Device Architecture) is NVIDIA's unified architecture for parallel computing, released in 2007, which leverages GPUs to solve complex computing problems in business, industry, and science. The CUDA architecture means the GPU is no longer limited to graphics rendering: it generalizes GPU parallel computing, turning a personal computer into a "supercomputer" capable of parallel operation. By packaging complex graphics card programming behind a simple interface, CUDA lets developers write GPU kernels intuitively, greatly improving programming efficiency. Today, mainstream deep learning frameworks are basically built on CUDA for GPU acceleration.
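A minimal end-to-end sketch of the CUDA workflow described above (values are illustrative): allocate device memory, copy data over, launch a kernel across thousands of threads, and copy results back.

```cuda
// Minimal end-to-end CUDA sketch: the host-side pattern that CUDA's
// "simple interface" wraps around the hardware.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void square(float *v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= v[i];
}

int main() {
    const int n = 1 << 20;
    float *h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc(&d, n * sizeof(float));                        // device buffer
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    square<<<(n + 255) / 256, 256>>>(d, n);                   // parallel launch
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("h[3] = %f\n", h[3]);                              // expect 9.0
    cudaFree(d);
    delete[] h;
    return 0;
}
```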

2. The development history of the industry.

(1) In the pre-GPU era, graphics processors began to take shape.

In 1981, IBM released the world's first personal computer, which shipped with a monochrome display adapter and a color graphics adapter, the earliest graphics display controllers. In the early 1980s, graphics processors marked by the GE (Geometry Engine) chip appeared, featuring four-component vector floating-point operations that could perform the matrix, clipping, and projection operations of the graphics rendering process, marking computer graphics' entry into the stage dominated by graphics processors.

(2) The GeForce 256 arrives, and the GPU is officially born. In the 1990s, NVIDIA entered the personal computer 3D market, and in 1999 it launched the landmark graphics processor GeForce 256, officially inaugurating the GPU. Compared with previous graphics processors, the second-generation GPU, the GeForce 256, moved the T&L hardware (transform and lighting, which handles the rotation of geometry and three-dimensional effects such as highlights and shadows) off the CPU and integrated it into the GPU. This had a dual significance: first, the GPU could independently perform 3D vertex coordinate transformation; second, the CPU was freed from heavy lighting calculations. As a result, even a low-end CPU paired with a graphics card supporting hardware T&L could run games smoothly. The advent of the GPU also gave NVIDIA a great advantage in market competition, and its market share continued to rise significantly.

(3) Vertex programming establishes the GPU programming paradigm. After the graphics card makers' reshuffle of 2000, third-generation GPU chips (such as NVIDIA's GeForce4 Ti and ATI's Radeon 8500) were released in 2002. All featured vertex programmability, allowing specific algorithms to change the shape of 3D models at runtime. The emergence of vertex programmability established the programming paradigm of GPU chips, making it possible for later GPUs to be used in other computing fields. However, GPUs of this period did not yet support pixel-level programmability (fragment programming), so their programming freedom was incomplete.

(4) GPUs are used for general computing, and the concept of GPGPU appears.

In 2003, the SIGGRAPH conference first proposed using GPUs for general-purpose computing, laying the groundwork for GPGPU. Over the following three years, the separate shading units inside the GPU were replaced by unified stream processors, fully unleashing the GPU's computing power; the fourth-generation GPU had both vertex and fragment programmability, and the fully programmable GPU was officially born. Because the GPU's parallel processing capability exceeds the CPU's and it can process large amounts of vertex data simultaneously, it gained great advantages in scientific visualization and computing such as medical CT, geological exploration, meteorological data, and fluid dynamics, sufficient for various real-time tasks. As algorithms for linear algebra, physics simulation, ray tracing, and more migrated to GPU chips, the GPU gradually transformed from a dedicated graphics device into a general-purpose computing device.

(5) Architectures iterate continuously, and AI computing gradually gains attention.

In 2010, NVIDIA released the new Fermi GPU architecture, the third generation to support CUDA (the first and second were the G80 and GT200 architectures). Fermi's design made no specific provisions for AI computing scenarios, but GPU chips already held great advantages over CPU chips in AI computing. The Kepler (2012) and Maxwell (2014) architectures likewise made no hardware-level optimizations for AI, but the deep neural network acceleration library cuDNN v1.0 introduced at the software level improved the AI computing performance and usability of NVIDIA GPUs to a certain extent.

(6) The Pascal architecture is released, and a version specialized for AI computing arrives. Launched in March 2016, Pascal was the first architecture NVIDIA released for AI computing scenarios. At the hardware level it added FP16 (half-precision floating point), NVLink (a bus communication protocol allowing a single CPU to drive multiple GPUs), HBM (for higher memory bandwidth), and INT8 support (for inference scenarios); at the software level NVIDIA also released TensorRT for inference acceleration and the open-source communication library NCCL. Pascal's forward-looking layout in AI computing gave NVIDIA's subsequent architectures a great advantage in the competition.

(7) Catching up scenario by scenario, GPUs enter a period of rapid development.

Following Pascal, facing the pressure of Google's TPU in AI computing, NVIDIA successively released the Volta (2017), Turing (2018), and Ampere (2020) architectures. The first-generation Tensor Core introduced with Volta closed the technical gap in training scenarios, and the second-generation Tensor Core of Turing closed it in inference scenarios. By the Ampere era, NVIDIA had consolidated its leading position in AI computing, and amid the fierce competition between the two sides, GPUs entered a period of rapid development.

3. Key factors affecting GPU performance.

(1) Microarchitecture design is the key to GPU performance improvement.

The parameters for evaluating a GPU's physical performance mainly include microarchitecture, process node, number of graphics processors, number of stream processors, memory capacity/bit width/bandwidth/frequency, and core frequency. Among these, microarchitecture design is the key to GPU performance improvement.

GPU microarchitecture refers to the physical circuit organization compatible with a particular instruction set, consisting of stream processors, texture mapping units, rasterization units, ray tracing cores, tensor cores, caches, and other components. The graphics functions in the rendering process mainly draw graphics and pixels and implement lighting, 3D coordinate transformation, and similar operations, during which large volumes of data of the same type (such as image matrices) undergo intensive, mutually independent computation; the many repeated computing units in the GPU's structure are designed to match these characteristics of the data.

Microarchitecture design plays a crucial role in improving GPU performance and is the most critical technical barrier in GPU development. It determines the chip's maximum frequency, the computing power available at a given frequency, and the energy consumption at a given process node; it is the soul of chip design. Of the NVIDIA H100's performance improvement over the A100, a factor of about 1.2x comes from the increase in core count, while a factor of about 5.2x comes from the microarchitecture design.

(2) Hardware composition of GPU microarchitecture.

First, the stream processor. This is the GPU's basic computing unit, usually comprising integer and floating-point execution parts; it is called the SP unit, or, from the programming perspective, the CUDA core. Stream processors embody the unified shading architecture introduced with DirectX 10, merging vertex processing and pixel processing into a single pool of units, and their number is closely tied to graphics card performance.

Second, the texture mapping unit. An independent part of the GPU capable of rotating, resizing, and distorting bitmap images (performing texture sampling) to apply texture information onto a given 3D model.

Third, the rasterization unit. It projects the entire viewable 3D space onto a 2D plane according to perspective. The stream processors and texture mapping units submit the shaded pixel information and the prepared texture material, respectively, to the rasterization units at the back end of the GPU, which blend the two and fill in the final output image; post-processing effects in games such as fog, depth of field, motion blur, and anti-aliasing are also completed by the rasterization units.

Fourth, the ray tracing core. Ray tracing is a complementary rendering technique that computes the interaction between light and the objects in a scene to obtain physically correct reflections, refractions, shadows, and global illumination, rendering realistically lit scenes. By accelerating the BVH (bounding volume hierarchy) algorithm, which computes the intersections of rays with scene triangles, RT Cores improve BVH computation efficiency by orders of magnitude compared with traditional hardware, making real-time ray tracing possible.

Fifth, the tensor core. It improves the GPU's rendering output and boosts AI computing power. Tensor Cores power Deep Learning Super Sampling (DLSS), which improves the clarity, resolution, and frame rate of rendered output, and they denoise ray-traced images in real time to clean and correct the RT Cores' output, improving overall rendering quality. At the same time, Tensor Cores greatly accelerate AI computing through low-precision mixed computation, allowing computer vision, natural language processing, speech recognition, text conversion, personalized recommendation, and other workloads that were once difficult for CPUs to be completed at high speed.
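For readers curious what "low-precision mixed computation" looks like in code, below is a hedged sketch using CUDA's WMMA API, the documented programming path to Tensor Cores: one warp multiplies 16x16 FP16 tiles and accumulates in FP32 (requires a Volta-or-newer GPU).

```cuda
// Sketch: one warp computes a 16x16x16 matrix multiply-accumulate on
// Tensor Cores, with FP16 inputs and FP32 accumulation.
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

__global__ void wmma_16x16x16(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

    wmma::fill_fragment(fc, 0.0f);          // C = 0
    wmma::load_matrix_sync(fa, a, 16);      // load 16x16 FP16 tile of A
    wmma::load_matrix_sync(fb, b, 16);      // load 16x16 FP16 tile of B
    wmma::mma_sync(fc, fa, fb, fc);         // C += A*B on Tensor Cores
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}
```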

4. GPU market analysis.

(1) GPU market size and forecast. According to Verified Market Research, the global GPU market was $25.4 billion in 2020 and is expected to reach $246.5 billion by 2028, maintaining rapid growth at a CAGR of 32.9%; the global GPU market is expected to be $59.5 billion in 2023.

(2) PC graphics card market

1. The discrete graphics card market has begun to gradually recover. According to Jon Peddie Research, discrete GPU shipments fell to 38.08 million units in 2022, down 22.5% year-on-year; 6.9 million units were shipped in 22Q3 alone, down 45.7% year-on-year, the largest decline in a decade. Discrete graphics shipments began to gradually warm up in 22Q4.

2. Integrated graphics shipments remain weak. In 2022, integrated GPU shipments were 283 million units, down 29.8% year-on-year. Work-from-home demand during the pandemic drove laptop consumption, and the surge in demand for integrated graphics pulled forward market demand to some extent. In the post-pandemic era, laptop demand has weakened, and combined with suppliers' excess inventory, integrated graphics shipments have continued to decline.

3. There were three reasons for the steep decline in discrete graphics shipments in 2022: first, macroeconomic conditions put the personal computer market in a down cycle; second, some discrete GPUs had been used for cryptocurrency mining, and the Ethereum Merge dealt a huge blow to discrete GPU shipments; third, downstream board manufacturers entered an inventory-reduction cycle. In proof-of-work mining, the more computing power invested, the higher the probability of winning the right to record a block and earn the reward. After Ethereum's full transition to proof-of-stake, it was no longer necessary to buy graphics cards in bulk and devote computing resources to mining, an important turning point for the graphics card mining market.

(3) GPU has great application potential in the field of data centers.

GPUs are widely used in data center fields such as artificial intelligence training and inference and high-performance computing (HPC).

1. The computing power demanded by pre-trained large models is driving rapid growth in the AI server market. Scaling models up to giant size is an important trend in AI in recent years; its core characteristics are large parameter counts and large training data volumes. The introduction of the Transformer model opened the era of pre-trained large models, whose computing power demand has grown significantly faster than that of other AI models, injecting strong momentum into the AI server market. According to Omdia, AI servers are the fastest growing segment of the server industry, with a CAGR of 49%.

2. Strategic needs drive steady GPU growth in high-performance computing. High-performance computing (HPC) provides powerful ultra-high floating-point computing capability to meet the needs of compute-intensive and massive data processing workloads, such as scientific research, weather forecasting, computational simulation, military research, biopharmaceuticals, and gene sequencing, greatly shortening the time spent on massive computation. High-performance computing has become an important means of promoting scientific and technological innovation and economic development.

3. Large models bring strong demand for artificial intelligence computing power.

Ever-larger parameter counts in natural language models are the industry's development trend. AI models represented by ChatGPT show a high degree of intelligence and anthropomorphism, and the factors behind this are the emergence and generalization abilities of large natural language models. When model parameters reach the hundred-billion order of magnitude, performance may improve in leaps, which is called the emergence ability; in zero-shot or few-shot learning scenarios, the model still shows strong transfer learning, which is called the generalization ability. Both abilities are closely tied to parameter count, so the scaling up of AI model parameters is an important industry trend.

Pre-trained large models have entered the era of hundreds of billions of parameters, raising training compute demand to a new level. Since GPT-3, large natural language models have entered the hundred-billion-parameter era; many such models emerged after 2021, and their training compute has risen sharply. The model behind ChatGPT has 175 billion parameters and a training compute requirement of about 3.14x10^23 FLOPs, and pre-trained language models are still iterating rapidly, continually setting new records on natural language tasks while single-model training compute keeps reaching new highs.
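As a sanity check on that figure, a commonly cited approximation (not stated in the original) puts training compute at about 6 FLOPs per parameter per training token. Assuming GPT-3's reported roughly 300 billion training tokens:

$$
C_{\text{train}} \approx 6ND = 6 \times (1.75 \times 10^{11}) \times (3 \times 10^{11}) \approx 3.15 \times 10^{23}\ \text{FLOPs},
$$

which matches the 3.14x10^23 FLOPs cited above. Here N is the parameter count and D the number of training tokens; both the 6ND rule and the token count are assumptions for illustration.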

4. Large models bring significant pull to the demand for AI chips.

The compute requirements of large models come mainly from three links. The first is the pre-training link that produces the large model: compute demand here is massive and centralized, with large models typically trained in the cloud over days to weeks. Taking ChatGPT's training as an example, a single training run reportedly requires 2,000 NVIDIA A100 cards training for 27 days. The second is the fine-tuning link when adapting to downstream fields: compute demand depends on the model's generalization ability and the difficulty of the downstream task.

The third is the inference link in the large model's daily operation. Each user call requires a certain amount of compute and bandwidth, and the compute of a single inference step is about 2N FLOPs per generated token (where N is the number of model parameters).

For example, for the 175-billion-parameter ChatGPT model, generating a response on the order of 10^3 tokens requires about 2 x (1.75x10^11) x 10^3 = 3.5x10^14 FLOPs = 350 TFLOPs of inference compute.

Recently, ChatGPT's official website has attracted close to 50 million daily visitors, about 2.1 million visitors per hour on average. Assuming 4.5 million concurrent users at peak, each asking 8 questions per hour with answers of 200 words each, an estimated 14,000 NVIDIA A100 chips would be needed for daily compute support. As large models are integrated into search engines or offered as commercial services through apps, demand for AI chips will be pulled up further and significantly.

5. AI servers are an important support for GPU market growth.

According to Omdia, the global AI server market was $2.3 billion in 2019 and will reach $37.6 billion in 2026, a CAGR of 49%. According to IDC, chips used for AI inference already exceeded 50% of China's data center AI chip market in 2020, and by 2025 the share of chips used for inference workloads is expected to reach 60.8%.

AI servers usually pair CPUs with acceleration chips to meet high compute requirements; common accelerators include GPUs, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and neural processing units (NPUs). With powerful parallel computing, deep learning capability, strong versatility, and a mature software ecosystem, the GPU has become the first choice for data center acceleration: about 90% of AI servers use GPUs as their acceleration chips.

Affected by cloud vendors' capital expenditure, the AI server market may slow in the short term.

Capital spending by North American cloud vendors has slowed. AI servers are mostly deployed across public cloud, private cloud, and on-premises hybrid architectures, so changes in AI server demand can be tracked through the capital expenditure of the four major North American cloud vendors, whose total capex was $151.1 billion in 2022, up 18.5% year-on-year. Meta guided 2023 capital expenditure of $30-33 billion, essentially flat versus 2022 and below its earlier 22Q3 forecast of $34-39 billion; Google expects 2023 capital spending roughly flat with 2022 but will increase investment in AI and cloud services.

ASPEED Technology's short-term revenue decline has eased. As the world's largest BMC chip company, ASPEED's revenue changes generally lead cloud vendors' capital expenditure by about one quarter, so its monthly revenue can serve as a forward-looking indicator of cloud capex; its recent revenue decline has eased.

6. The GPU market in supercomputing servers maintains steady growth. GPGPU penetration in high-performance computing continues to rise: CPU+GPU heterogeneous collaborative computing architectures are increasingly adopted, with 170 of the world's TOP500 supercomputers using heterogeneous architectures, of which more than 90% choose NVIDIA GPGPU chips as their accelerators.

According to Hyperion Research, the global supercomputing server market will grow from $13.5 billion in 2020 to $19.9 billion in 2025. Assuming GPUs account for 27.3% of supercomputing server cost, the GPU market within supercomputing servers will grow from $3.7 billion in 2020 to $5.4 billion in 2025, a CAGR of 8%.

7. The GPU market in the field of autonomous driving maintains high growth

In the field of autonomous driving, all kinds of autonomous driving chips are widely used. According to Yole, the global autonomous driving market will reach $78 billion by 2025, of which more than $10 billion will go to AI chips for autonomous driving.

The autonomous driving GPU market maintains high growth. Based on ICVTank's autonomous driving penetration data and assuming GPU penetration of 15% in L2 and 50% in L3-L5, the overall GPU market in autonomous driving is estimated to grow from $710 million in 2020 to $4.4 billion in 2025, a CAGR of 44%.

5. Industry chain and competitive landscape analysis

(1) GPU industry chain. The GPU industry chain mainly involves three links: design, manufacturing, and packaging, with three supply models: IDM, Fabless, and Foundry. The IDM model integrates all three links, combining in-house R&D with vertical integration of design, manufacturing, and packaging across the whole chain. The Fabless model exploits each firm's comparative advantage: the company is responsible only for chip circuit design and outsources the other links, spreading the risk of GPU R&D and production. The Foundry model means the company is responsible only for manufacturing, not upstream design or downstream packaging, and can serve multiple upstream firms at once.

(2) Competitive landscape. The global GPU market is essentially an oligopoly of NVIDIA, Intel, and AMD. According to JPR, global PC GPU shipments reached 84 million units in 22Q2, down 34% year-on-year, and GPU shipments are expected to grow at a CAGR of 3.8% from 2022 to 2026. In terms of market structure, NVIDIA, Intel, and AMD held 18%, 62%, and 20% shares respectively in 22Q2, with Intel taking the largest share on the strength of its integrated graphics in desktops.

In the discrete graphics market, NVIDIA leads. Unlike the overall market, the discrete graphics market is a duopoly of NVIDIA and AMD, with 22Q2 shares of about 80% and 20% respectively. NVIDIA has continued to consolidate its advantage in recent years, and its discrete graphics market share has trended upward overall.

From the perspective of the domestic market, the domestic GPU track continues to boom. In recent years, domestic GPU companies have sprung up, with firms such as Biren Technology, Moore Threads, Innosilicon, and Iluvatar CoreX releasing new products. From the perspective of IP licensing, however, major domestic GPU startups such as Innosilicon, Moore Threads, and Biren use IP licensed from Imagination or VeriSilicon. The "Fenghua No. 1", the first domestic high-performance 4K-class graphics GPU, released by Innosilicon, uses Imagination's IMG B-series GPU IP, the first high-end GPU application of Imagination IP in the Chinese market. Key parts of Moore Threads' chip design are also reportedly from Imagination Technologies.

Imagination is a UK-based company that builds intellectual property (IP) for semiconductors and software. The company's graphics, computing, vision, artificial intelligence, and connectivity technologies enable superior PPA (power, performance, and area) metrics, robust security, fast time-to-market, and lower total cost of ownership (TCO). In September 2017, the private equity firm Canyon Bridge, backed by China's Guoxin (China Reform), acquired Imagination for 550 million pounds.

(3) How to respond to the US export ban

According to Reuters, on August 31, 2022, the US government required that sales of NVIDIA's A100 and H100 series, AMD's MI250 series, and future high-end GPU products to Chinese customers be licensed by the US government. These chips are high-end GPGPUs for general-purpose computing, typically used in cloud AI training and inference and in supercomputing; the Chinese customers are mostly cloud computing vendors, universities, and research institutes.

In response to the ban, the short-term option is to choose from NVIDIA's and AMD's mid- and low-end GPU chips that have not been banned. For cloud computing, computing power can be raised through product upgrades or by increasing the number of accelerator cards, so in the short term the processing power of high-end GPUs can be approximated by combining multiple lower-powered CPU, GPU, and ASIC chips, which can basically meet the requirements of cloud training and high-performance computing.

In the long run, the path is substitution with domestic GPUs. Although chips are the main source of computing power and the most fundamental material base, producing, aggregating, scheduling, and releasing computing power is a complete process requiring the cooperation of a complex software and hardware ecosystem to yield "effective computing power". Incompatibility with the CUDA architecture widely used in AI may therefore make substitution difficult in the short term, but in the long run domestic CPUs, general-purpose GPUs, and AI chips will gain unprecedented development opportunities, gradually achieving localization of high-end GPUs as software and hardware technology improves.

6. Related companies.

(1) Foreign companies. The first is NVIDIA, a company focused on GPU semiconductor design. Founded in 1993, NVIDIA launched the GeForce 256 chip in 1999 and first defined the concept of the GPU; it then innovatively proposed the CUDA architecture, allowing GPUs that previously only did 3D rendering to perform general-purpose computing; entering the 2010s, NVIDIA foresaw the GPU's application to AI early in that industry's development and committed fully to the related layout. Today the company has built a full-stack ecosystem of hardware, system software, software platforms, and application frameworks around four chip scenarios: data center, gaming, automotive, and professional visualization.

Tracing its history, NVIDIA has continuously driven the GPU industry forward through technological innovation and can be called the founder of the GPU era. Its development divides roughly into four stages. First, the startup stage: in 1993, Jensen Huang and two young engineers from Sun Microsystems co-founded NVIDIA, which initially focused on graphics chip R&D; in 1997 the company introduced the RIVA 128, its first truly successful product. Second, the rise stage: in 1999, the company launched the GeForce 256 and defined the GPU, setting out to reshape the graphics card industry. Third, the dominance stage: in 2006, NVIDIA innovatively launched CUDA, its own parallel computing platform and programming model for GPUs. CUDA had two huge impacts: for the GPU industry, it enabled GPUs that only did 3D rendering to perform general-purpose computing; for NVIDIA itself, its early promotion of CUDA and extension of its programming language let developers program GPUs easily. CUDA is now one of the two mainstream GPU programming ecosystems and laid the foundation of NVIDIA's GPU ecosystem. Fourth, the take-off stage: betting on AI, the data center business opened a second growth curve. In 2012, Alex Krizhevsky used GPUs for deep learning and, after a few days of training on NVIDIA GTX 580 GPUs with the CUDA computing model, won the ImageNet competition, with his deep convolutional neural network AlexNet beating the runner-up's accuracy by 10.8 percentage points, shocking academia and opening the door to GPUs in deep learning. Since then, NVIDIA GPUs and the CUDA model have become the preferred chips for deep learning (especially training), and NVIDIA has launched a large number of chips and supporting products dedicated to AI, transforming from a graphics card hardware company into an artificial intelligence company.

On average, NVIDIA has launched a new chip architecture every two years and a new product every six months, sustained over many years. From the Fermi architecture in 2009 to the current Hopper architecture, the company's product performance has improved steadily, consistently leading the development of GPU chip technology.

At present, NVIDIA's GPU chips form a product array covering the data center, gaming, professional visualization, and automotive scenarios, with consumer GPUs and data center GPUs the core. The data center business has expanded rapidly since 2017, with high-performance general-purpose computing cards such as the V100 and A100 released successively, providing top AI computing power worldwide. In the latest product generation, NVIDIA launched the first GeForce 40 series product on September 20, 2022.

The second is AMD (Advanced Micro Devices). Founded in 1969, AMD provides a variety of microprocessors for the computer, communications, and consumer electronics industries, along with flash memory and low-power processor solutions. The company is a world-leading designer of CPUs, GPUs, APUs, and FPGAs, with semiconductor technologies spanning central processing units, graphics processors, flash memory, and chipsets; its business comprises four segments: data center, client, gaming, and embedded. AMD uses the fabless R&D model, focusing on chip design and entrusting manufacturing, packaging, and testing to professional foundries worldwide. The global CPU market is currently dominated by Intel and AMD, with Intel in the lead. The discrete GPU market is mainly a contest between NVIDIA and AMD, with Intel's Iris Xe MAX products also gradually entering.

The company's revenue comprises four parts: the data center business (chip products for data center servers); the client business (processor chips for PCs); the gaming business (discrete GPUs and other gaming product development services); and the embedded business (embedded computing chips for edge computing).

AMD offers two types of PC GPUs: integrated and discrete. Integrated GPUs mainly appear in desktop and notebook APUs and embedded products, used in gaming, mobile devices, servers, and other applications. An APU carries an on-die integrated GPU; the tightly integrated CPU and GPU compute cooperatively and accelerate each other, making it more cost-effective than a discrete GPU.

The discrete GPUs are the Radeon series. In order of launch, AMD's Radeon discrete GPUs include the RX 500 series, Radeon VII, RX 5000 series, RX 6000 series, and RX 7000 series. Radeon cards hold a certain cost-performance advantage, leaving room for further market share growth.

The RDNA 3 architecture uses a 5nm process and chiplet design, delivering a 54% performance-per-watt improvement over RDNA 2, including 2.7x AI throughput, 1.8x second-generation ray tracing performance, 5.3 TB/s peak bandwidth, and support for 4K at 480 Hz and 8K at 165 Hz refresh rates. AMD expects to launch the RDNA 4 architecture, manufactured on a more advanced process, in 2024.

In 2018, AMD launched the Radeon Instinct GPU accelerators for data centers; the Instinct series is based on the CDNA architecture. In general-purpose computing, the latest CDNA 2 architecture achieves significant improvements in compute and interconnect over CDNA 1, and the MI250X adopts CDNA 2. For vector computing, CDNA 2 optimizes the vector pipeline: FP64 runs at the same rate as FP32, with the same vector throughput. For matrix computing, CDNA 2 introduces new matrix multiplication instructions, especially for FP64 precision, and its Matrix Cores also support FP32, FP16 (BF16), and INT8. For interconnect, P2P and I/O communication between accelerators runs over AMD's Infinity Fabric interface, providing a total theoretical bandwidth of 800 GB/s, up 235% from the previous generation.

AMD ROCm is an open-source software development platform for HPC and hyperscale GPU computing, launched by AMD in 2015 to benchmark against the CUDA ecosystem. ROCm is to AMD GPUs what CUDA is to NVIDIA GPUs.

AMD ROCm is an open software platform built for flexibility and performance. Targeting accelerated computing and agnostic to programming language, it lets participants in the machine learning and high-performance computing communities accelerate code development with a variety of open-source compute languages, compilers, libraries, and tools redesigned for large-scale and multi-GPU computing, with the goal of building an alternative to CUDA.
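To illustrate the "CUDA-like" point made below, here is a minimal sketch in HIP, ROCm's C++ dialect (illustrative, not from the original): the kernel body is identical to the CUDA examples above, and host-side calls map nearly one-for-one (hipMalloc for cudaMalloc, hipMemcpy for cudaMemcpy), which is what makes migration from CUDA cheap.

```cuda
// Sketch in HIP, ROCm's CUDA-like C++ dialect. The kernel is the same
// SAXPY shown earlier; only the include and launch helper differ.
#include <hip/hip_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

void run(int n, float a, float *d_x, float *d_y) {
    // Equivalent to saxpy<<<grid, block>>>(...) in CUDA (HIP accepts
    // that syntax too); args after the stream are kernel parameters.
    hipLaunchKernelGGL(saxpy, dim3((n + 255) / 256), dim3(256), 0, 0,
                       n, a, d_x, d_y);
    hipDeviceSynchronize();
}
```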

(2) Domestic companies.

1. Haiguang Information (Hygon). Founded in 2014, its main business is the R&D, design, and sale of high-end processors for servers, workstations, and other computing and storage devices. Its products include the Haiguang general-purpose processor (CPU) and the Haiguang coprocessor (DCU), and it has developed several new high-end CPU and DCU products on par with international mainstream products. The company began product design of Shensuan No. 1 in October 2018; the Haiguang DCU series Shensuan No. 1 has since achieved commercial application, and in January 2020 the company began R&D on the second-generation DCU, Shensuan No. 2.

The Haiguang DCU is a type of GPGPU. Its composition resembles a CPU with simplified structural logic but a far larger number of computing units. Its main functional modules include compute units (CUs), an on-chip network, caches, and various interface controllers. The DCU (deep computing unit) is a coprocessor the company designed and released on a general GPGPU architecture suited to compute-intensive and compute-acceleration fields. Compatible with a general "CUDA-like" environment and with international mainstream commercial computing and AI software, it has a rich software and hardware ecosystem and can be widely used in big data processing, artificial intelligence, commercial computing, and other applications. The Haiguang 8100 adopts an advanced FinFET process, and its performance in typical application scenarios reaches the level of comparable international high-end products, leading domestically. The DCU achieved commercial application in the second half of 2021.

The Haiguang DCU coprocessor is fully compatible with the ROCm GPU computing ecosystem. Because ROCm and CUDA are highly similar in ecosystem and programming environment, CUDA users can migrate to the ROCm platform quickly at low cost, which is why ROCm is called "CUDA-like". The Haiguang DCU can therefore adapt well to international mainstream commercial computing and AI software; with its rich software and hardware ecosystem it can be widely used in compute-intensive applications such as big data processing, AI, and commercial computing, mainly deployed in server clusters and data centers, providing high-performance, energy-efficient computing for high-complexity, high-throughput data processing tasks.

2. Jingjia Micro. Changsha Jingjia Microelectronics Co., Ltd. was established in 2006 and launched the first domestic GPU in 2015, becoming the first domestic enterprise to successfully develop a GPU chip with fully independent intellectual property and bring it to engineering application; it listed on the ChiNext board of the Shenzhen Stock Exchange in 2016. The company's business spans graphics display, graphics processing chips, and small specialized radar, with products covering integrated circuit design, graphics and image processing, computing and storage products, and small radar systems.

The company has a long history of GPU R&D and deep technical accumulation. At its founding it undertook the graphics acceleration task for Shenzhou-8, laying a solid foundation for graphics processor design. In 2007 it independently developed the M9 chip driver for the VxWorks embedded operating system, solving that system's 3D graphics processing problems and Chinese character display bottleneck and gaining bottom-up control of its graphics display products. In 2015, the JM5400 GPU chip with fully independent intellectual property arrived, featuring high performance and low power consumption. The company has since shortened its R&D cycle continuously: the JM7200 made great progress in design and performance, moving from the specialized market to the general market, and the JM9 series is positioned in the mid-to-high-end market as a general-purpose chip for high-end display and computing needs.

The JM7200 uses a 28nm CMOS process with a core clock up to 1300 MHz and 4 GB of memory; it supports OpenGL 1.5/2.0, efficiently accelerates 2D and 3D graphics, supports a PCIe 2.0 host interface, and is adapted to domestic CPUs and operating systems, suiting personal office display systems and high-reliability embedded display systems. The JM9 series targets the mid-to-high-end general market, meeting high-performance display and AI computing needs in geographic information systems, media processing, CAD, gaming, and virtualization. In May 2022, the second chip of the JM9 series completed preliminary testing.

3. Biren Technology. The company's main business is high-end general-purpose intelligent computing chips. Founded in 2019, it is committed to developing an original general-purpose computing system, building an efficient software and hardware platform, and providing integrated solutions for intelligent computing. In terms of development path, the company focuses first on cloud general-purpose intelligent computing, gradually catching up with existing solutions in AI training and inference, graphics rendering, and other fields to achieve a domestic breakthrough in high-end general-purpose intelligent computing chips. In March 2022, its first general-purpose GPU chip, the BR100, was successfully brought up; it was officially released in August 2022, setting a claimed new record for general-purpose GPU computing power.

The company's product system covers the BR100 series of general-purpose GPU chips, the BIRENSUPA software development platform, and a developer cloud. The BR100 series is the core product line, currently including the BR100 and BR104 chips. Developed for general computing scenarios such as AI training, inference, and scientific computing, the BR100 series is mainly deployed in large data centers and relies on Biren's original architecture to provide energy-efficient, highly versatile accelerated computing power.

The BR100 series has several core advantages in performance and security. To build GPU chips with advanced performance and strong competitiveness, the company has taken numerous technical measures, including a 7nm process with innovative chiplet and 2.5D CoWoS packaging to balance yield and performance, and support for PCIe 5.0 and the CXL protocol with bidirectional bandwidth up to 128 GB/s. In 2022, the company officially launched the BR100, whose claimed peak compute exceeds three times that of the flagship products then sold by comparable international vendors, a very significant competitive advantage. For security, the BR100 series supports up to 8 independent instances, each physically isolated with dedicated hardware resources and able to run independently.

4. Moore Threads. Moore Threads is an integrated circuit high-tech company focused on GPU chip design. Founded in October 2020, it focuses on the R&D and design of full-featured GPU chips and related products, supporting combined workloads such as 3D high-speed graphics rendering, AI training and inference acceleration, ultra-high-definition video encoding and decoding, and high-performance scientific computing, balancing computing power and efficiency to provide powerful acceleration capability for China's technology ecosystem partners. Under its vision of "meta-computing" empowering the next-generation Internet, the company continues to develop new GPU generations for meta-computing applications, building an integrated platform for visual computing, 3D graphics computing, scientific computing, and AI computing, and establishing an ecosystem based on cloud-native GPU computing to help drive the development of the digital economy.

The company's product system mainly includes hardware products such as the MTT S60, MTT S2000, and MTT S100; software products such as the MT Smart Media Engine, MT GPU Management Center, MT DirectStream, and MT OCR; and other products such as the MUSA unified system architecture, the DigitalMe digital human solution, and meta-computing application solutions.

The MTT S60 graphics card is built on the "Sudi" core chip based on the MUSA architecture, using a 12nm process with 2048 MUSA cores, single-precision compute up to 6 TFLOPS, and 8 GB of video memory, supported by the MUSA software runtime, drivers, and other software tools. With these hardware specifications, the MTT S60 shows advantages across application scenarios: rich graphics APIs, 4K/8K ultra-high-definition display, leading hardware video encode/decode capability, and general AI function support.

The MTT S2000 uses a 12nm process with 4096 MUSA cores, up to 32 GB of video memory, and single-precision compute up to 12 TFLOPS; it supports multi-channel HD video encoding and decoding for H.264, H.265, and AV1, and acceleration of a wide range of AI model algorithms. It also adopts passive cooling and a single-slot design to suit high-density GPU configurations in data centers. The MTT S2000 is already compatible with x86 and Arm CPU architectures and mainstream Linux distributions, and the company has established partnerships with server makers including Inspur, H3C, Lenovo, and Tsinghua Tongfang, with the product ecosystem continuing to improve. As the company's GPU for the data center, beyond its ecosystem the MTT S2000 offers the advantages of a full-featured GPU, rich graphics API support, and green computing. With its multi-dimensional compute and improving ecosystem, the MTT S2000 is expected to help the company empower PC cloud desktops, Android cloud gaming, audio and video cloud processing, cloud Unreal/Unity application rendering, and AI inference computing.

7. Future prospects.

(1) Some manufacturers are expected to see explosive growth.

As the core foundation of data computing, the GPU occupies a high strategic position and receives great national attention; against the backdrop of Sino-US technology friction, independence and controllability are imperative. In terms of growth, the global market space is vast, the domestic market has reached the tens of billions, and growth is accelerating with rising downstream demand. With digitalization driving total demand and the localization trend layered on top, the domestic GPU industry faces opportunities in both total volume and share, and domestic GPU manufacturers are accelerating their development. Given the large market demand and broad localization space, excellent manufacturers are scarce and growing fast, and some are expected to grow explosively.

(2) China's GPU market will grow rapidly and is expected to bring faster growth to the corresponding segments.

Huge demand and a gradually maturing industry point to broad room for development. With new scenarios such as artificial intelligence, cloud gaming, and autonomous driving driving explosive demand growth, China's GPU market can be expected to grow rapidly, and the new market space may exceed that of the PC market. Compared with traditional IT scenarios such as PCs, China competes at the same level as the leading countries in artificial intelligence and autonomous driving, and its huge market is expected to bring faster growth to the corresponding segments.