
The wind of artificial intelligence continues to blow, and AI chips ride the wind

Author: Foresight Think Tank

Artificial intelligence is on the rise again, and the industry has broad room for development

This wave of artificial intelligence was set off by ChatGPT, with large language models (LLMs) and generative AI applications as the entry point. Since Google introduced it in 2017, the Transformer has not only powered explosive C-side products like ChatGPT but has also long been widely used in natural language processing, computer vision, autonomous driving, and other fields. Technology companies at home and abroad continue to increase investment: giants and start-ups alike, including Google (GOOGL US), Meta (META US), Microsoft (MSFT US), ByteDance (unlisted), and Baidu (BIDU US), are hoping to get a piece of the pie, while non-technology companies are also steadily committing talent, technology, and resources. According to Bloomberg Intelligence forecasts, generative AI could grow from less than 1% of total IT hardware, software, services, advertising, and gaming spending to 12% by 2032.


ChatGPT (Chat Generative Pre-trained Transformer) has attracted global attention since its release in November 2022, passing 1 million registered users within 5 days and reaching 100 million monthly active users just two months later. ChatGPT has brought generative AI's multimodal applications in text, images, video, and other fields into the view of mass C-end users. However, we believe that if language models stay at C-end applications and merely entertain netizens, their significance is limited. For generative AI to become a technology that truly changes the world, its development must be matched by the landing of B-side applications. Microsoft has already released generative AI products such as Microsoft 365 Copilot as a first major commercial application. Relying on Microsoft's huge user base, product ecosystem, and usage scenarios, Copilot is expected to mark a new milestone for B-side AI applications and open new AI commercialization space for Microsoft. Bloomberg Intelligence predicts that the global downstream generative AI software market will grow to $279.9 billion by 2032, a ten-year compound annual growth rate of 69% from 2022 to 2032.
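
As a quick sanity check on that forecast, the ten-year CAGR can be inverted to recover the implied 2022 base of the market (a minimal sketch: only the $279.9 billion endpoint and the 69% rate come from the Bloomberg figure above; the base is our own back-calculation, not a reported number):

```python
# Back out the implied 2022 market size from the 2032 endpoint and CAGR.
end_value = 279.9e9   # Bloomberg Intelligence 2032 forecast, USD
cagr = 0.69           # ten-year compound annual growth rate
years = 10

base_2022 = end_value / (1 + cagr) ** years
print(f"Implied 2022 base: ${base_2022 / 1e9:.1f}B")  # ~ $1.5B
```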


We are optimistic that overall demand for AI chips will increase as B-side applications of large models and generative AI land

Since 2022, both the number of large models and their parameter counts have grown exponentially. Broadly, we believe the number of models and the training data they require are the key drivers of computing power demand, so we are optimistic that overall AI chip demand will increase as B-side commercial applications built on large models and generative AI land.
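
To make the link between model parameters, training data, and computing power concrete, a widely used rule of thumb from the scaling-law literature estimates training compute as roughly 6 FLOPs per parameter per training token. A minimal sketch, using publicly reported GPT-3-scale figures as illustrative inputs (the 100 TFLOP/s sustained throughput is an assumption, not a measured number):

```python
# Rule-of-thumb training compute: FLOPs ~ 6 * parameters * training tokens.
params = 175e9   # illustrative: GPT-3-scale parameter count
tokens = 300e9   # illustrative: ~300B training tokens

train_flops = 6 * params * tokens
print(f"Training compute: {train_flops:.2e} FLOPs")   # ~ 3.15e23

# At an assumed sustained 100 TFLOP/s per accelerator:
accelerator_days = train_flops / (100e12 * 86400)
print(f"~ {accelerator_days:,.0f} accelerator-days")  # tens of thousands
```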

Since OpenAI (unlisted) released the first-generation GPT (Generative Pre-trained Transformer) model with 117 million parameters in 2018, each iteration of the GPT models has been accompanied by a leap in parameter count.


Many Chinese and foreign technology giants are not to be outdone: Google, Meta, Baidu, and others have released large language models such as PaLM, LaMDA, Llama, and Wenxin Yiyan (ERNIE Bot). In January 2020, the OpenAI team's paper "Scaling Laws for Neural Language Models" proposed the "scaling laws": the performance of large models improves predictably as model parameters, dataset size, and compute grow. The team emphasized again in May 2023 that the scaling laws have yet to hit a bottleneck.
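
For reference, the paper expresses these scaling laws as power laws: test loss L falls smoothly as model size N, dataset size D, and compute C grow (exponents as reported in Kaplan et al., 2020, quoted approximately):

```latex
% Power-law fits from "Scaling Laws for Neural Language Models" (2020);
% N_c, D_c, C_c are fitted constants and the exponents are approximate.
\begin{align*}
L(N) &\approx \left(\frac{N_c}{N}\right)^{\alpha_N}, & \alpha_N &\approx 0.076 \\
L(D) &\approx \left(\frac{D_c}{D}\right)^{\alpha_D}, & \alpha_D &\approx 0.095 \\
L(C) &\approx \left(\frac{C_c}{C}\right)^{\alpha_C}, & \alpha_C &\approx 0.050
\end{align*}
```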

But we also note that PaLM 2, the new generation of Google's PaLM large model released at the I/O conference in May 2023, was trained on about 3.6 trillion tokens thanks to algorithmic improvements, roughly five times the 780 billion tokens of the previous-generation PaLM, yet its parameter count of 340 billion is smaller than PaLM's 540 billion.


"Large model" usually refers to self-supervised and pre-trained models with a large number of parameters, and the core technology behind it is the Transformer architecture, which is currently widely used in natural language processing fields such as text generation. Transformer was proposed in 2017 by the Google Brain team in the paper "Attention Is All You Need". The architecture is mainly used to process sequence data, mainly using a self-attention mechanism to give different weights to each element in the sequence, thereby capturing long-distance dependencies within the sequence. Before Transformer, deep learning models were trained more using supervised learning, so they required a lot of labeled data. Relatively speaking, the innovation of GPT models lies in the combination of pre-training close to unsupervised learning (specifically called "self-supervised learning") and a small number of supervised fine-tuning.


In large language models that require generalization capabilities, such as text generation, contextual semantic understanding, article revision, and abstract summarization, the Transformer architecture represents a major advance over earlier CNN and RNN network structures. It breaks through the computational limits imposed by the sequential nature of RNN (Recurrent Neural Network) models: through self-attention it can process all elements of a sequence at the same time, achieving efficient parallelization and higher computing speed.
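
The parallelization argument can be seen directly in code: an RNN must walk the sequence step by step because each hidden state depends on the previous one, while attention scores every pair of positions in one batched matrix product. A minimal sketch with toy dimensions (not a full model):

```python
import numpy as np

T, d = 128, 64
x = np.random.default_rng(1).normal(size=(T, d))
W_h, W_x = 0.9 * np.eye(d), 0.1 * np.eye(d)   # toy recurrence weights

# RNN: T strictly sequential steps; step t cannot start until step t-1 is done.
h = np.zeros(d)
for t in range(T):
    h = np.tanh(W_h @ h + W_x @ x[t])

# Attention: one matrix product covers all T x T position pairs at once,
# so the work can be parallelized across the whole sequence.
scores = x @ x.T / np.sqrt(d)
print(h.shape, scores.shape)   # (64,) vs. (128, 128)
```

On parallel hardware the single `scores` product maps to one wide kernel, while the RNN loop serializes into T dependent steps; this is exactly the property that makes Transformers efficient to train on GPUs.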

In a CNN (Convolutional Neural Network), by contrast, the number of operations needed to relate two positions grows with the distance between them, whereas the Transformer's self-attention directly computes the association between any two elements in the sequence and expresses their relationship through weights, giving the model richer global context and markedly better understanding of complex structure and semantics. Transformers are therefore considered applicable to much white-collar work, and against today's backdrop of high labor costs and urgent productivity needs, they may begin to penetrate fields such as office work, accounting, law, programming, and medical care. We can compare the Transformer model to the human right brain: it performs well at surface-level association and suits generative tasks that require creativity, but it still needs to strengthen the left brain's capacity for logical judgment.


The operating mode of the human brain's neural networks has always been the ultimate form pursued by artificial intelligence

By analogy with the human brain, the left brain mainly handles logical information processing, such as serial operations, numbers and arithmetic, analytical thinking, understanding, classification, and sorting, while the right brain handles parallel computing, multimodality, creative thinking, and imagination. Functionally, then, the left and right brains correspond to the CPU and the GPU. Just as humans can make the two hemispheres work together and mobilize the neural network as a whole, achieving that kind of coordination is the ultimate vision for AI.

As early as 2011, AMD compared the CPU and GPU to the human left and right brains in its product concept and, on that basis, proposed a heterogeneous CPU+GPU product strategy. Today, AMD's MI300A and NVIDIA's Grace Hopper (GH200) are both heterogeneous CPU+GPU integrations.

The GPU offers high computing power aimed at parallel workloads, but the CPU must handle control, scheduling, and instruction issue. On the AI training side, the CPU controls and issues instructions, directing the GPU to process data and complete complex floating-point operations such as matrix math. For inference over different data modalities, we believe the division of labor between CPU and GPU also differs, so deploying both provides greater computing support. When reasoning over speech, language, and text data, an AI model must recognize target tokens one by one and compute in order, which may better suit a CPU that excels at serial operations. But reasoning over images, video, and similar data (by analogy with human perception, every pixel enters the eye at once) requires massive parallel computing and may better suit a GPU: NVIDIA's L4 GPU, for example, can improve AI video performance by up to 120 times, and according to NVIDIA's testing the L4 is 99% more energy efficient than traditional CPU-based infrastructure.
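
The division of labor described above can be sketched schematically. The snippet below is purely illustrative: it uses NumPy as a stand-in for both devices, since the point is the control-versus-compute split rather than real device code:

```python
import numpy as np

def cpu_orchestrate(requests, weights):
    """CPU-style role: light, branchy, serial control flow that validates,
    batches, and dispatches work."""
    batch = np.stack([r for r in requests if r.shape == (512,)])
    return gpu_matmul(batch, weights)       # hand the heavy math to the "GPU"

def gpu_matmul(batch, weights):
    """GPU-style role: one wide, data-parallel matrix multiply per batch."""
    return batch @ weights

rng = np.random.default_rng(2)
requests = [rng.normal(size=512) for _ in range(32)]
W = rng.normal(size=(512, 512))
print(cpu_orchestrate(requests, W).shape)   # (32, 512): all requests computed together
```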


The AI inference market is large, but its computing power requirements are lower than training's, so we believe a diverse mix of chips will be used, and GPUs may win share as large models and multimodality take hold. At present, however, the inference side is still dominated by CPUs, and competition is intensifying as more players pile in. It is worth noting that data centers host many kinds of chips, and which chip a given AI workload should run on depends on the fit and cost-effectiveness discussed above; each chip type has its own advantages.


Are we at the "iPhone moment" of AI?

The concept of artificial intelligence dates back to the 1950s and 1960s. Many of the algorithms we are familiar with today, such as neural networks, existed 20-30 years ago but could not run efficiently for lack of computing power and data. With the application of GPUs to AI, the spread of cloud computing, and the generation and storage of massive data, AI technology has developed and been applied rapidly.

As for the view that "now is the iPhone moment for AI", we are more inclined to read it as describing an important breakthrough: GPT-style generative AI beginning to be applied on the B side and to liberate productivity. On the C side, AI has long been woven into our lives through applications such as Siri, the smartphone voice assistant, and facial recognition.


Generative AI will drive cloud giants to expand their hardware infrastructure

We believe that scaling up the size and performance of hardware is an inevitable requirement of the era of large AI models. Given that generative AI is currently implemented mainly through large-parameter models, training and inference demand enormous computing power and storage resources as model counts and data volumes grow, so the vigorous development of generative AI applications will drive demand for high-compute AI chips and cloud computing. According to Bloomberg Intelligence and IDC, the AI training and inference hardware market will reach $93 billion by 2024 and more than $600 billion by 2032.
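
To give a rough sense of the storage side of that demand, model weights alone scale linearly with parameter count and numeric precision. The back-of-the-envelope sketch below uses illustrative parameter counts and ignores activations, optimizer state, and KV caches, which add substantially more in practice:

```python
# Weight memory ~ parameters * bytes per parameter.
def weight_gb(params, bytes_per_param=2):   # 2 bytes per weight ~ FP16/BF16
    return params * bytes_per_param / 1e9

for name, n in [("175B-parameter model", 175e9), ("340B-parameter model", 340e9)]:
    print(f"{name}: ~{weight_gb(n):,.0f} GB of weights in FP16")
# 175B -> ~350 GB; 340B -> ~680 GB: far beyond a single accelerator's memory,
# which is one reason training and serving require clusters of AI chips.
```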


Cloud and internet giants are expected to keep increasing capital expenditures, with AI hardware as the focus area. Google, Microsoft, Amazon, and Meta said in their second-quarter earnings briefings:

• Microsoft FY23Q4: Capital expenditure (excluding finance leases) was US$8.943 billion, up 30.16% year-on-year; the company plans to continue increasing investment in data centers, CPUs, and GPUs;

• Google 23Q2: Capital expenditure increased 10% sequentially to $6.9 billion, mainly on servers and AI large-model computing. The increase was lower than Bloomberg consensus expected, mainly due to delays in data center construction projects, but the company expects investment in technology infrastructure to rise in the second half of 2023;

• Amazon 23Q2: Capital expenditure (including finance leases) was $11.455 billion, down 27% year-over-year. The company expects full-year 2023 capital expenditure to decline year-over-year to just over $50 billion, dragged by lower transportation investment, but it will continue to increase investment in AI and large language models to meet customer demand;

• Meta 23Q2: Capital expenditure (excluding finance leases) decreased 19% year-over-year to $6.134 billion, mainly due to lower non-AI server spending and delays that pushed certain projects and equipment deliveries into 2024; the company expects capital expenditure to increase in 2024 as investment in data centers, servers, and artificial intelligence advances.

Overall, in the first half of 2023 the internet giants represented by Google, Microsoft, Amazon, and Meta gradually increased AI-related capital expenditure, despite disturbances such as project delays, macro conditions, and other business planning. Looking ahead to 2024, AI infrastructure will be a key investment area, so we believe the increased AI capital expenditure of leading cloud vendors and internet giants will further support the AI industry trend. We also note that the Fed's steady rate hikes since 2022 led enterprises to cut data center spending; if the Fed stops raising rates, that shift, coupled with growing AI demand, is expected to boost the capital expenditure of technology giants and continue to drive volume for AI chips and other infrastructure.


The above content is for learning and communication only and does not constitute investment advice.

Selected report source: Wenku - Foresight Think Tank
