
Behind the "gold-swallowing beast" ChatGPT: the urgent demand for AI computing power

In the past two months, netizens around the world have been enthusiastically playing with ChatGPT, but the first to buckle under the strain is ChatGPT's owner.

Looking further ahead, OpenAI has announced ChatGPT Plus, a paid subscription version priced at $20 per month.

While OpenAI says it will continue to offer a free version, it frames the paid plan as a way to "help as many people as possible use the free service." The New York Times also noted that "during peak hours, the number of visitors to the free version will be limited."

Clearly, charging will be an inevitable choice for the long-term development of AI services such as ChatGPT.

The root cause is that a ChatGPT that keeps "getting smarter" needs enormous spending to support it. Computing power is the largest cost item, and the one where no corners can be cut.

So, just how much computing power does it take to support ChatGPT?

"Gold-swallowing beast" ChatGPT

Computing power consumption

ChatGPT's consumption of computing power falls into three main scenarios:

The first is model pre-training, the scenario in which ChatGPT consumes the most computing power.

ChatGPT is built on a pre-trained language model. Under the Transformer architecture, pre-training can process all tokens of an input at once according to context, enabling massively parallel computation.

By stacking multiple decoder modules, the number of layers in the model grows, and the number of parameters it can carry grows with it. Correspondingly, more computing power is required to train the model.
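To make that scaling concrete, here is a minimal back-of-envelope sketch of how layer count and hidden size drive parameter count, using the common approximation of about 12 × n_layers × d_model² weights for a decoder-only Transformer stack plus the token embeddings; the formula is a rough community heuristic, not OpenAI's own accounting:

```python
def transformer_param_estimate(n_layers: int, d_model: int,
                               vocab_size: int = 50_257) -> int:
    """Back-of-envelope parameter count for a decoder-only Transformer.

    Each block carries ~4*d_model^2 attention weights and ~8*d_model^2
    feed-forward weights (the common 12*L*d^2 heuristic), plus the token
    embedding matrix; biases and layer norms are ignored.
    """
    block_params = 12 * n_layers * d_model ** 2
    embedding_params = vocab_size * d_model
    return block_params + embedding_params

# GPT-3's published configuration: 96 layers, hidden size 12288.
print(f"{transformer_param_estimate(96, 12288) / 1e9:.1f}B")  # ~174.6B
```

Doubling the layer count in this estimate roughly doubles the block parameters, which is why deeper stacks demand proportionally more training compute.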

According to the paper "Language Models are Few-Shot Learners" published by the OpenAI team in 2020, training a GPT-3 model with 174.6 billion parameters requires about 3640 PFlop/s-days of compute.

In other words, at a sustained rate of one quadrillion floating-point operations per second (1 PFlop/s), the training run would take 3640 days.
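To make the unit concrete, a short sketch of the conversion: one PFlop/s-day is 10^15 operations per second sustained for 24 hours, so 3640 PFlop/s-days is about 3.1 × 10^23 total floating-point operations:

```python
PFLOPS = 1e15            # floating-point operations per second
SECONDS_PER_DAY = 86_400

def pflops_days_to_flops(pflops_days: float) -> float:
    """Convert a PFlop/s-day figure into a total operation count."""
    return pflops_days * PFLOPS * SECONDS_PER_DAY

total = pflops_days_to_flops(3640)
print(f"{total:.2e} total FLOPs")  # ~3.14e+23 operations
```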

Considering that the model behind ChatGPT is fine-tuned from GPT-3.5, and that GPT-3.5 increases both the parameter count and the training data beyond GPT-3's 174.6 billion parameters, training ChatGPT is estimated to require at least about 3640 PFlop/s-days of compute.

A Soochow Securities research report argues that ChatGPT's improvements come mainly from larger models and the corresponding increase in computing power.

From GPT to GPT-2 to GPT-3, the parameter count grew from 117 million to 175 billion and the pre-training data grew from 5 GB to 45 TB; training GPT-3 alone cost as much as $4.6 million.

At the same time, model development rarely succeeds in a single pass; the development stage may require multiple rounds of pre-training, so the demand for computing power is continuous.

In addition, migrating a foundation model to a specific scenario, such as building a medical AI model on top of ChatGPT, requires secondary training on domain-specific data, which further increases the demand for training compute.
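One hedged way to size such secondary training is the scaling-law rule of thumb (Kaplan et al., 2020) that training costs roughly 6 FLOPs per parameter per token; the token count below is purely an assumed example, not a figure from the article:

```python
PFLOPS_DAY = 1e15 * 86_400   # one PFlop/s sustained for a day, in FLOPs

def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute: ~6 FLOPs per parameter per token
    (one forward plus one backward pass), per Kaplan et al. (2020)."""
    return 6 * n_params * n_tokens

# Hypothetical fine-tune: a 175B-parameter model on an assumed
# 5 billion tokens of domain (e.g. medical) text.
flops = training_flops(175e9, 5e9)
print(f"~{flops / PFLOPS_DAY:.0f} PFlop/s-days")  # ~61
```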

The second is the model iteration process.

From the perspective of iteration, the ChatGPT model is not static: it requires continuous tuning to stay in its best state for the application.

In this process, developers must on the one hand adjust model parameters to ensure the output is neither harmful nor distorted, and on the other hand run large- or small-scale iterative training based on user feedback and the PPO (Proximal Policy Optimization) strategy.

Model tuning therefore also adds to ChatGPT's computing power costs; the exact requirement and cost depend on how quickly the model iterates.

The third is the daily operation process.

In daily operation, the data processing demand generated by user interactions is another large computing power expense. Since ChatGPT serves a mass audience worldwide, the more people use it, the greater the bandwidth consumption and the higher the server costs.

According to SimilarWeb, the total number of visits to ChatGPT's official website in January 2023 was 616 million.

According to Fortune, each user interaction with ChatGPT costs about $0.01 in cloud computing power.

Based on this, ChatGPT's monthly operating cost is approximately US$6.16 million.

Combining the figures above: training a 174.6-billion-parameter GPT-3 model takes 3640 PFlop/s-days of compute and costs $4.6 million. Assuming the unit cost of computing power is fixed, ChatGPT's monthly operation is estimated to require about 4874.4 PFlop/s-days.
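The 4874.4 figure follows from simple proportionality, under the strong simplifying assumption that a dollar buys the same compute for inference as for training:

```python
visits_per_month = 616e6        # SimilarWeb, January 2023
cost_per_query = 0.01           # Fortune's per-interaction estimate, USD
gpt3_training_cost = 4.6e6      # USD for the 3640 PFlop/s-day run
gpt3_training_compute = 3640    # PFlop/s-days

monthly_cost = visits_per_month * cost_per_query            # $6.16M
usd_per_pflops_day = gpt3_training_cost / gpt3_training_compute
monthly_compute = monthly_cost / usd_per_pflops_day

print(f"${monthly_cost / 1e6:.2f}M/month")    # $6.16M
print(f"{monthly_compute:.1f} PFlop/s-days")  # ~4874.4
```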

For an intuitive comparison: using a data center with a total investment of RMB 3.02 billion and 500 PFlop/s of computing power as a reference, supporting ChatGPT's operation would take at least 7-8 such data centers, an infrastructure investment running into the tens of billions.

Of course, the infrastructure could be rented rather than built, but the pressure from computing power demand would still be enormous.

And as manufacturers at home and abroad enter the market to develop similar large models, demand for computing power will climb further.

The era of AI computing power hegemony has arrived

Demand for model training compute is growing faster than chip performance, and an era of computing power hegemony may be on its way.

According to OpenAI's calculations, the training compute demanded by the world's leading AI models has doubled every 3-4 months since 2012, with the compute required for headline training runs growing by as much as 10x per year.

Moore's Law states that chip computing performance doubles roughly every 18-24 months.

The data show that from 2012 to 2018, the compute spent on training AI grew 300,000-fold, while Moore's Law would have delivered only a 7x increase over the same period.
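Both growth rates can be sanity-checked with a quick calculation: a 300,000x increase over the six years from 2012 to 2018 implies a doubling time of roughly four months, consistent with the 3-4 month figure above, while doubling every ~24 months yields about 8x (close to the cited 7x) over the same span:

```python
import math

years = 6                  # 2012-2018
ai_growth = 300_000        # reported growth in AI training compute

# Doubling time implied by 300,000x growth over six years.
doublings = math.log2(ai_growth)
print(f"doubles every {years * 12 / doublings:.1f} months")   # ~4.0

# Moore's Law over the same span, at one doubling per ~24 months.
print(f"Moore's Law: {2 ** (years * 12 / 24):.0f}x")          # 8x
```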

The mismatch between the growth of AI training compute demand and the growth of chip performance is therefore likely to drive rapid expansion of computing power infrastructure.

And given the key role computing power plays in training results, model developers with more abundant computing power resources will be able to train better AI models.

Hence the saying now circulating: AI has entered a new era of computing power hegemony, in which everyone must bring a thousand or ten thousand times more computing power to train the world's best algorithms.

So every player in the race must answer the same question: how to cover the cost of computing power?

In China, the answer lies in the "East Data, West Computing" project now being rolled out at full speed across the country.

The data show that China's computing power industry has grown rapidly, averaging more than 30% annual growth over the past five years and ranking second in the world in total computing power.

In the course of that development, however, it still faces problems such as low per-capita computing power, difficulty meeting on-demand computing needs, and insufficient breadth and depth in computing power applications.

The national "East Data, West Computing" project therefore optimizes the layout of data center construction by building a new national integrated computing power network, guiding eastern computing demand westward in an orderly way, and using the west's resource advantages to provide low-carbon, low-cost, high-quality computing power for the development of a digital China.

For the AI industry, "East Data, West Computing" can also become "East Data, West Training": the huge demand for training compute can be shifted wholesale to western data centers, where computing power is cheaper and economies of scale are greater.

Correspondingly, the data centers that host such training will themselves be purpose-adapted to it, for example by deploying servers packed with AI training chips and redesigning their power supply, cooling structure, and cabinet layout.

This also points to new directions for the future development of data centers.

Data center construction will bid farewell to the one-size-fits-all era of general-purpose facilities and enter a scenario-guided, application-oriented era of specialization, with models such as "East Data, West Training", "East Data, West Rendering", and "East Data, West Storage" becoming the mainstream.

For now, China's computing power industry is still growing rapidly.

According to the "2022-2023 Chinese Intelligent Computing Power Development Assessment Report" jointly released by IDC and Inspur Information, compared with the total computing power of 135EFLOPS in 2020, the scale of intelligent computing power in mainland China will nearly double in 2022, reaching 268EFLOPS, exceeding the scale of general computing power. It is expected that the compound annual growth rate of the scale of mainland intelligent computing power will reach 52.3% in the next five years.

Looking ahead, China should further strengthen the construction of supercomputing centers, intelligent computing centers, and edge data centers, keep meeting the diverse intelligent-scenario needs of governments, industries, enterprises, and even individuals, and use computing power to empower the high-quality development of thousands of industries, from smart cities to smart healthcare and smart agriculture.

Beyond that, vigorously strengthening the capacity to produce independently controllable high-end chips, striving to leapfrog the field in quantum chips, and stepping up the training of computing power talent are also important means of keeping China's AI computing power at the leading edge.
