
AI is "sucking up" the world's electricity, and the choice between computing power and power is being made

Author: One Zero Society Loves Science
AI is "sucking up" the world's electricity, and the choice between computing power and power is being made

01

How much energy does AI actually consume?

Data and computing power are the twin engines driving AI's rapid development, and anyone following the industry can feel how massive datasets and high-powered chips have fueled the "war of a hundred models." But while AI has transformed how humans work and produce, it has also put enormous strain on the global power system.

In 2020, pre-training OpenAI's GPT-3 large language model, with its 175 billion parameters, consumed nearly 1,300 megawatt-hours of electricity, that is, 1.3 million kilowatt-hours, enough to power 130 American households for a year.

Since then, the parameter count for GPT pre-training has grown from 175 billion to 1.8 trillion for GPT-4 and, reportedly, 10 trillion for GPT-5. As generative AI applications keep broadening, power consumption will only continue to climb.


Training parameter counts reported for some large models

Ordinary consumers are already feeling the effects. According to data released by the U.S. Department of Labor in early April, U.S. electricity prices rose 5% year-on-year in March, outpacing gasoline, and the main factor behind the increase was new demand for electricity from AI.

Dan Yergin, vice chairman of S&P Global, notes that U.S. electricity demand has surged over the past two years, with AI and data centers of every kind the fastest-growing sources of that demand. Yet current U.S. generating capacity falls far short of the need, and given the long approval timelines for power projects, the supply shortage will be difficult to ease in the short term.

In February this year, Tesla CEO Elon Musk said at the Bosch Group's "Bosch Connected World" conference that AI's constraints are predictable: "A year ago, I predicted a shortage of silicon, that is, a shortage of chips. The next shortage is electricity. Maybe by next year we won't have enough power to run all the chips."


Earlier, OpenAI CEO Sam Altman had likewise put his faith in building nuclear fusion and nuclear fission power plants, pointing out that AI will consume more electricity than people imagine and that energy breakthroughs must be achieved to support future AI iterations.

Listing figures alone can still feel abstract, though. Why should crunching parameters, or chatting with an AI bot, have anything to do with energy?

02

An inescapable heat cost

The best proof that processing information carries an energy cost is the heat a computer gives off while it works. That running computers get hot is a familiar phenomenon, and it follows directly from how computers work.

In 1961, Rolf Landauer, a physicist at IBM, calculated in a paper the theoretical efficiency of a "perfect computer" that loses no energy to resistance. Even such a computer, he found, must waste some energy, because a computer is still a machine, one that stores and processes information in bits, and any machine must obey the second law of thermodynamics, which says that disorder (a quantity called entropy) never decreases in a closed system.


Landauer's principle states that even the simplest computational process inevitably incurs a thermodynamic cost

Existing classical computers are almost all irreversible: their information-processing operations are logically irreversible, so information is continually destroyed, and that destruction must show up as an entropy increase in the physical world, which costs energy. Landauer held that the reduction in the computer's informational entropy can only be paid for with energy dissipated as heat.

By his calculation, even the simplest computational step, such as erasing a single bit, inevitably incurs a tiny thermodynamic cost. In other words, whenever the information stored in a computer changes irreversibly, a trace of heat is released into the surrounding environment.

How much heat is released also depends on the computer's temperature at the time: the higher the temperature, the more heat each operation gives off. That is part of why servers in today's data centers are paired with dedicated cooling systems, often water cooling, to carry the heat away.
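To put a number on that floor, here is a minimal sketch in Python of Landauer's bound, E = k_B · T · ln 2 per erased bit. The constant and formula are standard; the temperatures are just sample values.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, in joules per kelvin

def landauer_bound_joules(temperature_kelvin: float) -> float:
    """Minimum heat dissipated to erase one bit: E = k_B * T * ln(2)."""
    return K_B * temperature_kelvin * math.log(2)

# At room temperature (~300 K) the bound is about 3e-21 J per bit,
# and it grows linearly with temperature: hotter chips pay more per erasure.
for t in (300, 350, 400):
    print(f"T = {t} K: {landauer_bound_joules(t):.3e} J per bit")
```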


Today's electronic computers consume hundreds of millions of times more energy than Landauer's theoretical floor when performing computing tasks, and researchers keep searching for more efficient ways to compute, including the long, unrelenting pursuit of room-temperature superconducting materials.

Superconducting materials have zero electrical resistance, so current can flow through them without energy loss. Circuits built from superconductors would therefore generate no resistive heat, eliminating that part of the energy cost of processing information, while "room-temperature" superconductivity would also dispense with the ultra-low-temperature refrigeration such materials normally demand, which itself typically consumes a great deal of energy.

Applied to large AI models, Landauer's principle yields a very simple inference: the more parameters and the more data to be processed, the more computation is required, the more energy is consumed, and the more heat is released. In the pre-training stage of a large model, the computer is first "fed" massive amounts of text data; the input is pushed through the tuned model architecture to produce an output, and the model's parameters are then adjusted, over and over, according to the gap between the output and the expected result.
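As an illustration of that loop, here is a schematic sketch in PyTorch. The tiny two-layer model and the random token batches are placeholders invented for this example, not GPT's real architecture or data; the point is the repeated cycle of forward pass, loss comparison, and parameter update that the paragraph describes.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, seq_len, batch_size = 1000, 64, 32, 16

# Stand-in for a real transformer stack: embed tokens, project back to vocab.
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):  # real pre-training runs for vastly more steps
    tokens = torch.randint(0, vocab_size, (batch_size, seq_len))   # "fed" input
    targets = torch.randint(0, vocab_size, (batch_size, seq_len))  # expected output

    logits = model(tokens)                       # try to generate an output
    loss = loss_fn(logits.view(-1, vocab_size),  # gap between output and target
                   targets.view(-1))

    optimizer.zero_grad()
    loss.backward()    # compute a gradient for every parameter
    optimizer.step()   # adjust the parameters, then repeat
```

Every pass through that loop touches every parameter, which is why the energy bill scales with model size.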


Electricity consumed per 1,000 queries, by AI application type

By Schneider Electric's estimate, 80% of the AI load in data centers comes from the inference stage and only 20% from training. In the inference stage, the trained model parameters are loaded first, the text to be inferred over is preprocessed, and the model then generates output according to the language patterns it has learned. Either way, to the computer every stage is a process of reorganizing information.
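The inference stage looks roughly like the sketch below; the placeholder model matches the training sketch above, and the weights file name is hypothetical. Parameters are loaded once, and each query is served by a single forward pass with gradient tracking switched off.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, seq_len = 1000, 64, 32
model = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                      nn.Linear(embed_dim, vocab_size))
# model.load_state_dict(torch.load("weights.pt"))  # load trained parameters
model.eval()

with torch.no_grad():  # no gradient bookkeeping, but each query still costs compute
    query = torch.randint(0, vocab_size, (1, seq_len))  # preprocessed query tokens
    logits = model(query)
    next_token = logits[0, -1].argmax()  # most likely next token
```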

But this heat cost of reshuffling information is only a drop in the bucket of AI's energy consumption; the far larger share is spent in the integrated circuits themselves.

03

The cost of chips

Integrated circuits are better known as chips. As information is processed, the current flowing through a chip meets resistance, and the resulting power loss is given off as heat.

A nanometer-process chip often packs hundreds of millions, even billions, of transistors working in concert. Each transistor can be regarded as a tiny voltage-controlled switch; connected in series or parallel they implement logic operations, and their two states, "on" and "off," represent the "1" and "0" at the foundation of binary computing.


Ohm's law tells us that controlling the voltage controls the flow of electrons into and out of a circuit, which is what constitutes a current, and that resistance is always present. Joule's law then says the heat generated is proportional to the square of the current, as well as to the conductor's resistance and the time the current flows. A single transistor is tiny and gives off negligible heat, but an NVIDIA A100 GPU alone carries 54 billion transistors, and at that scale the heat adds up.
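Here is a rough illustration of Joule's law, Q = I² · R · t, at chip scale. The current, resistance, switching time, and activity rate below are made-up magnitudes chosen only to show how billions of negligible contributions stack up to hundreds of watts; real transistors differ.

```python
def joule_heat(current_amps: float, resistance_ohms: float,
               seconds: float) -> float:
    """Joule's law: Q = I^2 * R * t, in joules."""
    return current_amps ** 2 * resistance_ohms * seconds

# One hypothetical switching event dissipates almost nothing...
per_event = joule_heat(current_amps=1e-6, resistance_ohms=7e3, seconds=1e-9)
print(f"single event: {per_event:.1e} J")  # ~7e-18 J

# ...but 54 billion transistors, each switching (say) once per nanosecond,
# add up to a power draw in the hundreds of watts.
watts = per_event * 54e9 * 1e9
print(f"chip-wide dissipation: {watts:.0f} W")  # ~378 W
```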

"If you put more than 100,000 Nvidia H100 GPUs in a single state in the United States, the power grid in that state will collapse immediately. This is the information revealed to the media by an engineer at Microsoft some time ago, and his job happens to be to train the new GPT-6 large model in a data center jointly established by Microsoft and OpenAI.

The NVIDIA H100 GPU is far more powerful than the A100 that ChatGPT originally ran on: designed for AI computing, it integrates 80 billion transistors and is equipped with a Transformer Engine optimized for the basic architecture of large models such as GPT, making large-model training up to 6 times faster. Its energy consumption has kept pace.

According to a report by market research firm Factorial Funds, OpenAI's text-to-video model Sora would need at least 720,000 H100s at peak traffic. Each H100 draws about 700 watts of power and offers up to 60 TFLOPS of theoretical peak performance, that is, 60 trillion single-precision floating-point operations per second, and every one of those operations involves a great many transistor switches.
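The aggregate implication of those figures is easy to check with back-of-the-envelope arithmetic, using only the numbers quoted above:

```python
h100_count = 720_000      # H100s Sora reportedly needs at peak traffic
watts_per_h100 = 700      # approximate power draw per card
flops_per_h100 = 60e12    # quoted single-precision peak, operations per second

total_power_mw = h100_count * watts_per_h100 / 1e6
total_ops_per_s = h100_count * flops_per_h100
print(f"peak power draw: {total_power_mw:.0f} MW")           # ~504 MW
print(f"aggregate compute: {total_ops_per_s:.2e} ops/second")
```

Half a gigawatt at peak is on the order of the entire output of a mid-sized power plant.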


Why does training AI demand ever more powerful GPUs? This goes back to where we started: large models are simply enormous, with parameter counts reaching the trillion level, their datasets must be iterated over again and again, and on each pass the values of hundreds of billions of parameters must be computed and adjusted.

One way out is to change the physics of the hardware and break through the limits of Moore's law. As mentioned in our previous article, that means replacing the basic "silicon chips" of modern computers with new materials, such as "carbon-based" chips built from graphene or carbon nanotubes. The other way is to look outward for an energy breakthrough. The "artificial sun" of fusion is still too far away, so the nearer bet is on advances in wind and solar power and in energy-storage technology.
