Academician Qian Depei: State-funded high-end computing power should be linked into a "network" as soon as possible

▲ Qian Depei, academician of the Chinese Academy of Sciences and professor of Beihang University. Photo: Courtesy of interviewee

Are higher performance and lower energy consumption two sides of the same computing power "coin", or are they the challenges China faces now that ChatGPT and other large models have set the computing power industry alight?

"These two are actually cause and effect of each other, and together they are the biggest challenge to our country's development of computing power," Qian Depei, an academician of the Chinese Academy of Sciences and a professor at Beihang University, told the Beijing News' Xinjing Think Tank. With developed countries such as the United States blockading China's high-tech industries, China can only embark on its own high-performance, low-energy computing path if it breaks through on both fronts.

The good news is that some companies are already taking on this heavy task.

On August 15, iFLYTEK released version 2.0 of its iFLYTEK Spark Cognitive Model. Liu Qingfeng, Chairman of iFLYTEK, said at the press conference, "iFLYTEK is working with Huawei to build a domestic computing power cluster for training models with ultra-large parameter counts, benchmarked against NVIDIA's A100 chip."

According to an inference benchmark released by the MLPerf organization, the NVIDIA A100 Tensor Core GPU delivers 237 times the cloud inference performance of Intel's most advanced CPUs. MLPerf is an internationally recognized AI performance benchmarking organization initiated by Turing Award winner David Patterson together with Google, Stanford University and Harvard University.

NVIDIA's official data shows that, at the same level of performance, GPU-accelerated systems consume 588 megawatt-hours less energy per month than CPU-only systems. Researchers can save more than $4 million by running the same workload for a month on a four-way NVIDIA A100 cloud instance rather than on a CPU-only system.

So what is China's current level of computing power, and can it meet the needs of different cities? Will building computing power really drive economic development, as public opinion holds? Xinjing Think Tank interviewed Academician Qian Depei, who responded to some of the hot issues surrounding the construction of computing power.

Qian Depei long served as overall leader of China's national major projects in high-performance computing. He presided over the formulation of their strategic goals and implementation plans, established the architecture and technical scheme of China's high-performance computers, and helped achieve their leapfrog development.

China's computing power has climbed four levels

Xinjing Think Tank: China's computing power now ranks second in the world. In going from weak to strong, how many stages has this process gone through? What landmark events marked the start of each new stage?

Qian Depei: Computing power develops mainly in response to application needs. High-performance computing in China has received long-term support from the national science and technology plans, which is how it has gone from following others, to running alongside them, and even to taking turns in the lead.

If you divide the history by computer performance, there have been four levels over the past 30 years, each 1,000 times faster than the one before: from G level (1 billion operations per second) to T level (1 trillion per second), then to P level (1,000 trillion per second), and on to E level (a billion billion per second). The typical systems were the early Dawning 1000 (G level) and, at the end of the last century, the Dawning 3000 (100 G level). Since the beginning of this century, China has successively developed several generations of supercomputers: the Lenovo DeepComp 6800 and Dawning 4000A (T level); Tianhe-1, Dawning 6000 and Sunway BlueLight (P level); Sunway TaihuLight and Tianhe-2 (100 P level); and then the E-level computers.
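The scale of those levels is easier to see as arithmetic. The following is a minimal Python sketch, not anything from the interview itself: the names `TIERS` and `step_ratios` are hypothetical, and the values are simply the standard giga, tera, peta and exa prefixes expressed as operations per second.

```python
# Performance levels named in the interview, in operations per second.
# TIERS and step_ratios are illustrative names, not from the source.
TIERS = {"G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}


def step_ratios(tiers):
    """Return the speed-up between each pair of consecutive levels."""
    names = list(tiers)
    return {f"{a}->{b}": tiers[b] // tiers[a] for a, b in zip(names, names[1:])}


ratios = step_ratios(TIERS)             # each jump is a factor of 1,000
total = TIERS["E"] // TIERS["G"]        # overall gain from G level to E level

for step, ratio in ratios.items():
    print(f"{step}: x{ratio:,}")
print(f"G->E overall: x{total:,}")
```

Four levels mean three jumps of 1,000 times each, so an E-level machine is about a billion times faster than a G-level one.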

In recent years, the development of a new generation of artificial intelligence has sharply increased the demand for computing power, especially in the past year, large model training has attracted more and more attention from all walks of life. At this stage, a group of intelligent computing centers were established, which are mainly computing power centers for artificial intelligence applications.

Xinjing Think Tank: In May 2021, the National Development and Reform Commission, the Ministry of Industry and Information Technology, the Cyberspace Administration of China and the National Energy Administration jointly issued a policy on building the "East Data, West Computing" project. Was that a milestone?

Qian Depei: The "East Data, West Computing" project should not be seen as a sign that computing power has reached a new stage. It is a national strategy, proposed in the context of the "dual carbon" goals, and its purpose is to address the uneven distribution of energy across the country, the mismatch between where computing power is located and where it is used, and uneven economic development. So the project is not only about developing computing power; it is also part of the country's new infrastructure.

Higher performance and lower energy consumption are a challenge

Xinjing Think Tank: At the national level, the state has planned the "East Data, West Computing" project. What problems can this new, integrated computing power network system solve? What challenges does it face?

Qian Depei: The stated purpose of the "East Data, West Computing" project is to build a new computing power network system, but its real starting point is whether carbon peaking and carbon neutrality can be achieved on schedule now that the "dual carbon" targets have been set. The energy consumption of IT systems is already quite high and is rising rapidly.

Energy is already tight in the east. The National Supercomputing Center in Shenzhen, for example, originally planned to install an E-level computer, but finally abandoned the plan because energy consumption would have risen from a few megawatts to 80 megawatts. So the "East Data, West Computing" project is both a long-term national strategy and a measure for solving some practical problems.

As for the challenges of the "East Data, West Computing" project, the key is to avoid its negative effects as far as possible. For example, computing power centers built in the west may face insufficient load, that is, not enough jobs to keep them busy. Everything has two sides. If the centers built in the west do not attract enough applications, they will sit idle, which is a waste, and the network foundation, talent pool and application demand in the west may not be sufficient. The last thing we want is to use the west's energy and electricity, damage its environment to some degree, and still fail to promote its economic development.

Xinjing Think Tank: What do you think is the biggest challenge facing the development of computing power in China? Is it the lack of chips with higher computing power, or the tension between higher energy consumption and carbon reduction?

Qian Depei: I think these two challenges are intertwined, and each is the cause and effect of the other. We judge a computer by whether it is fast, whether it is energy-efficient, and whether it is easy to use. We need computers with higher performance and lower energy consumption, but the U.S. blockade makes it impossible to buy high-end chips, that is, high-performance, low-energy chips such as NVIDIA's H100. NVIDIA has also recently released the GH200.

We can build our own computers, but without high-end chips their energy consumption will be higher. Energy consumption constrains how a computer can be built. You cannot simply pile on more processors (CPUs) to raise performance, because too many processors means energy consumption beyond what users can afford. So without high-end processors, the energy consumption of the whole computing system cannot be brought down.

So, higher performance versus lower energy consumption is actually the same challenge. We need to make breakthroughs in both aspects in order to embark on a high-performance, low-energy computing path in China.

"Distance" is not the challenge for "East Data, West Computing"

Xinjing Think Tank: There is also a concern that while the three major computing power hubs in Beijing-Tianjin-Hebei, Guangdong-Hong Kong-Macao and the Yangtze River Delta are relatively close to users, the four hubs in Gansu, Ningxia, Inner Mongolia and Guizhou are far from them. Will it be hard for those hubs to serve scenarios with demanding real-time requirements?

Qian Depei: There are different types of computing applications. Some have strict real-time requirements and some do not, and most numerical simulation applications do not. For example, an engineer who proposes a new design and needs to verify it through simulation can submit the job when leaving work and get the result the next morning, and that is good enough.

I think some people have a misconception that the machine must be on their desk or inside their own organization for them to use it whenever they like. That is not the case; many applications do not need such strict real-time performance. High-performance numerical simulation in particular usually runs as batch jobs that require no interactive response. But some people feel that a machine not under their own control is inconvenient. I think that is an illusion.

What really hinders remote use of computing is technology, such as network performance. If the network transmission rate is low, moving large volumes of raw data and result data will be slow, and most people have little tolerance for that kind of delay. This can be a constraint on the remote use of computing centers.
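To make the point about transmission rate concrete, here is a rough back-of-the-envelope sketch in Python. The function name `transfer_seconds`, the 1 TB data size and the link speeds are all illustrative assumptions and do not come from the interview; real transfers also depend on protocol overhead and congestion, which the optional `efficiency` factor only approximates.

```python
def transfer_seconds(data_bytes, link_bits_per_second, efficiency=1.0):
    """Estimate the time to move data over a link.

    efficiency is the assumed fraction of nominal bandwidth actually achieved.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return data_bytes * 8 / (link_bits_per_second * efficiency)


ONE_TB = 10**12  # hypothetical result set of 1 terabyte

slow = transfer_seconds(ONE_TB, 10**9)       # over an assumed 1 Gbit/s link
fast = transfer_seconds(ONE_TB, 10 * 10**9)  # over an assumed 10 Gbit/s link

print(f"1 Gbit/s:  {slow / 3600:.2f} hours")
print(f"10 Gbit/s: {fast / 60:.1f} minutes")
```

Under these assumptions the same data set takes over two hours on the slower link and under fifteen minutes on the faster one, which is the kind of gap that decides whether a remote center feels usable.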

The other constraint is service level, but this is not a prominent problem at present, because most computing power centers in the west were set up by the organizations that use them, such as China Telecom, China Mobile, Alibaba and JD.com. The centers are under their own control, so service levels do not get in the way of use.

In the long run, however, computing power centers should gradually move from providing raw computing power to providing application solutions, and at that point service-level issues are likely to become more prominent.

Xinjing Think Tank: Is an application with strict real-time requirements, such as autonomous driving, unsuited to distant computing power centers?

Qian Depei: Autonomous driving is another type of application. In fact, in the 1980s the United States ran self-driving trials in which cars thousands of miles away were controlled over optical fiber.

That was a technical demonstration, but I do not believe any operator will rely on remote computing power to control autonomous vehicles in the future. It is unrealistic in terms of both economics and technical feasibility, unless network latency falls so low that there is no difference between remote and local.

Xinjing Think Tank: Looking at individual cities, can the current distribution of computing power resources in China meet the needs of each city and its neighbors?

Qian Depei: If a network really takes shape, whether a supercomputing network or an intelligent computing network, and whether or not it is nationwide, the geographical distribution of computing power should not be a key factor unless the transmission network is too poor. As long as the application systems keep up, it can certainly meet the computing power needs of cities that are not near a computing power center.

For example, the next upgraded machines of the National Supercomputing Center in Wuxi will be installed in Xining, Qinghai, and managed there, because electricity is cheap there. So it does not matter where the machine is or whether it is close to a city.

What matters a great deal is whether the computing system can support your application and whether the corresponding application software is available. Without that software, the machine is useless even on your doorstep; with it, the machine can be used even if it is deployed in the west.

In addition, how a computing power center is operated matters. From an economic point of view, it comes down to whether the fees are reasonable and whether users see real benefits. If networking the computing power centers lowers users' costs, why would users not use them?

However, if a computing center charges unreasonably, or prices too high in a rush to recover its costs, users will choose not to use its resources. A computing power center without enough applications will eventually be unable to operate.

Investment in computing power does not necessarily bring GDP returns

Xinjing Think Tank: There are many claims about the role of computing power, for example that investment in it drives GDP growth. What impact do you think computing power has on China's economy and on its scientific and technological development?

Qian Depei: Computing power is a foundation for building an innovation-driven country, and it is a supporting technology. That has gradually become a consensus, although it was not seen that way twenty or thirty years ago. Today more and more people recognize its importance, especially in recent years.

But it is hard to say exactly how much GDP computing power can bring. The value of computing depends entirely on whether you actually accomplish your application with it; input does not guarantee output. So I have never been willing to quote the figures some institutions publish for the GDP return on computing power. I have always doubted how scientific these figures are, and I do not endorse this way of framing it.

Xinjing Think Tank: More than 10 years ago, the then newly established Ministry of Industry and Information Technology proposed the "integration of industrialization and informatization", which was also meant to help traditional industries transform and upgrade. How does that differ from today's emphasis on computing power?

Qian Depei: Informatization is a broad, general concept, and in the past two years people have talked more about digitalization. When traditional industries were told to informatize, it usually meant computerizing the operation and management of the enterprise: entering all kinds of company information into computers and managing it there. Paper records entered into a computer did need some processing, but the computing requirements were modest. Companies now need to rely on high-performance numerical simulation and design optimization to become competitive. The role of computing may go beyond traditional informatization.

For example, car and high-speed rail designs are now evaluated by computer simulation rather than by building prototypes for testing. In other words, with computing, the design optimization and performance improvement of a great many products no longer have to take place in the physical world but can be completed in virtual digital space. In this sense, computing plays a very important role in the transformation of enterprises from labor-intensive or energy-dependent production to higher value-added products and greater market competitiveness.

Computing power centers need to implement flexible charging mechanisms

Xinjing Think Tank: You mentioned the fees charged by computing power centers just now. Some people say that our computing power centers charge too much, especially for university researchers.

Qian Depei: The issue of fees is unique to China. In many developed countries, state-established computing power centers are basically free to use. That does not mean nobody pays, of course; otherwise the centers could not be maintained. Those countries have set up dedicated computing funds. Universities and research institutions that need computing power apply, and the relevant body reviews the request: if it is judged worthy of support, it is approved.

Many computing power centers in our country were established under the leadership of the Ministry of Science and Technology, but the ministry has no operating budget for them, and even where it has funds, they cannot be spent on running the centers. So who operates these centers, and where does the operating money come from? Some cities are well off. The Shanghai Municipal People's Government, for example, has specially allocated funds to its supercomputing center. But what should be done in places with weaker economies?

As a result, computing power centers are under great pressure to raise enough money to keep operating normally, and electricity accounts for a large share of their operating expenses. As far as I know, supercomputing centers currently charge as little as they can, and the fees do not even cover their electricity bills. Yet to a researcher or other user, the fees often feel relatively high. So the claim that computing power centers charge too much is relative.

In addition, I think computing power centers could adopt flexible charging mechanisms, such as a flat fee paid once a year for unlimited use, or different rates for different time periods. In short, I hope the problem can be solved through several channels. On the one hand, the state should set up dedicated funds to help the computing power centers we have built with such difficulty to operate normally. On the other, commercial capital should be brought into their operation, which can lower the fees users pay. Charging nothing at all is not realistic and, in the current situation, would breed new problems, such as people using our computing power resources to "mine".

The "China Computing Power Network" does not yet exist

Xinjing Think Tank: Which computing power centers in China (supercomputing centers, intelligent computing centers and so on) have been connected to the China Computing Power Network so far? How will this affect cities' demand for computing power?

Qian Depei: A "China Computing Power Network" does not yet exist. There has been a lot of publicity, but there is no such network in the true sense. If the "China Computing Power Network" is an infrastructure, who owns it and who operates it? No one yet.

But local, domain-specific and supercomputing-center "networks" do exist. After more than 20 years of development, the supercomputing centers have been networked, and a new-generation supercomputing Internet is now being planned. Some of the newer intelligent computing centers have also been networked, and their business model is still taking shape.

As for whether a "China Computing Power Network" can be built in the future, I am a little skeptical. Computing is somewhat different from electricity: without a grid, electricity can only be used locally. We now have two major grid operators, State Grid and China Southern Power Grid. So far, however, no one owns or operates a "China Computing Power Network".

Computing, by its very nature, has been discrete and distributed from the start. Computers were installed one by one in different places; only later, with networks, were they connected, and only then did resource sharing and unified scheduling emerge, and finally the form of an infrastructure.

Moreover, some computing power resources are funded and built by the state and others by enterprises. Different computing power centers have different owners and affiliations, so how do you unify them into one "network"?

It is just like cloud computing. Every large enterprise, including Alibaba, JD.com, Baidu and Tencent, operates its own cloud platform. They can develop interconnection between their clouds and can form a consortium, but it is hard to imagine JD.com handing its resources over to Alibaba to operate. As a business model, that is impossible.

Over the past few years we have been working on one thing: linking the high-end computing power that bears on the national economy and people's livelihood and on the country's innovation and development, most of it funded by the state, into a "network" that operates as an infrastructure supporting scientific research. A large amount of other computing power will probably still have to be left open to market competition, with the state providing policy guidance.

Beijing News reporter Xiao Longping

Edited by Zheng Weibin

Proofreader Liu Jun