
China's three major operators vie for position, setting off a global "arms race" in AI computing centers

Author: Semiconductor Industry Vertical

In recent years, artificial intelligence has seen explosive growth led by generative AI models. On November 30, 2022, OpenAI launched ChatGPT, an AI conversational chatbot whose remarkable natural language generation capabilities attracted worldwide attention and surpassed 100 million users within two months. A wave of large models followed at home and abroad: Gemini, Wenxin Yiyan, Copilot, LLaMA, SAM, Sora, and others sprang up, and 2022 has come to be known as the first year of the large model era.

As a result, AI is regarded as a revolutionary technology of strategic importance to governments around the world. Data show that this year the enterprise adoption rate of generative AI in mainland China reached 15%, with a market size of about 14.4 trillion yuan. Adoption of generative AI has grown rapidly across four major industries: manufacturing, retail, telecommunications, and healthcare.

As one of the three key elements driving the development of artificial intelligence, computing power is known as the "engine" and core driving force of AI. Computing power refers to a device's capacity to process data and produce a specified output. Chen Yuanmou, a senior analyst at the Strategic Development Research Institute of the China Telecom Research Institute, has said that each one-point increase in the computing power index lifts the digital economy by about 0.36 percentage points and GDP by about 0.17 percentage points.
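As a rough illustration of the elasticity Chen cites, the sketch below simply applies the two coefficients to a hypothetical index gain (the function name and the 5-point figure are illustrative, not from the report):

```python
# Illustrative sketch: each 1-point rise in the computing power index is
# said to lift the digital economy by ~0.36 percentage points and GDP by
# ~0.17 percentage points.
def growth_pull(index_points: float) -> dict:
    """Estimated extra growth, in percentage points, for a given index gain."""
    return {
        "digital_economy_pp": round(index_points * 0.36, 2),
        "gdp_pp": round(index_points * 0.17, 2),
    }

# A hypothetical 5-point gain in the index:
print(growth_pull(5))
```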

The shortage of computing power has even become a key constraint on AI research and application. Against this backdrop, the United States has banned the sale of high-end computing products to China and placed Huawei, Loongson, Cambricon, Sugon, Hygon, and other companies on its Entity List, restricting their access to advanced chip manufacturing. The process nodes that can reach mass production at scale in China lag the international state of the art by two to three generations, as does the performance of core computing power chips.

01

The shortage of computing power has given rise to a huge market for computing power centers

In the 21st century, mobile computing and cloud computing have boomed. With the advent of cloud computing, computing power can "flow" through the network to wherever it is needed, just like water and electricity.

The rise of artificial intelligence has placed greater demands on computing power; specialized hardware such as GPUs (graphics processing units) and TPUs (tensor processing units) has greatly improved processing efficiency and provided strong support for the training and inference of machine learning models.

The computing power shortage, layered on top of this demand, has fostered a huge market for computing power centers. A computing power center is a facility equipped with high-performance computing, large-scale storage, high-speed networking, and other infrastructure, designed to provide large-scale, efficient, and low-cost computing services.

Taking China as an example, localities across the country are accelerating the deployment of public computing infrastructure. Shanghai has built the country's first computing power trading platform and artificial intelligence public computing service platform; Guangzhou has built China's first computing resource release and sharing platform. These public platforms have bridged the supply and demand sides of computing power.

At present, China has built national computing hub nodes in eight regions and planned ten national data center clusters to form a national computing network. By the end of 2023, there were 128 intelligent computing center projects in China, 83 of which disclosed their scale, totaling more than 77,000 P of computing power. A further 39 intelligent computing center projects have been put into operation in 2024.

02

There is still a gap in intelligent computing, and the three major operators are building out intelligent computing centers

In the past two years, large AI models have emerged one after another, and demand for intelligent computing has grown rapidly. Market consultancy IDC predicts that in 2026 China's intelligent computing power will enter the zettascale (ZFLOPS) range, reaching 1,271.4 EFLOPS. The "Action Plan for the High-Quality Development of Computing Infrastructure," issued by six government departments, sets out the construction rhythm of national computing power over the next three years. It notes a gap of 23 EFLOPS in intelligent computing construction between 2023 and 2024; by 2025, the national computing power target exceeds 300 EFLOPS, with intelligent computing to account for 35%, i.e. an intelligent computing target of 105 EFLOPS.
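The unit arithmetic behind those figures can be checked in a few lines; the constants are the published targets and forecasts cited above, everything else is just unit conversion:

```python
# Sanity-check the cited computing power targets.
# 1 EFLOPS = 1e18 FLOPS; 1 ZFLOPS = 1e21 FLOPS.
EFLOPS = 1e18
ZFLOPS = 1e21

total_2025 = 300          # 2025 national target, in EFLOPS
intelligent_share = 0.35  # planned share of intelligent computing
intelligent_2025 = total_2025 * intelligent_share
print(intelligent_2025)   # matches the stated 105 EFLOPS target

idc_2026 = 1271.4                    # IDC forecast for 2026, in EFLOPS
print(idc_2026 * EFLOPS / ZFLOPS)    # ~1.27 ZFLOPS, i.e. zetta-scale
```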

Against this backdrop, the three major operators have in recent years been actively building intelligent computing centers and making corresponding strategic deployments.

In providing professional intelligent computing infrastructure services, China Unicom has laid out a "1+N+X" intelligent computing capability system: one ultra-large-scale single intelligent computing center, N integrated training-and-inference hubs, and X localized intelligent inference nodes.

China Mobile has strengthened its "4+N+31+X" data center layout, covering computing resources across hotspot, central, and edge locations and building more than 1,000 edge nodes. In this system, "4" refers to the four hotspot business regions of Beijing-Tianjin-Hebei, the Yangtze River Delta, the Guangdong-Hong Kong-Macao Greater Bay Area, and Chengdu-Chongqing; "N" refers to the ultra-large data centers planned within the ten data center clusters of the national hub nodes; "31" refers to the ultra-large data centers planned by each province; and "X" refers to city-level data centers and aggregation equipment rooms.

China Telecom has put forward the concept of "cloud-network integration," forming a "2+4+31+X+O" computing power layout: integrated resource pools at two national cloud bases in Inner Mongolia and Guizhou; large-scale public clouds in four regions, including Beijing-Tianjin-Hebei; localized dedicated clouds in 31 provincial capitals and key cities; differentiated edge clouds at X nodes; and an overseas ("O") extension of the computing power system to countries along the "Belt and Road."

03

The United States, Europe and Japan have invested heavily to set off an "arms race" in AI computing power around the world

Currently, countries around the world are developing their own AI strategies and policies to promote the development of the AI industry.

In 2016, the United States' National Artificial Intelligence Research and Development Strategic Plan explicitly called for strengthening AI infrastructure, and the European Union set the same goal in its AI strategy published in 2018. Such infrastructure mainly comprises computing resources, data resources, and human resources. Following the United States, Japan issued three versions of its "Artificial Intelligence Strategy" in 2019, 2021, and 2022. In April last year, the Japanese government established an AI strategy group headed by Hideki Murai, an assistant to the prime minister, and composed of officials in charge of AI policy from the Cabinet Secretariat, the Ministry of Foreign Affairs, and the Digital Agency.

Under a series of strategic deployments, the United States, Japan, Europe and other countries and regions are also vying to build computing power centers, setting off an "arms race" of AI computing power around the world.

In November last year, the National Supercomputing Center and a number of leading AI companies jointly formed the Trillion Parameter Consortium (TPC). The consortium brings together scientists from across the globe to advance AI models for scientific discovery, with a particular focus on giant models with a trillion or more parameters. TPC is currently developing scalable model architectures and training strategies, and organizing and curating scientific data for model training, to optimize AI libraries for current and future exascale computing platforms.

In addition, the U.S. Department of Energy's Oak Ridge National Laboratory and Lawrence Livermore National Laboratory, together with IBM and NVIDIA, established supercomputing Centers of Excellence to jointly develop a new generation of HPC systems based on IBM Power processors and NVIDIA Tesla accelerator cards, with peak floating-point performance targets of at least 100 petaflops and up to 300 petaflops.

In December 2020, the EU earmarked 7.5 billion euros for the "Digital Europe" programme, of which 2.2 billion euros is for supercomputing and 2.1 billion euros for artificial intelligence. Specifically, the plan includes acquiring at least one exascale supercomputer by the end of 2021; establishing Europe-wide data spaces and AI testing facilities in areas such as health, manufacturing, and energy; deploying a pan-European quantum communications infrastructure and supporting a cybersecurity product certification program; and funding dedicated master's programs in artificial intelligence, advanced computing, and cybersecurity.

In March last year, the British government pledged £1 billion ($1.3 billion) for supercomputing and artificial intelligence research in hopes of becoming a "tech superpower." As part of that strategy, the government said it wanted to spend around £900 million to build a "hyperscale" computer capable of powering its own "BritGPT" to rival OpenAI's generative AI chatbot.

In April, Japan's Ministry of Economy, Trade and Industry (METI) announced subsidies totaling 72.5 billion yen for five Japanese companies to build artificial intelligence supercomputers, aiming to reduce dependence on U.S. technology. Sakura Internet, telecom giant KDDI, GMO Internet, Rutilea, and Highreso received subsidies of 50.1 billion, 10.2 billion, 1.9 billion, 2.5 billion, and 7.7 billion yen, respectively. Reports say Japan's National Institute of Advanced Industrial Science and Technology will deploy a supercomputer as early as this year with about 2.5 times the computing power of its existing machines. Under METI's supervision, the institute will make the supercomputer available through cloud services to domestic companies developing generative AI.

Beyond government-backed projects, global technology companies are also spending heavily to build computing power. Amazon plans to invest $148 billion over the next 15 years to build data centers around the world to meet demand from artificial intelligence and other workloads. Google announced a $3 billion investment to build or expand its data center campuses in Virginia and Indiana. Microsoft and OpenAI are also working on a five-phase supercomputer project that will require more than $115 billion of investment, most of it for procuring the computing power needed to drive AI.

04

Operators launched large-scale procurement, and the AI chip market exploded

The large-scale layout of computing power centers has also brought about large-scale procurement of AI chips.

Recently, China Mobile launched a large-scale centralized procurement of AI chips that has attracted wide attention in the industry: a new intelligent computing center procurement covering 2024 to 2025. According to the tender announcement, the project covers a total of 8,054 units. Based on previous winning-bid prices, some institutions estimate the procurement may exceed 15 billion yuan.

A month earlier, China Unicom had launched a purchase of more than 2,500 AI servers, and China Telecom had already taken action before that. With large-scale tenders from all three major operators underway, industry observers see domestic computing power deployment entering the "fast lane."

Just two months ago, China Mobile also issued its 2023-2024 new intelligent computing center (trial network) centralized procurement project: 2,454 AI training servers across 12 standard packages (1,204 units in packages 1-11 and 1,250 units in package 12).

At the end of March, China Unicom issued the prequalification announcement for its 2024 artificial intelligence server centralized procurement project. The announcement shows the project has been approved, with China United Network Communications Co., Ltd. and its provincial branches, Unicom Digital Technology Co., Ltd., and others as the tendering parties. China Unicom will purchase a total of 2,503 artificial intelligence servers and 688 RoCE switches as key networking equipment.

In October last year, China Telecom announced the evaluation results of its 2023-2024 centralized procurement of AI computing servers: manufacturers including xFusion, Inspur, and H3C were shortlisted for a total of 4,175 AI servers and 1,182 switches.

05

The construction of computing power centers has benefited AI chip manufacturers

At present, the main builders of computing power centers are operators, large cloud service providers, and large Internet companies. These enterprises have abundant capital and scale, so they can bear the huge cost of building computing power centers; they also have enormous computing power needs of their own, as well as plenty of downstream customers to whom they can sell computing power.

On October 17, 2023, the U.S. Department of Commerce issued ECCN 3A090 and 4A090 requirements under its export control rules, further restricting exports of high-performance AI chips and adding 13 Chinese companies to the Entity List. The controlled products include, but are not limited to, NVIDIA's A100, A800, H100, H800, L40, L40S, and RTX 4090. Because of these U.S. restrictions on China's purchases of AI computing chips, the computing center and AI chip markets have effectively split into separate domestic and international markets.

The huge domestic computing power market has benefited domestic chip manufacturers. Recently, China Mobile officially announced that the world's largest single intelligent computing center, the China Mobile Intelligent Computing Center (Hohhot), has been put into operation. About 20,000 AI accelerator cards are deployed in the project, with a localization rate of AI chips exceeding 85%.

China Unicom has also recently built the country's first "government + operator" intelligent computing center in Beijing, built on China-made Ascend AI hardware and basic software.

Earlier, China Telecom Shanghai "lit up" its large-scale computing power cluster and artificial intelligence public computing service platform in Shanghai, the largest carrier-grade intelligent computing center in China, with a 15,000-card computing cluster built on independently developed AI chips. The China Telecom Central Intelligent Computing Center, put into operation at the beginning of the year, likewise adopts a solution architecture based on domestic AI hardware and basic software platforms.

It is not hard to see that most domestic computing centers use domestic AI software and hardware. GPUs currently hold the largest share of the AI chip market, and domestic AI chip procurement mainly benefits representative Chinese companies such as Huawei, Hygon Information, Jingjia Micro, and Enflame. Last year, Baidu ordered 1,600 Ascend 910B AI chips for 200 servers.

According to institutional estimates, driven by the tightened restrictions on NVIDIA, the new market space for domestic AI chips will exceed 70 billion yuan in 2024.

Other major foreign markets face fewer restrictions on chip procurement. The global AI chip market is currently dominated by European and American manufacturers, led by NVIDIA, which industry data shows holds a near-"monopoly" with roughly 80% market share. NVIDIA CEO Jensen Huang has also announced plans to build an AI factory in Japan that will prioritize supplying GPU demand there.

06

Competition has intensified, and large manufacturers have begun to develop their own AI server chips

It is widely believed that amid the AI boom, the "shovel-selling" AI chip manufacturers benefit most. Data show that chips account for about 32% of the total cost of a basic server, and as much as 50% to 83% in high-performance or stronger servers.
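To make those cost shares concrete, here is a trivial worked example; only the percentage ranges come from the text, and the 100,000-yuan server price is purely hypothetical:

```python
# Chip spend implied by the cited cost shares (server price is hypothetical).
def chip_cost(server_price: float, chip_share: float) -> float:
    """Chip portion of the total server cost."""
    return server_price * chip_share

price = 100_000  # hypothetical server price, in yuan
print(chip_cost(price, 0.32))                           # basic server: ~32% of cost
print(chip_cost(price, 0.50), chip_cost(price, 0.83))   # high-performance range
```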

The high cost has also led to more and more Internet and IT equipment manufacturers starting to develop their own AI server chips.

In 2016, Google launched its self-developed AI tensor processing unit (TPU). Around 2022, Google began developing Arm-based server CPUs, and in April 2024 it released Axion, its self-developed Arm CPU, announcing that the chip was already in use internally.

In 2020, Microsoft began customizing chips for its Azure cloud services, and in November 2023 it launched two self-developed chips: Maia 100 and Cobalt 100. Maia 100 is designed for large language model training and inference and is built on TSMC's 5 nm process; Cobalt 100 is a 128-core Arm-based server CPU.

In early April, Meta released its next-generation MTIA AI training and inference accelerator, with more than double the compute and memory bandwidth of the previous generation; the latest version of the chip helps drive ranking and recommendation models for ads on Facebook and Instagram.

Earlier, OpenAI, the U.S. AI research company, was reported to be in talks with potential investors, including the UAE government, on a project to expand global chip manufacturing capacity and reshape the global semiconductor industry. According to one person familiar with the matter, the project could seek to raise as much as $5 trillion to $7 trillion.

Large domestic manufacturers, for their part, are not standing aside and have begun developing their own AI chips. Recently, at its 2024 Computing Network Conference, China Mobile officially released the Dayun Panshi DPU, with 400 Gbps of chip bandwidth, a leading level in China.