Challenge Google! Chinese companies compete for AI network standards

Author: Ping An Jiangsu

The Ultra Ethernet Consortium (UEC) has just elected new members to its Technical Advisory Committee, and Alibaba Cloud is the only Chinese company among the 13 members. It will work with technology giants such as Microsoft, Meta, AMD, and Broadcom to promote the development of open networks and standards and to build the next generation of AI network infrastructure.

Just a few days earlier, Alibaba Cloud's paper on its AI high-performance network, HPN 7.0, was accepted by SIGCOMM, a top international academic conference, prompting heated discussion in the industry. Experts pointed out that this architecture is very likely to replace the Jupiter architecture proposed by Google and become the standard architecture for next-generation AI networks.

With the wave of large models sweeping in, AI infrastructure has become the hottest battleground for tech giants, and this time, China has taken a rare lead.

16 times more scale!

Ultra-high-performance networks speed up China's AI models

As is well known, large models require enormous computing power. At a time when computing resources are extremely tight, only innovation in system architecture can let AI development leapfrog ahead. A highly stable, high-performance network is the key underlying technology that supports this AI infrastructure.

Network architecture and technology originated in the West and have long been monopolized by the West. Google's Jupiter data center network architecture, proposed in 2015, is the most mainstream technical approach and dominates data center network design across the industry.

It was not until September 2023 that Alibaba Cloud launched its new-generation HPN 7.0 architecture, announcing what it described as the world's first breakthrough in AI high-performance network clusters.

Experts say that HPN 7.0 is very likely to replace Google's classic Jupiter architecture and become the mainstream architecture paradigm and standard for next-generation AI networks.

HPN 7.0 is not a renovation or a cosmetic upgrade but a systematic redesign. To use a popular analogy: in the past, a house could accommodate 10 people, and a skilled technology company could, with some effort, squeeze 15 people into it. Alibaba Cloud instead redesigned the house from scratch and built one that can accommodate 100 people.

According to one set of figures, under a traditional general-purpose computing cluster architecture, a single layer of switches can directly interconnect at most 16 to 64 GPUs at peak performance. Alibaba Cloud's HPN 7.0 architecture for AI computing clusters allows 1,024 GPUs to be directly interconnected through a single layer of switches. Measured against the 64-GPU upper bound, that is a 16-fold increase in the scale of peak-performance interconnection. This provides ample network performance headroom for the training and inference of large AI models.

HPN 7.0 architecture: a high-performance network cluster designed for AI

Based on HPN 7.0, Alibaba Cloud's AI infrastructure can efficiently coordinate and schedule a variety of chips, support clusters that scale to as many as 100,000 GPUs, and provide high-performance, highly stable network interconnection, so that a very large cluster runs as efficiently as a single computer. Compared with the previous generation, it improves large model training performance by 14.9%.

Not long ago, Alibaba Cloud released version 2.5 of its Tongyi Qianwen large model, whose Chinese-language performance has fully caught up with GPT-4 Turbo. The model was trained on an HPN 7.0 high-performance network cluster.

It is conceivable that in the future, all domestic companies will be able to obtain high-quality AI network services through Alibaba Cloud, which will greatly benefit the development and application of China's large models.

Embrace open source

Alibaba Cloud took the lead in formulating the "Android" standard for AI intelligent computing networks

Currently, there are two main standards for AI high-performance networks: InfiniBand, a proprietary standard led by NVIDIA, and RoCE v2 (RDMA over Converged Ethernet).

These two standards are like Apple and Android in the networking world: one is self-contained and essentially closed, while the other is open source and the more dynamic. Choosing a standard essentially means choosing the full set of equipment, systems, software, and applications that the standard represents.

Among the open efforts, the Ultra Ethernet Consortium (UEC), an open source organization established at the initiative of the Linux Foundation, has grown the fastest. Technology giants have joined one after another, making UEC the most sought-after AI infrastructure organization at the moment.

The latest news is that Alibaba Cloud was elected to UEC's core technical committee, becoming its only member from China. This means that, for the first time, a Chinese technology company will take part in decisions on core technology research and development for the next generation of open networks. Together with Microsoft, Meta, AMD, Broadcom, and other technology giants, Alibaba Cloud will participate in core decision-making and standards formulation.

According to people familiar with the matter, competition for seats on the UEC technical committee is fierce, and only members with strong technical capabilities and substantial contributions to the open source community can win election. Alibaba is the only Chinese company among AMiner's top 10 network research institutions in the world, and Alibaba Cloud is also one of the seven founding members of the SONiC open networking community and vice chair of its technical committee. Experts pointed out that election to the UEC technical committee shows that China's network technology has been recognized by industry peers.

According to sources, drawing on its large-scale deployment of HPN 7.0, Alibaba Cloud is taking the lead in advancing technical drafts on improving network performance for AI workloads, which happens to be one of the most important directions in UEC's future technology roadmap.

From lagging behind, to catching up, to now participating in decisions on future technology direction and standards formulation, Chinese technology companies represented by Alibaba Cloud have worked quietly for more than a decade. In this AI era, they continue to make breakthroughs in underlying network infrastructure, so that Chinese solutions can break the monopoly and become global open technology standards, and so that AI can serve humanity better and faster.

Source: Observer.com
