Baidu Intelligent Cloud and NVIDIA to hold online sharing session on new-generation high-performance AI computing cluster next week

2022-03-14 11:46:21

On March 9, Baidu Intelligent Cloud announced the launch of a new generation of high-performance AI computing clusters that can deliver EFLOPS-level computing power, and released a new GPU server instance, GPU-H5-8NA100-IB01. An online sharing session on the new cluster will be broadcast live at 19:00 on March 16 on the "Friends of Baidu Intelligent Cloud" Bilibili enterprise account, the "Baidu Intelligent Cloud" video channel, and the "Smart Orangutan" video channel.

Technical strengths of the leading AI cloud-native computing base revealed

The new generation of high-performance AI computing clusters is built on NVIDIA A100-80G NVLink GPUs and InfiniBand HDR, making it a leading AI cloud-native computing base. Using the newly released instances, researchers can assemble ultra-high-performance computing clusters on the scale of thousands of nodes, greatly shorten the training time of ultra-large AI models, and open up more room for AI business innovation.

The new GPU server instance GPU-H5-8NA100-IB01 uses a super AI computer based on Baidu's self-developed X-MAN architecture as its hardware platform. Since its launch in 2016, X-MAN has been deployed at scale for years in Baidu's internal businesses such as Fengchao, autonomous driving, and natural language processing. Six patent applications have been filed for it, covering technologies including the PCIe Fabric architecture, liquid cooling, and expansion to a maximum of 64 GPU cards. It is an important piece of infrastructure behind the rapid rollout of Baidu's AI business. X-MAN has now been fully upgraded to its fourth generation, X-MAN 4.0, with a newly optimized design for computing scenarios such as AI and HPC.

In terms of configuration, each X-MAN 4.0 contains eight A100-80G NVLink GPUs and can support eight 200Gb/s InfiniBand NICs, combining high-speed storage, high-speed non-blocking networking, and high-performance computing in a single super AI computer.
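
As a rough illustration of what this configuration adds up to per machine, the short Python sketch below multiplies out the figures quoted above. It assumes all eight NICs are populated and reads "A100-80G" as 80 GB of memory per GPU. The function name `node_totals` is made up for this example, and the totals are nominal line rates rather than measured throughput.

```python
# Per-node totals implied by the configuration described in the article.
# The Gb/s -> GB/s conversion simply divides by 8 and ignores protocol
# overhead, so real-world throughput would be lower.

GPUS_PER_NODE = 8       # A100-80G NVLink GPUs per X-MAN 4.0
GPU_MEMORY_GB = 80      # memory per GPU, read from the "80G" in the name
NICS_PER_NODE = 8       # InfiniBand NICs (assumes all eight are installed)
NIC_SPEED_GBPS = 200    # nominal line rate per NIC, gigabits per second


def node_totals():
    """Return (GPU memory in GB, InfiniBand rate in Gb/s, same rate in GB/s)."""
    total_memory_gb = GPUS_PER_NODE * GPU_MEMORY_GB
    total_ib_gbps = NICS_PER_NODE * NIC_SPEED_GBPS
    total_ib_gbytes = total_ib_gbps / 8  # 8 bits per byte
    return total_memory_gb, total_ib_gbps, total_ib_gbytes


if __name__ == "__main__":
    mem, gbps, gbytes = node_totals()
    print(f"GPU memory per node: {mem} GB")
    print(f"InfiniBand line rate per node: {gbps} Gb/s ({gbytes:.0f} GB/s)")
```

With one 200Gb/s NIC available per GPU, each GPU can in principle be given its own network path out of the machine.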

In terms of architecture, the new design of X-MAN 4.0 shortens data transmission latency, increases data transmission bandwidth, effectively resolves the communication bottleneck in local data transfer, and reduces GPU idle time during AI workloads. In the MLCommons 1.1 list, X-MAN 4.0 ranked in the top two for single-machine hardware performance among systems with the same configuration.

At the same time, to achieve higher cluster performance, Baidu Intelligent Cloud has designed an InfiniBand network architecture specifically for ultra-large-scale clusters. It optimizes the network convergence ratio, improves network throughput, and combines fault tolerance, switches, and topology mapping to get the most out of a computing cluster with EFLOPS-level computing power.
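
The "network convergence ratio" is commonly understood as the oversubscription ratio of a switch tier: total downlink bandwidth toward the servers divided by total uplink bandwidth toward the next tier. The article gives no figures for Baidu's design, so the Python sketch below uses hypothetical port counts purely to show how the ratio is calculated. The function name `convergence_ratio` is also made up for this example.

```python
def convergence_ratio(downlink_ports: int, uplink_ports: int,
                      downlink_gbps: float = 200.0,
                      uplink_gbps: float = 200.0) -> float:
    """Return downlink bandwidth divided by uplink bandwidth for one switch.

    A result of 1.0 means the tier is non-blocking; values above 1.0 mean
    the uplinks can become a bottleneck when every server transmits at once.
    """
    if downlink_ports < 0 or uplink_ports <= 0:
        raise ValueError("port counts must be positive")
    downlink_total = downlink_ports * downlink_gbps
    uplink_total = uplink_ports * uplink_gbps
    return downlink_total / uplink_total


if __name__ == "__main__":
    # Hypothetical 40-port switch split evenly between servers and uplinks.
    print(convergence_ratio(20, 20))
    # Hypothetical 3:1 oversubscribed split of the same switch.
    print(convergence_ratio(30, 10))
```

The closer the ratio is to 1, the less cross-node GPU traffic has to compete for uplink bandwidth, which matters for the all-to-all communication patterns of large-model training.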

A major tech session offering in-depth analysis of product features, applications, and scenarios

At 19:00 on March 16, the Smart East-West Open Class, together with Baidu Intelligent Cloud and NVIDIA, will hold the "Baidu Intelligent Cloud & NVIDIA New Generation High-performance AI Computing Cluster" online sharing session.

Four technical experts will take part: Xuan Lingbo, Heterogeneous Computing Product Manager at Baidu Intelligent Cloud; Sun Peng, Senior R&D Engineer at Baidu Intelligent Cloud; Wu Zhenghui, Senior System Engineer at Baidu Intelligent Cloud; and Cheng Shuai, Solution Architect at NVIDIA. They will give an in-depth analysis of the technical solutions and applications behind the new generation of high-performance AI computing clusters.

First, Xuan Lingbo of Baidu Intelligent Cloud will speak on "GPU Cloud Product System Introduction and Application Scenario Sharing". He will give a comprehensive introduction to the products in Baidu Intelligent Cloud's GPU cloud product system, their characteristics, and their typical application scenarios, to help users choose suitable GPU cloud products and accelerate the development of their AI services.

Second, Sun Peng of Baidu Intelligent Cloud will speak on "Design and Optimization of Ultra-large-scale AI Heterogeneous Computing Clusters". In addition to introducing the IB network design approach for EFLOPS-level ultra-large AI heterogeneous computing clusters, Sun Peng will share best practices in software and hardware optimization that keep such clusters running efficiently, and show their computing strength in training ultra-large AI models.

Third, Wu Zhenghui of Baidu Intelligent Cloud will introduce the technical architecture of X-MAN under the theme "Super AI Computer X-MAN Technology Revealed". He will focus on its continually evolving technical characteristics and key capabilities, and on how it became the hardware base of the ultra-large-scale AI high-performance computing cluster.

Finally, Cheng Shuai, Solution Architect at NVIDIA, will speak on "NVIDIA SuperPOD Empowering AI Data Center", sharing the design features of the NVIDIA SuperPOD reference architecture and its deployment cases around the world.

In addition to the live video broadcast on the knowledge store of the Smart East-West Open Class, this sharing session will also be broadcast live on the "Friends of Baidu Intelligent Cloud" Bilibili enterprise account, the "Baidu Intelligent Cloud" video channel, and the "Smart Orangutan" video channel. Want to learn more technical details? Want to talk technology live with the experts? Follow the link https://mp.weixin.qq.com/s/aCvi4E8S217AJ4EOxUdnAA today and join us.
