
The network hero behind ChatGPT's success: the secrets of full-stack intelligent networking technology

Zhidx (WeChat official account: Zhidxcom)

Author | ZeR0

Editor | Mo Ying

Zhidx reported on April 17 that as research on large language models heats up, their enormous parameter counts are driving computing demand to soar and placing higher requirements on network bandwidth. The network infrastructure behind accelerated large language model training and inference is drawing more and more attention.

Breaking through network communication limitations not only improves the performance and efficiency of large model computing, but also helps reduce energy consumption and power costs. Recently, Cui Yan, an NVIDIA network technology expert, and Meng Qing, NVIDIA's network marketing director, spoke in depth with Zhidx and other media about how NVIDIA's full-stack intelligent network technology supports large models.

Meng Qing said that amid AI's rapid development, participants who enter the market later and want to catch up with those already in it have two options: invest resources by following a proven example, or invest the same resources but move faster. Moving faster requires improving efficiency, and improving efficiency requires DPUs.

OpenAI is an example of the time savings that DPUs bring. A Microsoft blog post published some time ago clearly stated that the hardware used by OpenAI includes NVIDIA BlueField-2 DPUs, SmartNICs, and a 200G InfiniBand network. This tried-and-tested example has been referenced by many companies.

First, two high-performance network platforms; the BlueField-3 DPU begins large-scale rollout this year

For new applications, NVIDIA offers two high-performance network platforms, the Quantum-2 InfiniBand platform and the Spectrum-4 Ethernet platform, both of which are end-to-end 400G high-bandwidth, high-performance network architectures. The BlueField-3 DPU is a common component of both platforms and supports both InfiniBand and Ethernet networks.


At the recent NVIDIA GTC conference, NVIDIA announced that the BlueField-3 DPU is in full production and that it begins shipping BlueField-3 DPU products to the market at scale this year.

The NVIDIA BlueField-3 DPU delivers breakthroughs in the following areas:

1. 400G connection: 2 times the network bandwidth, 2 times the network pipeline, and 4 times the host bandwidth.

2. Programmable computing: 4 times the Arm computing power, 5 times the memory bandwidth, and new data path accelerator.

3. Zero trust security: 4 times IPsec encryption acceleration, 2 times TLS encryption acceleration, new MACsec encryption acceleration and platform authentication.

4. Elastic storage: 2 times the storage IOPS, 2 times the storage encryption performance, and new NVMe over TCP acceleration.

Today's data storage is dominated by distributed storage. Through virtual storage devices, the BlueField-3 DPU gives the host elastic storage resources without the host needing to know whether its data comes from local or remote storage. Stored data can also be encrypted, meeting customers' different storage requirements while keeping data secure.
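To illustrate what this transparency looks like from the host side, here is a minimal sketch: the application issues ordinary block reads against what it sees as a local NVMe device. The device path /dev/nvme0n1 is a hypothetical example, and whether the data actually comes from local or remote storage is handled entirely by the DPU, invisibly to this code.

/* Minimal sketch: from the host's perspective, a DPU-emulated NVMe device
 * is just another block device. The path below is a hypothetical example;
 * the DPU decides whether the data is served locally or over the network. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/nvme0n1", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));   /* ordinary block read */
    if (n < 0)
        perror("read");
    else
        printf("read %zd bytes from the emulated device\n", n);

    close(fd);
    return 0;
}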

Second, four advantages and four use cases

The NVIDIA BlueField-3 DPU has four advantages:

1. Accelerated performance: the hardware architecture is designed to offload software-defined networking, storage, and security, delivering faster, higher performance for the most demanding workloads.

2. Cloud-scale efficiency: deploying the BlueField-3 DPU frees x86 cores for business applications, letting the CPU run more business workloads instead of infrastructure operations, achieving unprecedented scale and higher efficiency.

3. Strong zero-trust security: comprehensive data center security without compromising performance. For example, in a cloud environment where multiple tenants share the same data center infrastructure, it enforces security and effective isolation between tenants. It can also divide a host into a business application domain and a network infrastructure domain, so that if the application domain is compromised, the attacker cannot move laterally through the data center via the infrastructure domain.

4. Fully programmable infrastructure: the NVIDIA DOCA software framework provides consistent application development and execution while maintaining the highest performance, enabling developers and enterprises to build their own infrastructure applications or services on the BlueField-3 DPU.

The following four use cases can help understand these advantages of the BlueField-3 DPU.

1. Accelerated cloud computing. In the future, most of the computing power for AI model training and inference will come from the cloud, and the BlueField-3 DPU can underpin that cloud infrastructure. It supports 4,096 virtual instances per node, 4 to 8 times the previous generation, which means more business applications can be hosted because each virtual machine can be rented out separately. For cloud service providers, this also means additional revenue and a higher return on investment.


2. Secure cloud computing. In a secure multi-tenant cloud environment, the DPU provides an isolated data center control plane: tenant workloads run on the host while infrastructure workloads run on the BlueField-3 DPU. This isolates tenants from one another and separates the business application domain from the infrastructure domain, providing a platform for zero-trust deployments, so the security of the entire data center is better guaranteed.

3. Accelerated enterprise computing. NVIDIA, in partnership with VMware, now runs VMware vSphere 8 on BlueField-3 DPUs in Dell PowerEdge servers, delivering up to 50 percent more Redis key-value store transactions per second while consuming zero host CPU cores for VMware networking.

4. Sustainable cloud computing. Data center power budgets are not growing while demand keeps rising, so how can cloud computing be made sustainable? The BlueField-3 DPU is itself very powerful and improves the overall performance of the server host. By offloading all infrastructure workloads to BlueField-3 DPUs, the freed-up CPU resources can be used for business applications, which is equivalent to supporting more services without adding servers, or doing the same work with fewer servers.

In addition to freeing up CPU cores, the BlueField-3 DPU can also reduce server-level power consumption. A server that idles at 334 watts rises sharply to 728 watts when an IPsec workload runs on the CPU; offloading the same IPsec workload to the DPU, with the CPU uninvolved, requires only 481 watts.
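Working through the arithmetic behind these figures (the percentage is computed relative to the CPU-only number):

\[
728 - 334 = 394\ \text{W added by IPsec on the CPU}, \qquad
481 - 334 = 147\ \text{W added when offloaded to the DPU},
\]
\[
728 - 481 = 247\ \text{W saved per server} \approx 34\%\ \text{of the CPU-only power draw}.
\]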

Therefore, overall, the BlueField-3 DPU reduces the number of servers that need to be purchased, lowers data center power consumption, and cuts the costs of power distribution, cooling, and rack space, fundamentally easing the hard limits a data center faces.

Third, DOCA: providing software compatibility across DPU hardware iterations

A chip without software is like expensive sand: useless. NVIDIA DOCA, the software development framework for BlueField DPUs, stands for Data Center Infrastructure-on-a-Chip Architecture.

DOCA is to DPUs what CUDA is to GPUs. The DOCA software framework is compatible with previous and subsequent BlueField family DPUs. DOCA offloads, accelerates, and isolates infrastructure operations to support hyperscale, enterprise, supercomputing, and hyperconverged infrastructures. Some large cloud service providers and hyperconverged computing companies are learning and using DOCA programming.

DOCA is a systematic framework with two parts: the SDK and the runtime.


The SDK mainly serves developers. It provides libraries and drivers, orchestration and telemetry APIs, and development tools, including an emulated Arm-based DPU development environment that runs on an x86 laptop, as well as developer documentation and reference applications. It can be installed through NVIDIA SDK Manager or manually.

The runtime serves administrators. It includes basic DOCA services, libraries and drivers, user guides, application samples, and tools that help IT administrators and operators with deployment. Installation methods include x86/Arm DOCA runtime repo installation files, BFB image deployment, and Linux package managers.

NVIDIA GPU Cloud (NGC) simplifies deployment, supporting one-click deployment to servers and, through virtualization and migration, movement across different hardware environments such as x86 and Arm.

DOCA drivers and libraries provide different APIs and different levels of development interfaces to meet different needs. The bottom layer is the kernel-mode driver, on which very experienced developers can "build a house from blocks"; user-mode drivers provide basic functionality that developers can call when writing their programs.


For example, UCloud, a well-known Chinese cloud computing service provider, uses DPUs for its bare-metal server leasing. The resource allocation software runs on the DPU, so customers can rent 100% of a machine's compute and memory resources without any hidden overhead (the "data center tax").

DOCA 2.0 was officially released alongside the BlueField-3 DPU. The NVIDIA BlueField-3 DPU provides three heterogeneous programmable engines, including Arm cores, an accelerated programmable pipeline, and data path accelerators (DPAs), all unified under the DOCA software framework. Developers therefore do not need to care how the underlying heterogeneous hardware is implemented.

For example, the smart NIC chip on the BlueField-2 DPU uses ConnectX-6, while the BlueField-3 DPU uses ConnectX-7. Programming implemented on ConnectX-7 can be seamlessly applied directly to the BlueField-3 DPU, calling the same core code without changes.

In addition, DOCA 2.0 adds programming capabilities for Data Path Accelerators (DPAs), processors with RISC-V architecture that are used to accelerate network traffic and process packets.


Overall, DOCA 2.0 has several important new features:

1. A unified software framework. It supports both the previous-generation DPU and the latest BlueField-3 DPU, adding the DPA and other capabilities.

2. A richer software ecosystem. Many hardware and software partners are compatible with and support the new generation of DPUs, including Oracle Cloud Infrastructure (OCI), Microsoft Azure, and more.

The DPA has several characteristics: 1. the DPA computing subsystem is introduced with the BlueField-3 DPU; 2. the DPA is optimized for device emulation, IO-intensive applications, high insertion rates, network stream processing, custom protocols, and collective and DMA operations; 3. customer programmability is provided through the DOCA FlexIO SDK, the DOCA DPA SDK, and the toolchains, examples, and supporting materials for these SDKs; 4. turnkey applications developed with the DPA are also provided.


The DOCA 2.0 FLOW library is a very useful feature for developers: it abstracts away how packets are processed, so that when it is invoked, whether for software-defined networking, gateways, or millions of query and insertion operations, the work can be handled easily.

The library works like a dictionary: developers do not need to care how a specific operation is carried out when it is called, so many functions can be achieved in a simple way, letting them write code quickly, deploy, and shorten time to launch. Its target network applications include routers, next-generation firewalls (NGFWs), load balancers, user plane functions (UPFs), and more, all built on the DOCA FLOW pipelines provided by NVIDIA.
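To make this "dictionary-like" abstraction concrete, below is a minimal toy sketch in plain C of a match/action rule table. It is not the DOCA FLOW API (whose structures and functions should be taken from the DOCA documentation); it only illustrates the pattern: a rule matches a packet's 5-tuple and names an action, and the library hides how that rule is actually programmed into the DPU hardware.

/* Toy sketch of a match/action rule table (NOT the DOCA FLOW API).
 * A rule pairs a packet 5-tuple with an action; a real flow library
 * hides how such rules are programmed into the DPU hardware. */
#include <stdint.h>
#include <stdio.h>

enum action { ACTION_FORWARD, ACTION_DROP, ACTION_MIRROR };

struct match_5tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

struct rule {
    struct match_5tuple match;
    enum action act;
    uint16_t out_port;              /* used when act == ACTION_FORWARD */
};

#define MAX_RULES 1024
static struct rule table[MAX_RULES];
static size_t num_rules;

/* Insert a rule; a real flow library would push it into hardware. */
static int rule_insert(const struct rule *r)
{
    if (num_rules >= MAX_RULES)
        return -1;
    table[num_rules++] = *r;
    return 0;
}

static int tuple_eq(const struct match_5tuple *a, const struct match_5tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

/* Look up the action for a packet (linear scan, for clarity only). */
static enum action rule_lookup(const struct match_5tuple *pkt)
{
    for (size_t i = 0; i < num_rules; i++)
        if (tuple_eq(&table[i].match, pkt))
            return table[i].act;
    return ACTION_DROP;             /* default: drop unmatched traffic */
}

int main(void)
{
    struct rule r = {
        .match = { .src_ip = 0x0A000001, .dst_ip = 0x0A000002,
                   .src_port = 12345, .dst_port = 443, .proto = 6 },
        .act = ACTION_FORWARD,
        .out_port = 1,
    };
    rule_insert(&r);
    printf("action for matching packet: %d\n", rule_lookup(&r.match));
    return 0;
}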


Regex, the evaluation of regular expressions, is an important algorithm for processing network traffic. NVIDIA has improved the Regex hardware engine in the BlueField-3 DPU: alongside higher overall performance, a new bidirectional search capability finds matches faster, improving rule performance and allowing more rules to be compiled than BlueField-2 currently supports. This matters when network traffic is highly concurrent.
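As a host-side illustration of the kind of matching the hardware Regex engine accelerates, the minimal sketch below uses the standard POSIX regex API; the pattern and packet payload are made-up examples, not rules shipped with DOCA.

/* Host-side illustration of regular-expression matching over a packet
 * payload, using the standard POSIX regex API. The BlueField-3 Regex
 * engine performs this kind of matching in hardware; the pattern and
 * payload below are hypothetical examples. */
#include <regex.h>
#include <stdio.h>

int main(void)
{
    regex_t re;
    /* Hypothetical rule: flag HTTP requests for .php resources. */
    if (regcomp(&re, "GET /[A-Za-z0-9_/-]*\\.php", REG_EXTENDED) != 0) {
        fprintf(stderr, "failed to compile pattern\n");
        return 1;
    }

    const char *payload = "GET /admin/login.php HTTP/1.1";
    if (regexec(&re, payload, 0, NULL, 0) == 0)
        printf("payload matches the rule\n");
    else
        printf("no match\n");

    regfree(&re);
    return 0;
}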

DOCA 2.0 also brings a new SNAP infrastructure, SNAP v4, to the BlueField-3 DPU. Many AI companies have datasets of hundreds of terabytes, and they care greatly about the efficiency of remote calls to that data. Offloading SNAP v4 onto the DPU frees up bandwidth and reduces call time, greatly improving efficiency. NVIDIA will subsequently release SNAP v4 on the public NGC, also as a one-click deployment.

DOCA 2.0 BlueMan is an easy-to-deploy telemetry visualization tool. Administrators can view the status of all their hosts, including health, performance, network traffic issues, and risk warnings.

Fourth, 56% of DOCA developers come from the Chinese community

Currently, the NVIDIA DOCA developer community provides DOCA API documentation and resources to guide developers on how to deploy and program based on the BlueField family of DPUs.

The community has more than 4,700 developers worldwide. The DOCA China community was launched in July 2021; by the end of 2021, China accounted for 42% of global developers, and by January 2023, 56% of developers came from the Chinese community. Some of these developers from China are from Internet giants, some from startups, and some from universities.

In promoting the DOCA Chinese community, NVIDIA considered one problem: DOCA software development on BlueField-series DPUs differs from CUDA development on GPUs. Developers can build a GPU development environment on a laptop or desktop, but it is not easy for them to build their own DPU development environment.

To this end, NVIDIA authorized the establishment of the DPU & DOCA Center of Excellence, which, together with three partners (Leadtek, Shinhiro, and Luen Thai Cluster), provides a free DOCA development environment to Chinese developers.

DOCA developers need to successfully enroll in the NVIDIA DOCA Developer Experience Program in order to apply for a free development environment. Applicants submit accurate application details to an NVIDIA authorized partner 48 hours in advance, and once the partner reviews and approves the application, they receive 2 to 6 hours of free DOCA development environment time.

Conclusion: Accelerating the popularization of large models requires better network infrastructure

Whether for large models that are rapidly gaining popularity or for traditional small and medium-sized models already in wide use, better network infrastructure is needed to break through throughput and performance bottlenecks.

Just as people never feel their everyday networks are too fast, the drive to improve data center efficiency never ends. Better data efficiency means less rented space, less equipment such as air conditioning, and less electricity, or far more computing done with the same amount of electricity.

As NVIDIA founder and CEO Jensen Huang has said, AI is at its "iPhone moment." With more powerful computing and network infrastructure, generative AI will gradually penetrate every industry, steadily changing how people work, produce, and live.
