
China Mobile Releases a Chip: The Mainland's First 400Gbps DPU

Semiconductor Industry Observations

2024-04-29 16:57 · Posted on the official account of Anhui Semiconductor Industry Watch

Since the release of ChatGPT, demand for data center chips has entered a new upward cycle, with NVIDIA's GPUs the most visible beneficiaries. At the same time, cloud vendors, as the ultimate buyers of these chips, have begun developing core data center silicon themselves or in partnership with chip vendors.

Amazon Web Services (AWS) was the first to deploy DPU chips commercially, launching its Nitro system in 2017, and it has reaped substantial benefits from its self-developed DPU silicon.

Recently, China Mobile, itself a major domestic cloud operator, unveiled its own DPU chip, "Panshi" (literally "Rock"), the first domestically developed DPU ASIC with 400Gbps bandwidth, built to keep key technologies independent and controllable. Given how important the DPU is to data centers and cloud services, the chip should lay a solid foundation for China Mobile, which has invested heavily in cloud services in recent years, and give it a distinctive competitive edge.


Figure: China Mobile Panshi DPU V4.0

DPU, the third workhorse chip

The DPU, or Data Processing Unit, is, as the name suggests, a chip designed specifically for data processing. As the third major chip after the CPU and GPU, it has drawn the attention of almost every cloud vendor as well as the overseas chip giants: Nvidia spent $6.9 billion to acquire Mellanox, and AMD spent $1.9 billion to acquire Pensando, largely for their DPU technology.

The "White Paper on the Development of Cloud Computing Universal Programmable DPUs (2023)," jointly written by China Mobile, Clouded Leopard Intelligence, and the Academy of Information and Communications Technology, puts it this way: "As human productivity enters the era of computing power, the traditional CPU-centric architecture is hitting a computing-power bottleneck, and diversified computing demands urgently require a comprehensive change in software and hardware architecture... [The DPU] will become the new core of computing power, redefine the standard for cloud computing technology in the computing era, and build a new technology curve for that era."

In fact, the path to the DPU ran through several generations of network cards.

Originally, data processing in the data center was done entirely by the CPU, while network transmission was handled by the traditional NIC (network interface card). In this workflow, the NIC converts the data to be transferred into a format the network equipment can carry, and hands incoming data to the CPU for all further processing.
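To make that division of labor concrete, here is a minimal, purely illustrative Python sketch of the CPU-centric path (not anything from China Mobile): with a plain NIC, every packet lands in host memory and all protocol and application work runs on the host CPU. The port number and the "processing" step are arbitrary placeholders.

```python
import socket

def cpu_bound_echo(port: int = 9000) -> None:
    # With a basic NIC, the card only moves bits; the host CPU copies each
    # payload, does the application work, and drives the reply path too.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        data, addr = sock.recvfrom(65535)  # CPU copies the packet payload
        payload = data.upper()             # CPU does the "data processing"
        sock.sendto(payload, addr)         # CPU handles transmission as well

if __name__ == "__main__":
    cpu_bound_echo()
```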

However, as networks scaled up and new requirements emerged, the volume of data moving across the network and into storage kept growing, driving data center NIC port rates rapidly from 10G to 25G, 100G, and even 200G and beyond, and putting new pressure on CPUs. This is when the SmartNIC, designed to take some of the processing load off the CPU and further improve data center efficiency, entered the picture. On top of the transmission functions of a basic NIC, a SmartNIC adds hardware offload and acceleration capabilities that free up part of the host CPU's compute resources.
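For readers who want to see hardware offload on their own machines, the following rough sketch, assuming a Linux host with the standard ethtool utility installed and an interface named eth0 (a placeholder), lists which offloads the NIC currently exposes; SmartNICs and DPUs simply push this list much further.

```python
import subprocess

def list_offloads(iface: str = "eth0") -> dict:
    # Query the NIC's offload features (checksumming, TSO, GRO, ...) via
    # `ethtool -k` and return them as a name -> state mapping.
    out = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    ).stdout
    features = {}
    for line in out.splitlines()[1:]:  # first line is a header
        if ":" in line:
            name, state = line.split(":", 1)
            features[name.strip()] = state.strip()
    return features

if __name__ == "__main__":
    for name, state in list_offloads().items():
        print(f"{name:40s} {state}")
```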

As things developed further, though, the SmartNIC was also stretched thin. Because it contains no general-purpose CPU, the host CPU is still needed for control-plane management and for most network and storage protocol processing, which continues to consume a large share of host resources. And as data center network speeds climb to 100G, 200G, and beyond, hosts not only burn valuable general-purpose CPU cycles classifying, tracking, and policing traffic, but can also no longer keep up with the required network speeds and storage bandwidth.

How to achieve "zero consumption" of the host CPU and unlock the data center's evolution to larger scale and higher bandwidth thus became cloud vendors' next research direction, and the DPU was born.

Architecturally, a DPU adds a general-purpose CPU and a rich set of hardware acceleration units, so that infrastructure functions such as networking, storage, security, and control can be accelerated and fully offloaded. Its product forms mainly include NP/MP+CPU, FPGA+CPU, and single-chip ASIC solutions. In the early days, the FPGA+CPU multi-chip approach, leaning on FPGA programmability, was the industry's first choice.

Apart from Amazon, most cloud vendors, and domestic ones such as Alibaba, Tencent, and Baidu in particular, use traditional FPGA+CPU solutions, and the competitive pressure on them is mounting. As bandwidth demands keep rising, programmable single-chip ASICs, with their price and performance advantages, high-performing dedicated accelerators, and embedded general-purpose processors, have become the industry's end state, and domestic cloud providers are looking to evolve from FPGA+CPU toward ASIC solutions. That shift is what ultimately drove China Mobile to develop its own ASIC-based DPU chip, Panshi.

Panshi, a major breakthrough

From a product standpoint, what makes a DPU competitive?

In our view, it must first support high-speed, low-latency networking, since that is the chip's primary job. Second, it should integrate high-performance general-purpose multi-core CPUs and programmable hardware accelerators, providing programmability and general-purpose processing power while also delivering the performance needed for differentiated workloads such as artificial intelligence, analytics, and security.

China Mobile's chip offers 400Gbps of bandwidth, a close fit for today's data center bandwidth demands, and it is fair to say that the successful development of the Panshi DPU is a major technological breakthrough for the mainland's domestic chip industry.

Regular readers will know that data center servers are becoming ever more densely integrated. Whether x86 or Arm, server CPUs now pack hundreds of cores or more on a single chip, and density keeps rising. Meanwhile, network storage is moving toward elastic storage built on low-latency Ethernet, raising the demand for high-bandwidth, low-latency Ethernet; the growth of private cloud applications and virtual desktop infrastructure places additional demands on the network; and the massive data generated by the Internet of Things and the edge keeps pushing network bandwidth requirements upward.

Add the new demand from AI, and 400Gbps adoption is surging, which makes the Panshi DPU well timed. According to reports, the Panshi DPU not only raises the maximum transmission rate of domestic DPU chips to a new level, but also comfortably outclasses another domestic operator's DPU, which is built on an overseas FPGA+CPU multi-chip solution.

It is also worth noting that with this chip, domestic DPUs have for the first time reached the world's top tier: the chip is on par with Nvidia's leading BlueField-3 DPU.

According to public information, the Panshi DPU delivers 400Gbps of data transmission capacity, doubling the previous maximum transmission rate of domestic DPU chips and reaching the world's top level. The chip can process millions of packets per second, and its remote direct memory access (RDMA) latency is as low as 5 microseconds. It also scores well on power and cost: the Panshi DPU board built on this chip cuts power consumption by 50% and cost by 50% compared with the previous generation of boards. Launching the Panshi DPU will undoubtedly benefit China Mobile's cloud services, deliver more cost-effective solutions to its customers, and put greater pressure on other domestic cloud service providers.
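For a rough sense of why packet handling at this rate has to live in dedicated hardware, here is our own back-of-envelope calculation (not a China Mobile figure) of the packet rates a 400Gbps link implies at common Ethernet frame sizes:

```python
LINK_GBPS = 400

def packets_per_second(frame_bytes: int, link_gbps: float = LINK_GBPS) -> float:
    # Simplified estimate: ignores preamble and inter-frame gap overhead.
    return link_gbps * 1e9 / (frame_bytes * 8)

for size in (64, 512, 1500, 9000):  # min frame, small, standard MTU, jumbo
    print(f"{size:5d}-byte frames: {packets_per_second(size) / 1e6:8.1f} Mpps")
```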

We believe the Panshi DPU will help keep the mainland's key DPU technologies independent and controllable, drive continued optimization of the hardware architecture, and strengthen the surrounding ecosystem. At the launch event, China Mobile added that the chip will be widely used in building out its data centers, supporting general-purpose computing, intelligent computing, and other business scenarios; it will provide safer, more reliable, and more efficient technical support for cloud computing, edge computing, big data processing, and large-model AI training, and help accelerate the development of big data, artificial intelligence, and computing power networks in mainland China.

Building a DPU ASIC is, as is well known, no easy task, which is why most vendors build their solutions on FPGAs. Compared with an ASIC, however, FPGA-based multi-chip solutions suffer from high power consumption, high cost, steep R&D demands on users, heavy engineering investment, and limited flexibility in porting applications. What's more, both the high-performance FPGAs and the CPUs used in these solutions come from overseas manufacturers.

That makes the 400Gbps domestic DPU developed by China Mobile and its ecosystem partners all the more significant for the country's strategy of developing new quality productive forces.

A final word

As a leading operator in China, China Mobile has repeatedly invested in in-house chip development through its subsidiaries in recent years.

For example, in June 2023, China Mobile IoT, a China Mobile subsidiary, officially released the world's first RISC-V based LTE Cat.1 chip (the CM8610) and China Mobile's first mass-produced cellular IoT communication chip (the CM6620 NB-IoT chip). According to China Mobile, these chips not only improve the performance of its products but also help keep its business independent and controllable on domestic technology.

With the launch of the Panshi DPU, China Mobile's in-house chip business has reached a new level. We look forward to more surprises from the company in the future.
