Chip heterogeneity: the competitive landscape is confusing

On March 22, NVIDIA released the "Grace CPU Superchip", a CPU built specifically for data centers. It consists of two CPU dies interconnected by NVLink-C2C. Like the UCIe standard recently launched by Intel together with TSMC, Samsung, and many other technology companies, NVLink-C2C is a new high-speed, low-latency chip-to-chip interconnect. It supports interconnection between custom dies and GPUs, CPUs, DPUs, NICs, and SoCs.

Demand for computing power continues to rise, and a single type of architecture or processor can no longer handle increasingly complex, massive data sets. Heterogeneous computing is becoming a key technical direction for breaking the computing-power bottleneck, and chiplet technology is regarded as the approach that brings heterogeneous techniques together. In early March, the Intel-initiated UCIe standard was announced to provide a unified interface and technical standard for chiplets. TSMC, Samsung, ASE, AMD, and other manufacturers joined, but NVIDIA was not among them.

NVIDIA has since said it will support UCIe alongside NVLink-C2C. Experts pointed out that this shows NVIDIA does not want to be left outside the UCIe alliance, but it also shows NVIDIA's absolute confidence in NVLink-C2C, around which it may form its own alliance in the future. AMD also holds a seat in global heterogeneous computing, but by joining the UCIe alliance it has leaned toward Intel's side on heterogeneous integration. The coming battle over heterogeneous chips will therefore be fought mainly between Intel and NVIDIA, which the industry calls the "battle of the two Yings" (both companies' Chinese names, 英特尔 and 英伟达, begin with the character 英, "Ying").

Intel's "Chip Alliance"

The appeal of UCIe is that chiplets from different companies can be built to one unified standard, so that dies from different manufacturers, with different processes, architectures, and functions, can be mixed and matched. Interoperability becomes easy to achieve, along with high bandwidth, low latency, low energy consumption, and low cost. Zhang Binlei, senior analyst of Chip Research, told China Electronics News that the development of chiplet technology is expected to advance heterogeneous computing. A unified interface and technical standard for chiplets addresses the problems of connection and transmission efficiency in heterogeneous packaging, at the cost of a slight loss in speed and energy efficiency. The UCIe standard will promote the development of chiplet-related technologies, which are expected to strike a balance between performance and power consumption and to deliver commercial value.

Intel has proposed six technical pillars that play a key role in implementing its XPU strategy: process, architecture, memory, interconnect, security, and software. Although heterogeneous computing looks like a hardware matter, unlocking its capabilities requires integration across chips, systems, and software. The first layer is the chip layer, which refers to heterogeneity inside the chip package and is closely tied to the chiplet concept. The second is the system layer, which refers to integrating multiple functions and multiple computing architectures in one system. The third is the software layer: oneAPI, a unified cross-architecture programming model that lets developers program different architectures through a single set of software interfaces and a single set of function libraries. Under a unified UCIe standard, heterogeneous integration becomes far less difficult and yields better results.
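To make the software-layer idea concrete, here is a toy Python sketch of "one interface, several architectures". It is not oneAPI and does not use any oneAPI call. The names `BACKENDS`, `cpu_add`, `accelerator_add`, and `vector_add` are hypothetical, and the "accelerator" is ordinary Python that merely processes data in chunks, standing in for a device with a different execution style.

```python
from typing import Callable, Dict, List, Sequence

# A backend takes two equal-length vectors and returns their element-wise sum.
Backend = Callable[[Sequence[float], Sequence[float]], List[float]]


def cpu_add(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Hypothetical CPU backend: one pass over the data."""
    return [x + y for x, y in zip(a, b)]


def accelerator_add(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Hypothetical accelerator stand-in: works on fixed-size chunks."""
    chunk = 4
    out: List[float] = []
    for start in range(0, len(a), chunk):
        out.extend(
            x + y
            for x, y in zip(a[start:start + chunk], b[start:start + chunk])
        )
    return out


# Hypothetical device registry; a real runtime would discover devices.
BACKENDS: Dict[str, Backend] = {
    "cpu": cpu_add,
    "accelerator": accelerator_add,
}


def vector_add(
    a: Sequence[float], b: Sequence[float], device: str = "cpu"
) -> List[float]:
    """Single entry point: the caller's code is the same for every device."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if device not in BACKENDS:
        raise ValueError(f"unknown device: {device!r}")
    return BACKENDS[device](a, b)


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]
    b = [10, 20, 30, 40, 50]
    for name in BACKENDS:
        print(name, vector_add(a, b, device=name))
```

The calling code never changes when the device does. That is the convenience a unified programming model aims for, while the per-device implementations stay free to differ.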

The UCIe Alliance now spans the entire semiconductor industry chain, upstream and downstream: chipmakers, packaging houses, IP suppliers, wafer foundries, and cloud service providers. Mark Papermaster, Executive Vice President and Chief Technology Officer of AMD, said: "The UCIe standard will be a key factor in leveraging heterogeneous computing engines and accelerators to drive system innovation."

Lu Lizhong, TSMC Fellow and Vice President of Design and Technology Platform, said: "This industry-wide alliance is determined to expand the package-level integration ecosystem, and TSMC is pleased to join it. TSMC offers a variety of silicon and packaging technologies to create multiple implementations for heterogeneous UCIe devices."

Dr. Lihong Cao, Director of Engineering and Technical Marketing at ASE, said: "The industry generally believes that heterogeneous integration helps bring chiplet-based designs to market."

NVIDIA may "go its own way"

However, observers of the UCIe Alliance have also noticed that two major players in heterogeneous integration are absent from it. Part of the reason can be found in what NVIDIA CEO Jensen Huang said at the recent GTC 2022 spring developer conference.

NVIDIA released its NVLink-C2C interconnect technology, whose links are up to 25 times more energy efficient and 90 times more area efficient than PCIe Gen 5 on NVIDIA chips, and which delivers coherent interconnect bandwidth of 900 GB per second or more. In other words, NVIDIA is doing much the same thing as Intel when it comes to interconnects for heterogeneously integrated chiplets.

"In addition to NVLink-C2C, NVIDIA will support the UCIe standard. Custom chip integration with NVIDIA chips can use either the UCIe standard or NVLink-C2C," Jensen Huang said.

Some experts pointed out that this shows NVIDIA does not want to be left outside the UCIe alliance, but at the same time shows its absolute confidence in NVLink-C2C, around which it may form its own alliance in the future.

Chi Xiannian, senior consultant at the CCID Consulting Integrated Circuit Center, told China Electronics News that NVLink-C2C builds on NVIDIA's world-class SerDes and link design technology, and can scale from PCB-level integration and multi-chip modules to silicon interposers and wafer-level connections. This provides extremely high bandwidth while optimizing energy efficiency and die-area efficiency. Compared with the UCIe standard, NVLink-C2C is optimized for lower latency, higher bandwidth, and higher energy efficiency.

Apple may have similar considerations. At the beginning of this month, the "crossover player" Apple unveiled the M1 Ultra, billed as the most powerful desktop chip on the planet, and moved in on other companies' territory: its performance surpasses that of products from many specialist CPU and GPU vendors.

Zhang Xianyang, a research analyst at Chip Research, told reporters that the M1 Ultra, the in-house chip Apple announced on March 9, 2022, is built with chiplet technology and provides an ultra-high bandwidth of 2.5 TB/s, far ahead of the recently announced UCIe 1.0 standard. In other words, Apple can build its chiplet products through cooperation with TSMC and is already ahead of the current UCIe standard, so joining the alliance is not a necessity for Apple.

The "heterogeneous" landscape is confusing

Until now, global heterogeneous computing had been a "Three Kingdoms" standoff, with three camps keeping each other in check. The emergence of the UCIe alliance broke that balance: the close cooperation between Intel and AMD, and NVIDIA's apparent distance from both, have made the whole situation confusing. Forming alliances may be the best strategy, but it takes good iron to make good steel. To take the lead in heterogeneous computing, strength has the final say, so each of the three giants is pushing ahead in its own area of expertise.

The "Big Three" each have their own heterogeneous computing ecosystem. Chi Xiannian explained that the Intel-led ecosystem mainly serves Intel's own products and services and has advantages in PCs and high-performance mobile computing. The OpenPOWER alliance, led by IBM, Google, and NVIDIA, is based on IBM's Power chip architecture and mainly targets high-performance computing applications. The HSA (Heterogeneous System Architecture) alliance, whose main members include AMD, Qualcomm, ARM, Samsung, and Beijing Huaxia Core, is a fully open heterogeneous computing alliance. With giants such as ARM, Qualcomm, and Samsung involved, it has an advantage in high-performance mobile computing.

Intel, the CPU leader, is the only company in the industry with CPUs, discrete GPUs, IPUs, ASICs, FPGAs, and a variety of other accelerators. At a recent investor conference it unveiled Falcon Shores, a new architecture scheduled for 2024 that integrates x86 and Xe GPU into a single Xeon socket.

Song Jiqiang, president of Intel China Research Institute, told China Electronics News that combining the performance of an x86 main chip with a GPU in this way is an innovation. According to Intel, Falcon Shores will deliver more than 5x the performance per watt, more than 5x the compute density, and more than 5x the memory capacity and bandwidth.

Asked what advantages Intel has over its competitors, Song Jiqiang named three. First, its technical foundation is solid and strong. Second, Intel can pair its architecture with a variety of accelerators, so that each application workload is handled by the appropriate one. Third, Intel advocates "software first", which is especially important for developers.

GPU leader NVIDIA unveiled its Grace CPU line for artificial intelligence and supercomputing at GTC 2021 last year, establishing a new "GPU+DPU+CPU" chip roadmap. At this year's GTC 2022, NVIDIA announced its first Arm Neoverse-based CPU dedicated to AI infrastructure and high-performance computing, the "Grace CPU Superchip".

The Grace CPU Superchip is designed for AI, HPC, cloud computing, and hyperscale applications and consists of two CPU dies interconnected via NVLink-C2C. It fits 144 Arm cores in a single socket and achieves an industry-leading estimated score of 740 on the SPECrate 2017_int_base benchmark. According to NVIDIA Labs estimates made with comparable compilers, this is more than 1.5 times the result of the CPUs in the current DGX A100.
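The figures quoted above allow a quick back-of-the-envelope check. The Python sketch below uses only the three numbers from the text (a score of 740, 144 cores, and a speedup of more than 1.5x) to derive the ceiling this implies for the DGX A100 CPU baseline and the average score per core. The constant and function names are illustrative only, and SPECrate is a throughput metric, so the per-core average is a rough indicator rather than a single-thread score.

```python
# Figures quoted in the article (estimates published by NVIDIA).
GRACE_SPECRATE_INT_BASE = 740  # estimated SPECrate 2017_int_base score
GRACE_CORES = 144              # Arm cores in one socket
MIN_SPEEDUP_VS_DGX_A100 = 1.5  # "more than 1.5 times"


def implied_baseline_ceiling(score: float, min_speedup: float) -> float:
    """Upper bound on the baseline score if `score` exceeds it by more
    than `min_speedup` times."""
    return score / min_speedup


def average_score_per_core(score: float, cores: int) -> float:
    """Throughput score divided evenly across cores (a rough average)."""
    return score / cores


baseline_ceiling = implied_baseline_ceiling(
    GRACE_SPECRATE_INT_BASE, MIN_SPEEDUP_VS_DGX_A100
)
per_core = average_score_per_core(GRACE_SPECRATE_INT_BASE, GRACE_CORES)

if __name__ == "__main__":
    print(f"Implied DGX A100 CPU baseline: below {baseline_ceiling:.1f}")
    print(f"Average score per Grace core: {per_core:.2f}")
```

Under these assumptions the comparison baseline must score below about 493, and the Superchip averages roughly 5.1 points per core.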

Jensen Huang was full of praise: "Everything about Grace is amazing, and we expect the Grace Superchip to be the most powerful CPU by then, 2 to 3 times faster than the top fifth-generation CPUs that have not yet been released."

For NVIDIA, the arrival of the Grace CPU means it no longer depends on Intel and AMD for CPUs. NVIDIA dominates the global GPU market, but a GPU only accelerates computation and relies on a CPU to issue the instructions it executes, so communication between GPU and CPU is especially important.

Although the much-discussed Arm acquisition ultimately failed, the attempt itself signaled to the outside world how determined NVIDIA is to strengthen its heterogeneous capabilities.

AMD, the new FPGA leader now that its merger with Xilinx is complete, has shaken off its position as perennial runner-up in every segment. It has become the second semiconductor maker after Intel to own all three major product lines of CPU, GPU, and FPGA, and future AMD CPUs will be combined with Xilinx FPGAs in a heterogeneous CPU + FPGA model. Xilinx has long specialized in FPGAs: in 2020 its share of the global and domestic FPGA markets was 50% to 55%. Regarding the acquisition, AMD President and CEO Lisa Su said that by effectively integrating Xilinx's strengths in FPGAs, AMD can offer a broader portfolio of high-performance computing products and provide system-level solutions spanning CPUs, GPUs, ASICs, and FPGAs. At the same time, with Xilinx's resources in 5G, communications, autonomous driving, and industrial applications, AMD can bring high-performance computing to more fields and reach a wider customer base. AMD also plans to let FPGAs run the programming languages used on existing CPUs, and to develop custom ASIC products that implement specific functions or software stacks.

Xiaoming Pan, AMD Senior Vice President and President of Greater China, said at the 2021 World Semiconductor Congress: "Today's and tomorrow's workloads require powerful computing power, and heterogeneous computing is a key future trend. AMD will focus on high-performance computing in the three areas of computing, graphics, and solutions to maintain high-performance computing leadership in a growing industry."