
"Hunting" NVIDIA H100

"Hunting" NVIDIA H100

Recently, generative AI led by ChatGPT has swept the world. The huge productivity gains it brings are revolutionizing one industry after another, to the point that the underlying logic of many industries needs to be re-evaluated.

And the "shovel seller" behind the AI wave, NVIDIA entered the trillion-dollar market cap club in one fell swoop.

NVIDIA's results for its most recent fiscal quarter were equally striking: second-quarter revenue came in at $13.507 billion, a record high that made analysts' expectations of $11.04 billion look extremely conservative.

"Hunting" NVIDIA H100

NVIDIA's fiscal Q2 revenue data (chart)

Overall, NVIDIA's business is roughly twice the size it was in the same period last year, thanks almost entirely to strong demand for its AI chips, which startups and tech giants building generative AI services are frantically snapping up.

Market research firm Omdia recently estimated that NVIDIA shipped more than 900 tons of H100 GPUs for AI and high-performance computing applications in the second quarter. Omdia expects GPU sales in the coming quarters to be roughly the same, which would put NVIDIA's H100 shipments at about 3,600 tons for the year.
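As a quick sanity check, the annualized figure follows directly from the quarterly estimate; the 900-ton number is the only input taken from the report.

```python
# Annualizing Omdia's quarterly estimate; the 900-ton figure is the only input from the report.
h100_tons_per_quarter = 900
quarters_per_year = 4

print(f"Implied full-year H100 shipments: ~{h100_tons_per_quarter * quarters_per_year} tons")  # ~3600 tons
```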

And that is only the H100: there is also the H800, as well as the previous-generation A100, A800 and other GPU products. Shipments can therefore be expected to accelerate over the coming year as NVIDIA profits from the generative AI boom.

According to industry insiders, NVIDIA's entire 2023 production of the H100 has long been sold out; orders placed now will not be fulfilled until at least mid-2024.

Who will get how many A100s and H100s, and when, is one of the hottest topics in Silicon Valley right now.

NVIDIA's biggest customers bear this out: international giants such as Microsoft, Amazon, Google and Meta all signaled in earnings reports for the fiscal quarter ended in June that they intend to keep investing heavily in generative AI capabilities, even as capital spending slows elsewhere.

OpenAI CEO Sam Altman has revealed that the company is severely GPU-constrained, to the point of hoping fewer people would use ChatGPT, and said OpenAI has had to postpone several short-term plans because of GPU limitations.

According to sources, Chinese technology giants Baidu, Tencent, Alibaba and ByteDance have placed more than $1 billion of orders with NVIDIA for delivery this year, covering roughly 100,000 A800 and H800 chips, with a further $4 billion of AI chips to be delivered next year.

It is not only technology companies lining up to buy the H100: Middle Eastern countries such as Saudi Arabia and the United Arab Emirates are also piling in, buying thousands of H100 GPUs at a time. The "Falcon 40B" model developed by the UAE's Technology Innovation Institute in Abu Dhabi has been one of the hottest commercially usable models in the open-source community recently, reflecting how heavily the UAE is investing in basic computing power.

A widely circulated article, "Nvidia H100 GPUs: Supply and Demand", offers a deep analysis of how technology companies are currently using GPUs and how many they need. It estimates that large-scale H100 cluster capacity at cloud providers big and small is running out, and that the H100 demand trend will continue at least until the end of 2024.

As NVIDIA CEO Jensen Huang said: "Our current shipments are far from meeting demand."

NVIDIA's GPUs are not just selling effortlessly; the margins on them are also startlingly high. Industry analysts have claimed that the profit margin on the NVIDIA H100 is close to 1,000%, a figure that quickly set off heated discussion across the chip industry.

US financial firm Raymond James estimated in a recent report that the H100 costs only about $3,320 to make, while NVIDIA charges customers $25,000-$30,000 per chip even at volume, a markup approaching 1,000% that makes the H100 arguably the most profitable chip ever.
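A back-of-the-envelope check of that figure, using only the cost and price estimates quoted above; note this is a markup over estimated manufacturing cost, not NVIDIA's accounting gross margin, and it ignores R&D, software, packaging and other expenses.

```python
# Markup implied by the Raymond James estimates quoted above (manufacturing cost only).
cost = 3_320                              # estimated cost to build one H100, USD
price_low, price_high = 25_000, 30_000    # reported volume selling-price range, USD

for price in (price_low, price_high):
    markup = (price - cost) / cost * 100
    print(f"price ${price:,}: price/cost = {price / cost:.1f}x, markup = {markup:.0f}%")
# -> roughly 7.5x-9x the estimated build cost (a ~650-800% markup),
#    the basis for the "close to 1,000%" characterization.
```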

The quarterly results bear this out: NVIDIA's Q2 net profit reached $6.18 billion, up 843% year-on-year, and its adjusted operating margin in the most recent fiscal quarter hit 58%, the highest level in at least a decade and a sharp jump from its average of 39% over the previous eight quarters.

NVIDIA's explosive growth and long-term outlook show that demand for AI is not a short-lived phenomenon. A market this large, with prospects this striking, is drawing in many other manufacturers, which will only intensify competition.

Under this trend, the battle for AI chips is intensifying.

Established players such as AMD, Intel and IBM, along with a crop of startups, are launching new AI chips to take on NVIDIA, while Google, Microsoft, Amazon, Alibaba, Baidu and others are developing their own chips to reduce dependence on external suppliers.

AMD: the "number two player" in the GPU market

The AI chip market today is essentially NVIDIA's world, and no challenger will find it easy to shake that foundation. But AMD, NVIDIA's old rival, is naturally unwilling to let it monopolize such a large and fast-growing market.

For this "No. 2 player" in the GPU market, everyone expects it to come up with the "ultimate weapon" that shakes NVIDIA's "computing power supremacy" status.

In June this year, AMD released the much-anticipated Instinct MI300 series. The lineup mainly comprises two versions, the MI300A and the MI300X, plus the Instinct Platform, which integrates eight MI300X accelerators.

AMD CEO Lisa Su described the MI300A as the world's first APU accelerator for AI and HPC, combining CPU, GPU and memory in one package: 13 chiplets, 146 billion transistors in total, 24 Zen 4 CPU cores, a CDNA 3 graphics engine and 128GB of HBM3 memory.

"Hunting" NVIDIA H100

The Instinct MI300X is benchmarked directly against NVIDIA's H100 as an accelerator built specifically for generative AI. It packages together 8 GPU chiplets and 4 I/O memory chiplets, 12 5nm chiplets in all, for a total of 153 billion transistors, more than the H100's 80 billion. It is the largest chip AMD has ever put into production and is designed to accelerate large models such as the ones behind ChatGPT.

Compared with NVIDIA's H100, the AMD Instinct MI300X offers 2.4 times the HBM density and 1.6 times the HBM bandwidth, so it can in theory run larger models than the H100.

AMD also released the AMD Instinct Platform, which integrates eight MI300X accelerators and provides a total of 1.5TB of HBM3 memory.

Lisa Su noted that as model parameter counts grow, more GPUs are needed to run them; with more memory per AMD chip, developers will not need as many GPUs, which saves customers money. She also revealed that the MI300X will sample to select customers in the third quarter of this year and enter mass production in the fourth quarter.
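As a rough illustration of that point (a back-of-the-envelope sketch, not vendor-published sizing): weights stored in 16-bit precision take about 2 bytes per parameter, so per-accelerator memory directly limits how large a model fits on a single chip, before even counting activations, optimizer state or KV cache.

```python
# Rough weights-only memory estimate at 16-bit precision (2 bytes per parameter).
# Ignores activations, optimizer state and KV cache, so real requirements are higher.
BYTES_PER_PARAM = 2
H100_GB, MI300X_GB = 80, 192   # per-accelerator HBM capacities cited above

for params_billion in (40, 70, 175):
    weights_gb = params_billion * BYTES_PER_PARAM   # 1e9 params * 2 B = 2 GB per billion params
    fits_h100 = "yes" if weights_gb <= H100_GB else "no"
    fits_mi300x = "yes" if weights_gb <= MI300X_GB else "no"
    print(f"{params_billion}B params -> ~{weights_gb} GB of weights | "
          f"fits one H100: {fits_h100}, fits one MI300X: {fits_mi300x}")
```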

So, can the impressive MI300X really compete with the H100?

Some industry experts point out that although the MI300X carries a larger 192GB of HBM3, NVIDIA's products are also iterating, and by the time the MI300X officially ships NVIDIA may have launched products with stronger specifications. And since AMD has not announced pricing, the 192GB of HBM3 on the MI300X may not translate into a significant price advantage over the H100.

Second, the MI300X lacks an equivalent of the H100's Transformer Engine for accelerating large Transformer models, which means that training with the same number of MI300X chips will take longer. With GPUs for AI training in short supply and prices rising, the launch of the MI300X will certainly be good for healthy competition, but in the short term it may serve mainly as a fallback for customers who cannot get hold of the H100.

The chief analyst of the Supreme Think Tank likewise noted that while the specifications AMD has disclosed put the MI300X ahead of the H100 in many respects, higher performance does not automatically translate into more users. NVIDIA has worked in the GPU field for many years and enjoys market recognition and product stability that AMD lacks. Moreover, on the software side, NVIDIA's CUDA, built up over more than a decade, has become a moat that competitors cannot cross in the short term.

AMD does have its own complete stack of libraries and tools, ROCm, which is largely compatible with CUDA and gives AMD both the means and the argument to persuade customers to migrate. But compatibility is only a stopgap; AMD can only build a real competitive advantage by strengthening its own ecosystem. Going forward, ROCm needs to support more operating systems and more AI frameworks in order to attract developers. Compared with hardware specifications, software barriers are far higher, and AMD will need a long time to close the gap.

Karl Freund, principal analyst at Cambrian-AI Research LLC, also pointed out in a recent Forbes piece that the MI300X faces challenges relative to NVIDIA's H100.

For one thing, the H100 is already shipping in full volume while the MI300X is still in its infancy. For another, NVIDIA has the largest software ecosystem and the largest community of AI researchers, whereas AMD's software ecosystem is far less mature. Moreover, AMD has not disclosed any benchmarks, and training and running large AI models depends not only on GPU performance but also on system design.

As for the MI300X's memory advantage, Freund believes NVIDIA will eventually offer products with comparable memory specifications, so it will not be an absolute advantage.

On the whole, shaking a still-ascendant NVIDIA will not be easy for AMD.

Still, it is undeniable that even if NVIDIA's "AI throne" is hard to shake in the short term, the MI300X is a serious competitor to the H100 and is likely to become the market's second choice alongside it.

In the long run, AMD is a competitor NVIDIA needs to watch warily.

Intel: Competing for the throne of the AI computing-power market

GPU resources are in short supply everywhere, NVIDIA's A100/H100-class parts cannot be sold into China, and the "war of a hundred models" keeps driving demand for computing power ever higher. The Chinese market urgently needs AI chips to quench its thirst, and for Intel this window of computing-power shortage is an excellent opportunity to attack.

In July this year, Intel launched its Habana Gaudi 2 AI chip for the Chinese market, positioning it directly against NVIDIA's A100/H100-class GPUs in the contest for the AI computing-power market.

At the launch, Intel compared the Gaudi 2 directly with NVIDIA's A100, and its ambitions were plain. According to Intel's own figures, the Gaudi 2 is built specifically for training large language models, uses a 7nm process and has 24 tensor processor cores. From computer-vision training to inference on the 176-billion-parameter BLOOMZ model, Gaudi 2 delivers roughly twice the performance per watt of the A100, cutting the power consumed in training and deploying models by about half.

Sandra Rivera, Intel executive vice president and general manager of its Data Center and AI Group, said that according to the MLPerf Training 3.0 results published by MLCommons at the end of June, Gaudi 2 is the only chip besides NVIDIA's to have run the benchmark's GPT-3 training workload.

With large models evolving by the day, Intel has kept optimizing around Gaudi 2 in recent months.

According to reports, Gaudi 2 is both more competitively priced and higher-performing than the A100, and once FP8 software support arrives it is expected to offer better price/performance than the H100.

Intel actually released Gaudi 2 overseas last year; what it launched this time is a "China special edition" for the Chinese market.

Intel emphasized that in China it has partnered with major domestic server makers such as Inspur Information, New H3C and xFusion. "Demand for AI solutions in China is very strong, and we are in talks with almost all of our traditional customers," Sandra Rivera said. "Cloud service providers, communication service providers and enterprise customers all have strong demand for AI solutions."

On the product-pipeline side, Intel has in recent years been pushing its XPU concept, that is, diverse, freely combinable heterogeneous computing. Its AI-related lineup spans CPUs with integrated AI accelerators, GPU products, and ASIC-style AI chips represented by the Habana Gaudi series.

The popularity of large models continues to drive the demand for AI chips.

Intel's Gaudi 2 processors have sold strongly since their July launch, and Intel CFO David Zinsner said at an earlier conference that more and more customers are turning to Gaudi chips as an alternative to processors that are in short supply.

Gaudi is a dedicated AI accelerator, and among Intel's products it delivers the best performance on large-model workloads. According to Sandra Rivera: "Next year we will release the next generation, Gaudi 3. In 2025, we will merge the Gaudi AI chips with our GPU roadmap and launch a more integrated GPU product."

A few days ago, at its Intel Innovation event in San Francisco, Intel revealed that the next-generation Gaudi 3, built on a 5nm process, will bring a large jump in performance: BF16 performance up 4x, compute power up 2x, network bandwidth up 1.5x, and HBM capacity up 1.5x.

"Hunting" NVIDIA H100

Going forward, after Gaudi 3, Intel plans to introduce a successor codenamed Falcon Shores.

Intel has disclosed few details about Falcon Shores. The original plan was to launch it in 2024 as an "XPU" integrating CPU and GPU, but at last month's earnings call Intel adjusted that plan and repositioned Falcon Shores as a standalone GPU to be released in 2025.

Overall, with the Gaudi series as Intel's AI flagship, the outside world is waiting to see how Gaudi 2's performance and computing power hold up in real applications. From hardware iteration to software ecosystems, the competitive story of AI chips is far from over.

IBM: Analog AI chips, leading industry trends

The future of AI requires new innovations in energy efficiency, from the way models are designed to the hardware on which they run.

IBM recently unveiled a new analog AI chip said to be up to 14 times more energy efficient than the industry-leading NVIDIA H100. It is aimed at one of generative AI's main problems, high energy consumption, meaning it can complete more computation for the same energy spent.

That matters most for large models, which are especially energy-hungry to run. IBM's new chip is expected to ease the pressure on companies operating generative AI platforms, and could one day displace NVIDIA as the dominant force behind them.

The reason lies in how analog chips are built. Unlike digital chips, they work with continuous analog signals and can represent in-between "grayscale" values rather than just 0 and 1. Digital chips are by far the most widely used today, but they handle only discrete binary signals, and the two differ in function, signal processing and application areas.

IBM says its 14nm analog AI chip packs 35 million phase-change memory devices per component and can model networks of up to 17 million parameters. Like the human brain it loosely mimics, the chip performs calculations directly in memory, making it well suited to energy-efficient speech recognition and transcription.
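To make the "compute directly in memory" idea concrete, here is a conceptual NumPy sketch of an analog crossbar matrix-vector multiply; it is not IBM's actual design. Weights sit in the memory array as conductances, inputs are applied as voltages, and the products accumulate as currents on each output line; the noise level is an illustrative assumption standing in for analog imprecision.

```python
import numpy as np

# Conceptual sketch of an analog in-memory matrix-vector multiply (crossbar style).
# Weights live in the memory array as conductances; the multiply-accumulate happens
# "in place" as currents summing on each output line, so weights never move.
# The noise level is an illustrative assumption, not an IBM specification.
rng = np.random.default_rng(0)

weights = rng.standard_normal((128, 256))   # layer weights, stored as conductances
x = rng.standard_normal(256)                # input activations, applied as voltages

def analog_mvm(G: np.ndarray, v: np.ndarray, noise_std: float = 0.02) -> np.ndarray:
    """Ideal crossbar MVM plus read noise modelling analog imprecision."""
    ideal = G @ v                            # currents summed per output line
    return ideal + noise_std * np.abs(ideal) * rng.standard_normal(G.shape[0])

y_analog = analog_mvm(weights, x)
y_digital = weights @ x
rel_err = np.linalg.norm(y_analog - y_digital) / np.linalg.norm(y_digital)
print(f"Relative error vs exact digital result: {rel_err:.3%}")
```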

IBM has demonstrated the advantages of such chips in multiple experiments, including a system that transcribes recorded speech with accuracy very close to a digital hardware setup while running speech recognition about 7 times faster. That promises a smoother experience for use cases that need real-time response, such as voice assistants and smart speakers.

The release of IBM's analog AI chip signals that analog computing is becoming a new direction for AI hardware. By integrating huge numbers of phase-change memory cells, the chip achieves more efficient computation and better energy efficiency, and as the technology matures, analog chips could become a core driving force in AI.

In short, IBM's new analog AI chip could bring a major breakthrough to generative AI. NVIDIA GPUs power most of today's generative AI platforms; if IBM can iterate on this prototype and ready it for the mass market, it may one day displace those chips as the mainstay.

SambaNova: A new AI chip that challenges NVIDIA

With high-end GPUs in persistent short supply, a chip startup intent on challenging NVIDIA has become a hot topic in the industry.

The unicorn SambaNova has just released a new AI chip, the SN40L. Manufactured on TSMC's 5nm process, it contains 102 billion transistors, delivers a peak 638 TFLOPS, carries up to 1.5TB of memory, and supports sequence lengths of 256,000 tokens.

"Hunting" NVIDIA H100

By comparison, its main competitors, the NVIDIA H100 and AMD MI300X, carry up to 80GB and 192GB of HBM3 respectively. The SN40L actually has less high-bandwidth HBM3 than either, relying instead on large-capacity DRAM.

SambaNova CEO Rodrigo Liang said that although DRAM is slower, a dedicated software compiler intelligently distributes the load across the three memory tiers and can also treat eight chips as a single system.
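The compiler itself is proprietary, so the following is purely a hypothetical sketch of how such memory tiering can work in principle: tensors are placed greedily in the fastest tier that still has room, ordered by how often they are accessed. The tier capacities and tensor names below are illustrative assumptions, not SN40L specifications.

```python
# Hypothetical illustration of memory tiering (not SambaNova's actual compiler):
# greedily place the most frequently accessed tensors in the fastest tier with room left.
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size_gb: float
    accesses_per_step: int   # rough proxy for bandwidth demand

# Illustrative capacities only; the real tier sizes are not all public.
TIERS = {"sram": 0.5, "hbm": 64.0, "ddr": 1500.0}   # GB, fastest to slowest

def place(tensors: list[Tensor]) -> dict[str, str]:
    remaining = dict(TIERS)
    placement = {}
    for t in sorted(tensors, key=lambda t: t.accesses_per_step, reverse=True):
        for tier in ("sram", "hbm", "ddr"):          # try fastest tier first
            if remaining[tier] >= t.size_gb:
                remaining[tier] -= t.size_gb
                placement[t.name] = tier
                break
        else:
            raise MemoryError(f"{t.name} does not fit in any tier")
    return placement

model = [
    Tensor("kv_cache", 20.0, 1000),
    Tensor("hot_expert_weights", 40.0, 400),
    Tensor("cold_expert_weights", 800.0, 5),
]
print(place(model))
# -> {'kv_cache': 'hbm', 'hot_expert_weights': 'hbm', 'cold_expert_weights': 'ddr'}
```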

Beyond the raw hardware numbers, the SN40L is optimized for large models, accelerating both dense and sparse computation.

Gartner analysts believe one potential advantage of the SN40L is multimodal AI: GPU architectures can be inflexible when faced with diverse data such as images, video and text, whereas SambaNova can adapt its hardware to the requirements of the workload.

SambaNova's business model is also unusual compared with other chip vendors: rather than just selling chips, it sells a custom technology stack, from chips to server systems and even deployed large models.

Liang pointed out that under current industry practice, running a trillion-parameter-class model takes hundreds of chips, whereas SambaNova's approach brings the total cost of ownership down to 1/25 of the standard method.

According to Liang, a cluster of eight SN40L chips can serve models totaling 5 trillion parameters, the equivalent of 70 large models of 70 billion parameters each, and a Global 2000 company would only need two such 8-chip clusters to cover all of its large-model needs.
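The parameter arithmetic behind that equivalence is straightforward; the snippet below simply makes it explicit, using only the figures quoted above.

```python
# Checking the claim that 5 trillion total parameters ~ 70 models of 70B parameters each.
models = 70
params_per_model = 70e9

total = models * params_per_model
print(f"70 x 70B = {total / 1e12:.2f} trillion parameters")  # 4.90 trillion, i.e. roughly 5T
```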

SambaNova's chips and systems have already won a number of large customers, including leading supercomputing sites such as Japan's Fugaku, Argonne National Laboratory and Lawrence Livermore National Laboratory, as well as the consulting firm Accenture.

Cloud service providers develop their own AI chips to reduce reliance on NVIDIA

For now, NVIDIA remains the undisputed king of AI computing power: its A100 and H100 series chips sit at the top of the pyramid and are the power source behind large language models such as the ones driving ChatGPT.

However, whether to cut costs, reduce dependence on NVIDIA, or improve their bargaining power, technology giants including Google, Amazon, Microsoft, Tesla, Meta, Baidu and Alibaba have all set about developing their own AI chips.

Take Microsoft, Google and Amazon: by incomplete count, these three companies alone have launched, or plan to release, eight server and AI chips.

In this AI chip race, Amazon appears to have a head start, with two AI-specific chips already in hand: the training chip Trainium and the inference chip Inferentia. In early 2023 it released Inferentia2, which triples compute performance, quadruples total accelerator memory, delivers four times the throughput, and cuts latency to a tenth. Inf2 instances can host models of up to 175 billion parameters, making them strong contenders for large-scale inference.

As early as 2013, Google was secretly developing a chip dedicated to AI machine-learning workloads for use in its own data centers in place of NVIDIA GPUs. That chip, the TPU, was unveiled in May 2016.

Google actually deployed the TPU v4 in its data centers back in 2020 but only disclosed the technical details on April 4 this year: TPU v4 delivers 2.1 times the performance of TPU v3, and the TPU v4-based supercomputer, with 4,096 chips, is about 10 times faster overall. Google says that for systems of comparable size it is 4.3-4.5 times faster than the Graphcore IPU Bow and 1.2-1.7 times faster than the NVIDIA A100, while using 1.3-1.9 times less power.

Google has moved the engineering team responsible for AI chips into Google Cloud, aiming to strengthen Google Cloud's ability to sell AI chips to companies renting its servers and so compete with its larger rivals Microsoft and Amazon Web Services. And although NVIDIA's GPUs lead in raw computing power, the computing systems behind OpenAI and Midjourney, two of the companies that ignited this AI wave, reportedly rely not on NVIDIA GPUs but on Google's solutions.

Microsoft, by contrast, relies more heavily on off-the-shelf or custom hardware from chipmakers like Nvidia, AMD, and Intel.

However, according to The Information, Microsoft is also planning to launch its own artificial intelligence chips.

People familiar with the project say Microsoft began developing the chips, code-named "Athena", internally as early as 2019, and that a small group of Microsoft and OpenAI employees is already testing them. Microsoft hopes the chip will outperform the hardware it has spent hundreds of millions of dollars buying from other vendors, saving money on high-value AI work.

The chips are reportedly designed to train large language models while also supporting inference, enough to power all the AI software behind ChatGPT. According to a person familiar with the matter, Microsoft's roadmap includes future generations of Athena, and the first Athena chips will be built on a 5nm process and may enter mass production next year.

In May, Microsoft also posted a series of chip-related job openings, including a lead design engineer for its AISoC (Artificial Intelligence Chips and Solutions) team, which is purportedly working on "cutting-edge AI designs that can perform complex and high-performance functions in an extremely efficient manner." Having pinned much of its future on technology from OpenAI, Microsoft wants chips that run these models more efficiently than off-the-shelf GPUs and their associated accelerators.

Meanwhile, Meta has revealed that it is building its first custom chip designed specifically to run AI models, the MTIA, based on the open-source RISC-V architecture and expected to arrive in 2025.

At the same time, as US export restrictions on high-performance chips keep tightening, the NVIDIA A100 and H100 cannot be sold into China and the A800 and H800 are severely out of stock, leaving domestic Chinese AI chips with the important mission of filling the gap.

Huawei, Alibaba, Baidu's Kunlunxin, Biren Technology, Cambricon, Tianshu Zhixin, Hanbo Semiconductor and others are all pushing ahead on the GPU track and have made real progress. It should be noted, however, that while domestic GPUs have some advantage on price, they still trail NVIDIA in both computing power and ecosystem.

Overall, with some of NVIDIA's biggest customers now developing AI chips of their own, NVIDIA will undoubtedly face fiercer competition.
