
Tech Cloud Report: From "Computing Power Nuke" to Generative AI, How Far Is the New Era?

Author: Tech Cloud Report

An original report by Tech Cloud Report.

"We need bigger GPUs"!

In the early morning of March 19, GTC 2024, the annual conference regarded as the industry's "AI weathervane", arrived as scheduled.

NVIDIA CEO Jensen Huang unveiled a series of breakthrough technologies, including the next-generation accelerated computing platform NVIDIA Blackwell, the Project GR00T foundation model for humanoid robots, the Omniverse Cloud API, the NVIDIA DRIVE Thor centralized in-vehicle computing platform, and more.

Among them, NVIDIA Blackwell, the "king bomb" of the show, once again pushed the technical bar for AI chips to a new height.


Immediately afterward, NVIDIA and Amazon Web Services announced an expanded collaboration: Blackwell will soon be available on Amazon Web Services, where it will be combined with the technologies Amazon Web Services is known for, such as its networking, advanced virtualization, and hyperscale clusters, to deliver a massive performance leap for inference on large models with trillions of parameters.

The trillion-parameter scale is precisely where the world's top large models now sit, so users may soon experience the gains from the new hardware across a range of generative AI applications.

The "King Bomb" AI chip was born

How much computing power is needed to train a large model with trillions of parameters?

At GTC, Huang opened with some back-of-the-envelope math. Take OpenAI's state-of-the-art model with 1.8 trillion parameters, which requires trillions of tokens to train.

Multiplying the trillion-plus parameters by the trillions of tokens gives the scale of compute needed to train OpenAI's most advanced model. Huang estimates that a single petaflop GPU (one quadrillion operations per second) would need about 1,000 years to finish the job.
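Huang's estimate can be sanity-checked with rough arithmetic. The sketch below is not from the keynote: it assumes the common rule of thumb of about 6 floating-point operations per parameter per token, and a training set of roughly 3 trillion tokens (the keynote only says "trillions").

```python
# Rough sanity check of Huang's "1,000 years on a single petaflop GPU" estimate.
# Assumptions not stated in the keynote: training cost ~ 6 * parameters * tokens
# floating-point operations, and a token count of roughly 3 trillion.

params = 1.8e12                      # 1.8 trillion parameters
tokens = 3e12                        # assumed ~3 trillion training tokens
flops_needed = 6 * params * tokens   # ~3.2e25 floating-point operations

gpu_rate = 1e15                      # 1 petaflop = 1e15 operations per second
seconds_per_year = 365 * 24 * 3600

years = flops_needed / (gpu_rate * seconds_per_year)
print(f"{years:,.0f} years")         # roughly 1,000 years
```

Under these assumptions the answer lands at about a thousand years, which is the order of magnitude Huang quoted.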

Since the invention of the Transformer, the size of large models has been expanding at an astonishing rate, roughly doubling every six months on average, which means trillion-parameter models are by no means the upper limit.

In the midst of this trend, Huang believes the iteration and evolution of generative AI will require larger GPUs, faster GPU-to-GPU interconnects, more powerful connectivity inside supercomputers, and larger supercomputer systems overall.

For a long time, NVIDIA's GPUs have generally come in two architectures: the GeForce RTX series for gaming uses the Ada Lovelace architecture, while the professional-grade cards for AI, big data, and similar workloads use the Hopper architecture. The H100 that swept the world is built on Hopper.

While Hopper can already meet the needs of most of the commercial market, Huang believes that is not enough: "We need bigger GPUs, and we need to stack them on top of each other."

Thus Blackwell, a product intended to serve both of the above product lines, was born. Blackwell is the sixth-generation chip architecture offered by NVIDIA. The GPU integrates 208 billion transistors, delivers enormous computing power, and eclipses every product that came before it.

According to Huang, NVIDIA invested about $10 billion in R&D for the chip. The new architecture is named after David Harold Blackwell, a mathematician at the University of California, Berkeley, who specialized in game theory and statistics and was the first Black scholar elected to the National Academy of Sciences.

For single-chip training, Blackwell's FP8 performance is 2.5 times that of its predecessor architecture, and its FP4 performance for inference is 5 times that of its predecessor. Its fifth-generation NVLink interconnect is twice as fast as Hopper's and scales to as many as 576 GPUs.

In this sense, Blackwell is not just a chip, but a platform.

The NVIDIA GB200 Grace Blackwell Superchip connects two NVIDIA B200 Tensor Core GPUs to an NVIDIA Grace CPU over a 900 GB/s ultra-low-power chip-to-chip interconnect.

This huge performance upgrade gives AI companies 20 petaflops (20 quadrillion operations per second) of AI performance, up to 30 times that of the H100, while consuming only 1/25 of the energy.

It is easy to see that such a remarkable performance boost on the Blackwell platform is preparation for the next generation of generative AI. As OpenAI's recent release of Sora and its development of a more powerful and complex GPT-5 model suggest, the next step for generative AI is multimodality and video, which means training at an even larger scale. Blackwell opens up more possibilities.

Today, from Google's vast search engine, to Amazon's cloud empire, to Tesla's intelligent driving, the major tech giants are joining NVIDIA's Blackwell camp, kicking off a feast of AI-accelerated computing.

Amazon, Google, Dell, Meta, Microsoft, OpenAI, Oracle, Tesla and other industry leaders are scrambling to deploy it, ready to show what they can do in the new era of AI.

Strategic anxiety is hard to hide

Benefiting from the boom in generative AI since last year, NVIDIA's latest quarterly earnings report, released after the market close on February 21, once again beat expectations. For fiscal year 2024, total revenue reached $60.9 billion, up 125.85% year over year; net profit was $29.76 billion, up more than 581%; and adjusted earnings per share were $12.96, up 288%. It was the fourth consecutive quarter in which NVIDIA's results exceeded market expectations.

NVIDIA's accelerating results reflect the surge in demand for AI computing power from technology companies worldwide. With the arrival of applications such as Sora, the world has glimpsed the enormous potential of large models.

Generative AI is likely to enter an "arms race" phase, and with it, the demand for chips from tech companies will continue to increase.

According to Counterpoint Research, NVIDIA's revenue soared to $30.3 billion in 2023, up 86% from $16.3 billion in 2022, vaulting it to third place among the world's semiconductor manufacturers for the year.

Wells Fargo predicts that NVIDIA will earn as much as $45.7 billion of data center revenue in 2024, a record high.

However, having made history, NVIDIA is not resting on its laurels. Its near-monopoly on AI computing does not sit well with everyone: competitors are trying to break its dominance, and customers want a second source of AI chips.

While NVIDIA's GPUs have many advantages, they can be power-hungry and complex to program when used for AI. From startups to other chipmakers to tech giants, NVIDIA's challengers keep multiplying.

Recently, OpenAI CEO Sam Altman has reportedly been raising more than $8 billion from global investors, including Abu Dhabi's G42 fund and Japan's SoftBank Group, to establish a new AI chip company, with the goal of using the funds to build a network of chip fabrication plants and compete head-on with NVIDIA.

On February 17, industry insiders revealed that Masayoshi Son, founder of the Japanese investment giant SoftBank Group, is seeking to raise up to $100 billion to create a large joint-venture chip company that would complement its chip design arm, Arm.

AMD had already been planning its next-generation AI strategy, including mergers, acquisitions, and restructuring, but the rise of generative AI pushed it to expand its product lineup further: the MI300 chip released last December is designed for complex AI large models, with 153 billion transistors, 192 GB of memory, and 5.3 TB/s of memory bandwidth, roughly 2x, 2.4x, and 1.6x those of NVIDIA's most powerful AI chip at the time, the H100.

Amazon Web Services has also continued to invest in in-house chips to improve the price-performance of customers' cloud workloads. It moved early to launch two chip families for AI, the Trainium training chips and the Inferentia inference chips, and it keeps updating and iterating on them.

Trainium2, launched at the end of last year, can deliver 65 exaflops of AI computing power when scaled out across the cloud and interconnected over the network, and can complete the training of a 300-billion-parameter large language model in a matter of weeks. These AI chips are already being used by leading companies in the generative AI space, including Anthropic.

That these major players have all chosen to invest heavily in their own AI chips reveals a shared concern: no one wants to hand the technological initiative to a chip vendor. Only by sitting at the top of the "AI food chain" can a company hope to hold the keys to the future.

R&D as the foundation, ecosystem as the path

Huang has said on many occasions that NVIDIA does not sell chips; it sells the ability to solve problems.

Guided by this philosophy of building an industry ecosystem together, NVIDIA has constructed an ecosystem around its GPUs that spans hardware, software, and development tools.

For example, NVIDIA's investment in autonomous driving has paid off handsomely: its DRIVE PX series platforms and the later DRIVE AGX Orin system-on-chip have become key components in many automakers' advanced driver assistance systems (ADAS) and autonomous driving programs, a successful case of deeply integrating underlying technology innovation with real application scenarios.

Facing competition across the industry, NVIDIA hopes to bring the full collaborative power of this ecosystem to bear in serving the industry and the market.

NVIDIA's cooperation with Amazon Web Services, the leader in cloud computing, has also been remarkable: from the first GPU cloud instance to today's Blackwell platform solutions, the two companies have worked together for more than 13 years. Customers will soon be able to use infrastructure based on the NVIDIA GB200 Grace Blackwell Superchip and B100 Tensor Core GPUs on Amazon Web Services.

Combining NVIDIA's powerful superchips with leading Amazon Web Services technologies such as Elastic Fabric Adapter (EFA) networking, advanced virtualization (the Amazon Nitro System), and Amazon EC2 UltraClusters lets customers build and run multi-trillion-parameter large language models in the cloud faster, at greater scale, and more securely.

In large-model R&D, the trillion-parameter scale was until recently a threshold. According to public reports, GPT-4, released last year, has about 1.8 trillion parameters and consists of 8 models of 220 billion parameters each (8 × 220 billion ≈ 1.76 trillion); the recently released Claude 3 has not disclosed its parameter count, while Musk's newly open-sourced Grok model has 314 billion parameters.

The two companies' cooperation is expected to open new possibilities for breakthroughs in generative AI by accelerating the research and development of trillion-parameter large language models.

NVIDIA's own AI team built Project Ceiba on Amazon Web Services specifically to help drive future generative AI innovation.

Project Ceiba was first unveiled at the Amazon Web Services re:Invent conference at the end of November 2023, when NVIDIA partnered with Amazon Web Services to build one of the world's fastest AI supercomputers, with 65 exaflops of computing performance.

With the addition of the Blackwell platform, which brings roughly six times the computing performance of the previous design, Project Ceiba's AI supercomputer will now deliver up to 414 exaflops of AI compute.

The new Ceiba is a supercomputer powered by 20,736 B200 GPUs, built on the new NVIDIA GB200 NVL72 system, which uses fifth-generation NVLink to connect them to 10,368 NVIDIA Grace CPUs.
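The 414-exaflop figure also lines up with the per-GPU numbers quoted earlier. A rough check, assuming about 20 petaflops of FP4 AI compute per Blackwell GPU (NVIDIA's headline figure for Blackwell) and ignoring any interconnect or scaling overhead:

```python
# Rough sanity check of Project Ceiba's quoted 414 exaflops.
# Assumption: ~20 petaflops of FP4 AI compute per Blackwell (B200) GPU.

gpus = 20_736                        # B200 GPUs in the new Ceiba design
pflops_per_gpu = 20                  # assumed FP4 AI compute per GPU
total_exaflops = gpus * pflops_per_gpu / 1_000
print(total_exaflops)                # ~414.7 exaflops

hopper_ceiba_exaflops = 65           # the original 2023 design
print(total_exaflops / hopper_ceiba_exaflops)  # ~6.4x uplift
```

Under these assumptions the new design works out to roughly 414 exaflops, about 6.4 times the original 65-exaflop system.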

The system also scales out over Amazon Web Services' fourth-generation EFA network, which delivers up to 800 Gbps of low-latency, high-bandwidth network throughput per superchip.

In addition, Amazon Web Services plans to offer Amazon EC2 instances based on the new NVIDIA B100 GPUs, with the ability to deploy them at scale in Amazon EC2 UltraClusters.

Huang is looking forward to the collaboration: "AI is driving breakthroughs at an unprecedented rate, leading to new applications, business models, and innovation across industries.

NVIDIA's partnership with Amazon Web Services is accelerating the development of new generative AI capabilities and providing customers with unprecedented computing power to push the boundaries of what's possible."

With so many industries and so much complex innovation underway, NVIDIA and its partners are building an increasingly robust AI ecosystem to lead a new era of generative AI. In Huang's words, when computer graphics, physics, and artificial intelligence converge, the soul of NVIDIA is born.

[About Tech Cloud Report]

Focused on original, enterprise-grade content, Tech Cloud Report was founded in 2015 and is among the top ten media outlets covering cutting-edge enterprise IT. It is recognized by the Ministry of Industry and Information Technology (MIIT) and is one of the officially designated communication media for the Trusted Cloud and the Global Cloud Computing Conference, producing in-depth original reporting on cloud computing, big data, artificial intelligence, blockchain, and related fields.
