
NVIDIA's counterattack: wielding the H100 to command the princes

When it comes to audacious maneuvers, few CEOs in Silicon Valley can match Jensen Huang.

Last year, Huang pitched a plan to cloud computing vendors such as Microsoft, Google and Amazon: these vendors own large fleets of servers equipped with NVIDIA GPUs; NVIDIA would lease them back, have its own engineers "optimize" them, and then rent them out under the NVIDIA brand to ordinary AI companies, pocketing the difference.

Put simply, where Microsoft used to sell cloud services directly to small and medium-sized companies, NVIDIA now inserts itself as the middleman. According to NVIDIA's official line, of course, the move is meant to "show cloud vendors the correct way to configure GPUs in a data center" [1].

The servers are the same servers, but after NVIDIA's "optimization," the customer relationship shifts from Microsoft to NVIDIA. Yet bizarre as the proposal sounds, every major cloud vendor except Amazon agreed.

In March 2023, NVIDIA officially launched its cloud computing service, DGX Cloud. It turns out that after tuning by NVIDIA engineers, DGX Cloud really does perform better for training large models. On top of that, NVIDIA made an exception and allowed short-term rentals. Within half a year, it had signed large customers such as the software company ServiceNow.

The real reason technology companies are willing to play along, though, may be that NVIDIA controls the scarcest resource of the large-model era: the H100.

At present, almost no company has enough computing power. Even OpenAI founder Sam Altman admitted helplessly at a Senate hearing: "If people used ChatGPT less, we would be very happy, because we have a severe shortage of GPUs" [2].

How many H100s a company can buy has even become a key factor in determining its AI achievements. This is what gives NVIDIA the confidence to "wield the H100 to command the princes."

The "rare earth" of computers

Typically, technology companies meet their computing needs by buying services from cloud vendors. Starting in March 2023, Microsoft Azure, Amazon AWS and other cloud vendors began renting out the HGX H100, a server built around 4 or 8 H100s.

But supply and demand are now badly out of balance, and cloud vendors' H100 inventories come nowhere near satisfying the market's appetite. In its 2023 half-year report, Microsoft even updated its risk factors: if it cannot obtain enough AI chips, its cloud computing business may be disrupted.

Many startups must wait in line for 3 to 12 months, and if a rival gets there first, the cost could be tens of billions in valuation.


HGX H100

Countless "H100 poor" can only be forced to exert their subjective initiative to see whose path is wilder.

In an interview with The New York Times, one entrepreneur compared the H100 to a "rare earth." He had earlier gone to the National Science Foundation to pitch for investment, simply because a project the foundation was running happened to have a handful of idle H100s.

In Silicon Valley, AI entrepreneurs now greet each other with "I know a guy with an H100," which to an outsider sounds like a drug deal [4].

GPU Utils has estimated the specific demand behind the H100 rush:

For companies that train their own large models and chase scale-driven miracles, showing up with anything less than tens of thousands of H100s is an embarrassment. Inflection AI, founded by DeepMind co-founder Mustafa Suleyman, bought 22,000 H100s within a year of its founding; a deep-pocketed player like Meta is likely to buy 100,000 or more.

Cloud vendors such as Microsoft Azure each need at least 30,000 H100s, and the remaining private clouds will consume roughly another 100,000.

Add it all up, and the demand from large US technology companies and a handful of star startups alone comes to about 430,000 units [5]. Factor in other startups, research institutes, universities and even wealthy nations, plus wild cards like scalpers and black markets, and actual demand is likely far higher. Yet according to the Financial Times, this year's H100 shipments will total only about 550,000 [6].

A core reason the H100 is so coveted is its near-monopoly position in the market.

For large-model training, where extreme efficiency is everything, the H100 is in most cases the optimal choice.

MPT-30B, the first open-source LLM (large language model) trained on H100s, took only 11.6 days of actual training time, versus 28.3 days on the previous-generation A100 [7]. For models with far more parameters, such as GPT-4 with its rumored 1.8 trillion, the efficiency gap is even more pronounced. In an era when everyone is racing, time is everything.

The H100 is also far more efficient than the A100 at model inference. And although the H100's launch price was about $33,000, with secondhand prices now at $40,000-50,000, dividing each card's performance by its price shows the H100 is actually more cost-effective than the A100.

MPT-30B training and inference benchmarks, H100 vs. A100

Huang said, "Buy more GPUs, the more money you save," which seems to make sense.

That is why, even with US export restrictions on the H100 and A100 to China, Chinese technology companies are still snapping up the cut-down H800 and A800, even though the cut-down versions have half the chip-to-chip interconnect bandwidth, which means large-model training takes longer.

Beyond the enormous demand, the other reason for the H100 shortage is severely constrained production capacity.

The H100 requires SK Hynix's HBM memory and TSMC's CoWoS packaging, both expensive technologies that had never been deployed at scale before, leaving little spare capacity. Since ramping up takes time, some analysts predict the H100 shortage will last at least until the first quarter of next year, while others believe it will not ease until the end of next year [9].


H100 internal structure

The unprecedented frenzy around the H100 has given Jensen Huang a roller-coaster year.

In the second quarter of last year, with the consumer market sluggish and crypto miners going bust, NVIDIA turned in a dismal earnings report, and "GPU sales are slow, help us" memes were everywhere. A year later, Huang showed the capital markets what a blowup in reverse looks like: profit soared 854% year on year, far exceeding even the most optimistic analyst forecasts.

The ascent won universal acclaim, but Jensen Huang knew in his heart that a sword had always hung over NVIDIA's head.

An inevitable war

In August, the legendary chip engineer Jim Keller remarked to the media: "I don't think GPUs are the be-all and end-all of running AI; the world hates monopolies" [11].

Keller was arguably talking up his own AI chips, but the sentiment is industry consensus.

In fact, the big technology companies that have bought the most H100s are hardly loyal customers: Microsoft, Google and Meta have all, to varying degrees, tried to develop their own AI chips.

That puts NVIDIA in an extremely awkward position: in the AI chip arena, a future war with its own biggest customers is all but certain.

Big Tech's decision to develop its own AI chips initially stemmed from a very simple need to save money, and the most typical example is Google.

As early as 2014, Google launched its own chip program. At the time, Ilya Sutskever, later OpenAI's chief scientist, was still at Google, building a disruptive family of AI models. Born of Sutskever's belief that scale works miracles, these models simply needed to be fed enough of the right data to excel at translation, speech recognition and other tasks. But when it came to deployment, Google ran into a problem:

If AI services were rolled out to more than a billion Android phones, then even at just three minutes of use per person per day, Google would need twice the computing power of all its existing data centers. It had already built 15 data centers, each costing hundreds of millions of dollars; doubling that footprint was plainly impractical.
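A rough sketch of the load implied by those figures, assuming (purely for illustration) that usage is spread evenly across the day; real peak loads would be far higher:

```python
# Rough estimate of the load behind Google's concern (illustrative only;
# assumes usage is evenly spread over the day, which understates peaks).
users = 1_000_000_000              # 1 billion Android phones
minutes_per_user = 3               # minutes of AI use per person per day
minutes_per_day = 24 * 60          # 1,440

total_usage = users * minutes_per_user           # 3e9 user-minutes per day
avg_concurrency = total_usage / minutes_per_day  # average simultaneous streams

print(f"{avg_concurrency:,.0f} concurrent streams on average")
# ~2,083,333 always-on speech-recognition streams: easy to see why Google
# estimated it would need to double its data center capacity.
```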

In the end, Google built the TPU, with higher performance and lower power consumption, dramatically increasing the computing power a single data center could supply and solving the problem far more economically.


A Google data center where TPUs were deployed

The TPU's arrival put Jensen Huang on pins and needles. He began iterating on GPUs at a furious pace, soon retook the performance lead, and the latest result is the H100. The trouble is that the H100 is simply too expensive.

If the H100 were sold by weight, it would fetch half the price of gold per ounce; even for the most profitable technology companies on the planet, this "NVIDIA tax" is an astronomical sum.

Yet the H100 does not actually cost much to make. According to the financial consulting firm Raymond James, it costs about $3,320 to produce, roughly a tenth of its launch price, leaving Huang to "tearfully" pocket a tenfold markup [12].

The economics of in-house chips are beyond doubt, but there is a second benefit: vertical integration creates differentiation.

Stacking computing power is not as simple as pouring more gasoline into a car; it also involves software compatibility and a company's own business needs. Deep learning frameworks, for example, come in several camps: Google has TensorFlow, Meta uses PyTorch, Baidu has PaddlePaddle, and hardware must be adapted to each framework.

A custom AI chip can be tailored far more closely to a company's own AI workloads. That is why Meta restarted its in-house chip program this year and designed its new MTIA chip specifically for the PyTorch framework.
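To make "adapting hardware to a framework" concrete, here is a minimal PyTorch sketch. The device selection below is standard PyTorch; the point is that the framework dispatches identical model code to whatever vendor backend is registered, which is exactly the layer a custom chip such as MTIA has to plug into:

```python
import torch

# Identical model code; which silicon actually runs it depends on the
# backend kernels the framework dispatches to (cuDNN/cuBLAS on NVIDIA).
device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
y = model(x)  # executed by the selected backend's kernels

# A custom accelerator (e.g. Meta's MTIA) must register its own PyTorch
# backend and kernels before model.to(<its device>) can target it.
```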

For big companies, the core consideration in choosing chips is not raw computing power but "computing power per dollar," in other words, cost. Google's TPU and Tesla's Dojo both prove that the cost of a custom chip can be acceptable.

Right now, the "spark of resistance" has been ignited. According to foreign media reports, the cloud computing team of large technology companies has begun to frequently persuade customers to switch to their self-developed chips instead of NVIDIA's GPUs. Nvidia is the absolute winner so far, but no one knows when the balance will be upset.

In the face of this inevitable war, however, NVIDIA has kept a few cards up its sleeve.

Wielding the H100 to command the princes

The first card NVIDIA played is called CoreWeave.

CoreWeave was founded in 2017 as an Ethereum mining operation and later pivoted to cloud computing. By its founders' own account, the company's 2022 revenue was $30 million, a mere 1/1,133rd of Microsoft Azure's, and it had almost no presence in Silicon Valley.

In 2023, however, CoreWeave became famous overnight, signing two large customers, Inflection AI and Stability AI, with annual revenue expected to reach $500 million, a 16-fold jump in a year. On top of that, Microsoft has agreed to spend billions of dollars on its services over the coming years, including $2 billion of orders in 2024 alone.

The benefactor who changed CoreWeave's fate is NVIDIA.

In April, NVIDIA joined an investment round in CoreWeave; but more valuable than the dollars was a scarcer resource NVIDIA handed over: the H100. CoreWeave became the first cloud company in the world to offer HGX H100 rentals, a full month ahead of Microsoft Azure.


CoreWeave's three founders

This arrangement was entirely deliberate on Huang's part.

The H100's near-monopoly position and acute scarcity give NVIDIA an extra layer of power: it can freely decide who gets supplied first.

Compared with its plastic friendships with Big Tech, NVIDIA and CoreWeave are true comrades-in-arms. So NVIDIA cut its H100 allocation to the big technology companies and handed that capacity to "brothers" like CoreWeave, who have made clear they will never build their own chips.

Judging by the results, the strategy not only prevents hoarding but genuinely carves off a slice of Big Tech's cake:

As late as the end of 2022, for example, the aforementioned Stability AI still used Amazon AWS as its sole cloud provider; by March of this year, starved of computing power, it was quietly knocking on CoreWeave's door.

In fact, CoreWeave is not the only card in NVIDIA's hand. NVIDIA has also invested in Lambda Labs, likewise a cloud computing company, and in three star startups working on large models and applications.


Inflection AI, founded by DeepMind co-founder Mustafa Suleyman, has also received NVIDIA investment

At a moment when large models are being churned out by the hundred, the H100 is a hard currency more precious than the dollar, and it has opened a valuable window for NVIDIA: get the H100 into as many companies' hands as possible, build out the ecosystem early, and win more friends.

So how long will this window stay open?

Epilogue

NVIDIA's string of "audacious maneuvers" has drawn the attention of US antitrust regulators, and the global scramble for the H100 is unlikely to last forever.

As noted above, H100 output is constrained by TSMC's and SK Hynix's limited spare capacity; as new production lines come online, the shortage will gradually ease.

Nor is today's red-hot demand guaranteed to continue.

In fact, more and more technology companies and research institutions are choosing to open-source their large models. As high-quality open-source models proliferate, startups and research institutes no longer have to train from scratch; they can simply download an open-source model and build on it or run inference according to their own business needs.

After Meta released its open-source model Llama, researchers from Stanford, Carnegie Mellon and other universities built the open-source model Vicuna on top of it, which soon passed 2 million downloads.


Vicuna

In the foreseeable future, the main use of computing power will likely shift from training to inference, and by then the H100 will no longer reign alone. Unlike training, which chases extreme efficiency, AI inference cares far more about cost-effectiveness.

Meanwhile, the problem facing generative AI and large models is that, given the steep cost of computing power, nobody is making money yet except NVIDIA.

When it launched the CUDA platform in 2006, NVIDIA's far-sighted vision propelled AI's rapid progress. Today, its overwhelming dominance poses an uncomfortable question: has NVIDIA gone from being AI's great enabler to an obstacle in its path?

References

[1] "Nvidia Muscles Into Cloud Services, Rankling AWS," The Information

[2] "OpenAI CEO Sam Altman testifies at Senate artificial intelligence hearing | full video," CBS News

[3] "Google Gemini Eats The World – Gemini Smashes GPT-4 By 5X, The GPU-Poors," SemiAnalysis

[4] "The Desperate Hunt for the A.I. Boom's Most Indispensable Prize," The New York Times

[5] "Nvidia H100 GPUs: Supply and Demand," GPU Utils

[6] "Saudi Arabia and UAE race to buy Nvidia chips to power AI ambitions," Financial Times

[7] "MPT-30B: Raising the bar for open-source foundation models," MosaicML

[8] "China's internet giants order $5bn of Nvidia chips to power AI ambitions," Financial Times

[9] "AI Capacity Constraints - CoWoS and HBM Supply Chain," SemiAnalysis

[10] "Insight: Inside Meta's scramble to catch up on AI," Reuters

[11] "Jim Keller speaks: The world hates monopolies, GPUs are not everything," Semiconductor Industry Observation

[12] "Nvidia Makes Nearly 1,000% Profit on H100 GPUs: Report," Tom's Hardware

[13] The Deep Learning Revolution, Cade Metz

[14] "A crack in the Nvidia empire," 饭统戴老板

[15] "CoreWeave came 'out of nowhere.' Now it's poised to make billions off AI with its GPU cloud," VentureBeat

[16] "Why Nvidia Aids Cloud Rivals of AWS, Google and Microsoft," The Information

[17] "TPUv5e: The New Benchmark in Cost-Efficient Inference and Training for…," SemiAnalysis

[18] "Nvidia's Hot Streak May Not Last Forever," The Information

Editor: Li Motian

Visual Design: Shurui

Responsible editor: Chen Bin

Research support: He Luheng
