Meta's AI training bill has surpassed the Apollo moon landing! Google proudly vows to invest even more than $100 billion

Author: New Zhiyuan

Editor: Aenea is sleepy

In a recent interview, LeCun personally confirmed that Meta has spent $30 billion on Nvidia GPUs, more than the cost of the Apollo moon landing. By comparison, the Stargate supercomputer that Microsoft and OpenAI plan to build is budgeted at $100 billion, and Demis Hassabis, CEO of Google DeepMind, has boldly declared that Google will invest even more than that. Big tech companies are burning money without blinking; after all, the prospect of AGI is just too tempting.

Just now, Yann LeCun, Meta's chief AI scientist, confirmed that Meta has spent $30 billion on Nvidia GPUs, exceeding the cost of the Apollo moon landing program!

Although $30 billion is staggering, it is still small change compared to the $100 billion Stargate that Microsoft and OpenAI plan to build.

Demis Hassabis, CEO of Google DeepMind, has even said that Google will spend more than that.

And so the money-burning race escalates.

LeCun: Meta's Nvidia GPU purchases really did cost more than the Apollo moon landing

To develop AI, Meta has gone all in.

In the interview, the host asked: Meta is reported to have bought 500,000 Nvidia GPUs, which at market prices comes to $30 billion. So the total cost is higher than the Apollo moon landing program, right?

"It's not just the training, it's the cost of deployment. The biggest problem we face is the supply of GPUs. 」

Some questioned whether this could really be true: as the operator of the largest inference workload in history, Meta probably didn't spend all that money on training.

Others punctured the bubble, saying that every giant exaggerates in order to create the illusion that it "has more GPUs" than anyone else:

While it's true that a lot of money goes into Nvidia hardware, only a small percentage of it is actually spent on training models. The claim that "we have millions of GPUs" just sounds like bragging.

Of course, there were also doubts: adjusted for inflation, the cost of the Apollo program should be closer to $200-250 billion.

Indeed, estimates put the total cost of the Apollo program, adjusted for inflation from its original 1969 value, at $217 billion to $241 billion.

https://apollo11space.com/apollo-program-costs-new-data-1969-vs-2024/
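For reference, the adjustment is a single multiplication. In the sketch below, the nominal Apollo cost and the 1969-to-2024 CPI multiplier are assumed values chosen to illustrate the calculation, not figures taken from the estimate above:

```python
# Inflation adjustment as a one-liner. Both constants are assumptions:
# ~$25.8B is a commonly cited nominal Apollo total, and ~8.4 is a rough
# 1969 -> 2024 CPI multiplier.
nominal_1969 = 25.8e9
cpi_multiplier = 8.4
print(f"~${nominal_1969 * cpi_multiplier / 1e9:.0f}B in today's dollars")  # ~$217B
```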

And Wharton professor Ethan Mollick pointed out that while it is nowhere near the cost of the Apollo program, Meta's GPU spending is almost equal to that of the Manhattan Project in today's dollars.

Still, netizens said they were glad to get a glimpse of the giants' AI infrastructure: the power, the land, and the racks needed to house a million GPUs.

The open-source Llama 3 is a great success

Beyond the hardware, Meta has also achieved outstanding results with Llama 3.

In developing Llama 3, the Meta team had four main considerations:

Model architecture

In terms of architecture, the team used a dense autoregressive Transformer, adding a grouped-query attention (GQA) mechanism and a new tokenizer.
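For readers unfamiliar with GQA, here is a minimal, hedged sketch of the idea in PyTorch: several query heads share a single key/value head, which shrinks the KV cache during inference. The dimensions are illustrative, not Meta's actual configuration:

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_heads=8, n_kv_heads=2):
    """Toy GQA: n_heads query heads share n_kv_heads key/value heads."""
    B, T, D = x.shape
    head_dim = D // n_heads
    q = (x @ wq).view(B, T, n_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(B, T, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(B, T, n_kv_heads, head_dim).transpose(1, 2)
    # Each group of n_heads // n_kv_heads query heads reuses one K/V head.
    group = n_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(B, T, D)

D = 512
x = torch.randn(1, 16, D)
wq = torch.randn(D, D) * 0.02
wk = torch.randn(D, D // 4) * 0.02   # only 2 of 8 heads' worth of K
wv = torch.randn(D, D // 4) * 0.02
print(grouped_query_attention(x, wq, wk, wv).shape)  # torch.Size([1, 16, 512])
```

With 8 query heads sharing 2 KV heads, the KV cache is a quarter the size it would be under standard multi-head attention.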

Training data and compute resources

Because the training run consumed more than 15 trillion tokens, the team built two computing clusters with 24,000 H100 GPUs each.
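Those two numbers allow a back-of-the-envelope estimate. The sketch below uses the standard ~6 × parameters × tokens approximation of training FLOPs; the H100 peak throughput and utilization figures are assumptions, since Meta has not disclosed them:

```python
# Rough training-time estimate for a 70B model on one of the clusters.
params = 70e9                   # Llama 3 70B parameters
tokens = 15e12                  # >15 trillion training tokens
train_flops = 6 * params * tokens           # ~6.3e24 FLOPs

h100_peak = 989e12              # assumed BF16 peak FLOP/s per H100
mfu = 0.40                      # assumed model-FLOPs utilization
gpus = 24_000                   # one cluster

seconds = train_flops / (gpus * h100_peak * mfu)
print(f"~{seconds / 86_400:.0f} days on one 24k-GPU cluster")  # ~8 days
```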

Instruction fine-tuning

In fact, the model's effectiveness depends mainly on the post-training phase, which is where most of the time and effort went.

To that end, the team scaled up the human-annotated SFT data (to 10 million examples) and used techniques such as rejection sampling, PPO, and DPO to strike a balance between usability, human-like behavior, and the large-scale data used in pre-training.
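Of those techniques, DPO is the easiest to show compactly. The sketch below is a minimal version of the DPO objective, not Meta's actual recipe: the policy is pushed to widen its log-probability margin between the preferred and rejected answers, relative to a frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Inputs are summed log-probs of whole responses, shape (batch,)."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy call with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-13.5]))
print(loss)  # low loss: the policy already prefers the chosen answer
```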

Now, judging by the latest code-generation evaluation, this series of explorations by the Meta team can be called a great success.

After comprehensively benchmarking more than 130 LLMs, including GPT-3.5/4, Llama 3, Gemini 1.5 Pro, and Command R+, Symflower CTO and founder Markus Zimmermann declared: "The throne of large language models belongs to Llama 3 70B!"

- 100% coverage and 70% code quality

- The best cost-to-performance ratio for inference

- Model weights are open

However, it's worth noting that in terms of raw performance, GPT-4 Turbo is the undisputed winner, with a perfect score of 150.

As you can see, GPT-4 (150 points at $40 per million tokens) and Claude 3 Opus (142 points at $90 per million tokens) perform very well, but they are 25 to 55 times more expensive than Llama, Wizard, and Haiku.
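Working backwards from those quoted ratios gives a sense of what the cheaper models cost, using only the article's own numbers:

```python
# Implied price of the cheaper models, from the 25x-55x gap quoted above.
gpt4_price, opus_price = 40.0, 90.0    # $/million tokens, as quoted
low_ratio, high_ratio = 25, 55
print(f"implied price: ${gpt4_price / low_ratio:.2f}-"
      f"${opus_price / high_ratio:.2f} per million tokens")
# -> roughly $1.60 per million tokens for Llama, Wizard, and Haiku
```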

Specifically, in Java, Llama 3 70B managed to identify a constructor test case that was not easy to discover, a finding that was both unexpected and valuable.

In addition, it is able to write high-quality test code 70% of the time.

GPT-4 Turbo, by contrast, tends to include obvious comments when generating test code, something that is generally avoided in high-quality code.

The quality of the test code is strongly affected by fine-tuning: in the performance test, WizardLM-2 8x22B outperformed Mixtral 8x22B-Instruct by 30%.

When it comes to generating compilable code, smaller models such as Gemma 7B, Llama 3 8B, and WizardLM 2 7B did not perform well, but Mistral 7B did a good job.
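The compile gate at the heart of such a benchmark is simple to sketch. DevQualityEval targets Java and Go; the Python `compile()` check below is only an illustration of the idea, not the benchmark's actual harness:

```python
def compiles(source: str) -> bool:
    """Return True if the generated source parses as valid Python."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles("def test_add():\n    assert 1 + 1 == 2"))  # True
print(compiles("def test_add(:\n    assert"))              # False
```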

After evaluating 138 LLMs, the team found that about 80 of them were unreliable at generating even simple test cases.

A score below 85 points means the model is not performing as well as it should. That said, the chart does not fully reflect all the findings and insights of the review, which the team expects to add in the next release.

The detailed review can be viewed in the following article:

Review address: https://symflower.com/en/company/blog/2024/dev-quality-eval-v0.4.0-is-llama-3-better-than-gpt-4-for-generating-tests/

Winning the AI war comes at a staggeringly high cost

Today, major tech companies are paying a high price to win the AI war.

How much does it cost tech giants to make AI smarter?

Demis Hassabis, CEO of Google DeepMind, made a prediction at the TED conference two weeks ago: Google is expected to invest more than $100 billion in the development of AI.

As the central figure of Google's artificial intelligence program and the leader of DeepMind, Hassabis was also signaling his unflinching stance toward OpenAI.

According to The Information, Microsoft and OpenAI plan to spend $100 billion to build "Stargate", a supercomputer that is expected to contain millions of dedicated server chips to power more advanced models such as GPT-5 and GPT-6.

When Hassabis was asked about the huge sums his competitors were spending on supercomputing, he didn't shy away: Google could spend even more.

"We're not going to talk about exact numbers right now, but I think we're going to invest more than that over time," he said.

Today, generative AI has sparked a huge investment boom.

According to Crunchbase, AI startups alone raised nearly $50 billion in funding last year.

Hassabis's statement shows that competition in the AI field has no intention of slowing down; it will only become more intense.

Google, Microsoft, and OpenAI are all fiercely competing for the honor of being the first to reach AGI.

A crazy figure of $100 billion

If more than $100 billion is going to be spent on AI technology, where will it all go?

First of all, the biggest chunk of development cost is chips.

At present, Nvidia still rules the roost: Google's Gemini and OpenAI's GPT-4 Turbo continue to rely heavily on third-party chips such as Nvidia GPUs.

Training models is also becoming more and more expensive.

Stanford's annual AI Index report pointed out that "the training cost of SOTA models has reached an unprecedented level."

According to the report, GPT-4 used "about $78 million worth of computation" for training, compared with only $4.3 million for GPT-3 in 2020.

Meanwhile, Google's Gemini Ultra costs $191 million to train.

The original Transformer, the technology behind today's AI models, cost only about $900 to train in 2017.

The report also points out a direct correlation between an AI model's training cost and its computational requirements.
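That correlation can be written as a simple cost model: dollars scale with FLOPs divided by effective cluster throughput, times the price of GPU time. Every constant below is an assumption for illustration; the report's own estimates also fold in hardware amortization and energy, so they come out higher:

```python
def training_cost_usd(flops, gpu_flops=989e12, mfu=0.4, usd_per_gpu_hour=2.5):
    """Raw GPU-rental cost of a training run under assumed constants."""
    gpu_hours = flops / (gpu_flops * mfu) / 3600
    return gpu_hours * usd_per_gpu_hour

# A GPT-4-scale run, using a public ~2e25 FLOPs estimate (an assumption):
print(f"~${training_cost_usd(2e25) / 1e6:.0f}M of raw GPU time")  # ~$35M
```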

If the target is AGI, the cost is likely to skyrocket.

$190 million: How much does it cost to train an AI model, from Google to OpenAI

Speaking of which, let's take a look at how much it costs for major technology companies to train AI models.

The recent AI Index report revealed the staggering cost of training the most complex AI models to date.

Let's dive into the breakdown of these costs and explore what they mean.

Transformer (Google): $930

The Transformer model is one of the pioneering architectures of modern AI, and this relatively modest cost highlights the efficiency of early AI training methods.

Its cost can serve as a benchmark for understanding how far the field has progressed in model complexity and the expense that comes with it.

BERT-Large (Google): $3,288

Compared with its predecessor, the training cost of the BERT-Large model has increased significantly.

BERT is known for its bidirectional pre-training of contextual representations and has made significant advances in natural language understanding. However, this progress has come at a higher financial cost.

RoBERTa Large (Meta): $160,000

RoBERTa Large is a variant of BERT optimized for robust pre-training, and its jump in training cost reflects the growing computational requirements as models become more complex.

This sharp increase underscores the rising costs associated with pushing the boundaries of AI capabilities.

LaMDA (Google): $1.3M

Designed to conduct natural language conversations, LaMDA represents a shift towards more specialized AI applications.

The significant investment required to train LaMDA highlights the growing need for AI models tailored to specific tasks, which in turn require more extensive fine-tuning and data processing.

GPT-3 175B (davinci) (OpenAI): $4.3M

Known for its sheer size and impressive language generation capabilities, GPT-3 represents an important milestone in the development of AI.

The cost of training GPT-3 reflects the enormous computing power required by a model of this size and highlights the trade-off between performance and affordability.

Megatron-Turing NLG 530B (Microsoft/Nvidia): $6.4M

The cost of training Megatron-Turing NLG illustrates the trend toward larger models with hundreds of billions of parameters.

This kind of model pushes the boundaries of AI capabilities, but it comes with a staggering training cost. It raises the bar dramatically, widening the gap between industry leaders and smaller players.

PaLM (540B) (Google): $12.4M

With a large number of parameters, PaLM represents the pinnacle of AI scale and sophistication.

The astronomical cost of training PaLM shows the huge investment required to push the boundaries of AI R&D, and raises the question: Is this type of investment really sustainable?

GPT-4 (OpenAI): $78.3M

The estimated training cost of GPT-4 marks a paradigm shift in the economics of artificial intelligence: the cost of training AI models has reached an unprecedented level.

As models become larger and more complex, the economic barriers to entry escalate, limiting both innovation and access to AI technology.

Gemini Ultra (Google): $191.4M

The staggering cost of training Gemini Ultra exemplifies the challenges posed by ultra-large-scale AI models.

While these models have demonstrated breakthrough capabilities, training them is astronomically expensive. Except for the largest and best-funded companies, businesses and organizations are simply priced out.

Chip race: Microsoft, Meta, Google, and Nvidia battle for AI chip supremacy

Although Nvidia leads the chip field thanks to its long-term bets, rivals old and new, from AMD to giants such as Microsoft, Google, and Meta, are racing to catch up with designs of their own.

On May 1, AMD announced that sales of its MI300 AI chip had reached $1 billion, making it the company's fastest-selling product ever.

At the same time, AMD is ramping up production of its AI chips, which are currently in short supply, and expects to launch new products in 2025.

On April 10, Meta officially announced its next-generation in-house chip, which promises to greatly speed up model training.

The Meta Training and Inference Accelerator (MTIA) is designed for Meta's ranking and recommendation models; the chips help improve training efficiency and make actual inference tasks easier.

Also on April 10, Intel revealed more details of its latest AI chip, Gaudi 3.

Intel says that compared with the H100 GPU, Gaudi 3 delivers 50% better inference performance and 40% better energy efficiency, at a more affordable price.

On March 19, Nvidia released the "most powerful AI chip on the planet", the Blackwell B200.

According to Nvidia, the new B200 GPU packs 208 billion transistors and delivers up to 20 petaflops of FP4 compute.

Not only that, but the GB200, which combines two of these GPUs with a Grace CPU, delivers up to 30 times the previous performance on LLM inference tasks while also greatly improving efficiency.

In addition, Jensen Huang has hinted that each GPU may be priced between $30,000 and $40,000.

On February 23, Nvidia's market capitalization surpassed $2 trillion, making it the first chipmaker to reach that milestone.

That also made Nvidia the third US company with a market capitalization above $2 trillion, behind Apple ($2.83 trillion) and Microsoft ($3.06 trillion).

On February 22, Microsoft and Intel struck a multibillion-dollar deal for custom chips.

It is speculated that Intel will manufacture Microsoft's in-house AI chips.

On February 9, The Wall Street Journal reported that Sam Altman's AI chip dream could require an investment of as much as $7 trillion.

"Such an investment would dwarf the size of the current global semiconductor industry. Global chip sales were $527 billion last year and are expected to reach $1 trillion per year by 2030. 」