
How much money does a large model burn? Celebrity Unicorn Reveals Training Cost: It Could Reach $10 Billion Next Year

Author: Zhidongxi

The cost of AI training has skyrocketed, and small language models have shown cost-effective advantages.

Compiled by | Alyssa

Edited by | Panken

Zhidongxi reported on May 8 that, according to the American technology media outlet VentureBeat, a newly released large language model (LLM) appeared in recent tests to "recognize" that it was being evaluated, commenting on the relevance of the information it was processing. This led to speculation that the reaction might be an instance of "metacognition", that is, an understanding of one's own thought processes. The episode has sparked discussion about the potential for AI self-awareness, but more interestingly, it illustrates how new capabilities can emerge as models grow larger.

These emerging capabilities have been accompanied by a sharp rise in costs, which are now "astronomical". Just as only a handful of companies in the semiconductor industry can afford the latest billion-dollar chip fabs, the AI space may soon be dominated by the few big tech giants and their partners, who alone can afford the huge costs of developing the latest large language models like GPT-4 and Claude 3.

01. The cost of AI model training has increased exponentially: the next top model may cost tens of billions of dollars

As the cost of training the latest models has skyrocketed, some of those models have reached, and in some cases surpassed, human capabilities. According to a report by Stanford University, the cost of training the latest models is approaching $200 million.


▲ Test scores of AI systems on various abilities related to human performance. (Source: Our World in Data)

If this exponential performance growth continues, AI capabilities will not only evolve rapidly, but their costs will also balloon exponentially. Anthropic, a well-known American large-model unicorn, currently fields the flagship model Claude 3, whose performance is among the industry's best. Like GPT-4, Claude 3 is a foundational model that has developed a broad understanding of language, concepts, and patterns by being pre-trained on a diverse and rich dataset.


▲ LLM benchmark performance. (Source: Anthropic)

Recently, Dario Amodei, co-founder and CEO of Anthropic, revealed in a public discussion that the cost of training AI models is rising sharply. Claude 3, for example, cost about $100 million to train, while the training cost of its next-generation successor, now in development and expected to launch in late 2024 or early 2025, is approaching $1 billion.


▲ LLM training costs rise with model complexity. (Source: Stanford 2024 AI Index Report)

In the face of these soaring costs, it is worth exploring the reasons behind them. Amodei explained that as each generation of models grows in complexity, its parameter count increases, which not only lets the model handle more complex comprehension and query tasks but also demands far more training data and computational resources.
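The link between parameters, data, and cost can be sketched with a common back-of-envelope rule: training compute is roughly 6 FLOPs per parameter per training token. The sketch below is illustrative only; every input number (model size, token count, GPU throughput, rental price) is an assumption of ours, not a figure from the article.

```python
# Back-of-envelope training cost estimate, using the common approximation
# that a dense transformer needs ~6 * N * D FLOPs to train, where
# N = parameter count and D = number of training tokens.
# All concrete numbers below are assumptions for illustration.

def training_cost_usd(params, tokens, flops_per_gpu_hour, gpu_hour_price):
    """Rough dollar cost of one pre-training run."""
    total_flops = 6 * params * tokens          # compute needed
    gpu_hours = total_flops / flops_per_gpu_hour  # time on rented GPUs
    return gpu_hours * gpu_hour_price

# Assumed inputs: a 70B-parameter model trained on 2T tokens, on GPUs
# sustaining ~4e17 FLOPs per hour (roughly 40% utilization of a
# ~300 TFLOP/s accelerator), rented at $2 per GPU-hour.
cost = training_cost_usd(
    params=70e9,
    tokens=2e12,
    flops_per_gpu_hour=4e17,
    gpu_hour_price=2.0,
)
print(f"Estimated compute cost: ~${cost / 1e6:.1f}M")
```

Because cost scales with both N and D, scaling a model up by 10x in parameters while also training it on more data multiplies the bill far faster than 10x, which is consistent with the jump from ~$100 million to ~$1 billion that Amodei describes.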

Amodei predicts that by 2025 or 2026, the cost of training the latest large language models will reach $5 billion to $10 billion. Only a handful of large, financially strong companies and their partners have the ability to build these foundational models.

02. The AI industry follows in the footsteps of the semiconductor industry: high costs drive the outsourcing of manufacturing

With the rapid development of technology, the AI industry is following a path quite similar to that of the semiconductor industry. Back at the end of the 20th century, most semiconductor companies designed and fabricated their own chips. The industry was following Moore's Law, the observation that chip performance grows exponentially, and the cost of building new equipment and fabs rose with each new generation.

Faced with high cost pressures, many companies eventually chose to outsource manufacturing. AMD, for example, used to produce cutting-edge semiconductors in-house, but in 2008 decided to spin off its fabrication plants (fabs) to reduce expenses.

Due to the huge cost of capital, only three semiconductor companies are currently building advanced fabs using the latest process node technologies: TSMC, Intel, and Samsung. TSMC recently revealed that it would cost up to $20 billion to build a new fab to produce cutting-edge semiconductors. A number of companies, including Apple, Nvidia, Qualcomm and AMD, have chosen to outsource their product manufacturing to these top fabs.

03. Custom AI at the entry level: small language models as a cost-effective alternative

In the AI space, the impact of these cost increases varies, because not all use cases require the latest and most powerful large language models. The same is true in the semiconductor industry. In a computer, for example, the central processing unit (CPU) is typically manufactured with the latest high-end semiconductor technology, while the memory and networking chips around it run at lower speeds and therefore don't need to be built on the most advanced process.

Looking at the AI field, many smaller alternatives to flagship LLMs have emerged, such as Mistral and Llama 3, with billions of parameters rather than the more than a trillion that GPT-4 is rumored to have. Microsoft has also recently released its own small language model (SLM), Phi-3. According to The Verge, Phi-3 has 3.8 billion parameters and is trained on a smaller dataset than large language models such as GPT-4.

Although their performance may not fully match that of large models, small language models have a distinct cost advantage thanks to their compact size and smaller training datasets. Like the auxiliary chips that support a CPU in a computer, they provide efficient support at an affordable price.
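The cost gap at inference time can be made concrete with another standard approximation: a dense transformer spends roughly 2 FLOPs per parameter per generated token. The sketch below compares Phi-3's reported 3.8 billion parameters with GPT-4's rumored trillion-plus; the 2·N rule and the GPT-4 figure are assumptions (the latter is only a rumor cited in the article), so treat the ratio as an order-of-magnitude illustration.

```python
# Rough per-token inference compute for a small vs. a large model,
# using the approximation: forward-pass FLOPs per token ≈ 2 * N
# for a dense model with N parameters.

def flops_per_token(params):
    """Approximate forward-pass FLOPs to generate one token."""
    return 2 * params

phi3 = 3.8e9        # Phi-3 parameter count, per The Verge
gpt4_rumored = 1e12 # GPT-4's rumored parameter count (unconfirmed)

ratio = flops_per_token(gpt4_rumored) / flops_per_token(phi3)
print(f"The large model needs roughly {ratio:.0f}x the compute per token")
```

A gap of two-plus orders of magnitude in per-token compute (and hence serving cost) is why an SLM can be the economical choice whenever its quality is good enough for the task.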

For use cases that don't require knowledge spanning multiple data domains, small language models are the ideal tool for tailoring. For example, an SLM can be fine-tuned on a company's internal data and industry jargon to provide accurate, personalized customer-service responses, or trained on industry- or market-specific data to generate comprehensive, customized research reports and answers.

As Rowan Curran, a senior AI analyst at Forrester Research, puts it: "Sports cars aren't needed all the time; sometimes a minivan or pickup truck is more suitable. In the future, there will not be a single model for every application; the most appropriate tool will be selected according to different needs."

04. Conclusion: dominance by a few players may exacerbate risks to AI innovation

Just as soaring costs have left only a handful of companies capable of making top-of-the-line chips, similar economic pressures are shaping the landscape of large language model development. These rising costs may confine cutting-edge AI work to a few leading players, potentially inhibiting innovation and diversity, and the high barrier to entry may discourage startups and smaller enterprises from contributing to the development of AI.

To balance this trend, the industry needs to explore small, specialized language models. Like essential components in a complex system, they provide critical and efficient functionality for a wide range of niche applications. Promoting open source and collaboration is critical to democratizing AI development, allowing a wider range of actors to influence the technology. By fostering an inclusive and open environment, AI technology can bring broad benefits to communities around the world and provide equal opportunities for innovation in the future.

Source: VentureBeat
