
Half a Year into the Big-Model Battle: Wang Xiaochuan Attacks, Wang Huiwen Withdraws, Tencent and ByteDance Arrive Late

Source: Times Finance | Author: Xie Silin


Image source: Pixabay

The melee among China's general-purpose large models is far from over. After roughly half a year of quiet development, most players have now surfaced.

Some are speeding up iteration. On August 8, Baichuan Intelligence, founded by Sogou founder Wang Xiaochuan, released its third large-model product, Baichuan-53B, trained with 53 billion parameters. Only four months after Wang Xiaochuan announced his entry into the large-model battlefield, the startup is making rapid progress.

And this is just the beginning. Baichuan Intelligence told Times Finance that a number of product releases are planned, including a larger model with more than 100 billion parameters.

Others exited bleakly. Lightyear, founded by Meituan co-founder Wang Huiwen, attracted well-known VC firms such as Source Code Capital and Wuyuan Capital, as well as internet heavyweights such as Meituan founder Wang Xing and Kuaishou founder Su Hua, and was once considered by the market to be one of the strongest players on the domestic large-model battlefield.

However, after Wang Huiwen stepped down in late June due to health problems and could no longer run Lightyear, the highly anticipated large-model startup had to sell itself to Meituan, and a number of investors withdrew.

Still others have taken a different path. Lanzhou Technology, founded by AI veteran Zhou Ming, emphasizes lightweight models and hopes to solve B-side scenario problems at lower cost. You Yang, a Presidential Young Professor at the National University of Singapore who helped cut Google's BERT training time from 3 days to 76 minutes, founded Luchen Technology to try to break through with low-cost solutions for training large models.

By contrast, the large models developed by the big factories are belated. It was not until early August that Tencent's self-developed Hunyuan large model and ByteDance's AI dialogue product Grace were successively reported to be in internal beta, and their launch dates remain unknown.

Also still in the testing stage is 01.AI ("Zero One Everything"), the AI 2.0 company founded by Kai-Fu Lee. At an external exchange meeting on July 3, Lee revealed that the company had brought a model with tens of billions of parameters to internal testing within three months and was scaling it to 30 to 70 billion parameters. The product, however, is still not available to the market.

What changes these unreleased large-model products will bring to the technology industry remains to be seen. Seen in this light, the melee may continue for a long time.

Wang Xiaochuan on the attack

Baichuan Intelligence, founded by Wang Xiaochuan, has drawn market attention with its striking pace of product releases.

After announcing in April that it would build large models, the company took only two months and five days to release the 7-billion-parameter open-source model Baichuan-7B on June 15. Less than a month later came another open-source model, the 13-billion-parameter Baichuan-13B.

Baichuan-53B, released on August 8, is already the startup's third product in half a year.

A person in charge at Baichuan Intelligence told Times Finance that the company had spent a great deal of time on preparation before its founding and had thought through its route and methods clearly from the start.

They pointed out that building a large model involves three layers: data, algorithms, and computing power. Computing power aside, companies that do search naturally have excellent data capabilities: Baichuan Intelligence's core team has spent 20 years on data capture, extraction, cleaning, deduplication, anti-spam and similar operations, which lets it assemble high-quality datasets faster.

The algorithms center on natural language processing, and algorithm engineering is iterative. It is not a single engineering problem; rather, text data drives the algorithms and the engineering forward together. Experience from search also applies here, using data-driven evaluation to push the models forward.

"With years of accumulated technology and experience, Baichuan Intelligence can build model products both quickly and well."

At the press conference, however, Wang Xiaochuan also noted that domestic general-purpose large models are still in a stage of imitation and replication. All vendors are essentially benchmarking against OpenAI, so homogenization is inevitable.

Because of this, in his view, and unlike in the United States, where the leaders among closed-source large models are already settled, there is no answer yet to the question of who has the best large model in China. In this melee, money matters, but the deciding factors are the strength of the people, the team, and the organization. The big factories have more money, more people, and more computing power, but their organizational efficiency is usually not good enough; a startup's organizational efficiency may or may not be better.

"Everyone is fighting for the opportunity, and it won't necessarily fall to the big factories."

In the interview, Wang Xiaochuan also spoke about Wang Huiwen's withdrawal. He pointed out that Wang Huiwen was the only founder among China's mainstream large-model players without a strong technical background, which made the challenge greater for him than for the others. The work involves many technical decisions: whom to recruit, which technology roadmap to take, how much computing power is needed, and the decision-making pressure is heavy.

"It's not that building a large model is inherently stressful; it's that making those decisions without a technical background is far more stressful. If the technical background is sufficient, it's actually quite pleasant."

Tencent and ByteDance are long overdue

At the beginning of the big model melee, Internet giants were considered strong competitors because they had more computing power, talent, capital and data.

Baidu's self-developed Ernie Bot (Wenxin Yiyan) was the first to land, as early as the end of March this year; Alibaba's Tongyi Qianwen was unveiled at the Alibaba Cloud Summit on April 11. Just the day before Alibaba released Tongyi Qianwen, Wang Xiaochuan had announced the founding of Baichuan Intelligence.

By contrast, Tencent and ByteDance, also first-tier manufacturers, have moved much more slowly in launching general-purpose models.

On August 3, according to 36Kr, Tencent's self-developed Hunyuan large model entered the application internal-testing stage. Three days later, on August 6, ByteDance's AI dialogue product Grace was also revealed to have finally entered testing after two months of development.

By then, four months had passed since Baidu released Ernie Bot. On why Tencent's large-model product has been slower, Ma Huateng has said publicly: "Tencent is also immersed in research and development, but is in no hurry to finish early and show a half-finished product."

Yet the "unhurried" Tencent was first to announce an "industry large model" route in mid-June this year, rolling out more than 50 solutions across 10 major industries in one go. Coincidentally, ByteDance also released its large-model service platform, Volcano Ark, in June, offering enterprises a full range of platform services by aggregating large models from multiple AI companies and research institutes.

The market once believed that the industry's large model would become a way for these two big manufacturers to break through.

But that may not be the case. There is always a risk that the industry models being touted today will be replaced. Wu Xiaoru, president of iFLYTEK, once pointed out to Times Finance that 10 years ago, speech recognition likewise had many specialized models for scenarios such as calls, driving, and office work, but as general-purpose model technology matured, the specialized models withdrew from the stage.

"I think big models will go through the same phase."

By contrast, over the longer term, it is the general-purpose large model that represents a genuinely platform-level, disruptive opportunity. Precisely for this reason, neither Tencent nor ByteDance can allow itself to miss it; however slow the progress, they must stay at the table.

A Tencent insider told Times Finance that Tencent's plan has always been to walk on two legs, with general-purpose and industry models advancing in parallel. But compared with some more aggressive manufacturers, Tencent, whose products span social networking, games, advertising, content creation and other fields, is more cautious.

Academic entrepreneurs take a different approach

In the battlefield of large models, academic startups from universities and research institutions form the third pole of competition.

They are not seed players like Wang Xiaochuan and Wang Huiwen, who could attract hundreds of millions of dollars of investment through their connections at the start and get off to a fast start. Nor are they like big factories such as Tencent, Alibaba, and Baidu, which hold insurmountable advantages in computing power, talent, and capital.

But with their deep understanding of artificial intelligence, these entrepreneurs can still find room to grow amid the crossfire.

Take Lanzhou Technology, founded by Zhou Ming, former vice president of Microsoft Research Asia. Unlike the market's large-model products chasing hundreds of billions or even trillions of parameters, this Chinese AI veteran, who has worked on NLP (natural language processing) since 1980, hopes to solve B-side scenario problems with lighter models.

Its Mengzi (Mencius) model, with just one billion parameters, topped CLUE, the authoritative Chinese language-understanding benchmark previously dominated by models with tens or hundreds of billions of parameters.

This is a pragmatic decision. For data-security reasons, most enterprises will not upload their data and instead require localized deployment, which costs significantly more. In a media interview, Zhou Ming pointed out that even for local inference alone, simply taking a trained model and running it, a 100-billion-parameter model needs 8 to 16 A100 cards, which translates to an investment of at least one to two million yuan. "For many scenarios, customers need something cheap and good enough."
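Zhou Ming's hardware figure can be sanity-checked with a back-of-envelope memory estimate. The sketch below is illustrative and not from the article: the FP16 precision (2 bytes per parameter), the 25% overhead factor for activations and KV cache, and the function name are my own assumptions; the GPU memory sizes are the two published A100 variants.

```python
import math

def gpus_needed(params_billions, gpu_mem_gb, bytes_per_param=2, overhead=1.25):
    """Minimum GPU count whose combined memory holds the weights plus overhead.

    Assumes FP16 weights (2 bytes/parameter) and a rough 25% margin for
    activations and KV cache -- illustrative numbers, not from the article.
    """
    weights_gb = params_billions * bytes_per_param  # 1e9 params * bytes = GB
    total_gb = weights_gb * overhead
    return math.ceil(total_gb / gpu_mem_gb)

# A 100-billion-parameter model on the two A100 memory variants:
print(gpus_needed(100, 80))  # 80 GB A100s -> 4
print(gpus_needed(100, 40))  # 40 GB A100s -> 7
```

The raw weight footprint alone gives a lower bound of four to seven cards; in practice, batching, longer contexts, and parallelism inefficiencies push real deployments toward the 8 to 16 cards the article cites.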

Luchen Technology, founded by You Yang, a Presidential Young Professor at the National University of Singapore, hopes to use algorithmic techniques to reduce the cost of running large models.

Today, both big factories and startups must face the increasingly obvious homogenization of domestic large models. If the problem is not solved, large models may well fall into the low-gross-margin trap that cloud service vendors face.

You Yang told Times Finance that the root cause is the excessive cost of iterating the underlying technology base. Taking GPT as an example, each OpenAI training run costs up to $60 million, a new run is needed every three to four months, and one iteration takes four to five runs. By that calculation, each iteration of the technology base may cost $200 million to $300 million.
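You Yang's estimate is straightforward arithmetic on the figures he gives; the sketch below simply multiplies them through (the variable names are my own).

```python
# Reproduce the iteration-cost estimate from the article's figures:
# up to $60 million per training run, four to five runs per iteration.
cost_per_run_usd = 60_000_000
runs_low, runs_high = 4, 5

iteration_cost_low = cost_per_run_usd * runs_low    # $240M
iteration_cost_high = cost_per_run_usd * runs_high  # $300M
print(iteration_cost_low, iteration_cost_high)  # 240000000 300000000
```

At the stated $60 million ceiling per run this lands at $240-300 million per iteration, consistent with the article's $200-300 million range (the lower end reflects runs that cost somewhat less than the ceiling).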

The excessive cost has made technology bases extremely scarce on the market: essentially only GPT, LLaMA, and, domestically, GLM. All manufacturers are basically imitating these models to build products, which makes the homogenization problem ever more prominent.

It was against this backdrop that You Yang, a longtime researcher in high-performance computing, founded Luchen Technology. The company's open-source system, Colossal-AI, uses techniques such as efficient multi-dimensional parallelism and heterogeneous memory to significantly reduce the cost of training, fine-tuning, and inference for large AI models.

You Yang believes that large models will truly flourish only when training costs fall rapidly, or when better optimization techniques let models with around 20 billion parameters match the performance of hundred-billion-parameter ones.
