Open-source large models: the next "stuck neck" technology? | Deep Web

Source: Visual China

Author|Ye Hao

Editor|Kang Xiao

Huawei's mobile phones broke through the 5G chip blockade and became the hottest topic in technology circles over the past week. Meanwhile, Chinese entrepreneurs building AGI (artificial general intelligence) models are asking whether large models will become the next "stuck neck" (chokepoint) technology.

In July this year, Meta released Llama 2, a commercially usable version of its open-source large model LLaMA. The industry considers it comparable to the commercial GPT-3.5 and unmatched among open-source models. The arrival of Llama 2 dropped a bombshell on the large-model battlefield and added new variables to the global contest.

Open source and closed source are becoming two parallel forces in the field of large models. "With the emergence of Llama 2 alongside ChatGPT, the competitive pattern of the entire large-model field has become clear. It is like the battle between iOS and Android, and the two sides are currently neck and neck," Cheng Weizhong, founder of Zhongke Shenzhi, told Deep Web.

"In the United States, there is no longer any suspense about the leaders among closed-source general-purpose large models: OpenAI's ChatGPT and Google already hold their tickets. With the arrival of Llama 2, there is no longer any suspense about open-source general-purpose large models in the US either," said Wang Xiaochuan, founder and CEO of Baichuan Intelligence.

But in China, there is no conclusion yet on who makes the best large model, Wang said. "Everyone has a chance to fight for it."

A worrying sign is that the battle Llama 2 set off in the US market has also affected the direction of China's "100-model war". Some Chinese technology companies believe that Llama 2 gives domestic firms a free option that is expected to catch up with GPT-3.5, so there is no need to bear the huge cost of independently developing foundation models.

"In fact, many domestic companies wanted to build general-purpose large models at the beginning. With the emergence of Llama 2, the work those companies did has basically been wasted. They spent a great deal of manpower, material resources and computing resources to build a general-purpose model, only to find that it does not perform as well as someone else's open-source Llama 2. And in the future there will certainly be open-source general-purpose models stronger than Llama 2," Dr. Shao Ling, chief scientist of Terminus, told Deep Web.

A domestic large-model entrepreneur told Deep Web that domestic enterprises and developers are currently far more enthusiastic about Llama 2 than they are about supporting domestic large-model products.

This entrepreneur believes that domestic open-source models are in fact comparable in level and capability to Llama 2, and especially to its Chinese versions. If Chinese companies blindly embrace Llama 2, they will repeat the situation in operating systems, where iOS and Android dominate, and will risk being "stuck by the neck" in the field of super artificial intelligence in the future.

"Big countries definitely need their own independently developed large models. It is similar to chips: if you do not have your own, control easily falls into the hands of others," Shao Ling told Deep Web.

Needless to say, tech companies cannot rely entirely on the open-source Llama, and China needs homegrown large models.

"Today, competition in large models is purely a competition of models, and also of computing power and talent (both of which are also 'stuck neck' factors). In the future, however, it is more likely to be a competition of ecosystems," Jiang Tao, founder and chairman of CSDN, told Deep Web.

Llama 2 accelerates elimination in the 100-model war

From the birth of ChatGPT in December last year, through Meta's release of Llama, Stanford University's fine-tuning of Llama in March, and the emergence of Falcon in May, open-source models around the world have advanced rapidly. On July 18, the arrival of Llama 2 directly changed the competitive landscape of large models.

According to Meta's official introduction, the Llama 2 series is a collection of pre-trained and fine-tuned generative text models ranging from 7 billion to 70 billion parameters.

"Llama 2 is indeed a bombshell. It publishes the data, techniques and details used in its training method, which is very rare. Historically, wherever there is closed source, there is bound to be open source. If ChatGPT holds the first-mover advantage in general-purpose large models, an open-source large-model ecosystem will inevitably follow. The emergence of Llama 2 has disrupted this market and created more opportunities and possibilities," said CSDN founder Jiang Tao.

Yann LeCun, vice president and head of artificial intelligence at Meta, said that Llama 2 will change the landscape of the large language model market. Nathan Lambert, one of the leading authorities in the field of artificial intelligence, said that Llama 2 outperforms GPT-3, which is a huge blow to many companies that build large models behind closed doors.

The industry generally regards GPT-3.5-level performance as the threshold for commercial use of large models. Of the three Llama 2 variants, with 7 billion, 13 billion and 70 billion parameters, the 70-billion version is close to GPT-3.5 on MMLU and GSM8K. This means that with a large open-source model like Llama 2 available, in-house development makes even less sense.

Domestic large-model makers have also split into two camps in their choice of path. Baichuan Intelligence, Zhipu, Tsinghua KEG, Alibaba Cloud and others chose open source, while Huawei's Pangu model, Baidu's Wen Xin Yiyan and others chose closed source.

The emergence of Llama 2 has also accelerated the pace at which domestic large-model companies open-source their models, and the elimination round of China's 100-model war has begun.

On July 11, Baichuan Intelligence launched Baichuan-13B, a large model with tens of billions of parameters, and announced that it was not only open source but also free for commercial use. This free strategy has shaken the domestic paid large-model market. Zhipu AI announced on the 14th that, once an enterprise registers and obtains authorization, ChatGLM-6B and ChatGLM2-6B may be used commercially for free.

Fan Kai, CTO of Lilac Garden, compared this wave of free open-source releases to a waterworks running pipes to users' homes for free so that every household has a tap. As for the closed-source waterworks, their water had better be unbeatably good before everyone is willing to pay.

China must have its own large model

"We are still in the 'Wild West' era of large models: legal supervision is lacking and all parties are scrambling. Has Meta figured out its profit model? Not really. The killer application for large models has not yet emerged, and the entire market is still in a chaotic state," said CSDN founder Jiang Tao.

Given the current situation, the investors and scientists contacted by Deep Web believe that China must have its own large models. "China and the United States are the two countries where AI is developing fastest, and China definitely needs its own large-model strategy. It is not only China and the United States: some European countries, such as the United Kingdom, have also recently invested in their own large models."

Dr. Shao Ling, chief scientist of Terminus, told Deep Web: "China started relatively early in large models, and even before ChatGPT, China's large-model research and development had built up some reserves."

Data show that before the release of ChatGPT 3.0, China already had several trillion-parameter large models: the M6 of DAMO Academy, the Pangu model of Huawei Cloud, and Zhiyuan's Wudao 2.0. But for a variety of reasons, their results were not comparable to ChatGPT's.

"China will definitely have its own ChatGPT. Just as with search engines, we have our own compliance requirements. But the Chinese version of ChatGPT will come from only five companies: BAT, ByteDance and Huawei," Cheng Hao, founder of Xunlei and Yuanwang Capital, told Deep Web.

Within half a month of Llama 2 being open-sourced, a large number of Chinese versions based on Llama 2 had appeared in China, localized through instruction fine-tuning. So how does Llama 2 perform after localization? SuperCLUE, a domestic large-model evaluation organization, evaluated five Chinese Llama 2 models that were widely discussed in the community.

According to the results, some Chinese Llama 2 models (such as OpenBuddy) performed well, coming close to ChatGLM2-6B (35.12 vs. 36.50). However, all of the Chinese models optimized from Llama 2 still show a significant gap compared with the domestic Baichuan-13B-Chat.

Studies have also shown that further training Llama 2 on Chinese can improve its Chinese capabilities, but may also significantly reduce its general capabilities.

From a practical point of view, the Chinese versions of Llama 2 cannot yet meet application needs in a Chinese-language environment. It cannot be ruled out that, with the efforts of the open-source community, Llama's Chinese performance will improve further and catch up with China's native large models. But putting all the eggs in the Llama basket carries the risk of depending on a single source. Therefore, China still needs to develop its own large models.

"The reason Baichuan and Zhipu AI publicly disclose some parameters is to prove their advantages on key performance indicators and parameters. It is also a way for large-model entrepreneurs to compete head-to-head: whoever runs ahead gains a first-mover advantage, which is crucial to success," said CSDN founder Jiang Tao.

Who can have the last laugh?

As for the current competitive landscape of domestic large models, the investors, entrepreneurs and scientists contacted by Deep Web all believe that the field is still at the horse-racing stage, and it is impossible to tell who will win. The consensus, however, is that some large-model companies may break out of the pack in 2024, and everyone is currently racing against time.

Internet veterans such as Li Kaifu, Wang Huiwen and Wang Xiaochuan, middle and senior managers from big Internet companies, some academic scientists, and the big companies themselves have all joined this wave of large-model entrepreneurship. Some build self-developed large models, while others build vertical large models.

Now that Meta has made Llama 2 open source and available for commercial use, large-model applications have entered the "free era", and startups can also create chatbots like ChatGPT at low cost.

China's current opportunity is that it is in fact on the same starting line as Llama. At present, 90% of the domestic companies working on general-purpose large models are expected to prefer building on open-source large models.

Wang Xiaochuan said that in the future, open source and closed source will develop in parallel, like Apple and Android. Most services will rely on open-source models, while closed-source models provide specific value-added services: open-source models will cover 80% of services, and closed-source models will provide the remaining 20%.

Fu Sheng, founder and chairman of Cheetah Mobile, said publicly on social media: "Large models are no longer out of reach, and the era of large models for ordinary people has arrived! Companies like ours are waking up laughing in the middle of the night."

Take Zhongke Shenzhi, which makes digital virtual humans, as an example. Cheng Weizhong started large-model training around the Spring Festival in 2023, and five months later the company released "Digital Intelligence Jiang Shang", a large language model with about 2 billion parameters. They built this product by renting 2,000 NVIDIA A100 graphics cards.

"For most enterprises, it is wiser to train on top of a better open-source model. Even though I have 'Digital Intelligence Jiang Shang', I feel that at a certain point, when a particularly good open-source large model is available, we will migrate our training work onto it. Standing on others' shoulders, progress will be faster."

"If Llama 2 is a highway, what we are doing now is building a road that can connect to it."

"With the development of open-source models and the digital upgrading of industry, the number of developers will double, and enterprise demand for applications based on private data will also be released. We will usher in a new era of intelligence in which everyone is a developer, knowledge is refined into models, software tools are fully rebuilt, and intelligent applications number in the millions," said CSDN founder Jiang Tao.

As this new wave of AI technology arrives, startups are working hard to build small roads so that they can better connect to the highway in the future, and a mature open-source large-model ecosystem is that highway.
