
Who is left at the Big Model table?

Author: Xinberry Daybreak

Large models give the impression of having "gone out to work before coming of age": fine for small talk, but they fall short on anything in depth.

Written by|Shi Shengyuan

Editor|Zhai Wenting

Tencent's Hunyuan model has finally appeared.

In Tencent's own words, it was "in no hurry to show a half-finished product." Yet even at launch, the company admitted that for now the model is merely "usable and practical."

As early as March, Baidu's Wenxin opened invitations for an internal beta; Alibaba's Tongyi Qianwen followed in April. Even the latecomer ByteDance began testing its AI dialogue product Doubao on August 17.

In the "100-model war", is the first-mover advantage important?

Not very much, it seems. A large model is a highly standardized product: individuals, enterprises, and developers all access it through an API, and the cost of switching models is low. In the end, what decides everything is the product's effectiveness and user experience.
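To see why switching costs are so low, consider a minimal sketch. It assumes both vendors expose an OpenAI-compatible chat endpoint; the base URLs, model names, and key below are placeholders, not real services:

```python
from openai import OpenAI  # standard OpenAI-compatible client

# Placeholder endpoints: swapping vendors is mostly a configuration change.
VENDORS = {
    "vendor_a": {"base_url": "https://api.vendor-a.example/v1", "model": "vendor-a-chat"},
    "vendor_b": {"base_url": "https://api.vendor-b.example/v1", "model": "vendor-b-chat"},
}

def ask(vendor: str, question: str, api_key: str = "YOUR_KEY") -> str:
    cfg = VENDORS[vendor]
    client = OpenAI(base_url=cfg["base_url"], api_key=api_key)
    resp = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

# The calling code is identical for either vendor -- only the config differs.
print(ask("vendor_a", "Plan a two-day trip to Hangzhou."))
```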

Still, being first does count for something. Real questions from users are the most valuable data asset: whoever launches first accumulates more data, which helps the model train, learn, and improve in real-world scenarios full of noise and ambiguity.

The first batch of eight large-model products approved under the Interim Measures for the Administration of Generative Artificial Intelligence Services has opened for registration one after another, and ordinary users can finally try them. After a few rounds of chat, however, the impression is that these products have "gone out to work before coming of age": fine for small talk, but shaky on anything in depth.

This inevitably raises the worry that unstable generation results will constrain actual deployment, and that the optimization cycle will be long.

The players who can really stay at the big-model table will be a minority.

1. Homogeneous competition?

Judging from the large-model products and solutions announced by the various vendors, homogenization is fairly serious.

On the toB side, offerings focus mainly on document and meeting scenarios, acting as a writing assistant, meeting secretary, or design assistant; on the toC side, the cards being played are emotional companionship and everyday life help (recipes, travel planning).

At present, Baidu Wenxin, ByteDance's Doubao, Zhipu AI, and Baichuan Intelligence are fully open for registration and use. The Chinese Academy of Sciences' Zidong Taichu is under maintenance, SenseTime requires an invitation code, MiniMax is open only to developers, and the general-purpose model from the Shanghai AI Laboratory has not yet opened registration.

In addition, iFLYTEK Spark has opened for full registration, while Tencent Hunyuan remains invitation-only for now: you have to apply and wait in a queue.

The five products open for registration all take the form of chatbots, with varying degrees of prompt guidance and scenario suggestions layered on top. Some recommend follow-up questions in conversation; some preset assistant roles. A few go a bit further, offering prompt templates, communities, or plugins, in which one can vaguely see an ambition to build an ecosystem and borrow creativity from users and developers, though these efforts are still at an early stage.


But similarity in user perception is not equal to similarity in business logic.

Every large-model vendor, without exception, wants to leverage its existing business to differentiate.

Baidu is the big company that talks most about "ecosystem," and its deepest business scenario is search. A prominent spot on Wenxin's homepage links to the application form for its plugin marketplace. Baidu is also especially active in courting developers and entrepreneurs, taking the lead with its Wenxin Cup entrepreneurship competition. Its AI conversation assistant has also launched inside Baidu search and is open for use.


The first scenario where Alibaba's Tongyi Qianwen landed was DingTalk; Ye Jun, DingTalk's president, once said he wanted to "redo DingTalk with a large model."

When Tencent released Hunyuan, it said that more than 50 businesses and products, including Tencent Cloud, Tencent Advertising, Tencent Games, and Tencent Meeting, had already connected to it.

iFLYTEK's speech recognition covers nine Chinese dialects, giving the Spark model a distinct advantage in ingesting voice data. Its educational hardware, such as the iFLYTEK learning tablet, also gives Spark a natural edge in education scenarios.

Second, "many will disappear quickly"

Beyond the technical difficulties and the ecosystem-building at the business layer, there is another battlefield: model benchmarking. Every major vendor wants to measure itself against GPT.

By incomplete count, since August at least four domestic large models have officially announced that they surpass GPT in some respect.

iFLYTEK said Spark's coding ability exceeded GPT-3.5; SenseTime said its new InternLM-123B model beat GPT-3.5 on 300,000 questions across 51 evaluation sets; Baichuan CEO Wang Xiaochuan said that after fine-tuning, his model's performance in Chinese Q&A and summarization scenarios exceeded GPT-3.5; Tencent was even blunter, with Vice President Jiang Jie saying Hunyuan's Chinese capability surpassed GPT-3.5.

Without a "beats GPT in some specific area" benchmark result, you would almost be embarrassed to join the big-model melee.

But making a model the top scorer on an evaluation dataset means little for real efficiency gains.

Industry insiders know there is an opportunistic training method: have a high-quality large model generate outputs on open-source datasets, then fine-tune a small model on those outputs to copy the large model's behavior. But Berkeley researchers have shown that such imitation models only look good on the surface; their actual ability does not improve, and they generalize poorly in real scenarios.
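As a rough illustration of what that imitation recipe looks like in practice, here is a minimal sketch. The endpoint, model name, and file path are hypothetical, and real vendors' pipelines are not public:

```python
import json
from openai import OpenAI  # assumes an OpenAI-compatible "teacher" endpoint

# Hypothetical teacher endpoint and model name.
client = OpenAI(base_url="https://teacher.example.com/v1", api_key="YOUR_KEY")

def build_imitation_set(prompts, out_path="imitation_sft.jsonl"):
    """Query a strong 'teacher' model on open-source prompts and save
    (prompt, response) pairs as a fine-tuning set for a small 'student' model."""
    with open(out_path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            resp = client.chat.completions.create(
                model="teacher-model",
                messages=[{"role": "user", "content": prompt}],
            )
            answer = resp.choices[0].message.content
            f.write(json.dumps({"prompt": prompt, "response": answer},
                               ensure_ascii=False) + "\n")

# The resulting file is then used to fine-tune the student model -- which,
# per the Berkeley findings cited above, mostly copies style, not ability.
```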

At present, OpenAI's GPT-3 has 175 billion parameters, while domestic large models generally range from tens of billions to a couple hundred billion.

Besides, benchmarks divorced from concrete usage scenarios are a sham. In toB office scenarios, what matters most is accurately extracting data and producing stable output; in toC companionship scenarios, the model's empathy and humor are what deliver emotional value. The leaderboard results each company publishes look more like PR than usability evaluation.

Shen Jing, president of Baidu's Intelligent Cloud business group, said in an interview that although there are many models on the market, many will disappear quickly. "A lot of models still exist only because people don't yet know how good or bad they are. No one can try them, no one can use them, yet they rank quite high. Once models are actually released, it becomes much easier to judge their merits."

That moment of gradually opening up has now arrived.

Xinberry Daybreak tried out the C-end large-model products currently open for registration. On the task of writing Xiaohongshu (Little Red Book) product-recommendation copy, several products performed at the level of "fluent writing," even "somewhat useful." Wenxin is good at adding traffic-driving hashtags, Doubao's copy is quite warm, Hunyuan's reads a bit like a 4A advertising agency, Zhipu Qingyan is like a strict Chinese teacher, and iFLYTEK Spark approaches the task from the scenario itself. Perhaps domestic models really do know domestic social platforms best.


In toB, by contrast, large models have only just touched the muddy ground of real application scenarios.

Tencent, Huawei, SenseTime, and Baidu have all said their large-model solutions cover a dozen or even dozens of industry scenarios. But are companies actually using them?

"Making a big model an assistant to a certain industry, such as a large model in the financial industry, is still too general, and the industry and scene need to be broken down into more detail." Peter said he is an algorithm engineer who develops and explores large model applications at a financial institution.

Take banks as an example, he explained: they have many lines of business. The capital-markets business alone contains more than ten sub-businesses, such as private placement, equity investment, equity incentives, debt-to-equity swaps, and exchangeable bonds. And for equity incentives alone there are dozens of relevant laws and regulations.

"Now we can't even let the big model learn the laws and regulations of equity incentives to give reliable answers. Of the 10 questions, 5 of them are answered correctly."

3. The model should be large, the application vertical

It is undeniable that while the foundational capability of Chinese large models is still weak, the application layer above them has already raced ahead.

"The idealized scenario is that the large model can identify the intention of the questioner in the initial communication, and then distribute it to different AI agents with knowledge of the subdivided field, and then let each AI agent deal with it, rather than being a large and comprehensive legal AI assistant and financial AI assistant."

David is an AI product manager at a startup that built a Character.ai-like product. In his view, for developers, engineering work such as workflow planning and system stability matters more for getting applications into production.
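A minimal sketch of the dispatch idea David describes follows. The intent labels and agent classes are hypothetical, and a production router would typically use the model itself or a trained classifier rather than keyword matching:

```python
# Hypothetical specialist agents for subdivided domains.
class EquityIncentiveAgent:
    def handle(self, q: str) -> str:
        return f"[equity-incentive specialist] answering: {q}"

class TravelPlanningAgent:
    def handle(self, q: str) -> str:
        return f"[travel-planning specialist] answering: {q}"

class FallbackAgent:
    def handle(self, q: str) -> str:
        return f"[general assistant] answering: {q}"

AGENTS = {
    "equity_incentive": EquityIncentiveAgent(),
    "travel": TravelPlanningAgent(),
}

def route(question: str) -> str:
    """Toy intent router: keywords stand in for the model's intent recognition,
    purely to keep this sketch self-contained and runnable."""
    text = question.lower()
    if "equity incentive" in text:
        intent = "equity_incentive"
    elif "trip" in text or "travel" in text:
        intent = "travel"
    else:
        intent = "fallback"
    return AGENTS.get(intent, FallbackAgent()).handle(question)

print(route("Which regulations govern equity incentive plans at listed banks?"))
```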

Magi founder Ji Yichao made a similar point on a podcast: "AI entrepreneurship is 80% product and engineering plus 20% underlying technology."

Ji Yichao estimates that more than 65% of large-model application scenarios involve information retrieval, aggregation, and regeneration, and about 20% involve process automation and decision support.

Take information retrieval and generation as an example. It looks simple, but every corner and detail needs optimization: whether the data is cleaned properly, whether text chunks are split without losing content, how samples and machines are distributed during training, and how to trade off response speed against cost all involve substantial work. If each link only scores 60-70 points, the usability of the whole chain, once the links are strung together, will not be acceptable.
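To make the "links in series" point concrete, here is a back-of-the-envelope sketch; the stage names and scores are illustrative, not measurements reported in the article:

```python
# Illustrative per-stage quality scores (0-1) for a retrieval-and-generation pipeline.
stages = {
    "data cleaning": 0.70,
    "text chunking": 0.65,
    "retrieval":     0.70,
    "generation":    0.65,
}

# If failures at each stage are roughly independent, end-to-end quality is
# approximately the product of the stage scores.
end_to_end = 1.0
for name, score in stages.items():
    end_to_end *= score
    print(f"after {name:<14}: {end_to_end:.2f}")

# Four stages at ~0.65-0.70 each leave only about 0.21 end to end --
# which is why "60-70 points per link" is not good enough in series.
```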

Jiazi Lightyear analyzed the customer bases of 10 popular large models at home and abroad. Foreign vendors mainly target ordinary C-end users and charge subscription fees. China's large-model makers seem to have made up their minds to build platforms and ecosystems and then earn money from B-end customers; their business models include pay-as-you-go API calls as well as deeper solution services and custom model development.


But whether toB or toC, and whatever the business model, the key to getting users to pay is the capability of the underlying model.

After all, what an upper-layer application can do is determined by the underlying model. An application may fail to exploit a capability the base model has; but if the base model lacks a capability, the application certainly cannot supply it.

Peter admits that they tested a round of domestic large models and, in real scenarios, the performance was "barely meaningful." As for fine-tuning an industry model, they "dare not even think about it," because it costs "at least five million yuan per run," and the results are still unknown.

"So there will definitely be vertical applications at this stage, but it is unlikely that there will be vertical models." Peter concluded.

Another key thing for domestic app developers to consider is compliance. Two regulations provide specific guidance: the Provisions on the Administration of Deep Synthesis of Internet Information Services, which came into effect on January 10, and the Interim Measures for the Administration of Generative Artificial Intelligence Services, which came into effect on August 15.

At present, AI products must pass algorithm filing and a security assessment before going online, which the industry calls the "double new assessment." Achieving compliance faster and sooner is itself part of product competitiveness.

Attentive users will notice that almost all the large-model chat products currently available to domestic C-end users carry disclaimers and watermarks. The former remind users that AI-generated content is not guaranteed to be accurate; the latter keep the content traceable as it spreads.

Domestic large models have only just moved from the lab to the market and begun facing real users. To hold them to the business world's yardstick of value and press them with three bluntly pragmatic questions, "Can you really improve work efficiency? Can you effectively cut costs? Can you improve user experience?", may seem a bit harsh. But these are exactly what enterprise users care about, and they are the core value of large models in commercial applications.