
ByteDance's large models make their full-lineup debut: prices 99% lower, no parameter counts or benchmark scores

Author: QbitAI

Jin Lei, reporting from Aofei Temple

QbitAI | Official account QbitAI

ByteDance has finally lifted the veil on its own large models.

Just now, its Volcano Engine officially unveiled the Doubao model family for the first time, rolling out nine members in one go.


At the core of the family is the Doubao general-purpose model, which comes in two sizes:

  • The large cup: Doubao General Model Pro, with a context window of up to 128K and fine-tuning supported across the series.
  • The small cup: Doubao General Model Lite, with faster response times.

What is quite surprising is that, for a launch built around large models, Volcano Engine's playbook is completely different from that of other large model vendors:

No benchmark scores, no parameter counts!


Instead, pricing became the highlight that drew a "wow" from the audience. Compared with other large models:

  • Context windows under 32K: Doubao General Model Pro costs just 0.0008 yuan per 1,000 tokens, 99.3% below the prevailing industry price.
  • 128K context window: Doubao General Model Pro costs just 0.005 yuan per 1,000 tokens, 95.8% below the prevailing industry price.

A quick conversion: 1 yuan = 1,250,000 tokens!
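
As a quick sanity check on that figure, a few lines of Python using only the prices quoted above reproduce the conversion (the 200,000-token document at the end is a hypothetical example, not from the launch):

```python
# Launch prices for Doubao General Model Pro (yuan per 1,000 tokens)
PRICE_32K = 0.0008   # context windows under 32K
PRICE_128K = 0.005   # 128K context window

# Tokens that 1 yuan buys at the 32K-window price
tokens_per_yuan = 1_000 / PRICE_32K
print(f"{tokens_per_yuan:,.0f} tokens per yuan")     # 1,250,000 tokens per yuan

# Hypothetical example: processing a 200,000-token document
print(f"{200_000 / 1_000 * PRICE_32K:.2f} yuan")     # 0.16 yuan
```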


The emphasis is on real-world results, and getting everyone to actually use it is what counts.

Why? Looking across the whole launch event, the logic behind Volcano Engine's move can be summed up as:

Only the largest volume of real usage can polish the best large model.

Reportedly, since the Doubao model went live last August, it has processed an average of 120 billion tokens per day, equivalent to 180 billion Chinese characters, and generated 30 million images per day.

Beyond that, the Doubao model family has been exercised and validated in more than 50 business scenarios, including Douyin and Toutiao.

So Volcano Engine's approach to large models can be seen as one of repeated polishing: usage volume in one hand, diverse scenarios in the other.

In a word: whether a large model is good or not, you know by using it.

Take ultra-long context windows, where other large models have been competing fiercely: on paper, the 128K window released with the Doubao general model this time is not an especially eye-catching number.

But that length is plenty for everyday use, so ByteDance has put more of its energy into "using it well", namely the fine-tuning mentioned above.

For example, at a random position in a 200,000-character article, we inserted a sentence unrelated to the original text:

High-end hunters often appear in the form of prey.

We then uploaded the document to Doubao and asked it to answer, based on the article, "In what form does a high-end hunter appear?" It answered exactly according to the sentence we had inserted.
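
This is essentially the classic "needle in a haystack" test for long-context models. Below is a minimal sketch of how such a test can be constructed; the haystack here is a stand-in string, and the commented-out client call is hypothetical, since the article does not show the actual Doubao API.

```python
import random

def insert_needle(haystack: str, needle: str) -> str:
    """Insert one unrelated sentence at a random position in a long document."""
    pos = random.randint(0, len(haystack))
    return haystack[:pos] + needle + haystack[pos:]

# Stand-in for the ~200,000-character article used in the demo.
haystack = "Some ordinary article text. " * 10_000
needle = "High-end hunters often appear in the form of prey."
document = insert_needle(haystack, needle)

# Hypothetical client call -- the real Doubao / Volcano Ark interface may differ.
# answer = client.chat(context=document,
#                      question="In what form does a high-end hunter appear?")
# The model passes the test if `answer` repeats the inserted sentence.
```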


Clearly, with the 128K long context window plus fine-tuning, the Doubao general model can already handle ultra-long-text tasks accurately.

But this is only a small slice of what Volcano Engine's large models can do, so let's keep going.

More human-like, and better at understanding people

Beyond text dialogue, voice is another important part of the Doubao model family, with three members devoted to it:

  • Speech synthesis model
  • Voice cloning model
  • Speech recognition model

Take speech synthesis, for example: the Doubao model now emphasizes sounding ultra-natural and more human. Without further ado, let's listen:

[Audio demo]

It's not hard to hear that the AI-synthesized speech is already close to a real person's, no longer the cold "you can tell it's AI at first listen" of the past.

It can also control pauses and emotional tone while speaking, according to context, and switching between multiple languages is no problem.

Reportedly, relying on a timbre matrix built on the large model, Volcano Engine can also express more complex human emotions such as crying. Letting such an AI "read aloud" to you is genuinely immersive:

[Audio demo]

On voice cloning, the MegaTTS technology behind the Doubao voice cloning model has also been upgraded this time:

It has greatly improved timbre similarity, naturalness, and multilingual expressiveness.

Again, let's listen to the result directly:

[Audio demo]

Well? Can you tell the original voice from the cloned one?

What's more, no matter how unusual or varied the source audio is, cloning takes only 5 seconds, and it can now be done right in the Doubao app:

[Demo]

So if something at work requires you to "appear" in your own voice, even in a foreign language you don't speak, there's nothing to fear.


As for speech recognition, with the upgraded Doubao model behind it, smooth, context-aware conversations are possible even in noisy environments.

For example, we chatted with Doubao in English while English songs were playing in the background:

[Demo]

Reportedly, compared with smaller models, the Doubao speech recognition model cuts the recognition error rate by 30%; in vertical domains such as music, technology, education, and healthcare, the error rate drops by more than 50%.

But plain conversations like the ones above can feel a bit monotonous and short on emotion.

Another member of the Doubao model family, the role-playing model, solves exactly this problem.

For example, we can have a conversation with Li Bai across time and space:

[Demo]

Specifically, this feature is an agent in the Doubao app built on the upgraded Doubao role-playing model, which strengthens personalized personas, more natural chat, and better empathy.

From the example above, "AI Li Bai" not only speaks in a poetic register, but also keeps the dialogue tightly coherent.

And the Doubao app is full of agents like this: the domineering campus heartthrob, the only daughter of a ruthless family, the caring big sister, the God of Wealth... Well, kind of fun.


All in all, talking with Doubao now feels more and more like talking to a person.

The Doubao model has also been upgraded in capabilities such as text-to-image generation; you can enter a prompt directly in the chat window, or pick a style you like in the agent plaza.

As before, to see what the upgrade brings, let's go straight to the generated results:


Of course, if none of the existing agents is quite right, the Doubao app also lets you build your own, the kind you can create in just a few simple steps.


For more practical and complex AI applications in study and work, Volcano Engine also announced the large models behind Coze, its one-stop AI application development platform:

  • Function Call model: good at using plug-ins and tools; the main model powering Coze (see the sketch after this list).
  • Vectorization model: trained on a large volume of text covering different industries, with strong generalization, and supporting mixed retrieval over bilingual Chinese and English corpora.
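
As a rough illustration of what a function-call model does, here is a minimal sketch of a tool schema and dispatch step in Python. The tool name, schema format, and mocked model output are assumptions for illustration only, not the actual Coze or Volcano Ark API.

```python
import json

# Hypothetical tool schema, in the style commonly used for function calling.
tools = [{
    "name": "search_arxiv",
    "description": "Search arXiv for papers matching a query.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

def search_arxiv(query: str) -> str:
    # Placeholder implementation; a real plug-in would call the arXiv API.
    return json.dumps({"results": [f"papers about {query}"]})

# A function-call model decides which tool to invoke and with what arguments;
# the application then executes the call. The model output below is mocked.
model_output = {"tool": "search_arxiv", "arguments": {"query": "long-context LLMs"}}
if model_output["tool"] == "search_arxiv":
    print(search_arxiv(**model_output["arguments"]))
```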

Using it stays simple and efficient: with or without a programming background, it comes down to "one sentence plus a few clicks".

Whatever your need, there seems to be a Coze bot that can meet it.

For example, to quickly find the papers we want on arXiv, we just describe the requirement when creating the agent:


Even if you don't know how to refine the prompt afterwards, it doesn't matter: the Coze platform will auto-generate an optimized one for you with a single click:


To make an AI agent more capable, you can also pick one or more plug-ins that suit your needs from a large plug-in library:


Beyond plug-ins, the Coze platform offers optimization along further dimensions, such as workflows, triggers, variables, databases, and long-term memory, so AI applications can become more personalized and customized.

And the whole process is only a matter of minutes.

It's easy to see that Volcano Engine has covered both efficiency and breadth in consumer-facing (To C) large model applications; at the same time, it has also made big moves on the business (To B) side.

Facing industry: the upgraded Volcano Ark

Volcano Ark, the Model-as-a-Service (MaaS) platform Volcano Engine released last June, officially entered its 2.0 era today.

It keeps the same traits of efficiency, diversity, simplicity, and security, with the main focus on letting enterprises land large model applications quickly, in one stop.

In terms of overall functions and workflow, an enterprise's use of Volcano Ark breaks down into four steps.

Step 1: Pick a model

The first thing an enterprise does is pick, according to its own business needs, the model that fits best from the many top-tier large models in the model plaza.


Step 2: Experience the model

Whether a model suits your own business, you only know by trying it.

So the Volcano Ark platform also issues enterprises "trial cards" that let them quickly experience each model's real-world performance and explore its capabilities in language, images, and more.


Step 3: Process the model

Once an enterprise has tried out the model of its choice, Volcano Ark also provides "processing" services.

Specifically, it can quickly build and serve an exclusive large model through professional training, inference, evaluation, and fine-tuning features.


Step 4: Model onboarding

With everything ready, the chosen large model can truly be put to "work".


Viewed end to end, Volcano Ark is like a large-model factory: it supplies the raw materials and also handles the processing and after-sales work.

Operationally, on top of the Volcano Ark upgrade, Volcano Engine has also officially released Coze Professional, an enterprise-grade AI application development platform.

A major feature is that, beyond Coze's visual and flexible agent building, it adds enterprise-level SLAs and a range of advanced capabilities.

The goal is to make it easier to implement AI applications and help companies focus more on innovation.

Which leaves one last question: built for industry as it is, is Volcano Ark reliable enough?

Here, Volcano Engine also gives its answer in terms of stability and security.

First, on compute: Volcano Ark draws on Volcano Engine's massive GPU resource pool and its tidal scheduling for unified training and inference. With hardware-software co-optimization, it can elastically switch 1,000 GPUs from training to inference serving within 2 minutes, effectively absorbing burst traffic and business peaks while cutting costs for enterprises.

Second, at the algorithm level, Volcano Ark supports the same SFT training engine used for the Doubao model. A fine-tuned model can be scheduled into a servable state in 3 seconds, and it matches the base model in TPM capacity, inference latency, and price. That makes follow-up evaluation, gray-scale rollout of online serving, and gradual scale-up much easier, and speeds up iteration on large-model fine-tuning.

Finally, on security, Volcano Ark emphasizes openness and transparency: prompt data is encrypted and protected end to end via a self-developed security sandbox, guarding against malicious attacks and data leakage during training and inference, and a transparent audit center keeps data flows controllable and auditable.

Of course, every enterprise user wants its large model service to stand out, and Volcano Engine's three plug-ins are the key source of differentiation:

  • Web search plug-in: offers the same search capability as Toutiao and Douyin, connecting in real time to massive, high-quality Internet data and continuously learning from new information to improve performance and adaptability, with multimodal interaction across text, images, and voice.
  • Content plug-in: draws on the same massive content sources as Toutiao and Douyin, supports multimodal interaction, offers intent-based retrieval of vertical content with stronger timeliness, and helps large models deeply understand, retrieve, and generate content.
  • RAG knowledge base plug-in: provides millisecond-level, high-performance retrieval at a scale of tens of billions of entries, second-level streaming index updates, and a built-in Doubao vectorization model to improve the relevance and accuracy of search (a sketch follows this list).
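
To make the RAG idea concrete, here is a minimal sketch of the retrieve-then-generate pattern such a plug-in implements. The toy bag-of-words embedding below is only a stand-in for the Doubao vectorization model, and nothing here reflects the plug-in's real API.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy bag-of-words embedding; the real plug-in would use the Doubao vectorization model."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def retrieve(question: str, docs: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k documents most similar to the question (cosine similarity)."""
    doc_vecs = np.stack([embed(d) for d in docs])
    q_vec = embed(question)
    scores = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9
    )
    return [docs[i] for i in np.argsort(scores)[::-1][:top_k]]

# The retrieved passages are then placed into the model's prompt as context,
# so the generated answer stays grounded in the knowledge base.
docs = ["Doubao General Model Pro supports a 128K context window.",
        "Volcano Ark is Volcano Engine's MaaS platform.",
        "MegaTTS powers the Doubao voice cloning model."]
print(retrieve("How long is the context window?", docs, top_k=1))
```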

All in all, whether it's the Doubao model family Volcano Engine released this time, the upgraded Volcano Ark, or even the tone of the whole launch event, what the company is aiming at is crystal clear.

Putting it to use is the last word

That's right: putting it to use is the last word.

And this is the most obvious strategic difference between Volcano Engine and many other large model players:

Most players release their large models first and then go looking for applications; Volcano Engine does the opposite, officially releasing only after the models have already been put to use.

The reason for this is exactly what we mentioned at the beginning:

Only the largest volume of real usage can polish the best large model.

As for why it did not release benchmark rankings, parameter counts, and other metrics the industry has long been used to comparing, Wu Di, head of intelligent algorithms at Volcano Engine and head of Volcano Ark, gave a very straightforward explanation in a conversation with QbitAI:

We want to compare ourselves to yesterday.

What we care more about is whether the user experience and results are good, not those glossy scores.

Customers can determine for themselves which model suits them best.

A simple, confident answer. But where does that confidence come from?

First, scenarios.

It is industry consensus that large models need user feedback to improve, and here Volcano Engine has a natural advantage in its ByteDance backing.

Reportedly, the Doubao model is continuously iterated and optimized through practice across more than 50 internal ByteDance businesses and scenarios; the whole company can be said to be all-in on large models.

Second, technology.

ByteDance's recommendation algorithms are widely recognized as industry-leading, and Volcano Engine's current core algorithm service team (led by Wang Ke, head of large model algorithm services at Volcano Engine) is the original team that laid the foundation for ByteDance's founding technology.

Its technical strength speaks for itself.

Third, the market.

Reportedly, cumulative downloads of the Doubao app have exceeded 100 million, a clear sign of its popularity with users.

On the To B side, Volcano Engine has partnered with many companies across smart devices, automotive, finance, consumer, and other industries, including OPPO, vivo, Xiaomi, Honor, Samsung, Asus, China Merchants Bank, Jietu, Geely, BAIC, Zhiji, GAC, Dongfeng Honda, Haidilao, and Feihe.

Moreover, on the path of optimizing large models through usage, Volcano Engine does not rely on its own huge business scenarios alone; it polishes the models together with the partners above, forming a closed loop.

So, finally, how should we assess Volcano Engine's large models?

Perhaps like this: more usage, lower prices, more scenarios, better understanding of people, and more intelligence.

And the main theme of this launch once again confirms the "application is king" trend of the current large model era:

Whoever puts large models to better use will have the last laugh.

— END —

量子位 QbitAI · Signed account on Toutiao

Follow us and be the first to know about cutting-edge technology trends
