Nine large models unveiled at once: what has "ByteDance AI" been doing this year?

Throughout 2023, ByteDance did not officially announce its self-developed large model, and outsiders came to believe that the company had been late to the large-model shift. Liang Rubo acknowledged as much at the annual meeting at the end of last year, saying that "ByteDance was not as sensitive to technology as startups, and did not start discussing GPT until 2023."

Despite this, there has been a constant stream of news about ByteDance's models and AI applications.

On August 31, 2023, the first batch of large-model products in China completed filing under the "Interim Measures for the Management of Generative Artificial Intelligence Services", and ByteDance's Skylark large model was on the list. Around the same time, it emerged that ByteDance had set up a new AI department, Flow, bringing together some of the group's most capable people to explore AI applications. Over the past six months, ByteDance has launched AI applications covering almost every popular category, of which Doubao and Coze are the two most representative.

On May 15, at the Volcano Engine Power Conference, ByteDance lifted part of the veil on these self-developed large models and AI applications for the first time: the Doubao large model family (formerly the Skylark large model) made its debut, and its large-model product, the Doubao App, and its AI application product, Coze, were also explained in detail for the first time.

Volcano Engine is ByteDance's cloud service platform. According to Tan Dai, President of Volcano Engine, after a year of iteration and market validation, ByteDance's self-developed Doubao large model (formerly the Skylark large model) has become one of the most heavily used large models in China, with one of the richest sets of application scenarios. It currently processes an average of 120 billion tokens of text and generates 30 million images per day.

As for the Doubao model, inference pricing is a major highlight, and pricing is also where model vendors have been competing over the past two weeks. Tan said that large-model pricing is moving from being counted in fen (hundredths of a yuan) to being counted in li (thousandths of a yuan), which will help enterprises accelerate business innovation at lower cost.

The Doubao model has greatly reduced the unit cost of inference. Its main model is priced at only 0.0008 yuan per 1,000 tokens in the enterprise market, so 0.8 li (0.0008 yuan) can process more than 1,500 Chinese characters, which is 99.3% cheaper than the industry price.
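The pricing figures quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the constant and function names are made up for this example, the "0.8" figure is read as 0.8 li (one li is a thousandth of a yuan), and the industry baseline is not stated in the article, so it is merely back-calculated from the 99.3% claim.

```python
from fractions import Fraction

# Figures quoted in the article (exact fractions avoid float rounding).
PRICE_YUAN_PER_1K_TOKENS = Fraction(8, 10000)   # 0.0008 yuan per 1,000 tokens
CHARS_PER_1K_TOKENS = 1500                      # "more than 1,500 Chinese characters"
CLAIMED_DISCOUNT = Fraction(993, 1000)          # "99.3% cheaper than the industry"

# Currency units: 1 yuan = 10 jiao = 100 fen = 1,000 li.
LI_PER_YUAN = 1000


def cost_yuan(tokens: int) -> Fraction:
    """Cost in yuan of processing `tokens` tokens at the quoted price."""
    return PRICE_YUAN_PER_1K_TOKENS * Fraction(tokens, 1000)


# Price of 1,000 tokens expressed in li: this is the "0.8" in the article.
price_in_li = PRICE_YUAN_PER_1K_TOKENS * LI_PER_YUAN

# Implied lower bound on Chinese characters per token.
chars_per_token = Fraction(CHARS_PER_1K_TOKENS, 1000)

# Assumption: "99.3% cheaper" means price = baseline * (1 - 0.993).
# The baseline itself is NOT given in the article; this only back-solves it.
implied_baseline = PRICE_YUAN_PER_1K_TOKENS / (1 - CLAIMED_DISCOUNT)

print("price per 1,000 tokens in li:", float(price_in_li))
print("characters per token (at least):", float(chars_per_token))
print("cost of 1,000,000 tokens in yuan:", float(cost_yuan(1_000_000)))
print("implied industry baseline, yuan per 1,000 tokens:", float(implied_baseline))
```

Under these assumptions, one million tokens cost 0.8 yuan, and the implied industry baseline works out to roughly 0.114 yuan per 1,000 tokens.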

Beyond the model products themselves, what is more noteworthy is ByteDance's thinking on large models and AI products.

Why does the Doubao model family currently consist of these nine members?
What is the thinking behind ByteDance's two flagship applications, the model product "Doubao App" and the application product "Coze"?
What is Volcano Engine's "ambition" as a cloud platform in the new era?

These questions were all answered at the launch event.

01 Nine models: the Doubao large model family debuts

At this year's AI conferences, large-model vendors no longer focus only on the base model itself; they launch models, tools, and applications together. Clearly, large models have moved a step further toward real-world deployment.

The same is true for ByteDance, which officially released a series of new products at the Volcano Engine Power Conference, including the Doubao large model family, Volcano Ark 2.0, AI applications, and AI cloud infrastructure.

Let's look at the models first. The large-model industry is currently evolving in two main directions, price and performance: inference prices keep falling, and model performance keeps improving. The Doubao large model family has its own distinguishing features in both.

Tan Dai, President of Volcano Engine, announces the pricing of the Doubao large model | Image source: Volcano Engine

On price, Volcano Engine said that Doubao's main model is priced at 0.0008 yuan per 1,000 tokens in the enterprise market, so 0.8 li (0.0008 yuan) can process more than 1,500 Chinese characters, which is 99.3% cheaper than the industry price.

Tan believes that cost reduction is a key factor in pushing large models quickly into the "value creation stage". Asked whether pushing the price this low meant subsidizing it at a loss, Tan said, "Trading losses for revenue is not sustainable in the ToB business, and Volcano Engine has never taken that path. There is a range of technical means to reduce inference prices, and we can do even better in the future," for example by optimizing the model structure and, on the engineering side, replacing single-machine inference with distributed inference.

On performance, the "Doubao large model family" debuted with nine models shaped by market demand, including the general model Pro, the general model Lite, the speech recognition model, the speech synthesis model, and the text-to-image model.

The family has converged on these nine models for now because ByteDance chose them according to the usage volume and demand it sees for models on the back end.

Tan told Geek Park that, first, there must be a strongest flagship model that can support advanced functions. Second, different scenarios and on-device use place high demands on low latency, so Doubao Lite is also needed. There is also a need for models that strike a balance between performance and latency, as well as models for major vertical scenarios such as role-playing in entertainment products, "which most likely don't need to be able to write code, but do need stronger interactive entertainment capabilities."

Doubao large model family | Image source: ByteDance

Doubao General Model Pro: the professional version of ByteDance's self-developed LLM. It supports 128k long text, the whole series can be fine-tuned, and it offers stronger overall capabilities in understanding, generation, and logic, suiting a wide range of scenarios such as Q&A, summarization, writing, and classification;
Doubao General Model Lite: the lightweight version of ByteDance's self-developed LLM. It offers lower token cost and lower latency than the professional version, giving enterprises a flexible and economical choice;
Doubao Role-Playing Model: personalized character creation, with stronger context awareness and plot-advancing ability, to meet flexible role-playing needs;
Doubao Speech Synthesis Model: natural and vivid speech synthesis, good at expressing a variety of emotions and performing in a variety of scenarios;
Doubao Voice Cloning Model: 1:1 voice cloning in 5 seconds, closely reproducing the similarity and naturalness of the timbre, with support for cross-language voice transfer;
Doubao Speech Recognition Model: higher accuracy and sensitivity, lower recognition latency, and accurate recognition across multiple languages;
Doubao Text-to-Image Model: more precise text understanding, more accurate image-text matching, and more attractive images, good at creating with Chinese cultural elements;
Doubao Function Call Model: more accurate function identification and parameter extraction, suited to complex tool-calling scenarios;
Doubao Vectorization Model: focused on vector retrieval scenarios, it provides core understanding capabilities for LLM knowledge bases and supports multiple languages.
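To make the "vector retrieval" scenario in the last item concrete, here is a minimal sketch of how a knowledge base can be searched once texts have been turned into vectors. It does not call any ByteDance or Volcano Engine API: the three-dimensional vectors are hand-written stand-ins for what an embedding model would return, and the names `cosine_similarity`, `top_k`, and `KNOWLEDGE_BASE` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, knowledge_base, k=2):
    """Return the k texts whose vectors are most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text)
              for text, vec in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical embeddings: a real vectorization model would produce
# vectors with hundreds or thousands of dimensions.
KNOWLEDGE_BASE = [
    ("Refund policy: refunds are accepted within 7 days", [0.9, 0.1, 0.0]),
    ("Shipping times: 2 to 5 business days", [0.1, 0.9, 0.0]),
    ("Office address and opening hours", [0.0, 0.1, 0.9]),
]

# Stand-in for the embedding of a question such as "How do I get my money back?"
query = [0.8, 0.2, 0.1]

results = top_k(query, KNOWLEDGE_BASE, k=2)
for text in results:
    print(text)
```

In a retrieval-augmented setup, the retrieved passages would then be placed in the LLM's prompt, which is the sense in which a vectorization model supports an LLM knowledge base.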

Regarding the Doubao large model family released today, one investor said, "ByteDance does not emphasize parameters, data, or corpus; it directly splits model capabilities vertically by scenario. That is the difference between having applications and not having them, and more fundamentally the difference between having data and not having it. With user feedback and data feedback, ByteDance can build more precise scenarios and services."

Just as with Toutiao and Douyin, which broke through in the mobile internet era, ByteDance follows a data-driven logic in AI, deciding the next move for a product or model based on feedback from different data pipelines. Conversely, a company that builds only the base model and does not upgrade its services will receive less and less scenario and user-data feedback, and the gap in model capabilities will widen.

02 Doubao: how it carries the large-model product thinking of the "App Factory"

In fact, the Doubao model (formerly known as Skylark) was launched inside ByteDance as early as last year, and more than 50 internal businesses, including Douyin, Fanqie Novel (Tomato Novel), Feishu, and Ocean Engine, have used it extensively for AI innovation to improve efficiency and optimize product experience.

ByteDance has also built a series of AI-native applications on the Doubao model, including the AI conversation assistant "Doubao", the AI application development platform "Coze", the interactive entertainment app "Maoxiang", and AI creation tools such as Xinghui and Jimeng.

Among them, Doubao and Coze are ByteDance's flagship products.

According to QuestMobile, the Doubao App, which is built on the Doubao model of the same name, ranks first in downloads in the Apple App Store and major Android app markets. According to the latest official data, more than 8 million agents have been created on Doubao, and monthly active users have reached 26 million.

Zhu Jun, Vice President of Product and Strategy at ByteDance | Image source: Volcano Engine

At the launch event, Zhu Jun, ByteDance's head of product strategy and of the Flow department, used the Doubao App as an example to lay out for the first time ByteDance's product thinking on AI-native applications. He believes that, compared with product design before the AI era, users' core needs have not changed: efficient access to information, productivity at work, self-expression, social entertainment, and so on.

The difference is that in the past, applications were built on mature technology, and as long as you used empathy to understand user needs and the user experience, you could make a good product. Now the technology underneath the product is no longer a stable foundation: large models still have shortcomings in many dimensions while evolving rapidly, with major changes every three to six months, and these are not even linear, gradual changes but sudden jumps.

Therefore, he believes that a major challenge in building large-model applications is first to judge which tasks large models can solve now, amid this dynamic development, and, more importantly, to try to predict which tasks they will be able to solve in six months or a year.

Taking the Doubao App as an example, he shared ByteDance's thinking on building large-model applications.

Anthropomorphism

Zhu Jun said that Doubao's first product design principle is "anthropomorphism". This is a new characteristic of large-model products: natural language as a new mode of interaction lowers the barrier to use and lets users feel a human-like warmth in the product. To convey this, the app was given the name "Doubao", which sounds like an everyday nickname for a close friend.

Close to the user

Doubao's second design principle is to stay close to the user. It should be able to accompany the user at any time and be embedded in the user's different environments: "Doubao comes to the user, rather than the user coming to Doubao."

One example is the design of the voice interaction entry point. To make Doubao more convenient in mobile scenarios (such as outdoors), like a versatile assistant users can carry around, ByteDance invested heavily early on in optimizing the voice experience, including large-model-based ASR and ultra-natural TTS voices, trying to make it feel like a conversation with a real person. Doubao was almost the first app from a major Chinese vendor to make voice interaction the default interface, and voice entry points were later added to other large-model apps on the market.

Personalization

The third design principle is "personalization". Although a general model can solve a very wide range of tasks, users have their own personalized needs, covering the agent's functional positioning as well as its response style, voice, appearance, and memory.

Zhu Jun believes that in the future users will most likely have one primary agent (such as Doubao) that handles their most frequent interactions and many of their tasks, but will also interact with many other agents because of their need for personalization and diversity.

He concluded, "The challenge and the fun of building large-model products is that, amid this continuous and dynamic technological development, you have to keep working out where the next product's PMF (product-market fit) might be."

03 Putting models to work: Volcano Engine's infrastructure

In addition to the Doubao large model family and ByteDance's AI application product thinking, at the main venue of the Volcano Engine launch event the large-model service platform "Volcano Ark" also upgraded a range of plug-ins and AI application services for data, marketing, sales, and other areas.

In terms of plug-ins and toolchains, Volcano Ark 2.0 upgraded its web-access plug-in to provide the same search capabilities as Toutiao and Douyin, upgraded its content plug-in to provide the same massive content sources as Toutiao and Douyin, and upgraded its knowledge base plug-in to improve the relevance and accuracy of search.

At the same time, Volcano Ark 2.0 has fully upgraded its underlying infrastructure. In terms of system capacity, it provides ample GPU computing resources and highly elastic scaling, expanding or shrinking by a thousand GPUs within minutes, which ensures business stability and cost control. In terms of security, it builds a trusted execution environment through a security sandbox, together with a multi-dimensional security architecture, to protect data. In addition, Volcano Engine provides professional algorithm team services to help customers unlock the value of their proprietary data and deploy large-model applications.

In addition, in response to the natural-language-based application development model that large models have brought, Volcano Engine has also launched a new-generation AI application development platform: Coze Professional Edition.

Pan Yuyang, product manager of Coze, said that as ByteDance's new-generation AI application development platform, Coze offers a low barrier to entry, personalization, real-time capability, and multimodality, and integrates a vast range of AI resources along with rich publishing and API services.

Regarding Coze, Tan Dai believes that "there must be a low-code ecosystem like Coze. An application ecosystem is the collective wisdom of many people, so it has to let a lot of people build all kinds of things (AI applications) with a very low barrier."
