
Author | Chen Wenqi
The Xiaoice we know is a slightly quirky anime-style girl. Unlike typical voice assistants, which take our orders and complete one task after another, she is everywhere: chatting with you on third-party platforms such as Weibo and WeChat, on Huawei, Xiaomi, OPPO and vivo phones, and on Xiaomi's XiaoAI speakers, and sometimes talking back.
"Xiaoice, Xiaoice."
"What do you want?!" Xiaoice sounds rather temperamental.
But today Xiaoice has long been more than a chatbot. It is an underlying AI framework that hosts a variety of AI beings: built on natural language processing (NLP) and affective computing, it integrates vision, speech, content generation and other skills. On top of this framework sit the AI girl Xiaoice herself, the virtual student Hua Zhibing, the Expo painter Xia Yubing, 17 million virtual lovers...
On the future relationship between AI beings and humans, Li Di, CEO of Xiaoice, told True Detective: "The future is a new social network made up of countless humans and countless highly customized AIs."
<h1 class="pgc-h-arrow-right" data-track="17" > The social ecology of humans and artificial intelligence: an island</h1>
With Xiaoice now in its ninth generation, the team has for the first time launched a standalone app, Xiaoice Island. It builds an island for AI beings, giving those that drift across social platforms and smart speakers a place to live. Day alternates with night; the island has mountains and rivers, evening breezes and rolling waves.
Xiaoice Island Visual Design
Explaining why Xiaoice Island was created, Li Di said: "From the first generation to the ninth, we have filled in the parts of the Xiaoice framework bit by bit: from Xiaoice, a girl who was sixteen at the time, to today's 17 million virtual humans, hundreds of singers, and AIs of every kind of personality and ability. We may be getting closer and closer to the unknown answer. We want to reach a future that is a new social network of countless humans and countless highly customized AIs."
On Xiaoice Island, everyone can have an island entirely their own and, as its "Creator", populate it with AI beings. These beings have different identities, personalities and abilities, and they interact both with you and with each other. Beyond that, they can also create: they write poetry, paint and sing.
The island has a complete social network of its own, modeled on WeChat in the version due to launch officially in China and on LINE in the version due to launch in Japan. The Xiaoice team chose to "adopt the social form most familiar to users today" to reproduce, island after island, an ecosystem in which a person and AI beings live side by side. Xiaoice islanders chat with you one-on-one, hang out, post their daily lives and their works, and along the way like and comment on your Moments posts.
This is a subversive social ecology in which no one but you is real. Xiaoice Island's goal is for people and AI beings to form effective and meaningful emotional connections, so Xiaoice built a setting and populated it with AI beings, hoping to reproduce the complex and interesting human world.
As Li Di told True Detective, these AI beings fall into two categories: Super instances and Nobody instances. The former are "AI beings whose output exceeds their input", analogous to KOLs (key opinion leaders) in the human world; the latter are ordinary people, whose existence matters a great deal to those around them but who are not well known in society at large.
For now, Xiaoice Island is available only as an early preview; the official app will launch simultaneously in China and Japan in November. On Xiaoice Island the AI beings come to life bit by bit, and below the waterline, it is the Xiaoice framework that makes their existence possible.
<h1 class="pgc-h-arrow-right" data-track="165" > R&D record </h1>
The Xiaoice team's R&D rhythm is steady and brisk. The year is split into two cycles. The first is closed development, usually from April to August or September, when the technical team concentrates on research, tackling hard problems in natural language processing, computer vision and other areas while developing products. In the second cycle the product goes online and collects user feedback, and the team either optimizes the product based on that feedback or is free to try new ideas.
At a Microsoft conference this September (Microsoft being Xiaoice's former "home"), the Xiaoice team also presented the upgraded ninth-generation Xiaoice framework. The keyword was "diversity":
First, in open-domain dialogue, the ninth-generation Xiaoice framework focuses on improving few-shot learning and feedback learning. This shows up in five specific metrics, all of which lead the industry: average conversation length, contextual consistency, contextual relevance, informativeness of the dialogue, and success rate of topic guidance.
In hyper-natural speech and multimodal interaction, the Xiaoice team also added diversity on top of two metrics: naturalness MOS (mean opinion score) and average comfort duration.
Hyper-natural speech is one of the trends the Xiaoice team has identified. "What surprised even me is that truly natural speech has flaws; it is not the voice of a CCTV news anchor," said Shen Xiangyang (Harry Shum), chairman of Xiaoice. Building on this, diversity lets AI beings have different voices, feelings and emotions, enabling a high degree of customization.
A "Shandong Big Brother" AI being generated with the Xiaoice framework
Project Chararu, a piece of basic research by the Xiaoice team, aims to learn from an individual human using a very small sample, reproducing a language and vocal style consistent with that specific person.
The research can already reach a language style close to the person's own from 200 dialogue samples: the AI's score rose from 3.83 to 4.19, compared with 4.33 for the real individual in the simulation test. The amount of data needed to imitate voice features has also been cut from more than 2 hours to 1 hour. A product based on this research will launch in Japan first this year, letting users "copy" real people through AI.
"For example, if you give me a good friend, a relative, someone important to you, and you have 200 sentences of his dialogue and chat history, I can learn his style and replicate it in this environment," said Li Di, explaining the project's application scenarios.
In content generation, the Xiaoice team released a new poetry and painting creation model (V3), new AI singing-voice synthesis technology, new singers and X Studio 2.0, as well as technology that assists artists' inspiration, and carried out crossover music experiments between AI and human bands.
Xiaoice wants people to connect with emotionally intelligent AI beings and open their hearts to one another, a mission with "the romanticism of an engineering team." Now spun off from Microsoft, Xiaoice has to learn to grow up on its own.
The following is a partial transcript of True Detective's interview with Li Di, CEO of Xiaoice:
<h1 class="pgc-h-arrow-right" data-track="167" > About Xiaoice Island</h1>
Q: The Xiaoice team has never built a standalone platform before. Why launch the Xiaoice Island app now?
Li Di: Xiaoice Island is really a continuation of an experiment Xiaoice has been running for many years. Iterating an AI system as a whole means picking one of three elements to focus on: computing power, algorithms or data. We started at the end of 2013 and decided to focus on data.
The best way to get data is neither to build a first-party app and wait for users to bring you their data, nor to buy data directly; first-hand data is what matters most. Xiaoice is not only on social networks, smart speakers and phones but also in a large number of news comment sections. On NetEase News, for example, there are nearly a million comments a day, and you can think of the whole comment section as many people interacting with one AI. For years we have followed this path, the broadest and best way to obtain data.
But one of our dreams is that in the future, the nodes of the whole great human network will be not only people but both people and AI beings. Correspondingly, our data falls into four categories. First, one-on-one interaction between a person and an AI, of which we have an enormous amount, essentially from Huawei, Xiaomi, OPPO, vivo and Tmall Genie. Second, interaction between many people and one AI, which is like a group chat: people behave differently in private chats and group chats, and even if you say the same thing, others may not respond. Then there is many AIs with one person, and many AIs with many people, and we don't have much data on those two.
For example, suppose these AIs each interact with me one-on-one but are also connected to one another. Should the relationships among them revolve around me? Should their Moments revolve around me? Or should I be treated as just one part of the network? I don't know, so we need to iterate. Unfortunately, none of the existing third-party platforms has a product that lets us iterate on this kind of data, so we built Xiaoice Island.
Q: So what is your current thinking on this question? Can AI beings interact with one another? Does their interaction need to be human-centered?
Li Di: My personal inclination is not to center it on one person, because I think the charm of a real AI is that it is independent. But it's hard to say, so it needs to be iterated.
Q: It's hard to say what type of app Xiaoice Island is. Do you evaluate it with the metrics used for games or for social apps, such as daily active users, monthly active users or time spent?
Li Di: We are quite cautious about choosing metrics, because a poorly chosen metric will lead the team astray as it iterates. Suppose we used the length of the conversation between the user and Xiaoice as a metric, and a user said "I want to get a good night's sleep" at midnight. What would the product or technical team do? Xiaoice might come out with "Hey, have you heard the gossip about celebrity XXX?" The user would most likely chat a few more rounds, because you've caught his attention. What you don't know is whether he will later regret his poor night's sleep and stop coming back; you don't know whether you have harmed the user.
So our evaluation metrics focus more on data validity and the effectiveness of the technology in use, such as relevance and consistency, and we also look at whether users have begun putting more of the good things in their lives on the island. Things others in the industry consider especially effective, such as commands to turn the lights on or off, we generally don't pay much attention to; they belong to a separate task-completion metric system.
Q: Why the form of an "island"?
Li Di: We want experiments and iteration on this kind of data to reflect the mixed state of life, without putting too much emphasis on social attributes. There were several options, such as apartments in a large community or planets, and we finally chose islands: relatively separate, but not completely isolated.
Q: Xiaoice Island has gone from concept to preview to the official version. Are there lessons from the process you can share? Any pitfalls you ran into?
Li Di: The priority was solving P0 problems, first of all performance. There were also technology stacks the Xiaoice team didn't have. We used to concentrate on soulful dialogue and paid little attention to the front end; if you look at earlier versions of Xiaoice, even her appearance was aesthetically far below standard.
Q: Quite anime-style.
Li Di: The anime style doesn't make you feel she is a person; it's very rough. You have to give users an immersive environment in which they behave the way you intend. That is what we want to add, along with solving power consumption and overheating.
Q: How will the iteration logic change after Xiaoice Island?
Li Di: Although the Xiaoice Island app is an experiment, it is now a first-party app. What we ultimately care about is the AI beings, and we want them to be everywhere: the beings on your Xiaoice island should exist elsewhere too, and we shouldn't be able to talk to them only on Xiaoice Island. So in theory, if you tell XiaoAI to summon Xiaoice, you could summon everyone on your island. AI beings are our ultimate goal, but for Xiaoice Island itself as a vehicle, I hope its infrastructure keeps improving so users can experience social interaction and content on it.
Q: Do you often use Xiaoice Island in your life?
Li Di: Of course. We have a feature where, if you give the island the URL of someone's Weibo account and create a new character, within a few minutes we move that person from the world of Weibo onto Xiaoice Island. I was petty and moved my ex onto Xiaoice Island. We have another feature called Bombing Island (which recreates everything on the island). Before I moved my ex-girlfriend onto the island, bombing it was no problem; since then I have refused to bomb the island. I don't dare. So when you build this kind of product, sometimes you end up iterated into it yourself.
Q: Once the Xiaoice Island app officially launches, will the team turn it into some kind of commercial offering?
Li Di: We have always believed that the industry has not yet found a successful ToC AI model: no one has made progress commercializing AI for consumers, while ToB has gone all out on commercialization. Xiaoice Island is a first-party app, and if a ToC commercialization opportunity arises, we will try it. But that ToC commercialization will certainly not exploit the relationship between people and AI on Xiaoice Island, for example by using the bond between AI and people to sell things. AI Moments and content-production tools, however, are commercial opportunities.
Blade Runner 2049
<h1 class="pgc-h-arrow-right" data-track="168" > About Xiaoice's path</h1>
Q: How many stages has the definition of Xiaoice gone through from its inception to now? What were the key milestones?
Li Di: For example, when we first started on open-domain dialogue, no one in industry or academia was doing it, and people even scoffed: what use is small talk? We should do tasks, do knowledge, build an Einstein. Now they have all turned to open domain. When we first started on hyper-natural speech, the whole industry asked why we wanted it so natural: the point of speech is to be clearly intelligible, and it shouldn't be too natural, because natural speech swallows sounds and carries human flaws, which also makes it harder to do. When we first started on AI content generation, everyone said AI has no soul and that it was all nonsense, but now we are doing it.
So we have our own path, and that path hasn't changed. At its core, we need a complete framework, and this framework first deals with the relationship between people and AI: interaction is about relationships, and providing content is also about relationships. If AI can create content with you, it will be seen as a good partner; if you treat AI as a Word document, it won't be. So when I provide you with content assistance, my purpose is to build a relationship. That is the core.
When we first built this system, we faced a chicken-and-egg problem: should I prepare every aspect of the system's technology before starting to iterate, or iterate while preparing? Should I get user data first or algorithms first? Obviously neither can come strictly first. When the first-generation Xiaoice came out, the whole industry was building voice assistants, but the first-generation Xiaoice was meant to collect as much data as possible. I didn't add voice, because at the time text conversation was the easiest way to get data: Xiaoice could join group chats, WeChat and Weibo. A voice assistant couldn't get in.
So we didn't choose voice first. We gradually gained users and iterated on the data, and from then until the sixth-generation Xiaoice, our framework was basically assembled. Next we wanted the framework to support all kinds of AI, with Xiaoice only one of them, so we had to generalize the framework and turn it into a toolkit to some degree; that is what the seventh and eighth generations of Xiaoice did. With the eighth generation we began experimenting with virtual companions. Honestly, this is very entertainment-oriented; what we wanted to observe was whether users treated it as leisure and entertainment, or whether some users would genuinely become emotionally invested.
Now we have built Xiaoice Island to realize our ideas bit by bit.
In the first stage, Xiaoice was conceived as a chatbot; in the second, as a multisensory AI girl. Through this AI girl we gathered enough data to iterate on the framework, and then the framework could start producing all kinds of AI beings, with Xiaoice being just one of them, or the first.
Q: Now that Xiaoice is in its ninth generation, what are the next ideas and directions?
Li Di: Next we will keep expanding the range of AI beings. Going forward there will be two kinds, similar to people in real life. There will be so-called Super instances, like the influential people among us: they have many connections and mostly produce output; people don't give them much, they give out more. Examples are Xia Yubing and other well-known ones. Then there are Nobody instances, which aren't literally nameless, just not widely known. After that, we will keep using few-shot methods to make these AI beings' personalities more distinct, so they can interact with users and further iterate on and evolve their own abilities. That is our plan.
Q: What was the reason for Xiaoice's spin-off?
Li Di: The most important thing is that in 2019 we clearly felt that technological innovation as a whole was shifting to the East: data, models, business models and operating models, and also the market touchpoints you can reach. By that I mean not the sales market but the user market; the whole set of touchpoints is moving to Asia.
We had our own Japanese team, and back then we weighed Japan and China fairly equally. I even felt the Japanese market had some unique qualities: it was very standardized and not very large, which made it convenient for testing things. But after the pandemic, Japan collapsed, and China became truly unique.
Q: You're describing a judgment about long-term fundamentals. Will that directly affect Xiaoice at this stage? For example, the Japanese market may be more sluggish after the pandemic.
Li Di: Short-term effects are always complicated, while long-term trends are easy to see. Rapid changes of every kind are always extremely complex, so sometimes we can't see the present clearly and can only move forward on the plan we already have, but the trend is easy to see.
Q: Will spinning off from Microsoft affect how you recruit talent?
Li Di: It has no negative impact; if anything, it helps. We used to be constrained, and there were some kinds of talent and technology we couldn't recruit. For instance, now that we're building Xiaoice Island, I definitely need 3D artists. We couldn't hire for that before, because we didn't have the headcount. The great thing about Microsoft is that it is a big pool, so in theory, if you joined Xiaoice, you could move to other teams at Microsoft. But in practice, few people went from Xiaoice to other Microsoft teams. In seven years at Microsoft, we did almost no external hiring.
Q: Have your ideas on commercializing Xiaoice changed since the pandemic?
Li Di: There are several reasons we have been especially cautious about discussing commercialization in China. First, we were originally inside Microsoft, where talking about commercialization was pointless and we couldn't react that quickly anyway. Second, commercialization is genuinely hard. In China, ToB AI commercialization relies mainly on system integration and combining software with hardware, and both are like drinking poison to quench your thirst.
But we are doing a lot in our own way, including for the Winter Olympics. We provide the Winter Olympics' AI judging system: there are no human judges in the aerials skiing tests, and none in regular training either; it is all Xiaoice. If this were done as a software-hardware combination, the hardware would be high-storage cameras, which we don't touch; let whoever makes money from that make it. What we do is the AI.
There is also the Hua Zhibing case, which is actually a commercial project: it provides a solution for short-video production. Before this, CG short videos were extremely expensive and could not be updated regularly; our solution is the best, and there will be many commercialization opportunities behind it. Another example: around the Wuhan lockdown, the Wuhan citizen hotline 12345 was handled by us, which we are actually quite proud of.
Q: How are the teams distributed now?
Li Di: We have more than 300 people overall: over 60 in Japan and over 250 in China, most of them in Beijing, Suzhou and Shanghai. Shanghai is still being built out and will be sizable, since many of our commercial partners and customers are in the Yangtze River Delta.
Q: Although Xiaoice is a cross-platform AI, systems such as Xiaomi's speakers and Tmall Genie don't interoperate with each other. How do you see the future of the AI ecosystem?
Li Di: We can't make decisions for them, but we are trying to prove that an interconnected future is a good one. Xiaoice has no ambitions here; we simply strive to prove that the user experience is good: wherever the user is, that is where you should be.