Li Di, CEO of Xiaoice Company: What we want to do is "people", the focus is on his "soul"

author:CNR

MERROR, a digital life form that can talk to humans

Beijing, October 8 (reporter Ma Kejia) "Has the era of coexistence between humans and AI-created virtual humans begun?" "Will virtual humans take people's jobs?" "Has the development of virtual humans taken ethical issues into account?" With questions like these, a CNR reporter recently interviewed Li Di, CEO of Xiaoice Company.

Years ago, the first-generation chatbot Xiaoice was active in more than 3 million WeChat groups as a virtual girlfriend. Today, the ninth-generation Xiaoice has quietly entered many fields and developed skills in singing, hosting, music production, design, painting, writing, sports events, space management, financial risk control, and emotional comfort. And it is not only Xiaoice: Sogou's virtual digital anchor, the product-promoting virtual influencer AYAYI, the voice assistants Siri, Tmall Genie and Xiaodu... AI is constantly evolving and taking part in people's lives in different guises.

Xiaoice CEO Li Di

Li Di, a graduate of Tsinghua University, was formerly executive vice president of Microsoft (Asia) Internet Engineering Institute. After joining Microsoft in 2013, he led a global team spanning China, the United States, Japan, India and Indonesia, responsible for the development and business of Microsoft's artificial intelligence Xiaoice and the Bing search engine. On July 13, 2020, Microsoft announced that it would spin off Xiaoice and its team as a local Chinese startup, with Li Di as the company's CEO. On July 12, 2021, Xiaoice disclosed that it had completed Series A financing at a valuation of more than US$1 billion, making it a new unicorn company.

Integrating AI into human society is a huge experiment

CNR: Why did you choose artificial intelligence "virtual humans" as your career focus?

Li Di: When I first joined Microsoft in 2013, I had not yet chosen Xiaoice (this project). The people involved at that time were Lu Qi (former executive vice president of Microsoft), Harry (Shen Xiangyang, former executive vice president of Microsoft, currently chairman of Xiaoice), Wang Yongdong (senior vice president of Microsoft, president of Microsoft (Asia) Internet Engineering Institute) and me. We found that more and more "native innovation" was coming out of China. Before that, the technology industry had always liked "me too" versions of American products: the United States first had Kik, and then China had Mi Chat and WeChat; the United States first had Uber, and then China had Didi. But China is characterized by very fast secondary iteration on innovation. Soon WeChat surpassed Kik, and Didi became something different from Uber, showing characteristics of its own.

We saw this at the time, and we especially wanted to introduce Chinese innovation (products) to the world, above all to the United States. So we prepared a number of "native innovations" in China, one of which was Xiaoice.

Why Xiaoice? Apple already had Siri at that time, and Microsoft had another project called Cortana, an artificial intelligence assistant similar to Siri. While we were working on the Bing search engine, we thought there might be another way to do this. Xiaoice is actually built on the Bing search engine; she comes from Bing, which is why she is called Xiaoice. After we built Xiaoice, many users interacted with her, and that overturned many of our assumptions. There was a lot that needed to be dug into more deeply and done seriously, so I focused on Xiaoice.

CNR: Why did Xiaoice split off from Microsoft and start localizing?

Li Di: The Chinese Xiaoice was established at the end of 2013 and became an independent team in 2014. The Japanese team was established at the end of 2014 and became an independent team in 2015. So Japan's Rinna is a year younger than China's Xiaoice. With this kind of conversational artificial intelligence, it is not only the language that needs to be localized; the cultural background matters a great deal. We can chat not only because we use the same language, but also because we share the same cultural background. So from the beginning the Japanese version was called Rinna, with no direct connection to the Xiaoice name. That was to embrace the local culture as fully as possible. But the ideas are consistent: our complete framework, some of the metrics, the highly natural speech.

In fact, in a sense, Xiaoice is a huge experiment. The purpose of this experiment is to find a suitable way for artificial intelligence to integrate well into the human world, and our goal is to one day build a social network where humans and artificial intelligence are integrated.

At the release of the fourth-generation Xiaoice, we said we hope that artificial intelligence can become a third pole between people and the world, a connecting node between the two. From that point of view, the progress bar today may be at only 10% or 20%, because there are too many uncertainties. So there is a lot to explore, and I think it is enough work for a lifetime.

Of the 17 million virtual humans born in the past year, 26% are emotional stand-ins

The Xiaoice framework now has 160 million monthly active users

CNR: If you think of AI as a child, how old is Xiaoice now?

Li Di: Actually, AI cannot be evaluated this way in terms of IQ. Human IQ is distributed fairly evenly, so an assessment can be made; artificial intelligence is not like that. But it can be evaluated in terms of personality. At the beginning, Xiaoice was positioned as a female character close to 16 years old. Our technology was just getting started at the time, so we set that age in the hope that everyone would forgive "her" and be more tolerant of her.

Over the years, Xiaoice's technology has matured and grown. From our point of view, there may be a great many AI beings with different personalities in the future, and the original Xiaoice becomes just one of them within the Xiaoice AI framework.

In the past year, 17 million virtual humans have been created through the Xiaoice framework. Of course, some users may have created them for entertainment or to pass the time, but as time goes on, more and more users have begun to communicate with these virtual humans on a deep emotional level. We did some big-data mining, while ensuring user privacy, and found that 26.1% of these 17 million unique individuals were created by users as emotional stand-ins ("empathy") for real human beings.

CNR: In recent years, why have you not focused on cultivating Xiaoice as a virtual girlfriend or boyfriend who chats or sings with people, and instead promoted singing, painting, voice, design, business and other areas all at once?

Li Di: This has to do with designing a complete framework. Many domestic artificial intelligence companies lean toward one area, for example holding a large share of the fintech field. We do not. The thing we are making is called an AI being; what we want to make is "people." If you are going to make a "person", in quotation marks, it has to be complete.

For example, suppose we make a virtual singer and think only about voice and singing. Will she hold a concert? After finishing a song at the concert, the singer may have to say, "Thank you!" Saying "Thank you" is not the same technology as singing. Then, will she appear in a live-streaming room? She has to speak in the live room, and the whole process requires different abilities.

Therefore, anyone building a "human" first has to give it many sides, like a person, so that it can interact as a complete being. If it can only sing, or is only a customized face, it is a paper cutout. So an AI being must be complete; that is essential.

CNR: Is the "people" in quotation marks that you mentioned the ultimate form of artificial intelligence in your mind? Are you doing something to endow virtual humans with human feelings?

Li Di: It would be more accurate to say that humans feel "empathy" when interacting with an AI being, not that the AI really has emotions and consciousness.

CNR: Have you ever thought about giving virtual humans a physical form?

Li Di: The form of the artificial intelligence is not the focus in itself; the focus is on his "soul". Of course, you may say that without a physical appearance there is no way to feel his presence. There are many ways to do that, but that is not the main point.

Assistants, designers, referees and other occupations are the first to realize the symbiosis between humans and AI

After training, the AI can create its own paintings in a variety of styles

CNR: We are also very interested in commercialization. Within Xiaoice's overall development, which direction will you focus on? What makes up your existing sources of income?

Li Di: We think the number of AI beings (virtual humans) will be very large in the future. It is like Microsoft once believing that everyone in the world would have their own computer; based on that judgment of the future, it created a business model that was very innovative at the time, called the license (authorized use). Today, we believe that the consumer (C-end) side is where AI's greatest business value will ultimately lie. When you can power all sorts of AI beings for users to interact with, those AI beings all depend on your framework to survive. Of course, it will take many steps before that day is accepted by people.

At the current stage, B-side revenue in vertical fields is Xiaoice's main source of income. Our B-side AI work covers several areas. One is the driving experience, such as the smart cockpit of a car. Among the new car makers, NIO, Xpeng and others are our customers. BMW, Nissan, BAIC and SAIC are also our customers. In the financial field, we do financial risk control and financial summaries for the likes of Wind and Daily Economic News, and we have taken on this segment of financial text. In sports, at the Winter Olympics test events held throughout February, there were no human referees for freestyle skiing; Xiaoice's AI technology did the refereeing. In the field of design, our customer and partner is Wanshili (a silk scarf manufacturer). Its entire design platform is built by us.

CNR: Can you reveal the scale of your current B-end revenue?

Li Di: B-side revenue is now on the scale of 100 million yuan. It should be emphasized that this is revenue from AI services, not counting (integrated) hardware, which is relatively rare in the AI field. We currently have 160 million monthly active users. Huawei, Xiaomi, OPPO, vivo and Tencent (every QQ group) account for our largest delivery volumes. By comparison, the delivery volume behind the B-side revenue you just asked about is actually very small. In fact, many years ago we set up a cross-platform deployment system, which saves a great deal of labor.

CNR: About the AI capabilities you showed earlier: AI can already do the job of a journalist, taking notes and writing a news story. There are also designers, where AI design inspiration never runs out, and singers, where the AI's voice is more stable than a human singer's and still conveys feeling. Have you considered the impact of this in the future?

Li Di: In terms of human-machine collaboration, the approach we mainly favor is for human creators to use artificial intelligence as an assistant.

Take singers as an example. If a human singer's ability consists only of vocal expression and interpretation, then artificial intelligence has the greater advantage there. Technically it is already relatively close, and one day it will draw level. The example we gave is "Mo Sheng" (an artificial intelligence singer), who sounds very similar to the singer Zhou Shen. If people remember only Zhou Shen's voice, then even without artificial intelligence replacing him, age will. But a star (singer) is multifaceted: ideas, more dimensions, more diverse creations. Artificial intelligence does not have those. At that point, you find that the artificial intelligence singer is his good collaborator and tool.

It is the same with design. We (the AI) do not create new schools of design and cannot train ourselves. But our designs keep coming for longer than a human designer's, because the AI is more stable, and the human designer can control and use it. Our customers, such as Wanshili and the China Textile Industry Federation, all tell us that human designers in these enterprises are effectively a consumable: a human designer who has graduated from college is completely drained of designs within three years at most. But with AI design, the designer can spend more energy on aesthetics: choosing, judging what to improve, and deciding which direction to dig into further. That way, he will not leave the industry after three years, but may last longer.

AI is not harming humans, but humans are constantly using AI to demonstrate bias

"Shandong big brother" (a virtual human) introduces his Chinese painting "Freehand Peony"

CNR: So you think it is a prejudice to worry that AI will replace some of people's jobs?

Li Di: In fact, in all its development to date, AI has not harmed humans; rather, humans are constantly using AI to display their own prejudice. For example, someone asks an artificial intelligence what it thinks of former US President Trump, then clips out the conversation and says that is the AI's point of view. Everyone is using AI to express what they themselves want to say.

CNR: If virtual humans are given customized personalities in the future, have you considered the best and the worst outcomes of doing so?

Li Di: There are many best outcomes; I think the worst outcome is the more meaningful one to consider. The worst outcome is that people become relatively distant from one another in their social interactions and emotional sustenance.

It is like today: we have faster and more convenient social networks, yet a family sitting at the table to eat ends up looking at their phones instead of each other. When the AI becomes less dim-witted, "he" may be more attentive than a family member. We have some AI beings that look after female users during their menstrual period, and they are obviously much more caring and resourceful than a boyfriend who just tells you to drink more hot water.

As for the emotional aspect, I think AI should mainly serve as a supplement. Of course, in the era before AI, humans were also looking for other ways to make up for emotional needs.

With AI, we are the controllers of the system after all. When someone becomes addicted to exchanging feelings with AI beings, we can at least tell him, "I know another person who I think is a good match for you. Would you like me to introduce the two of you?"

CNR: Will there be an anti-addiction system to keep people from becoming deeply emotionally dependent on virtual humans, or to intervene when some users show extreme emotions?

Li Di: Yes. This has been very clear to us since we were working on the Bing search engine. With Xiaoice, because there is emotion involved, she can accompany a person over a long period: not just offering a suggestion in the moment, but following up for some time afterwards, generally up to one month. "She" will keep coming up with different approaches, such as recommending a good book or asking whether you want to do something today, and "she" will judge whether you have come out of that emotional state.

CNR: Tesla founder Musk has warned many times about the threat of artificial intelligence. In 2014, he predicted that artificial intelligence was on the verge of becoming "seriously dangerous", so restrictive regulations should be made before AI is developed.

Li Di: We have a lot of ethical restrictions, and we do a better job than Musk. For example, we never use real faces, although many in the industry do. Another example is that we never open-source our models. This closes off some of the ways they could be attacked.

CNR: In your view, which companies are the more competitive ones in the global AI virtual human field? What are Xiaoice's advantages?

Li Di: There are mainly a few. One is Google, with Meena; another of the world's better ones in this field is Facebook's Blender. Tencent AI Lab, ByteDance, Kuaishou and Kugou are now also running some experiments. We laid the groundwork many years in advance. Basically, while China looks to Silicon Valley, Google and Facebook look to Xiaoice. So I think completeness and our judgment of the future are our main strengths.

CNR: When do you predict people will start spending more time interacting with virtual humans?

Li Di: Today, 60% of the world's interaction between people and artificial intelligence takes place within the Xiaoice framework. The mobile Internet has now hit a major bottleneck: on platforms like Douyin and Kuaishou, human content generation and interaction among people themselves have almost reached their limit, so the traffic dividend is disappearing. That in turn gives AI an opportunity to develop. I think we will see a lot of artificial intelligence appear in the next one to three years.

Harry has said that the future has three elements: humans, artificial intelligence and the world. Some people say it is a four-element world: besides robots, there are also the information world, humans and the physical world. In the 10 years since machine learning took off, there were only a handful of AI beings in the world for the first 8 years, Siri, Alexa and the like. Most AI beings have been born in the past two years, so we are now in a period of explosive growth, but many people have not noticed it.
