Phantom Future, Global Chasing Light (15)丨Will Androids Dream of Electric Sheep? What Will Digital Humans Evolve Into? An Interview with SenseTime

Author: Red Star News

Do androids dream of electric sheep? That is the title of a science fiction masterpiece by "science fiction genius" Philip K. Dick, and it is also humanity's inquiry into, and imagination of, artificial intelligence.

From October 18 to 22, 2023, the 81st World Science Fiction Convention will be held in Chengdu. On the eve of the convention, Red Star News and Daily Economic News jointly launched "Phantom Future, Global Chasing Light," a large-scale series of media interviews pursuing the light of science, technology, and dreams shared by different civilizations as science fiction turns into reality.

SenseTime, a Chinese AI software company, will also take part in the World Science Fiction Convention. Its SenseTime Ronin application platform is built around digital human video generation and offers a range of AI generation capabilities, including text generation, speech generation, motion generation, image generation, and NeRF. Red Star News recently interviewed Luan Qing, General Manager of the Digital Entertainment Division of SenseTime's Digital Space Business Group, to discuss the present and future of artificial intelligence.

Luan Qing. Photo courtesy of the interviewee

Rising to a philosophical level,

it's hard to say whether robots will become self-aware

Red Star News Reporter: Will AI digital humans dream of electric sheep?

Luan Qing: This question is quite science-fictional. As I understand it, today's large models, and the whole series of AI technologies that simulate the human brain, are generally believed not to have produced self-awareness yet; they are the aggregation and deduction of data rather than any form of self-awareness.

Rising to a philosophical level, what is self-awareness? In essence, it is what the brain's structure deduces after processing information. From that perspective, it is hard to say whether robots will develop self-awareness. The physical structure of artificial intelligence simulates the brain, and its electrical signals may one day operate in a similar way, so we cannot say this will never happen. But for now, AI exists to serve human purposes.

Red Star News Reporter: Digital humans, virtual humans, bionic humans: what are the technologies behind these names?

Luan Qing: Digital human technology covers several aspects. One is human-computer interaction: the digital human speaks, moves, and expresses itself the way a human does, simulating the perception and experience of interacting with a person. This involves two main technologies: producing humanoid video, and using AI to generate a human voice.
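
To make that division of labor concrete, here is a minimal Python sketch of such a two-stage pipeline, one stand-in for the voice model and one for the video model. Every function and identifier here (synthesize_speech, render_talking_head, and so on) is a hypothetical placeholder for illustration, not SenseTime's actual API.

```python
# Minimal sketch of the two technologies described above: one model turns
# text into a human voice, another turns that voice plus a reference
# likeness into humanoid video. All names are hypothetical placeholders.
from dataclasses import dataclass


@dataclass
class DigitalHumanClip:
    audio_path: str  # the AI-generated speech track
    video_path: str  # the lip-synced humanoid video


def synthesize_speech(text: str, voice_id: str) -> str:
    """Stand-in for a real speech-generation model."""
    return f"/tmp/{voice_id}.wav"


def render_talking_head(audio_path: str, avatar_id: str) -> str:
    """Stand-in for a real humanoid-video-generation model."""
    return f"/tmp/{avatar_id}.mp4"


def generate_clip(text: str, voice_id: str, avatar_id: str) -> DigitalHumanClip:
    audio = synthesize_speech(text, voice_id)      # step 1: human voice
    video = render_talking_head(audio, avatar_id)  # step 2: humanoid video
    return DigitalHumanClip(audio_path=audio, video_path=video)


clip = generate_clip("Welcome to the live stream!", "anchor_voice", "anchor_face")
print(clip.video_path)
```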

Beyond human-computer interaction, the other technology is the simulated brain, which will attract more and more attention in the future. Beyond being anthropomorphic, the digital human's brain is very powerful, with more computing power than an ordinary human brain. It can naturally perceive people's feelings, process and compute information, give the best response, and even provide emotional value.

Red Star News Reporter: SenseTime divides digital humans into five levels, L1 to L5, and collectively refers to L4 and L5 digital humans as "AI digital humans." What is the most complex interaction that SenseTime's digital humans can complete today, and what is the technical difficulty behind it?

Luan Qing: At present, the most common use of digital humans is as the interface module for human-computer interaction: generating videos, hosting live streams, and presenting information and content in a humanized way.

With the breakthrough of large models, we have now arrived at the "assisted driving" stage. Because the content generated by a large model still needs to be reviewed and adjusted by a person, it is "assisted driving" rather than "automatic driving." That puts it between L3 and L4: the model produces complete content, but the content still needs fixing. Short video and live streaming, the most common use today, sits between L3 and L4 and is the largest application.
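
A rough illustration of where that mandatory human step sits in an L3-L4 workflow; both functions are hypothetical stand-ins, not any real product's pipeline.

```python
# Rough sketch of the L3-L4 "assisted driving" workflow described above:
# the model produces complete content, but a person still reviews and
# fixes it before release. Both steps are hypothetical stand-ins.
def model_draft(prompt: str) -> str:
    return f"[draft script for: {prompt}]"  # stand-in for a large-model call


def human_review(draft: str) -> str:
    # The step that separates L3-L4 from true "automatic driving":
    # a person checks the draft and corrects anything wrong.
    return draft.replace("[draft", "[approved")


script = human_review(model_draft("short video for a signature set meal"))
print(script)  # [approved script for: short video for a signature set meal]
```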

Another application, in customer service scenarios, is closer to the L4 stage, reaching information-level interaction. For example, open the ICBC app and switch to digital human mode, and every service can be handled by interacting directly with the digital human customer service inside the app. The experience of that scenario is L4, but there is still a gap in intelligence, so for digital humans to truly reach L4, and eventually develop toward L5, technological breakthroughs are needed.

Granted, today's large models are much more capable than before: they used to be clumsy, and now they are very smart. But emotional interaction, providing emotional value, is still weak; the communication is not yet so natural that it cannot be distinguished from a human's.

There are three things this technological breakthrough requires. The first is that digital humans need to be integrated more deeply with specific industries. The knowledge, habits, and technical information of an industry domain need specialized large models to be understood.

Beyond opening up data, the second step is opening up interfaces. For example, even if the system understands what needs to be done, can it actually do it? Take applying for a credit card: if there is no interface connected to the bank's card-issuing system, you cannot get a physical credit card. That requires the interface to be opened.
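
A minimal sketch of what "opening the interface" means in practice: the model can understand a request, but it can only complete the business operation once a backend interface is actually registered. The intent name and bank handler below are invented for illustration.

```python
# Sketch of the "open the interface" point: understanding a request is not
# enough; the digital human also needs a connected backend interface to act.
from typing import Callable, Dict

ACTIONS: Dict[str, Callable[[dict], str]] = {}  # connected interfaces


def register_action(name: str, handler: Callable[[dict], str]) -> None:
    ACTIONS[name] = handler


def handle_request(intent: str, params: dict) -> str:
    if intent not in ACTIONS:
        # Understood, but no interface is connected: the card cannot be issued.
        return f"Sorry, '{intent}' is not connected to a backend interface yet."
    return ACTIONS[intent](params)


print(handle_request("apply_credit_card", {"name": "Zhang San"}))  # fails

# After the bank opens its card-issuing interface, the same request succeeds:
register_action("apply_credit_card",
                lambda p: f"Credit card application submitted for {p['name']}.")
print(handle_request("apply_credit_card", {"name": "Zhang San"}))
```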

With those two things done, there is still more to consider. For example, digital humans can now give medical advice, but they cannot actually prescribe medicine; in terms of authority and responsibility, that logically cannot be allowed. In some industries, digital humans can only give advice; they cannot practice.

The industry has now reached the level of hundreds of billions of parameters, and GPT-4 may have reached the trillion level, at which digital humans can interact more naturally in terms of emotional value. How this stage is to be reached is not yet clear, whether by modifying the network structure or by increasing computing power and the number of network nodes; that is the core breakthrough point still being studied.

Red Star News Reporter: These hundreds of billions or trillions of parameters, do they refer to the density of the data?

Luan Qing: It is the number of nodes in the model, which can be thought of as the neurons of a simulated brain; the human brain should be at the trillion level. So in theory, today's GPT-4 has reached the parameter scale of the human brain. But in terms of intelligence, there is still a gap with the human brain.
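
For a concrete sense of what that node count is, the toy PyTorch snippet below counts a model's trainable parameters in the standard way; the tiny layer sizes are purely illustrative.

```python
# The parameter counts discussed here are the model's trainable weights,
# loosely analogous to the brain's connections. A toy network, counted
# the standard way; the layer sizes are purely illustrative.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1024, 4096),  # 1024*4096 weights + 4096 biases
    nn.ReLU(),
    nn.Linear(4096, 1024),  # 4096*1024 weights + 1024 biases
)

total = sum(p.numel() for p in model.parameters())
print(f"{total:,} trainable parameters")  # ~8.4 million here; frontier
# large models have hundreds of billions to (reportedly) trillions
```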

After the large model breakthrough,

a dozen seconds of footage can make a digital human

Red Star News Reporter: SenseTime has said that AI digital humans are mainly applied in three directions: virtual idols, virtual customer service, and super assistants.

Luan Qing: Those three scenarios were the most common applications of digital humans in previous years. Today, in fact, the biggest applications of digital humans are short videos and live-stream content generation.

Many short videos now are made with digital humans without viewers knowing it: for example, the female anchor presenting Burger King's signature set meal in a live-stream room, or short videos recruiting electricians. Professionals, too, lawyers, doctors, and teachers, use digital humans to generate content.

Red Star News Reporter: What technological upgrades have made digital humans so much more widely used?

Luan Qing: After the emergence of large models, the core value is that digital humans can be mass-produced; production has become very simple.

Four or five years ago, making a digital human required a relatively large amount of data, generally more than ten hours of video footage, which also had to cover multiple angles and actions, and the result would still look stiff once finished. At the time, many TV stations used digital human anchors in daily news reports, especially breaking news coverage, where they were very valuable. But the difficulty and cost of production meant they could not be extended to general marketing scenarios, and it was hard to achieve economies of scale.

Now, after the large model breakthrough, producing a digital human has become much easier: a digital human can be made from a dozen seconds of footage. The technology has kept improving over the past two years. Last year and the year before, it took three to five minutes of footage; this year it takes one or two minutes, or even just tens of seconds.

Red Star News Reporter: What do customers hope to see improved in the digital humans SenseTime provides to various industries?

Luan Qing: There are many demands. On the one hand, richer expressiveness; on the other, running on lighter-weight devices.

On expressiveness: can it move freely? Can it dance? Can actions that were never captured be made richer? Can the digital human be generated directly by AI, without hiring a real model, so that there is no copyright problem?

Lately people often ask whether digital humans can be made to run on any device. Many still run on high-end hardware or in the cloud, which customers find too expensive; can they run on the customer's own phone?

The technical support behind this includes chip adaptation and performance optimization. Turning technology into products means continuously applying it to more scenarios and more complex conditions. Ultimately it is a test of the complexity of AI video generation, which I think is the next hurdle artificial intelligence has to clear.

Red Star News Reporter: Imagining the future, what can we expect digital humans to evolve into within five years?

Luan Qing: Film directors now often ask me: when will digital humans be able to take a script and generate a finished film?

Some of today's so-called digital human stars are really just "face swaps": a human gives the performance, and the face is replaced against a green screen. That doesn't actually save costs; it is a gimmick. What the industry should really do is make some content entirely with AI, shortening production time and reducing the cost of trial and error.

At present, movie-grade digital humans still face great challenges. We are making preliminary attempts with some stars and have found promise in short videos and short dramas, but truly high-quality footage has not yet seen a breakthrough. Right now we are working on animated films: through artificial intelligence, live-action content can be converted into animation in a specific style, which I think is the most promising direction in the short term.

Red Star News reporters Cheng Luyang and Yu Yao

Edited by Yu Dongmei
