AI face swapping and synthetic voice see explosive growth! Tsinghua releases "Ten Trends Report on Deep Synthesis"

Zhidongxi (public account: zhidxcom)

Author | ZeR0

Editor | Desert Shadow

Zhidongxi reported on February 22 that in recent years, video face swapping, synthetic speech, image restoration, virtual digital humans and similar applications have appeared more and more frequently in social entertainment, film and television production, education, advertising and marketing, and other fields, and diversified commercial applications have developed.

▲ Huang Rong, played by Zhu Yin, with her face replaced by Yang Mi's

Behind these applications is deep synthesis technology.

Deep synthesis technology refers to technology that uses generative synthesis algorithms, represented by deep learning and virtual reality, to produce text, images, audio, video, virtual scenes and other information.

While demand continues to grow, some people use the technology maliciously to generate audio and video, for example face-swapped pornographic videos and fake face videos that defeat identity verification. Such uses not only cause reputational damage and property losses to individuals and enterprises, but also threaten social and national security.

To provide reference and guidance for the healthy development of artificial intelligence and deep synthesis technology, the Institute of Artificial Intelligence of Tsinghua University, Beijing Ruilai Intelligent Technology Co., Ltd., Tsinghua University Intellectual Media Research Center, National Industrial Information Security Development Research Center, and Beijing Big Data Center recently jointly released the "Ten Trends Report on Deep Synthesis (2022)" (hereinafter referred to as the "Report").

Covering technology research, field applications, development trends and other aspects, the "Report" comprehensively introduces and assesses the opportunities and challenges brought by deep synthesis technology and its applications, and offers practical suggestions and measures for its development and governance.

First, applications land in many fields, and deep synthesis content sees explosive growth

Deep synthesis images, videos, audio, text and other content, such as popular film and television drama clips and face-swapped videos of trending celebrities, are highly entertaining and spread easily.

▲ Deep synthesis technology used for video "face swapping"

As the technology matures, more and more creators publish and share deep synthesis content on the Internet, and the volume is growing rapidly year by year.

The "Report" shows that on mainstream audio and video websites and social media platforms at home and abroad, the number of newly released deep synthesis videos in 2021 increased more than tenfold compared with 2017.

Among them, the category with the most deep synthesis videos is film, television and music, covering movies, TV series, music and other content. The second is science and technology education, which focuses on explaining and discussing deep synthesis technology and sharing the latest research results. The third to fifth categories are life, entertainment and information.

▲ Video clip of "AI Repair Beijing City 2.0 A Hundred Years Ago"

Virtual digital humans such as "Xiao Cong", the AI-synthesized sign language anchor who covered Gu Ailing's win at the Beijing Winter Olympics, and Luo Tianyi, the virtual idol who appeared on the 2021 CCTV Spring Festival Gala, all use deep synthesis technology.

In film and television production, deep synthesis technology has become a rescue tool for works dragged down by the misconduct of scandal-hit performers. Works such as "The Twelve Hours of Chang'an" and "Glorious Times" have used this technology.

At the same time, attention to deep synthesis content has also grown exponentially. According to interaction statistics, newly released deep synthesis videos in 2021 received more than 300 million likes.

Previously, a series of deep synthesis videos such as "the Queen of England issued a Christmas message" and "Atang Ge performed hardware magic" went viral well beyond their original audiences, triggering heated discussion among platform users.

▲ British Channel 4 produced a spoof version of the Queen's Christmas message

Tian Tian, CEO of Ruilai Wisdom, said that the continued growth in research papers and the emergence of open source tools and many representative methods have made deep synthesis content more realistic and its production more efficient. In particular, algorithms such as the generative adversarial network (GAN) have brought synthetic content to the point where it is "difficult to distinguish between true and false".

Relevant data show that the number of papers in the field of deep synthesis has continued to grow since 2017. Among them, research on image and video generation accounted for the highest proportion, 64%, while audio and text accounted for 12% and 24% respectively.

A number of synthesis products aimed at the general public have also been launched, with services in the form of video, voice and text being the most common.

For example, special effects video production software that supports style customization is popular on the Internet; in the voice domain, applications such as voice navigation, audiobooks and automatic news broadcasting have emerged; and text synthesis plays an important role in news reporting, poetry creation, chat Q&A and similar uses.

Deep synthesis technology also greatly enriches the information content of the virtual digital space, providing support for new business concepts such as the "metaverse".

Xue Hui, head of Alibaba's security perception and cognitive intelligence department, said that virtual humans and digital humans are the main applications of deep synthesis and an important part of the "metaverse".

Chen Changfeng, executive vice dean of the School of Journalism and Communication at Tsinghua University, believes that deep synthesis will redefine the virtual digital space, and that from the perspective of the sociology of communication, new scenarios of human existence will develop on the basis of deep synthesis technology.

Second, risks are intensifying, and technical detection has become an important countermeasure

Deep synthesis inspires new creative content, but it also brings new threats.

In 2017, adult videos made by a user named "Deepfakes" using deep synthesis technology spread widely in the Reddit community, and under pressure from public opinion, Reddit suspended the user.

The user then released the source code implementing the technology on GitHub, the world's largest open source code platform. This attracted widespread attention and discussion among technology enthusiasts and set off a wave of new and expanded projects and code related to deep synthesis.

Statistics show that since 2017, the number of open source projects released in the field of deep synthesis has continued to grow.

Taking five representative open source projects in the image, audio and text domains as examples (covering face replacement, motion or expression manipulation, image generation, voice cloning and text generation), their number of Stars had exceeded 10,000 by 2021.

As the technology becomes widespread, criminals can easily forge audio and video to carry out illegal acts such as framing, defamation, fraud and extortion, and can even fabricate speeches by state leaders to disrupt social and political order.

For example, in April 2018, a technical team produced a face-swapped video of former US President Barack Obama, in which the fake "Obama" called then US President Trump a "complete idiot".

In October 2021, police in Hefei, Anhui Province, uncovered a case in which deep synthesis technology was illegally used to turn mobile phone users' face photos into dynamic videos to defeat identity verification, providing technical support such as registering virtual mobile phone cards for the black and gray market. In recent years, similar incidents have increasingly come to public attention.

Deep synthesis content blurs the boundary between true and false, and will have a huge impact on social trust, media trust, and political trust.

Chen Changfeng believes that the difficulty of screening out false content undermines the effectiveness of fact checking. At the time of major social or political events, deep synthesis technology may be used to manipulate public opinion, and with the help of social media, false information can spread virally in a short period and intensify social conflict.

As the risks continue to intensify, effectively identifying deep synthesis content has become the key. But as the quality of synthesis keeps improving, traditional biometric-based identification methods are finding it harder and harder to work.

In the view of Ren Kui, dean of the School of Cyberspace Security of Zhejiang University, current detection of deep synthesis relies mainly on artificial intelligence models and the completeness of training data, and issues such as low detector generality, the applicability of open data sets, and data sensitivity pose many challenges.

Wu Hequan, an academician of the Chinese Academy of Engineering, believes that there are two key points in the governance of deep synthesis. First, the technology must continue to be developed, and it cannot be banned with a "one size fits all" approach, so as to avoid hindering positive applications and innovation. Second, the resulting security problems should be solved at the source, using technological innovation, technical countermeasures and other means to continuously improve and iterate detection capabilities.

Tian Tian also said that with the emergence of new forgery methods, the increasing complexity of the online communication environment, and vulnerabilities and defects in the detection algorithms themselves, anti-deepfake detection technology faces "strong confrontation" and needs continuous updating and iteration.

The "Report" shows that academia and industry have already invested heavily in research on anti-deepfake detection, and that Meta, Google, Microsoft and other institutions have launched methods or products for authenticating deep synthesis video.

In China, Tsinghua University, the University of Science and Technology of China and other universities have achieved remarkable results in detecting deepfake content.

Enterprises such as Ruilai Wisdom (RealAI), a team incubated at Tsinghua University, and Tencent Youtu Laboratory have built face synthesis detection platforms and released targeted detection products that support the detection of a variety of face replacement methods. For example, DeepReal, a deepfake content detection platform launched by Ruilai Wisdom, offers industrial-grade detection performance and detection capabilities that cope with changes in the real network environment.

Zhu Jun, director of the Basic Theory Research Center of the Institute of Artificial Intelligence of Tsinghua University, believes that deep synthesis detection faces a continuous contest of attack and defense, and that in the future it will be necessary to integrate capabilities such as multi-modal content forensic analysis and traceability technology based on digital watermarking to achieve accurate identification.

Third, build a multi-dimensional governance mechanism to guide the healthy development of the technology

The healthy development of deep synthesis technology is inseparable from the exploration of multi-dimensional governance mechanisms.

The "Report" shows that, in addition to developing deepfake content detection technology, countries around the world have in recent years issued laws and regulations in response to the malicious use of deep synthesis technology, exploring paths for its governance.

Internationally, the United States has enacted specific legislation at the federal and state levels, and the European Union has brought deep synthesis under existing legal frameworks such as the General Data Protection Regulation (GDPR). In addition, Germany, Singapore, the United Kingdom, South Korea and other countries have laws and regulations applicable to trying crimes related to deep synthesis technology.

China is also actively exploring effective governance mechanisms.

Since November 2019, the Provisions on the Administration of Online Audio and Video Information Services, the Provisions on the Ecological Governance of Online Information Content, the Civil Code of the People's Republic of China, and the Provisions on the Recommendation and Administration of Internet Information Service Algorithms have all put forward regulatory requirements of varying degrees for the generation of synthetic content.

In January this year, the Cyberspace Administration of China (CAC) published the Provisions on the Administration of Deep Synthesis of Internet Information Services (Draft for Solicitation of Comments), which made specific provisions on the use, marking, scope of use, and penalties for abuse of deeply synthesized content.

On governance paths for deep synthesis content, Chen Changfeng believes efforts can be made in several areas, including technology, ethics and the legal system: on the technical side, through the participation and collaborative governance of government and social organizations; on the ethical side, by establishing and advocating relevant principles for deep synthesis.

Duan Weiwen, director of the Research Office of Philosophy of Science and Technology at the Institute of Philosophy of the Chinese Academy of Social Sciences, suggested that systematic and forward-looking interdisciplinary research should be carried out on the social, legal and ethical issues raised by deep synthesis technology, and that targeted governance and supervision should be applied to high-risk application scenarios that may arise.

Zeng Yi, a researcher at the Institute of Automation of the Chinese Academy of Sciences and a member of the UNESCO AI Ethics Ad Hoc Expert Group, advocates self-discipline and self-governance in industry and research. In his view, until laws and regulations are fully mature and systematic, the industry itself should strengthen its awareness of "theory first" and, as an industrial community, jointly prevent abuse and strictly prohibit malicious use.

An associate professor at the Law School of the University of International Business and Economics said that society should step up publicity and education, strengthen citizens' understanding of artificial intelligence technologies such as deep synthesis, and raise awareness of prevention across society, so that citizens act as responsible users of deep synthesis technology, take the initiative to identify synthetic content, and actively practice social supervision.

Tian Tian holds a similar view: the essence of deepfakes is a lack of transparency, so raising public awareness of deep synthesis technology is particularly important. Only when the threshold is lowered so that all audiences can recognize, discuss and understand the issue within a common framework can deep synthesis technology develop in a healthy way.

Conclusion: Deep synthesis urgently needs rules to follow

Overall, as deep synthesis technology matures, the synthesis process is becoming more efficient and the content more realistic, and a growing number of positive applications are generating rich commercial value. At the same time, given the security risks the technology still carries, deepfake detection technology needs continued research and iteration.

The "Report" suggests that regulators should plan ahead and, while protecting the healthy development of deep synthesis technology, formulate supporting regulations and management rules that address harmful uses of deep synthesis. At the same time, all parties should keep pace with the times in implementing the new normative requirements and, on that premise, continue to pursue technological breakthroughs, open up new application scenarios for deep synthesis technology, and create demonstration benchmarks, so as to drive the artificial intelligence industry as a whole and promote the sustained and healthy development of new technologies.
