Digital humans, AI drawing... How will AI-generated content change?

Author: Observer.com

In recent years, ChatGPT has set off a wave of enthusiasm for artificial intelligence (AI), and the concepts of generative AI and AI-generated content (AIGC) have quickly come into public view. As the technology advances rapidly, AIGC has, almost imperceptibly, worked its way into many corners of daily life.

From July 6 to 8, the 6th World Artificial Intelligence Conference (WAIC), themed "Intelligent Connected World, Generating the Future," was held in Shanghai. The exhibition covered four major sections, namely core technology, intelligent terminals, application empowerment, and frontier technology, spanning large models, chips, robots, intelligent driving and other fields, and featured more than 400 exhibitors, more than 50 outstanding start-ups, and more than 30 products making their debut.

More than 30 large models from Huawei, Alibaba, Baidu, iFLYTEK and other vendors were on display at the World Expo Exhibition and Convention Center, and booths featuring AIGC projects such as image generation, video generation, and digital humans drew large crowds of visitors.

The rapid development of AI has also prompted people to ask: what are the prospects for large models and AIGC? How will they transform everyday life? And what thresholds and risks does generative AI face? At the "AIGC New Wave in the Era of Large Models" forum held on July 7, experts from institutions, enterprises, and universities shared their views on these questions.

World Artificial Intelligence Conference 2023

More than 30 large models compete on the same stage

This year's WAIC put large models and AIGC front and center. More than 30 large models were shown on site, including the Huawei Cloud Pangu model, the iFLYTEK Xinghuo (Spark) cognitive model, SenseTime's Chinese large language model, Alibaba Cloud's Tongyi Qianwen, and Baidu's Wenxin Yiyan, and the products launched by the major vendors spanned computing-power infrastructure, model as a service (MaaS), general-purpose large models, and vertical-application large models.

One of the showpieces of this year's conference was Huawei's Ascend AI "large model super factory," which covers the entire workflow of data and model preparation, compute provisioning and model training, and model deployment and integration.

So far, more than 20 domestic large models have been incubated on Ascend AI, including Pangu, the industry's first Chinese NLP large model with 200 billion parameters; Zidong Taichu, the industry's first multimodal large model; and the Huawei Cloud Pangu series. Ascend AI has also been adapted to and supports dozens of mainstream open-source models such as ChatGLM, LLaMA, GPT-3, and BLOOM.

On July 7, Huawei Cloud CEO Zhang Ping'an announced the official release of the Pangu Model 3.0 at the Huawei Developer Conference 2023. According to him, it is a family of large models built entirely for industry: "The Pangu model does not write poetry, nor does it have time to write poetry, because its job is to go deep into every industry and let AI create value for all walks of life."

SenseTime's "SenseNova" large model system covers text generation, image generation, digital humans, and other areas. Within the system, SenseChat is a language model at the hundred-billion-parameter scale with leading overall capabilities in semantic understanding, multi-turn dialogue, knowledge mastery, and logical reasoning. SenseChat 2.0 already serves customers in healthcare, finance, mobile devices, and code development.

As an advocate of "model as a service," Alibaba Cloud presented its Tongyi large model, which lets enterprises fine-tune and train models, build on an open model platform, and use one-stop model services. Alibaba Cloud also showed the three-tier architecture of its cloud computing stack: from bottom to top, infrastructure as a service (IaaS), platform as a service (PaaS), and model as a service (MaaS).

In addition, a number of large models aimed at vertical domains were unveiled at this year's WAIC. The "Cao Zhi" model released by Daguan Data targets finance, government affairs, and other industries, and is characterized by long-text handling, verticalization, and multilingual support. The "Midu Wenxiu" model released by Midu is built specifically for proofreading, and its performance on Chinese spelling and grammar correction is said to be better than that of the general-purpose ChatGPT.

Image generation, music creation, digital humans... AIGC touches every aspect of life

Powered by the rapid progress of large models, AIGC technology was a highlight of this year's WAIC. Generative AI applications such as image generation, digital-human livestreaming, text-to-presentation generation, and automatic video editing were launched alongside the large models themselves, and the relevant booths were packed with visitors.

SenseTime's booth showcased images generated with SenseMirage (Chinese name "Miaohua"), a creation platform that combines SenseTime's self-developed AIGC model with convenient LoRA training capabilities and also offers third-party open-source community models with accelerated inference. According to reports, the self-developed model behind SenseMirage 3.0 has grown to 7 billion parameters, with stronger Chinese-language understanding and more diverse style options.
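
SenseMirage itself and its training tools are proprietary, but the LoRA workflow described above can be illustrated with open-source tooling. Below is a minimal, illustrative sketch assuming the Hugging Face diffusers library, a public Stable Diffusion checkpoint, and a hypothetical local LoRA adapter; none of these stand for SenseTime's actual models.

```python
# Illustrative sketch: applying a LoRA style adapter to an open-source
# text-to-image pipeline. The model ID and adapter path are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # public base model, stand-in only
    torch_dtype=torch.float16,
).to("cuda")

# A LoRA adapter stores a small set of low-rank weight updates that steer
# the base model toward a particular visual style.
pipe.load_lora_weights("./my-style-lora")  # hypothetical adapter directory

image = pipe(
    "an ink-wash landscape of mountains and rivers",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("styled_output.png")
```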

Fengyuan Technology displayed "Fengyuan Yaotu," a text-to-image MaaS platform service. Backed by the computing power of the company's "Qisi" series of chips, it aims to provide efficient, easy-to-use, safe, and reliable text-to-image services for the AIGC era, and its integrated hardware-software design is intended to reduce the engineering difficulty and compute cost of large-scale AIGC applications.

NetEase Fuxi launched its self-developed image generation model "Danqing" and the creative assistant "Danqingyo" to bring AI into the production and creation of corporate art assets, and said it will launch the Spirit Art Platform. According to reports, "Danqing" was trained on native Chinese corpus data and NetEase's own high-quality image data, making it a fully domestically developed large model.

WPS AI, Kingsoft Office's artificial intelligence application powered by large language models, has been integrated into WPS components including documents, presentations, spreadsheets, PDF, smart documents, and smart sheets. Kingsoft Office said it is the first ChatGPT-style application in the domestic collaborative office market and that its future development will anchor on three strategic directions: AIGC, human-computer interaction, and knowledge reuse.

In music creation, Tencent Multimedia Lab launched XMusic, a general-purpose generative composition framework based on AIGC technology. It accepts multimodal content such as videos, pictures, text, tags, and humming as input prompts and generates music with controllable mood, genre, and rhythm, with broad application prospects in video soundtracks, interactive entertainment, assisted creation, and music education.

Tencent also exhibited its exploration of generative AI in video games, scientific research, real-time translation, and other fields. For example, the AI star-exploration program demonstrated by Tencent YouTu uses AI and Tencent Cloud's computing power to assist the "China Sky Eye" FAST radio telescope, and has helped discover dozens of pulsars in a relatively short time.

In the smartphone era, AI technology is also being paired with mobile devices. Qualcomm demonstrated an on-device generative AI use case, running a Stable Diffusion model with more than 1 billion parameters on an Android phone powered by the Snapdragon 8 Gen 2 mobile platform and completing 20-step inference in 15 seconds.
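
Qualcomm's on-device stack is proprietary, but the "20-step inference" figure refers to the number of denoising steps the diffusion sampler runs. A rough desktop-GPU sketch of the same setting with the open-source diffusers library follows, shown only to indicate where the step count enters; it is not Qualcomm's implementation.

```python
# Illustrative sketch: Stable Diffusion (~1B parameters) sampled in 20 steps.
# This runs on a desktop GPU; Qualcomm's demo used a quantized on-device stack.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Fast multistep solvers make low step counts (around 20) practical.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "a photo of a mountain lake at sunrise",
    num_inference_steps=20,  # the "20-step inference" cited in the demo
    guidance_scale=7.5,
).images[0]
image.save("demo.png")
```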

"Big models don't happen overnight"

The AI boom sweeping the world has prompted people to ask what generative AI will bring to production and what prospects AIGC holds. At the "AIGC New Wave in the Era of Large Models" forum held on the morning of July 7, experts from institutions, enterprises, and universities shared their views.

He Xiaodong, president of the JD Exploration Research Institute and head of JD Technology's intelligent service and product department, said that generative AI has made great progress this year in text, code, image, and video generation, driving changes in productivity. He believes large models create an opportunity to popularize AI, shifting from traditional AI models customized for a single scenario and application to general-purpose large models that can serve many scenarios.

"On the one hand, the cost of the model itself is increased, because we all know that the large model itself requires a lot of computing power, a lot of data, and a large, strong comprehensive team. On the other hand, its deployment cost is actually greatly reduced, because one model can be deployed to more places. "It's like we're entering the industrial age, where tools are more expensive, but the efficiency of producing products has improved." ”

He Xiaodong said he hopes AIGC can enter more creative fields, such as image and painting generation. "JD.com has built Yanxi, a one-stop artificial intelligence application platform that integrates a large number of AI technologies spanning perception, cognition, and generation, so that we can combine rich application-oriented products at every level to serve all walks of life, and all industries can genuinely benefit from this progress in AI technology."

He Xiaodong, President of JD Exploration Research Institute and President of JD Technology Intelligent Service and Product Department, delivered a speech

Mei Tao, founder of HiDream.ai and a foreign academician of the Canadian Academy of Engineering, believes multimodal AIGC faces three main challenges. The first is tokenization: is there a better representation that can encode text, vision, speech, and other information together? The second is decoding: the transformer architecture that serves large language models so well is not yet used as effectively for images and videos. The third is alignment: can the cross-modal correlations between different modalities be properly aligned?
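
The "tokenization" challenge concerns turning non-text modalities into token sequences that a transformer can consume. A minimal sketch of the standard ViT-style approach for images follows, shown only as a common baseline rather than Mei Tao's or HiDream.ai's method; the patch size and dimensions are illustrative.

```python
# Illustrative sketch: ViT-style image tokenization.
# The image is cut into fixed-size patches, each flattened and linearly
# projected into the same embedding space a transformer uses for text tokens.
import torch
import torch.nn as nn

patch_size, embed_dim = 16, 768
# Conv2d with stride == kernel_size performs non-overlapping patch extraction
# and the linear projection in a single step.
to_tokens = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)

image = torch.randn(1, 3, 224, 224)         # one RGB image
tokens = to_tokens(image)                   # (1, 768, 14, 14)
tokens = tokens.flatten(2).transpose(1, 2)  # (1, 196, 768): 196 "visual words"

print(tokens.shape)  # torch.Size([1, 196, 768])
```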

Mei Tao pointed out that the capability boundary of today's mainstream visual models sits at roughly the billion-parameter scale, and that in image generation, problems such as face details, finger details, and object details remain unsolved, so there is still a lot of work to do.

"One of the things we want to do is ask ourselves if it's possible to take the current visual multimodal base model from the GPT-2.0 era to the GPT-3.0 era." Of course, this is also one of our original intentions at HiDream. Mei Tao said.

Mei Tao, founder of HiDream.ai and foreign academician of the Canadian Academy of Engineering, delivered a speech

Shang Mingdong, co-founder of Jiuzhang Yunji (DataCanvas), spoke about the transformation of AI infrastructure. He said that large models are not built overnight: they require a complete infrastructure upgrade, and no single large model can solve every problem on its own. That infrastructure, he noted, spans computing power, data, and software.

Shang Mingdong mentioned that the US startup CoreWeave recently used 3,584 H100 chips to complete a GPT-3 training run in just 11 minutes, at an overall cost of roughly $20,000, whereas training a GPT-3 model cost about $4.5 million in 2020 and still about $450,000 in 2022. "As computing power and the underlying software evolve in parallel, the cost of compute keeps falling, and it will fall faster than model sizes grow. So computing power will not be the bottleneck for large models in the future."
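
As a back-of-the-envelope check on the quoted figures (the GPU count, run time, and costs are taken from the quote above; only the arithmetic is added here):

```python
# Back-of-the-envelope arithmetic on the quoted CoreWeave / GPT-3 figures.
gpus = 3584            # H100 chips (as quoted)
minutes = 11           # wall-clock training time (as quoted)
cost_2023 = 20_000     # USD, approximate (as quoted)
cost_2020 = 4_500_000  # USD (as quoted)
cost_2022 = 450_000    # USD (as quoted)

gpu_hours = gpus * minutes / 60
print(f"GPU-hours consumed: {gpu_hours:.0f}")                      # ~657
print(f"Implied rate: ${cost_2023 / gpu_hours:.0f} per GPU-hour")  # ~$30
print(f"Cost drop 2020 -> 2023: {cost_2020 / cost_2023:.0f}x")     # 225x
print(f"Cost drop 2022 -> 2023: {cost_2022 / cost_2023:.1f}x")     # 22.5x
```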

He pointed out that training higher-quality, more efficient models requires better data. "We also know that data has boundaries: given data privacy, data security, and other constraints, it is difficult to let data flow freely as a common resource. That is why we need to build vertical-domain large models, combining computing power, data, and foundational software to empower thousands of industries."

Shang Mingdong, co-founder of Jiuzhang Yunji (DataCanvas), delivered a speech

On foundational software, Shang Mingdong believes its core value lies in two things. The first is efficient scheduling: managing data and computing power so that building an otherwise complex large model becomes easier. The second is raising training efficiency through sound engineering, modularization, and automation. Greater efficiency in the foundational software translates directly into compute savings and lower costs.

"Therefore, the challenge of the big model in the future is that we hope that the big model can be implemented in all walks of life, so the landing in thousands of industries needs to be combined with the business of each industry and combined with the business knowledge of the industry." Shang Mingdong said.

"AI is risky, but we shouldn't choke on food"

The development of AIGC, however, is also surrounded by controversy, from the thresholds and barriers to development to the risks and safety of AI. In the roundtable session of the "AIGC New Wave in the Era of Large Models" forum, experts noted that generative AI may face many challenges and risks, but we cannot "give up eating for fear of choking"; solutions must be found through long-term development.

Qiao Yu, assistant director of the Shanghai Artificial Intelligence Laboratory, believes large models still have many problems, such as what is commonly called "hallucination," value alignment, and efficiency. But society needs to view AI from the perspective of development: security and development are related like "one body with two wings," and since China's large models are still in a catch-up phase, security should be considered from the standpoint of development.

On the values a large model may exhibit, Qiao Yu said those values come from the training data: "Can we address this at the level of the training data? When a model is used in a professional field, there are safety requirements specific to that field, and some of them can be inherited by the model. So we need to think about safety across the different stages of large model development."

Qiao Yu stressed that large model safety is not a problem faced by any one research group, any one industry, or China alone; it is a problem faced by the whole world and by all of humanity. "So I think in the field of safety we should carry out more international exchange and cooperation, and face and solve it together."

"Focus on the Big Model Era AIGC New Wave Forum" roundtable discussion session

Wang Liwei, an assistant professor at the Chinese University of Hong Kong, spoke from the perspective of talent and research. He believes that, given the pace at which talent is being trained, a large base of excellent researchers can lower the threshold for future R&D, and the outlook for the compute cost of training large models is likewise more optimistic. "Whether in the short, medium, or long term, and whether in terms of talent reserves or computing power, I think the R&D threshold for large models may gradually come down."

Wang Liwei said researchers need to pay attention to how the capabilities of large models are understood and evaluated: "If we keep measuring a large model's ability with a single evaluation method, the result will inevitably be somewhat one-sided." He believes it is healthy for academia to explore more in the directions of safety and AI governance.

Zhang Tianyi, deputy general manager of Ant Group's machine intelligence department and director of the Ant Security Skybasket Lab, said the risk problems posed by large models are not necessarily new, but deeper applications may have broader impacts, spanning generated-content safety, technical security, privacy, compliance, and ethics.

Zhang Tianyi sees three kinds of risk in today's large models. The first is technical: the model itself may be attacked, jailbroken, or hijacked. The second is industrial: whether AI will bring problems such as monopoly and labor displacement. The third is content: whether the model will serve users unsafe material.

He said there is no "panacea" for the risks of large models; it is bound to be a long-term process of confrontation and gamesmanship. "A very direct application in the security industry right now is that we also use large models to counter the risks of large models, a direction of fighting magic with magic."

Xiao Rong, vice president of Yuntian Lifei and general manager of its AI technology platform, summarized four problems facing generative AI: "hallucination" in generated content, weak ability to use tools and integrate external knowledge, limited logical reasoning, and the lack of continuous learning.

On safety, Xiao Rong believes a large model does in effect have "values," and what it presents as knowledge is not necessarily fact. "There are really two ways to tackle this. The first is to ask why it produces something wrong: perhaps it learned something wrong, so we need to systematically manage the corpus to ensure it is safe and controllable. The second is something we are pushing hard on, namely governance of the model's output."

But he also stressed that AI is a tool. "The more powerful a tool is, the greater the impact when it is used to do evil," Xiao Rong said. "Whether a tool is used well is ultimately a question about people. We should not give up eating for fear of choking, refusing to use a powerful tool just because it is powerful; instead we should think harder about how to regulate it."

This article is an exclusive manuscript of the Observer Network and may not be reproduced without authorization.
