
AIGC technology – building more powerful artificial intelligence

Author: Lin Likun

I. Introduction

In 2022, large language models such as ChatGPT, along with AI painting and other generative AI technologies, exploded in popularity, and artificial intelligence once again captured public attention. At the core of this wave is AIGC technology. Although it remains controversial, behind AIGC lies an enormous market that will reshape the existing landscape of many industries.


"The development of artificial intelligence", this image was made by Bing Image Creator

II. What is AIGC technology?

AIGC stands for Artificial Intelligence-Generated Content. It is a new mode of content creation following PGC (Professionally Generated Content) and UGC (User-Generated Content). AIGC relies on a variety of artificial intelligence technologies that learn patterns from existing data in order to automatically or semi-automatically generate text, code, images, speech, video, and other content. AIGC creates at an astonishing speed, can be applied in many fields such as education, media, entertainment, and scientific research, and has potential that cannot be ignored.

III. AIGC Generated Content

With AIGC technology, artificial intelligence has progressed from understanding content to generating content autonomously. The generated content can be divided into text, code, image, audio, video, and other categories.

1. In the field of text, its application is mainly in specific scenarios such as text understanding, news writing, plot continuation, and human-computer interaction.

Using AIGC technology, articles, news reports, and even poetry and dialogue can be generated quickly. For example, popular large language models such as ChatGPT and GPT-4 from OpenAI are built on AIGC technology.

2. In the field of images, AIGC can not only automatically perform basic operations such as watermark removal, lighting and shadow adjustment, and resolution adjustment, but also generate images on a specified theme, generate complete images, restore heavily blurred images, and convert image styles. However, the stability of high-quality image generation still needs improvement.

3. In audio generation, AIGC can extract features from existing audio to dub videos or cover songs, and can even generate specific songs from a melody, music genre, or mood. Audio generation technology is mature and widely used in scenarios such as voice customer service and digital broadcasting, and it is developing rapidly in music composition and related areas.

4. Video generation is similar to image generation, supporting video editing and fully autonomous video generation. It can add and delete video subjects, replace faces, synthesize virtual environments, generate special effects, and apply automatic beautification. Its applications include short video, animation, and film, where it can greatly improve production efficiency.


Machine Learning, an image made by Bing Image Creator

IV. The core technologies of AIGC

AIGC can be seen as a highly intelligent engine that quickly queries large amounts of raw data, processes it, and outputs results. Producing more accurate answers according to the user's requirements, reducing the burden on users, and creating greater economic value is what AIGC delivers. Achieving these functions requires several artificial intelligence technologies; the core ones are described below.

1. Variational Autoencoder (VAE). A variational autoencoder is a deep generative model that learns a latent representation of data and uses it to generate new data. A VAE consists of two parts: an encoder and a decoder. The encoder maps the input data to a probability distribution in latent space, and the decoder samples from the latent space to generate new data. A VAE is trained by maximizing the marginal likelihood of the input data while minimizing a KL-divergence term that constrains the latent distribution.

VAEs have high application value in data generation and speech synthesis.
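The KL-divergence term mentioned above has a closed form when the encoder outputs a diagonal Gaussian and the prior is a standard normal. The following is a minimal illustrative sketch of that term and the resulting training objective (the ELBO) in pure Python; it is not a full VAE, and the function names are the author of this sketch's own:

```python
import math

def gaussian_kl(mu, log_var):
    """KL divergence between a diagonal Gaussian N(mu, exp(log_var))
    and the standard normal prior N(0, I), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def elbo(recon_log_likelihood, mu, log_var):
    """Evidence lower bound: reconstruction term minus the KL regularizer.
    Training a VAE maximizes this quantity."""
    return recon_log_likelihood - gaussian_kl(mu, log_var)

# A latent code that already matches the prior incurs zero KL penalty.
print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

In practice the reconstruction term is computed by the decoder network; here it is passed in as a number so the trade-off between reconstruction and regularization is visible in isolation.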

2. Generative Adversarial Network (GAN). A generative adversarial network is an unsupervised deep learning method (deep learning being a branch of machine learning) in which two neural networks learn by competing against each other. A GAN consists of a generator network and a discriminator network. The generator takes random samples from latent space as input, and its output must mimic the real samples in the training set as closely as possible. The discriminator takes either a real sample or the generator's output as input, and its goal is to distinguish the two as accurately as possible, while the generator tries to fool it. The two networks compete and continually adjust their parameters, with the ultimate goal that the discriminator can no longer tell whether the generator's output is real. The method is also used to generate videos, three-dimensional object models, and more.
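The adversarial objective described above can be made concrete with the standard binary cross-entropy losses. This is a minimal sketch of the loss functions only (the networks themselves are omitted), assuming the discriminator outputs a probability in (0, 1):

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator: push its score on a
    real sample toward 1 and its score on a generated sample toward 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants the
    discriminator's score on its output to approach 1."""
    return -math.log(d_fake)

# At the theoretical equilibrium the discriminator is maximally confused
# and outputs 0.5 for every sample.
print(discriminator_loss(0.5, 0.5))  # ≈ 1.386 (= 2 ln 2)
```

The opposing signs of the two losses are the "competition": a generated sample that lowers the generator's loss necessarily raises the discriminator's, which is what drives both networks to keep adjusting their parameters.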

3. Transformer model (literally "transformer"). The original Transformer uses an encoder-decoder architecture. The encoder consists of stacked encoding layers that process the input iteratively, layer by layer, while the decoder consists of decoding layers that do the same to the encoder's output. Each encoding layer determines which parts of the input data are relevant to one another; it takes the previous layer's encoding as input and passes its own encoding to the next layer. Each decoding layer works in the opposite direction, reading the encoded information and using the integrated context to generate the output sequence. To achieve this, each encoding and decoding layer uses an attention mechanism: for each input element, attention weighs the relevance of every other element and extracts information from them to produce the output. Each decoding layer also contains an additional attention mechanism that attends to the outputs of earlier decoder positions before attending to the encoder's output. Both encoding and decoding layers include a feed-forward neural network for further processing, along with residual connections and layer normalization. Transformers are designed to process sequential input such as natural language and are applied to tasks including translation, text summarization, sentiment analysis, language modeling, and video understanding.
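The attention mechanism at the heart of each layer is scaled dot-product attention. As a minimal sketch of the idea, using plain Python lists rather than a tensor library: each query row is compared against all keys, the similarity scores are normalized into weights, and the output is the weight-averaged mix of the value rows.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention on row-lists Q, K, V.
    Each query attends to every key and returns a relevance-weighted
    combination of the corresponding values."""
    d_k = len(K[0])  # key dimension, used for scaling
    output = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)  # relevance of each input to this query
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output
```

With sharply separated queries and keys, each query's weight concentrates on its matching key, so the output essentially copies the matching value row; with a zero query, all weights are equal and the output is the average of the values. Real Transformers apply this per attention head over learned projections, but the weighting logic is the same.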

4. Large pre-trained models. Large-scale pre-training is a deep learning technique whose workflow has two steps: pre-training and fine-tuning. First, the model is pre-trained on large-scale unlabeled data to learn general language patterns; second, it is fine-tuned on the small-scale labeled data of a specific natural language processing task, quickly improving its ability on that task and finally yielding a deployable application model.
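The two-step workflow can be illustrated with a deliberately toy example. This sketch is not a real language model: "pre-training" here just gathers general-purpose statistics (word counts) from unlabeled text, and "fine-tuning" fits a small task-specific head on labeled data while reusing those statistics, mirroring the division of labor described above.

```python
def pretrain(unlabeled_corpus):
    """Toy stand-in for self-supervised pre-training: learn
    general-purpose statistics (here, word counts) from unlabeled text."""
    vocab = {}
    for sentence in unlabeled_corpus:
        for word in sentence.split():
            vocab[word] = vocab.get(word, 0) + 1
    return vocab

def fine_tune(pretrained_vocab, labeled_examples):
    """Toy fine-tuning: keep the pre-trained representation fixed and fit
    a small task-specific head (per-label word counts) on labeled data."""
    head = {}
    for text, label in labeled_examples:
        for word in text.split():
            if word in pretrained_vocab:  # reuse only pre-trained features
                head.setdefault(label, {})
                head[label][word] = head[label].get(word, 0) + 1
    return {"features": pretrained_vocab, "head": head}

# Phase 1: cheap, large-scale, unlabeled. Phase 2: small, labeled.
features = pretrain(["the cat sat", "the dog ran"])
model = fine_tune(features, [("the cat", "animal")])
```

The economics shown here are the point: the expensive, general phase runs once over unlabeled data, while the cheap, task-specific phase runs per application over a small labeled set.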


"Future City", an image made by Bing Image Creator

V. Prospects

As a new generation of artificial intelligence technology, AIGC is not a flash in the pan; it heralds the arrival of a new era of artificial intelligence. After warming up in 2022, AIGC will see rapid development in 2023: the content and forms it generates are becoming richer, and generation quality is steadily improving. It has shown great market potential in highly digitalized industries with strong demand for content. In particular, multimodal generation is driving the expansion of artificial intelligence into more fields. The AIGC industry currently shows a three-layer architecture of a foundation layer (model services), a middle layer (B2B), and an application layer (B2C), and it continues to innovate and develop. AIGC is expected to spur the vigorous growth of commercial applications, drive innovation in the digital culture industry, and advance intelligent AI and the metaverse.

