
Tencent strikes again! One sentence turns a picture into the protagonist of an anime!

Author: 51CTO

Written by | Qingzhu

Produced by | 51CTO Technology Stack (WeChat ID: blog51cto)

At the end of 2023, the AI world seems to have been taken over by text-to-video models!

In late November, the AI text-to-video tool Pika 1.0 debuted and stole the spotlight for a while; just days ago, a team from Stanford AI scientist Fei-Fei Li's lab and Google launched W.A.L.T (Window Attention Latent Transformer), an AI video generation model that set the community abuzz once again.

1. Crushing the dark horse AnimateDiff

AnimateZero is a video generation model released by Tencent's AI team. It modifies pre-trained video diffusion models to treat video generation as a zero-shot image animation problem, enabling more precise control over a video's appearance and motion.

According to reports, the model outperforms AnimateDiff and is more compatible with the existing Stable Diffusion (SD) ecosystem. Let's take a look at the videos AnimateZero generates.

AnimateZero showcases personalized videos generated on multiple T2I models.

For example, in a video generated from a picture of an anime character, the character moves smoothly, with small details such as shifting eye color and tousled hair:


In natural landscape generation, the waves on the beach, the bloom of fireworks, and the crackle of lightning all feel immersive.


AnimateZero also demonstrates dynamic control of a video by inserting text embeddings: after generating a video from an image, adding text such as "happy + smile", "angry and serious", "open mouth", or "very sad" makes the character show the corresponding emotion and action.
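As a rough illustration of the idea (the encoder and tensor shapes below are hypothetical stand-ins, not AnimateZero's actual API; real embeddings would come from the diffusion model's text encoder), appending an emotion phrase's embedding to the base prompt's embedding before denoising might look like:

```python
import numpy as np

# Stand-in for a CLIP-style text encoder: one 768-dim vector per token.
# (Hypothetical; shown only to make the concatenation step concrete.)
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal((len(text.split()), 768))

base = embed("an anime girl, masterpiece, best quality")  # appearance prompt
emotion = embed("happy smile")                            # inserted control phrase

# The concatenated sequence conditions the video denoiser, so the extra
# tokens steer the character's expression without regenerating her identity.
conditioning = np.concatenate([base, emotion], axis=0)
```

Because only the conditioning is extended, the same source image can be re-animated with different emotion phrases at no extra image-generation cost.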


Beyond generating personalized videos on existing models, how exactly does AnimateZero "crush" AnimateDiff?

According to the AnimateZero team, a common use of AnimateDiff (AD) is video editing with the help of ControlNet (CN), but that setup still suffers from domain gap issues. AnimateZero (AZ) has a distinct advantage here: it generates videos with higher subjective quality that better match the given text prompt.

The AnimateZero team also provided a video comparison. When generating "a girl swimming in lava" from the original video, AnimateDiff's output is relatively blurry and the lava is barely visible; by contrast, AnimateZero's video is clearly better in both fidelity to the text and visual quality.


What if you want to turn the black car in the original video red?


For the request to turn the original video into a little girl running on grass in a forest, AnimateDiff's output shows neither forest nor grass, just some green on the background wall and in the girl's hair, which clearly misses the requirement; AnimateZero's result is much better and fits the theme well.


2. What makes AnimateZero strong?

AnimateZero is a zero-shot image animation generator built on a video diffusion model. Traditional video diffusion models (VDMs) have the following problems:

  • Black box: the generation process is opaque
  • Inefficient and uncontrollable: getting a satisfactory result takes a lot of trial and error
  • Domain gap: the output domain is limited by the video dataset used during training

AnimateZero addresses the lack of precise control in traditional text-to-video (T2V) diffusion models by decoupling video generation into separate appearance and motion stages. With zero-shot modifications, it can also convert a T2V model into an image-to-video (I2V) model, making it a zero-shot image animation generator.

  • Decoupled: video generation is split into an appearance stage (T2I) and a motion stage (I2V)
  • Efficient and controllable: T2I generation is more controllable and efficient than T2V, so you can settle on a satisfactory image before I2V generates the video
  • Alleviated domain gap: the T2I model's domain can be fine-tuned to align with the target domain, which is more effective than fine-tuning the entire video model
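The decoupled flow described above can be sketched as follows (the function names and data structures are hypothetical stand-ins to show the control flow, not AnimateZero's actual API):

```python
# Stage 1 (T2I) fixes appearance; stage 2 (I2V) adds motion while keeping
# every frame conditioned on the fixed keyframe. Illustrative sketch only.

def generate_keyframe(prompt: str) -> dict:
    """Stage 1: text-to-image. Cheap to retry until the look is right."""
    return {"prompt": prompt, "appearance": f"<image of {prompt}>"}

def animate_keyframe(keyframe: dict, num_frames: int = 16) -> list:
    """Stage 2: image-to-video. Each frame inherits the keyframe's
    appearance, so identity stays consistent while motion is added."""
    return [{**keyframe, "frame_index": i} for i in range(num_frames)]

# Unlike a monolithic T2V call, the keyframe can be inspected (or the T2I
# model swapped for a personalized SD checkpoint) before any video compute
# is spent.
keyframe = generate_keyframe("a girl swimming in lava")
video = animate_keyframe(keyframe, num_frames=16)
```

This split is what lets a satisfactory image be locked in first; the expensive motion stage then only has to animate an appearance the user has already approved.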


Comparison of the traditional video diffusion model (a) and the AnimateZero video generation model (b).

In addition to its own innovations, what are the advantages of AnimateZero over AnimateDiff?

  • Greater consistency: AnimateZero demonstrates greater consistency between text descriptions and generated video, and between the T2I (text-to-image) domain and generated video.
  • Diverse applications: Compared to AnimateDiff, AnimateZero supports a wider range of personalized images and can perform better in different styles (such as real style, anime style).
  • Stronger animations: AnimateZero outperforms AnimateDiff in terms of animation quality and stylistic consistency, especially when dealing with complex motion and uncommon objects.

Even the most perfect model has its limitations, and the performance of AnimateZero is limited by the motion prior of its base model, AnimateDiff. For some complex motions (such as sports) or animations of uncommon objects, AnimateZero may not perform as well as it should. In addition, since AnimateZero is an improvement based on AnimateDiff, its performance and application range are limited by the base model.

3. The explosion of AI video generation models

One year ago, ChatGPT swept the world at lightning speed and transformed text creation; a year later, the text-to-video track has become the explosive trend, with players at home and abroad piling into the race one after another.

Let's take a look at foreign tech giants first:

On November 3, Runway announced an update to its AI video generation tool, Gen-2, and a week later released the Motion Brush feature for editing selected regions of a video.

On November 16, tech giant Meta launched Emu Video, a text-to-video model that first generates an image conditioned on text, then generates a video conditioned on both the text and the generated image.

Stability AI, not to be outdone, launched a video generation model called Stable Video Diffusion on November 29, offering two variants, SVD and SVD-XT.

In addition, Pika Labs, the AI startup that recently shot to fame, launched a web version of Pika 1.0, releasing a public trial link that set the market alight.

Domestically: on November 21, researchers from the Chinese Academy of Sciences and other institutions proposed GPT4Motion, a training-free text-to-video framework; on November 18, ByteDance launched the text-to-video model PixelDance, which combines text guidance with image guidance on the first and last frames to make generated video more dynamic; on December 1, Alibaba's research team proposed a new framework, Animate Anyone; and on December 5, Meitu released version 4.0 of its large AI vision model MiracleVision, focusing on design and video capabilities.

4. What is behind the rush?

So, what is behind the accelerated explosion of AI video generation technology and products?

From a technical point of view, text-to-image and text-to-video models are highly similar, and the fact that text-to-image techniques and experience carry over to text-to-video is an important reason.

Judging from market sentiment, Pika Labs, built by a four-person team led by a post-95 female founder, recently went viral; within half a year of its founding it raised $55 million at a $200 million valuation. Soon after, the A-share market staged its own "daughter makes the father rich" drama: her father's listed company hit the daily limit-up three sessions in a row after the tool took off. The money-pulling power of the text-to-video field is clearly enormous.

In addition, leading domestic companies have accumulated the necessary technology. Zhang Dafang, professor and doctoral supervisor at the School of Information Science and Engineering of Hunan University, notes that text-to-video AI models have 1 billion to 10 billion parameters, a scale leading domestic companies can already handle. Driven by faster model improvement, training-data cleaning, interface adjustment, and internal parameter optimization, text-to-video technology has gradually overcome many shortcomings and moved quickly into commercial application.

At the same time, from an application perspective, the prospects for AI-generated video are beyond doubt, with film and television, games, and advertising all important deployment scenarios. Yi Zhang, CEO and chief analyst of iiMedia Research, said: "Personalized video production is cumbersome and costly, even more so than hiring programmers. Many industries are hungry for a simple video generation tool."

According to statistics from Yuehu iAPP, from Q2 2022 to June this year, short video accounted for more than 30% of usage time across all mobile internet app categories, the highest of any category. To some extent, this demand points to a huge untapped "reservoir" in the video production field.

Objectively, although major vendors and enterprises are competing to invest, related applications are still iterating and commercializing slowly; large companies and startup teams are evenly matched, and the potential of text-to-video applications is far from fully tapped. Finding the right balance between generation time, quality, and cost will take continued practice from each player.

Reference Links:

https://vvictoryuki.github.io/animatezero.github.io/

https://www.chinaz.com/2023/1212/1582268.shtml

https://baijiahao.baidu.com/s?id=1785065486791669561&wfr=spider&for=pc

Source: 51CTO Technology Stack
