
Nankai University launches StoryDiffusion, a revolutionary image and video generation project

Author: The Mountain Monster Atu

All articles here are from the WeChat public account "Mars AIGC".

For more cutting-edge AI news, AI information, and hands-on AI tool practice, follow the WeChat public account "Mars AIGC".

Nankai University has partnered with ByteDance to release StoryDiffusion, an open-source image and video generation framework that can generate long, subject-consistent image and video sequences, a capability few AI projects currently match. With StoryDiffusion, you can generate a series of comics with consistent characters and scenes in one click, making it practically an AI tool custom-built for comic producers and content creators.


StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

At present, the main projects for consistent image generation are IP-Adapter and PhotoMaker, both of which rely on models pre-trained on large datasets and use a given reference image to directly control generation.

What sets StoryDiffusion apart is its proposed Consistent Self-Attention, which creates imagery in a variety of styles while maintaining subject consistency across multiple images, including consistent character appearance and costumes for coherent storytelling, and it is training-free and pluggable. Take a look at the image generation results.

StoryDiffusion can create stunning, consistent cartoon-style characters.


StoryDiffusion can also maintain the identities of multiple characters at once, keeping them consistent across an image sequence.


The results of generating a series of comic panels in one click:


The official examples above are impressive, and in my own tests the results matched the official demonstrations.

And image generation is not all: StoryDiffusion can also generate high-quality video with its Semantic Motion Predictor, conditioned on either the consistent images it generates or user-provided images. Many official video demos have been released; they may not be as long as Sora's, but in subject consistency within a video they already bear comparison with Sora.


Introduction to the technology

StoryDiffusion generates content-rich, consistent image or video sequences from a predefined text story through two main modules:

Consistent Self-Attention:


The module requires no training and can be inserted directly into existing image generation models.

It improves character consistency by building connections among the images within a batch during generation, effectively producing consistent faces and clothing.

This approach improves the user's ability to control the generated content through text prompts.
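
To make the mechanism concrete, here is a minimal PyTorch sketch of the idea; this is my own illustration, not the official implementation, and the shapes, sampling ratio, and projection weights are all illustrative assumptions:

import torch
import torch.nn.functional as F

def consistent_self_attention(x, num_heads=8, sample_ratio=0.5):
    """Minimal sketch of consistent self-attention over one story batch.

    x: (B, N, C), the token sequences of B images generated together.
    Tokens sampled from the other images in the batch are appended to
    each image's keys/values, letting attention share subject features
    across the batch without any extra training.
    """
    B, N, C = x.shape
    head_dim = C // num_heads
    # Stand-ins for the pretrained QKV projections of the diffusion
    # model's self-attention layer (reused unchanged, hence pluggable).
    w_q, w_k, w_v = (torch.nn.Linear(C, C, bias=False) for _ in range(3))

    outputs = []
    for i in range(B):
        # Pool of tokens gathered from every *other* image in the batch.
        others = torch.cat([x[j] for j in range(B) if j != i], dim=0)
        idx = torch.randperm(others.shape[0])[: int(others.shape[0] * sample_ratio)]
        shared = others[idx]

        # Keys/values = the image's own tokens + randomly sampled shared tokens.
        kv = torch.cat([x[i], shared], dim=0)                      # (N + S, C)
        q = w_q(x[i]).view(N, num_heads, head_dim).transpose(0, 1)
        k = w_k(kv).view(-1, num_heads, head_dim).transpose(0, 1)
        v = w_v(kv).view(-1, num_heads, head_dim).transpose(0, 1)

        out = F.scaled_dot_product_attention(q, k, v)              # (H, N, head_dim)
        outputs.append(out.transpose(0, 1).reshape(N, C))

    return torch.stack(outputs)                                    # (B, N, C)

# Example: 4 story frames, 64 latent tokens each, channel width 128.
frames = torch.randn(4, 64, 128)
print(consistent_self_attention(frames).shape)  # torch.Size([4, 64, 128])

Because the batch's images see each other only through extra key/value tokens at inference time, no retraining is needed, which is what makes the module training-free and pluggable.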

Semantic Motion Predictor:


This module is used to convert a series of consistent images into a video that tells a story more vividly.

By encoding images into a semantic space, it captures spatial information and predicts motion more accurately, so even large movements are rendered smoothly.

Compared with methods that predict motion only in the image latent space, predictions made in the semantic space are more stable, especially when generating long videos.
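
A rough sketch of this idea under my own simplified assumptions follows; the dimensions, module structure, and names are illustrative, not the paper's architecture:

import torch
import torch.nn as nn

class SemanticMotionPredictor(nn.Module):
    """Illustrative sketch: predict intermediate frame embeddings in a
    semantic space from the start- and end-frame embeddings."""

    def __init__(self, dim=512, num_frames=16, num_layers=4):
        super().__init__()
        # One learnable query per intermediate frame to predict.
        self.frame_queries = nn.Parameter(torch.randn(num_frames, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, start_emb, end_emb):
        # start_emb, end_emb: (B, dim) semantic embeddings of the first
        # and last frame from a frozen image encoder (CLIP-like).
        B = start_emb.shape[0]
        queries = self.frame_queries.unsqueeze(0).expand(B, -1, -1)
        seq = torch.cat(
            [start_emb.unsqueeze(1), queries, end_emb.unsqueeze(1)], dim=1
        )
        out = self.transformer(seq)
        # Drop the two endpoint slots, leaving (B, num_frames, dim)
        # per-frame semantic predictions for the in-between frames.
        return out[:, 1:-1, :]

predictor = SemanticMotionPredictor()
a, b = torch.randn(2, 512), torch.randn(2, 512)
print(predictor(a, b).shape)  # torch.Size([2, 16, 512])

In the framework described above, the predicted embeddings would then condition the video generation step, which is how a series of consistent images becomes a smooth video.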

How to use:

At present, there are two official ways to use it: local deployment and an online trial.

There are two ways to deploy locally: clone the project and install its Python dependencies (note that you need a GPU with at least 20 GB of VRAM), or install and run it with one click via Pinokio on your local machine.

There are likewise two ways to try it online: run it in a Google Colab notebook, or try it in the Hugging Face Space. I tried it on Hugging Face, and the subject consistency is very good; apart from some small flaws, the results match the official demonstrations.


1. Reference images are supported (cartoon reference images are not supported yet).

2. Typography styles and captions are supported. (By default, the prompt is used as the caption of each image. If you need to change a caption, add a # to that line; only the text after the # will be used as the image's caption.)

3. The [NC] symbol marks a scene in which no characters should appear. To use it, add "[NC]" at the beginning of the line. For example, to generate a falling-leaves scene without any characters, write: "[NC]The leaves are falling."
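
As a small illustration of how these two prompt conventions (# captions and [NC] markers) could be parsed, here is a hypothetical helper written for this article; the official app implements its own logic:

def parse_story_prompts(lines):
    """Hypothetical parser for the prompt conventions described above."""
    parsed = []
    for line in lines:
        line = line.strip()
        # "[NC]" at the start of a line means: no character in this frame.
        no_character = line.startswith("[NC]")
        if no_character:
            line = line[len("[NC]"):].strip()
        # Text after "#" overrides the default caption (the prompt itself).
        prompt, _, caption = line.partition("#")
        parsed.append({
            "prompt": prompt.strip(),
            "caption": caption.strip() if caption else prompt.strip(),
            "no_character": no_character,
        })
    return parsed

prompts = [
    "wake up in the bed",
    "have breakfast #A hearty breakfast to start the day.",
    "[NC]The leaves are falling.",
]
print(parse_story_prompts(prompts))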

At present, only the image generation function is available; the official code for video generation has not yet been released. The official paper also notes that although longer videos can be generated using a sliding window, StoryDiffusion is not specifically designed for long-video generation, so it may not perform well on very long videos. Also, when StoryDiffusion generates consistent images, subtle garment details such as ties may be inconsistent, and more detailed text prompts are needed to maintain consistency.
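
For intuition, here is a hypothetical sketch of what a sliding-window schedule for extending video length could look like; the window and overlap values are my own assumptions, and this is not the official code:

def sliding_windows(total_frames, window=16, overlap=4):
    """Hypothetical sliding-window schedule for long-video generation.
    Consecutive windows share `overlap` frames, so each new window can
    be conditioned on the tail of the previous one for continuity."""
    stride = window - overlap
    starts = range(0, max(total_frames - window, 0) + 1, stride)
    return [(s, s + window) for s in starts]

print(sliding_windows(40))  # [(0, 16), (12, 28), (24, 40)]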

StoryDiffusion is a groundbreaking exploration of consistency in AI generation, offering a new perspective on what AI can achieve in content-consistent generation. As AI continues to evolve, tools like StoryDiffusion will play a vital role in storytelling and content creation.

Paper address: arxiv.org/abs/2405.01434

Project address: github.com/hvision-nku/storydiffusion

Trial address: huggingface.co/spaces/YupengZhou/StoryDiffusion
