
Nankai University launches StoryDiffusion, a revolutionary image and video generation project

Author: The Mountain Monster Atu

All articles here are from the WeChat public account "Mars AIGC".

For more cutting-edge AI news, AI information, and hands-on AI tool practice, follow the WeChat public account "Mars AIGC".

Nankai University has partnered with ByteDance to release StoryDiffusion, an open-source image and video generation framework that can generate long, subject-consistent image and video sequences, a capability few AI projects currently match. With StoryDiffusion, you can generate a series of comics with consistent characters and scenes in one click, making it practically an AI tool custom-built for comic producers and content creators.


StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

At present, the main projects for consistent image generation are IP-Adapter and PhotoMaker, both of which rely on models pre-trained on large datasets and use a given reference image to directly control generation.

What sets StoryDiffusion apart is its proposed Consistent Self-Attention, which creates imagery in a variety of styles while maintaining subject consistency across multiple images, including consistent character appearance and costumes for coherent storytelling, and it is training-free and pluggable. Take a look at the image generation results.

StoryDiffusion can create stunning, consistent cartoon-style characters.


StoryDiffusion can also maintain the identities of multiple characters at once, keeping them consistent across an image sequence.


The results of generating a series of comic panels in one click:


The official examples above are impressive, and in my own tests the results matched the official demonstrations.

And image generation is not all: StoryDiffusion can also generate high-quality video with its Semantic Motion Predictor, conditioned on either the consistent images it generates or user-provided images. Many official video demos have been released; they may not be as long as Sora's, but in subject consistency within a video they already bear comparison with Sora.


Introduction to the technology

StoryDiffusion generates content-rich, consistent image or video sequences from a predefined text story through two main modules:

Consistent Self-Attention:


The module requires no training and can be inserted directly into existing image generation models.

It improves character consistency by building connections among the images within a batch during generation, effectively producing consistent faces and clothing.

This approach improves the user's ability to control the generated content through text prompts.
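
To make the mechanism concrete, here is a minimal PyTorch sketch of the idea; this is my own illustration, not the official implementation, and the shapes, sampling ratio, and projection weights are all illustrative assumptions:

import torch
import torch.nn.functional as F

def consistent_self_attention(x, num_heads=8, sample_ratio=0.5):
    """Minimal sketch of consistent self-attention over one story batch.

    x: (B, N, C), the token sequences of B images generated together.
    Tokens sampled from the other images in the batch are appended to
    each image's keys/values, letting attention share subject features
    across the batch without any extra training.
    """
    B, N, C = x.shape
    head_dim = C // num_heads
    # Stand-ins for the pretrained QKV projections of the diffusion
    # model's self-attention layer (reused unchanged, hence pluggable).
    w_q, w_k, w_v = (torch.nn.Linear(C, C, bias=False) for _ in range(3))

    outputs = []
    for i in range(B):
        # Pool of tokens gathered from every *other* image in the batch.
        others = torch.cat([x[j] for j in range(B) if j != i], dim=0)
        idx = torch.randperm(others.shape[0])[: int(others.shape[0] * sample_ratio)]
        shared = others[idx]

        # Keys/values = the image's own tokens + randomly sampled shared tokens.
        kv = torch.cat([x[i], shared], dim=0)                      # (N + S, C)
        q = w_q(x[i]).view(N, num_heads, head_dim).transpose(0, 1)
        k = w_k(kv).view(-1, num_heads, head_dim).transpose(0, 1)
        v = w_v(kv).view(-1, num_heads, head_dim).transpose(0, 1)

        out = F.scaled_dot_product_attention(q, k, v)              # (H, N, head_dim)
        outputs.append(out.transpose(0, 1).reshape(N, C))

    return torch.stack(outputs)                                    # (B, N, C)

# Example: 4 story frames, 64 latent tokens each, channel width 128.
frames = torch.randn(4, 64, 128)
print(consistent_self_attention(frames).shape)  # torch.Size([4, 64, 128])

Because the batch's images see each other only through extra key/value tokens at inference time, no retraining is needed, which is what makes the module training-free and pluggable.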

Semantic Motion Predictor:


This module is used to convert a series of consistent images into a video that tells a story more vividly.

By encoding images into a semantic space, it captures spatial information and predicts motion more accurately, so even large movements are rendered smoothly.

Compared with methods that predict motion only in the image latent space, predictions made in the semantic space are more stable, especially when generating long videos.
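
A rough sketch of this idea under my own simplified assumptions follows; the dimensions, module structure, and names are illustrative, not the paper's architecture:

import torch
import torch.nn as nn

class SemanticMotionPredictor(nn.Module):
    """Illustrative sketch: predict intermediate frame embeddings in a
    semantic space from the start- and end-frame embeddings."""

    def __init__(self, dim=512, num_frames=16, num_layers=4):
        super().__init__()
        # One learnable query per intermediate frame to predict.
        self.frame_queries = nn.Parameter(torch.randn(num_frames, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, start_emb, end_emb):
        # start_emb, end_emb: (B, dim) semantic embeddings of the first
        # and last frame from a frozen image encoder (CLIP-like).
        B = start_emb.shape[0]
        queries = self.frame_queries.unsqueeze(0).expand(B, -1, -1)
        seq = torch.cat(
            [start_emb.unsqueeze(1), queries, end_emb.unsqueeze(1)], dim=1
        )
        out = self.transformer(seq)
        # Drop the two endpoint slots, leaving (B, num_frames, dim)
        # per-frame semantic predictions for the in-between frames.
        return out[:, 1:-1, :]

predictor = SemanticMotionPredictor()
a, b = torch.randn(2, 512), torch.randn(2, 512)
print(predictor(a, b).shape)  # torch.Size([2, 16, 512])

In the framework described above, the predicted embeddings would then condition the video generation step, which is how a series of consistent images becomes a smooth video.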

How to use:

At present, there are two official ways to use it: local deployment and an online trial.

There are two ways to deploy locally: clone the project and install its Python dependencies (note that you need a GPU with at least 20 GB of VRAM), or install and run it with one click via Pinokio on your local machine.

There are likewise two ways to try it online: run it in a Google Colab notebook, or try it in the Hugging Face Space. I tried it on Hugging Face, and the subject consistency is very good; apart from some small flaws, the results match the official demonstrations.


1. Reference images are supported (cartoon reference images are not supported yet).

2. Typography styles and captions are supported. (By default, the prompt is used as the caption of each image. If you need to change a caption, add a # to that line; only the text after the # will be used as the image's caption.)

3. The [NC] symbol marks a scene in which no characters should appear. To use it, add "[NC]" at the beginning of the line. For example, to generate a falling-leaves scene without any characters, write: "[NC]The leaves are falling."
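
As a small illustration of how these two prompt conventions (# captions and [NC] markers) could be parsed, here is a hypothetical helper written for this article; the official app implements its own logic:

def parse_story_prompts(lines):
    """Hypothetical parser for the prompt conventions described above."""
    parsed = []
    for line in lines:
        line = line.strip()
        # "[NC]" at the start of a line means: no character in this frame.
        no_character = line.startswith("[NC]")
        if no_character:
            line = line[len("[NC]"):].strip()
        # Text after "#" overrides the default caption (the prompt itself).
        prompt, _, caption = line.partition("#")
        parsed.append({
            "prompt": prompt.strip(),
            "caption": caption.strip() if caption else prompt.strip(),
            "no_character": no_character,
        })
    return parsed

prompts = [
    "wake up in the bed",
    "have breakfast #A hearty breakfast to start the day.",
    "[NC]The leaves are falling.",
]
print(parse_story_prompts(prompts))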

At present, only the image generation function is available; the official code for video generation has not yet been released. The official paper also notes that although longer videos can be generated using a sliding window, StoryDiffusion is not specifically designed for long-video generation, so it may not perform well on very long videos. Also, when StoryDiffusion generates consistent images, subtle garment details such as ties may be inconsistent, and more detailed text prompts are needed to maintain consistency.
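
For intuition, here is a hypothetical sketch of what a sliding-window schedule for extending video length could look like; the window and overlap values are my own assumptions, and this is not the official code:

def sliding_windows(total_frames, window=16, overlap=4):
    """Hypothetical sliding-window schedule for long-video generation.
    Consecutive windows share `overlap` frames, so each new window can
    be conditioned on the tail of the previous one for continuity."""
    stride = window - overlap
    starts = range(0, max(total_frames - window, 0) + 1, stride)
    return [(s, s + window) for s in starts]

print(sliding_windows(40))  # [(0, 16), (12, 28), (24, 40)]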

StoryDiffusion is a groundbreaking exploration of consistency in AI generation, offering a new perspective on what AI can achieve in content-consistent generation. As AI continues to evolve, tools like StoryDiffusion will play a vital role in storytelling and content creation.

Paper address: arxiv.org/abs/2405.01434

Project address: github.com/hvision-nku/storydiffusion

Trial address: huggingface.co/spaces/YupengZhou/StoryDiffusion
