Chinese researchers blow up the field again! An 8× super-resolution model surpassing SOTA is released, and you can finally see spider silk clearly!

Author: 51CTO

Produced by | 51CTO Technology Stack (WeChat ID: blog51cto)

Sora has driven research on "video consistency", but temporal consistency alone can no longer satisfy the industry's appetite for high-fidelity video. And now, Chinese researchers have blown up the field once again!

Recently, a video model called VideoGigaGAN has taken the industry by storm. For super-resolution cinematic footage, don't wait for Sora!

According to the paper, there are currently two major challenges in the field of VSR (video super-resolution): the first is maintaining temporal consistency across output frames; the second is generating high-frequency details in the upsampled frames. This paper focuses on the second challenge, and in tackling it, the effectiveness of GANs (generative adversarial networks) has once again been demonstrated.

1. Restoring realistic detail to blurry video: 8× upsampling that surpasses SOTA

For example, on the car footage, previous VSR methods such as BasicVSR++ lack detail, while the image GigaGAN produces sharper results with richer details; however, its videos suffer from artifacts such as temporal flickering and aliasing (note the building footage in the video).

The newly proposed VideoGigaGAN generates video results with both high-frequency detail and temporal consistency, while significantly mitigating artifacts such as aliasing.

VideoGigaGAN is a generative video super-resolution model that upsamples video with rich high-frequency detail while maintaining temporal consistency. Compared with existing VSR methods, VideoGigaGAN generates temporally consistent videos with far more fine-grained appearance details.

The study shows that VideoGigaGAN performs very well on public datasets, demonstrating 8× super-resolution video results that surpass the current state-of-the-art VSR models.

Let's start with a few comparison videos; you may not believe your eyes at how stunning this video technology is!

The time has come to witness the miracle:

The research team released a comparison video of enoki mushrooms being cooked in hot pot. A digression: first author Xu is himself a cooking enthusiast.

You may still remember the earlier Sora-style videos in which, after the bird takes flight, a layer of ghosting always lingers; VideoGigaGAN has solved this problem.

The animal world is full of wonder, but if you can't clearly see the web behind the spider, or how the tabby cat plays with the rope, the shot loses some of its beauty.

2. How is it done? The answer lies in the model details

Next, let's take a look at where this model's power comes from.

First, the video super-resolution (VSR) model is built on top of the image GigaGAN upsampler, which uses an asymmetric U-Net architecture.

Second, to enhance temporal consistency, the team inflated the image upsampler into a video upsampler by adding temporal attention layers to the decoder blocks.
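
To make this concrete, here is a minimal sketch (PyTorch; the class and parameter names are illustrative, not from the paper) of a temporal attention layer of the kind that can be added to a decoder block. Attention runs along the time axis independently at each spatial position, so it mixes information across frames without touching the spatial layout:

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Self-attention along the time axis, applied independently at each pixel."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape
        # Fold spatial positions into the batch so attention only mixes frames.
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        normed = self.norm(tokens)
        out, _ = self.attn(normed, normed, normed, need_weights=False)
        tokens = tokens + out  # residual keeps the pretrained image prior intact
        return tokens.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
```

Because the layer is residual, it can be dropped into a pretrained image upsampler without destroying its behavior, then fine-tuned on video.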

Then, another trick: consistency is further enhanced by integrating features from a flow-guided propagation module.

Next, to suppress aliasing artifacts, the team applied anti-aliasing blocks in the encoder's downsampling layers.
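
This is the BlurPool idea from anti-aliased CNNs: blur with a small low-pass filter before striding, so high frequencies don't alias into the downsampled signal. A minimal sketch, assuming a fixed 3×3 binomial kernel (our choice for illustration; the paper does not specify these exact details):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Anti-aliased downsampling: depthwise low-pass blur, then stride-2 subsampling."""
    def __init__(self, channels: int):
        super().__init__()
        k = torch.tensor([1.0, 2.0, 1.0])
        k = torch.outer(k, k)                                  # 3x3 binomial kernel
        k = (k / k.sum()).view(1, 1, 3, 3).repeat(channels, 1, 1, 1)
        self.register_buffer("kernel", k)
        self.channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")             # pad to preserve size
        # Depthwise conv blurs each channel separately; stride 2 halves resolution.
        return F.conv2d(x, self.kernel, stride=2, groups=self.channels)
```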

Finally, Xu et al. pass high-frequency features directly to the decoder layers through skip connections, to compensate for the detail lost in the BlurPool (blur-then-downsample) process.
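
One simple way to realize this high-frequency "shuttle" (our own illustrative decomposition, not necessarily the paper's exact formulation) is to keep, at each encoder scale, the residual that blurring removes and hand it to the matching decoder layer:

```python
import torch
import torch.nn.functional as F

def downsample_with_hf_skip(x: torch.Tensor, blurpool) -> tuple:
    """Split features into an anti-aliased low-res path and a high-frequency residual."""
    low = blurpool(x)                             # anti-aliased, half resolution
    restored = F.interpolate(low, size=x.shape[-2:], mode="bilinear",
                             align_corners=False)
    high = x - restored                           # exactly the detail lost to blurring
    return low, high                              # `high` skips ahead to the decoder
```

In the decoder, `high` can simply be added back to the upsampled features at the corresponding resolution.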

One thing to note here: because the spatial window size of temporal attention is limited, Xu et al. introduced flow-guided feature propagation into the inflated GigaGAN, so that features from different frames can be better aligned based on optical-flow information.
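
The core operation in flow-guided propagation is backward warping: features from a neighboring frame are resampled along the optical flow (estimated by an off-the-shelf flow network) so that they line up with the current frame before being fused. A minimal sketch of the standard grid-sample warp:

```python
import torch
import torch.nn.functional as F

def flow_warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp features (B, C, H, W) using optical flow (B, 2, H, W)."""
    b, _, h, w = feat.shape
    # Base sampling grid in pixel coordinates, (x, y) order.
    ys, xs = torch.meshgrid(torch.arange(h, device=feat.device),
                            torch.arange(w, device=feat.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float()   # (2, H, W)
    coords = grid.unsqueeze(0) + flow             # shift each pixel by its flow
    # Normalize coordinates to [-1, 1], as grid_sample expects.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(feat, torch.stack((gx, gy), dim=-1), align_corners=True)
```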

There is also the anti-aliasing processing, which further mitigates the temporal flicker caused by the downsampling blocks in the GigaGAN encoder, while high-frequency detail is preserved by shuttling the high-frequency features directly to the decoder blocks.
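
Putting the pieces together, a toy forward pass might look like the sketch below, which reuses the helpers sketched above (a schematic of the ideas only, not the paper's actual architecture; it upsamples 2× rather than 8×):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVideoUpsampler(nn.Module):
    """Toy composition of the ideas above; assumes TemporalAttention,
    BlurPool2d, downsample_with_hf_skip, and flow_warp are in scope."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.embed = nn.Conv2d(3, ch, 3, padding=1)
        self.down = BlurPool2d(ch)                       # anti-aliased encoder step
        self.fuse = nn.Conv2d(2 * ch, ch, 3, padding=1)  # merge warped features
        self.temporal = TemporalAttention(ch)            # temporal mixing in decoder
        self.to_rgb = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, frames: torch.Tensor, flows: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W); flows: (B, T, 2, H, W), frame t-1 -> t
        b, t, _, h, w = frames.shape
        feats, highs, prev = [], [], None
        for i in range(t):
            f = self.embed(frames[:, i])
            low, high = downsample_with_hf_skip(f, self.down)
            if prev is not None:                          # flow-guided propagation
                fl = F.interpolate(flows[:, i], size=low.shape[-2:],
                                   mode="bilinear") * 0.5  # rescale flow to half-res
                low = self.fuse(torch.cat([low, flow_warp(prev, fl)], dim=1))
            prev = low
            feats.append(low)
            highs.append(high)
        x = self.temporal(torch.stack(feats, dim=1))      # (B, T, C, H/2, W/2)
        outs = []
        for i in range(t):
            up = F.interpolate(x[:, i], size=(h, w), mode="bilinear") + highs[i]
            outs.append(self.to_rgb(F.interpolate(up, scale_factor=2.0,
                                                  mode="bilinear")))
        return torch.stack(outs, dim=1)                   # (B, T, 3, 2H, 2W)
```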

Of course, these ideas were also validated by the final experimental results. In short, these model design choices matter a great deal.

3. The first author behind it: Xu Yiran, who loves to cook

That's right, Xu Yiran is yet another Chinese scholar. He earned his bachelor's degree from South China University of Technology and is now a PhD student at the University of Maryland, College Park. Xu's current research interests include generative models and their applications, and he has also done research on scene understanding in the field of autonomous driving.

As mentioned earlier, Xu's personal hobbies are quite distinctive: photography, hiking, and cooking.

4. Netizens' heated debate: the quality is good but the duration is too short; we need 200+ frames (at least 9 seconds)

Shot length became the focus of the discussion. One Hacker News user commented: "The video quality looks good, but it has a lot of limitations," pointing to the paper's own admission that the model struggles with extremely long videos of 200 frames or more. He therefore believes more research is needed before it can be used in real-world settings.

Another netizen voiced a similar view: "To a certain extent, I compulsively count the seconds of shots. Once I know a show or movie has several shots longer than 9 seconds, it earns my trust and I can relax."

According to another Hacker News user, the average shot length in a modern movie is about 2.5 seconds, and about 15 seconds for animation. At the 30 fps used in this study, 200 frames is not enough: it works out to less than 7 seconds (200 / 30 ≈ 6.7 s).

All in all, if the method can be scaled beyond 200 frames, we very much look forward to the results.

5. One More Thing: Don't forget the AI label

In addition, the release of these research results has once again raised concerns about AI misuse. As one commenter put it: "This is great for entertainment, but overly realistic, crisp images can still be passed off as 'evidence' of any kind, and people have no idea how these hallucinated details come about, so such videos still need to be prominently labeled."

The sobering counterpoint is that quite a few software tools, and video/photo features on smartphones, already use proprietary algorithms to "infer" whether details are fake, and the scale of such checking will only grow.

Returning to this study, the most fascinating thing is still its almost magical ability to restore detail. Think of the many images in television and film, especially precious footage from a decade or more ago: with this technology, "enhancing" low-resolution footage until it is clear will no longer be difficult!

Source: 51CTO Technology Stack
