Benchmarked against Sora the moment it launched: is this domestic model really that good?

Author: Bad reviews

A few days ago, while surfing the internet, Shichao came across several AI video clips.

The looming presence of a giant ship, hair and silk scarves blowing in the wind, an astronaut walking straight into a real vegetable garden... The scenes left me stunned.

The realism is also top-notch: as the camera moves along the lake, not only does the light change, but the sky and the trees also shift just as they would to the naked eye.

If it weren't for the watermark in the bottom right corner, I would have thought Sora had released new videos.

But this time the protagonist is not Sora, nor the well-known Sora competitors Pika and Runway, but a newcomer: the domestic video model Vidu.

The videos we saw were announced by Vidu at the Artificial Intelligence Theme Day of the Zhongguancun Forum a few days ago.

It can generate clips of up to 16 seconds. A single prompt, "wooden toy boat sailing on the carpet", is enough to produce a long take, shot smoothly from start to finish. I suspect a certain director passing by would praise it on sight.

Sora claims to truly simulate the physical world, and Vidu can do the same.

Ask it to generate a video of "a car speeding along a country road through the forest", and details such as the sunlight coming through the gaps in the trees and the dust kicked up by the rear wheels match our everyday experience.

And Vidu's imagination is even richer than ours: a scene of a boat sailing toward the camera inside a studio can be "shot" in minutes. Looking at this result, I don't know how many animators are trembling.

On some prompts, Vidu even understands the request better than Sora does. Take the prompt "the camera rotates around the TV": Sora misses the meaning of "rotates" entirely, while Vidu grasps it easily.

To be fair, after watching these Vidu videos, Shichao really does think it is the only model on the market that can go toe to toe with Sora on picture quality.

Vidu's current 16 seconds is still short of Sora's 60, but its progress is visible to the naked eye. According to Geek Park, last month Vidu could only generate 8 seconds of video internally, and the month before that only 4 seconds.

In any case, the media have called Vidu a "Sora-level video model", and netizens are clamoring in the comments for the closed beta to open as soon as possible.

But what Shichao is more curious about is this: we had never heard of Vidu before, so how did it burst onto the scene out of nowhere and make such a splash?

So we followed the trail and found that there is a lot worth saying about Vidu. Look closely and you can even find traces of Vidu in Sora (no, Shichao did not get that backwards).

Behind it is a company called Shengshu Technology. The company may have only just turned one year old, but it had been building up strength since before it was born. Its parent is Rely Wisdom, an AI company that came out of Tsinghua University, and the research team behind it is largely the same group of people.

Before the establishment of Shengshu Technology, the team had already studied the video model in depth.

In particular, they were among the first in the industry to study diffusion models, which are now hugely popular for image generation, and their papers have appeared at ICML, NeurIPS, and ICLR.

It is precisely because of this solid foundation that, as early as September 2022, the team found the inspiration for Vidu in one of its own papers.

Shichao asked an AI to help interpret it. The gist is that diffusion models are very strong at generating images, while the Transformer used in large language models shows a scale effect: the more parameters you pile on, the better it performs. The team wondered whether they could combine the strengths of the two and improve image generation quality with a fused architecture.

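So they replaced the U-Net inside the diffusion model with a Transformer and named the result U-ViT (Vision Transformers). Testing showed that the fusion really worked: a U-ViT of similar size alone could outperform the U-Net.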
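To make the idea of swapping a U-Net for a Transformer a little more concrete, here is a minimal sketch of the input side only. It assumes the usual Vision Transformer recipe of cutting an image into patches and treating each patch as a token. The helper names `patchify` and `build_token_sequence`, and the way the timestep is turned into a token, are hypothetical illustrations and not the team's actual code.

```python
def patchify(image, patch):
    """Split an H x W image (a list of rows) into flattened patch x patch tokens."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one patch row by row into a single token vector.
            token = [image[r][c]
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            tokens.append(token)
    return tokens


def build_token_sequence(noisy_image, timestep, patch):
    """Hypothetical helper: one time token followed by the image patch tokens."""
    patch_tokens = patchify(noisy_image, patch)
    # Illustrative assumption: repeat the timestep to match the token width.
    time_token = [float(timestep)] * (patch * patch)
    return [time_token] + patch_tokens


# A 4x4 "noisy image" cut into 2x2 patches: 4 patch tokens plus 1 time token.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
sequence = build_token_sequence(image, timestep=10, patch=2)
print(len(sequence), len(sequence[0]))
```

A Transformer would then process this whole sequence at once, instead of a U-Net convolving over the pixel grid.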

Since this path worked, they settled on U-ViT as their technical route.

However... while the team was quietly incubating Vidu, a study from UC Berkeley on the other side of the ocean gave OpenAI's Sora a head start.

Just two months after the Tsinghua team submitted its paper, UC Berkeley posted a paper of its own on the preprint platform arXiv. It also proposed folding the Transformer into the diffusion model, but under a more straightforward name: DiT (Diffusion Transformers).

Look familiar? Yes, OpenAI's Sora follows Berkeley's DiT technical route.

However, because the Tsinghua team had published two months earlier, CVPR 2023, that year's top computer vision conference, accepted U-ViT and rejected DiT, the architecture Sora builds on, on the grounds of "lack of innovation".

And as early as the beginning of 2023, the Tsinghua team had used U-ViT to train UniDiffuser, an open-source large model with nearly 1 billion parameters.

It was the first demonstration that the fused architecture also follows the Scaling Law, meaning that the model's performance keeps improving predictably as compute and parameter count grow. And this Scaling Law is also Sora's secret weapon.
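As a rough illustration of what following a scaling law means, the sketch below fits a power law of the form loss = a * size^(-b), the shape scaling laws usually take in the literature. The numbers are synthetic, generated from a made-up curve purely for demonstration, and `fit_power_law` is a hypothetical helper. None of this is data from Vidu, UniDiffuser, or Sora.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-b), done as a line in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope and intercept of log(loss) against log(size).
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic example: parameter counts and losses drawn from a made-up power law.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.1 for n in sizes]
a, b = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, b = {b:.3f}")
```

If measured losses fall on such a straight line in log-log space, a team can estimate how much a larger model should help before paying to train it.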

So by that reckoning, Sora would actually have to call Vidu its grandmaster...

But in the real world, it is DiT that OpenAI has carried all the way to the top.

As for the Tsinghua team, its computing resources were nowhere near OpenAI's, and it had no earlier success like ChatGPT to build on. In short, nothing was ideal, so it could only take things step by step: images and 3D models first, then video once it had built up some resources.

Fortunately, the team has real strength and is slowly catching up. Since founding Shengshu Technology in March last year, it has been working non-stop on its own products, and its image generation and 3D model generation tools are now free for everyone to use.

And on the strength of these two products, a company that has only just passed its first anniversary has already built up a war chest of hundreds of millions of yuan.

For example, three months after it was founded, it closed an angel round of nearly 100 million yuan, and last month it closed a new round worth hundreds of millions of yuan. The investors include industry heavyweights such as Zhipu AI and Baidu Ventures (BV).

In any case, judging by this momentum, Vidu may really become China's dark horse to rival OpenAI's Sora.

For its part, Shengshu Technology seems to think that treating Vidu as merely a domestic version of Sora is a bit unimaginative. Its positioning for Vidu is not just a video model but one that covers images, text, and video. Video simply happens to be the focus for now.

Of course, talk is one thing. Whether it can deliver, we will have to wait and see the finished product.

Shichao has already joined the queue. Once he gets closed beta access, he will report back to everyone...