Compared to Sora the moment it launched: is this domestic model really that big a deal?

2024-04-30 11:31:00

A few days ago, while Shichao was surfing the Internet, he came across several AI video clips.

The oppressive presence of a giant ship, hair and silk scarves blowing in the wind, an astronaut walking straight into a realistic vegetable garden... The scenes left me stunned.

The realism is top-notch too: as the camera moves along the lake, not only does the light change, but even the sky and trees shift just as they would to the naked eye.

If it weren't for the watermark in the bottom right corner, I would have thought Sora had released new videos.

But this time the protagonist is not Sora, nor well-known Sora competitors such as Pika and Runway, but the fledgling domestic video model Vidu.

The videos we saw were shown when Vidu was unveiled at the Artificial Intelligence Theme Day of the Zhongguancun Forum a few days ago.

It can generate videos up to 16 seconds long. A single prompt, "a wooden toy boat sailing on a carpet", produces the long clip below, and the one-take shot is so smooth that I suspect even a certain director passing by would praise it.

Sora claims to truly simulate the physical world, and Vidu can do the same.

Ask it to generate "a car speeding along a country road through a forest", and details such as the sunlight filtering through gaps in the trees and the dust kicked up by the rear wheels match our everyday intuition.

Vidu's imagination is even richer than ours: a scene of a boat in a studio sailing toward the camera can be "shot" in minutes. Looking at the result, I wonder how many animators are trembling.

With some prompts, Vidu even understands better than Sora. Given the prompt "the camera rotates around the TV", Sora doesn't grasp the rotation at all, but Vidu handles it easily.

To be fair, after watching these Vidu videos, Shichao really thinks it is the only model on the market that can compete with Sora in visual quality.

Although Vidu's current 16 seconds falls short of Sora's 60 seconds, its progress is plain to see. According to Geek Park, last month Vidu could only generate 8-second videos internally, and the month before that only 4-second videos.

In any case, the media have called Vidu a "Sora-level video model", and netizens are clamoring in the comments for the closed beta to open as soon as possible.

But what Shichao is more curious about is this: we had never heard of Vidu before, so how did it suddenly make such a splash out of nowhere?

We followed the trail and dug up what we could, and found that there is a lot worth talking about with Vidu. Look closely and you can even find traces of Vidu in Sora (no, Shichao didn't get that backwards).

Behind it is a company called Shengshu Technology. Don't be fooled by the fact that the company has only just turned one year old: it was building up strength before it was even born. Its parent is RealAI (瑞莱智慧), an AI company out of Tsinghua University, and the research team behind it is largely the same group of people.

Before the establishment of Shengshu Technology, the team had already studied the video model in depth.

In particular, they were among the first in the industry to study the diffusion models that are now so popular in image generation, and their papers have been published at ICML, NeurIPS, and ICLR.

It was precisely because of this solid foundation that, as early as September 2022, the team found the inspiration for Vidu, in the following paper.

Shichao asked an AI to help interpret it. The general idea is that diffusion models are very good at generating images, while the Transformer used in large language models shows a scaling effect: the more parameters you pile on, the better the performance. The team wondered whether combining the strengths of the two into a fused architecture could improve the quality of image generation.

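So they replaced the U-Net in the diffusion model with a Transformer and named the result U-ViT (after Vision Transformers). It turned out that the combination really works: a U-ViT of comparable size performs at least as well as a U-Net.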

Since this path worked, they settled on U-ViT as their technical route.

However... while the team was quietly brewing Vidu, a study from UC Berkeley on the other side of the ocean gave OpenAI's Sora a head start.

Just two months after the Tsinghua team submitted their paper, UC Berkeley researchers posted theirs on the preprint platform arXiv. It also proposed folding Transformers into the diffusion model, but under a more straightforward name: DiT (Diffusion Transformers).

Sound familiar? Yes, OpenAI's Sora model follows Berkeley's DiT technical route.

However, because the Tsinghua team published two months earlier, CVPR 2023, that year's top computer vision conference, accepted U-ViT and rejected DiT, the architecture behind Sora, on the grounds of "lack of innovation".

And as early as the beginning of 2023, the Tsinghua team used U-ViT to train UniDiffuser, an open-source large model with nearly 1 billion parameters.

It was the first demonstration that the fused architecture also obeys the scaling law, meaning that the model's performance keeps improving as compute and parameter count grow. This scaling law is also Sora's secret weapon.

So by this reckoning, Sora really ought to call Vidu its grandmaster...

But in the real world, it is DiT that OpenAI has carried all the way to the top.

As for the Tsinghua team, their computing resources are not on OpenAI's level, and they have no crown jewel like ChatGPT. In short, conditions were far from ideal, so they could only take it step by step: build image and 3D models first, then move on to video once they had built up some resources.

Fortunately, they have real ability and are steadily catching up. Since the Tsinghua team founded Shengshu Technology in March last year, they have worked nonstop on their own products, and their image generation and 3D model generation tools are now free for everyone to use.

On the strength of these two products, the company, which has only just passed its first anniversary, has already raised hundreds of millions of yuan.

For example, three months after its founding it closed an angel round of nearly 100 million yuan, and last month it closed a new round worth several hundred million yuan. The investors include industry heavyweights such as Zhipu AI and BV Baidu Venture Capital.

In any case, judging by this momentum, Vidu may really become China's dark horse against OpenAI's Sora.

However, Shengshu Technology itself feels that seeing Vidu only as a domestic version of Sora is a bit unimaginative. They position Vidu not just as a video model but as one spanning images, text, and video. Video is simply the focus for now.

Of course, talk is talk. Whether it can actually pull this off, we will have to see the finished product for ourselves.

Shichao has already joined the waitlist. Once I get closed-beta access, I will report back to everyone...
