
Runway is developing a general world model, aiming for AI that better simulates the real world

Author: 36Kr

Text: Wang Yining

Edited by Shane

The AI video generation space has been heating up recently: Pika launched version 1.0 and announced a funding round of tens of millions of US dollars, grabbing much of the spotlight.

Then Runway, the developer of the phenomenally popular Gen-1 and Gen-2, suddenly announced it is forming a team to develop General World Models (GWMs), with the goal of creating an artificial intelligence system that, unlike large language models, can simulate the real world.

As soon as Runway announced its plan to build GWMs, it drew skepticism from many netizens.

Someone said:

"This is a multimodal large model with video, audio, text, and images."

Someone else said simply: "It's a nice video, and Ruben (the puppy in the video) is cute" (while ignoring the new model entirely).


△ Source: Twitter

So what does Runway want to do with a world model, and why did it choose to do it now?

Simulate the world with a world model

For most users, the progress of artificial intelligence over the past year has genuinely exceeded our expectations and imagination. But even as we marvel at how fluently large language models can converse with us, hallucinations still make them ramble or answer beside the point from time to time, which greatly diminishes the actual user experience.

This kind of problem is not limited to large language models; it also appears in AI image generation and AI video generation, such as the classic six-finger problem in AI-generated images:


△ Source: Twitter

Even Runway's own product, Gen-2, is not immune. In a new three-minute video, Runway attempts to explain the root cause of the problem: existing large models lack a comprehensive understanding of the real world.

For example, although LLMs (large language models) can generate poems, articles, and even film scripts, they only know the rules of the language domain, so when they run into problems they don't understand, they often confidently make things up.

Their underlying paradigm is: bigger models + more data = more knowledge about the world. That paradigm is also what leads to pervasive hallucinations, and the same situation can be seen in AI video generation tools.

The general world model concept Runway is now proposing is aimed precisely at this problem. Runway defines a "world model" as an artificial intelligence system that constructs an internal representation of an environment and simulates future events within that environment.

In short, Runway wants the new model to come as close as possible to the real world we live in, simulating a wide variety of situations and interactions.
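
Runway has not published technical details of its GWMs, but to make the definition above concrete, here is a minimal, hypothetical sketch of a latent world model in the sense used in the research literature: an encoder builds an internal representation of an observation, a dynamics network rolls that representation forward under a sequence of actions, and a decoder "imagines" the resulting future observations. All class names, dimensions, and layer sizes are illustrative assumptions, not Runway's design.

```python
# A toy latent world model (conceptual sketch only; NOT Runway's architecture).
import torch
import torch.nn as nn

class ToyWorldModel(nn.Module):
    def __init__(self, obs_dim=64, action_dim=4, latent_dim=32):
        super().__init__()
        # Encoder: observation -> internal (latent) representation of the environment
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        # Dynamics: (latent state, action) -> next latent state
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        # Decoder: latent state -> predicted future observation
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, obs_dim))

    def forward(self, obs, actions):
        # obs: (batch, obs_dim); actions: (batch, steps, action_dim)
        z = self.encoder(obs)
        imagined = []
        for t in range(actions.shape[1]):
            # Roll the internal representation forward one step and decode it,
            # i.e. simulate a future event inside the learned environment.
            z = self.dynamics(torch.cat([z, actions[:, t]], dim=-1))
            imagined.append(self.decoder(z))
        return torch.stack(imagined, dim=1)

# Usage: "imagine" five future observations from one starting observation.
model = ToyWorldModel()
obs = torch.randn(1, 64)
actions = torch.randn(1, 5, 4)
future = model(obs, actions)  # shape (1, 5, 64)
```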

LeCun has championed the idea, but Runway wants to do something different

The "World Model" is not a concept pioneered by Runway. Turing Award winner Yann LeCun came up with the concept last year to depict his vision of an AI that is closer to the real level of human beings.

He has criticized GPT-style models in public talks, arguing that autoregressive models that generate output according to probability cannot solve the hallucination problem at all, and he even asserted that GPT-style models will not survive another five years.
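
To make "generating according to probability" concrete: an autoregressive model repeatedly predicts a probability distribution over the next token and samples from it, with no grounded check against the real world. The toy loop below illustrates only this sampling mechanism; the uniform distribution and tiny vocabulary are placeholder assumptions, not how GPT is actually implemented.

```python
# A toy illustration of autoregressive, probability-based generation.
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def next_token_distribution(context):
    # A real LLM would compute this with a neural network conditioned on the
    # context; here we fake a uniform distribution just to show the loop.
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

def generate(prompt, steps=5):
    tokens = list(prompt)
    for _ in range(steps):
        dist = next_token_distribution(tokens)
        # Sample the next token in proportion to its predicted probability.
        tokens.append(random.choices(list(dist), weights=list(dist.values()))[0])
    return " ".join(tokens)

print(generate(["the"]))
```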

LeCun wants to build models that can learn how the world works. In June this year, he and his team released I-JEPA, a "human-like" AI model designed to learn common-sense background knowledge about the world in much the way humans do.


△ Source: Twitter

Although the paper drew plenty of applause and anticipation when it was released, half a year has passed and LeCun's world model has yet to find a path to successful real-world deployment. That may also be why the public has reservations about Runway.

So what kind of world model does Runway want to work on?

In the video, Runway shared some of its ideas on how the new model will be developed: what a GWM needs to build is a kind of mental map that allows the model to learn more about the "why" and "how" of the world.

Bringing this idea to life poses real challenges, and the Runway team acknowledges as much. In its introduction to GWMs, it notes that two issues need to be addressed at the moment:

1. The models need to generate consistent maps of environments, along with the ability to navigate and interact within those environments.

2. The models need to capture not only the dynamics of the world but also the dynamics of its inhabitants, which includes building realistic models of human behavior.


△ Source: Twitter

Despite the muted response, Runway has clearly made up its mind: it is building a team and hiring, with new positions posted on the company's official website covering areas as diverse as machine learning, applied research, and data infrastructure.


△ Source: Runway official website

One More Thing

Looking back at the AI video generation space, the enthusiasm ignited by Pika 1.0 has only grown. Judging from feedback from the first batch of users granted access to try it, opinions on Pika 1.0's actual results and technical level are polarized.

Some users praised Pika 1.0 as the best AI video generation tool they have ever used, while some Discord users found that its output was not significantly different from that of other similar tools.

Domestic giants have also moved into AI-generated animation, and the competition between Alibaba and ByteDance has become head-to-head: Alibaba recently released an AI project called "Animate Anyone", saying that only a single image and a piece of skeletal animation are needed to make a video of anyone; ByteDance followed with the launch of "MagicAnimate" and made it open source outright; and the battle was temporarily paused when Alibaba quickly released "DreaMoving" in response.

Interestingly, Pika Labs, now one of the most popular players, was founded after its two co-founders' work was rejected at the first AI Film Festival held by Runway. In a recent interview, founder Chenlin Meng also mentioned that the video quality Runway, Genmo, Imagen Video, and other players can currently generate is roughly similar and contains many "artifacts", but this also shows that there is still plenty of room for technical innovation and breakthroughs in this field.

Chenlin Meng likened current video generation technology to the "GPT-2 era", and there are still many uncertainties in the future competitive landscape. Whether GWMs can help Runway leapfrog the competition remains to be seen.

Comments and discussion are welcome.
