
Huang Guan: Build a data engine for general embodied intelligence, and the commercialization speed is the fastest in China

Author: Smart stuff

Author | GenAICon 2024

The 2024 China Generative AI Conference was held in Beijing on April 18-19, and Dr. Huang Guan, founder & CEO of Jijia Technology, delivered a speech entitled "Technology and Application Closed Loop, From Video Generation to World Model" at the AIGC Application Session on the second day of the conference.

Huang Guan believes that all "general intelligence" is moving towards "end-to-end large models", and that every problem has become a "high-quality data" problem. The "world model" will be the most important source of "high-quality data" for embodied intelligence: it learns from a combination of Internet data, simulation data, teleoperation data, and data collected in the real world to produce an interactive physical world simulator.

Jijia Technology is building a next-generation data platform based on the world model to provide services for end-to-end autonomous driving and general robotics. The DriveDreamer autonomous driving world model and the WorldDreamer general world model have been successfully commercialized.

The following is a transcript of Huang Guan's speech:

Today, I would like to share our thinking and progress on video generation and world models, as well as our ideas about building a new generation of data engines for general embodied intelligence.

We have summarized the current development trend of general intelligence: the entire industry is moving from general content intelligence, represented by GPT and Sora, to general action intelligence. Whether it is an agent, autonomous driving, or a robot, the core shift is from generating content to generating action. Once large models can reliably produce actions, the impact on the economy and society will be far greater; this is what people call the real "fourth industrial revolution".

1. There are three major directions for the development of the world model: video generation, autonomous driving, and general robots

The term "world model" was first proposed by Yann LeCun, who has said that GPT cannot reach AGI and that we need a world model. Over the past two years, people in China and abroad have become increasingly aware of the importance of world models. Development centers on three directions, namely video generation, autonomous driving, and general robots, and all three fields are paying close attention to progress on world models.

The first is video generation. At the beginning of this year, Sora took the entire AI community by storm. It is worth noting that OpenAI did not present Sora as a simple text-to-video model, but called it a "World Simulator", one that already has the makings of a world model. Late last year, Runway also publicly announced that it was moving toward a general world model.

We also have a work called WorldDreamer, which should be world-leading in using a next-generation Transformer architecture, rather than a Diffusion architecture, to move toward general video generation and a world model.

The second is the autonomous driving industry. Because it is a world model, it must engage with the physical world, and it needs strong abilities to understand and predict that world.

Tesla began saying in the middle of last year that it was working on a General World Model, and it has continued to increase its investment in video foundation models. There is also Wayve, a British autonomous driving company. Bill Gates went to the UK to invest in it right after investing in OpenAI, because he felt that Wayve let him see the hope of AGI in the physical world.

Jijia Technology is also the first company in China to start building an autonomous driving world model. Our model is called DriveDreamer, and it has achieved large-scale commercial application.

In a broader trend, in the direction of general robots, Berkeley and Covariant have done a series of work on world simulators and world models, including their recent work on the RFM robot model. Google is also building an interactive world model, and the humanoid robot startup 1X is using a world model to predict the future in pursuit of general-purpose robots.

Globally, world models are developing very rapidly across video generation, autonomous driving, and general robotics.

2. General intelligence is moving towards an end-to-end large model, and the world model is the most important source of high-quality data

The current trend is that all general intelligence is moving towards end-to-end large models. This holds for generative intelligence, including the understanding and generation of language, video, images, and 3D, and for embodied intelligence, including autonomous driving and general-purpose robots.

This is especially true for autonomous driving. Musk has recently been promoting Tesla's FSD V12 heavily, which is a standard video-in, action-out system: video goes in and driving actions come out. General robotics, the latest trend in Silicon Valley, is likewise moving toward an end-to-end, video-in, action-out paradigm.

Under this trend, every problem becomes a problem of high-quality data. These are no longer the rule-driven systems of the past; training and iterating on generative or embodied intelligence systems requires high-quality end-to-end data.

We believe that the world model is the most important source of high-quality data for the future of embodied intelligence. Today there are many approaches to the data problem: learning from image and video data on the Internet, learning from simulation data, end-to-end learning from teleoperation equipment such as the Stanford robots, and autonomous driving systems or robots learning from data collected in the real world.

The industry is first tackling the Sim2Real problem through various kinds of simulation, and is addressing the need for more real-world data through large-scale deployment.

Therefore, we believe that future data sources will converge on the world model, which will combine all of the data above in training to produce an interactive physical world simulator.

3. Build a new generation of data platform based on the world model, and the commercialization speed is the fastest in China

What we are doing right now is building a next-generation data platform based on the world model, for end-to-end autonomous driving and general-purpose robotics. The underlying platform is a foundation model with video generation and the world model at its core. As you know, Sora is currently impractical to use in terms of both cost and speed, and we are pursuing an order-of-magnitude improvement in speed and reduction in cost.

At the same time, we will offer a complete set of platform services that use data to support end-to-end autonomous driving, as well as general manipulation, general mobility, and other related scenarios for general robots, helping the embodied intelligence industry take off.

At present, Jijia Technology's autonomous driving world model technology is among the world's leading, and our commercialization speed is also the fastest in the world. We have begun commercial cooperation with many leading mainstream OEMs in China, using the world model for data generation, closed-loop simulation, and related directions.

A more imaginative and valuable scenario is our world model and physical world simulator for general robots. Our technology here currently leads in China, and our commercialization speed is also the fastest in China.

The core idea is the same as in driving. First, the world model can serve as a simulator for data generation and closed-loop simulation for general robots. It can also form part of an end-to-end robot solution. This paradigm is very different from the earlier modular paradigm in autonomous driving and robotics, and it moves toward a unified end-to-end architecture for general embodied intelligence.
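To make the idea of closed-loop simulation concrete, here is a minimal illustrative sketch in Python. It is not Jijia Technology's implementation. The names `ToyWorldModel`, `toy_policy`, and `closed_loop_rollout` are hypothetical, and the "observation" is a single number (a lateral offset) standing in for the video frames a real world model would predict. The point is only the loop structure: the policy acts on what the model predicts, and the logged pairs become generated data.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for a learned world model.

    A real world model would predict future video frames; here the
    observation is one float (offset from the lane center).
    """

    dt: float = 0.1  # assumed simulation step, in seconds

    def step(self, observation: float, action: float) -> float:
        # Predict the next observation from the current one and an action.
        return observation + action * self.dt


def toy_policy(observation: float) -> float:
    # Hypothetical observation-in, action-out policy: steer back toward 0.
    return -2.0 * observation


def closed_loop_rollout(
    model: ToyWorldModel,
    policy: Callable[[float], float],
    start: float,
    steps: int,
) -> List[Tuple[float, float]]:
    """Run the policy inside the model and log (observation, action) pairs."""
    log: List[Tuple[float, float]] = []
    obs = start
    for _ in range(steps):
        action = policy(obs)
        log.append((obs, action))
        # The model's prediction becomes the policy's next input,
        # which is what makes the simulation closed-loop.
        obs = model.step(obs, action)
    return log


if __name__ == "__main__":
    for obs, action in closed_loop_rollout(ToyWorldModel(), toy_policy, 1.0, 5):
        print(f"obs={obs:.3f} action={action:.3f}")
```

In this toy setup each step shrinks the offset by a constant factor, so the logged trajectory shows the policy converging toward the lane center. In a real system the model, the observations, and the policy would all be learned and far higher-dimensional.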

The above is a complete summary of the content of Huang Guan's speech.
