laitimes

Liu Shuquan and Zhou Guang: Following the Trend of Artificial Intelligence 2.0, End-to-End Makes Autonomous Driving More "Human"

Author: Entrepreneurs

With the development of artificial intelligence, end-to-end models have emerged in the field of autonomous driving. They integrate key modules such as perception, planning, and decision-making into a single neural network, which makes autonomous driving behave more like a human driver, speeds up the mass production of high-end intelligent driving, and places new demands on underlying infrastructure such as data and computing power.

Recently, Zhou Guang, CEO of Yuanrong Qixing, invited Liu Shuquan, Vice President of Tencent Smart Mobility, onto the roads of Beijing to experience the industry's first "no map" (navigation map only) high-end intelligent driving mass production solution, which the two companies created jointly. The two then discussed topics such as the mass production of autonomous driving and the era of artificial intelligence 2.0.


Zhou Guang believes that smart cars are the key to unlocking general artificial intelligence in the physical world. The smart car is the first robot to reach data at the scale of tens of millions, building up a broad shared understanding of the physical world. That understanding will settle into a foundation model of the physical world, which will be easier to transfer to other robot scenarios in the future. Zhou Guang said that Yuanrong Qixing has always followed the development of artificial intelligence. In the era of artificial intelligence 2.0, with end-to-end models, large language models, and generative AI at its core, Yuanrong Qixing recognized and explored this direction before most of the industry, which he attributes to technical intuition.

In the field of autonomous driving, Tencent plays a relatively pure role as a digital assistant. Liu Shuquan said that Tencent provides autonomous driving cloud, compliance cloud, and map-related services for the industry. He said Tencent hopes to work with many partners to build a complete cloud-plus-end architecture, so that algorithms can be continuously optimized through rapid iteration and data training.

During CES in January this year, Yuanrong Qixing and Tencent announced a cooperation on maps and launched the industry's first high-end intelligent driving mass production solution that uses only navigation map data. It is expected to reach the consumer market this year.

The following is an excerpt from their conversation:

Smart cars are the key to unlocking general artificial intelligence in the physical world

Liu Shuquan: This year, more and more cars with intelligent driving functions are coming to market. Prices keep falling, iteration is accelerating, and the overall technical and product routes are slowly starting to converge. I would like to hear your opinion.

Zhou Guang: After a year of "no map" solutions, I think an industry consensus has formed. Our solution is the industry's first autonomous driving solution that uses only navigation maps, and it can provide a very high-quality urban NOA autonomous driving experience.

We ran a generalization test covering dozens of cities, and overall I think the quality of Tencent Map's data is quite high. In a few individual cities there are still some update problems. Some second-tier and third-tier cities, for example, build roads quickly, so their road topology has changed and the map may need to be updated. But I believe that as high-level autonomous driving goes into mass production and provides real-time feedback, the map will be updated faster.

Liu Shuquan: Actually, this is what Tencent calls cloud-map integration. Through this cloud-plus-end architecture, when the vehicle discovers a difference between the map and the physical world, it sends the difference back to the cloud in real time, and we then update the map and push it back down to vehicles.
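
The feedback loop described here can be illustrated with a minimal Python sketch. It is an assumption-laden toy, not Tencent's actual system: the `CloudMap` class, the `detect_differences` function, and the "segment id to lane count" map format are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CloudMap:
    """Hypothetical cloud-side map: road segment id -> lane count, plus a version."""
    lanes: dict = field(default_factory=dict)
    version: int = 0

    def apply_reports(self, reports):
        """Merge vehicle-reported differences; bump the version only if something changed."""
        changed = {seg: n for seg, n in reports.items() if self.lanes.get(seg) != n}
        if changed:
            self.lanes.update(changed)
            self.version += 1
        return changed


def detect_differences(onboard_map, observed):
    """Return the segments where the vehicle's observation disagrees with its onboard map."""
    return {seg: n for seg, n in observed.items() if onboard_map.get(seg) != n}


# The vehicle starts with a copy of the cloud map.
cloud = CloudMap(lanes={"seg_a": 2, "seg_b": 3})
onboard = dict(cloud.lanes)

# On the road, the vehicle sees a widened road (seg_b) and a new road (seg_c).
observed = {"seg_a": 2, "seg_b": 4, "seg_c": 1}

diff = detect_differences(onboard, observed)  # vehicle -> cloud: only the differences
update = cloud.apply_reports(diff)            # cloud updates the map
onboard.update(update)                        # cloud -> vehicle: push the update down
```

Only the differences travel upstream in this sketch, which is the design point of the loop: the vehicle does not re-upload the whole map, and the cloud version changes only when a report actually contradicts the stored data.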

Liu Shuquan: How do you understand that smart cars are the key to unlocking general artificial intelligence in the physical world?

Zhou Guang: In fact, the earlier autonomous driving system was a classic robot, with perception, decision-making, and positioning modules. These modules were designed specifically for the driving scenario and lack real generality. The end-to-end intelligent driving system is driven by neural networks. It still includes a perception module and a decision-making module, but they are connected directly through neural networks and vector matrices, with no predefined interface, so the approach is also suitable for other robots.

When you have data at the scale of tens of millions, you slowly form a shared understanding of the physical world and arrive at a foundation model of the physical world, and it will be easier to transfer this model to other robot scenarios in the future.

Liu Shuquan: How is Yuanrong going to achieve such a goal?

Zhou Guang: It was not a one-step process. In fact, we have gone through many stages. The first stage was multi-sensor pre-fusion, and we also did point cloud rendering.

But at that point in time, I didn't expect it to become one link in an end-to-end chain. For example, today we have seven cameras and one lidar. Before the pre-fusion stage, you needed seven different algorithms, all responsible for perception, followed by fusion at the back end, and only then could you drive the car. Pre-fusion actually puts everything into one coordinate system and uses a unified algorithm for perception and recognition.
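
The "one coordinate system" step can be sketched in a few lines of Python. This is only an illustration of the general idea, not Yuanrong's implementation: the 2D rigid transform, the sensor names, and the extrinsic values (mounting yaw and offset of each sensor) are assumptions made up for the example, and the unified perception algorithm that would run on the fused data is not shown.

```python
import math


def to_vehicle_frame(points, yaw_rad, tx, ty):
    """Rotate by the sensor's mounting yaw, then translate: sensor frame -> vehicle frame."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


# Hypothetical extrinsics per sensor: (yaw in radians, x offset, y offset) in meters.
extrinsics = {
    "front_camera": (0.0, 2.0, 0.0),
    "left_camera": (math.pi / 2, 1.0, 0.9),
    "lidar": (0.0, 1.2, 0.0),
}

# Hypothetical raw measurements, each expressed in its own sensor's frame.
detections = {
    "front_camera": [(10.0, 0.5)],
    "left_camera": [(5.0, 0.0)],
    "lidar": [(10.8, 0.5), (-0.2, 5.9)],
}

# Pre-fusion step: everything lands in the single vehicle frame before perception runs.
fused = []
for sensor, points in detections.items():
    fused.extend(to_vehicle_frame(points, *extrinsics[sensor]))
```

In this toy setup the front camera and the lidar turn out to be looking at the same object: once both measurements are in the vehicle frame they coincide, so a single downstream algorithm can treat them as one set of evidence instead of reconciling separate per-sensor results afterwards.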

Pre-fusion was the first step, and the second step was to do away with the high-definition map. High-definition maps can help us make high-level semantic judgments. For example, when driving you need to look beyond the surrounding 100 meters, and you may need to know the curvature of the road ahead; difficult tasks like these were handed over to the map. With the development of artificial intelligence, we realized that the next step was to reconstruct all the static elements and the road topology with neural networks, which gave us the "no map" solution.

Yuanrong started this work in early 2020 and, after two years, achieved relatively good results for the first time in 2022. In 2023, we put all dynamic and static perception into the same neural network. At that point we realized that we had to keep subtracting. So we did the next thing: a data-driven prediction and decision-making system. The whole system then consisted of two models, the perception model and the planning and decision-making model.

At the beginning of last year, we realized that the two models could be connected directly through the neural network, giving an end-to-end structure with no information loss. We got the end-to-end system running in August last year and announced it officially in March this year at NVIDIA's GTC conference.

The integration of map and cloud provides the underlying "accelerator" for the mass production of intelligent driving

Zhou Guang: I have just talked a lot about Yuanrong's end-to-end technology. Now I would like to ask how Tencent, as a cloud business and a map provider, approaches this track. What are Tencent's strengths?

Liu Shuquan: First of all, our strategic positioning is very clear. Tencent plays a purely digital assistant role, providing autonomous driving cloud, compliance cloud, and navigation and map services for the industry.

I think a few of our services are more distinctive. First, as you just mentioned, you want an end-to-end network, but that requires a more accurate navigation service with more accurate lane-level connectivity. Tencent began working on this last year, and the two sides have combined these navigation capabilities with Yuanrong's end-to-end large model algorithm to reach the best tuning state.

Second, autonomous driving is a strongly data-driven business, so it needs more computing power, more storage, and wider network coverage, which are Tencent Cloud's strengths. We unify network, storage, and computing to achieve better cost performance, and there are some excellent cases here, such as our cooperation with NVIDIA, with Bosch, and of course with Yuanrong. Together these form a closed data loop. In particular, we hope to work with many partners to build a complete cloud-plus-end architecture, so that the algorithm can be fine-tuned through rapid iteration and data training.

Follow the trend of the artificial intelligence 2.0 era and make autonomous driving more "human" with end-to-end

Liu Shuquan: The end-to-end model of autonomous driving integrates perception with planning and control and finally produces a decision that is more like a human's. Was this development an accident? Or did it come from academic progress, or was it something that could be predicted or derived from the evolution of the technology?

Zhou Guang: I think it feels like this: from the start of pre-fusion and BEV, you sense that the direction is right, but you don't actually know the endgame. At that time there was still the debate over high-definition maps, and the debate between post-fusion and pre-fusion. But once you understand end-to-end, you find that all your groundwork was for the last step: building an end-to-end system, DeepRoute IO.

Our biggest advantage is that we have kept following the development of artificial intelligence, especially the era of artificial intelligence 2.0, which consists of large language models, generative methods, and end-to-end models, corresponding respectively to language, digital generation, and practice with robots in the physical world. It can be said that this is a technical intuition.

Liu Shuquan: You mentioned a very important point, the direct connection between the perception model and the planning and control model. Do you have any tips to share in this area?

Zhou Guang: Consider a biological analogy. The human brain is certainly a neural network, but it is also divided into modules such as the perception, vision, and language centers. Today's end-to-end system is likewise composed of modules with different functions, but they are all directly connected. Making that work depends on your training methods, your training steps, and your data. That is the real core competitiveness today, not the networks themselves.

Liu Shuquan: Today we have an end-to-end large model, but it has too many parameters, the model is too large, and our computing power is limited. How can it be reasonably "slimmed down" and deployed in the car?

Zhou Guang: Today's end-to-end large model is not entirely Transformer-based, so its demand for computing power is relatively modest. In addition, an end-to-end system does not have to be big. Our product this time is called DeepRoute IO, where IO stands for input, output: you give it input, it gives you output, and there is no hand-written programming in between. End-to-end and large models are two different things, and you choose a reasonable model size based on your data, the capacity of your network, and the scenario you want to cover. Of course, basic model optimization and pruning are fundamental skills.
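
The interview does not say which model-reduction techniques Yuanrong uses. As one common example of what trimming a model can mean, the Python sketch below shows magnitude pruning, which zeroes the weights with the smallest absolute values. The function name `magnitude_prune` and the tiny weight matrix are hypothetical, chosen only for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out roughly the fraction `sparsity` of weights with the smallest magnitude.

    `weights` is a list of rows (a plain 2D list). Ties at the threshold are all
    pruned, so slightly more than the requested fraction may be removed.
    """
    magnitudes = sorted(abs(w) for row in weights for w in row)
    k = int(len(magnitudes) * sparsity)  # number of weights to remove
    if k == 0:
        return [list(row) for row in weights]
    threshold = magnitudes[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


# A toy 2x2 layer: pruning half of it removes the two smallest weights.
layer = [[0.9, -0.05], [0.2, -0.7]]
pruned = magnitude_prune(layer, sparsity=0.5)
```

The pruned layer keeps its shape, so the saving only materializes if the deployment runtime can exploit the zeros; in practice a pruned model is usually fine-tuned afterwards to recover accuracy.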

Liu Shuquan: In autonomous driving, we often encounter special scenarios and have to face many uncertain factors, such as traffic, pedestrians, and bicycles.

Zhou Guang: Earlier predictions were based on velocity extrapolation, that is, assuming constant velocity or using some second-order derivatives, which is a relatively rudimentary approach. Data-driven, end-to-end prediction covers much richer scenarios. For example, a person standing on a pedestrian refuge island is unlikely to step down into the road, while a person at an intersection has a higher probability of running out. The model considers what happens before and after across the whole scene, so the car drives in a very "human" way.
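
For reference, the rudimentary baseline Zhou Guang describes can be written down in a few lines of Python. This is a generic kinematic sketch, not Yuanrong's code: the function names are hypothetical, and reading the "second-order" variant as a constant-acceleration term is an assumption.

```python
def predict_constant_velocity(pos, vel, horizon_s, dt):
    """Extrapolate future (x, y) positions assuming the velocity never changes."""
    steps = int(round(horizon_s / dt))
    return [(pos[0] + vel[0] * k * dt, pos[1] + vel[1] * k * dt)
            for k in range(1, steps + 1)]


def predict_constant_acceleration(pos, vel, acc, horizon_s, dt):
    """Second-order variant: p + v*t + 0.5*a*t^2 with a fixed acceleration."""
    steps = int(round(horizon_s / dt))
    out = []
    for k in range(1, steps + 1):
        t = k * dt
        out.append((pos[0] + vel[0] * t + 0.5 * acc[0] * t * t,
                    pos[1] + vel[1] * t + 0.5 * acc[1] * t * t))
    return out


# A pedestrian walking at 1 m/s along x, predicted 2 s ahead in 0.5 s steps.
cv_path = predict_constant_velocity((0.0, 0.0), (1.0, 0.0), horizon_s=2.0, dt=0.5)

# The same pedestrian speeding up at 2 m/s^2, predicted 1 s ahead.
ca_path = predict_constant_acceleration((0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
                                        horizon_s=1.0, dt=0.5)
```

Both functions see only the agent's own state. Nothing in their inputs distinguishes a person on a refuge island from a person at an intersection, which is exactly the scene context a data-driven predictor can take into account.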

Liu Shuquan: Just now, Dr. Zhou Guang mentioned a vision of opening the door to general artificial intelligence in the physical world. Tencent also has a vision: to do a good job as a digital assistant, in underlying cloud services, in underlying map services, and in infrastructure for large models. We will work together to build a partnership ecosystem and jointly open the door to the physical world, which I think is a great goal for us.

Zhou Guang: I think we will continue to work together across the entire industrial chain and ecosystem, aim for a win-win outcome, and move toward that goal.
