
Robin Li "stands on the platform" for Jiyue: will pure vision become the mainstream of autonomous driving?

Author: DoNews

Written by | Lee Shin Ma

Title image | Degrees

After many years, Baidu founder Robin Li has once again "stood on the platform" for autonomous driving, lending it his personal endorsement.

On April 15, he and Jiyue CEO Xia Yiping held a live online broadcast. In a Jiyue 01, the autonomous driving system took over the driver's duties, while the two men played to their human strengths, chatting and interacting throughout the trip.

On Shenzhen's urban roads, over a journey of nearly an hour, there was almost no human takeover, and the system performed on par with, if not better than, a human driver, with no embarrassing failures whatsoever.

Xia Yiping called it "the strongest intelligent driving on the pure-vision route," while Robin Li said it was good enough to "benchmark" against Tesla, and in China even better than Tesla: "In China, (Tesla) wouldn't dare to drive like this."

Tesla gets a specific mention because the two automakers follow the same pure-vision technology route: cameras serve as the main, or only, "eyes" of the autonomous vehicle, and artificial intelligence recognizes what they see and drives accordingly.

There are multiple technical routes on the autonomous driving track. The advantage of pure vision is that cameras are cheap compared with lidar, millimeter-wave radar, ultrasonic sensors, and the like; the disadvantage is that cameras are easily affected by lighting conditions. In backlight, fog, heavy snow, and similar situations, a camera's recognition ability can degrade, and many of Tesla's accidents have occurred in just such conditions. In addition, when it comes to building a three-dimensional picture of the world, the pure-vision approach has to generate it from two-dimensional images, leaving it "congenitally deficient" in accuracy and robustness.
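
To see why, consider the pinhole camera model: projecting a 3D point to a 2D pixel discards depth, so a vision-only system has to infer the missing dimension with learned priors. A minimal Python sketch of the ambiguity, with illustrative intrinsics (the matrix K below is hypothetical, not from any production camera):

```python
import numpy as np

# Hypothetical pinhole intrinsics: focal length and principal point in pixels.
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])

def back_project(u, v, depth):
    """Lift pixel (u, v) to a 3D point in the camera frame, given a depth guess."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing-ray direction
    return ray * depth                              # any depth fits the same pixel

# The same pixel corresponds to very different 3D positions as depth varies:
for d in (10.0, 20.0, 40.0):  # meters
    print(d, back_project(1200, 600, d))
```

Every point along the viewing ray lands on the same pixel. Lidar resolves this by measuring distance directly; a pure-vision system must learn to estimate it, which is where the accuracy and robustness concerns come from.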

Autonomous driving demands absolute safety, so for a long time multi-sensor fusion solutions were the more competitive option. Their biggest problem is that the sensors are expensive, and the industry has pinned its hopes on technological progress and mass production bringing sensor prices down far enough for large-scale adoption.

Interestingly, artificial intelligence, especially neural networks and large models, is advancing faster than hardware costs are falling, which is why the pure-vision approach now looks poised to become the mainstream technical route for autonomous driving.

1. The front-runners on the pure-vision route

On this route, Tesla is the pioneer and the leader. March 2024 was an important milestone: on the 13th, Tesla began pushing the FSD v12.3 software update, which CEO Elon Musk called a "major release" on a par with a full version upgrade. It brought a fundamental change at the algorithmic level, moving from hand-coded rules and conventional machine-learning models to an end-to-end neural network.
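
As a rough illustration of what "end-to-end" means here, the sketch below maps camera frames directly to control outputs in a single network, instead of chaining hand-coded perception, prediction, and planning modules. This is a toy PyTorch example under our own assumptions, not Tesla's actual architecture:

```python
import torch
import torch.nn as nn

class EndToEndDriver(nn.Module):
    """Toy end-to-end policy: camera frames in, control commands out."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(              # visual feature extractor
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 2)                # [steering, acceleration]

    def forward(self, frames):
        return self.head(self.backbone(frames))

model = EndToEndDriver()
frames = torch.randn(1, 3, 224, 224)                # one RGB frame
steering, acceleration = model(frames)[0]
print(float(steering), float(acceleration))
```

The practical consequence is that behavior improves by retraining on fleet data rather than by editing rules, which is what makes the rapid over-the-air iteration discussed below possible.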

Judging from the review videos available so far, this version makes significant progress in autonomous driving capability, approaching L4-level autonomy. Musk deliberately opened a one-month free trial, which says plenty about his confidence in the release. For now, however, the service cannot be experienced in China, and China's road environment also differs significantly from that of the United States.

That makes Jiyue, which follows the same pure-vision route, arguably the domestic automaker closest to Tesla. On the 25th, Jiyue released its OTA V1.4.0 software update and announced that in 2024, with the support of Baidu Map LD (lane-level navigation), Jiyue's PPA intelligent driving will open up nationwide.

The headline upgrade in the new version is the OCC occupancy network, which greatly improves perception. On top of matching the centimeter-level 3D modeling attributed to lidar, the range of recognizable obstacles has grown again: single static obstacles such as construction signs, fences, barricades, and anti-collision barrels, along with barriers around temporary roadwork, broken-down vehicles parked at the roadside, and temporarily stacked items such as large garbage bins.
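
In rough terms, an occupancy network predicts, for every voxel of space around the vehicle, whether that voxel is occupied, which lets the planner react even to obstacles it was never trained to classify. A minimal sketch of the output side (grid size, resolution, and feature dimensions below are illustrative assumptions, not Jiyue's actual OCC):

```python
import torch
import torch.nn as nn

# Illustrative voxel grid: 100 m x 100 m x 8 m around the car at 1 m resolution.
X, Y, Z = 100, 100, 8

class OccupancyHead(nn.Module):
    """Maps fused camera features to per-voxel occupancy probabilities."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.decoder = nn.Linear(feat_dim, X * Y * Z)

    def forward(self, features):                 # features: (batch, feat_dim)
        logits = self.decoder(features).view(-1, X, Y, Z)
        return torch.sigmoid(logits)             # occupancy probability per voxel

head = OccupancyHead()
features = torch.randn(1, 256)                   # stand-in for image features
occ = head(features)
print(occ.shape)                                 # torch.Size([1, 100, 100, 8])
# The planner can treat any voxel above a threshold as an obstacle,
# whether or not it matches a known category like "car" or "pedestrian".
```

This class-agnostic view is exactly why the obstacle list above can keep growing: a garbage bin or a construction fence does not need its own detector, it just needs to occupy space.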

The stronger OCC perception in turn brings a significant upgrade to the point-to-point pilot assist (PPA), which can now cope with more complex driving scenarios: actively detouring around temporary roadwork at intersections, stopping in time, and planning routes sensibly under complex road conditions. These capabilities were on full display in Robin Li's livestream.

2. Large models drive the leap in intelligent driving

Autonomous driving has been in development for decades. Why is the pure-vision approach suddenly racing toward deployment? The answer is large models.

Studies have suggested that for an autonomous driving system to truly reach mass-production readiness, it needs to be proven over at least roughly 17 billion kilometers of road. The reason is that even though existing technology can handle more than 95% of common driving scenarios, the remaining 5% of corner cases can still cause problems. (In autonomous driving, a corner case is a scenario the model has never seen before and that causes it to misrecognize.)

In general, learning a new corner case requires collecting more than 10,000 samples, and the full cycle takes over two weeks. Even a team with 100 autonomous vehicles running road tests around the clock would need time measured in centuries to accumulate the required data, which is obviously unrealistic.
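
A quick back-of-the-envelope check of that claim, taking the 17-billion-kilometer figure from the studies above and assuming an average test speed of 60 km/h (the speed is our assumption):

```python
TARGET_KM = 17e9          # validation mileage cited above
FLEET = 100               # test vehicles
HOURS_PER_DAY = 24        # round-the-clock testing
AVG_SPEED_KMH = 60.0      # assumed average speed

km_per_year = FLEET * HOURS_PER_DAY * 365 * AVG_SPEED_KMH
print(f"{km_per_year:.2e} km/year")            # ~5.26e7
print(f"{TARGET_KM / km_per_year:.0f} years")  # ~323
```

Roughly three centuries even under generous assumptions, which is why the industry turned to simulation and generated data rather than raw road miles.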

The emergence of ChatGPT showed the enormous potential of large models across industries, autonomous driving among them. Without unpacking the specific technology here, the description of its effect from the recently released Huawei Pangu automotive model captures it well: "The Pangu automotive model reshapes autonomous driving training. It can reconstruct driving data into a flexibly editable virtual space; for example, in the road space generated from video of Huawei's Dongguan campus, vehicles driving in the opposite direction can be added along a designated path. The model builds different lighting, weather, and buildings around an overtaking route and quickly generates nearly 100 samples, letting the model better learn to handle corner cases in complex overtaking scenarios."

By quickly reconstructing real scenes and generating corner cases across all kinds of complex scenarios for training, the Pangu automotive model shortens the closed-loop cycle for an autonomous driving corner case from more than two weeks to two days.
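
The underlying idea can be sketched as a parameter sweep over a reconstructed scene: hold the difficult event fixed and vary everything else. The scene fields and values below are invented for illustration; Pangu's actual pipeline has not been published in this detail:

```python
from dataclasses import dataclass, replace
from itertools import product

@dataclass(frozen=True)
class Scene:
    """A reconstructed driving scene with editable conditions (illustrative)."""
    event: str                  # the fixed corner case
    lighting: str
    weather: str
    oncoming_speed_kmh: float

base = Scene(event="overtake_with_oncoming_car", lighting="noon",
             weather="clear", oncoming_speed_kmh=60.0)

# Sweep conditions around the fixed event to synthesize training variants.
lightings = ["dawn", "noon", "dusk", "night"]
weathers = ["clear", "rain", "fog", "snow"]
speeds = [40.0, 60.0, 80.0, 100.0, 120.0]

variants = [replace(base, lighting=l, weather=w, oncoming_speed_kmh=s)
            for l, w, s in product(lightings, weathers, speeds)]
print(len(variants))            # 80 synthetic samples from one real scene
```

Generating on the order of a hundred labeled variants from a single recorded scene, instead of waiting to encounter each combination on the road, is what compresses the two-week collection cycle to days.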

As the premium brand launched by Geely and Baidu, Jiyue draws its autonomous driving capability from Baidu. On the 25th, Baidu also released the Apollo autonomous driving vision large model VTA (Vision Takes All), which substantially upgrades dynamic and static detection, temporal tracking, real-time mapping, scene understanding, and other core capabilities. According to Wang Liang, chief R&D architect of Baidu's Intelligent Driving Group and chairman of the IDG Technical Committee: "Based on large models, Baidu has built the industry's first intelligent-driving data production line, with an LLM-enabled autonomous driving data index. At the same time, through generative AI, Baidu can efficiently process long-tail data. These are the key data engines pushing end-to-end autonomous driving forward."
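
What an "LLM-enabled data index" might look like in miniature: embed natural-language descriptions of driving clips, then retrieve long-tail scenes with a semantic query. Everything below, from the embedding model to the clip captions, is our own illustrative assumption; Baidu has not published VTA's internals:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # stand-in embedder

# Hypothetical clip captions, as an auto-captioning model might produce them.
clips = [
    "clear highway, light traffic, lane keeping",
    "urban intersection, temporary construction, workers on the road",
    "night rain, pedestrian crossing outside the crosswalk",
    "parking lot, shopping carts blocking the lane",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
index = model.encode(clips, normalize_embeddings=True)

def search(query, k=2):
    """Return the k clips most semantically similar to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = index @ q                    # cosine similarity (vectors normalized)
    return [clips[i] for i in np.argsort(-scores)[:k]]

# Mine the fleet's recordings for a long-tail scenario described in plain words:
print(search("roadwork blocking my lane in the city"))
```

The point of such an index is that engineers can pull exactly the long-tail data a model is weak on, rather than sifting petabytes of routine highway footage.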

The crucial point of large models for getting autonomous driving onto the road is that intelligence can keep improving through software upgrades alone, with no hardware changes, which is an obvious boon for the low-cost pure-vision route. Cameras are the cheapest of the sensors, meaning they are the easiest to adopt at scale and to reap economies of scale from, and the more vehicles ship with vision-only systems, the greater the cost and performance advantages become.

Here are a few excerpts from Robin Li's remarks during the livestream:

"Because it's an online upgrade, it's going to get smarter and smarter. ”

"Once you run, a lot of data is fed back into the positive cycle. This car, you should drive a new version of the car every day, is this feeling, just like buying a new car every day. ”

"In the future, it will learn all kinds of information about you, your preferences, and it will completely become a robot that understands you very well and understands you very well. ”


Of course, the money saved on hardware may later be spent in the form of software fees. Tesla's FSD, for example, currently sells in the United States for $15,000 outright or $199 per month. In the long run, though, charging individual owners is mainly a transitional stage; driverless shared robotaxis are the most likely end state of the industry.

In this area, too, Baidu and Tesla are out in front: the former's Luobo Kuaipao (Apollo Go) robotaxi service is progressing steadily, and the latter has just revealed fresh progress on its own robotaxi plans. Taken as a whole, pure vision is currently the technical route closest to achieving that end state.

