
From machine vision to data intelligence: how Tesla and Haomo Zhixing found the technical pathways to autonomous driving


In 2013, McKinsey named 12 disruptive technologies that would shape future economic development, and driverless technology was one of them.

Strictly speaking, driverless technology is not a single technology but a cluster: it integrates achievements in artificial intelligence, cognitive science, automatic control, sensing and other fields. As the most advanced complex of digital information technologies, it is often called the crown jewel of artificial intelligence.

The industrial opportunities and economic value that autonomous driving can unlock are hard to overstate: how we travel, move goods, build transportation infrastructure, plan cities and even deliver social services will all be reshaped. The robot world of science-fiction films may also come to life once autonomous driving is realized.

Yet alongside the excitement there is always a nagging doubt: as a mobility technology that carries human lives, can autonomous driving truly mature, land commercially and reach ordinary people?

Written by Micro..

Rather than debating the possibility of autonomous driving in the abstract, it is better to return to the development of the technology itself and review its history in detail, so as to understand the stages today's autonomous driving is moving through and the challenges it faces.

In fact, the concept of autonomous driving dates back to the 1950s. The technical framework common today, however, was born in the 1980s, and the prototype of the technology that makes modern autonomous driving shine appeared around 2005 at DARPA's second driverless challenge in the United States; its landmark contribution was the application of AI techniques to driverless vehicles.

After 2009, autonomous driving settled into the technical framework of multiple sensors plus computing power plus driving algorithms. With the successive arrival of Google's Waymo driverless cars and Tesla's Autopilot (AP) and Full Self-Driving (FSD), and with Chinese autonomous driving companies and new carmakers entering the field, technical routes driven by computing power, large-model algorithms and mass-production data have become the trend.

DARPA+CMU: The "Pioneer" of Autonomous Driving

Autonomous driving, as the name suggests, is a vehicle's ability to perceive, judge and decide how to drive on its own. The earliest prototype that could be called a "driverless car" was RCA's 1958 demonstration in the United States, which guided the vehicle's speed, direction, acceleration and deceleration through coils embedded in the road. That vehicle was "driverless" only in form: it was neither autonomous nor intelligent.

The real milestone for driverless cars was a bulky truck developed by Carnegie Mellon University (CMU) beginning in 1983, with the support of the U.S. Defense Advanced Research Projects Agency (DARPA).


(In 1984, DARPA partnered with the U.S. Army to launch the ALV program.)

For the first time, a vehicle used lidar, computer vision and automatic control to perceive its surroundings, make decisions based on that perception, and control itself, reaching a top speed of 31 km/h in a specific road environment. Its most important contribution was establishing the technical framework of autonomous driving: perception, decision-making and control.
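The perception–decision–control loop described above can be sketched in a few lines. Everything here (the function names, the 20 m safety zone, the proportional gain) is invented for illustration and not taken from any real system:

```python
# Minimal sketch of the sense-decide-act loop the 1980s ALV established.

def perceive(distance_to_obstacle_m: float) -> dict:
    """Turn a raw sensor reading into a tiny world model (one obstacle)."""
    return {"obstacle_m": distance_to_obstacle_m}

def decide(world: dict, cruise_speed_kmh: float = 31.0) -> float:
    """Pick a target speed: slow down linearly inside a 20 m safety zone."""
    d = world["obstacle_m"]
    if d >= 20.0:
        return cruise_speed_kmh
    return max(0.0, cruise_speed_kmh * d / 20.0)

def control(current_kmh: float, target_kmh: float, gain: float = 0.5) -> float:
    """Proportional controller: move part of the way toward the target."""
    return current_kmh + gain * (target_kmh - current_kmh)

speed = 0.0
for reading in [50.0, 30.0, 15.0, 5.0]:  # an obstacle closing in
    speed = control(speed, decide(perceive(reading)))
```

Real systems replace each of these stubs with enormous subsystems, but the three-stage skeleton is the same one established in the 1980s.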

Before AI techniques could be applied to autonomous driving, however, visual perception in complex environments remained an extremely hard problem. How does a car see and understand the world? Everything was too rudimentary for the technology of the time.

Fast forward to 2004: DARPA held the first public autonomous driving competition, with $1 million as the prize. But the on-board software and hardware of the 15 finalist teams were so rough that no team finished the course that year.

Notably, the team that drove the farthest in that first event arrived at the venue months before the start, photographed the course and converted the imagery into code; during the run, data sensed by radar and cameras was processed by an on-board computer. This is already close to today's mainstream autonomous driving approach.


(Vehicles in the DARPA autonomous driving competition)

DARPA held its second autonomous driving competition in 2005, which drew more teams and attracted many technology companies and algorithm experts. The 2005 challenge was a tipping point in the development of autonomous driving: five driverless cars using AI recognition systems successfully completed the desert course under harsh road conditions. Stanford University took first place by pioneering the use of machine learning to process road images, and a CMU vehicle named H1ghlander successfully overtook a human-driven vehicle.

At this stage, the combination of sensors, cameras, computers and perception algorithms began to take shape, becoming the standard paradigm for autonomous driving development ever since.

Next came the second stage: the period of technical accumulation by the early pioneers of autonomous driving.

Waymo and Tesla: "Parting Ways" or "Going Together"

DARPA sounded the clarion call for the autonomous driving industry, and the first to charge was Google.

In 2009, Google recruited engineers from the winning teams of DARPA's last two challenges to form Project Chauffeur, officially entering the field of autonomous driving.

In 2015, Chris Urmson, then head of Google's self-driving program, quipped in a TED talk that trying to reach full autonomy by iterating on driver assistance is like "believing that if you practice jumping hard enough, one day you will fly."

Indeed, during this period Google chose the direct technical route to L4 driverless technology. In its product architecture, Google — which later spun the project off as Waymo — adopted the sensor stack of "high-definition map + lidar + camera"; the rich sensor suite, combined with Google's accumulation in mapping, let Waymo's driverless vehicles make a splash on the roads of California and Phoenix.


(Road test vehicle equipped with Google Waymo autonomous driving technology)

After Waymo came Tesla's Autopilot driver-assistance system, now equally familiar to the public. Early Tesla also used a camera-plus-sensor solution but resolutely refused "ugly and expensive" devices such as lidar; today it has gone further, pursuing autonomy with a pure-vision approach.

Unlike Waymo's "all-at-once" path, Tesla follows a progressive strategy from L2 to L4. Leveraging its mass-produced fleet, Tesla obtains massive driving data through FSD, greatly improving the capability of its high-end driving system. These two points are where Waymo and Tesla chiefly "part ways."

At the level of the driving algorithms themselves, however, the two companies show a "different roads, same destination" side.

On algorithms, Google Waymo uses AutoML to search neural network architectures: the architecture is not fixed but screened against accuracy and inference cost, and on that basis the pipeline of data collection, labeling, evaluation, verification, testing and deployment runs in continuous iteration.
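The screening step described above — keeping only candidate architectures that trade off well between accuracy and inference cost — can be illustrated with a toy Pareto filter. The candidate networks and numbers below are made up; this is not Waymo's actual AutoML code:

```python
# Keep only candidates that are not dominated (i.e., no other candidate is
# at least as accurate AND at least as cheap, and strictly better on one).

def pareto_front(candidates):
    """candidates: list of (name, accuracy, cost_ms).
    Higher accuracy is better; lower cost is better."""
    front = []
    for name, acc, cost in candidates:
        dominated = any(
            a >= acc and c <= cost and (a > acc or c < cost)
            for _, a, c in candidates
        )
        if not dominated:
            front.append(name)
    return front

candidates = [
    ("net_a", 0.91, 12.0),
    ("net_b", 0.89, 20.0),  # less accurate AND slower than net_a: dropped
    ("net_c", 0.94, 30.0),  # more accurate but slower: kept
]
survivors = pareto_front(candidates)  # → ['net_a', 'net_c']
```

Only the surviving architectures would go on to the expensive stages of training, verification and deployment.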


(HydraNet architecture demonstrated by Tesla AI DAY)

Compared with Waymo, Tesla applies vision algorithms more deeply. Its cameras feed a multi-task neural network architecture, HydraNets: a RegNet residual network and a BiFPN feature pyramid first extract image features at multiple scales and precisions, and a Transformer deep neural network then fuses images across time, producing output with 3D information from 2D images. This is what lets Tesla dare to abandon lidar and rely solely on cameras to perceive and reconstruct the real world.

Clearly, Waymo's multi-sensor solution copes better with complex environments, but its mass-production capacity and data-acquisition costs limit the speed of its deep-learning evolution. Tesla can draw massive data from a huge production fleet, and its strong algorithms largely compensate for the lack of lidar in scene construction, yet it still falls short in complex environments: in the early days there were traffic accidents caused by misidentification and by drivers failing to take over in time while Autopilot was engaged.


(Tesla released the AI training chip D1 on AI DAY)

Both Tesla and Google attach great importance to deep-learning capability for autonomous driving, but Tesla, beyond its in-vehicle HW4.0 chip, has also built the cloud-based Dojo supercomputer to supply the computing power that efficiently marries its Transformer models with massive data — which makes Tesla's prospects all the more promising. In Musk's view, Tesla is already as much a software company and a robotics company as a carmaker, and autonomous driving is what gives Tesla this humble-brag confidence.

The comparison between Google and Tesla reveals the trend in autonomous driving: now that computing performance has entered a period of steady growth, the keys to competition are large-model AI training, which is determined by algorithm capability and by the volume of data that mass production can supply.

"Wei Xiaoli" and Haomo Zhixing: Rising Stars Who Avoid Detours

If the autonomous driving industry is an ocean yet to be developed, then the success of Waymo and Tesla FSD has undoubtedly become a weather vane other companies can steer by. Under this inspiration, Baidu Apollo, AutoX and Pony.ai follow the Waymo model, while new carmakers such as "Wei Xiaoli" (NIO, Xpeng and Li Auto) follow the Tesla model. The "carmaker plus autonomous driving team" model has become highly representative at the current stage.

From the standpoint of technical evolution alone, large-scale road-test data has become the ultimate barrier each company must cross in the "last mile" of autonomous driving.

On the two routes to the ultimate "holy grail" of driverless mobility, we can see that the all-at-once players, despite ambitious launches of self-driving mobility products, still face growing pains: limited production scale, insufficient road-test data and limited environmental adaptability.

By contrast, the progressive players who start from driver assistance and conquer limited scenarios step by step are iterating faster and faster, thanks to a scalable mass-production business model and vast amounts of real road data — a very strong trend.

Backed by Great Wall Motors and its million-unit production and sales scale, Haomo Zhixing is among the youngest autonomous driving companies in China, yet it has progressed fastest and been quickest to put driver-assistance products into cars, making it the start-up most worth watching at the current stage.

In judging the development of the technology, Gu Weihao of Haomo Zhixing also locates its core in data. He believes data is both the biggest driving force of artificial intelligence and the biggest cost of its progress. Improving an autonomous driving product is a long evolutionary process: like Homo sapiens over history, it must find ways to sustain itself at the lowest energy cost, so that it has the chance to develop intelligence, accumulate experience and evolve.


For this iterative evolution, Haomo launched the data intelligence system MANA and built its own autonomous driving technology stack around it.

We can briefly compare the technical ideas of Haomo and Tesla. In the dispute between Tesla's pure-vision route and the multi-sensor route, Haomo Zhixing chose the more pragmatic latter, using Transformer deep neural networks to fuse data along three dimensions: space, time and sensor.

On visual data, Haomo's idea resembles Tesla's HydraNet: images from the cameras first pass through an ISP, then through a shared backbone network for feature extraction, and the resulting features are routed to different heads for global tasks, road tasks and target tasks. The tasks share the backbone, but each has its own independent neck network to extract task-specific features. Unlike HydraNet, MANA also designs a neck network for the global tasks to extract global information — an important choice, because global tasks rely heavily on scene understanding, which in turn depends on global information.
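The shared-backbone, multi-head pattern described above can be sketched structurally. This toy Python sketch (all names and thresholds invented, with trivial arithmetic standing in for the networks) only shows the idea: features are computed once and reused by every task head:

```python
# Illustrative sketch of the multi-task pattern: one shared backbone,
# several task heads reusing the same features.

def backbone(image):
    """Stand-in for the shared feature extractor: reduce an 'image'
    (a list of pixel rows) to a small shared feature vector."""
    flat = [p for row in image for p in row]
    return [sum(flat) / len(flat), max(flat), min(flat)]

def lane_head(feat):      # one task head (a stand-in "neck + head")
    return "lane" if feat[0] > 0.5 else "no_lane"

def object_head(feat):    # another head, reusing the same shared features
    return "object" if feat[1] > 0.9 else "clear"

image = [[0.2, 0.8], [0.9, 0.7]]
feat = backbone(image)    # the expensive part runs exactly once
results = {"lane": lane_head(feat), "objects": object_head(feat)}
```

The payoff of this structure is that the expensive backbone runs once per frame, while adding a new task only costs one more lightweight head.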

On sensor data, Haomo uses the industry-standard PointPillar algorithm, which projects three-dimensional lidar information into two dimensions so that feature extraction and object detection similar to visual tasks can run on 2-D data. The advantage of this approach is that it avoids enormous amounts of three-dimensional convolution, making the algorithm very fast overall.
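The core idea of pillarization — collapsing 3-D lidar points into a 2-D grid so that fast 2-D methods can run on it — can be shown with a minimal sketch. The grid size and the per-pillar "max height" feature are illustrative choices, not PointPillar's real learned encoding:

```python
# Drop each 3-D lidar point into a 2-D (x, y) grid cell ("pillar"),
# then summarize the vertical dimension into a per-cell feature.

from collections import defaultdict

def pillarize(points, cell=1.0):
    """points: list of (x, y, z). Returns {(ix, iy): [z values]} —
    a 2-D grid whose cells collect the heights of the points in them."""
    grid = defaultdict(list)
    for x, y, z in points:
        grid[(int(x // cell), int(y // cell))].append(z)
    return grid

points = [(0.2, 0.3, 1.5), (0.7, 0.1, 2.0), (3.4, 3.9, 0.5)]
grid = pillarize(points)
# Two points share pillar (0, 0); a simple per-pillar feature is max height:
heights = {cell: max(zs) for cell, zs in grid.items()}
```

A 2-D detector can then treat the grid like an image, which is exactly what makes the method fast: no 3-D convolutions are ever needed.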

On perception fusion, the industry habitually processes visual and lidar data separately and merges the results in "post-fusion," which prevents the neural network from exploiting the complementarity of two heterogeneous sensors to learn the most valuable features. MANA therefore introduced Transformer-based pre-fusion in space and time: first, the Transformer encodes image features and decodes them into three-dimensional space, with the coordinate-system transformation embedded in the self-attention computation, achieving spatial pre-fusion; second, time-series data, a traditional strength of Transformers, naturally yields temporal features.
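At the heart of this pre-fusion is the self-attention operation: queries from one feature stream are matched against keys and values from another stream. A minimal sketch with toy features and no learned weights, purely to show the Q·K softmax·V shape (the vectors here are invented, not real sensor features):

```python
# Scaled dot-product attention over lists of small feature vectors.

import math

def attention(queries, keys, values):
    """Each argument is a list of feature vectors (lists of floats)."""
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
                  for k in keys]
        exps = [math.exp(s) for s in scores]
        weights = [e / sum(exps) for e in exps]  # softmax over the keys
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# One "camera" query attending over two "lidar" key/value tokens; the query
# aligns with the first token, so the output leans toward the first value.
fused = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]])
```

In a real pre-fusion network the queries, keys and values come from learned projections of the two sensor streams, and the coordinate transformation between camera and 3-D space is folded into the same computation.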


Beyond perception, Haomo Zhixing has also laid out the cognitive field: through scene digitization and large-scale reinforcement learning, it learns and trains on human driving behavior in different environments while guaranteeing the three elements of safety, comfort and efficiency. This requires not only large amounts of real driving data but also that the system annotate, simulate and verify the data itself — which is exactly where Great Wall Motors' mass-production advantage and the data intelligence system MANA at Haomo's core come in. With the massive data that mass production brings, Haomo can rapidly iterate its algorithms and deliver an autonomous driving system covering ever more scenarios.

From Google to Tesla, and on to domestic players such as the "Wei Xiaoli" carmakers and Haomo Zhixing, twenty years of rapid development and deployment have made one thing clear: relying on the scale effect of mass-produced vehicles to supply massive data for system iteration has become the industry's generally recognized path to high-level autonomous driving.

Who will pick the "Crown Jewel"?

Constrained by space, we have only quickly surveyed the stages of autonomous driving's development, but the decisive role of AI algorithms such as deep learning is already clear.

In this process, Google Waymo and Tesla have played the role of technology route leaders, and their technical practices have also provided a valuable reference for latecomers.

Back in the present, the autonomous driving algorithms trained by large AI models on massive data are becoming the only way to claim the "crown jewel" of fully driverless mobility.

It is foreseeable that the autonomous driving companies able to obtain massive data and keep pushing data intelligence forward will be the lucky ones who laugh last.
