What would it look like if a robot had a brain? How does Tesla train for autonomous driving? The "cloud driver" neural network, Autopilot getting rid of its crutches, and Dojo simulating the limits

Author: Death Little Tak

What would it look like if a robot had a brain?

In the science fiction film "Ex Machina", Nathan, CEO of the world's largest search engine company "Blue Book", shows off the robot brain he invented and leaves this line: "People thought search engines were a map of what people were thinking, but actually they were a map of how people were thinking."

Released in 2015, the film is regarded as must-see viewing for AI enthusiasts and collected a number of international film awards, including the Academy Award for Best Visual Effects. Alicia Vikander plays the intelligent robot "Ava" in the film.

"Ava" is the name Nathan gave to "her", in order to create an artificial intelligence that can think independently, Nathan uses the algorithm of his search engine "Blue Book" to build the "thinking" of Ava's brain, so that it can learn the way humans think.

Coincidentally, the ambition to give machines a human-like mind can also be seen in Tesla's self-driving AI. At Tesla's Autonomy Day in 2019, Andrej Karpathy, Tesla's head of AI, made it clear that Autopilot imitates human driving, because today's transportation system is built around human vision and cognition.

Accordingly, Tesla built an artificial neural network and trained it on large volumes of real driving data, continuously improving and iterating its vision algorithms along the way. In the middle of this year it removed the millimeter-wave radar, and with the Dojo supercomputer now out in the open, Tesla, long criticized as offering mere driver assistance, is one step closer to true autonomous driving.

From learning to drive, to driving better than humans: becoming an excellent "old driver" (a seasoned driver, in Chinese internet slang) is the underlying logic behind Tesla's continuous optimization of Autopilot.

<h1>A neural network as the "cloud driver"</h1>

A pure-vision approach to self-driving is Tesla's signature skill, but it has to be built on deep training in computer vision.

Computer vision is the science of how machines "see". When a human looks at a picture, they immediately recognize what is in it: a beautiful landscape, say, or a photo of a puppy. What the computer sees, however, is pixels: the small squares that make up an image, each with a definite position and a color value. What the computer "remembers" is this pile of numbers, not the things themselves.
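
To make this concrete, here is a minimal sketch (plain Python with NumPy, not Tesla code) of what a camera frame looks like to a computer: nothing but a grid of numbers.

```python
import numpy as np

# A tiny 4x4 "image" with 3 color channels (RGB), values 0-255.
# To a human this could be part of a photo; to the computer it is
# nothing more than this array of integers.
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

print(image.shape)   # (4, 4, 3): height, width, channels
print(image[0, 0])   # the color value of the top-left pixel, e.g. [123  45 201]
```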

If you want the computer to recognize the contents of a picture as quickly and accurately as a human, the machine also needs an artificial "brain" that simulates how the human brain processes visual information. Such an artificial neural network is divided into an input layer, hidden layers and an output layer, containing many artificial neurons that can be thought of as the pyramidal cells and interneurons of the primary visual cortex.
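
As an illustration only (this is a toy, not Tesla's architecture), a three-layer network with an input layer, one hidden layer and an output layer can be written in a few lines of PyTorch:

```python
import torch
import torch.nn as nn

# A toy image classifier: input layer -> hidden layer -> output layer.
# The hidden units play the role the article compares to neurons in
# the primary visual cortex.
model = nn.Sequential(
    nn.Flatten(),                 # turn an image into a flat vector of pixels
    nn.Linear(32 * 32 * 3, 128),  # input layer -> 128 hidden "neurons"
    nn.ReLU(),
    nn.Linear(128, 10),           # hidden layer -> 10 output classes
)

dummy_image = torch.rand(1, 3, 32, 32)   # one fake 32x32 RGB image
scores = model(dummy_image)              # one score per class
print(scores.shape)                      # torch.Size([1, 10])
```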

The whole training process can be compared to a child learning from picture cards: the machine builds up image recognition through repeated input, comparison and correction. Early in training, the accuracy of the network's output is very low; the similarity between its result and the true answer may be only 10%. To improve accuracy, the error is propagated backward from the output layer toward the input layer, and during this backpropagation the parameters of the hidden layers are corrected. After millions of rounds of training the error gradually converges, until the match between input and expected output reaches 99%.
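
Below is a hedged sketch of that loop, reusing the same toy classifier (with fake data standing in for labeled images): compute the error at the output, backpropagate it toward the input, and adjust the weights until predictions and labels match.

```python
import torch
import torch.nn as nn

# Same toy classifier as above, repeated here so the snippet runs on its own.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Fake labeled images standing in for a real training set.
images = torch.rand(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))

for step in range(1000):
    optimizer.zero_grad()
    outputs = model(images)
    loss = criterion(outputs, labels)   # how far the output is from the truth
    loss.backward()                     # propagate the error back toward the input layer
    optimizer.step()                    # correct the hidden-layer parameters
    if step % 200 == 0:
        print(step, round(loss.item(), 4))   # the error shrinks as training repeats
```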

The process above is the key to understanding Tesla's Autopilot AI, except that the neural network Tesla built is focused on driving: a full-time "cloud driver". For it, the best learning material is driving data, and a massive, diverse, real-world driving dataset is the playbook that lets a self-driving AI handle all kinds of road conditions and traffic problems.

With the support of shadow mode, the driving data generated every moment by Tesla's million-vehicle fleet becomes the nourishment that lets the "old driver" in the cloud improve its skills. Today, Tesla Autopilot can instantly perform semantic recognition of the dynamic and static targets, road markings and traffic signs on the road, reacting even faster than a human's conditioned reflexes.

Beyond everyday driving scenarios, the AI driver also has to handle rarer corner cases. At the Matroid Scaled Machine Learning Conference in 2020, Karpathy used the STOP sign as an example to explain how Autopilot copes with these long-tail situations.

In daily driving a vehicle constantly passes all kinds of STOP signs. The normal case is a sign standing at the roadside or in the middle of the road, white letters on a red background. But real life always throws up surprises: drivers occasionally meet strange signs whose meaning can only be understood in context, including but not limited to the following:

Invalid STOP signs, such as one held in someone's hand and therefore meaningless; STOP signs with qualifying text underneath, such as "except right turn"; STOP letters partly hidden by branches or buildings... None of these occur very often, yet there are far too many of them to list.

In the situations above, a human driver can recognize the "STOP" and react quickly in most cases. For the computer things get complicated: after all, it does not see a specific "STOP", only a pile of numbers without inherent meaning, and if it meets a situation that never appeared in its training set, such as the strange, relatively rare signs above, the self-driving neural network cannot handle it.

Such rare long-tail data is effectively endless, yet the system has to learn to handle it in the shortest possible time, and doing everything by hand would cost enormous time and resources. At AI Day on August 20, Karpathy revealed that Tesla's labeling team has grown to around 1,000 people, but against the ocean of driving data even a thousand people are a drop in the bucket. That is why Tesla built offline automatic labeling (Data Auto Labeling) and the automated training framework it calls the "Data Engine".

First, after characterizing such a long-tail situation, Tesla's neural network team compiles a sample dataset and trains a small dedicated network on it (in parallel with the other networks), then deploys it via OTA to Tesla vehicles in English-speaking regions around the world.

Shadow mode is then used again: wherever the driver's actual behavior and the self-driving AI's decision disagree, that car's data is automatically uploaded to Tesla's back-end Data Engine, auto-labeled, folded back into the existing training set, and used to keep training the original network until the new situation is mastered.
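
The description above can be read as a loop. Here is a highly simplified, hypothetical sketch of that loop in Python; every name in it (auto_label, retrain, deploy_ota, the clip fields) is invented for illustration and is not a Tesla API.

```python
# All names below are invented stand-ins; they are not Tesla APIs.

def auto_label(clip):
    """Offline auto-labeling stand-in: treat the human action as ground truth."""
    clip["label"] = clip["driver_action"]
    return clip

def retrain(model, training_set):
    """Placeholder for re-training the neural network on the enlarged set."""
    print(f"retraining on {len(training_set)} labeled clips")
    return model

def deploy_ota(model):
    """Placeholder for pushing the updated model to the fleet over the air."""
    print("deploying updated model via OTA")

def data_engine_iteration(fleet_clips, model, training_set):
    """One hypothetical iteration of the shadow-mode data engine."""
    # 1. Keep clips where the silent (shadow) prediction disagreed with the driver.
    disagreements = [c for c in fleet_clips if c["driver_action"] != c["model_action"]]
    # 2. Auto-label them offline.
    labeled = [auto_label(c) for c in disagreements]
    # 3. Fold them into the training set and keep training the original network.
    training_set.extend(labeled)
    model = retrain(model, training_set)
    # 4. Ship the improved network back to the cars.
    deploy_ota(model)
    return model, training_set

# Example: two uploaded clips, one of which is a disagreement.
clips = [
    {"driver_action": "stop", "model_action": "go"},
    {"driver_action": "go", "model_action": "go"},
]
data_engine_iteration(clips, model=None, training_set=[])
```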

In this way, fed with large amounts of training data, the network becomes "well-traveled" and smarter: it learns to recognize STOP signs under all kinds of conditions, its accuracy climbing from 40% to 99%, completing the learning of a single task.

Yet this is only one static signal. While a car is driving, countless static and dynamic signals appear: static ones such as roadside trees, barricades and telephone poles; dynamic ones such as pedestrians and other vehicles. All of them are captured by the cameras and handed to the neural network for training. Tesla's self-driving system has grown into nine HydraNet backbone networks and 48 neural networks in total, identifying more than 1,000 kinds of targets.
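
The HydraNet idea, a shared backbone feeding many task-specific heads, can be sketched roughly as follows. This is a schematic toy, not Tesla's actual network; the task names and layer sizes are made up.

```python
import torch
import torch.nn as nn

class ToyHydraNet(nn.Module):
    """Schematic multi-task network: one shared backbone, several task heads."""

    def __init__(self):
        super().__init__()
        # Shared backbone: extracts image features once for every task.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Task-specific heads, e.g. traffic signs, vehicles, lane lines.
        self.heads = nn.ModuleDict({
            "traffic_signs": nn.Linear(16, 20),
            "vehicles":      nn.Linear(16, 10),
            "lane_lines":    nn.Linear(16, 4),
        })

    def forward(self, images):
        features = self.backbone(images)
        # Each head produces its own prediction from the shared features.
        return {name: head(features) for name, head in self.heads.items()}

net = ToyHydraNet()
outputs = net(torch.rand(1, 3, 64, 64))
print({name: t.shape for name, t in outputs.items()})
```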

But it is not enough for the self-driving AI merely to learn to drive; it also has to drive as confidently, safely and smoothly as a seasoned human driver.

<h1>Getting rid of the crutches, Autopilot grows up</h1>

Any experienced driver can easily judge the distance to the vehicle ahead under different road conditions and leave an appropriate safety gap.

For sensors, however, judging the distance to an object requires understanding its depth. Otherwise, two identical cars 10 meters and 5 meters away simply look like one small car and one big car.
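
That ambiguity follows directly from the standard pinhole camera model (this is textbook geometry, not something from Tesla's presentations): an object's height on the image is roughly h_image = f * H / Z, for focal length f, real height H and distance Z, so a single image cannot distinguish "small and near" from "large and far" without learned depth cues. A rough numeric sketch with made-up numbers:

```python
# Pinhole-camera intuition: apparent size is inversely proportional to distance.
# The focal length and car height below are illustrative only.
focal_length_px = 1000   # hypothetical focal length expressed in pixels
car_height_m = 1.5       # real height of both (identical) cars

for distance_m in (5, 10):
    apparent_height_px = focal_length_px * car_height_m / distance_m
    print(f"car at {distance_m:>2} m -> about {apparent_height_px:.0f} px tall in the image")

# The car at 5 m appears twice as tall as the identical car at 10 m,
# which is why size in the image alone does not reveal depth.
```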

Here some carmakers choose lidar to measure depth directly, while Tesla chose a pure-vision algorithm that imitates how humans perceive depth. Tesla initially ran a fusion route of millimeter-wave radar plus vision, until this May, when it officially announced that the radar would be dropped and a pure-vision version of Autopilot would go live.

The announcement caused an uproar: many people could not understand why Tesla would remove a radar that costs only about 300 yuan per unit and adds a margin of safety. What they missed is that in Tesla's earlier multi-sensor fusion setup, the millimeter-wave radar was like a child's walker: it existed mainly to help the neural network learn depth annotation.

At Autonomy Day in 2019, Karpathy introduced the millimeter-wave radar this way: "The best way for a neural network to learn to predict depth is to train on depth-labeled datasets, and compared with manually annotated depth, the depth readings fed back by millimeter-wave radar are more accurate." In essence, the radar was brought in to train and improve the network's depth prediction.
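
In other words, the radar acted as an automatic source of depth labels. Below is a hedged sketch of that kind of setup as generic supervised regression, not Tesla's code: a vision feature vector for each detected object is mapped to a depth estimate, and the radar's range reading is used as the training target.

```python
import torch
import torch.nn as nn

# Toy depth head: maps an image feature vector to a single depth estimate.
# This is a generic supervised-regression sketch, not Tesla's implementation.
depth_head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(depth_head.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

# Fake batch: visual features of detected objects + radar range readings (meters).
visual_features = torch.rand(8, 64)
radar_depth_m = torch.rand(8, 1) * 100.0   # the radar supplies the "ground truth" depth

for _ in range(100):
    optimizer.zero_grad()
    predicted_depth = depth_head(visual_features)
    loss = loss_fn(predicted_depth, radar_depth_m)  # radar readings act as labels
    loss.backward()
    optimizer.step()
```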

It is worth noting that the lower-right corner of the slide behind him clearly labeled the radar-assisted algorithm as "Semi-Autonomous Self Driving"; evidently, the Tesla Autopilot of that time was only a half-finished product.

Only when Tesla's vision algorithm could predict the depth, velocity and acceleration of objects well enough to replace the millimeter-wave radar did it truly stand on its own.

At the CVPR conference in June 2021, Karpathy said that the data collected by the millimeter-wave radar had shown intermittent failures and even outright misjudgments. He gave three concrete examples: hard braking by the vehicle ahead, the speed of a car passing under a bridge, and the judgment of a stationary truck at the roadside.

Case one: the vehicle ahead brakes hard, and within a short period the millimeter-wave radar drops the target six times, reporting the lead car's position, velocity and acceleration as zero.

Case two: when a moving car passes under a bridge, the radar lumps the static bridge and the moving car together as a single stationary object, while the vision system is still reporting the moving car's speed and displacement; the fused output then produces the erroneous message that "the car ahead is slowing down and braking".

Case three: a large white truck is parked beside the highway. The pure-vision algorithm spots the white truck 180 m from the ego vehicle and predicts accordingly, but the fusion algorithm does not respond until 110 m, a delay of about 5 seconds.

In all of these cases the pure-vision output was stable and clearly better than the radar-plus-vision fusion, accurately tracking the vehicle ahead and producing estimates of depth, velocity and acceleration.

Moreover, the pure-vision algorithm keeps measuring the speed and distance of the vehicle ahead in fog, smoke, dust and similar conditions, so removing the millimeter-wave radar is not as surprising as it sounds. According to the latest information released at Tesla AI Day, Tesla currently collects about 10,000 short clips per week of people driving in harsh environments, including heavy rain, heavy snow, fog, night and glare. By training on these labeled clips, the neural network can accurately perceive the distance to the vehicle ahead.

In short, Tesla's confidence in announcing the removal of the millimeter-wave radar comes from the maturity of its own pure-vision algorithm, and with unsupervised self-learning on top, the iteration and improvement of that algorithm have clearly accelerated.

On July 10 this year, Tesla's pure-vision FSD officially entered closed beta in the United States; 2,000 invited owners upgraded to FSD Beta V9.0 over the air, most of them Tesla fans and small or mid-sized KOLs. YouTube blogger Chuck Cook (hereafter CC), who also has some engineering and aerospace background, was one of them.

As soon as the update finished, CC ran a road test of the new FSD and uploaded the video to YouTube. In it he drove to a busy, fast-moving T-intersection to test turns; the results showed that FSD completed the turn autonomously in only 1 of 7 attempts, with the rest requiring him to take over the steering wheel.

But soon, when FSD V9.1 was pushed at the end of July, CC found that the upgraded system surprised him. He ran another seven autonomous-driving tests on the same road; 4 of the 7 turns were handled reasonably well, though the cornering speed was somewhat hesitant, lacking the decisiveness of a seasoned driver. Overall, though, the new version scored better than the old one.

On August 16, Tesla pushed FSD V9.2. CC again tested it promptly and uploaded the video, on the same stretch of road but this time at night. He said publicly that the most obvious improvement was Autopilot's acceleration: when turning, it could now accelerate decisively like a human driver.

In the space of about a month, pure-vision Autopilot's performance on the same route improved rapidly, a reflection of the neural network's strong capacity for self-learning. Musk has said that FSD Beta V9.3 and V9.4 are already in the works and will keep optimizing details based on owners' usage, improving the experience and paving the way for major changes in V10.

<h1>Dojo, simulating the limits</h1>

Bear in mind that while you marvel at pure-vision Autopilot's "old driver" maneuvers, most of these road tests take place in North America. In non-English-speaking areas, such as densely populated Asian cities, urban traffic is far more complex than in sparsely populated North America, and how to teach the neural network to cope with those road conditions deserves even more thought.

Collecting data in the field is one approach, but it only works if a large fleet is already driving in the region. The other route is simulation: using real-world data to reconstruct and replay dynamic scenes inside a computer system.

Beyond reproducing traffic in different cities, simulation can also generate extreme scenarios, such as traffic emergencies or vanishingly rare conditions. At AI Day, Tesla's engineers gave concrete examples, including pedestrians running on a highway, large crowds of pedestrians, and extremely narrow driving paths.

Such cases are extreme, and the probability of meeting them in everyday driving is tiny, but that is exactly why simulating them is so valuable: only by training on them can the neural network learn to respond correctly.
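
As a hedged illustration of how such rare scenarios might be described for a simulator, here is a small Python sketch; the schema, field names and scenario names are all invented, not Tesla's simulation format.

```python
from dataclasses import dataclass, field

@dataclass
class SimScenario:
    """Hypothetical description of one simulated corner case."""
    name: str
    weather: str
    actors: list = field(default_factory=list)   # pedestrians, vehicles, barricades...
    duration_s: float = 20.0

# Rare situations that almost never appear in fleet data, but can be
# generated and replayed in simulation as often as training requires.
scenarios = [
    SimScenario("pedestrians_running_on_highway", weather="clear",
                actors=["ego_car", "pedestrian", "pedestrian"]),
    SimScenario("dense_crowd_crossing", weather="rain",
                actors=["ego_car"] + ["pedestrian"] * 80),
    SimScenario("very_narrow_street", weather="night",
                actors=["ego_car", "parked_car", "parked_car"]),
]

for s in scenarios:
    print(f"simulating '{s.name}' with {len(s.actors)} actors for {s.duration_s} s")
```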

For the training to be meaningful, these simulations must faithfully reproduce real scenes: pedestrians, vehicles, greenery, barricades, traffic lights, and nearly every other traffic element you would see on the road. Tesla has already generated 371 million images for training the in-car networks, along with 480 million labels, and the dataset is growing fast.

Bear in mind that the fidelity a simulation can reach is directly tied to the data-processing power the computer can supply. The more capable Tesla wants its simulation to be, the higher the demands on computing power and read/write speed.

Musk said at the WAIC 2020 conference that computer vision has already surpassed the level of human experts, but that the key to realizing it is the scale of compute, which is why Tesla prepared the top-tier Dojo supercomputer to ensure every computation can be completed efficiently and accurately.

At AI Day, Dojo finally showed its true face: 3,000 D1 chips assembled into an ExaPOD with a peak compute of 1.1 EFLOPS, which the presentation claimed surpasses Japan's Fugaku, the world's fastest supercomputer, to take the top spot. After the event, Musk replied to questions on Twitter that ExaPOD's compute is enough to simulate a human brain.
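
A quick back-of-the-envelope check of the 1.1 EFLOPS headline, assuming the publicly cited figure of roughly 362 TFLOPS of BF16/CFP8 compute per D1 chip (the per-chip number is an external assumption, not stated in this article):

```python
# Rough sanity check of ExaPOD's headline number.
d1_bf16_tflops = 362          # publicly cited per-chip figure (assumption)
chips_in_exapod = 3000

total_tflops = d1_bf16_tflops * chips_in_exapod
print(f"{total_tflops / 1_000_000:.2f} EFLOPS")   # ≈ 1.09 EFLOPS, i.e. about 1.1 EFLOPS
```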

For now this performance beast is focused on training Tesla's self-driving neural networks, and with it the learning potential of those networks suddenly looks bottomless. At this point Tesla has assembled all three elements of autonomous driving, data, algorithms and compute, with both software and hardware ready to push toward L5 autonomy.

But Tesla still has a long road ahead before it can fast-forward to the endgame of autonomous driving, including legal and ethical tests.
