Tsinghua IEEE Paper: Using New Training Methods to Help Autonomous Driving Decisions Get Rid of "Roadside Interference"

Compiled by Aaron and Cao Jin

Recently, scholars from Tsinghua University proposed a new autoencoder-based training method that enables the network to ignore irrelevant features in the input image while retaining the relevant ones. Compared with existing end-to-end extraction methods, it requires only image-level labels, reducing labeling cost.

The researchers verified the method's effectiveness by training a convolutional neural network (CNN) to process the encoder's output and generate a steering angle that controls the vehicle. The resulting end-to-end driving pipeline is unaffected by irrelevant features even when such features barely appear in the CNN's training data.

Autoencoder based on convolutional neural networks

The authors outline the main idea and basic process of the algorithm: as shown in Figure 1, the system consists of an autoencoder and a CNN. The image from the front camera is fed to the autoencoder as input. The autoencoder consists of an encoder and a decoder, and the encoder's output serves as the input to the CNN, which computes and outputs the steering angle that controls the vehicle.

(Figure 1: Schematic of the complete system, which includes an autoencoder that eliminates irrelevant features in the image and a CNN that generates the control command)

An autoencoder is an artificial neural network designed to learn an efficient encoding of data in an unsupervised manner. It learns to encode data compactly and then reconstruct, from the encoded representation, an output as close as possible to the original data. The two main applications of autoencoders are dimensionality reduction and information retrieval. Although dimensionality reduction resembles our task, it does not usually discard features outright, since its goal is to extract all the useful features from the input.
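The encode-then-reconstruct idea above can be illustrated with a minimal linear autoencoder trained by gradient descent. This is a generic sketch of the concept, not the paper's architecture; all sizes, names, and hyperparameters here are our own.

```python
import numpy as np

# Toy linear autoencoder: compress 16-dim inputs to a 4-dim code, decode back,
# and train to minimize the mean squared reconstruction error.
rng = np.random.default_rng(0)

X = rng.normal(size=(200, 16))                # toy data: 200 samples, 16 features
W_enc = rng.normal(scale=0.1, size=(16, 4))   # encoder: 16 -> 4 (the "code")
W_dec = rng.normal(scale=0.1, size=(4, 16))   # decoder: 4 -> 16 (reconstruction)
lr = 0.05

def reconstruct(X, W_enc, W_dec):
    code = X @ W_enc          # compressed representation
    return code @ W_dec       # reconstruction from the code

loss_before = np.mean((reconstruct(X, W_enc, W_dec) - X) ** 2)

for _ in range(300):
    code = X @ W_enc
    X_hat = code @ W_dec
    grad_out = 2.0 * (X_hat - X) / X.shape[0]   # gradient of the squared error (up to a constant)
    grad_dec = code.T @ grad_out                # gradient w.r.t. decoder weights
    grad_enc = X.T @ (grad_out @ W_dec.T)       # gradient w.r.t. encoder weights
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

loss_after = np.mean((reconstruct(X, W_enc, W_dec) - X) ** 2)
```

Because the 4-dim code cannot hold all 16 dimensions, some information is necessarily lost; the paper's training method is about steering that loss toward the irrelevant features.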

More recently, autoencoders have been applied to other tasks such as image processing, where they can perform image compression and image denoising; these tasks, however, do little to address irrelevant roadside objects.

In image compression, images are compressed to reduce storage or transmission cost; in image denoising, a noisy image is mapped back to the original image. The noisy image is used as the input and the original image as the label to train the network. Apart from the added noise, the noisy image is identical to the original.
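The denoising setup above amounts to constructing (input, label) pairs like the following toy example; the image values and noise scale are illustrative stand-ins.

```python
import numpy as np

# Denoising-autoencoder training pair: the noisy image is the network input
# and the original (clean) image is the supervision label.
rng = np.random.default_rng(1)

clean = rng.uniform(0.0, 1.0, size=(8, 8))           # original image (the label)
noise = rng.normal(scale=0.1, size=clean.shape)
noisy = np.clip(clean + noise, 0.0, 1.0)             # corrupted image (the input)

# Apart from the added noise, `noisy` depicts exactly the same scene as `clean`.
training_pair = (noisy, clean)
```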

Judging from the examples in the article, if irrelevant objects are treated as noise, the image-denoising approach might seem able to extract the relevant features. In an actual driving scene, however, there is no way to obtain a ground-truth image with irrelevant objects such as the sky and trees removed, so this method is not feasible.

How the Autoencoder Works with the CNN

The researchers state that the purpose of the algorithm is to remove from the image all features that are irrelevant to decision-making while retaining all relevant ones. To reduce labeling cost, it is best to train the network using only image-level labels.

At the same time, to satisfy the definition of an end-to-end method, the output of the feature-extraction stage should carry implicit meaning. Compared with a plain CNN, an autoencoder is the better choice here: the encoder's output cannot be interpreted directly, but the decoder can convert it back into an approximation of the original input, showing that it carries nearly as much information as the input does.

There is always some error between the decoder's output and the original input; in other words, some information is always lost. Ideally, the algorithm should ensure that the lost information contains only irrelevant features while every relevant feature is preserved. To achieve this, the network must be taught which kinds of features to retain and which to eliminate. After the training process is repeated enough times, the network learns to extract the desired features from the input.
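One plausible way to express "keep relevant, drop irrelevant" as a training objective is sketched below. This is our own hedged reading of the idea, not the paper's published loss: positive samples should reconstruct to themselves, while negative samples should reconstruct to a blank (all-zero) target, so the autoencoder learns to suppress irrelevant content.

```python
import numpy as np

def selective_reconstruction_loss(recon_pos, pos, recon_neg):
    """Combined loss over one positive and one negative batch."""
    keep_term = np.mean((recon_pos - pos) ** 2)   # relevant features must survive reconstruction
    drop_term = np.mean(recon_neg ** 2)           # irrelevant features are pushed toward blank
    return keep_term + drop_term

# Toy check: a perfect network reconstructs positives exactly and blanks negatives.
pos = np.ones((2, 4))
perfect_loss = selective_reconstruction_loss(pos, pos, np.zeros((2, 4)))
```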

So, what is the role of the CNN? Its architecture, shown in Figure 1, consists of three convolutional layers and four fully connected layers, the last of which outputs the control command (i.e., the steering angle).
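A quick shape-bookkeeping sketch of such a 3-conv / 4-FC steering network follows. The kernel sizes, strides, and layer widths are illustrative guesses rather than the paper's hyperparameters, and for simplicity we feed the 240 × 320 × 3 camera resolution directly (in the paper, the CNN consumes the encoder's output instead).

```python
def conv_out(size, kernel, stride):
    """Output size of a valid (no-padding) convolution along one dimension."""
    return (size - kernel) // stride + 1

h, w, c = 240, 320, 3  # front-camera image size, used here for illustration
for kernel, stride, channels in [(5, 2, 24), (5, 2, 36), (3, 2, 48)]:
    h, w, c = conv_out(h, kernel, stride), conv_out(w, kernel, stride), channels

flat = h * w * c                     # flattened input to the fully connected stack
fc_widths = [flat, 100, 50, 10, 1]   # four FC layers; the last outputs the steering angle
```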

When training the CNN, the parameters of the autoencoder remain unchanged. If the training data were collected while an expert drives well, it would consist almost entirely of normal, lane-centered states. As a result, once the vehicle deviates from the center of the current lane, the CNN may fail to make the right decision.

To avoid this problem, the researchers adopted the online training method shown in Figure 2: the vehicle is controlled by the network while an expert provides the reference command. The images acquired during training serve as training data and the expert's commands as labels, and both are then used to train the network.

Because the network is randomly initialized, the vehicle is often in an abnormal state early in training, which prevents the dataset from being dominated by normal images.
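The online scheme above (a DAgger-style setup) can be sketched as a small loop: the network drives, the expert labels every visited state, and the collected pairs become training data. The 1-D "simulator" and proportional "expert" below are hypothetical stand-ins, not part of the paper.

```python
import random

def collect_online(policy, expert, step_fn, state, n_steps):
    """Gather (state, expert_label) pairs while the policy controls the vehicle."""
    data = []
    for _ in range(n_steps):
        action = policy(state)           # the NETWORK drives...
        label = expert(state)            # ...while the EXPERT provides the label
        data.append((state, label))
        state = step_fn(state, action)   # visited states reflect the network's (poor) driving
    return data

random.seed(0)
expert = lambda s: -s                    # toy expert: steer back toward the lane center
policy = lambda s: random.uniform(-1, 1) # untrained, randomly initialized policy
step_fn = lambda s, a: s + 0.1 * a       # toy lateral dynamics
data = collect_online(policy, expert, step_fn, state=0.0, n_steps=50)
```

Because the policy starts out random, the visited states include plenty of off-center positions, matching the observation in the text.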

(Figure 2: CNN training process. Solid lines represent the flow of information used to control the vehicle; dashed lines represent the flow of information used to train the model.)

Simulator and Dataset Description

This section describes the simulator and the data-collection process, and compares the performance of the developed system against a baseline model with the same network structure.

The simulation environment is built with PreScan, a simulation platform developed for intelligent-vehicle systems in which users can design realistic traffic scenarios. Once a traffic scenario is built, the tool can automatically generate a Simulink model for testing autonomous driving algorithms.

To this end, the researchers designed the following four test schemes.

1) Test scheme 1: The algorithm is trained in scenario 1-1, and tested in scenarios 1-3 and 1-4.

2) Test scheme 2: The algorithm is trained in scenario 1-2, and tested in scenarios 1-3 and 1-4.

3) Test scheme 3: The algorithm is trained in scenario 2-1, and tested in scenarios 2-3 and 2-4.

4) Test scheme 4: The algorithm is trained in scenario 2-2, and tested in scenarios 2-3 and 2-4.

(Figure 3: Scenes built in PreScan)

The autoencoder training process requires collecting positive and negative samples. In the constructed scenes, the road and lane markings are the main factors influencing driving commands, while the trees and sky are irrelevant. The researchers first take random pictures in the simulated environment and then assign each image to a dataset as follows.

If an image consists primarily of road features, it is classified as a positive sample; if it consists primarily of tree or sky features, it is classified as a negative sample. If the proportions of relevant and irrelevant features are almost equal, the image is discarded. The positive and negative sample sets are shown in Figure 4.
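The labeling rule just described can be written as a small function. The margin that decides "almost equal proportions" is our own illustrative choice.

```python
def label_image(road_fraction, irrelevant_fraction, margin=0.2):
    """Classify an image by its fractions of road vs. tree/sky content."""
    if road_fraction - irrelevant_fraction > margin:
        return "positive"    # mostly road and lane markings
    if irrelevant_fraction - road_fraction > margin:
        return "negative"    # mostly trees or sky
    return "discard"         # proportions too close to call
```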

For CNN training, the data is collected online during the training process itself. The input image from the front camera is 240 × 320 × 3. Since the task is lane keeping, the label, i.e., the steering angle, can be determined by a tracking algorithm that drives the vehicle along the lane centerline; this tracking algorithm is provided by the PreScan environment.
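PreScan's built-in tracking controller supplies these labels; as a hypothetical stand-in for such a controller, a proportional lane-keeping law on lateral offset and heading error might look like the following (the gains k_y and k_psi are purely illustrative).

```python
def steering_label(lateral_offset_m, heading_error_rad, k_y=0.5, k_psi=1.0):
    """Steer opposite to the deviation to return the vehicle to the lane centerline."""
    return -(k_y * lateral_offset_m + k_psi * heading_error_rad)
```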

(Figure 4: Part of the dataset used to train the autoencoder)

Taken together, the paper proposes a new training method that enables an autoencoder to extract useful features from input images, allowing end-to-end autonomous driving methods to ignore irrelevant roadside objects.

Several conclusions can be drawn. First, by alternating positive and negative samples while training the autoencoder, the encoder learns to remove irrelevant features from the input image, ensuring that the output feature map contains only the relevant ones. In the image produced by the decoder, irrelevant objects such as trees and the sky are practically indistinguishable, while the road and lane markings remain clear.

Second, the proposed training method relies only on image-level labels to train the autoencoder, which reduces labeling cost compared with existing end-to-end multi-task autonomous driving methods.

In addition, the end-to-end driving method composed of the autoencoder and the CNN is unaffected by irrelevant roadside objects even when such objects barely appear in the training data. The proposed model is also less susceptible to shadows than the baseline: when the sunlight angle is set to 45°, it still performs well, while the baseline model cannot keep the vehicle in the lane.

One current limitation of this approach is the simplicity of the test scenarios. To broaden its applicability, more varied irrelevant objects, such as buildings and surrounding vehicles, could be introduced. The CNN in this model could be replaced with a reinforcement-learning algorithm to handle dynamic scenes, and limited real-road tests could also be considered. Moreover, to process such complex images, the architecture of the decision network would also need to be extended.

Wang, T., Luo, Y., Liu, J., Chen, R., & Li, K. (2022). End-to-end self-driving approach independent of irrelevant roadside objects with auto-encoder. IEEE Transactions on Intelligent Transportation Systems, 23(1), 641-650. doi:10.1109/TITS.2020.3018473

Main Author Information:

Yugong Luo (IEEE Member) received his bachelor's and master's degrees from Chongqing University in 1996 and 1999, respectively, and his Ph.D. from Tsinghua University in 2003. He is currently a professor at the School of Automotive and Transportation at Tsinghua University. He has authored more than 70 journal articles and holds 31 patents. His main research interests are the dynamics and control of intelligent, connected electric vehicles and vehicle noise control.

Tinghan Wang received his bachelor's degree from Tsinghua University in 2016 and is currently pursuing a Ph.D. His research interests include end-to-end autonomous driving and deep reinforcement learning based on deep neural networks.

Jinxin Liu received his bachelor's degree from Hefei University of Technology in 2017 and is currently pursuing a Ph.D. at Tsinghua University. His research interests include automotive intent recognition and behavior planning.

About Auto Byte

Auto Byte is the automotive-technology vertical media launched by The Heart of Machines. It focuses on cutting-edge research and technology applications in autonomous driving, new energy, chips, software, automobile manufacturing, and intelligent transportation, helping professional practitioners and users in the automotive field understand technology development and industry trends and gain insight into products, companies, and the industry.
