
Framework of Autonomous Driving Cognitive Enhancement System Based on Parallel Vision: Solving the Long Tail Problem in Autonomous Driving

Author: Cognitive autonomous driving workers

/Introduction/

Recently, the Institute of Automation of the Chinese Academy of Sciences, the Institute of Artificial Intelligence and Robotics of Xi'an Jiaotong University, Tsinghua University, and other institutions jointly published a paper entitled "Parallel Vision for Long-Tail Regularization: Initial Results from IVFC Autonomous Driving Testing" in IEEE Transactions on Intelligent Vehicles, a top international journal in the field of intelligent vehicles. The paper proposes a theoretical framework for analyzing and solving the long-tail problem in the visual perception of autonomous driving, constructs a parallel vision system on this basis, and applies and validates it at the Intelligent Vehicle Future Challenge (IVFC) in China.

Citation: J. Wang et al., "Parallel Vision for Long-Tail Regularization: Initial Results from IVFC Autonomous Driving Testing," in IEEE Transactions on Intelligent Vehicles, doi: 10.1109/TIV.2022.3145035.

The main research contents and contributions of the paper include:

A theoretical framework of long-tail regularization (LoTR) is proposed for analyzing and solving the visual long-tail problem.

Based on LoTR theory, a Parallel Vision Actualization System (PVAS) built on virtual-real interaction and ACP closed-loop optimization[1][2] is constructed to solve the long-tail problem.

Combining the theoretical analysis of LoTR with the practical PVAS system, the approach is applied to the IVFC, the world's longest-running and most influential autonomous driving competition, with the largest number of participating teams.

Introduction

The long tail is a visual manifestation of certain statistical distributions. In a long-tail distribution, the low-frequency events are spread over a wide range, and their total probability of occurrence is comparable to that of the high-frequency events.

In visual problems, from the data perspective, conventional scenes occur with very high frequency while extreme scenes occur very rarely; many real-world long-tail scenes can only be captured under specific conditions, so the diversity of training sets is insufficient to characterize the long-tail distribution of the real world. From the model perspective, many visual models perform well only in conventional scenes and perceive sudden extreme scenes poorly, so the models themselves are incomplete.

Solving the long-tail problem requires considering the influence of long-tail scenes more comprehensively, on top of conventional visual problems, so that the vision system can achieve the most effective possible intelligent perception of complex scenes exhibiting long-tail effects.
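To make the "comparable overall probability" point concrete, here is a small numerical sketch using an illustrative power-law distribution (the numbers are hypothetical, not real traffic statistics): the few head events each dominate individually, yet the many rare tail events carry a comparable share of the total probability mass.

```python
import numpy as np

# Illustrative long-tail distribution over 1000 event types: p_k ∝ 1/k.
n_events = 1000
p = 1.0 / np.arange(1, n_events + 1)
p /= p.sum()                # normalize to a probability distribution

head_mass = p[:10].sum()    # the 10 most frequent scene types
tail_mass = p[10:].sum()    # the remaining 990 rare scene types
print(f"head mass: {head_mass:.3f}, tail mass: {tail_mass:.3f}")
```

Even though each tail event is individually rare, the tail as a whole is as probable as the head, which is exactly why a training set that covers only head scenes misrepresents the real world.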

Traditional visual research faces difficulties in data acquisition, model learning, and evaluation.

Collecting and labeling large-scale, diverse datasets from real scenes is time-consuming and laborious, and manual labeling is error-prone. In addition, real scenes are uncontrollable and unrepeatable, so it is impossible to isolate each component of a scene (weather, lighting, etc.) and analyze its impact on visual algorithms individually.

Because data acquisition is difficult, it is hard to learn high-precision, generalizable visual models from training sets of insufficient scale and diversity; moreover, many models are evaluated only in specific environments, so the evaluations are far from comprehensive.

The basic framework and ideas of parallel vision


Figure 1. Basic framework for parallel vision[3]

As shown in Figure 1, parallel vision[3] is the application of the ACP (Artificial systems, Computational experiments, and Parallel execution) theory for modeling and regulating complex systems to the field of visual computing, and is an intelligent visual computing method based on virtual-real interaction.

The paper introduces the theoretical method of parallel vision into the field of autonomous driving to solve the long-tail problem in traffic vision scenes, achieving good results.

The main idea of parallel vision is to use artificial scenes to simulate and represent complex, challenging real scenes, to train and evaluate visual models through computational experiments, and finally to optimize visual models online through the parallel execution of virtual-real interaction, achieving intelligent perception and understanding of complex environments.

Methods based on parallel vision can learn more effective visual computing models and, at the same time, comprehensively evaluate the effectiveness of visual algorithms in complex environments, so that model training and evaluation become online, long-term processes.

Continuous optimization improves the vision system's performance in complex environments, and by integrating virtual reality, machine learning, knowledge automation, and other technologies, the vision system can be truly deployed.
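The virtual-real loop described above can be sketched as a toy, self-contained program. All function names and the scoring logic here are illustrative stand-ins, not the paper's implementation: the point is only the shape of the loop, in which artificial systems generate scenes, computational experiments evaluate a model on them, and parallel execution feeds the results back into scene generation.

```python
import random

def generate_scenes(params, n=50):
    """Artificial systems: synthesize labeled scenes from simulation parameters."""
    random.seed(params["seed"])
    return [{"difficulty": random.random() * params["difficulty_scale"]} for _ in range(n)]

def evaluate(scenes):
    """Computational experiments: score a (mock) vision model on the scenes."""
    return sum(1.0 - min(s["difficulty"], 1.0) for s in scenes) / len(scenes)

def parallel_execution(rounds=5):
    """Parallel execution: feed evaluation results back into scene generation."""
    params = {"seed": 0, "difficulty_scale": 0.2}
    history = []
    for r in range(rounds):
        score = evaluate(generate_scenes(params))
        history.append(score)
        if score > 0.8:
            # model copes well: make the virtual world harder (virtual-real feedback)
            params["difficulty_scale"] *= 1.5
        params["seed"] = r + 1
    return history

history = parallel_execution()
```

The closed loop is what makes the training "online and long-term": evaluation results continuously reshape the artificial world instead of being a one-off benchmark.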

The long-tail problem in autonomous driving and the normalization method


Figure 2. Schematic diagram of LoTR. (a) Histogram of the statistical distribution, and the probability density curve, of the LoTR distribution obtained under ideal conditions. (b) Histogram of the statistical distribution of real-world traffic events and the probability density curve of a fitted long-tail distribution. (c) Ideally, the long-tail distribution and the LoTR distribution combine to form a uniform distribution. (The abscissa represents the event space; "crossroads w/o TL" means "crossroads without traffic lights".)

As shown in Figure 2, long-tail regularization is an important theory, based on parallel learning, for solving long-tail problems: the data imbalance that causes the long-tail problem can be compensated by using virtual data generated in the artificial world to augment the real data.

Figure 2(b) shows a histogram of the statistical distribution of real-world traffic events together with the probability density curve of a fitted long-tail distribution; denote the corresponding probability mass function over the event space $\mathcal{X}$ by $p_r(x)$, which satisfies $\sum_{x \in \mathcal{X}} p_r(x) = 1$ for the known target scenario in the real world. Figure 2(a) shows the long-tail regularized (LoTR) distribution constructed in the artificial world under ideal conditions, whose probability mass function satisfies

$$p_a(x) = \frac{1}{1-\lambda}\left(\frac{1}{|\mathcal{X}|} - \lambda\, p_r(x)\right), \qquad 0 < \lambda \le \frac{1}{|\mathcal{X}|\,\max_x p_r(x)},$$

where $\lambda$ is the mixing weight of the real data, and the upper bound on $\lambda$ guarantees $p_a(x) \ge 0$.

At this point, one can construct the combined probability mass function

$$p(x) = \lambda\, p_r(x) + (1-\lambda)\, p_a(x).$$

Substituting $p_a(x)$, we obtain

$$p(x) = \frac{1}{|\mathcal{X}|},$$

which, as shown in Figure 2(c), is a uniform distribution that eliminates the long-tail effect.

This derivation shows that, in theory, long-tail regularization based on parallel learning can resolve the data imbalance in the long-tail problem.
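The combination of a real long-tail distribution with a complementary artificial distribution can be checked numerically. The construction below assumes one concrete realization (a mixing weight and a complementary artificial PMF chosen so the mixture is uniform); it is an illustrative sketch, not the paper's exact formulas.

```python
import numpy as np

n = 100
p_r = 1.0 / np.arange(1, n + 1)            # long-tail real-world PMF (p_k ∝ 1/k)
p_r /= p_r.sum()

lam = 1.0 / (n * p_r.max())                # largest mixing weight keeping p_a >= 0
uniform = np.full(n, 1.0 / n)
p_a = (uniform - lam * p_r) / (1.0 - lam)  # complementary artificial-world PMF

p_mix = lam * p_r + (1.0 - lam) * p_a      # combined distribution
assert np.allclose(p_mix, uniform)         # long-tail effect eliminated
assert (p_a >= -1e-12).all() and np.isclose(p_a.sum(), 1.0)
```

Note the shape of `p_a`: it is largest exactly where `p_r` is smallest, i.e. the artificial world concentrates on the rare tail scenes that real data under-represents.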

Cognitive enhancement system architecture for autonomous driving based on parallel vision


Figure 3. Overview flowchart of PVAS for the IVFC. The virtual world refers to ParallelEye-CS, implemented through simulation, and the real world refers to the IVFC testing ground.

As shown in Figure 3, PVAS consists of two worlds and three units (artificial systems, computational experiments, and parallel execution), which together form a closed-loop system of virtual-real interaction.

In the virtual world, earlier work built a computer simulation environment called ParallelEye-CS[4], whose overall layout is aligned with the real-world IVFC testing ground.

In ParallelEye-CS, various scenes can be generated easily by modifying simulation parameters; different combinations of these parameters correspond to different traffic scenes, so ParallelEye-CS can generate a wide variety of synthetic images with annotations.
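The way parameter combinations multiply into scene variety can be sketched as follows; the parameter names and values here are hypothetical examples, not ParallelEye-CS's actual settings.

```python
import itertools

# Hypothetical simulation parameters for illustration only.
weather  = ["sunny", "rain", "fog", "snow"]
lighting = ["dawn", "noon", "dusk", "night"]
layout   = ["straight road", "crossroads", "crossroads w/o traffic lights"]

# Every combination of parameter values defines a distinct traffic scene.
scene_configs = [
    {"weather": w, "lighting": l, "layout": y}
    for w, l, y in itertools.product(weather, lighting, layout)
]
print(len(scene_configs))   # 4 * 4 * 3 = 48 distinct scene configurations
```

Because each configuration is generated from known parameters, every synthetic image comes with its annotations for free, sidestepping the manual-labeling bottleneck described earlier.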

According to LoTR theory, a raw dataset that approximately follows the long-tail distribution can be simulated in the virtual world based on real-world conditions.

Based on the ACP closed-loop optimization method, the environment parameter settings and traffic object positions in the virtual world can be continuously adjusted, iteratively producing a series of complex traffic scenes.

In this process, the distribution of the virtual dataset gradually approaches the regularized long-tail distribution within a certain error range, and the autonomous driving vision system is initially trained and optimized in the virtual world.
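One way to picture this iterative matching is the loop below: keep generating batches of virtual scenes and shifting generation effort toward under-represented scene types until the dataset's empirical distribution is within a tolerance of the target. The sampling scheme, the uniform stand-in target, and the tolerance are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_types = 20
target = np.full(n_types, 1.0 / n_types)       # stand-in for the target distribution

counts = np.zeros(n_types)
weights = np.ones(n_types) / n_types           # current scene-generation weights
for _ in range(200):
    counts += rng.multinomial(100, weights)    # generate a batch of virtual scenes
    empirical = counts / counts.sum()
    gap = target - empirical
    if np.abs(gap).max() < 0.005:              # within error tolerance: stop
        break
    # shift generation effort toward under-represented scene types
    weights = np.clip(weights + gap, 1e-6, None)
    weights /= weights.sum()
```

The same feedback idea carries over when the "scene types" are simulation parameter settings and the measurement is a trained model's per-scene performance.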

At this point, the experience from hundreds of experiments in the virtual world can be distilled and applied to the construction of real-world competition scenes, setting appropriate competition tasks and scenarios that better test the visual capabilities of self-driving cars.

With this valuable experience from the virtual world, the real-world competition scenes need only simple testing and adjustment to reach the ideal state.

At the same time, the autonomous driving vision system obtained in the virtual world can serve as an effective initial state for the real-world vision system, speeding up the training process.

Finally, the experience accumulated after each competition further guides the selection and setting of parameters in the virtual environment during preparation for the next competition, enabling year-on-year improvement and forming a large closed-loop optimization process between the virtual world and the real world.

The paper conducts experiments on the above methods and theories and, combined with data analysis of the competitions over the years, demonstrates the effectiveness of the system.

Expansion: Practice and application of China's Intelligent Vehicle Future Challenge

Supported by the National Natural Science Foundation of China (NSFC), the IVFC is an important part of the NSFC major research program "Cognitive Computing of Audiovisual Information". Founded in 2009, as shown in Figure 4, the IVFC has been held 12 times in Xi'an, Ordos, Chifeng, Changshu, and other places, and is the longest-running autonomous driving competition in the world.


Figure 4. Past IVFC events (2009 to present; the 2021 edition was postponed due to the pandemic)

At present, Changshu has become a fixed venue for the IVFC. Figure 5 shows the actual "田" (tian)-shaped urban and rural road course at the Changshu test site. Under the guidance of parallel vision, a variety of real traffic scenes can be built in this field to test the ability of self-driving cars to handle both common and long-tail scenes on urban and rural roads. In addition, the Changshu test center has nearly ten kilometers of elevated roads for testing unmanned vehicles' ability to handle abnormal situations during high-speed driving.


Figure 5. IVFC Changshu test base

References

[1] Wang F.-Y. Parallel System Methods for Management and Control of Complex Systems[J]. Control and Decision, 2004, 19(5): 485-489, 514.

[2] Wang F.-Y. Parallel Control and Management for Intelligent Transportation Systems: Concepts, Architectures, and Applications[J]. IEEE Transactions on Intelligent Transportation Systems, 2010, 11(3): 630-638.

[3] Wang K, Gou C, Zheng N, Rehg J M, Wang F.-Y. Parallel Vision for Perception and Understanding of Complex Scenes: Methods, Framework, and Perspectives[J]. Artificial Intelligence Review, 2017, 48(3): 299-329.

[4] Li X, Wang Y, Yan L, Wang K, Deng F, Wang F.-Y. ParallelEye-CS: A New Dataset of Synthetic Images for Testing the Visual Intelligence of Intelligent Vehicles[J]. IEEE Transactions on Vehicular Technology, 2019, 68(10): 9619-9631.

[5] Li L, Wang X, Wang K, Lin Y, Xin J, Chen L, Xu L, Tian B, Ai Y, Wang J, Cao D, Liu Y, Wang C, Zheng N, Wang F.-Y. Parallel Testing of Vehicle Intelligence via Virtual-Real Interaction[J]. Science Robotics, 2019, 4(28).
