【Bubble one minute】Deep multimodal object detection and semantic segmentation of autonomous driving

One minute a day takes you through top robotics conference papers.

Title: Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges

Authors: Di Feng, Christian Haase-Schütz, Lars Rosenbaum, Heinz Hertlein, Claudius Gläser, Fabian Timm, Werner Wiesbeck, Klaus Dietmayer

Source: IEEE Transactions on Intelligent Transportation Systems (T-ITS), 2020

Compiled by: Jin Xiaoxin

Reviewed by: Chai Yi, Wang Jingqi

This is the 847th article pushed by Bubble One Minute; you are welcome to share it on WeChat Moments. Other organizations or self-media outlets wishing to reprint it should leave a message in the background to apply for authorization.

Summary

The latest advances in autonomous driving perception are driven by deep learning. To achieve robust and accurate scene understanding, autonomous vehicles are often equipped with different sensors (e.g., cameras, lidar, radar), and multiple sensing modalities can be fused to exploit their complementary properties. In this context, many methods have been proposed to solve deep multimodal perception problems. However, there are no general guidelines for network architecture design, and the questions of "what to fuse", "when to fuse", and "how to fuse" remain open. This paper attempts to systematically summarize methods for deep multimodal object detection and semantic segmentation in autonomous driving and to discuss the challenges these methods face. To this end, the authors first give an overview of the on-board sensors on test vehicles, the open datasets, and the background of object detection and semantic segmentation in autonomous driving research. They then summarize the fusion approaches and discuss challenges and open questions. The appendix provides tables summarizing the topics and methods, and an interactive online platform for navigating each reference: https://boschresearch.github.io/multimodalperception/.

Figure 1: A complex urban scenario for autonomous driving. Driverless cars use multimodal signals for perception, such as RGB camera images, lidar points, radar points, and map information. The vehicle must perceive all relevant traffic participants and objects accurately, reliably, and in real time. For clarity, only the bounding boxes and classification scores of some objects are drawn in the image. RGB image adapted from [4].

Figure 2. The abscissa represents runtime and the ordinate represents average precision (AP), for deep learning methods for vehicle detection on the KITTI bird's eye view test dataset. The results are based mainly on the KITTI leaderboard of April 20, 2019; only published methods are considered.

Figure 3. (a) Boss, the self-driving car from the 2007 DARPA challenge [2]. (b) A Waymo self-driving car [14].

Figure 4. The Faster R-CNN object detection network. It consists of three parts: a pre-processing network that extracts high-level image features, a region proposal network (RPN) that generates region proposals, and a Faster R-CNN head that refines each proposal.
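
To make this three-stage structure concrete, below is a minimal sketch using torchvision's off-the-shelf Faster R-CNN rather than the paper's own implementation; the dummy input size and score threshold are illustrative assumptions.

```python
# Minimal sketch (not from the paper): the three Faster R-CNN stages
# (feature extractor -> RPN -> detection head) as exposed by torchvision.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 375, 1242)            # dummy KITTI-sized RGB image in [0, 1]
with torch.no_grad():
    prediction = model([image])[0]          # dict with "boxes", "labels", "scores"

# The three parts of Figure 4 correspond to:
#   model.backbone  -> pre-processing network (high-level image features)
#   model.rpn       -> region proposal network (RPN)
#   model.roi_heads -> Faster R-CNN head refining each proposal
boxes = prediction["boxes"][prediction["scores"] > 0.5]   # illustrative threshold
```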

Figure 5. (a) Number of movable objects per frame in each dataset. (b) Number of camera image frames in several datasets. Dataset sizes have grown by two orders of magnitude.

Figure 6. An RGB image and different 2D lidar representations.

(a) Standard RGB images, represented by pixel grids and color channel values.

(b) A sparse depth map (front view) obtained from lidar measurements, represented on the image grid.

(c) An interpolated depth map.

(d) Interpolated reflectance values, measured on the same grid.

(e) An interpolated representation of the lidar points measured on a spherical map (surround view).

(f) Projection (without interpolation) of the measured lidar points (front part) onto the bird's eye view, as sketched below.
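
As a rough illustration of how representations like (b) and (f) can be produced, the sketch below rasterizes a lidar point cloud into a sparse front-view depth map via a pinhole projection and into a bird's eye view occupancy grid. The intrinsic matrix K, image size, and grid resolution are placeholder assumptions, not values from the paper.

```python
# Minimal sketch (illustrative assumptions): lidar-to-image and lidar-to-BEV rasterization.
import numpy as np

def front_view_depth(points_cam, K, h=375, w=1242):
    """points_cam: (N, 3) lidar points already in the camera frame; K: (3, 3) intrinsics."""
    depth_map = np.zeros((h, w), dtype=np.float32)        # 0 = no measurement (sparse map)
    pts = points_cam[points_cam[:, 2] > 0]                 # keep points in front of the camera
    uvw = (K @ pts.T).T                                    # pinhole projection
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth_map[v[ok], u[ok]] = pts[ok, 2]                   # write sparse depth values
    return depth_map

def bev_occupancy(points, x_range=(0, 70), y_range=(-40, 40), res=0.1):
    """points: (N, 3) lidar points in the sensor frame; no interpolation."""
    nx = int((x_range[1] - x_range[0]) / res)
    ny = int((y_range[1] - y_range[0]) / res)
    grid = np.zeros((nx, ny), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / res).astype(int)
    iy = ((points[:, 1] - y_range[0]) / res).astype(int)
    ok = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    grid[ix[ok], iy[ok]] = 1.0                             # mark occupied cells
    return grid
```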

Figure 7: An example of the Mixture of Experts fusion method. Here, the fused features are taken from the output layers of the expert networks; they can also be extracted from intermediate layers.
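
Below is a minimal sketch of the gating idea behind this figure, assuming two feature-level experts (camera and lidar) and a softmax gating network; the feature dimensions are illustrative and this is not the paper's implementation.

```python
# Minimal Mixture-of-Experts fusion sketch (illustrative, not the paper's code).
import torch
import torch.nn as nn

class MoEFusion(nn.Module):
    def __init__(self, cam_dim=256, lidar_dim=256, out_dim=256):
        super().__init__()
        self.cam_expert = nn.Sequential(nn.Linear(cam_dim, out_dim), nn.ReLU())
        self.lidar_expert = nn.Sequential(nn.Linear(lidar_dim, out_dim), nn.ReLU())
        # Gating network: predicts one weight per expert from both modalities.
        self.gate = nn.Sequential(nn.Linear(cam_dim + lidar_dim, 2), nn.Softmax(dim=-1))

    def forward(self, cam_feat, lidar_feat):
        w = self.gate(torch.cat([cam_feat, lidar_feat], dim=-1))      # (B, 2) expert weights
        fused = (w[:, 0:1] * self.cam_expert(cam_feat)
                 + w[:, 1:2] * self.lidar_expert(lidar_feat))         # weighted expert outputs
        return fused

fused = MoEFusion()(torch.rand(4, 256), torch.rand(4, 256))           # -> (4, 256)
```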

Figure 8: Description of early fusion, late fusion, and several intermediate fusion methods.
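
The sketch below contrasts the three schemes of Figure 8 with dummy convolutional feature extractors; the channel counts and the simple concatenation/addition operators are illustrative assumptions, not the paper's architectures.

```python
# Early, late, and middle fusion on a camera image and a lidar depth map (illustrative).
import torch
import torch.nn as nn

cam_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())       # camera branch
lidar_net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())     # lidar branch
early_net = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU())     # joint net on raw inputs
mid_net = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())      # joint net on features

rgb = torch.rand(1, 3, 64, 64)        # camera image
depth = torch.rand(1, 1, 64, 64)      # lidar depth map (front view)

# Early fusion: concatenate raw modalities, then one shared network.
early = early_net(torch.cat([rgb, depth], dim=1))

# Late fusion: process each modality separately, combine only the outputs.
late = cam_net(rgb) + lidar_net(depth)

# Middle (intermediate) fusion: exchange features part-way through the networks.
middle = mid_net(torch.cat([cam_net(rgb), lidar_net(depth)], dim=1))
```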

Figure 9: Example fusion architectures for two-stage object detection networks. (a) MV3D. (b) AVOD. (c) Frustum PointNet (F-PointNet) [105]. (d) Proposals.

Figure 10: Two examples of improving labeling efficiency for lidar data. (a) Collaborative labeling of lidar points for 3D detection: the lidar points within each object are first weakly labeled by a human annotator and then refined by a pre-trained F-PointNet-based lidar detector. (b) Collaborative training of a semantic segmentation network (SegNet) for lidar points: to augment the training data, image semantics can be transferred from a pre-trained image SegNet.
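
A minimal sketch of the label-transfer idea in part (b): lidar points are projected into the camera image and inherit the class predicted there by a pre-trained image segmentation network. The calibration matrices T_cam_lidar and K are placeholders, and this is an assumption about the mechanics rather than the paper's exact pipeline.

```python
# Transfer per-pixel image semantics to lidar points (illustrative sketch).
import numpy as np

def transfer_semantics(points_lidar, seg_map, T_cam_lidar, K):
    """points_lidar: (N, 3); seg_map: (H, W) class ids; T_cam_lidar: (4, 4); K: (3, 3)."""
    h, w = seg_map.shape
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])   # homogeneous coords
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]                           # into the camera frame
    labels = np.full(len(points_lidar), -1, dtype=int)                   # -1 = no label assigned
    front = pts_cam[:, 2] > 0                                            # points in front of camera
    uvw = (K @ pts_cam[front].T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(front)[ok]
    labels[idx] = seg_map[v[ok], u[ok]]                                  # inherit the image class
    return labels
```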

Figure 11. (a) Illustration of the impact of label quality on object detection performance. The network is trained on labels subject to increasing disturbance. Performance is measured by mAP, normalized to the performance of a network trained on the undisturbed dataset. The network is much more robust to random label errors (drawn from a Gaussian distribution with variance σ) than to label bias (all labels shifted by σ). (b) Illustration of random label noise and label bias (all bounding boxes shifted towards the upper right).
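
The two disturbances compared in this experiment can be emulated as follows; the (x1, y1, x2, y2) box format and the exact offset direction are assumptions for illustration.

```python
# Perturb ground-truth boxes with random noise vs. a constant bias (illustrative).
import numpy as np

def perturb_labels(boxes, sigma, mode="noise", rng=None):
    """boxes: (N, 4) ground-truth boxes in (x1, y1, x2, y2); sigma in pixels."""
    rng = rng or np.random.default_rng(0)
    if mode == "noise":
        # Random label errors: each coordinate is disturbed independently.
        return boxes + rng.normal(0.0, sigma, size=boxes.shape)
    # Label bias: every box is shifted by the same offset (here towards the upper right,
    # i.e. +x and -y in image coordinates).
    return boxes + np.array([sigma, -sigma, sigma, -sigma])

boxes = np.array([[100.0, 120.0, 180.0, 220.0]])
noisy = perturb_labels(boxes, sigma=5.0, mode="noise")
biased = perturb_labels(boxes, sigma=5.0, mode="bias")
```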

Figure 12. The importance of explicitly modeling and propagating uncertainty in a multimodal object detection network. Ideally, the network should produce reliable predictive probabilities (for object classification and localization). The example in the figure depicts high uncertainty in the camera signal during night driving. This kind of uncertainty information is useful for downstream decision modules, such as maneuver planning or emergency braking systems.
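
One common way to obtain such predictive uncertainty is Monte Carlo dropout, sketched below on a generic classification head; this is a standard technique used as an example, not necessarily the specific method discussed in the paper, and the layer sizes are illustrative.

```python
# Monte Carlo dropout: average several stochastic forward passes at test time (illustrative).
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                           nn.Dropout(p=0.5), nn.Linear(128, 4))   # e.g. 4 object classes

def mc_dropout_predict(model, features, n_samples=20):
    model.train()                       # keep dropout layers stochastic at test time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(features), dim=-1)
                             for _ in range(n_samples)])
    return probs.mean(0), probs.var(0)  # predictive mean and per-class variance

mean, var = mc_dropout_predict(classifier, torch.rand(8, 256))
```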

Abstract

Recent advancements in perception for autonomous driving are driven by deep learning. In order to achieve robust and accurate scene understanding, autonomous vehicles are usually equipped with different sensors (e.g. cameras, LiDARs, Radars), and multiple sensing modalities can be fused to exploit their complementary properties. In this context, many methods have been proposed for deep multi-modal perception problems. However, there is no general guideline for network architecture design, and questions of “what to fuse”, “when to fuse”, and “how to fuse” remain open. This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving. To this end, we first provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection and semantic segmentation in autonomous driving research. We then summarize the fusion methodologies and discuss challenges and open questions. In the appendix, we provide tables that summarize topics and methods. We also provide an interactive online platform to navigate each reference:

https://boschresearch.github.io/multimodalperception/.
