【Bubble one minute】Deep multimodal object detection and semantic segmentation of autonomous driving

One minute a day takes you through top robotics conference papers.

Title: Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges

Authors: Di Feng, Christian Haase-Schütz, Lars Rosenbaum, Heinz Hertlein, Claudius Gläser, Fabian Timm, Werner Wiesbeck, Klaus Dietmayer

Source: IEEE Transactions on Intelligent Transportation Systems (T-ITS), 2020

Compiled by: Jin Xiaoxin

Reviewed by: Chai Yi, Wang Jingqi

This is the 847th article pushed by Bubble One Minute; you are welcome to share it on WeChat Moments. Other organizations or self-media outlets wishing to reprint it should leave a message in the background to apply for authorization.

Summary

The latest advances in autonomous driving perception are driven by deep learning. To achieve robust and accurate scene understanding, autonomous vehicles are often equipped with different sensors (e.g., cameras, lidar, radar), and multiple sensing modalities can be fused to exploit their complementary properties. In this context, many methods have been proposed to solve deep multimodal perception problems. However, there are no general guidelines for network architecture design, and the questions of "what to fuse", "when to fuse", and "how to fuse" remain open. This paper attempts to systematically summarize methods for deep multimodal object detection and semantic segmentation in autonomous driving and to discuss the challenges these methods face. To this end, the authors first give an overview of the on-board sensors on test vehicles, the open datasets, and the background of object detection and semantic segmentation in autonomous driving research. They then summarize the fusion approaches and discuss challenges and open questions. The appendix provides tables summarizing the topics and methods, and an interactive online platform for navigating each reference: https://boschresearch.github.io/multimodalperception/.

Figure 1: A complex urban scenario for autonomous driving. Driverless cars use multimodal signals for perception, such as RGB camera images, lidar points, radar points, and map information. The vehicle must perceive all relevant traffic participants and objects accurately, reliably, and in real time. For clarity, only the bounding boxes and classification scores of some objects are drawn in the image. RGB image adapted from [4].

Figure 2. The abscissa represents runtime and the ordinate represents average precision (AP), for deep learning methods for vehicle detection on the KITTI bird's eye view test dataset. The results are based mainly on the KITTI leaderboard of April 20, 2019; only published methods are considered.

Figure 3. (a) Boss, the self-driving car from the 2007 DARPA challenge [2]. (b) A Waymo self-driving car [14].

Figure 4. The Faster R-CNN object detection network. It consists of three parts: a pre-processing network that extracts high-level image features, a region proposal network (RPN) that generates region proposals, and a Faster R-CNN head that refines each proposal.
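
To make this three-stage structure concrete, below is a minimal sketch using torchvision's off-the-shelf Faster R-CNN rather than the paper's own implementation; the dummy input size and score threshold are illustrative assumptions.

```python
# Minimal sketch (not from the paper): the three Faster R-CNN stages
# (feature extractor -> RPN -> detection head) as exposed by torchvision.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 375, 1242)            # dummy KITTI-sized RGB image in [0, 1]
with torch.no_grad():
    prediction = model([image])[0]          # dict with "boxes", "labels", "scores"

# The three parts of Figure 4 correspond to:
#   model.backbone  -> pre-processing network (high-level image features)
#   model.rpn       -> region proposal network (RPN)
#   model.roi_heads -> Faster R-CNN head refining each proposal
boxes = prediction["boxes"][prediction["scores"] > 0.5]   # illustrative threshold
```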

Figure 5. (a) Number of movable objects per frame in each dataset. (b) Number of camera image frames in several datasets. Dataset sizes have grown by two orders of magnitude.

Figure 6. An RGB image and different 2D lidar representations.

(a) Standard RGB images, represented by pixel grids and color channel values.

(b) A sparse depth map (front view) obtained from lidar measurements, represented on the image grid.

(c) An interpolated depth map.

(d) Interpolated reflectance values, measured on the same grid.

(e) An interpolated representation of the lidar points measured on a spherical map (surround view).

(f) Projection (without interpolation) of the measured lidar points (front part) onto the bird's eye view, as sketched below.
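
As a rough illustration of how representations like (b) and (f) can be produced, the sketch below rasterizes a lidar point cloud into a sparse front-view depth map via a pinhole projection and into a bird's eye view occupancy grid. The intrinsic matrix K, image size, and grid resolution are placeholder assumptions, not values from the paper.

```python
# Minimal sketch (illustrative assumptions): lidar-to-image and lidar-to-BEV rasterization.
import numpy as np

def front_view_depth(points_cam, K, h=375, w=1242):
    """points_cam: (N, 3) lidar points already in the camera frame; K: (3, 3) intrinsics."""
    depth_map = np.zeros((h, w), dtype=np.float32)        # 0 = no measurement (sparse map)
    pts = points_cam[points_cam[:, 2] > 0]                 # keep points in front of the camera
    uvw = (K @ pts.T).T                                    # pinhole projection
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth_map[v[ok], u[ok]] = pts[ok, 2]                   # write sparse depth values
    return depth_map

def bev_occupancy(points, x_range=(0, 70), y_range=(-40, 40), res=0.1):
    """points: (N, 3) lidar points in the sensor frame; no interpolation."""
    nx = int((x_range[1] - x_range[0]) / res)
    ny = int((y_range[1] - y_range[0]) / res)
    grid = np.zeros((nx, ny), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / res).astype(int)
    iy = ((points[:, 1] - y_range[0]) / res).astype(int)
    ok = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    grid[ix[ok], iy[ok]] = 1.0                             # mark occupied cells
    return grid
```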

Figure 7: An example of the Mixture of Experts fusion method. Here, the fused features are taken from the output layers of the expert networks; they can also be extracted from intermediate layers.
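
Below is a minimal sketch of the gating idea behind this figure, assuming two feature-level experts (camera and lidar) and a softmax gating network; the feature dimensions are illustrative and this is not the paper's implementation.

```python
# Minimal Mixture-of-Experts fusion sketch (illustrative, not the paper's code).
import torch
import torch.nn as nn

class MoEFusion(nn.Module):
    def __init__(self, cam_dim=256, lidar_dim=256, out_dim=256):
        super().__init__()
        self.cam_expert = nn.Sequential(nn.Linear(cam_dim, out_dim), nn.ReLU())
        self.lidar_expert = nn.Sequential(nn.Linear(lidar_dim, out_dim), nn.ReLU())
        # Gating network: predicts one weight per expert from both modalities.
        self.gate = nn.Sequential(nn.Linear(cam_dim + lidar_dim, 2), nn.Softmax(dim=-1))

    def forward(self, cam_feat, lidar_feat):
        w = self.gate(torch.cat([cam_feat, lidar_feat], dim=-1))      # (B, 2) expert weights
        fused = (w[:, 0:1] * self.cam_expert(cam_feat)
                 + w[:, 1:2] * self.lidar_expert(lidar_feat))         # weighted expert outputs
        return fused

fused = MoEFusion()(torch.rand(4, 256), torch.rand(4, 256))           # -> (4, 256)
```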

Figure 8: Description of early fusion, late fusion, and several intermediate fusion methods.
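
The sketch below contrasts the three schemes of Figure 8 with dummy convolutional feature extractors; the channel counts and the simple concatenation/addition operators are illustrative assumptions, not the paper's architectures.

```python
# Early, late, and middle fusion on a camera image and a lidar depth map (illustrative).
import torch
import torch.nn as nn

cam_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())       # camera branch
lidar_net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())     # lidar branch
early_net = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU())     # joint net on raw inputs
mid_net = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())      # joint net on features

rgb = torch.rand(1, 3, 64, 64)        # camera image
depth = torch.rand(1, 1, 64, 64)      # lidar depth map (front view)

# Early fusion: concatenate raw modalities, then one shared network.
early = early_net(torch.cat([rgb, depth], dim=1))

# Late fusion: process each modality separately, combine only the outputs.
late = cam_net(rgb) + lidar_net(depth)

# Middle (intermediate) fusion: exchange features part-way through the networks.
middle = mid_net(torch.cat([cam_net(rgb), lidar_net(depth)], dim=1))
```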

Figure 9: Example fusion architectures for two-stage object detection networks. (a) MV3D. (b) AVOD. (c) Frustum PointNet (F-PointNet) [105]. (d) Proposals.

Figure 10: Two examples of improving labeling efficiency for lidar data. (a) Collaborative labeling of lidar points for 3D detection: the lidar points within each object are first weakly labeled by a human annotator and then refined by a pre-trained F-PointNet-based lidar detector. (b) Collaborative training of a semantic segmentation network (SegNet) for lidar points: to augment the training data, image semantics can be transferred from a pre-trained image SegNet.
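
A minimal sketch of the label-transfer idea in part (b): lidar points are projected into the camera image and inherit the class predicted there by a pre-trained image segmentation network. The calibration matrices T_cam_lidar and K are placeholders, and this is an assumption about the mechanics rather than the paper's exact pipeline.

```python
# Transfer per-pixel image semantics to lidar points (illustrative sketch).
import numpy as np

def transfer_semantics(points_lidar, seg_map, T_cam_lidar, K):
    """points_lidar: (N, 3); seg_map: (H, W) class ids; T_cam_lidar: (4, 4); K: (3, 3)."""
    h, w = seg_map.shape
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])   # homogeneous coords
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]                           # into the camera frame
    labels = np.full(len(points_lidar), -1, dtype=int)                   # -1 = no label assigned
    front = pts_cam[:, 2] > 0                                            # points in front of camera
    uvw = (K @ pts_cam[front].T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(front)[ok]
    labels[idx] = seg_map[v[ok], u[ok]]                                  # inherit the image class
    return labels
```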

Figure 11. (a) Illustration of the impact of label quality on object detection performance. The network is trained on labels subject to increasing disturbance. Performance is measured by mAP, normalized to the performance of a network trained on the undisturbed dataset. The network is much more robust to random label errors (drawn from a Gaussian distribution with variance σ) than to label bias (all labels shifted by σ). (b) Illustration of random label noise and label bias (all bounding boxes shifted towards the upper right).
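
The two disturbances compared in this experiment can be emulated as follows; the (x1, y1, x2, y2) box format and the exact offset direction are assumptions for illustration.

```python
# Perturb ground-truth boxes with random noise vs. a constant bias (illustrative).
import numpy as np

def perturb_labels(boxes, sigma, mode="noise", rng=None):
    """boxes: (N, 4) ground-truth boxes in (x1, y1, x2, y2); sigma in pixels."""
    rng = rng or np.random.default_rng(0)
    if mode == "noise":
        # Random label errors: each coordinate is disturbed independently.
        return boxes + rng.normal(0.0, sigma, size=boxes.shape)
    # Label bias: every box is shifted by the same offset (here towards the upper right,
    # i.e. +x and -y in image coordinates).
    return boxes + np.array([sigma, -sigma, sigma, -sigma])

boxes = np.array([[100.0, 120.0, 180.0, 220.0]])
noisy = perturb_labels(boxes, sigma=5.0, mode="noise")
biased = perturb_labels(boxes, sigma=5.0, mode="bias")
```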

Figure 12. The importance of explicitly modeling and propagating uncertainty in a multimodal object detection network. Ideally, the network should produce reliable predictive probabilities (for object classification and localization). The example in the figure depicts high uncertainty in the camera signal during night driving. This kind of uncertainty information is useful for downstream decision modules, such as maneuver planning or emergency braking systems.
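
One common way to obtain such predictive uncertainty is Monte Carlo dropout, sketched below on a generic classification head; this is a standard technique used as an example, not necessarily the specific method discussed in the paper, and the layer sizes are illustrative.

```python
# Monte Carlo dropout: average several stochastic forward passes at test time (illustrative).
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                           nn.Dropout(p=0.5), nn.Linear(128, 4))   # e.g. 4 object classes

def mc_dropout_predict(model, features, n_samples=20):
    model.train()                       # keep dropout layers stochastic at test time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(features), dim=-1)
                             for _ in range(n_samples)])
    return probs.mean(0), probs.var(0)  # predictive mean and per-class variance

mean, var = mc_dropout_predict(classifier, torch.rand(8, 256))
```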

Abstract

Recent advancements in perception for autonomous driving are driven by deep learning. In order to achieve robust and accurate scene understanding, autonomous vehicles are usually equipped with different sensors (e.g. cameras, LiDARs, Radars), and multiple sensing modalities can be fused to exploit their complementary properties. In this context, many methods have been proposed for deep multi-modal perception problems. However, there is no general guideline for network architecture design, and questions of “what to fuse”, “when to fuse”, and “how to fuse” remain open. This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving. To this end, we first provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection and semantic segmentation in autonomous driving research. We then summarize the fusion methodologies and discuss challenges and open questions. In the appendix, we provide tables that summarize topics and methods. We also provide an interactive online platform to navigate each reference:

https://boschresearch.github.io/multimodalperception/.
