
Event cameras, an innovation in autonomous driving perception?

The contradiction between the huge number of image pixels to be processed and the limited computing power of onboard chips has become one of the bottlenecks holding back the development of autonomous driving.

To address this problem, the combination of event cameras and spiking neural networks may be a viable solution.

Convolutional neural networks are the main workhorse of today's image object detection algorithms. Take ResNet-152 as an example: this 152-layer convolutional neural network requires roughly 22.6 billion operations to process a single 224×224 image. If the same network had to process a 1080p camera stream at 30 frames per second, the computation required would reach roughly 33 trillion operations per second, an enormous load.
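For readers who want to check the arithmetic, the following back-of-the-envelope sketch scales the per-image cost of ResNet-152 up to a 1080p, 30 fps stream. All figures are taken as given from the text, and the pixel-proportional scaling is only an order-of-magnitude argument:

```python
# Back-of-the-envelope check of the estimate above; all figures are assumed
# from the text, and the scaling is a rough order-of-magnitude argument only.
resnet152_ops_224 = 22.6e9            # operations for one 224x224 image
pixels_224 = 224 * 224
pixels_1080p = 1920 * 1080
fps = 30

# Assume cost grows roughly in proportion to the number of pixels processed.
ops_per_frame = resnet152_ops_224 * pixels_1080p / pixels_224
ops_per_second = ops_per_frame * fps

print(f"~{ops_per_second / 1e12:.0f} trillion operations per second")
# Prints a figure on the order of 30 trillion, consistent with the ~33 trillion cited above.
```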

Take a typical Baidu autonomous vehicle as an example: its computing platform delivers about 800 TOPS, where 1 TOPS means the processor can perform one trillion operations per second.

Suppose a single camera already demands about 33 TOPS; bear in mind that autonomous vehicles are often fitted with more than ten cameras, plus multiple lidars and millimeter-wave radars.

To detect pedestrians reliably and predict their paths, the chip usually needs to process multiple frames, at least 10, which takes about 330 milliseconds. In other words, the system may need several hundred milliseconds to achieve a reliable detection, and a vehicle travelling at 60 km/h covers about 5.5 meters in 330 milliseconds.
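As a quick sanity check, the sketch below relates the per-camera compute demand to the 800 TOPS budget and converts the 330 ms detection latency into distance travelled at 60 km/h (all figures are taken from the text):

```python
# Rough arithmetic for the scenario above; all figures come from the text.
platform_tops = 800                  # total compute of the example platform
per_camera_tops = 33                 # per-camera demand estimated earlier
num_cameras = 10                     # a typical multi-camera rig

print(f"Cameras alone: {per_camera_tops * num_cameras} of {platform_tops} TOPS")

detection_latency_s = 0.33           # roughly 10 frames of multi-frame processing
speed_mps = 60 * 1000 / 3600         # 60 km/h in metres per second
print(f"Distance travelled during detection: {speed_mps * detection_latency_s:.1f} m")
```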

If, for adequate safety, the frame rate were raised to 30 frames per second, the image data would likely overwhelm the autonomous-driving chip.

Faced with this shortage of computing power, the most obvious response for industry players is simply to add more of it. But chip process nodes keep shrinking, and at extremely small feature sizes quantum tunneling effects become significant, Moore's Law is gradually breaking down, and further gains in chip computing power face serious challenges.

At the same time, more computing power means more power consumption, and on an electric vehicle, the more energy the chips draw, the more the driving range suffers.

Computing power and energy consumption are thus becoming a fundamental tension in the development of autonomous driving.

So can we find another way? Bionics may offer new ideas.

For humans, it is not hard to notice moving objects in an otherwise still scene. A frog goes further: it sees only moving objects and is effectively blind to the static background.

Inspired by this biological trait, researchers designed the event camera.

Traditional cameras repeatedly scan the entire scene at a fixed frame rate and faithfully output a video stream made of full frames, regardless of whether anything in the scene is moving. Such a stream is highly redundant, and large amounts of useless background imagery are fed into convolutional neural networks for computation.

The event camera, by contrast, records only the pixels whose brightness changes.
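To make the mechanism concrete, here is a minimal toy model of an event-camera pixel in Python. It is only a sketch of the standard log-intensity threshold model from the literature, not any particular sensor's implementation, and the threshold value is illustrative:

```python
import numpy as np

def generate_events(prev_frame, curr_frame, t, threshold=0.2):
    """Toy event-camera model: emit an event at every pixel whose log
    intensity has changed by more than `threshold`.

    Returns a list of (x, y, t, polarity) tuples, where polarity is +1 for a
    brightness increase and -1 for a decrease.
    """
    eps = 1e-6                                   # avoid log(0)
    delta = np.log(curr_frame + eps) - np.log(prev_frame + eps)
    ys, xs = np.nonzero(np.abs(delta) > threshold)
    return [(int(x), int(y), t, 1 if delta[y, x] > 0 else -1)
            for x, y in zip(xs, ys)]

# A static background produces no events; only the changed pixel fires.
prev = np.full((4, 4), 0.5)
curr = prev.copy()
curr[1, 2] = 0.9                                 # one pixel brightens
print(generate_events(prev, curr, t=0.001))      # -> [(2, 1, 0.001, 1)]
```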

The figure below compares the two: the traditional frame camera outputs everything in the field of view (left), while the event camera captures only the moving arm in the scene (right).

By focusing only on moving targets, event cameras may be able to make a real difference in autonomous driving.

Because the event camera discards the static background, the amount of data generated per "frame" drops dramatically, down to the level of tens of kilobytes.

Compared with traditional cameras, event cameras also have the advantages of high frame rate, low power consumption, and high dynamic range:

1) High frame rate. Strictly speaking, the concept of "frame rate" does not even exist for event cameras. Each pixel of the event camera records brightness changes asynchronously, with no need to wait for the 30-times-per-second "exposure" of a traditional camera. Freed from exposure, an event camera's output rate can reach one million events per second, far beyond the 30 fps of a traditional camera.

2) Low power consumption. The event camera transmits only brightness changes, avoiding large volumes of redundant data, so energy is spent only on the pixels that actually change. Most event cameras consume around 10 mW, and some prototypes even less than 10 μW, far below traditional frame-based cameras.

3) High dynamic range. Event cameras reach a dynamic range of up to 140 dB, far beyond the roughly 60 dB of frame cameras. This lets an event camera work in bright daylight and still capture dynamic information in the scene on dark nights. The reason is that each pixel's photoreceptor operates independently and responds logarithmically, rather than relying on a global exposure. The event camera therefore behaves much like the biological retina, whose pixels adapt to both very dark and very bright stimuli.
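A quick calculation shows how large the gap between 140 dB and 60 dB actually is, assuming the usual 20·log10 convention for image-sensor dynamic range:

```python
# Dynamic range in dB corresponds to 20*log10(brightest / darkest usable intensity).
event_camera_ratio = 10 ** (140 / 20)   # about 10,000,000 : 1
frame_camera_ratio = 10 ** (60 / 20)    # about 1,000 : 1
print(f"{event_camera_ratio:.0e} : 1  vs  {frame_camera_ratio:.0e} : 1")
```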

The illustrations below show the event camera's focus on moving objects and its high dynamic range. In dim light, a traditional camera struggles to pick out the pedestrian on the right side of the picture, whereas the event camera captures that pedestrian very clearly while filtering out the stationary vehicle in the same part of the image.


Traditional cameras


Event camera

In autonomous driving, the event camera holds clear advantages over the traditional camera, but note that it cannot extract distance information by itself; lidar is still needed to determine the distance to a target.

Some may wonder: if event cameras are so good, why haven't they been widely adopted in autonomous driving?

In fact, capturing information with the camera is only the first step; the subsequent processing of the event data is the more critical part.

As the figure below shows, a traditional camera outputs a sequence of still frames, whereas an event camera outputs an event stream.


Broadly speaking, today's neural networks, such as YOLO and ResNet, focus on extracting pedestrians, cars, and other targets from individual still frames. For timestamp-based event streams, there is as yet no mature algorithm for target recognition.
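One commonly used workaround, sketched below, is to accumulate events over a short time window into a frame-like tensor so that existing frame-based detectors can consume them. This is a generic bridging representation from the literature, not a method proposed in this article, and the function and parameter names are illustrative:

```python
import numpy as np

def events_to_frame(events, height, width, t_start, t_end):
    """Accumulate an event stream into a 2-channel "event frame": positive and
    negative polarities are counted separately over the window [t_start, t_end).

    `events` is an iterable of (x, y, t, polarity) tuples, as an event-camera
    driver (or the toy model sketched earlier) would produce.
    """
    frame = np.zeros((2, height, width), dtype=np.float32)
    for x, y, t, p in events:
        if t_start <= t < t_end:
            frame[0 if p > 0 else 1, y, x] += 1.0
    return frame

# The accumulated frame can then be fed to a conventional detector such as YOLO,
# but doing so throws away the fine-grained timing that an SNN could exploit.
```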

This lack of event-stream processing algorithms is rooted in the structure of today's neural networks.

Today's mainstream neural networks are known as second-generation artificial neural networks. They are built on precise floating-point arithmetic and lack one of the most important ingredients in nature: time. Their outputs correspond one-to-one to their inputs: feed in the same picture and the network will always return the same result.

But does the real brain compute with floating-point numbers? Obviously not: the brain is spike-based, transmitting and processing information with pulses. The following video briefly explains how spiking neurons work.

Mechanism of neuronal membrane voltage operation

A neural network that transmits information with such pulses is a spiking neural network (SNN), known as the third generation of artificial neural networks. Chips built on this architecture are also called brain-like, or neuromorphic, chips.

As the video shows, the precise moment at which a spike occurs carries important information, so spiking neural networks are naturally suited to processing temporal information, which matches the timestamped event stream of an event camera very well.
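To make the "membrane voltage" mechanism from the video concrete, here is a minimal leaky integrate-and-fire (LIF) neuron in Python; the time constant, threshold, and input currents are illustrative values, not parameters of any particular chip:

```python
import numpy as np

def lif_neuron(input_current, dt=1.0, tau=20.0, v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron.

    The membrane voltage leaks toward v_rest, integrates the input current,
    and emits a spike (then resets) whenever it crosses v_thresh. The timing
    of these spikes is what carries the information.
    """
    v = v_rest
    spike_times = []
    for step, i_in in enumerate(input_current):
        v += dt / tau * (v_rest - v) + i_in      # leak + integrate
        if v >= v_thresh:                        # threshold crossing -> spike
            spike_times.append(step * dt)
            v = v_reset
    return spike_times

# A stronger input makes the neuron fire sooner and more often.
print(len(lif_neuron(np.full(100, 0.08))))   # few spikes
print(len(lif_neuron(np.full(100, 0.15))))   # more spikes
```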

In addition, spiking neural networks are event-driven, asynchronous, and extremely low-power:

1) Event-driven. In our brains, roughly 90% of neurons are silent at any given moment; a neuron stays inactive when no event arrives at its input. This makes the event camera's event stream a natural fit for SNNs while greatly reducing power consumption.

2) Asynchronous operation. Spiking neural networks have no notion of a "clock frequency". Traditional computers rely on a clock to keep every operation on a common time step, and mainstream CPUs now run at clock frequencies above 1 GHz. By contrast, IBM's neuromorphic chip TrueNorth can complete tasks such as image recognition and object detection with spike rates of only about 100 Hz. Today's general-purpose computers also follow the von Neumann architecture, in which CPU speed far outstrips memory access speed, creating a hard computing bottleneck; in a spiking neural network, memory and computation are both embodied in the asynchronous spiking of neurons, which offers real hope of breaking through that bottleneck.

3) Extremely low power consumption. In the famous human-versus-machine Go matches of 2016, the average electricity cost of Google's AlphaGo was reportedly as high as $3,000 per game, while the human brain, itself a spiking architecture, runs on only about 20 W. Researchers have also converted the classic object-detection algorithm YOLO into a spiking version (Spiking-YOLO); on the same task it reduced energy consumption by roughly 280 times and ran 2.3 to 4 times faster.
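The computational saving from being event-driven can be illustrated with a small sketch: a dense layer touches every weight on every step, while an event-driven layer only touches the weight rows of the inputs that actually spiked. The 90% silence figure above is used here purely as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 1000, 1000
weights = rng.normal(size=(n_in, n_out)).astype(np.float32)

# Dense, frame-style layer: every input contributes, ~n_in * n_out multiply-adds.
dense_input = rng.normal(size=n_in).astype(np.float32)
dense_out = dense_input @ weights

# Event-driven layer: with ~90% of inputs silent, only the spiking inputs'
# weight rows are accumulated, so the work scales with the number of spikes.
spiking_inputs = rng.choice(n_in, size=n_in // 10, replace=False)
membrane = weights[spiking_inputs].sum(axis=0)

print(f"dense ops ~{n_in * n_out:,}  vs  event-driven ops ~{len(spiking_inputs) * n_out:,}")
```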

Overall, the combination of event camera and spiking neural network works much like a human looking around with eyes and brain: automatically ignoring whatever stays still, and focusing attention and computation on whatever suddenly moves.

Research on spiking neural networks is currently booming in academia, but because neuromorphic hardware is still in its infancy and our understanding of how the brain works remains incomplete, there is as yet no commercial application built on spiking neural networks.

As our understanding of the brain deepens, and as brain-like chips mature, from TrueNorth, SpiNNaker, and Loihi abroad to Tsinghua's Tianjic and Zhejiang University's Darwin at home, we can expect the combination of event cameras and spiking neural networks to bring new breakthroughs to the autonomous driving industry.

References:

Sang Yongsheng, Li Renhao, Li Yaoqian, Wang Qiangwei, Mao Yao. Neuromorphic vision sensor and its application[J]. Journal of Internet of Things, 2019, 3(04): 63-71.

Kim S, Park S, Na B, et al. Spiking-YOLO: Spiking Neural Network for Energy-Efficient Object Detection[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(07): 11270-11277.

https://www.bilibili.com/video/BV1eE41167AX

https://www.bilibili.com/video/BV1EK4y1K71C
