
MIT develops a new AI vision system that could significantly improve the safety of autonomous driving

Financial Associated Press (Shanghai, editor Huang Junzhi) news. Computer vision systems sometimes infer scenes that defy common sense. For example, when a robot processes a scene of a dining table, it might completely overlook a bowl that any human observer would see, conclude that the plate is floating above the table, or mistakenly judge that a fork is penetrating the bowl rather than leaning against it.

If that kind of computer vision system is transferred to a self-driving car, the stakes become much higher; the system might fail, for example, to detect emergency vehicles or pedestrians crossing the street.

To overcome these errors, researchers at the Massachusetts Institute of Technology (MIT) have developed a framework that helps machines see the world more like humans do. Their new AI system for analyzing scenes learns to perceive real-world objects from just a few images and then interprets scenes in terms of those learned objects.

The researchers built the framework using probabilistic programming, an AI approach that lets the system cross-check detected objects against the input data to see whether the images recorded by the camera are a likely match for any candidate scene. Probabilistic inference then lets the system judge whether a mismatch is probably due to sensor noise or to an error in interpreting the scene that needs to be corrected by further processing.
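To make that cross-checking idea concrete, here is a rough, self-contained sketch rather than MIT's actual code: a toy "renderer" turns a candidate scene into a 1-D depth profile, a Gaussian noise model scores how well each candidate explains the observed image, and a large residual signals an interpretation error rather than sensor noise. The scene layout, noise level, and helper names are all hypothetical.

```python
import numpy as np

# Toy "renderer" (hypothetical stand-in for a real image-formation model):
# a candidate scene is a list of (start, end, surface_depth) segments on a
# 1-D tabletop, rendered into a coarse depth profile like a depth camera's.
def render(candidate_scene, width=32):
    depth = np.full(width, 1.0)            # bare table sits at depth 1.0
    for start, end, top in candidate_scene:
        depth[start:end] = top             # object surfaces are closer
    return depth

def log_likelihood(observed, candidate_scene, noise_sigma=0.02):
    """Gaussian sensor-noise model: how plausible is the observed depth
    image if this candidate scene were the true one?"""
    rendered = render(candidate_scene, width=observed.size)
    residual = observed - rendered
    return -0.5 * np.sum((residual / noise_sigma) ** 2)

# Simulated observation: a plate and a bowl on the table, plus sensor noise.
rng = np.random.default_rng(0)
true_scene = [(4, 14, 0.95), (18, 26, 0.80)]    # plate, bowl (made-up sizes)
observed = render(true_scene) + rng.normal(0.0, 0.02, 32)

# Two competing interpretations of the same image.
with_bowl    = [(4, 14, 0.95), (18, 26, 0.80)]
missing_bowl = [(4, 14, 0.95)]                  # ignores the bowl entirely

for name, scene in [("with bowl", with_bowl), ("missing bowl", missing_bowl)]:
    print(f"{name:>12}: log-likelihood = {log_likelihood(observed, scene):.1f}")
# The small residual for "with bowl" is consistent with sensor noise; the
# large residual for "missing bowl" signals an interpretation error that
# further processing should correct.
```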

This common-sense safeguard allows the system to detect and correct many of the errors that plague the deep-learning methods that have also been used for computer vision. Probabilistic programming also makes it possible to infer the likely contact relationships between objects in a scene and to use common-sense reasoning about those contacts to infer more accurate object positions.

"If you don't know the contact relationship, then you can say that an object floats above the table — that would be a valid explanation." As humans, we clearly know that this is physically unrealistic, and that objects placed on top of a table are more likely to be the posture of an object. Because our reasoning system knows this knowledge, it can infer more accurate postures. This is a key insight into this work," said lead author Nishad Gothoskar of the study paper, a PhD student in electrical engineering and computer science (EECS) in the Probabilistic Computing Project.

The researchers named the system "3D Scene Perception via Probabilistic Programming (3DP3)." To analyze an image of a scene, 3DP3 first learns about the objects in that scene. After being shown only five images of an object, each taken from a different angle, 3DP3 learns the object's shape and estimates the volume it occupies in space.

Gothoskar said, "If I show you an object from five different angles, you can represent that object well. You'll learn about its color, shape, and be able to identify the object in many different scenes. ”


"This has a lot less data than deep learning methods. For example, the Dense Fusion neural object detection system needs to provide thousands of training examples for each object type. In contrast, 3DP3 only requires a few images per object and reports uncertainty in the shape part of each object. He added.

The 3DP3 system generates a graph to represent the scene: each object is a node, and the lines connecting nodes indicate which objects are in contact with one another. This lets 3DP3 estimate more accurately how the objects are arranged. (Deep-learning methods rely on depth images to estimate object poses, but they do not produce a graph structure of contact relationships, so their estimates are less accurate.)
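A minimal sketch of such a scene graph, assuming a simple hypothetical structure rather than the paper's actual representation, might look like this: each object is a node with a pose, and contact edges record which object supports which, so later pose refinement can constrain, say, the plate to sit on the table rather than float above it.

```python
from dataclasses import dataclass, field

# Hypothetical scene-graph structure: objects are nodes, and edges record
# which objects are in contact (who supports whom).
@dataclass
class SceneObject:
    name: str
    pose: tuple          # (x, y, z) position of the object's base

@dataclass
class SceneGraph:
    objects: dict = field(default_factory=dict)
    contacts: list = field(default_factory=list)   # (supporting, supported)

    def add(self, obj):
        self.objects[obj.name] = obj

    def add_contact(self, supporting, supported):
        self.contacts.append((supporting, supported))

    def supported_by(self, name):
        """Return the objects that the named object rests on or leans against."""
        return [a for a, b in self.contacts if b == name]

scene = SceneGraph()
scene.add(SceneObject("table", (0.0, 0.0, 0.0)))
scene.add(SceneObject("plate", (0.1, 0.2, 0.75)))
scene.add(SceneObject("fork",  (0.15, 0.2, 0.78)))
scene.add_contact("table", "plate")   # plate rests on the table
scene.add_contact("plate", "fork")    # fork leans against the plate

print(scene.supported_by("fork"))     # ['plate']
# Because contacts are explicit, pose estimates can be constrained so that
# supported objects sit on their supporting surfaces.
```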

The researchers note that beyond improving the safety of self-driving cars, the work could also improve the performance of computer perception systems that must interpret complicated arrangements of objects, such as a robot tasked with cleaning a cluttered kitchen.

In the future, the researchers hope to push the system further so that it can learn an object from a single image, or a single frame of a video, and then robustly detect that object in different scenes. They also want to explore using 3DP3 to gather training data for neural networks: because it is often difficult for humans to manually label images with 3D geometry, 3DP3 could be used to generate more sophisticated image labels.
