How do self-driving cars solve complex interaction problems? Tsinghua and MIT propose the M2I framework

2022-03-24 16:20:58

Heart of the Machine column

Author: Sun Qiao

A research team from Tsinghua University and MIT has proposed a self-supervised learning method that lets an autonomous driving model learn, from existing trajectory prediction datasets, to correctly judge who should yield to whom in a conflict.

Once self-driving cars are on the road, they inevitably have to learn some of its "unspoken rules". An autonomous driving system needs to read the situation: to recognize in time when it should slow down and give way, and when others are giving way so that it should accelerate and pass promptly. Because the road environment is so complex, even many novice human drivers struggle to make the right call.

This complexity makes it hard for a rule-based approach to cover every situation without rules conflicting with one another. The research team from Tsinghua University and MIT proposed a self-supervised learning method that learns this road "etiquette" from existing trajectory prediction datasets and correctly judges the yielding relationship in a conflict. The study evaluated the predicted relationships on the Waymo interactive motion prediction dataset, which is full of complex interactions, and proposed the M2I framework, which uses the predicted relationships for scene-level interactive trajectory prediction.

The project was carried out mainly by Sun Qiao of Tsinghua University and Huang Xin of MIT, under the supervision of Zhao Hang, a faculty member at the Tsinghua MARS Lab.

Paper address: https://arxiv.org/abs/2202.11884

Project Address: https://tsinghua-mars-lab.github.io/M2I/

Trajectory prediction is an important part of an autonomous driving system and is indispensable for driving safely. The trajectory prediction module usually sits downstream of detection and tracking: it uses existing high-definition maps and information about surrounding vehicles and pedestrians to predict what they are likely to do next. It outputs its predictions as trajectories or heat maps so that the downstream planning system can choose the most reasonable next decision or trajectory for the autonomous vehicle itself.

Most trajectory prediction methods try to learn the relationships between vehicles and pedestrians on the road with GNN-based or attention-based models, but these methods often face the following hard-to-overcome challenges:

1. The relationships the model predicts are implicit, so they lack interpretability, and it is hard to tell whether the model has really learned them;

2. The predicted relationships and the final output trajectories are not guaranteed to be consistent (as shown in the first row of Figure 1), so trajectories can overlap and scene-level plausibility cannot be ensured;

3. Road users' decisions depend on one another in a certain order, but the model cannot distinguish this logical order and can only predict every agent separately and in parallel.

Figure 1: Trajectories output by an agent-by-agent prediction method can collide with each other

To address these issues, the researchers propose M2I, a simple and effective framework (second row of Figure 1). With M2I, any existing trajectory prediction model can be quickly modified to gain two capabilities: predicting relationships at the scene level, and predicting one vehicle's trajectory conditioned on another vehicle's trajectory. Using both capabilities together lets the new model make better predictions in interactive scenarios.

From multi-agent trajectory prediction to single-agent trajectory prediction

First, let's look at the overall framework. As shown in Figure 2, M2I consists of three modules: a relationship prediction module, a single-agent trajectory prediction module, and a conditional trajectory prediction module.

Figure 2: M2I trajectory prediction framework

Relationship prediction

Complex relationships among road users can be decomposed into pairwise relationships. The study assigns the two road users in each pair the roles of influencer and reactor: the reactor is the party that needs to yield in a conflict, and the influencer is the party that does not. Trajectory prediction for an interaction can then be split into two predictions: first predict the influencer's trajectory, then use that predicted trajectory to predict the reactor's trajectory. This keeps the two predicted trajectories consistent at the scene level and so reduces implausible outcomes such as overlapping trajectories.
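
One way to write this two-step split down is as a chain-rule factorization of the joint distribution over the two future trajectories. The symbols are assumptions introduced here for illustration and do not come from the article: X is the observed scene context, Y_I is the influencer's future trajectory, and Y_R is the reactor's future trajectory.

```latex
P(Y_I, Y_R \mid X) = P(Y_I \mid X)\, P(Y_R \mid X, Y_I)
```

The first factor corresponds to predicting the influencer on its own, and the second to predicting the reactor given the influencer's trajectory.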

So how do we predict who is the influencer and who is the reactor, that is, who should yield in a conflict? The study proposes mining ground-truth labels from existing datasets based on how trajectories interleave in space and time. Specifically, if the trajectories of two road users in the dataset cross the same point at different times, the agent that passes through that point first is labeled the influencer and the agent that passes later is labeled the reactor. By learning from these automatically generated labels, the model learns who goes first when a conflict arises.
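
The labeling rule described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function name `label_relation`, the distance threshold of 1.0, the closest-point approximation of a crossing, and the handling of ties are all assumptions made for the example. Trajectories are lists of (x, y) positions, one per frame.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def label_relation(
    traj_a: Sequence[Point],
    traj_b: Sequence[Point],
    dist_threshold: float = 1.0,  # assumed value, not taken from the article
) -> Optional[str]:
    """Return "a" if agent A is the influencer, "b" if agent B is,
    or None if the two trajectories never come close enough to conflict."""
    best_dist, best_ta, best_tb = math.inf, 0, 0
    # Find the pair of frames at which the two paths come closest in space.
    for ta, (xa, ya) in enumerate(traj_a):
        for tb, (xb, yb) in enumerate(traj_b):
            d = math.hypot(xa - xb, ya - yb)
            if d < best_dist:
                best_dist, best_ta, best_tb = d, ta, tb
    # No crossing, or both agents arrive in the same frame: no ordered label.
    if best_dist > dist_threshold or best_ta == best_tb:
        return None
    # Whoever reaches the shared point first is the influencer.
    return "a" if best_ta < best_tb else "b"


# Toy example: A drives east through (5, 0) at frame 5,
# B drives north through the same point at frame 8.
east = [(float(t), 0.0) for t in range(10)]
north = [(5.0, float(t - 8)) for t in range(10)]
print(label_relation(east, north))
```

In this toy scene agent A passes the shared point three frames before agent B, so A is labeled the influencer and B the reactor.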

The relationship prediction model used in the study is adapted from DenseTNT by replacing its trajectory prediction head with an ordinary classification head. The researchers found that, without any other changes to the existing model, relationships could be predicted with more than 90 percent accuracy. Comparative experiments showed that conditional trajectory prediction gives better results when the relationships it relies on are more accurate.

The researchers also extended relationship prediction to more than two agents. In that case the study predicts relationships pair by pair and assembles them into a directed graph. As Figure 3 shows, the M2I relationship prediction module extends well to multiple agents.
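
As a rough illustration of how pairwise predictions can be assembled into a directed graph, the sketch below takes (influencer, reactor) pairs and derives an order in which every influencer comes before its reactors. Using the graph to order predictions is an assumption made for this example, as are the function name `prediction_order` and the agent names. It relies on `graphlib` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple


def prediction_order(relations: Iterable[Tuple[str, str]]) -> List[str]:
    """relations holds (influencer, reactor) pairs, one per interacting pair.
    Returns an ordering in which each influencer precedes its reactors."""
    predecessors: Dict[str, Set[str]] = {}
    for influencer, reactor in relations:
        predecessors.setdefault(influencer, set())
        # Edge influencer -> reactor: the reactor depends on the influencer.
        predecessors.setdefault(reactor, set()).add(influencer)
    # static_order() raises graphlib.CycleError if the relations are circular.
    return list(TopologicalSorter(predecessors).static_order())


pairs = [
    ("car_1", "car_2"),
    ("car_2", "pedestrian"),
    ("car_1", "pedestrian"),
]
print(prediction_order(pairs))
```

If the pairwise predictions contradict each other and form a cycle, this sketch simply raises an error; the article does not say how such cases are handled.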

Figure 3: Relationship prediction of multiple agents in complex scenarios

Trajectory prediction

The single-agent trajectory prediction module in the M2I framework can be replaced with any common trajectory prediction model; in the paper's experiments, the researchers used DenseTNT for single-agent trajectory prediction. For conditional trajectory prediction, they modified DenseTNT's encoder so that it encodes the influencer's future trajectory (8 s long, 80 frames in total, in the Waymo dataset used) together with the other inputs. During training, the influencer's future trajectory is the ground-truth trajectory from the dataset; at prediction time, it is the trajectory output by the single-agent module. Apart from the encoder, the study did not change the model structure for conditional trajectory prediction.
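
The way the three modules fit together at prediction time can be sketched as below. This is only a schematic under stated assumptions: `predict_pair` and the three `toy_*` functions are hypothetical stand-ins written for this example, not DenseTNT or the M2I code, and a "scene" is reduced to a dictionary of current agent positions. The horizon of 80 frames follows the 8 s figure quoted above.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]
Scene = Dict[str, Point]
HORIZON_FRAMES = 80  # 8 s of future trajectory, 80 frames in total


def predict_pair(
    scene: Scene,
    agent_a: str,
    agent_b: str,
    relation_predictor: Callable[[Scene, str, str], Tuple[str, str]],
    single_agent_predictor: Callable[[Scene, str], Trajectory],
    conditional_predictor: Callable[[Scene, str, Trajectory], Trajectory],
) -> Dict[str, Trajectory]:
    # 1. Decide which agent is the influencer and which is the reactor.
    influencer, reactor = relation_predictor(scene, agent_a, agent_b)
    # 2. Predict the influencer on its own.
    influencer_traj = single_agent_predictor(scene, influencer)
    # 3. Predict the reactor conditioned on the influencer's predicted trajectory.
    reactor_traj = conditional_predictor(scene, reactor, influencer_traj)
    return {influencer: influencer_traj, reactor: reactor_traj}


# Toy stand-ins so the sketch runs; real models would replace all three.
def toy_relation(scene: Scene, a: str, b: str) -> Tuple[str, str]:
    return a, b  # pretend the first agent always has priority


def toy_single_agent(scene: Scene, agent: str) -> Trajectory:
    x0, y0 = scene[agent]
    return [(x0 + 0.5 * (t + 1), y0) for t in range(HORIZON_FRAMES)]


def toy_conditional(scene: Scene, agent: str, influencer_traj: Trajectory) -> Trajectory:
    # The reactor simply waits in place while the influencer passes.
    return [scene[agent]] * len(influencer_traj)


scene = {"car_1": (0.0, 0.0), "car_2": (10.0, -5.0)}
result = predict_pair(scene, "car_1", "car_2", toy_relation, toy_single_agent, toy_conditional)
print({name: len(traj) for name, traj in result.items()})
```

During training the conditional predictor would be given the ground-truth influencer trajectory instead of the output of step 2, which is the only difference between the two modes described above.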

Experimental results

Experimental results show that the DenseTNT model with the M2I framework is significantly better than several other methods on the leaderboard. The improvement in mAP over other models is especially pronounced for vehicle-to-vehicle interactions.

Figure 4: M2I significantly outperforms other existing methods on Interactive Motion Prediction

The study also tried TNT as the backbone. The results show that the M2I framework also improves TNT's performance in interactive scenarios, which indicates that M2I is not tied to a particular backbone.

Qualitative analysis shows that with the M2I framework, the predicted trajectory behaves closer to the real interaction trajectory at the scene level, as shown in Figure 5.

Figure 5: M2I better learns how two interacting vehicles in a scene should complete their turns one after the other
