
Large language models have entered the autonomous driving pipeline!

Author: Xi Xiaoyao Technology Said

Over the past two decades, autonomous driving technology has made significant progress, and some highly automated vehicles are already available. Not long ago, we shared a robotics experiment that involved understanding complex instructions. It would be a magical experience if large language models (LLMs) could be used to understand and execute verbal instructions in self-driving cars: no longer limited to simple commands such as "Hey XX, open the sunroof, turn off the air conditioning", but extending to complex instructions that actually control the car on the road.

The authors of this article introduce Talk2Drive, an autonomous driving framework that leverages LLMs to interpret and respond to a variety of human commands, especially abstract or emotional ones, while using historical interaction data to deliver a personalized driving experience. Unlike traditional systems that require precise input, the Talk2Drive framework allows more natural and intuitive communication with the vehicle.

This article delves into the role of LLMs in autonomous driving decision-making and discusses the vehicle configuration, perception systems, and communication devices that work together to enable autonomous navigation on real roads.

Paper Title:

Large Language Models for Autonomous Driving: Real-World Experiments

Paper Link:

https://arxiv.org/abs/2312.09397

The advantages of combining LLMs with autonomous driving

LLMs offer the following advantages over traditional systems for autonomous driving:

  • Understanding abstract expressions: Traditional systems struggle with abstract human instructions, while LLMs can understand and adapt to a variety of human emotions and contextual cues.
  • Personalized driving experience: LLMs enhance the driving experience by providing personalized driving behavior based on the user's historical preferences and commands.
  • Real-time performance and safety: LLMs process complex human instructions with low latency, which is critical for real-time applications and safety-critical scenarios.

The Talk2Drive framework

The Talk2Drive framework is an innovative approach to autonomous driving planning and control tasks that combines cloud-based large language models (LLMs) with real-world vehicle dynamics to respond to human input in a personalized way, as shown in Figure 1.


Figure 1. Talk2Drive framework architecture

Instruction translation and contextual data integration

This is the first step in the framework: it ensures that the user's verbal instructions are accurately transcribed, and by integrating real-time environmental data it allows the system to understand and process those instructions more comprehensively and intelligently.

The Talk2Drive framework first recognizes human verbal commands through advanced speech recognition technology. Spoken commands are converted into text instructions; the key at this step is to ensure that the content and details of the spoken language are accurately captured in text form. At the same time, the LLM has access to real-time, cloud-based environmental data, including weather updates, traffic conditions, and local traffic rules. This information is supplied in text format and plays a key role in the decision-making process, ensuring that the system's responses take the environmental context into account.
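As a rough illustration of this step, the sketch below assembles a transcribed command and textual context into a single LLM prompt. The function name, field names, and prompt wording are our own assumptions, not the paper's implementation; transcription itself is assumed to happen upstream.

```python
# A minimal sketch of prompt assembly, assuming transcription happens upstream.
# All names and prompt wording here are illustrative, not from the paper.

def build_prompt(command_text: str, context: dict[str, str]) -> str:
    """Merge the driver's transcribed command with textual environment data."""
    # The paper says weather, traffic, and local traffic rules reach the LLM
    # as text; here each item simply becomes one "key: value" line.
    context_block = "\n".join(f"{k}: {v}" for k, v in context.items())
    return (
        "You are the planning module of an autonomous vehicle.\n"
        f"Current environment:\n{context_block}\n"
        f'Driver command: "{command_text}"\n'
        "Respond with executable control code only."
    )

prompt = build_prompt(
    "I'm in a hurry, can we go a bit faster?",
    {"weather": "light rain", "traffic": "sparse", "speed_limit": "40 mph"},
)
```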

LLM-based processing and inference

In this step, the LLM processes and reasons about the text commands, the critical stage that enables the system to understand complex, context-rich instructions. During reasoning, the LLM interprets the commands with the goal of understanding the user's intent and deciding how to act, and it can combine them with the contextual data provided in the previous step.
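As one possible realization of this step, the sketch below sends the assembled prompt to a cloud LLM through the OpenAI chat API. The paper evaluates several models (GPT-3, GPT-4, and PaLM 2 among them), so this client choice and the model name are assumptions for illustration.

```python
# Hedged sketch of the cloud inference call; the model choice is an assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def query_llm(prompt: str) -> str:
    """Send the assembled prompt to the LLM and return its code reply."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output is preferable for control code
    )
    return response.choices[0].message.content
```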

Generating executable code

The output of the LLM's reasoning process is executable code that is used to plan and control vehicle behavior. Inspired by the "code as policies" concept, the generated code is more than a series of simple instructions: it expresses complex driving behaviors and fine-tunes parameters in the vehicle's low-level controller. In addition to modifying the vehicle's target speed based on the driver's verbal instructions, this includes adjusting control parameters such as the look-ahead distance and look-ahead ratio.
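The paper does not publish the exact code the LLM emits, so the following is only a plausible shape for it; the Controller stub and its fields are hypothetical stand-ins for the vehicle's real planning and control interface.

```python
# Illustrative only: a hypothetical controller interface and the kind of
# snippet the LLM might return for "I'm in a hurry, go a bit faster".
from dataclasses import dataclass

@dataclass
class Controller:
    target_speed: float = 8.0        # m/s
    lookahead_distance: float = 6.0  # m
    lookahead_ratio: float = 0.4     # scales look-ahead with speed

controller = Controller()

# Generated code adjusts both the target speed and the control parameters:
controller.target_speed = 12.0
controller.lookahead_distance = 8.0  # longer look-ahead smooths tracking at speed
controller.lookahead_ratio = 0.45
```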

Code execution and feedback in autonomous vehicles

This step turns the code generated by the language model into actual driving behavior, and safety checks ensure the reliability and safety of the whole process.

The code generated by the LLM is sent back via the cloud to the vehicle's electronic control unit (ECU), where it is executed. The Talk2Drive framework applies two safety checks to the generated code:

  1. The first check verifies that the generated code is validly formatted. If the code does not conform to a valid format, the framework provides no feedback or action related to it; this ensures that the generated code is structurally correct and avoids downstream errors.
  2. The second check involves parameter validation, which assesses whether a given parameter is appropriate and safe in the current situation. This step helps prevent the execution of potentially dangerous code and ensures that the generated code is suitable and safe for the vehicle. A minimal sketch of both checks follows.
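The sketch below assumes the generated code is Python and assumes conservative parameter bounds; the paper does not publish its actual validation logic or limits.

```python
# Minimal sketch of the two safety checks; the bounds are assumed values.
import ast

SPEED_LIMIT_MS = 20.0  # assumed cap for the test track, in m/s

def is_valid_format(code: str) -> bool:
    """Check 1: reject code that is not syntactically valid Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def has_safe_parameters(target_speed: float, lookahead: float) -> bool:
    """Check 2: reject parameter values outside conservative bounds."""
    return 0.0 <= target_speed <= SPEED_LIMIT_MS and 1.0 <= lookahead <= 20.0

def guard(code: str, target_speed: float, lookahead: float) -> bool:
    # Only code that passes both checks is forwarded to the ECU for execution.
    return is_valid_format(code) and has_safe_parameters(target_speed, lookahead)
```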

Executing the code involves adjusting basic driving behaviors and various parameters in the vehicle's planning and control system. The actuators control the throttle, brakes, gear selection, and steering via the CAN bus and an electronically controlled drive system, as shown in Figure 2. This ensures that the code generated by the LLM accurately directs the vehicle to perform the intended driving behavior.


Figure 2. Autonomous driving functional modules and message flow

Memory modules and personalization

This module brings a personalized driving experience to the Talk2Drive framework: by recording, analyzing, and leveraging historical interaction data, the system can adapt more intelligently to the user's driving preferences. The memory module stores the history of interactions between people and the vehicle, with a focus on enhancing personalization.

Every interaction between the person and the vehicle is recorded and saved in text format to a memory module within the ECU. Records include the human's verbal commands, the code generated by the LLM, and the human's feedback. The historical data in the memory module is updated after each trip, and every interaction with the vehicle is recorded in real time so that it reflects the user's latest state and preferences.

If the user reacts differently to a similar command, the LLM uses the most recent feedback as the reference point for its current decision, accommodating the user's potentially changing preferences. When the user issues a command, the LLM accesses the memory module and uses the stored information as part of the input prompt for decision-making.
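A minimal sketch of how such a memory module might store and replay interactions follows; the file location, record fields, and retrieval window are all assumptions rather than the paper's design.

```python
# Sketch of a text-format interaction log; layout and fields are assumptions.
import json
from pathlib import Path

MEMORY_FILE = Path("talk2drive_memory.jsonl")  # hypothetical location on the ECU

def record_interaction(command: str, generated_code: str, feedback: str) -> None:
    """Append one human-vehicle interaction after the trip."""
    entry = {"command": command, "code": generated_code, "feedback": feedback}
    with MEMORY_FILE.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def memory_as_prompt(limit: int = 5) -> str:
    """Format recent interactions for the LLM prompt; the newest entries come
    last so that the latest feedback serves as the reference point."""
    if not MEMORY_FILE.exists():
        return ""
    lines = MEMORY_FILE.read_text().splitlines()[-limit:]
    records = [json.loads(line) for line in lines]
    return "\n".join(
        f"- command: {r['command']} | feedback: {r['feedback']}" for r in records
    )
```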

Trajectory tracking

The driving trajectory of a vehicle is generated by recording a series of waypoints, which represent the vehicle's position in a local coordinate system and together constitute its predetermined route. The main function of the trajectory tracking module is to make the vehicle navigate through the specified waypoint sequence; it starts the whole process by loading the selected trajectory.

The system continuously compares the vehicle's current position with the current target waypoint and calculates the distance between them. This distance is checked against the look-ahead distance to determine whether the vehicle is close enough to the current waypoint.

  • If the vehicle is close enough to the current waypoint, the target is updated to the next waypoint.
  • If the distance to the current target waypoint has not yet fallen below the set minimum, the system continues navigating toward it using the pure pursuit algorithm.

This process repeats until the vehicle reaches the final waypoint, at which point the algorithm terminates; a minimal sketch of this loop follows Figure 3.


Figure 3. Trajectory tracking flow chart
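The waypoint-advance loop described above might look like the following sketch; the waypoint format and the distance threshold are assumptions.

```python
# Sketch of the waypoint-advance logic; the threshold value is an assumption.
import math

LOOKAHEAD_M = 4.0  # assumed switching distance, in meters

def update_target(position: tuple[float, float],
                  waypoints: list[tuple[float, float]], idx: int) -> int:
    """Advance to the next waypoint once the current one is within reach."""
    tx, ty = waypoints[idx]
    close = math.hypot(tx - position[0], ty - position[1]) < LOOKAHEAD_M
    if close and idx < len(waypoints) - 1:
        return idx + 1  # close enough: switch to the next waypoint
    return idx          # otherwise keep steering toward the current one

waypoints = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
idx = update_target((1.0, 0.5), waypoints, 0)  # -> 1: first waypoint reached
```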

The authors use the pure pursuit algorithm as the path-following method in the autonomous driving system. Its inputs are the target waypoint, the look-ahead distance, and the desired velocity, and it outputs the front-wheel steering angle and the current acceleration for vehicle control. The core idea of pure pursuit is to compute the front-wheel steering angle from the look-ahead distance, the turning radius, and the heading angle to the look-ahead point, and then use this steering angle together with the desired speed to track the target waypoint, as shown in Figure 4.


Figure 4. Schematic diagram of the pure pursuit path tracking algorithm
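The steering computation itself can be sketched as follows, using the standard pure pursuit geometry for a kinematic bicycle model; the wheelbase value is an assumed vehicle parameter, not taken from the paper.

```python
# Standard pure pursuit steering under a kinematic bicycle model.
import math

WHEELBASE_M = 2.7  # assumed wheelbase, in meters

def pure_pursuit_steering(x: float, y: float, yaw: float,
                          target: tuple[float, float]) -> float:
    """Front-wheel angle that arcs the vehicle onto the look-ahead point."""
    dx, dy = target[0] - x, target[1] - y
    lookahead = math.hypot(dx, dy)    # distance to the look-ahead point
    alpha = math.atan2(dy, dx) - yaw  # heading error toward that point
    # Geometry of the circular arc through the point:
    # delta = atan(2 * L * sin(alpha) / lookahead)
    return math.atan2(2.0 * WHEELBASE_M * math.sin(alpha), lookahead)
```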

Experiments


Figure 5. Setup of the real autonomous vehicle used in the experiments

The sensor suite and connectivity setup of the autonomous vehicle platform are shown in Figure 5. The experimental test track is shown in Figure 6: the specified test trajectory forms a rectangular loop that includes long straights, which allow continuous speed and control evaluation, with a turn at each corner.


Figure 6. Map of the experimental site

In the experiments, the subjects were divided into three groups whose members showed similar trends in driving behavior. Subjects were then asked to issue commands at three levels: direct, habitual indirect, and non-habitual indirect. Each command is processed through the Talk2Drive framework, which initializes the trajectory tracking module. Each command is processed with four different language models, data points such as speed and response latency are collected, and the evaluation metrics are then computed. To establish baselines for speed difference and speed variance, groups of human drivers were also asked to drive the same trajectory, and the averages of their data were used as baseline values. Table 1 shows the specific values of these evaluation metrics.


Table 1. Results of Talk2Drive under different LLMs and command categories

  • Comprehension: The speed difference is used to assess each LLM's ability to understand indirect commands. All LLMs tested in the framework were able to understand speed commands across the different speed-intent categories and accurately translate them into executable code with a 100% success rate.
  • Comfort: Speed variance and acceleration are measured to assess comfort. The results show that the speed variance and average acceleration do not significantly exceed the baseline, and the average acceleration stays within the recommended threshold for an "excellent" driving experience. This indicates that speed adjustments made through Talk2Drive have no significant impact on driving comfort.
  • Latency: Measured as the duration from when the LLM API call is initiated to when the generated text is successfully received. The results show that GPT-3 has the shortest latency, likely due to its smaller model size. GPT-4 and PaLM 2 are slightly slower, with GPT-4 showing more stable latency, which may also be related to the number of concurrent users.

The takeover rate before and after integrating Talk2Drive was evaluated, with various driving scenarios simulated by human drivers with different driving styles. Drivers take over the vehicle when they find the default speed setting of the trajectory tracking module too fast or too slow. As shown in Table 2, integrating Talk2Drive lets drivers interact with the system in a more intuitive and personalized way, communicating their speed preferences through verbal commands. This improvement is reflected in a significantly reduced takeover rate in real-world driving scenarios, indicating that the system adapts better to driver preferences and improves the overall user experience.


Table 2. Comparative analysis of takeover rates

The results also show that introducing the memory module significantly reduces the takeover rate, illustrating the benefit of interaction history for a more personalized driving experience.

Summary

This article has shown the innovative application of LLMs in the Talk2Drive framework. Experimental results show that Talk2Drive enables autonomous vehicles to efficiently understand and execute complex, context-rich human commands, providing a higher level of personalization in the driving experience. Talk2Drive is also the first framework to successfully deploy LLMs on real-world autonomous vehicles, setting a new milestone for autonomous driving technology with a 100% code execution success rate.

However, implementing complex driving behavior with LLMs in real-world scenarios remains challenging, involving reaction speed and the ability to interpret instructions while ensuring data security. Future research is expected to explore deep integration with other intelligent transportation systems and IoT devices, aiming to co-create a smarter and more efficient urban mobility network. We look forward to further advances in autonomous driving technology bringing a more convenient, safe, and personalized experience to future travel~

