
The value and key experience points of multimodal interaction in the intelligent cockpit

Author: Everyone is a Product Manager
How multimodal interaction can improve the driving experience in the intelligent cockpit is a topic worth exploring over the long term. In this article, the author shares the value of multimodal interaction in the intelligent cockpit, along with several key points for designing a multimodal interaction experience.

In HMI design, sight, hearing, touch, and smell all play different roles. Humans receive roughly 83% of information through vision, and in the cockpit the driver's eyes take in information from the dashboard, center console, rearview mirrors, HUD, ambient lighting, and the environment outside the vehicle.

In the intelligent cockpit, beyond touch-based methods such as touchscreens and physical buttons, interaction and recognition methods such as air gestures, face recognition, posture recognition, eye tracking, ECG monitoring, and respiratory monitoring are gradually reaching production, making the forms and content of multimodal interaction richer and more diverse.

The value of multimodal interaction in the intelligent cockpit lies in achieving a safe, efficient, and comfortable in-car interaction experience. So how should we understand safety, efficiency, and comfort?

1. The application value of multimodal interaction

The article "Ergonomics and Human-Computer Interaction Theory of Intelligent Cockpit Design" mentions situational awareness, the SRK model, multiple resource theory, and the Yerkes-Dodson law; together, these four models explain why multimodal interaction needs to be considered in intelligent-cockpit experience design. Take the combination of the SRK model and the Yerkes-Dodson law as an example: novice drivers must look ahead very attentively because their driving operations are still at the knowledge-based level, so their cognitive load is already high, and when something else demands their attention and overloads their cognitive resources, danger follows easily.

For skilled drivers, however, driving has become a skill: they no longer need to devote most of their cognitive resources to the road, and many simple tasks can be performed at the same time. Yet even an experienced driver must be very attentive in unfamiliar or harsh environments, because there their understanding of the environment drops back to the knowledge-based level.

Taking the combination of situational awareness and the Yerkes-Dodson law as an example: in manual driving, the driver continuously collects data about the surrounding environment (perception), processes it (prediction and decision-making), and takes action to operate the vehicle. In intelligent driving, the driver is likely not paying attention to the driving task, so once a problem occurs and a takeover is required, the driver must perceive, predict, decide, and act within a very short time; cognitive load can jump from a low level to a high or even excessive level, easily resulting in distraction or anxiety.

From the perspective of multiple resource theory, a good driving experience should present the information that needs the driver's attention through different channels, so as to reduce the driver's cognitive load. Beyond cognitive load, the aforementioned face recognition, posture recognition, ECG monitoring, respiratory monitoring, and other recognition methods all exist to make sure the driver is in a good driving state, and thereby to keep passengers and the vehicle safe.

Steering-wheel buttons, voice interaction, air gestures, and eye tracking can effectively improve the driver's operating efficiency and let the driver control the whole car without his back ever leaving the seat, which improves operating comfort; the purpose behind all of this is still to let the driver control the vehicle more safely.

2. Four key points for designing a multimodal interaction experience

How can multi-screen interaction, voice interaction, ambient-light interaction, tactile interaction, and other interaction methods let users know in real time what is happening? This remains a cutting-edge topic under discussion in both academia and industry. Below are four key points to focus on when designing multimodal interactions.

1. Information can be presented in a multi-channel redundant way, especially for high-priority or even urgent information

Studies have shown that "visual + auditory" or "visual + vibrotactile" warnings produce faster response times than unimodal warnings, which is related to the redundancy gain of multimodal interaction: redundant channels speed up the processing of information.

Auditory and vibrotactile signals are transient, so their information can be missed or forgotten, which matters most for critical information. When the driver has trouble receiving information visually or aurally, whether for personal reasons or environmental ones such as a dark or noisy environment, delivering the information over multiple channels minimizes the chance that the driver misses it.
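As a minimal sketch of this redundancy principle, the following Python example fans a single alert out over several channels according to its priority. The channel objects, the Alert structure, and the priority thresholds are illustrative assumptions, not any particular cockpit platform's API.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


@dataclass
class Alert:
    message: str
    priority: Priority


class AlertDispatcher:
    """Fan one alert out to several modalities depending on its priority."""

    def __init__(self, visual, auditory, haptic):
        # Each channel is any object exposing a present(message) method.
        self.visual = visual
        self.auditory = auditory
        self.haptic = haptic

    def dispatch(self, alert: Alert) -> None:
        # Low-priority information stays on the visual channel to avoid noise;
        # higher priorities are presented redundantly across channels.
        self.visual.present(alert.message)
        if alert.priority is not Priority.INFO:
            self.auditory.present(alert.message)
        if alert.priority is Priority.CRITICAL:
            self.haptic.present(alert.message)
```

The design choice here is that redundancy grows with priority: routine information stays on one channel, while critical warnings are deliberately duplicated across vision, audio, and haptics so a transient signal is not lost.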

2. Important information should be the most perceptually salient, and warning messages in particular should guide the user toward the source of danger

Because a large amount of information arrives from different directions while driving, when an emergency is about to occur the driver should be guided, at the right moment, to look toward the impending danger, for example toward the front, side, or rear of the vehicle. Visual cues such as ambient lighting and auditory cues such as warning sounds can effectively guide the user toward the source of danger.
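To make the "guide toward the danger" idea concrete, here is a small illustrative sketch that maps the bearing of a hazard to an ambient-light zone and a stereo pan value for the warning tone. The zone count, the bearing convention, and the function names are assumptions for the example only.

```python
import math

# Hypothetical ambient-light zones around the cabin, indexed clockwise
# from the front of the vehicle (0 = front, 2 = right, 4 = rear, 6 = left).
NUM_LIGHT_ZONES = 8


def zone_for_bearing(bearing_deg: float) -> int:
    """Pick the ambient-light zone closest to the danger bearing.

    bearing_deg is measured clockwise from straight ahead
    (0 = front, 90 = right, 180 = rear, 270 = left).
    """
    zone_width = 360 / NUM_LIGHT_ZONES
    return int(((bearing_deg % 360) + zone_width / 2) // zone_width) % NUM_LIGHT_ZONES


def audio_pan_for_bearing(bearing_deg: float) -> float:
    """Map the bearing to a stereo pan in [-1.0 (left), +1.0 (right)]."""
    return math.sin(math.radians(bearing_deg))


# A hazard approaching from the right-rear (around 135 degrees) lights the
# right-rear light zone and pans the warning tone to the right.
print(zone_for_bearing(135), round(audio_pan_for_bearing(135), 2))  # 3 0.71
```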

3. The information conveyed by each modality should be understandable, especially when modalities are linked

Much of the information in the cockpit HMI is represented by text and symbols in the GUI, but whether that information remains easy to understand when converted into speech or even dialogue is an open question, especially when the symbols are non-standard or ambiguous. When designing GUI information, therefore, you should also consider what the equivalent voice information would be. In addition, information of different priorities should be clearly distinguishable from one another, especially for haptics: most devices that implement haptic feedback have low resolution, making it hard for users to tell apart vibration patterns that differ only slightly.
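One simple way to keep priorities distinguishable despite low-resolution haptics is to encode them as coarse pattern differences, and to pair each GUI symbol with a pre-decided spoken equivalent rather than improvising one at runtime. The dictionaries below are purely illustrative; the pattern timings, priority names, and message texts are assumptions.

```python
# Priorities are distinguished by coarse pattern differences (pulse count and
# rhythm) rather than subtle amplitude changes that users cannot perceive.
# Each tuple is (vibration_seconds, pause_seconds).
HAPTIC_PATTERNS = {
    "info":     [(0.10, 0.0)],                               # one short pulse
    "warning":  [(0.15, 0.10), (0.15, 0.0)],                  # two medium pulses
    "critical": [(0.30, 0.15), (0.30, 0.15), (0.30, 0.0)],    # three long pulses
}

# Each GUI symbol also gets an unambiguous spoken equivalent, decided at
# design time so the voice channel never has to "read out" an icon.
VOICE_EQUIVALENTS = {
    "lane_departure_icon": "Lane departure warning: the vehicle is drifting left.",
    "low_tire_pressure_icon": "Low tire pressure in the right rear tire.",
}
```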

4. Information input and output should be reasonable and avoid causing discomfort

A bright light appearing suddenly in a dark environment easily causes eye discomfort, and auditory, tactile, and olfactory output should likewise be designed to avoid discomfort. Auditory signals that are too loud are unpleasant and can even damage hearing; tactile signals that are too strong can cause pain; olfactory signals that are too concentrated are pungent and can even temporarily numb the sense of smell.

On the input side, inefficient input and cultural differences can also cause discomfort. For example, in voice interaction, commands that are awkward to phrase or take several seconds to speak will leave users dissatisfied. The same gesture can also be interpreted differently across cultures: the "OK" gesture means "okay" in the United States, the United Kingdom, and China, but in parts of Turkey, Greece, Brazil, and Germany it is an extremely insulting and offensive gesture, a problem that becomes especially prominent in international design.

3. Future development trends and breakthrough points of multimodal interaction

At present, car companies have put a growing number of multimodal technologies into intelligent cockpits, such as voice interaction, gesture recognition, face recognition, and posture tracking, but technologies such as eye tracking and heart-rate recognition have not yet been adopted because their accuracy is insufficient. Without eye tracking, aligning AR-HUD content with road information is very difficult, and drivers may misjudge when making decisions. For example, in 2022, an Xpeng Motors owner using the NGP assisted-driving function was judged by the system to be "sleeping while driving" because of his small eyes, and 4 points were deducted from his intelligent-driving score.

Achieving a significant improvement in technical accuracy is not easy. Taking speech recognition as an example, Chinese speech recognition had already reached 97% accuracy in laboratory environments by 2015, yet that figure has not changed significantly in the years since.

When a single modality produces unreliable results because of accuracy problems, fusing modalities becomes even more problematic, especially when some modalities involve environmental and human factors. For example, if a driver appears to be looking "intently" at the road ahead, and parameters such as steering-wheel angle and lane offset show nothing abnormal, can we conclude that the driver is driving attentively?

The answer is no, because the driver may be staring blankly and already distracted. Why does this happen? When a person zones out, their blinking, head movement, and other behaviors show no sign of distraction or fatigue, so the system cannot tell whether the driver is actually driving normally. The various "black technologies" achieved through modal fusion therefore likely hide many uncertainties, and designers must pay attention to the objectivity and accuracy of their solutions when tackling such problems.
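The sketch below illustrates the kind of fusion logic discussed above: several individually plausible signals are combined into a driver-state estimate with an explicit confidence, and the "staring blankly" case is reported as uncertain rather than attentive. Signal names, thresholds, and confidence values are hypothetical, not taken from any production system.

```python
from dataclasses import dataclass


@dataclass
class ModalSignals:
    gaze_on_road_ratio: float      # fraction of time gaze stays on the road
    gaze_variance: float           # spatial variance of gaze; near zero suggests staring
    steering_reversal_rate: float  # micro-corrections per minute
    lane_offset_std: float         # lane-keeping variability in metres


def estimate_attention(s: ModalSignals) -> tuple[str, float]:
    """Fuse several weak cues into a driver-state label plus a confidence.

    Thresholds are illustrative, not validated; a real system would need to
    handle exactly the ambiguous case described above, where every individual
    signal looks normal but the driver is zoning out.
    """
    looks_normal = (
        s.gaze_on_road_ratio > 0.8
        and s.steering_reversal_rate > 2.0
        and s.lane_offset_std < 0.3
    )
    if looks_normal and s.gaze_variance < 0.01:
        # Every "attentive" cue is satisfied, yet an unnaturally frozen gaze
        # hints at daydreaming: report uncertainty instead of false certainty.
        return "uncertain", 0.4
    if looks_normal:
        return "attentive", 0.8
    return "possibly_distracted", 0.6
```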

Beyond multimodal interaction itself, a large number of technical problems remain to be overcome, and one of the biggest obstacles to deployment is insufficient computing power. Car companies keep adding cameras and sensors to the cockpit, but whether the available compute can keep up has become a real question. In the intelligent cockpit, multiple screens, interface and animation rendering, and various common applications already consume computing power; it is not easy to also run the multimodal technologies on an automotive chip while keeping the user experience smooth, including sound-source localization for voice interaction, wake-up recognition, voice noise reduction, offline ASR (speech recognition) command recognition, face recognition, gesture recognition, DMS (driver monitoring system), and AR-HUD navigation.

Today's automotive chips lag two to three generations behind current mobile phone chips. Although the bottleneck caused by computing power will gradually ease, new demands will undeniably keep appearing: as assisted and automated driving mature, AR-HUD, audio and video, and in-car gaming will require more compute, and how much computing power is left for multimodal interaction remains an open question.

In general, the difficulty of multimodal interaction lies not only in research on the various computer technologies but also in the study of human behavior, especially ergonomics, and above all in correctly recognizing these behaviors and the intentions behind them. Multimodal interaction is therefore a systems-engineering effort spanning psychology, ergonomics, computer science, and other disciplines. Until these technologies mature, how to improve the intelligent-cockpit driving experience through multimodal interaction will remain a topic worth exploring over the long term.

This article was originally published by @ALICS on Everyone is a Product Manager. Reproduction without permission is prohibited

The title image is from Unsplash and is licensed under CC0

The views in this article represent only the author's own; Everyone is a Product Manager provides only an information storage service.