
What would be the ideal interaction mode for a smart cockpit?

Introduction: The author of this article, Li Xingyu, is Vice President of Ecosystem Development and Strategic Planning at Horizon, a senior expert in autonomous driving, artificial intelligence chips, and edge computing, with 18 years of experience in the semiconductor industry. By analyzing the development trends of human-computer interaction and examining the growth of intelligent cockpits in China, the author draws the following key conclusions:

The most important trend in future human-computer interaction is that machines will move from passive response to active interaction, and from humans adapting to machines to machines continuously adapting to humans. The ultimate goal is to make machines anthropomorphic, with the Turing test as the yardstick.

In the human-machine co-driving stage, human-computer interaction capability must match autonomous driving capability; otherwise serious safety problems will follow. The cross-domain convergence of intelligent driving and the intelligent cockpit is the direction of development.

In the future, physical screens and touch will no longer be the center of cockpit interaction; they will be replaced by natural interaction + AR-HUD.

Speech, gestures, and eye tracking are the three pillars of natural interaction, with sensors, computing power, and algorithms as the material foundation.

The current cockpit is dominated by the entertainment domain, but in the future the positioning of the entertainment domain and the safety domain (human-computer interaction and autonomous driving) will be adjusted, and the safety domain will become the main control domain.

Intelligent-cockpit human-computer interaction is an important breakthrough point for the brands of China's intelligent vehicle companies.

The main text follows:

Intelligent cars are the first form of robots, and the intelligent cockpit has accordingly opened a new direction for human-computer interaction in the robot era. Historically, every change in the way we interact has reshaped the industrial landscape of smart devices. Just as the transition from DOS to Windows brought sweeping changes to the industry, each shift in human-computer interaction opens a door to a new industry.

Human-machine interaction will change the way we approach smart cars, robots in the broad sense, and artificial intelligence. Humanity's most important invention was the creation of a language system for human-to-human interaction, which gave rise to human civilization.

Today, natural human-machine interaction may be the next cornerstone invention. Combined with machines' autonomous decision-making, it will bring about a machine civilization, reshape the relationship between humans and machines, and profoundly affect the way we work and live.

What is the trend in the way humans and machines interact?

Where will the cockpit's human-machine interaction go in the future? The answer to this question may need to be sought from the history of smart device development.

The computer industry is where human-computer interaction technology originated. In fact, human-computer interaction was not called HMI at first but HCI, Human–Computer Interaction. The history of PC development is widely known; the figure below gives a simple division of its stages:

Computer human-computer interaction began with DOS and a keyboard. Operating a command-line interface demanded considerable professional skill, so it could only be used by a small number of specialists.

The advent of the mouse and the Windows operating system changed everything, and the number of PC users exploded. Next, touch became a simpler and more direct way to operate, and tablets like the Surface appeared. Microsoft Cortana represents the latest mode of interaction: using voice to interact with machines in a more natural way.

The history of PCs and mobile phones reflects how machine-human interaction has developed: from complex to simple, from abstract operation to natural interaction. The most important future trend in human-computer interaction is the transition from passive response to active interaction.

Extending this trend line, the ultimate goal of human-computer interaction is to make machines anthropomorphic. It can be said that the history of human-computer interaction is the history of moving from humans adapting to machines to machines continuously adapting to humans.

The development of the smart cockpit has gone through a similar process.

Multimodal interaction is the ideal model for the next generation of human-machine interaction. What is multimodal interaction? Simply put, it is interaction through gestures, eye tracking, voice, and other channels. A "mode" here is analogous to a human "sense", and multimodality integrates multiple senses, corresponding to the five human senses of sight, hearing, touch, smell, and taste.

But the term "multimodal interaction" is too technical; I prefer to call it natural interaction. Gestures, for example, can be regarded as a natural "mouse", and different gestures can express rich semantics.

Typical gesture semantics and corresponding implementations
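To make the idea of gesture semantics concrete, here is a minimal sketch of mapping recognized gestures to abstract cockpit commands. The gesture labels and command strings are hypothetical examples for illustration only, not from the article or any particular product.

```python
# A minimal sketch: recognized gestures act like a natural "mouse" by
# mapping to abstract cockpit commands. All labels here are hypothetical.

from enum import Enum, auto

class Gesture(Enum):
    SWIPE_LEFT = auto()    # e.g. previous track
    SWIPE_RIGHT = auto()   # e.g. next track
    PALM_PUSH = auto()     # e.g. pause media
    THUMB_UP = auto()      # e.g. confirm a prompt

GESTURE_COMMANDS = {
    Gesture.SWIPE_LEFT: "media.previous_track",
    Gesture.SWIPE_RIGHT: "media.next_track",
    Gesture.PALM_PUSH: "media.pause",
    Gesture.THUMB_UP: "dialog.confirm",
}

def dispatch(gesture: Gesture) -> str:
    """Translate a recognized gesture into an abstract cockpit command string."""
    return GESTURE_COMMANDS.get(gesture, "noop")
```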

How is natural interaction implemented?

Intelligent cars are essentially robots, and a robot's two most important capabilities are autonomous decision-making and human-computer interaction; lacking either, it cannot effectively serve humans. Building intelligent human-computer interaction capability is therefore a must.

How do we measure the intelligence of human-computer interaction? One of my thoughts is to use the Turing test: can the machine's interaction behavior be made indistinguishable from a human's?

How is natural interaction achieved? Sensors, computing power, and algorithms are all indispensable.

More and more sensors will be integrated into the cockpit. On the one hand, this pushes the cockpit's demand for computing power ever higher, with AI compute requirements rising above 30 TOPS and even approaching the 100 TOPS level; on the other hand, it gives the cockpit better perceptual support.

Cockpit sensors are rapidly increasing in number and variety

AI computing enables the perception of faces, expressions, gestures, speech, and other signals, making more intelligent human-computer interaction possible. The computation behind cockpit human-machine interaction must rely on edge computing rather than cloud computing, for three reasons: reliability, real-time performance, and privacy protection.

Personal privacy protection is probably one of the biggest challenges facing our generation in the era of AI, and it is even more prominent in the private space of the cockpit.

Today, the vast majority of speech recognition is still performed in the cloud, where biometric information such as voiceprints can easily reveal personal identity. By performing edge AI computing on the vehicle side, personal biometric information such as video and voice can be stripped out and converted into semantic information before being uploaded to the cloud, effectively protecting the privacy of personal data in the car.
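As a rough sketch of this edge-side privacy pipeline (my illustration, assuming hypothetical on-device models rather than any particular SDK): raw audio is transcribed and parsed locally, and only de-identified semantics are serialized for upload.

```python
# Sketch of the edge-side privacy pipeline described above.
# run_local_asr and run_local_nlu stand in for hypothetical on-device models;
# only derived semantic information ever leaves the vehicle.

import json
from dataclasses import dataclass, field

@dataclass
class SemanticEvent:
    intent: str                      # e.g. "open_window"
    slots: dict = field(default_factory=dict)
    timestamp_ms: int = 0

def run_local_asr(audio_frame: bytes) -> str:
    # Placeholder for a hypothetical on-device speech model;
    # the raw audio never leaves this function.
    return "open the driver's window"

def run_local_nlu(text: str) -> SemanticEvent:
    # Placeholder for hypothetical on-device intent parsing.
    return SemanticEvent(intent="open_window", slots={"position": "driver"})

def process_utterance(audio_frame: bytes, now_ms: int) -> str:
    text = run_local_asr(audio_frame)   # audio stays on the edge
    event = run_local_nlu(text)         # text reduced to intent + slots
    event.timestamp_ms = now_ms
    # Only de-identified semantics are serialized for the cloud;
    # no waveform, voiceprint, or camera frame is included.
    return json.dumps({"intent": event.intent,
                       "slots": event.slots,
                       "timestamp_ms": event.timestamp_ms})
```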

In the era of autonomous driving, interactive intelligence must match driving intelligence

In the foreseeable future, human-machine co-driving will remain the norm for a long time, and cockpit human-computer interaction is the first interface through which people understand the capabilities of autonomous driving.

At present, intelligent-vehicle technologies are evolving unevenly: human-computer interaction lags behind the development of autonomous driving capabilities, which contributes to frequent autonomous driving accidents and hinders the adoption of autonomous driving.

Human-machine co-driving keeps the human in the loop, so human-computer interaction capability must match autonomous driving capability; otherwise it creates serious problems for the safety of the intended functionality (SOTIF), and almost all fatal autonomous driving accidents are related to this.

Even without an accident, a lack of insight into the autonomous driving system's state can cause serious panic and anxiety.

For example, autonomous driving systems often exhibit "phantom braking" in real driving conditions. If the human-machine interface can display the system's perception results, the driver can see that the misjudgment was caused by, say, a can on the road being identified as a car.

This is why Tesla displays more and more of its self-driving perception results. As autonomous driving capabilities grow stronger, users will pay more and more attention to the processes and states that the system presents in a virtual 3D environment.

Human-computer interaction and autonomous driving develop hand in hand. In the future, for example, more humanized parking should be human-vehicle co-parking, including handover from human to vehicle and from vehicle to human. If the car encounters difficult road conditions, it might say, "I'm not sure about this one," and ask the driver to take over; if the driver struggles to park for a long time, the AI can suggest turning on automatic parking.

This cabin-parking integration solution improves the overall experience of cockpit interaction and parking, and it can also greatly reduce hardware cost: by time-sharing the AI chip's resources, it serves both cockpit perception and APA parking perception, providing a cost-effective solution for the industry and allowing intelligence to reach more entry-level models. In China, Horizon and Yingchi Technology are cooperating to promote this solution.
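A rough sketch of the time-sharing idea, under my own simplifying assumptions: the same AI accelerator runs cockpit-perception workloads while driving and switches its compute budget to APA parking perception when the vehicle enters a parking scenario. The workload names and the switching condition are hypothetical; real schedulers are far more involved.

```python
# Sketch of time-sharing one AI accelerator between cockpit perception
# and APA parking perception. Workload names are hypothetical.

from enum import Enum

class VehicleState(Enum):
    DRIVING = "driving"
    PARKING = "parking"

def select_workloads(state: VehicleState) -> list[str]:
    """Pick which perception pipelines get the accelerator's compute budget."""
    if state is VehicleState.PARKING:
        # While parking, surround-view / APA perception takes priority.
        return ["apa_surround_view", "freespace_detection"]
    # While driving, in-cabin interaction perception uses the same resources.
    return ["driver_monitoring", "gesture_recognition", "speech_frontend"]
```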

At present, smart cockpit interaction is mainly an extension of the mobile phone's Android ecosystem and is carried chiefly by physical screens. Screens keep getting bigger, even reaching 60 inches, which in effect lets low-priority functions occupy the space of high-priority ones and introduces additional information interference that easily distracts the driver and affects driving safety.

Physical screens will still exist in the future, but my judgment is that they and touch will no longer be the center of cockpit interaction; they will be replaced by natural interaction + AR-HUD.

The first reason is that human-computer interaction for autonomous driving is a basic necessity and a hard requirement; it belongs to the safety domain and has the highest priority. Human-computer interaction for music, games, and comfort is a nice-to-have; it belongs to the entertainment domain and only gets room to play once the former is taken care of.

Therefore, in the future, the positioning of the entertainment domain and the safety domain (human-computer interaction and autonomous driving) in the cockpit will be adjusted, and the safety domain will become the main control domain.

The second reason is that natural interaction plus an AR-HUD interface is safer. Communicating by voice and gesture keeps the driver's line of sight from shifting, improving driving safety in a way a large in-cabin screen cannot; the AR-HUD avoids this problem while also displaying autonomous driving perception information.

The third reason is that natural interaction is invisible, simple, and more emotional. It does not take up much of the car's valuable physical space, yet it accompanies the driver and passengers at all times, giving them a greater sense of trust and security.

Based on the above analysis, the cross-domain integration of intelligent driving and the intelligent cockpit is a fairly certain direction of development, and it will ultimately give rise to the vehicle's central computing platform.

Current stages of development, cutting-edge practices and challenges

At present, cockpit speech recognition is already widespread. Mainstream speech recognition vendors mainly use end-to-end algorithms, and recognition accuracy can reach 98% under ideal laboratory conditions.

Driver monitoring systems (DMS) are gaining popularity rapidly; it is predicted that by 2030, more than 50% of vehicle models will be equipped with in-car cameras.

The next step will be the combination of voice + gesture + eye tracking with an AR-HUD interactive interface, the intelligent interaction mode that corresponds to L3+ autonomous driving, and the industry's leading car companies have already begun laying it out.

In this area, the practice of China's independent brands is basically on a par with leading foreign brands, and their iteration speed is faster.

In 2020, Changan launched the UNI-T, which includes a number of active services. For example, when you answer a phone call, the system automatically lowers the multimedia volume; when the central control screen is off, looking at it for about a second wakes it up. The solution is equipped with the Horizon Journey 2 chip and supports interaction through speech, gestures and body posture, and facial expressions.
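To illustrate the kind of rule-based "active service" described above (call lowers the media volume, a roughly one-second gaze wakes the screen), here is a minimal sketch under my own assumptions; the thresholds, signal names, and object interfaces are hypothetical, not Changan's or Horizon's implementation.

```python
# Minimal sketch of two active-service rules: duck media volume during a
# phone call, and wake the screen after ~1 second of sustained gaze.
# Thresholds and interfaces are hypothetical illustrations.

GAZE_WAKE_SECONDS = 1.0  # assumed threshold from the example above

class ActiveServices:
    def __init__(self, media, screen):
        self.media = media        # assumed to expose set_volume_scale()
        self.screen = screen      # assumed to expose wake()
        self._gaze_start = None

    def on_phone_call(self, active: bool) -> None:
        # Duck multimedia volume while a call is active, restore afterwards.
        self.media.set_volume_scale(0.3 if active else 1.0)

    def on_gaze_at_screen(self, gazing: bool, now: float) -> None:
        # Wake the screen once the driver has looked at it for ~1 second.
        if not gazing:
            self._gaze_start = None
            return
        if self._gaze_start is None:
            self._gaze_start = now
        elif now - self._gaze_start >= GAZE_WAKE_SECONDS:
            self.screen.wake()
```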

The ideal of natural interaction starts from the user experience: it needs to provide a stable, smooth, and predictable interactive experience. But however ambitious the ideal, we must start from a much leaner reality, and the current challenges are still numerous.

For example, misrecognition in natural interaction is still severe, and reliability and accuracy under all weather and operating conditions are not yet sufficient. In gesture recognition, an inadvertent hand movement may be mistaken for a command, and this is just one of countless misrecognition cases.
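One common way to reduce such false triggers (a general technique, not something the article prescribes) is temporal confirmation: a gesture is only accepted as a command if the same label is predicted for several consecutive frames. A minimal sketch, with the frame count and label source as assumptions:

```python
# Sketch of temporal confirmation ("debouncing") for gesture recognition:
# accept a gesture only when the same per-frame prediction persists for
# N consecutive frames. N and the prediction source are assumptions.

from collections import deque
from typing import Optional

class GestureDebouncer:
    def __init__(self, required_frames: int = 8):
        self.required_frames = required_frames
        self.recent = deque(maxlen=required_frames)

    def update(self, predicted_label: Optional[str]) -> Optional[str]:
        """Feed one per-frame prediction; return a label only once it is stable."""
        self.recent.append(predicted_label)
        if (len(self.recent) == self.required_frames
                and predicted_label is not None
                and all(p == predicted_label for p in self.recent)):
            self.recent.clear()   # avoid re-firing on the same gesture
            return predicted_label
        return None
```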

In addition, while the vehicle is moving, lighting, vibration, occlusion, and so on all pose huge engineering challenges. The fluency of natural interaction is also an urgent problem, one that will improve gradually with higher-performance sensors, more powerful computing, and more efficient algorithms. Meanwhile, natural language understanding (NLU) and intent understanding are still at an early stage and require innovation in algorithmic theory.

Natural human-computer interaction will be the cornerstone invention of the robot era

In today's fierce industry competition, the intelligent cockpit has become a key lever for automakers to differentiate their products. Cockpit human-computer interaction is closely tied to communication habits, language, and culture, so it must be highly localized. Intelligent-cockpit human-computer interaction is therefore an important breakthrough point for the brands of China's intelligent vehicle companies, and also an opportunity for China's intelligent vehicle technology to lead global technology trends.

The intelligent cockpit industry chain will continue to extend: more players will enter the broader smart car ecosystem, and smart car players will in turn cross over into more robotics fields. The future development of the intelligent cockpit ecosystem will revolve around "ecosystem collaboration" and "cross-domain extension".

This technological revolution will have a disruptive impact, not only opening up a new industrial ecosystem but also profoundly affecting the way we work and live.
