Empowering autonomous driving: are large models the real deal or just hype?

"Everything can be a big model" has become a topic that cannot be avoided by all industries this year.

Since the beginning of the year, ChatGPT, which can ghostwrite papers, reports and even news stories, has put large models squarely in the spotlight across society. While the public worries about being replaced by AI, the autonomous driving industry has set its sights on the commercial value of large models and started launching its own. Recently, at its Li Auto Family Tech Day, Li Auto announced that it has adopted large-model algorithms and developed its own MindGPT. Can the much-touted large model bring a "second spring" to the autonomous driving industry?

Large models become the new favorite

"Autonomous driving technology is entering a new stage represented by multimodal perception and cognitive understanding driven by both data and knowledge." Huang Wuling, deputy director of the Cognitive Intelligence Laboratory of the Xiong'an Innovation Research Institute of the Chinese Academy of Sciences, said in an interview with China Automotive News that the emergence of large language models (LLM) and cross-modal large models has brought breakthroughs to the development of the autonomous driving industry. At present, the industry's definition of a large model is that it can only be called a large model if the parameters reach more than 100 billion. When AI models are large enough, trained and learned, it is possible to achieve intelligence. Therefore, it is regarded as a large model that can change the world and has become a new hope for autonomous driving.

Large models will empower autonomous driving in the data closed loop and in simulation. Wang Xiaogang, co-founder and chief scientist of SenseTime, said that in the AI 1.0 era, heavy reliance on manual labeling made data annotation slow and costly and made data mining difficult. In the AI 2.0 era, by contrast, large models make automatic labeling possible, which greatly reduces costs and allows rapid optimization and iteration. In addition, AIGC (AI-generated content) can generate highly realistic simulated scenarios, helping autonomous driving systems be tested and optimized more effectively. "With the assistance of large models, R&D staff can concentrate on key algorithms and the experience they deliver, and on polishing products that better meet users' needs and perform well," Huang Wuling said.

In Wang Xiaogang's view, multimodal large models can also enable end-to-end joint optimization spanning perception, decision-making, planning and control. "At present, most perception modules simply output a result; rules are then applied to make judgments and decisions, and further hand-written rules drive planning and control." He believes that in the future, large models will make end-to-end autonomous driving possible through AI, delivering a more reliable, human-like experience.

The industry generally believes that, technically, the underlying architecture of autonomous driving is in place and most of its technical problems have been solved. Real road scenarios, however, are so complex that even if existing technology can drive automatically in more than 90% of scenarios, the remaining 10% of long-tail scenarios may never be fully covered. Huang Wuling said that as large-model applications in vertical fields mature, and provided costs stay under control and performance and efficiency are good, large models are expected to be applied to algorithmic functions such as environmental understanding and intelligent decision-making. They could then distill traffic rules and driving experience into reusable knowledge and ease the "long-tail problem" of autonomous driving.

Large models may also help autonomous driving move away from high-definition (HD) maps. HD maps have long been considered indispensable for high-level autonomous driving, but three "mountains" (difficult real-time updates, high regulatory risk and extremely high cost) have always been hard to overcome, so getting rid of HD maps has become the choice of many companies. As large models attract growing attention, an Essence Securities research report pointed out that AI large models will help companies move away from HD maps. BEV (bird's-eye view) perception algorithms transform the images captured by cameras at different viewing angles into a single unified view, which amounts to the vehicle generating a map in real time. This supplies the road topology information needed for downstream driving decisions and achieves "de-mapping".

Products launched one after another

The popular ChatGPT (GPT stands for "Generative Pre-trained Transformer") is built on the Transformer architecture that Google proposed in 2017. The Transformer is no stranger to autonomous driving. As early as 2021, Tesla brought it into the field with a Transformer-based BEV perception scheme. This was the first appearance of large-model technology in the autonomous driving industry, and it became key to Tesla's vision-only autonomous driving solution. Huawei, SenseTime, Baidu Apollo and other companies subsequently launched their own "BEV+Transformer" efforts. A CITIC Securities research report pointed out that as urban navigation features such as Xpeng's City NGP, Huawei's urban NCA and urban NOH are rolled out one after another, "BEV+Transformer" will lead the perception paradigm for autonomous driving.

Today, large models are no longer confined to perception. In April this year, the DriveGPT "Snow Lake Hairuo" was officially released. According to Gu Weihao, CEO of Haomo.AI, DriveGPT introduces real driving data and uses RLHF (reinforcement learning from human feedback) to continuously optimize the cognitive decision-making model of autonomous driving. It is mainly intended to solve the cognitive decision-making problem, with end-to-end autonomous driving as the ultimate goal. Gu Weihao said DriveGPT will first explore four application scenarios (intelligent driving, driving scene recognition, driving behavior verification, and getting out of difficult scenarios) and will initially open up two of them: intelligent driving and driving scene recognition.

In autonomous driving, SenseTime has developed UniAD, the industry's first end-to-end autonomous driving solution that integrates perception and decision-making. It outperforms state-of-the-art (SOTA) methods on many key metrics, such as multi-object tracking accuracy and lane-line prediction accuracy, and substantially improves overall system performance. "In the future, we will use multimodal large models to further advance autonomous driving technology, for example by generating large numbers of hard samples through AIGC, and by feeding surround-view perception data and other multimodal data into multimodal large models to integrate perception and decision-making," Wang Xiaogang said.

MindGPT, unveiled not long ago, is a cognitive large model developed by Li Auto. Li Auto trained its base model on 1.3 trillion tokens, making its dialogue generation, language understanding, knowledge Q&A, logical reasoning and other capabilities safer, more accurate and more logical. Empowered by MindGPT, Li Auto's in-car intelligent voice assistant, Lixiang Tongxue, will actively perceive its environment and the people around it, learn and think, and express itself and interact like a human. In intelligent driving, Li Auto's AD Max 3.0 uses large-model AI algorithms for real-time perception, decision-making and planning, which frees it from dependence on HD maps while keeping recognition accuracy fairly high. Lang Xianpeng, vice president of intelligent driving at Li Auto, said: "Driven by an advanced technical architecture and an efficient training platform, intelligent driving will soon be widely adopted in family travel, and the era of AI drivers replacing human drivers is not far off."

Baidu has also said it will apply its Wenxin (ERNIE) model to autonomous driving to deepen Apollo autonomous vehicles' understanding of complex urban road conditions and further improve their safety and reliability. Building on Alibaba's Tongyi Qianwen model, Banma (Zebra) has created Banma Co-Pilot, a third-generation automotive AI capability system, to deliver full-stack AI capabilities that integrate the cloud and the vehicle. A few days ago, Tesla CEO Elon Musk also said Tesla will have its own "ChatGPT moment", if not this year then no later than next year. The release of one large-model product after another shows just how much the autonomous driving field favors large models.

Too early for commercialization

"At present, it is not clear what impact the big model can bring to the industry, and some capable and funded companies are only in the first exploration stage, and it is too early to commercialize." Cao He, president of All-Union Auto Dealer Investment Management (Beijing) Co., Ltd., said.

As for large models for autonomous driving, Lu Zhaobo, a practitioner in the autonomous driving industry, has little confidence in the substance of several large-model products released earlier. He said bluntly: "DriveGPT is very unrealistic. Even if large companies invest in R&D, it will be hard to see results within 5 to 10 years. The large-model concept is so grand that they may just be doing simple data fusion."

In Lu Zhaobo's view, the advantage of large models is that they can fuse multiple sets of data and perceive the external environment more accurately. Using a large model, however, first raises the problem of deployment. "If the large model is deployed in the cloud, latency is hard to solve; if it is deployed on the vehicle, with such a huge amount of data, the latency problem should not be underestimated either," he said. Whether a large model can run on the vehicle at all has become the primary obstacle to commercializing large models for autonomous driving.

On this point, Yu Kai, founder and CEO of Horizon Robotics, noted at the China EV100 Forum 2023 that practical constraints on the vehicle side, such as power supply and heat dissipation, make it impossible for autonomous driving to use models and computation on the scale of ChatGPT's cloud computing. Gu Weihao told the media that the cloud model and the vehicle-side model are not simply equivalent in size. DriveGPT's parameter count has reached 120 billion, but that does not mean a 120-billion-parameter model will run on the vehicle; the key is to retain the core capabilities.

Cost is another problem. Some insiders pointed out that putting a large model into an autonomous driving system would add at least US$50,000 in cost, and the cost may rise further as models grow larger. Lu Zhaobo said the cost problem could be solved through cloud deployment, but only if cloud latency is solved first. Even the large models themselves regard cost as an important consideration. Asked "If ChatGPT were applied to autonomous driving, would it be too costly?", ChatGPT replied that applying ChatGPT to autonomous driving systems involves certain costs, mainly for computing resources, data collection and training, and model development and integration.

Public opinion runs hot while capital stays calm

Kai-Fu Lee, chairman and CEO of Sinovation Ventures, announced that he was preparing Project AI 2.0, a global company, and Sogou founder Wang Xiaochuan then put up US$50 million to found Baichuan Intelligence. Earlier, Sequoia China's seed fund also said it was watching closely and beginning to invest in early-stage AIGC companies. While capital feasts on one side, the other side looks rather deserted. Since 2022, news of layoffs, closures and shutdowns has filled the autonomous driving industry, and many lament that it has entered a "cold winter". Although it is still too early to apply large models to autonomous driving, there is no denying that their emergence has rekindled the industry's fire in the depths of winter. Can this association with large models help the autonomous driving industry, which has been steadily falling out of favor, win back the favor of capital?

Wang Yu, executive vice president, secretary general and researcher of the China Productivity Promotion Center Association, believes that the emergence of large models offers an opportunity for collective breakthroughs, which can boost industry confidence and reshape the technical route of single-vehicle intelligence. In the view of automotive industry analyst Shao Yuanjun, however, although capital is enthusiastic about large models, years of experience have given investors a clear picture of where the autonomous driving industry stands, and they will not rashly invest in large models while these are still in their infancy.

According to Wang Xiaogang, a single training run of a large model such as ChatGPT costs tens of millions of US dollars. SenseTime has invested tens of billions of yuan in AI R&D in recent years, and its investment in AIDC (AI data center) infrastructure in Lingang alone has exceeded 5 billion yuan. With investment on the scale of tens of billions of yuan, short-term profitability in autonomous driving is hard to achieve.

"Now affected by the economic situation, the entire capital industry itself is facing a cold winter, there is not much money, and it will be more cautious to shoot." Shao Yuanjun said. In this way, the hot large model seems to be difficult to solve the current cold of the autonomous driving industry.

Text: Zhang Yiwen Editor: Huang Bei Layout: Wang Kun
