Jia Haonan, reporting from "Co-driver Temple" | Smart Car Reference (official account: AI4Auto)
The power of large models is permeating every industry, and people in the auto industry are waiting, restless and anxious, for the big changes it will bring. Yet so far, the awkward reality of large models in cars is that they have little to do with the car itself.
Features such as text-to-image generation have nothing to do with the core driving scenario and do not even make for good in-car entertainment. As for helping car companies transform themselves with AI, they clearly contribute nothing.
Large models are reshaping productivity, and the automotive industry cannot, and should not, be left behind. In fact, the AI industry has already been thinking about this and putting it into practice.
Recently, a white paper on large-model-driven swarm intelligence for the automotive industry, jointly released by industry, academia, and research institutions, laid out for the first time how large models can be applied across the entire automotive value chain.
What can large models do for the automotive industry?
First, let's parse the phrase correctly: it is more accurate to read it as "automotive industry / large model", that is, a large model for the automotive industry rather than a large model for the car.
That is because the large model proposed in this white paper is not a consumer application like text-to-image generation, but a swarm intelligence product that serves car companies' production and operations.
What is Swarm Intelligence?
A task-specific AI model is an agent. Swarm intelligence is the collective intelligence that emerges when multiple agents collaborate and share information; it can handle more complex tasks and exhibits capabilities beyond those of any single agent. Bees and ants in nature display this kind of swarm intelligence.
Backed by large-model capabilities, swarm intelligence can communicate more efficiently and handle tasks of larger scale and greater variety.
In the automotive industry, swarm intelligence is more than a simple automation tool: it can span the entire operational chain, including vehicle manufacturing, supply chain, R&D and engineering, sales and distribution, marketing, after-sales service, trade and logistics, leasing and financial services, and recycling and reuse.
For example, in vehicle manufacturing, multiple agents interacting automatically can monitor the status of the production line in real time and predict equipment maintenance needs, significantly reducing unplanned downtime.
Agents can also analyze production data to help manufacturers optimize parts inventory and the supply chain, which both lowers inventory costs and raises production efficiency.
Meanwhile, agents working across departments can adjust the production plan to market demand, raw-material supply, and production capacity, keeping the production line running efficiently.
Beyond helping to "build good cars", swarm intelligence built on large language models shows even more value in helping car companies "sell cars well".
Automotive marketing is usually divided into five stages: lead acquisition, lead cleansing, conversion, showroom reception, and deal closing.
In the early stage, advertising, brand events, automotive vertical media, brand-owned private channels, content seeding, and similar means quickly yield basic profiles and contact details for large numbers of potential customers. What follows is a series of "nurturing" work: communication, showing the actual car, and product explanations.
This cycle is long, the conversion rate is low, and the outcome depends heavily on each salesperson's communication skills and energy, which introduces great uncertainty.
The white paper builds five smart-marketing solutions: a Digital Intelligence Research Institute scenario solution, a new-media operations solution, a user operations solution, an intensive DDC scenario solution, and an intelligence operations solution.
All of them are oriented toward sales results: automated, streamlined workflows that combine different sets of agents to play the roles at each stage.
For example, when a customer wants a customized purchase, the "sales agent" collects the customer's personal circumstances, analyzes which models best match their needs, presents the results in professional language, and works out the best purchase plan with the customer over multiple rounds of dialogue.
Meanwhile, an operations-supervisor agent can check each agent's follow-up status in real time, analyze follow-up quality, review customer profiles, and report back to the agent monitoring platform. Every interaction between customer-operations agents and customers accumulates as the number of cases grows, forming an iterative mechanism for the agent workflow, so the agents become steadily more effective at nurturing customers.
As a result, in the intelligent sales scenario, a human sales manager can watch the work of the entire organization in real time through the multi-agent monitoring platform, greatly expanding the scope of work they can oversee.
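To make the supervisor agent's role more concrete, here is a minimal Python sketch of the follow-up check, not the white paper's implementation. The `FollowUp` record, the three-day silence limit, and the two-round quality bar are all hypothetical; a real supervisor agent would presumably use an LLM to judge conversation quality rather than fixed rules.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class FollowUp:
    lead_id: str
    agent: str
    last_contact: datetime
    rounds: int  # dialogue rounds held with the customer so far


@dataclass
class SupervisorAgent:
    max_silence: timedelta = timedelta(days=3)  # hypothetical follow-up deadline
    min_rounds: int = 2                         # hypothetical quality bar
    platform_feed: list = field(default_factory=list)  # stands in for the monitoring platform

    def review(self, followups, now):
        """Flag stale or shallow follow-ups and push them to the monitoring feed."""
        flagged = []
        for f in followups:
            issues = []
            if now - f.last_contact > self.max_silence:
                issues.append("stale")
            if f.rounds < self.min_rounds:
                issues.append("shallow")
            if issues:
                flagged.append((f.lead_id, f.agent, issues))
        self.platform_feed.extend(flagged)
        return flagged


now = datetime(2024, 1, 10)
leads = [
    FollowUp("L1", "sales-a", datetime(2024, 1, 1), 5),
    FollowUp("L2", "sales-b", datetime(2024, 1, 9), 1),
    FollowUp("L3", "sales-b", datetime(2024, 1, 9), 3),
]
print(SupervisorAgent().review(leads, now))
# → [('L1', 'sales-a', ['stale']), ('L2', 'sales-b', ['shallow'])]
```

Keeping the flags in a feed rather than acting on them directly mirrors the text: the supervisor reports, and the human manager or the workflow decides what to do next.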
To sum up, in the white paper the Tsinghua Natural Language Processing Laboratory, Yihui Intelligence, and Facewall Intelligence propose a brand-new way to "onboard" large models in To-B settings:
Put simply, different AI models take over different types of work in a car company's business processes; in other words, they become digital employees.
The innovation, however, is that this is not automation of simple, repetitive tasks. Instead, a group of digital employees [6] communicate and collaborate with one another in natural language, without a formal central "mastermind", to improve both quality and efficiency.
Moreover, this synergy can be applied to almost every step from production to sales.
What gives this group of digital employees their basic working ability and communication skills is a large model with a degree of general intelligence (AGI).
How?
A single agent is relatively easy to build: there are established base models for different tasks, such as ResNet for image classification (and as a backbone for object detection) and GANs for sample generation. All it takes is the right training data.
But a business process, or any piece of systems engineering, needs many of these base models working together. In the past, these models barely communicated, and their collaboration relied mostly on hand-written rules. The result was limited information-processing capacity, one-sided and fragmented decisions, and high maintenance costs.
The key to the swarm intelligence proposed in the white paper is the organizational twin.
There are three key components: the job twin, the architecture twin, and the business twin.
The job twin uses large-model technology to create digital employees. These virtual humans can simulate how real people communicate, including voice and facial expressions, and have "perceptual intelligence". They can generate content, hold basic conversations, handle customer service, and more.
The agent system has a dedicated prompt framework: job-specific prompts are designed according to this framework to precisely define the scope and manner in which the base model answers questions.
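A job-specific prompt framework can be as simple as a fill-in template. The sketch below is a hypothetical illustration: the template wording, the field names, and the `ExampleAuto` brand are assumptions, and it only shows how a role's scope and answering style could be pinned down before the customer's question reaches the base model.

```python
from string import Template

# Hypothetical job-role template; field names are illustrative, not from the white paper.
JOB_PROMPT = Template(
    "You are a $role at $company.\n"
    "Only answer questions about: $scope.\n"
    "For anything else, reply exactly: \"$refusal\"\n"
    "Answer style: $style\n\n"
    "Customer question: $question"
)


def build_job_prompt(role, company, scope, style, question,
                     refusal="Sorry, I can only help with questions about our vehicles."):
    """Fill the job template so the base model's answering scope and manner are fixed per role."""
    return JOB_PROMPT.substitute(
        role=role,
        company=company,
        scope=", ".join(scope),
        style=style,
        refusal=refusal,
        question=question,
    )


prompt = build_job_prompt(
    role="sales consultant",
    company="ExampleAuto",  # hypothetical brand
    scope=["vehicle configurations", "pricing", "test drives"],
    style="concise, professional, no more than three sentences",
    question="What is the range of the long-range trim?",
)
print(prompt)
```

A different job (say, an after-sales advisor) would reuse the same template with a different role, scope, and style, which is what lets one framework serve many positions.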
However, the base model is a general-purpose language model whose built-in knowledge is general, so it may not answer domain-specific questions accurately. To address this, Retrieval-Augmented Generation (RAG) is introduced: domain-specific documents and Q&A pairs are ingested into the system as "long-term memory" and stored in a vector database or search system. During generation, relevant memories are injected into the prompt so that digital employees can answer domain-specific questions accurately, compensating for the base model's potential shortcomings.
In the automotive field, for example, an agent can call an API and generate professional, traceable content from the industry knowledge the API returns. When prompt engineering plus knowledge-base memory still cannot fully meet business needs, efficient continued pre-training and efficient fine-tuning can be used. Fine-tuning and continued pre-training "teach" the large model vertical-domain knowledge, giving digital employees a distinct profile and adapting them better to different business scenarios and user needs.
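The retrieve-then-inject flow of RAG can be sketched in a few lines. As a simplifying assumption, a bag-of-words cosine similarity stands in for the embedding model and vector database, and the document snippets are made-up examples rather than real vehicle specifications.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class LongTermMemory:
    """Toy stand-in for a vector database: bag-of-words vectors plus cosine similarity."""

    def __init__(self, docs):
        self.docs = docs
        self.vectors = [Counter(tokenize(d)) for d in docs]

    def retrieve(self, query, k=2):
        q = Counter(tokenize(query))
        ranked = sorted(zip(self.docs, self.vectors),
                        key=lambda dv: cosine(q, dv[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def augmented_prompt(memory, question, k=2):
    """Inject the top-k retrieved memories into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in memory.retrieve(question, k))
    return f"Use the following reference material:\n{context}\n\nQuestion: {question}"


docs = [  # illustrative snippets, not real specifications
    "The battery warranty covers 8 years or 160000 km.",
    "The sedan seats five adults.",
    "Home charging takes about 8 hours with a 7 kW wallbox.",
]
memory = LongTermMemory(docs)
print(augmented_prompt(memory, "How long is the battery warranty?", k=1))
```

Swapping in a real embedding model only changes how vectors are built and compared; the inject-into-prompt step stays the same.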
The architecture twin mirrors the real company's organizational structure in the digital world and uses agent-network technology to define how agents communicate and the logic they follow. Think of it as the "OA workflow" that this group of digital employees must follow.
Large-model multi-agent frameworks such as AgentVerse (jointly developed by the Tsinghua Natural Language Processing Laboratory and Facewall Intelligence) can define not only each agent's memory and abilities but also how, and by what logic, agents communicate. To a certain extent, they can map a real organization's structure into the digital twin world, producing a digital twin architecture that corresponds to the real company's structure.
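One way to picture the architecture twin is as a graph of permitted communication channels derived from an org chart. This is a hypothetical sketch, not AgentVerse's API: `ORG_CHART`, `ArchitectureTwin`, and the rule that agents may only talk along reporting lines are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical org chart: manager -> direct reports, mirroring a real dealership.
ORG_CHART = {
    "sales_director": ["store_manager"],
    "store_manager": ["sales_agent", "service_agent"],
}


class ArchitectureTwin:
    """Agents may only message along reporting lines, like an OA approval flow."""

    def __init__(self, org_chart):
        self.links = defaultdict(set)
        self.manager_of = {}
        for manager, reports in org_chart.items():
            for report in reports:
                self.links[manager].add(report)
                self.links[report].add(manager)
                self.manager_of[report] = manager
        self.log = []

    def send(self, sender, receiver, content):
        """Deliver a message only if the org structure allows this channel."""
        if receiver not in self.links[sender]:
            raise PermissionError(f"{sender} has no channel to {receiver}")
        self.log.append((sender, receiver, content))

    def escalation_path(self, agent):
        """Managers above `agent`, nearest first, for routing escalations."""
        path = []
        while agent in self.manager_of:
            agent = self.manager_of[agent]
            path.append(agent)
        return path


twin = ArchitectureTwin(ORG_CHART)
twin.send("sales_agent", "store_manager", "Customer asks for an extra discount.")
print(twin.escalation_path("sales_agent"))  # ['store_manager', 'sales_director']
```

Changing the company's structure then means editing `ORG_CHART`, not the agents themselves, which is the point of keeping the architecture in a separate twin.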
Such a framework usually divides the multi-agent environment into several functional modules, including a flexible, code-extensible framework for custom functions, mechanisms for agent language interaction and cooperation, and mechanisms for evolving the agent system's functions and structure.
The overall workflow has four phases. In the expert recruitment phase, the agent team is assembled and adjusted according to progress on the problem. In the collaborative decision-making phase, the recruited agents hold a joint discussion to form a strategy for solving the problem. In the action execution phase, the agents interact with the environment to carry out the actions planned during decision-making. In the evaluation and feedback phase, the gap between the current state and the desired outcome is assessed; if the result is not yet satisfactory, feedback is passed on for further refinement in the next iteration.
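The four-phase loop can be shown with a toy problem in which each "agent" is reduced to the set of topics it can handle. This is not AgentVerse code: the expert pool, the one-item-per-agent-per-round rule, and the function names are assumptions, and in practice each phase would be driven by LLM calls.

```python
# Hypothetical expert pool: each "agent" is reduced to the topics it can handle.
EXPERTS = {
    "battery_engineer": {"range", "charging"},
    "sales_consultant": {"pricing", "financing"},
    "service_advisor": {"warranty", "maintenance"},
}


def recruit(pending, pool):
    """Expert recruitment: pick agents whose skills overlap what is still unresolved."""
    return [name for name, skills in pool.items() if skills & pending]


def decide(team, pending, pool):
    """Collaborative decision-making: each recruited agent proposes one item; merge into a plan."""
    plan = set()
    for name in team:
        options = sorted(pool[name] & pending)
        if options:
            plan.add(options[0])
    return plan


def execute(plan, state):
    """Action execution: apply the agreed plan to the environment state."""
    return state | plan


def evaluate(state, goal):
    """Evaluation: the remaining gap to the goal becomes feedback for the next round."""
    return goal - state


def agentverse_style_loop(goal, pool, max_rounds=5):
    state, pending = set(), set(goal)
    for round_no in range(1, max_rounds + 1):
        team = recruit(pending, pool)  # team composition changes as progress is made
        plan = decide(team, pending, pool)
        state = execute(plan, state)
        pending = evaluate(state, goal)
        if not pending:
            return state, round_no, pending
    return state, max_rounds, pending


state, rounds, missing = agentverse_style_loop({"range", "charging", "pricing"}, EXPERTS)
print(rounds, missing)  # 2 set()
```

Recruitment adapts between rounds: once pricing is covered, the sales consultant is no longer recruited in round two. A topic no expert can handle simply stays in the feedback until `max_rounds` is reached.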
Technically, the framework defines an interface for each module, and users can redefine module functions to suit their own needs. This customizability means the digital twin architecture is not locked into a fixed structure but can adapt flexibly to different industries and enterprises, with users tailoring it to specific scenarios and task requirements.
The business twin combines large language models, search augmentation, and agent building to execute real business tasks automatically and optimize their results. This layer is essentially a "toolkit" that uses large models to boost the effectiveness of the digital employees [10].
XAgent, for example, is an AI agent framework built around a large-language-model core. It introduces a "dual-loop mechanism" that lets it consider complex tasks from both a "macro" and a "micro" perspective, much like the way the human "left brain" and "right brain" work together.
The outer loop handles global task planning, decomposing a complex task into simple, actionable subtasks. This is how XAgent efficiently breaks down and plans the overall task, acting as the leader in macro-level task handling.
In the inner loop, XAgent switches roles to act as an efficient "executor", making sure each subtask handed down by the outer loop meets expectations. It can flexibly retrieve tools from external systems and work through each subtask step by step according to its nature.
When a subtask is finished, the inner loop produces a detailed reflection and passes it back to the outer loop, indicating whether the task is complete and noting potential optimizations in how it was executed.
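The dual-loop idea can be sketched with a toy parts-replenishment task. This is not XAgent's actual code: the tools, the `Reflection` record, and the stock threshold are hypothetical, and a real outer loop would use an LLM both to plan and to re-plan from the reflections it receives.

```python
from dataclasses import dataclass


@dataclass
class Reflection:
    subtask: str
    done: bool
    note: str


def make_tools(inventory):
    """Hypothetical external tools the inner loop may call."""
    def check_stock(part):
        return inventory.get(part)

    def place_order(part, qty):
        inventory[part] += qty
        return inventory[part]

    return {"check_stock": check_stock, "place_order": place_order}


def outer_plan(parts, threshold):
    """Outer loop: break 'keep parts stocked' into one simple subtask per part."""
    return [(part, threshold) for part in parts]


def inner_execute(subtask, tools):
    """Inner loop: solve one subtask step by step with tools, then reflect on the result."""
    part, threshold = subtask
    level = tools["check_stock"](part)
    if level is None:
        return Reflection(part, False, "unknown part; outer loop should re-plan")
    if level >= threshold:
        return Reflection(part, True, f"stock ok ({level})")
    new_level = tools["place_order"](part, threshold - level)
    return Reflection(part, True, f"ordered {threshold - level}, now {new_level}")


def dual_loop_run(parts, inventory, threshold=10):
    tools = make_tools(inventory)
    reflections = [inner_execute(st, tools) for st in outer_plan(parts, threshold)]
    # Reflections flow back to the outer loop, which collects subtasks needing re-planning.
    needs_replan = [r.subtask for r in reflections if not r.done]
    return reflections, needs_replan


inventory = {"brake_pad": 3, "wiper": 40}  # illustrative stock levels
reflections, needs_replan = dual_loop_run(["brake_pad", "wiper", "spoiler"], inventory)
for r in reflections:
    print(r.subtask, r.done, r.note)
print(needs_replan)  # ['spoiler']
```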
So everything hinges on the large model. Here is a brief primer:
Today's large language models are almost all built on the Transformer. The core idea is to use a self-attention mechanism to capture global information about an input sequence (text, speech, images, video, and so on), modeling each element in the context of the whole sequence and relating every element to the others.
In plain terms, beyond perception, the Transformer gains a basic ability to infer relationships such as cause and effect, which lets artificial intelligence take its first step toward understanding the world.
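For readers who want to see the mechanism, here is scaled dot-product self-attention, softmax(Q·Kᵀ / √d_k)·V, written in plain Python. It is a single head with no masking, and the toy embeddings and identity projection matrices are illustrative choices; production Transformers use learned projections, multiple heads, and optimized tensor libraries.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention: every position attends to every position."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    KT = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, KT)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, V), weights


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three tokens with toy 2-d embeddings
W_id = [[1.0, 0.0], [0.0, 1.0]]           # identity projections keep the example readable
out, weights = self_attention(X, W_id, W_id, W_id)
for row in weights:
    print([round(w, 3) for w in row])
```

Each output row is a weighted mix of all value vectors, which is what "modeling each element in the context of the whole sequence" means concretely.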
The swarm intelligence proposed in the white paper therefore rests on large models with a degree of general knowledge, developed by the Tsinghua Natural Language Processing Laboratory, Yihui Intelligence, and Facewall Intelligence.
A traditional AI agent, that is, a single AI entity, can also perceive, decide, and act, but it pursues a single goal and its input data is relatively fixed.
Large models, by contrast, interact with humans through prompts, and how clear and unambiguous a user's prompt is affects the quality of the answer. A "large" model's huge parameter count lets it capture complex language structures, understand context, and produce coherent text. This "emergent ability" shows up as the capacity to perform high-level cognitive tasks such as abstract reasoning and creative writing. ChatGPT stunned the world precisely because it displays a solid grasp of nearly every field humans work in.
If such abilities are infused into a group of different agents, they can communicate directly in rich natural language, supporting abstract thinking, complex problem solving, and rich information exchange. With a deep understanding of linguistic information, the agents can weigh broader and deeper factors when making decisions.
Software development, for example, can be broken into a "production line" of subtasks, with agents role-playing to propose solutions and debate decisions:
Design: three roles, CEO, CTO, and CPO, discuss the software design and decide which programming language to use to implement the intelligent-driving algorithm's features.
Coding: the programmer writes the code, and the designer handles the GUI design.
Testing: code review and actual execution of the code involve two roles, the "code reviewer" and the "test engineer".
Documentation: there are two kinds of documents, an environment description and a user manual. The former describes the environment the intelligent-driving algorithm depends on and is produced by the programmer under the CTO's guidance; the latter is decided by the CEO and generated from the PRD (product requirements document).
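The role-playing production line above can be sketched as an ordered list of phases, each pairing an instructing role with a producing role. This is a hypothetical illustration: `role_play` is a stub standing in for a multi-round LLM dialogue, and the artifact names and some pairings (for instance, the CPO drafting the user manual) are assumptions rather than details from the white paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    stage: str
    instructor: str  # role that sets the task and reviews the result
    assistant: str   # role that produces the artifact
    artifact: str


# Illustrative pipeline; artifact names and some role pairings are assumptions.
PIPELINE = [
    Phase("design", "CEO", "CPO", "prd"),
    Phase("design", "CEO", "CTO", "language_choice"),
    Phase("coding", "CTO", "Programmer", "source_code"),
    Phase("coding", "Programmer", "Designer", "gui_design"),
    Phase("testing", "Code Reviewer", "Programmer", "review_fixes"),
    Phase("testing", "Test Engineer", "Programmer", "test_fixes"),
    Phase("documentation", "CTO", "Programmer", "environment_doc"),
    Phase("documentation", "CEO", "CPO", "user_manual"),  # writer role assumed
]


def role_play(phase, context):
    """Stub for a multi-round dialogue between two LLM agents playing these roles."""
    return (f"{phase.assistant} drafted {phase.artifact} with {phase.instructor} "
            f"({len(context)} earlier artifacts in context)")


def run_pipeline(requirement, pipeline=PIPELINE):
    """Run phases in order; each phase sees every artifact produced before it."""
    artifacts = {"requirement": requirement}
    for phase in pipeline:
        artifacts[phase.artifact] = role_play(phase, dict(artifacts))
    return artifacts


artifacts = run_pipeline("Demo app for a lane-keeping assist algorithm")
print(artifacts["user_manual"])
# → CPO drafted user_manual with CEO (8 earlier artifacts in context)
```

Passing every earlier artifact forward is what lets the documentation phase build on the PRD and the code rather than starting from scratch.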
This kind of framework is especially well suited to complex industrial scenarios, the automotive industry in particular.
Building a smart car is easy; building a smart car company is hard
Indeed, given the strength of China's manufacturing industry and the completeness of its supply chain, it is not hard to "assemble" a smart car. Xiaomi, for example, took three years, and that is not even considered fast.
Becoming a truly "smart" car company, however, is the hardest challenge facing both new EV startups fighting to survive and legacy automakers trying to transform.
Spending on software algorithms, hardware domain controllers, in-house development, and the like is money well spent, and the talent naturally follows. But turning large models into productivity and raising the quality and efficiency of the entire operation is what car companies need most urgently.
Yihui Intelligence told Smart Car Reference that every car company it has spoken with, without exception, showed interest in using AI agents to improve work efficiency, optimize costs, and enhance the customer experience.
As the earlier examples show, what troubles car companies is balancing refined operations against controllable operating costs; whether in production, procurement, or marketing, relying on human staffing alone makes it hard to find the optimal balance.
Seen this way, the greatest significance of this first industry-academia-research white paper on large models for the automotive industry is its attempt to use large-model capabilities to solve practical problems in the automotive and broader manufacturing industries.
It also proposes a concrete path: using the general knowledge and natural-language capabilities of large models so that digital employees, which used to work in isolation, can communicate and collaborate efficiently.
It then lays out the architecture: the organizational twin, along with the processes, tools, and methodology.
This is also the first time the automotive industry has seriously treated large models as production tools and designed solutions working backward from the desired outcome, rather than treating them as a flashy gimmick.
According to McKinsey's estimates, by 2030, the digital workforce will form a market worth 1.73 trillion yuan, which naturally includes the automotive industry.
And the automotive industry's experience can be transferred almost losslessly to every large manufacturing industry.
The swarm intelligence technology driven by large models is the "spark" of AI transformation in the automotive industry, and the model and concept it pioneered are not limited to automobiles.