
Wang Xiaogang, SenseTime: The next one or two years will be a critical period for smart cars to break through, and large models are the foundation

Author: QbitAI

Compiled by the editorial team from MEET2024

QbitAI | WeChat official account QbitAI

"In the next one to two years, smart cars are actually at a critical breakthrough point."

At the MEET 2024 Intelligent Future Conference, Wang Xiaogang, co-founder and chief scientist of SenseTime, and president of Jueying Intelligent Vehicle Business Group, said this.

He also identified three specific things that would happen:

The first is end-to-end, data-driven autonomous driving;

The second is the emergence of a cockpit brain with the large model as its core and foundation;

The third is cabin-driving integration, where all cockpit and driving experiences run on the same chip and the same domain controller, greatly reducing cost and computing-power requirements.

And all of this is based on large models.


SenseTime has been investing in large models since 2018. Jueying is SenseTime's smart car business segment, adhering to a development strategy that integrates driving, cockpit, and cloud.

At the conference, Wang Xiaogang reviewed the development of AI over the past decade and discussed the technological breakthroughs and future opportunities that general artificial intelligence and large models bring to smart cars.

To fully convey Wang Xiaogang's thinking on how large models empower smart cars, QbitAI has edited the transcript of his speech without changing its original meaning.

About MEET Intelligent Future: MEET is a top business summit in the field of intelligent technology hosted by QbitAI, dedicated to discussing the implementation and industry application of cutting-edge technologies. This year, dozens of mainstream media and live-streaming platforms covered the MEET2024 conference live, attracting more than 3 million industry users online, with total exposure across the web exceeding 20 million.

Presentation Takeaways

  • The development of general artificial intelligence large models requires strong software and hardware infrastructure.
  • Only by exploring continuously and efficiently on strong infrastructure can a large amount of know-how be accumulated in a short period and 100-billion-parameter or even larger models be trained.
  • The emergence of large models makes it possible to build the brain of an intelligent cockpit, through which various apps and hardware devices in the cabin can be mobilized.
  • In the next one to two years, smart cars are at a critical breakthrough point.

(The following is the full text of Wang Xiaogang's speech)

Hardware and software infrastructure are essential

Today, we will share the technological breakthroughs and future development opportunities brought by general artificial intelligence and large models to our smart cars.

Looking back on the development of artificial intelligence over the past decade, the breakthrough of deep learning in 2012 enabled machine face recognition to surpass human-eye accuracy, which led to a wave of industrial AI applications.

But the problem was that different tasks required customized models and customized solutions.

In the past few years, SenseTime has delivered more than 30,000 commercial models. On the one hand, we have seen the widespread application of AI; on the other, we have also seen high R&D costs and long R&D cycles.

The emergence of ChatGPT at the end of last year brought a new paradigm for artificial intelligence: one or a few very powerful large models can solve many open-ended tasks, opening a new path for the large-scale industrial application of AI.

For most of the past few decades, AI development focused on solving small-sample problems: data volumes were small, and the computing resources and models we used were also relatively small.


However, starting in 2012 with the advent of deep learning, computing power increased significantly; with the emergence of the Transformer and of large models, it has grown to an even larger scale.

In the field of smart cars, our industry benchmark is Tesla.

Today Tesla has 14,000 GPUs, and by next year it plans to expand to 100,000. Such strong computing power points the industry toward certain directions for future development, and it is an investment that many domestic OEMs can hardly match today.

SenseTime began investing in large models in 2018, spending more than 5 billion yuan in Shanghai Lingang to build an artificial intelligence data center.


At the time, many people wondered: as an algorithm company, why invest so much in infrastructure?

Today, as general artificial intelligence large models develop, we see that software and hardware infrastructure capabilities are essential.

We currently have 30,000 high-end GPUs, delivering 6,500 petaFLOPS of computing power; in fact, this will grow massively by next year, to an expected 16,500 petaFLOPS.

The large models we have discussed, including perception models, generative models, text-to-image multimodal models, and decision-intelligence models, are all built on the capabilities of a powerful software and hardware infrastructure system.

We have compiled some statistics: in the past few months, models with tens of billions of parameters have been trained more than 100 times, and billion-parameter models more than 1,000 times.

It is through continuous and efficient exploration on strong infrastructure that a large amount of know-how can be accumulated in a short period, and 100-billion-parameter or even larger models can be trained.

These models are also highly interrelated; for example, multimodal models are developed on the basis of language models as well as vision models.

Our decision-making intelligence model also leverages the powerful reasoning power of language models.

Large models empower intelligent driving

There are many applications in smart cars. In autonomous driving, for example, scenarios range from highways to urban areas and can be very complex.

Based on large models, we can break our previous dependence on hand-written rules and perform complex reasoning about the scene.

For example, given the photo on the left and the question "How should I get to Huangshi East Road?", the model describes today's weather conditions, the vehicles driving ahead, and the road signs, and concludes from the signs that it should take the left lane.


The image on the right shows a complex intersection, and we ask, "What decision should the white car make?"

Our large model can analyze the traffic situation at the intersection from the image, notice the ambulance in it, and know how to respond.

Jueying is SenseTime's intelligent vehicle business segment. In the era of intelligent vehicles, as a core supplier of large models and general intelligence, we focus mainly on intelligent driving, intelligent cockpits, and AI cloud services.


In intelligent driving, providing integrated software-and-hardware solutions is likewise inseparable from large models.

The future trend of intelligent driving is vision-centric: a system that achieves autonomous driving end to end through large models and neural networks. Our earlier UniAD work is an example of this.

In today's intelligent cockpits, various AI suppliers provide single-point AI functions, and OEMs assemble these functions into products or solutions based on hand-crafted rules.

The emergence of large models makes it possible to create the brain of the intelligent cockpit, through which various APPs and hardware devices in the cabin can be mobilized.
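One common way such a cockpit brain mobilizes apps and devices is function calling: the language model emits a structured call, and a dispatcher routes it to the matching control. The sketch below is purely illustrative; the function names and the JSON call format are assumptions, not SenseTime's actual API.

```python
import json

# Hypothetical registry of in-cabin controls; names and signatures are
# illustrative only.
def open_window(side: str) -> str:
    return f"window:{side}:open"

def set_ac(temperature: int) -> str:
    return f"ac:{temperature}"

FUNCTIONS = {"open_window": open_window, "set_ac": set_ac}

def dispatch(model_output: str) -> str:
    """Parse a JSON function call emitted by the language model and
    route it to the matching cabin control."""
    call = json.loads(model_output)
    fn = FUNCTIONS[call["name"]]
    return fn(**call["arguments"])

# In practice, the JSON below would come from the cockpit's language model
# in response to a spoken request such as "I'm a bit warm".
print(dispatch('{"name": "set_ac", "arguments": {"temperature": 22}}'))
```

The design keeps the model itself stateless: adding a new cabin capability only means registering another function, not retraining the brain.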

This also relies on powerful AI cloud services. Today, many OEMs also want to have AI infrastructure, including the formation of a closed loop of data.

Collecting massive amounts of data from large fleets of mass-produced vehicles, then analyzing, processing, and annotating that data quickly, efficiently, and cost-effectively, also showcases the advantages of large models.
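A data closed loop of this kind typically triages fleet data by model confidence: frames the model already handles well are auto-labeled, while ambiguous corner cases are routed to human annotators. The sketch below is a toy illustration under that assumption; the threshold and frame format are invented for the example.

```python
# Toy data-closed-loop triage: frames arrive with a model confidence
# score; high-confidence frames are auto-labeled, the rest are queued
# for human review. The 0.9 threshold is illustrative.
AUTO_LABEL_THRESHOLD = 0.9

def route_frames(frames):
    """Split incoming frames into auto-label and human-review queues."""
    auto, review = [], []
    for frame in frames:
        target = auto if frame["confidence"] >= AUTO_LABEL_THRESHOLD else review
        target.append(frame["id"])
    return auto, review

frames = [
    {"id": "f1", "confidence": 0.97},  # easy highway scene
    {"id": "f2", "confidence": 0.55},  # ambiguous urban corner case
    {"id": "f3", "confidence": 0.92},
]
auto, review = route_frames(frames)
print(auto, review)  # -> ['f1', 'f3'] ['f2']
```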

We are establishing a complete R&D system based on large models, from model training to low-cost deployment, including the model layer and data-production pipelines. On top of this, function-calling and data-model technologies in the connection layer support both driving and the cockpit, including vehicle-road collaboration applications.


The core of the cockpit brain is the language model's ability to control the cockpit's various software and hardware. With sensors inside and outside the car, multimodal models can comprehensively perceive the environment, the passengers, and the driver's needs.

We also have a memory module that maintains long-term and short-term memory of passengers and drivers. Combining a plug-in knowledge base with the large model integrates knowledge to deliver a personalized service for every user.
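The long/short-term split described above can be sketched minimally: a bounded buffer of recent dialogue turns alongside a persistent store of per-user preferences, both injected into the model's prompt context. This is an assumed structure for illustration, not SenseTime's implementation.

```python
from collections import deque

class CabinMemory:
    """Toy long/short-term memory for cockpit personalization."""

    def __init__(self, short_term_size: int = 5):
        # Short-term: recent dialogue turns, oldest evicted first.
        self.short_term = deque(maxlen=short_term_size)
        # Long-term: stable per-user preferences, persisted across trips.
        self.long_term = {}

    def observe(self, utterance: str):
        self.short_term.append(utterance)

    def remember(self, user: str, key: str, value: str):
        self.long_term.setdefault(user, {})[key] = value

    def context_for(self, user: str) -> dict:
        # What would be prepended to the large model's prompt.
        return {"recent": list(self.short_term),
                "preferences": self.long_term.get(user, {})}

mem = CabinMemory()
mem.observe("It's a bit cold in here")
mem.remember("driver", "preferred_temp", "24C")
print(mem.context_for("driver"))
```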

AIGC can also power AI virtual assistants, providing a variety of anthropomorphic services and carrying out intelligent control through large models.

Below is a series of intelligent cockpit applications and products we have developed based on large models, including in-air painting, content generation, AI manuals, health consultation, and travel planning, which raise the in-cabin intelligent experience to a new level.


The future is to realize end-to-end autonomous driving with large models

The first future trend we see in intelligent driving is the move toward pure vision.

Today's intelligent driving systems still rely on a variety of sensors; shifting toward camera-centric hardware greatly reduces hardware cost.

In fact, an intelligent driving system contains many modules: perception, fusion, prediction, positioning, decision-making, and planning and control.

As the scenarios covered by autonomous driving expand from relatively simple highway navigation to more complex urban areas, scenario complexity increases significantly. At that point, hand-written rules can no longer cover every corner case, and we must rely more on data-driven approaches.

Behind this, the perception, fusion, positioning, decision-making, and planning-and-control modules are connected in series through one model, and covering as many scenarios as possible in a data-driven way is the route the industry sees for the future of autonomous driving.

In September this year, Tesla announced that the route to future mass production of autonomous driving is an end-to-end large-scale model-based solution.

At the end of last year, we proposed UniAD, which uses a single neural network to connect the preprocessing, perception, prediction, and decision-making modules; this work won the CVPR Best Paper Award this year.
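The idea of connecting modules in series through one network can be sketched in miniature: perception, prediction, and planning pass feature vectors to one another instead of hand-written rule interfaces, so in a real system one loss at the planning output could back-propagate through every module. The functions and numbers below are toy stand-ins, not UniAD's architecture.

```python
# Toy end-to-end chain in the spirit of a unified driving network:
# each stage consumes the previous stage's features directly.
def perception(image):
    # image (list of pixel-like values) -> scene feature vector
    return [sum(image) / len(image)]

def prediction(scene_feats):
    # scene features -> motion feature vector
    return [f * 0.5 for f in scene_feats]

def planning(motion_feats):
    # motion features -> a single steering command
    return -motion_feats[0]

def drive(image):
    # One differentiable-style chain: no rule-based hand-offs between stages.
    return planning(prediction(perception(image)))

print(drive([0.2, 0.4, 0.6]))
```

Because every hand-off is a tensor-like value rather than a symbolic rule, improving the final driving objective can, in principle, improve every upstream module jointly.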

In this work, we can see that data-driven training of the network greatly improves the performance of each module, and that low-cost vision solutions may eventually remove the dependence on high-definition maps.

Moreover, multimodal large models with language output can explain the various decisions of autonomous driving. Autonomous driving is no longer a black box: every decision comes with reasoning and logic.

The system can also converse with people and control various autonomous driving behaviors through voice.

We can also build world models and use large models to generate a wide variety of realistic autonomous driving simulation data, enabling data-driven, end-to-end training of autonomous driving.


Finally, in the next one to two years, smart cars are at a critical breakthrough point, and three things will happen:

The first is end-to-end, data-driven autonomous driving;

The second is the emergence of a cockpit brain with the large model as its core and foundation;

The third is cabin-driving integration, where all cockpit and driving experiences run on the same chip and the same domain controller, greatly reducing cost and computing-power requirements, achieving better integration at the product level, and delivering a better intelligent driving and intelligent cockpit experience.

All of these are also based on large models.

We very much look forward to future intelligent driving becoming a safe and reliable driver, and the intelligent cockpit becoming a warm butler who understands you, achieving better human-machine co-driving.

— END —
