Accelerating smart cars into the AGI era: SenseTime's Jueying series of native large models unveiled at the auto show

Author: autocarweekly

The 2024 Beijing International Automotive Exhibition (hereinafter the "2024 Beijing Auto Show") opened today, and SenseTime made its debut at the show with its Jueying series of native large models and the related product matrix.

As the first company to propose a general-purpose autonomous driving model with integrated perception and decision-making, SenseTime showcased for the first time the road-test performance of UniAD (Unified Autonomous Driving), a truly end-to-end autonomous driving solution aimed at mass production. It also presented an AI large-model cockpit product matrix centered on a multi-modal "scene brain" and a new 3D cockpit interaction demonstration, offering a panoramic view of a future travel experience driven by artificial general intelligence (AGI): flexible and adaptive, deeply personalized, safe, reliable, and human-centered.

Wang Xiaogang, co-founder and chief scientist of SenseTime and president of the Jueying Intelligent Vehicle Business Group, said: "Competition in future automotive intelligence is essentially competition over the integration and application of artificial general intelligence technology. SenseTime is committed to becoming a core technology partner that accelerates the entry of smart cars into the AGI era, bringing dual innovations in production efficiency and interactive experience to the smart car industry. We will work with partners to deliver a new smart-car experience driven by AGI technology and jointly define the future of mobility."

(SenseTime debuted at the Beijing Auto Show)

Driving like a human: UniAD, a truly end-to-end autonomous driving solution, makes its real-vehicle debut

With forward-looking industry insight and an early start, SenseTime and its joint laboratories proposed the industry's first general-purpose autonomous driving model with integrated perception and decision-making at the end of 2022; the work won the Best Paper Award at the 2023 Conference on Computer Vision and Pattern Recognition (CVPR) the following year, leading the trend toward end-to-end autonomous driving.

At the Beijing Auto Show, SenseTime demonstrated the strength of China's end-to-end intelligent driving with impressive real-vehicle test results for the UniAD solution. Relying on visual perception alone, without high-definition maps, the vehicle handled both complex urban roads and rural roads without lane markings, efficiently and accurately completing a series of difficult maneuvers, including a sharp left turn onto a bridge, avoiding vehicles occupying its lane and construction zones, and going around running pedestrians; in short, it "drives like a human".

(Vehicles equipped with the UniAD solution can truly "drive like a human")

At the beginning of this year, Tesla began rolling out its end-to-end FSD V12 to some users, and more and more "end-to-end" intelligent driving solutions have appeared in the industry. Most of them, however, use an easier-to-implement "two-stage" architecture composed of separate perception and decision-making models, so information is still filtered or lost as it passes between the two. UniAD instead integrates perception, decision-making, planning, and other modules into a single full-stack Transformer model, achieving truly end-to-end autonomous driving with integrated perception and decision-making.

(UniAD true end-to-end: a general model with integrated perception and decision-making)
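For readers who want to see the structural difference concretely, the sketch below is a minimal, hypothetical illustration in PyTorch (not SenseTime's code) of a two-stage pipeline, where the planner only sees a compressed perception output, versus a single end-to-end model in which a planning query attends to the full scene features inside one shared Transformer.

```python
# Illustrative sketch only: contrasts a "two-stage" pipeline with an
# end-to-end model that shares one Transformer across perception and planning.
import torch
import torch.nn as nn

class TwoStagePipeline(nn.Module):
    """Perception and planning are separate models; only a compressed
    perception output (e.g. an object list) is handed to the planner,
    so information can be filtered or lost at the interface."""
    def __init__(self, feat_dim=256, num_objects=32):
        super().__init__()
        self.perception = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, num_objects * 4))          # boxes only
        self.planner = nn.Sequential(
            nn.Linear(num_objects * 4, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 2))                        # (steer, accel)

    def forward(self, camera_feats):                       # (B, feat_dim)
        boxes = self.perception(camera_feats)              # hand-off bottleneck
        return self.planner(boxes)

class EndToEndModel(nn.Module):
    """One Transformer processes image tokens together with learnable task
    queries, so planning attends to raw scene features instead of a lossy
    intermediate representation."""
    def __init__(self, feat_dim=256, num_queries=8):
        super().__init__()
        self.task_queries = nn.Parameter(torch.randn(num_queries, feat_dim))
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.plan_head = nn.Linear(feat_dim, 2)            # (steer, accel)

    def forward(self, image_tokens):                       # (B, N, feat_dim)
        B = image_tokens.shape[0]
        queries = self.task_queries.unsqueeze(0).expand(B, -1, -1)
        out = self.backbone(torch.cat([image_tokens, queries], dim=1))
        plan_query = out[:, -1]                            # last query reserved for planning
        return self.plan_head(plan_query)

if __name__ == "__main__":
    print(TwoStagePipeline()(torch.randn(1, 256)).shape)   # torch.Size([1, 2])
    print(EndToEndModel()(torch.randn(1, 64, 256)).shape)  # torch.Size([1, 2])
```

The point of the contrast is the hand-off: in the two-stage version the planner can only ever use what the perception head chose to emit, while in the end-to-end version the planning output is trained jointly against the full token stream.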

The industry needs a truly end-to-end solution because rule-based intelligent driving is approaching its ceiling. As urban areas become the main battlefield for deploying intelligent driving, the complexity of scenarios has grown exponentially; large investments of engineering manpower add only a limited number of rules and cannot cover an effectively infinite set of complex scenarios and corner cases. End-to-end technology opens up a new path and is shifting the autonomous driving paradigm away from heavy manpower toward sustained investment in computing power and high-quality data.

Backed by SenseTime's abundant computing power, high-quality simulation data, and industry-leading model performance, the UniAD end-to-end solution has a higher capability ceiling: its strong learning and reasoning abilities let it drive like a human; the data-driven end-to-end model generalizes well and iterates quickly, helping automakers open up new cities at low cost; and the map-free, vision-only perception further reduces the system's software and hardware costs, helping intelligent driving reach mass adoption and become available nationwide.

(DriveAGI, the next-generation autonomous driving model: perceptive, interactive, and trustworthy)

Building on the end-to-end system, SenseTime also introduced DriveAGI, a new generation of autonomous driving model, during the auto show, pushing autonomous driving from data-driven toward cognition-driven. Relying on the world understanding, reasoning, decision-making, and interaction capabilities of multi-modal large models, DriveAGI aims to be the technical solution closest to human thinking, best at understanding human intent, and strongest at resolving difficult driving scenarios, an important step toward fully unmanned driving.

Keen insight, deep thinking, efficient execution: the "multi-modal scene brain" delivers an intelligent cockpit that truly understands you

The recently launched Xiaomi SU7 brought AI large models into the cockpit, and SenseTime's "RiRixin" (SenseNova) large model fully supports the in-vehicle voice scenarios of its Xiao Ai assistant.

On April 23, SenseTime released the newly upgraded "RiRixin SenseNova 5.0" large model. With 600 billion parameters and a Mixture-of-Experts (MoE) architecture, SenseNova 5.0 has stronger knowledge, mathematics, reasoning, and coding capabilities, becoming the first large model in China to comprehensively benchmark against, and in places surpass, GPT-4 Turbo, with multi-modal capabilities ahead of GPT-4V. Built on a device-cloud integrated architecture, SenseTime's on-device large model substantially outperforms models of the same size and is comparable to some 7B and 13B models, making it well suited to in-vehicle deployment.
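As a rough idea of what a Mixture-of-Experts layer does, the sketch below routes each token to a small number of expert feed-forward networks so only a fraction of the parameters run per token; the sizes and top-k routing here are illustrative assumptions, not SenseNova 5.0's actual configuration.

```python
# Minimal Mixture-of-Experts (MoE) layer sketch; dimensions and routing
# details are assumptions for illustration, not SenseNova 5.0's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=256, d_ff=1024, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)           # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts))

    def forward(self, x):                                       # x: (tokens, d_model)
        gate_logits = self.router(x)                            # (tokens, num_experts)
        weights, picked = gate_logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picked[:, slot] == e                     # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out                                              # only top-k experts run per token

if __name__ == "__main__":
    print(MoELayer()(torch.randn(16, 256)).shape)               # torch.Size([16, 256])
```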

Combining the capabilities of multi-modal large models, large language models, text-to-image models, and more, SenseTime has built a series of large-model cockpit products centered on the multi-modal scene brain, featuring panoramic perception, proactive care, and creativity.

The multi-modal scene brain, with keen insight, deep thinking, and efficient execution, is one of the core products SenseTime is building to help smart cars move toward the AGI era. Driven by application scenarios and user needs, it lets a smart car efficiently and accurately perceive, and deeply understand, the user's needs and the surrounding environment, connects different applications around the scene brain, and aggregates discrete single-point functions, so as to provide users with more deeply personalized, proactive care and services.
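As an illustration of the aggregation idea, the following sketch shows how a scene brain might route one shared, multi-modal understanding of the cabin and its surroundings to discrete single-point functions; the skill names and intent logic are hypothetical, not SenseTime's product design.

```python
# Hypothetical "scene brain" dispatcher: one shared understanding of the
# scene is routed to whichever single-point function serves the user's intent.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class SceneContext:
    transcript: str              # what the user said
    outside_objects: List[str]   # labels from exterior cameras
    location: str                # coarse geographic position

def recommend_restaurant(ctx: SceneContext) -> str:
    return f"Recommending restaurants near {ctx.location}."

def describe_landmark(ctx: SceneContext) -> str:
    return f"That is {ctx.outside_objects[0]}, a local landmark."

class SceneBrain:
    """Aggregates discrete cockpit skills behind one intent-inference step."""
    def __init__(self):
        self.skills: Dict[str, Callable[[SceneContext], str]] = {
            "food": recommend_restaurant,
            "landmark": describe_landmark,
        }

    def infer_intent(self, ctx: SceneContext) -> str:
        # Stand-in for a multi-modal large-model call.
        return "food" if "hungry" in ctx.transcript else "landmark"

    def respond(self, ctx: SceneContext) -> str:
        return self.skills[self.infer_intent(ctx)](ctx)

if __name__ == "__main__":
    ctx = SceneContext("I'm hungry", ["a TV tower"], "Chaoyang, Beijing")
    print(SceneBrain().respond(ctx))
```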

(Multi-modal large model can accurately perceive and recognize information outside the vehicle)

Recommending restaurants that match the user's preferences based on location, describing the natural scenery outside that catches the user's interest, and suggesting worthwhile scenic spots are basic functions of the Jueying AI large-model cockpit products. With powerful multi-modal perception, vehicles equipped with SenseTime's Jueying solution can accurately perceive and recognize information outside the vehicle, including the models of surrounding cars and landmark buildings, and present more accurate and comprehensive content to occupants through voice, pictures, or video. This helps users understand and stay on top of the external environment, breaks the limits of the cockpit itself, and gives users a freer, less constrained travel experience.

In addition, with the multi-modal scene brain at its core, SenseTime can provide further AI large-model cockpit products for automotive scenarios. Bridged by SenseTime's "Big Doctor" medical and health model, "Travel Doctor" lets users obtain professional, personalized health-management services in the cockpit, making travel more reassuring. Based on AIGC large-model technology, "Magic Brush" turns a user's simple sketched lines into aesthetically pleasing artwork, adding fun to the journey.

("Travel Doctor" allows users to get professional and personalized health management services in the cockpit)

("Magic Brush" can transform the user's simple drawing lines into beautiful works of art)

Innovation in human-computer interaction is another important driver of the cockpit experience; the launch of Apple Vision Pro last year demonstrated the innovative experience and application potential of 3D interaction. Drawing on its deep R&D strength in perception technology and fast innovation and iteration, SenseTime brought two new 3D cockpit interaction demonstrations to the show, 3D Gaze high-precision gaze interaction and 3D dynamic gesture interaction, letting visitors experience more intuitive in-cabin interaction and pushing cockpit interaction toward safer, more convenient 3D interaction.

Among them, 3D Gaze is the world's first intelligent cockpit technology that lets users interact with on-screen icons through gaze positioning: users can precisely control icons on the central display with their eyes and complete a variety of operations without touching the screen. 3D dynamic gesture interaction is an industry-leading cockpit technology that recognizes dynamic gestures and fine hand movements, allowing users to perform refined in-cabin interactions with mid-air gestures and removing the cumbersomeness and limitations of traditional buttons and touchscreens.
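To make the gaze-interaction idea concrete, the sketch below intersects an estimated gaze ray with the plane of the central display and picks the nearest icon; the coordinate frame, thresholds, and icon layout are assumptions for illustration, not SenseTime's 3D Gaze implementation.

```python
# Geometric sketch of gaze-to-icon selection; all values are illustrative.
import numpy as np

def gaze_screen_intersection(eye_pos, gaze_dir, screen_origin, screen_normal):
    """Intersect a gaze ray (eye position + direction) with the display plane;
    all vectors share one cabin coordinate frame, in metres."""
    gaze_dir = gaze_dir / np.linalg.norm(gaze_dir)
    denom = np.dot(gaze_dir, screen_normal)
    if abs(denom) < 1e-6:
        return None                              # gaze parallel to the screen
    t = np.dot(screen_origin - eye_pos, screen_normal) / denom
    if t <= 0:
        return None                              # screen is behind the eye
    return eye_pos + t * gaze_dir                # 3D hit point on the screen plane

def pick_icon(hit_point, icons, radius=0.03):
    """Return the icon whose centre is closest to the hit point, within ~3 cm."""
    best, best_d = None, radius
    for name, centre in icons.items():
        d = np.linalg.norm(hit_point - centre)
        if d < best_d:
            best, best_d = name, d
    return best

if __name__ == "__main__":
    eye = np.array([0.0, 0.4, 1.2])              # driver's eye position
    gaze = np.array([0.3, -0.2, -1.0])           # estimated gaze direction
    screen_o = np.array([0.3, 0.2, 0.2])         # a point on the screen plane
    screen_n = np.array([0.0, 0.0, 1.0])         # screen plane normal
    icons = {"navigation": np.array([0.28, 0.21, 0.2]),
             "music": np.array([0.35, 0.15, 0.2])}
    hit = gaze_screen_intersection(eye, gaze, screen_o, screen_n)
    print(pick_icon(hit, icons) if hit is not None else "no hit")  # "navigation"
```

In a production system the gaze ray itself would come from a learned eye-tracking model and the selection would be smoothed over time, but the ray-plane intersection above captures the basic mapping from gaze to screen coordinates.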

Looking ahead, SenseTime is also exploring deeper cockpit-driving integration, unifying intelligent driving and the intelligent cockpit at the hardware, software, and application levels. This improves the user experience, reduces system cost, further breaks down the boundaries between the inside and outside of the cabin, and lets more innovative functions emerge, bringing a safer, more comprehensive, and more human-centered experience.

Taking the lead in building out four technology bases, SenseTime accelerates smart cars into the AGI era

Over the next one to two years, smart cars stand at a critical juncture, and their deep integration with artificial general intelligence will open a new era. No company can succeed alone in this era; automakers need strong core technology partners, and SenseTime is one of the industry's few core suppliers with full-stack leadership across computing power, algorithms, and mass-production experience.

Breakthroughs in AGI are driving a shift in the technological paradigm, and solid core R&D capability has become the key to industry competition. Relying on its deep computing power reserves, native automotive vertical models, leading software and hardware architectures, and full-stack data production pipelines, SenseTime has taken the lead in building four technology bases and is growing into a core technology partner that accelerates the entry of smart cars into the AGI era.

SenseCore, SenseTime's industry-leading AI infrastructure, provides up to 12,000 petaFLOPS of computing power and supports the efficient iteration of SenseTime's native large models. A series of native large models, such as DriveAGI and the cockpit-oriented multi-modal scene brain, accelerate the deployment of end-to-end autonomous driving and of large models in intelligent cockpit scenarios; the innovative device-cloud collaborative, cockpit-driving-integrated software and hardware architecture helps smart cars cut costs, raise efficiency, and surface new functions; and the full-stack data production pipeline enables high-quality training of large models.

(SenseCore, SenseTime's large-scale AI infrastructure, supports the efficient iteration of the Jueying series of native large models)

With these four AGI technology bases, SenseTime will accelerate smart cars' embrace of the era of artificial general intelligence: solving large-scale adoption of intelligent driving with end-to-end large models, replacing the traditional single-point function development model of intelligent cockpits with the multi-modal scene brain, driving innovation in industry production efficiency, breaking down the boundaries between the inside and outside of the cabin, driving innovation in the human-computer interaction experience, and providing a new future travel experience that is flexible and adaptive, deeply personalized, safe, reliable, and human-centered.

Today, SenseTime has built a multi-layered AGI product system spanning intelligent driving, intelligent cockpits, and AI cloud, and is accelerating the comprehensive, in-depth application of its native large-model products in automotive intelligence, speeding the integration of AGI into the automotive industry and pursuing win-win cooperation with automakers to open a new chapter in future mobility.

From April 25 to May 4, visitors are welcome at SenseTime's booth (China International Exhibition Center, Shunyi Hall, E1-W09) to explore the future mobility vision of the AGI era.
