
SenseTime's ChatGPT is here! The "SenseNova" (Daily New) large model system has been released

Five years in the making and backed by 27,000 GPUs: live demos of AI programming and medical consultation, and a Hong Kong-style portrait model trained in a few minutes.

Author |  ZeR0

Editor |  Shadow of Indifference

On April 10, SenseTime announced that it is making AGI (artificial general intelligence) its core development strategy: building on the "SenseNova" (Daily New) large model system, it will pursue further breakthroughs toward AGI through "large models + large computing power".

SenseTime also demonstrated its ChatGPT-like product, the self-developed Chinese-language large model application platform "SenseChat", live-demonstrating capabilities such as writing advertising copy, drafting invitations, co-creating children's stories turn by turn, reading a document and answering comprehension questions about it, and writing code.

In addition, SenseTime rolled out four generative AI application platforms built on the "SenseNova" large model system. It demonstrated real-time AI text-to-image generation and the hands-on creation of lifelike digital human videos, showed a high-fidelity city-scale reconstruction and a commercial featuring an object with complex structure produced by its 3D content generation platforms, and demonstrated smooth real-time interaction with 3D content on a tablet.

SenseTime has built a range of AI models spanning CV (computer vision), NLP (natural language processing), and AIGC (AI-generated content). Its SenseCore AI large device is one of the industry's few infrastructures dedicated to large models: with 27,000 GPUs it can deliver 5,000 petaFLOPS of computing power, run single-task training on clusters of up to 4,000 cards, and sustain uninterrupted, stable training for more than 7 days.

Based on this AI infrastructure, SenseTime will provide customers with a variety of Model-as-a-Service (MaaS) offerings, including automated data annotation, large model inference deployment, large model parallel training, large model incremental training, and developer efficiency tools.

Under the "one platform, four pillars" strategy, SenseTime's "SenseNova" large model system fully supports its smart car, smart life, smart business, and smart city business segments, forming closed-loop applications across multiple fields and industries.

01.

Chinese language large model debuts: AI writes a "handwritten document OCR" program on the spot,

and also serves as a doctor for online medical consultations

SenseChat is SenseTime's self-developed Chinese-language large model application platform. It supports single-turn dialogue, multi-turn dialogue, and ultra-long text comprehension; it can solve complex problems in seconds, provide customized suggestions, and assist with text creation, and it keeps learning and evolving.


Behind the platform is a Chinese large language model with 100 billion parameters developed by SenseTime, which supports more than 600 vertical fields such as enterprise services, urban management, and automotive travel.

▲SenseTime's Chinese language model can understand the meaning of a sentence and attempt to judge whether it is reasonable

According to SenseTime, SenseChat has logical deduction ability and can improve its judgment and creativity through interactive guidance. With both breadth and depth of understanding, it performs strongly in multi-turn dialogue, ultra-long text comprehension, medical consultation, and programming, covering a wide range of applications.

For example, when a PDF is opened, the model can use text recognition to quickly read and understand the Patent Law and then answer users' questions about the document.

SenseChat can also update its knowledge automatically and promptly answer questions not covered in a document, producing more credible, accurate, and safe text and conversations.

SenseTime has already built industry-specific Chinese language models for vertical domains such as programming and medicine.

For programming, SenseTime's "AI Code Assistant" is an AI-assisted development tool built on its large language models, offering code completion, comment generation, test code generation, code translation, code correction, code refactoring, and complexity analysis.

Developers can enter requirements in Chinese and it will automatically generate complete code.

On stage, SenseTime asked the AI Code Assistant to write a "handwritten document OCR" program, and the generated code ran successfully.
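The article does not reproduce the generated program. Purely as an illustration of what a minimal "handwritten document OCR" script can look like, here is a sketch that leans on the open-source Tesseract engine via pytesseract; the library choice, language packs, and threshold value are assumptions, not SenseTime's generated code.

```python
# Hypothetical illustration: a minimal handwritten-document OCR script.
# Uses the open-source Tesseract engine (pytesseract + Pillow), which is an
# assumption here, not the code SenseTime's assistant actually produced.
from PIL import Image, ImageOps
import pytesseract


def ocr_handwritten_document(image_path: str, lang: str = "chi_sim+eng") -> str:
    """Read a scanned handwritten page and return the recognized text."""
    image = Image.open(image_path)
    gray = ImageOps.grayscale(image)                         # drop color information
    binarized = gray.point(lambda p: 255 if p > 180 else 0)  # crude thresholding
    return pytesseract.image_to_string(binarized, lang=lang)


if __name__ == "__main__":
    print(ocr_handwritten_document("handwritten_page.png"))
```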

"AI Code Assistant" supports both Chinese and English and a variety of programming languages, and can quickly adapt to developers' personalized coding styles, improve development efficiency, reduce development errors, and help developers focus on more creative programming work and code design.

According to SenseTime's internal measurements, the AI Code Assistant increased code-writing efficiency by 62%, and the model achieved a 39% first-attempt pass rate on the HumanEval test set.
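For context, HumanEval (Chen et al., 2021) scores generated code by running it against unit tests; the "first-attempt pass rate" presumably corresponds to pass@1. The benchmark's standard unbiased estimator is

$$\text{pass@}k = \mathop{\mathbb{E}}_{\text{problems}}\left[1 - \frac{\binom{n-c}{k}}{\binom{n}{k}}\right],$$

where $n$ samples are generated per problem and $c$ of them pass the tests; for $k=1$ this reduces to the average fraction of generated samples that pass.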

For the medical industry, SenseTime's Chinese medical language model, built on a large corpus of medical knowledge and real doctor-patient dialogues, provides multi-scenario, multi-turn conversation capabilities such as patient guidance, consultation, health advice, and decision support, and currently performs well in triage, popularization of medical knowledge, and differential diagnosis.

The medical model has continuous learning ability and can adjust and optimize itself based on user feedback and evaluation, improving its understanding and analysis across medical scenarios. In the future it will expand its consultation capabilities to drug treatment and surgical planning, helping doctors further improve the efficiency of diagnosis and treatment.

02.

Generative AI Application Series:

AI text-to-image, automatic video production, and 3D content generation

Beyond dialogue, SenseTime has also built AI drawing, AI video production, and 3D generation tool platforms on the "SenseNova" large model system, including Miaohua, Ruying, Qiongyu, and Gewu, bringing productivity gains to industries such as short video and live streaming.

1. "Miaohua" AI content creation community platform: a single card can generate five 512-resolution images every 10 seconds

Miaohua is an AI content creation community platform from SenseTime that helps users easily create high-quality artwork, automatically generating elements and details. The platform lets users train personalized drawing models to suit different painting styles.

Miaohua is built on SenseTime's self-developed text-to-image generation model with more than 1 billion parameters, and is designed so that users can conveniently run inference and train locally: 1) inference is fast, with a single card generating five 512-resolution images every 10 seconds; 2) a single card can train a custom LoRA model from about 20 training images within 5 minutes.

In the demo, SenseTime entered a series of detailed Chinese descriptions into the chat box and had the AI generate photos of Hong Kong-style beauties.
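Miaohua's model is proprietary, so purely as a point of reference for the quoted throughput, the sketch below times the same workload shape (five 512×512 images from one prompt on one GPU) using an open-source Stable Diffusion pipeline from the Hugging Face diffusers library; the checkpoint, prompt, and step count are assumptions, not what Miaohua runs.

```python
# Illustrative benchmark only: an open-source text-to-image pipeline stands in
# for Miaohua's proprietary model to show the "five images per card" workload.
import time

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # assumed stand-in checkpoint
    torch_dtype=torch.float16,
).to("cuda")

prompt = "portrait photo of a woman, 1980s Hong Kong film style, soft lighting"
start = time.time()
result = pipe([prompt] * 5, height=512, width=512, num_inference_steps=25)
print(f"Generated {len(result.images)} images in {time.time() - start:.1f}s")
for i, image in enumerate(result.images):
    image.save(f"hk_style_{i}.png")
```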

The AI seemed to understand what "beauty" meant, but had not yet figured out what "Hong Kong style" was.

That was not a problem; the AI could simply be taught on the spot. Click "Train Model" in the menu bar on the left side of the page, enter a model prompt, upload 20 or more photos in the Hong Kong style, and start training. Soon the newly trained model generated Hong Kong-style portraits that better matched the request.
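The article does not describe Miaohua's training internals, but this kind of minutes-long personalization from roughly 20 photos is characteristic of LoRA fine-tuning, where the base model stays frozen and only small low-rank adapters are trained. A self-contained PyTorch sketch of the idea (layer sizes and hyperparameters are illustrative, not Miaohua's):

```python
# Minimal LoRA sketch: wrap a frozen linear layer with a trainable low-rank update.
# This illustrates why personalization is cheap: only A and B (a few thousand
# parameters here) are trained, while the pretrained weights stay untouched.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze the pretrained weights
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = frozen projection + scaled low-rank correction learned from the new style
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


# Usage idea: wrap the attention/projection layers of a diffusion model's UNet
# with LoRALinear and optimize only lora_a / lora_b on the ~20 reference images.
layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")   # 2 * 8 * 768 = 12,288
```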

SenseTime's generative model open-source community brings together more than 10,000 open-source models. Models published through the Miaohua platform can be exposed as to-B service APIs and, combined with SenseTime's computing power, offered as commercial services.

2. "Ruying" video generation platform: chat to create AI digital humans, with copy and final video generated automatically

SenseTime's "Ruying" AI digital human video generation platform aims to make video creation easy for everyone, supporting full-stack intelligent creation of digital human motions and expressions, AI copywriting, cross-language scripts, and AI-generated material, and it can switch between cartoon and realistic styles.


The whole process requires no professional filming equipment: Ruying generates highly realistic digital avatars and, driven by text, quickly and efficiently produces a variety of character video content. The resulting digital humans look lifelike, with natural, rich expressions.

Users simply enter a rough video idea in the dialog box; the platform automatically generates the corresponding copy and then uses AI to drive the digital content into a finished video.

The platform supports more than 100 languages, facilitating cross-language creation, and makes it easier to obtain creative material through AI image generation and other capabilities. SenseTime demonstrated switching to Arabic live, and in the resulting video the digital human's lip movements were well synchronized.

The Ruying AI digital human video generation platform can help creators quickly produce short videos, live streams, and other marketing content, and can also provide video solutions for education and training, corporate publicity, entertainment, and culture, improving brand awareness and user stickiness.

3. "Qiongyu" and "Gewu" 3D content generation platforms: real-time editing and creation that restores real-world detail

Qiongyu and Gewu are SenseTime's 3D content generation platforms based on neural radiance field (NeRF) technology. They can reproduce and make interactive both spaces and objects, from city-scale digital twins down to small desktop figurines.
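For background (the standard NeRF formulation, not SenseTime's specific implementation): a neural radiance field is a learned function that maps a 3D position and viewing direction to a density and color, and each pixel is rendered by integrating along its camera ray:

$$\hat{C}(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma\big(\mathbf{r}(t)\big)\,\mathbf{c}\big(\mathbf{r}(t),\mathbf{d}\big)\,dt, \qquad T(t) = \exp\!\Big(-\!\int_{t_n}^{t}\sigma\big(\mathbf{r}(s)\big)\,ds\Big),$$

where $\sigma$ is the volume density, $\mathbf{c}$ the view-dependent color, and $T(t)$ the accumulated transmittance. Training drives the rendered pixel colors to match the captured photographs, which is why collections of ordinary images or scans can be turned into re-renderable, editable 3D scenes and objects.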

All the 3D content generated by the two platforms can be re-edited and recreated. By producing large volumes of high-precision digital assets, they can serve application scenarios with strong demand for interactive, realistic 3D content, such as film and television creation, architectural design, product marketing, and digital twin management.

Qiongyu, the large-space 3D content generation application, focuses on scene generation, replicating hyper-realistic scenes that support not only free roaming but also real-time interaction and editing. It can be used for city- and campus-scale digital twins, film and television creation, cultural tourism, e-commerce, and other scenarios.

Qiongyu's algorithmic strengths include centimeter-level reconstruction accuracy, real-time rendering and interaction for large scenes, multi-source data fusion, and ultra-fine detail. It can reconstruct spaces at city scale, up to 100 square kilometers: a task that would take roughly 10,000 person-days of traditional manual modeling can be completed in only about 2 days (at 1,200 TFLOPS of computing power), while restoring real-world detail and lighting effects.

Compared with traditional object modeling, Gewu, the 3D content generation application for small objects, can reproduce ultra-detailed objects across categories, delivering a 400% overall efficiency improvement and a 95% reduction in overall cost, with broad category coverage and good reconstruction quality.


With SenseTime's NeRF technology, Gewu handles objects with complex structures well, accurately reproducing lighting and faithfully restoring materials, making it suitable for commercial advertising, product marketing, and similar scenarios. The image below is from a commercial generated by SenseTime with Gewu.

03.

Diversified MaaS offerings and open APIs,

with more than 7,000 GPUs of computing resources provided

Clearly, the deployment strategy of SenseTime's model system is aimed mainly at business (to-B) customers: it offers dedicated large models for specific application scenarios and packages its content-generation technology into easy-to-use, practical tools delivered as platforms, in order to unlock productivity.

The system's name comes from the inscription on King Tang's bathing basin quoted in the "Great Learning" chapter of the Book of Rites: "If you can renew yourself one day, do so day after day, and again day after day." SenseTime hopes its models' iteration speed and problem-solving ability will likewise be renewed daily, continuously unlocking more possibilities on the way to AGI.

Based on the "SenseNova" large model system, SenseTime will provide customers with APIs for image generation, natural language dialogue, visual reasoning, visual annotation, and other services, so that customers can call on the AI capabilities of SenseTime's models on demand and build on them with secondary development.
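The article does not document the API itself, so the snippet below is only a hedged sketch of how an on-demand natural-language-dialogue endpoint is typically called from Python; the URL, model name, header, and response fields are placeholders, not SenseTime's published interface.

```python
# Hypothetical sketch only: endpoint URL, auth header, and payload/response
# fields are placeholders, not SenseTime's actual API.
import requests

API_URL = "https://api.example.com/v1/chat/completions"   # placeholder endpoint
API_KEY = "YOUR_API_KEY"                                   # issued by the platform


def chat(prompt: str) -> str:
    """Send one dialogue turn to a hosted large language model and return the reply."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "chinese-llm",                        # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("用三句话介绍一下专利法的立法目的。"))
```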

In addition, SenseTime will provide customers with a variety of Model-as-a-Service (MaaS) offerings, such as automated data annotation, large model inference deployment, large model parallel training, large model incremental training, and developer efficiency tools:

(1) Large model parallel training and incremental training services: help customers in different industries train large models at lower cost, quickly incorporate relevant domain knowledge, and tailor models to each industry and customer ("a thousand industries, a thousand faces"), cutting the cost of incremental fine-tuning by 90%.

SenseTime's AI large device has so far supported more than 10 large model training projects, providing more than 7,000 GPUs of computing resources, and has supported 8 customized large-model training tasks for customers in sectors including the internet, gaming, commercial banking, and scientific research.

(2) Automated data annotation: intelligent annotation delivers nearly a 100x efficiency improvement. The platform has more than 10 built-in general-purpose and industry-specific large models, supports intelligent labeling for 2D classification, 2D detection, and 3D detection in scenarios such as intelligent driving, smart transportation, and smart cities, and offers better labeling quality, higher efficiency, and lower cost than traditional manual labeling or small-model labeling.

(3) Large model inference deployment: minimizes inference cost and doubles efficiency, helping customers quickly deploy large-model applications.

(4) Developer efficiency: open models and AI development toolchains that help developers work more efficiently.

Whether through the Chinese language model application platform, the four generative AI application platforms, the open APIs, or the diversified MaaS offerings, the goal is to further lower the barrier to deploying AI in real business workflows, reducing costs and improving efficiency.

SenseTime's ability to unveil so much at once is inseparable from its technical accumulation and hands-on experience over the past five years.

04.

Five years spent sharpening the AI large-model sword,

Fully support the four major business segments

SenseTime has been working on AI large models since 2018. In 2019 it used thousands of GPUs for single-task training and launched a vision model with 1 billion parameters, achieving the industry's best algorithmic results at the time. Over the past two years it has trained super-large vision models at the tens-of-billions-of-parameters scale, equivalent in training volume to a language model with hundreds of billions of parameters.

SenseTime has now developed the world's largest general-purpose vision model, with 32 billion parameters, enabling high-performance object detection, image segmentation, and multi-object recognition algorithms that are widely used in autonomous driving, industrial quality inspection, medical imaging, and other fields.

SenseTime has also released the largest multimodal dataset (OmniObject3D) for realistic perception, reconstruction and generation.

As noted above, under the "one platform, four pillars" strategy, the "SenseNova" large model system fully supports the smart car, smart life, smart business, and smart city business segments, forming closed-loop applications across multiple fields and industries.

In intelligent driving, producing high-precision in-vehicle models with large models has greatly improved few-shot, one-shot, and zero-shot accuracy on long-tail categories, and raised average accuracy on key categories by 3%. Large models also provide high-precision intelligent annotation, a core function of the data closed loop, greatly reducing the amount of data that must be labeled manually and accelerating improvements in model accuracy.

Thanks to its large-model capability, SenseTime has realized BEV (bird's-eye-view) surround perception, achieved high-precision recognition of 3,000 object types, and built a multimodal autonomous-driving large model that integrates perception and decision-making, bringing stronger capabilities for decoding the environment, behavior, and intent.

In biomedicine, SenseTime's AI large device provides inference computing power for large protein structure models, and supplies the R&D platform and training computing power for protein interaction models.

SenseTime worked with Baiying Technology to train an antibody affinity prediction model. Through high-performance computing optimization, the inference time for large-model protein structure prediction was cut from hours to minutes, bringing prediction performance up to the standard of industrial applications and improving antibody screening efficiency by 60%.

Based on its long-term practice in smart city and smart business, SenseTime has accumulated a large amount of high-quality real-world visual data, which in turn promotes SenseTime's continuous breakthroughs in visual technology and provides a strong foundation for the research and development of large models.

05.

5,000 petaFLOPS of computing power, 27,000 GPUs,

Supporting simultaneous training of 20 ultra-large models at the 100-billion-parameter scale

Dr. Xu Li, Chairman and CEO of SenseTime, said that in the era of AI large models, the computation required scales with the product of a model's parameter count and the amount of data it processes.

▲Dr. Xu Li, Chairman and CEO of SenseTime

In the past five years, the parameter counts of AI large models have grown by roughly an order of magnitude each year, and over the past decade the computing power demanded by the best AI algorithms has increased more than a million-fold. A person hears on the order of 1 billion words in a lifetime, whereas GPT-3 learned from roughly 500 billion tokens of natural language data, and the largest known natural language models have reached the 2-trillion scale.

Model parameter counts will continue to grow exponentially, and as multimodality is introduced, data volumes will also expand dramatically, inevitably driving a sharp rise in demand for computing power.
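Xu Li's "parameters × data" framing matches a widely used rule of thumb (not a SenseTime-specific formula): training compute scales roughly as

$$C \approx 6\,N\,D,$$

where $N$ is the parameter count, $D$ the number of training tokens, and the factor of 6 covers the forward and backward passes. Plugging in GPT-3's commonly cited figures ($N \approx 1.75\times10^{11}$ parameters, roughly $3\times10^{11}$ tokens actually trained on) gives $C \approx 3\times10^{23}$ FLOPs, which illustrates why infrastructure on the scale described below is needed.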

SenseTime relies on its SenseCore AI large device for a forward-looking computing power layout, together with the rich, high-quality visual data and technology accumulated through industrial practice, giving its large-model R&D a powerful computing foundation.

1. 5,000 petaFLOPS of ultra-large-scale computing power, one of the largest intelligent computing platforms in Asia: the SenseCore SenseTime AI large device currently contains 27,000 GPUs and can output 5,000 petaFLOPS of computing power (a rough per-card arithmetic check follows this list).

2. Simultaneous training of 20 ultra-large models at the 100-billion-parameter scale, plus one-stop infrastructure services: the device's current computing power can support training 20 such models at the same time, and it provides a one-stop large-model infrastructure service system covering data, training tools, inference deployment, and performance optimization.

3. Single-task parallel training on up to 4,000 cards, sustained without interruption for more than 7 days: the device supports not only SenseTime's own large-model training projects but also models customized by other enterprises. SenseTime aims to reach world-leading training metrics on 4,000-card clusters, laying a foundation for trillion-parameter models.
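As a rough sanity check of the headline figures in item 1 (my arithmetic, assuming the 5,000 petaFLOPS is the aggregate peak across all cards):

$$\frac{5{,}000\ \text{PFLOPS}}{27{,}000\ \text{GPUs}} \approx 185\ \text{TFLOPS per GPU on average},$$

which is in the range of a modern data-center accelerator's peak throughput at reduced precision.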

Reportedly, in the era of AI large models, computing power is not measured by a single headline number. Two core indicators matter: the effective utilization rate under multi-card parallelism, i.e., the actual computing power available to large-model training; and how long the system can keep running stably.
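One common way to quantify the first indicator (my gloss, not the article's) is model FLOPs utilization, the share of the cluster's theoretical peak that a training run actually uses:

$$\text{MFU} = \frac{\text{FLOPs actually spent on the model (e.g. } \approx 6ND \text{ for a training run)}}{\text{number of GPUs} \times \text{peak FLOPS per GPU} \times \text{wall-clock training time}}.$$

Communication overhead, stragglers, and restarts after hardware failures all push this ratio down, which is why sustained multi-card utilization and uninterrupted run time are treated as the metrics that matter.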

The AI large device integrates core capabilities in AI, supercomputing, and big data. Through AI-optimized high-performance computing, high-performance storage and caching, and high-performance networking, it supports trillion-parameter-scale training across thousands of cards and petabytes of storage, with features such as storage-compute separation, large-scale elasticity, and fault-tolerant scheduling.

SenseCore's platform products also provide modular, full-pipeline data, training, and inference capabilities: management and retrieval of tens of billions of data items, manual annotation services, and faster AI large-model development. One-click quantization, one-click deployment, and one-click application give teams tools to quickly validate large models online and accelerate innovation.

06.

Conclusion: With the "large model + large computing power" strategic layout,

aiming squarely at infrastructure services for the AGI era

In the past decade, the AI technology revolution set off by deep learning has pushed performance past the "industrial red line" (the accuracy threshold for industrial use) in many fields, but across a wide range of complex scenarios, the approach of developing customized AI models still suffers from high R&D costs and long cycles.

Today, multimodal large models that fuse language, vision, and other information are giving rise to new research paradigms, continuously unlocking new capabilities in base models through reinforcement learning from human feedback, so that vast numbers of open-ended tasks can be solved more efficiently.

A new AI technological revolution has arrived, and its impact is destined to be far-reaching. By showcasing the "SenseNova" large model system and its rare-in-the-industry AI large device, SenseTime has delivered an interim answer on its path toward AGI.

Based on the "SenseNova" large model system, SenseTime has developed a Chinese language large model application platform with an innovative human-machine collaboration model, as well as a series of content production and generation tool platforms covering AI content creation, 2D/3D digital human video generation, and large-scene/small-object 3D generation.

These platforms will bring productivity gains to areas such as medical consultation, short video, live streaming, commercials, merchandising, digital twins, film and television creation, and cultural tourism. The APIs and MaaS offerings SenseTime provides will further help large-model AI spread at scale across all walks of life.
