Shanghai Artificial Intelligence Lab and SenseTime release OpenGVLab, an open source platform for general vision

2022-02-25 16:25:49

On February 25, Shanghai Artificial Intelligence Lab and SenseTime released OpenGVLab, an open source platform for general vision. The release opens up the platform's highly efficient pre-trained models, an ultra-large-scale public dataset, and the industry's first evaluation benchmark for general vision models to academia and industry. It is intended to help developers worldwide train models for a wide range of downstream visual tasks, promote the large-scale application of AI technology, and accelerate basic research and ecosystem building in artificial intelligence.

"Open source is work of extraordinary significance, and the rapid development of artificial intelligence technology is inseparable from more than a decade of open source collaboration and sharing among researchers and developers worldwide," said the relevant person in charge of the Shanghai Artificial Intelligence Laboratory. "I hope that through the release of the OpenGVLab open source platform, we can help the industry better explore and apply general vision methods, promote systematic solutions to bottlenecks in AI development such as data, efficiency, generalization, cognition and security, and contribute to scientific research innovation and industrial development in artificial intelligence."

Artificial intelligence technology is developing rapidly, but many AI models are still limited to a single task, such as identifying a single kind of object or recognizing photos of a fairly uniform style. Recognizing many types and styles requires sufficient generality and generalization ability. The general vision technology system "Shusheng" (INTERN) is designed to solve this problem, and the open source platform OpenGVLab is built on it. Relying on Shusheng's general vision technology, OpenGVLab aims to greatly lower the barrier to developing general vision models, help developers quickly build algorithm models for hundreds of visual tasks and visual scenes at lower cost, cover long-tail scenes efficiently, and promote the large-scale application of AI technology.

Open source first: tens of millions of finely labeled samples and a 100,000-label system

OpenGVLab fully inherits the technical advantages of the general vision technology system "Shusheng", and its open source pre-trained models deliver very high performance. Compared with CLIP, the model released by OpenAI in 2021 and previously regarded as the strongest open source model, OpenGVLab's model fully covers the four core visual tasks of classification, object detection, semantic segmentation, and depth estimation, with large improvements in both accuracy and data efficiency.

OpenGVLab open source model inference results: the left side is the input picture, and the right side is the identified label

Given the same downstream data, the open source model lowers the average error rate across 26 datasets spanning the four major tasks by 40.2% for classification, 47.3% for object detection, 34.8% for semantic segmentation, and 9.4% for depth estimation. In all four tasks, it also surpasses other existing open source models while using only 10% of the downstream training data. With this model, researchers can significantly cut the cost of collecting downstream data and train multi-scenario, multi-task AI models quickly with very little data.
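The article does not say how these error-rate reductions are calculated. One common reading is a relative reduction, that is, the drop in error rate expressed as a share of the baseline error rate. The sketch below illustrates that reading only; the function name and the example numbers are hypothetical and are not taken from OpenGVLab.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the drop in error rate as a percentage of the baseline error rate.

    Assumption: "reduces the error rate by X%" is read as a relative reduction.
    The article does not confirm which definition it uses.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0


if __name__ == "__main__":
    # Hypothetical numbers for illustration only: a baseline with a 20.0%
    # error rate and a new model with a 12.0% error rate.
    reduction = relative_error_reduction(20.0, 12.0)
    print(f"{reduction:.1f}% relative reduction")
```

Under this reading, a model that goes from 20% to 12% error has cut the error rate by 40%, even though the absolute difference is only 8 percentage points.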

OpenGVLab also provides a variety of pre-trained models with different parameter counts and computational costs to meet the needs of different application scenarios. Judged by ImageNet fine-tuning results, inference resource usage, and speed, many of the models in the model library improve on previously published models to varying degrees.

In addition to pre-trained models, the Shanghai Artificial Intelligence Laboratory has built an ultra-large-scale finely labeled dataset on top of a total data volume of 10 billion. The dataset not only integrates existing open source datasets but also covers tasks such as image classification, object detection, and image segmentation through large-scale image labeling, with a total of nearly 70 million samples. The open source release covers finely labeled data at the tens-of-millions scale and a label system of 100,000 labels. The image classification dataset has been open sourced first, and more datasets, such as the one for object detection, will follow.

The accompanying label system contains on the order of 100,000 labels. It covers almost all existing open source datasets and adds a large number of fine-grained labels on top of them, describing attributes and states in many types of images. This greatly enriches the application scenarios for image tasks and significantly reduces the cost of acquiring downstream data. Researchers can also add labels with automated tools, continuously extending the label system, refining its granularity, and contributing to the growth of the open source ecosystem.

Industry first: a general vision benchmark to promote industrial applications

Alongside OpenGVLab, the Shanghai Artificial Intelligence Lab also opened up the industry's first evaluation benchmark for general vision models. Existing benchmarks in the industry are mainly designed for a single task and a single visual dimension, so they cannot reflect the overall performance of a general vision model and are hard to use for side-by-side comparison. With new designs at the task and data levels, the general vision benchmark is intended to provide authoritative results, support fair and accurate evaluation under a unified standard, and accelerate the industrial application of general vision models.

In terms of task design, the general vision benchmark provided by OpenGVLab introduces a multi-task evaluation system that assesses a model's overall performance across five types of tasks: classification, object detection, semantic segmentation, depth estimation, and action recognition. The benchmark also adds an evaluation setting that uses only 10% of the data in the test dataset, which effectively measures how well a general model learns from small samples under a real-world data distribution. After testing, the benchmark gives a total score based on the model's results, making it easy for users to compare different models side by side.
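The article does not describe how the 10% subset is drawn or how the total score is computed. The sketch below shows one plausible setup under stated assumptions: the subset is sampled per class so that it keeps roughly the original class distribution, and the total score is an unweighted mean of per-task scores. The function names, task keys, and numbers are hypothetical and do not come from the OpenGVLab benchmark.

```python
import random
from collections import defaultdict
from statistics import mean


def sample_fraction_per_class(labels, fraction=0.1, seed=0):
    """Return sorted indices keeping roughly `fraction` of each class.

    Assumption: sampling per class is one way to preserve the original
    label distribution; the benchmark's real procedure is not specified.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for indices in by_class.values():
        keep = max(1, round(len(indices) * fraction))  # at least one per class
        chosen.extend(rng.sample(indices, keep))
    return sorted(chosen)


def total_score(task_scores):
    """Unweighted mean of per-task scores (an assumed aggregation rule)."""
    return mean(task_scores.values())


if __name__ == "__main__":
    # Hypothetical label list: 50 cats, 30 dogs, 20 birds.
    labels = ["cat"] * 50 + ["dog"] * 30 + ["bird"] * 20
    subset = sample_fraction_per_class(labels, fraction=0.1)
    print(len(subset), "of", len(labels), "samples kept")

    # Hypothetical per-task scores on a 0-100 scale.
    scores = {
        "classification": 80.0,
        "object_detection": 60.0,
        "semantic_segmentation": 70.0,
        "depth_estimation": 50.0,
        "action_recognition": 40.0,
    }
    print("total score:", total_score(scores))
```

A single aggregate number like this makes models easy to rank, but an unweighted mean treats every task as equally important; a real benchmark may normalize or weight tasks differently.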

As artificial intelligence becomes more deeply integrated with industry, demand has shifted from single tasks toward complex multi-task collaboration, and an open source, open system is urgently needed to serve the massive, fragmented, long-tail application needs that result. In July 2021, Shanghai Artificial Intelligence Lab released the OpenXLab open source platform system, which includes the new-generation OpenMMLab and the decision intelligence platform OpenDILab. The joint release of the general vision open source platform OpenGVLab by Shanghai Artificial Intelligence Lab and SenseTime will help developers lower the barrier to building general vision models and lay a foundation for advancing general vision technology. It will also further round out the OpenXLab open source system and promote basic research and ecosystem building in artificial intelligence.

Shanghai Artificial Intelligence Laboratory is a new type of scientific research institution in the field of artificial intelligence in China. It carries out strategic, original and forward-looking scientific research and technological development, and seeks breakthroughs in the fundamental theories and key core technologies of artificial intelligence. It aims to create a large-scale comprehensive research base that is "breakthrough, leading and platform-type", to support leapfrog development of China's artificial intelligence industry, and to become a world-class artificial intelligence laboratory and a world-renowned source of original AI theories and technologies.
