
General vision open-source platform OpenGVLab released, greatly lowering the barrier to developing general vision models

On February 25, Shanghai AI Laboratory, together with SenseTime, the Chinese University of Hong Kong, and Shanghai Jiao Tong University, jointly released OpenGVLab, an open-source platform for general vision AI. The platform opens its highly efficient pre-trained models to academia and industry, along with a public dataset of tens of millions of finely annotated images and a label system of 100,000 labels, providing important support for developers worldwide in training models for various downstream vision tasks. OpenGVLab also introduces the industry's first evaluation benchmark for general vision models, enabling developers to compare different general vision models side by side and tune their performance continuously. The OpenGVLab platform (https://opengvlab.shlab.org.cn) is now officially online for researchers in all fields to access and use; an online inference feature will follow, allowing anyone interested in AI vision technology to try it freely.


"Open source is a work of extraordinary significance, the rapid development of artificial intelligence technology is inseparable from the global researchers for more than ten years of open source co-construction", the head of the Shanghai Artificial Intelligence Laboratory said, "I hope that through the release of the OpenGVLab open source platform, help the industry better explore and apply general visual AI technology, promote the systematic solution of many bottlenecks such as data, generalization, cognition and security in the development of AI, and contribute to the promotion of artificial intelligence academic and industrial development." ”

Although AI technology is advancing rapidly, many AI models can complete only a single task, such as identifying one kind of object or recognizing photos of a fairly uniform style. Recognizing many object types and styles requires sufficient versatility and generalization capability. Last November, Shanghai AI Laboratory, together with SenseTime, the Chinese University of Hong Kong, and Shanghai Jiao Tong University, released the general vision technology system "Shusheng" (INTERN), which addresses this problem well. As the figure below shows, it can accurately identify the content of images of very different types.


The OpenGVLab open-source platform is built on the general vision technology system "Shusheng" (INTERN). Relying on Shusheng's strength in general vision technology, OpenGVLab helps developers significantly lower the barrier to developing general vision models, quickly build algorithm models for hundreds of vision tasks and scenarios at lower cost, efficiently cover long-tail scenarios, and promote the large-scale application of AI technology.

Opening ultra-high-performance models and tens of millions of finely annotated samples to reduce costs for academia

OpenGVLab fully inherits the technical advantages of Shusheng, and its open-source pre-trained models deliver extremely high performance. Compared with CLIP (released by OpenAI in 2021), previously regarded as the strongest open-source model, OpenGVLab's models fully cover the four core vision tasks of classification, object detection, semantic segmentation, and depth estimation, with significant improvements in both accuracy and data efficiency.

On the same downstream data, the open-source models reduce the average error rate by 40.2%, 47.3%, 34.8%, and 9.4% across 26 datasets for classification, object detection, semantic segmentation, and depth estimation, respectively. Moreover, on these four tasks they surpass other existing open-source models while using only 10% of the downstream training data. With these models, researchers can significantly cut the cost of downstream data collection and, with very little data, quickly train AI models for multiple scenarios and tasks.
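To make the "average error-rate reduction" figures concrete, here is a minimal sketch of how such a metric is typically computed over a task's datasets. The numbers below are invented for illustration; they are not OpenGVLab's actual results.

```python
# Hypothetical illustration of an "average error-rate reduction" metric.
# All error rates here are made up; they are NOT OpenGVLab's reported numbers.

def avg_error_reduction(baseline_errors, model_errors):
    """Mean relative error-rate reduction across a task's datasets."""
    reductions = [
        (base - new) / base
        for base, new in zip(baseline_errors, model_errors)
    ]
    return sum(reductions) / len(reductions)

# Three hypothetical classification datasets:
baseline = [0.20, 0.10, 0.30]  # error rates of a prior open-source model
new_model = [0.12, 0.06, 0.18]  # error rates of the new pre-trained model
print(f"{avg_error_reduction(baseline, new_model):.1%}")  # → 40.0%
```

Averaging the relative (rather than absolute) reductions keeps easy and hard datasets on an equal footing when summarizing across many benchmarks.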

OpenGVLab also provides pre-trained models with a range of parameter counts and compute budgets to suit different application scenarios. Compared with previously published models, many models in the library show varying degrees of improvement in ImageNet fine-tuning accuracy, inference resource usage, speed, and other metrics.

Beyond pre-trained models, Shanghai AI Laboratory has built large finely annotated datasets drawn from a total pool of tens of billions of data items and will open-source them in the near future. The ultra-large-scale annotated dataset not only integrates existing open-source datasets but also covers tasks such as image classification, object detection, and image segmentation through large-scale image-annotation work, for a total of nearly 70 million samples. The open-source release covers tens of millions of finely annotated samples and a label system of 100,000 labels. The image-classification dataset has already been open-sourced, and datasets for object detection and other tasks will follow.

In addition, the open-sourced label system not only covers almost all existing open-source datasets but also extends them with a large number of fine-grained labels covering the attributes and states found in various kinds of images, greatly enriching the application scenarios of image tasks and significantly reducing the cost of downstream data collection. Researchers can also add more labels through automated tools, continuously extending the label system and refining its granularity, jointly fostering a thriving open-source ecosystem.
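An extensible label system like the one described above is often organized as a hierarchy, where fine-grained attribute labels hang off coarser categories. The sketch below is a generic illustration of that idea; the class, method names, and example taxonomy are all hypothetical and not part of OpenGVLab's actual API.

```python
# Minimal sketch of an extensible hierarchical label system.
# The API and the example taxonomy are invented for illustration only.
from collections import defaultdict

class LabelSystem:
    def __init__(self):
        self.parent = {}                  # label -> parent label (or None)
        self.children = defaultdict(set)  # label -> fine-grained sub-labels

    def add(self, label, parent=None):
        """Register a label, optionally as a fine-grained child of `parent`."""
        self.parent[label] = parent
        if parent is not None:
            self.children[parent].add(label)

    def ancestors(self, label):
        """Walk up the hierarchy, e.g. to propagate coarse category labels."""
        chain = []
        while self.parent.get(label) is not None:
            label = self.parent[label]
            chain.append(label)
        return chain

ls = LabelSystem()
ls.add("vehicle")                    # coarse category
ls.add("car", parent="vehicle")      # fine-grained category
ls.add("red car", parent="car")      # attribute/state label
print(ls.ancestors("red car"))       # → ['car', 'vehicle']
```

Keeping parent links explicit lets automated tools append new fine-grained labels without touching existing annotations, which is what makes continuous extension of the label system cheap.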

Releasing the first general vision evaluation benchmark to unify evaluation standards for general vision models

Alongside OpenGVLab, Shanghai AI Laboratory has also opened the industry's first evaluation benchmark for general vision models, filling a gap in this field. Existing industry benchmarks are designed mainly for a single task or a single visual dimension, so they cannot reflect a general vision model's overall performance and are difficult to use for side-by-side comparison. Through innovative design at the task and data levels, the new benchmark provides authoritative results, promotes fair and accurate evaluation against a unified standard, and accelerates the industrialization and application of general vision models.

In task design, the newly opened benchmark introduces a multi-task evaluation system that assesses a model's generality across five task directions: classification, object detection, semantic segmentation, depth estimation, and action recognition. It also adds an evaluation setting that uses only 10% of each test dataset, which effectively measures a general model's few-shot learning ability under realistic data distributions. After testing, the benchmark computes a total score from the per-task results, making it easy for users to compare models side by side.
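The article does not specify how the benchmark's total score is aggregated, but a simple weighted mean over the five task directions is a common scheme. The sketch below assumes equal weighting and invented per-task scores purely for illustration.

```python
# Hedged sketch of a multi-task total score. The actual aggregation rule
# used by the OpenGVLab benchmark is not stated in the article; equal
# weighting and all scores below are assumptions for illustration.

TASKS = ["classification", "detection", "segmentation",
         "depth_estimation", "action_recognition"]

def total_score(per_task_scores, weights=None):
    """Aggregate per-task scores into one number for side-by-side ranking."""
    if weights is None:
        weights = {t: 1.0 for t in per_task_scores}  # equal weighting assumed
    total_w = sum(weights[t] for t in per_task_scores)
    return sum(s * weights[t] for t, s in per_task_scores.items()) / total_w

# Hypothetical per-task scores for one model:
scores = dict(zip(TASKS, [82.0, 75.5, 70.0, 68.5, 77.0]))
print(round(total_score(scores), 2))  # → 74.6
```

A single aggregate like this is what enables the "horizontal" comparison the benchmark aims for: two models evaluated under the same tasks, data splits, and weights become directly rankable.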

As AI and industry integrate ever more deeply, industrial demand for AI has evolved from single tasks toward complex multi-task collaborative development, and there is an urgent need for an open-source, open system to meet massive application demand that is increasingly fragmented and long-tailed.

In July last year, Shanghai AI Laboratory released OpenXLab, an open-source platform system covering the new generation of OpenMMLab and the decision-intelligence platform OpenDILab. The joint release of OpenGVLab with SenseTime and the universities not only helps developers lower the barrier to general vision model development, laying a foundation for advancing general vision technology, but also further completes the OpenXLab open-source system and promotes fundamental AI research and ecosystem building.

Leifeng Network
