Asia's largest! Shanghai's "Big Mac" AI computing center goes into use, supporting the training of large models with trillions of parameters

Zhidongxi (WeChat official account: zhidxcom)

Author | ZeR0

Editor | Desert Shadow

Zhidongxi, January 24: The SenseTime Artificial Intelligence Computing Center (AIDC) was officially put into use today in Shanghai's Lingang New Area.

SenseTime began planning a prototype development project for artificial intelligence (AI) computing in April 2018, and the AIDC project was officially launched in March 2020. From the start of construction to the topping-out of the main structure, AIDC took only 168 days, setting a new construction record for Lingang.

It is one of the largest supercomputing centers in Asia and the first ultra-large AI computing center in East China, characterized by openness, large scale, low carbon emissions and energy efficiency.

The computing center has a floor area of 130,000 square meters and a total project investment of about 5.6 billion yuan. The first phase has the equivalent of 5,000 8,000-watt cabinets and a peak computing power of 3,740 PetaFLOPS at full load (1 PetaFLOPS is one quadrillion floating-point operations per second). A second phase is being planned and is expected to be roughly one to two times the size of the first.

As of June 30, 2021, SenseTime had built 23 AI supercomputing clusters in key regional markets, with more than 20,000 GPUs and a total computing power of 1.17 × 10^18 floating-point operations per second (1.17 exaFLOPS). With AIDC in use, SenseTime's total will exceed 4.91 × 10^18 floating-point operations per second (4.91 exaFLOPS).
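The capacity figures above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python, assuming the existing clusters total about 1.17 × 10^18 floating-point operations per second; that reading of the pre-AIDC figure and all variable names are assumptions made for illustration.

```python
# Cross-check of the capacity figures quoted in the article.
PFLOPS = 10**15  # 1 PetaFLOPS = one quadrillion floating-point operations per second

existing_clusters = 1170 * PFLOPS  # assumed reading of the pre-AIDC total (1.17e18 FLOPS)
aidc_phase_one = 3740 * PFLOPS     # AIDC first phase at full load, as quoted

combined = existing_clusters + aidc_phase_one
combined_exaflops = combined / 10**18

# 5,000 cabinets at an equivalent of 8,000 watts each, converted to megawatts
cabinet_power_mw = 5000 * 8000 / 1_000_000

print(f"Combined computing power: {combined_exaflops:.2f} exaFLOPS")
print(f"Cabinet power of phase one: {cabinet_power_mw:.0f} MW")
```

Under these assumptions the sum agrees with the post-AIDC total quoted above when it is read as 4.91 × 10^18 operations per second, and the first phase corresponds to about 40 MW of cabinet power.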

Recently, Yang Fan, co-founder and vice president of SenseTime, gave an interview to Zhidongxi and other media. He revealed that AIDC aims to become one of SenseTime's supporting businesses by 2025.

AIDC can reportedly already complete the full training of a model with 1,000 billion parameters. In the future, SenseTime's internal R&D system will run on AIDC, and by 2024, when all servers are in place, domestically produced hardware is expected to make up more than 50% of AIDC.

First, supporting the AI large device internally and providing three types of services externally

What can AIDC do?

Internally, AIDC is the computing power base of SenseCore, SenseTime's general-purpose AI infrastructure (the "AI large device"). All the software platforms and services that make up the AI large device run on the physical hardware of AIDC.

Externally, AIDC can provide computing power on its own. Through AIDC, SenseTime will open the technical capabilities of the AI large device to partners in industry and academia, so that more customers can obtain AI-as-a-Service on SenseTime's cloud platform and flexibly subscribe to various pre-trained AI models, lowering the barrier to entry for large-scale AI applications across industries.

In terms of computing power, AIDC can be called a "Big Mac".

With a peak computing power of 3,740 PetaFLOPS, it can process the equivalent of 23,600 years of video in one day, roughly the length of an uninterrupted recording from the late Paleolithic period to today.
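To put the video figure in perspective, it can be converted into a number of simultaneous real-time video streams. This is a rough sketch: the 365.25-day year and the variable names are assumptions, and the per-stream number is only what the two quoted figures imply together, not a published specification.

```python
# Convert "23,600 years of video per day" into equivalent real-time streams.
DAYS_PER_YEAR = 365.25        # assumed average year length
PEAK_FLOPS = 3740 * 10**15    # 3,740 PetaFLOPS, as quoted

video_years_per_day = 23_600

# One real-time stream produces one day of video per day, so the number of
# video-days processed per day equals the number of equivalent streams.
realtime_streams = video_years_per_day * DAYS_PER_YEAR

# Compute budget per stream implied by the two quoted figures
flops_per_stream = PEAK_FLOPS / realtime_streams

print(f"Equivalent real-time streams: {realtime_streams:,.0f}")
print(f"Implied budget per stream: {flops_per_stream / 1e9:.0f} GFLOPS")
```

In other words, the claim is equivalent to keeping up with roughly 8.6 million video streams at once, with a few hundred GFLOPS available for each.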

At present, AIDC can complete the full training of a large vision model with 1,000 billion parameters in one day. More than 20,000 commercial models can be derived from such a super-large model, helping industries quickly validate new scenarios at very low cost for downstream data collection.

In addition, thanks to its ultra-large-scale, elastically scalable computing power, AIDC can meet the large-scale computing demands of AI model training when it operates as an external service.

As the underlying support, AIDC will mainly serve three types of business once in operation: first, providing computing power for AI+Science basic research such as protein folding in medicine and quantum science; second, providing integration capabilities that help enterprises build a complete production toolchain; and third, providing end-to-end intelligent services.

Yang Fan said that with AIDC's support, the production cost of an algorithm may fall to one tenth of what it was, or even lower.

By directly connecting to the new Internet exchange center, AIDC can not only provide customers with nearby access services and solve problems such as cross-network access, but also improve the efficiency of information interaction between enterprises, reduce transmission costs, improve transmission quality and stability, and achieve rapid interconnection between multi-point and multi-user networks.

It is reported that before the official completion, SenseTime had already let potential customers and partners carry out trial runs on AIDC. After the Spring Festival, AIDC will enter formal operation.

For example, SenseTime's urban network management work in Shanghai supplies the city government's public services with on the order of hundreds of AI algorithm applications for urban management, covering overflowing garbage, missing manhole covers, damaged light boxes, haphazardly parked bicycles, illegal occupation of roads and more. The iterative production of these algorithms relies on infrastructure such as AIDC.

"Through AIDC and the software integration inside it, I am confident that in the next two to three years we can bring down both the cost of integrated domestic software and hardware and the cost to customers for the same scale of computing power. That is a goal I am really looking forward to," Yang Fan said.

Second, lowering computing power costs and accelerating the commercialization of domestic AI chips

At present, SenseTime is exploring how to build an AI ecosystem that spans domestic chips, domestic servers, self-developed training frameworks, algorithms and deployed industry applications.

In terms of capital expenditure (CAPEX), AIDC can reduce the unit cost of computing power on self-developed domestic chips; in terms of operating expenditure (OPEX), algorithm optimization brings shorter training times, higher efficiency and lower resource usage.

"Our plan is for domestic chips to account for no less than 50% of the core AI chips behind the 3,740 PetaFLOPS," Yang Fan said. He added that the Lingang AIDC is only one of several, and that SenseTime is pushing ahead with the construction of AIDCs in more regions. AIDC as a whole will focus on training, with some inference, though in a relatively low proportion.

He mentioned that over the past two years SenseTime has cooperated extensively with a number of domestic AI chip makers, hoping to speed up the adoption of domestic cloud AI chips and the corresponding servers at larger scale and in the market.

It is reported that the machines in AIDC's current trial operation already include a share of domestic hardware. Raising that share will be of great value in lowering overall costs across the AI industry chain, improving overall service levels and fostering healthier commercial competition on the hardware side.

Over the past two years, SenseTime has kept promoting the adaptation of domestic AI core software and hardware to each other. To that end, SenseTime led the establishment of the "Artificial Intelligence Computing Industry Ecological Alliance", known as the "ICPA Intelligent Computing Alliance", at the World Artificial Intelligence Conference in Shanghai in July 2021.

SenseTime will promote the building of an AI ecosystem based on AIDC and the application of original domestic technologies.

Yang Fan shared that since the ICPA Intelligent Computing Alliance was established, it organizes one or two in-depth closed-door seminars every quarter, bringing together chip design experts, software design experts and industry standards experts.

In the early stage, SenseTime hopes to form a sufficiently standard, universal definition of the software and hardware interface layer.

As the largest AI software platform company in Asia, SenseTime has software capabilities at both the core platform layer and the operating system layer, as well as a large number of downstream applications. By adapting core software and systems together with domestic hardware and chip makers, it can help them save R&D expenses and time.

In the medium term, once SenseTime's Lingang AIDC is in operation, it will establish the "CESI-SenseTime Joint Laboratory for Artificial Intelligence Computing Power and Chip Evaluation" with the China Electronics Standardization Institute (the Fourth Electronics Research Institute of the Ministry of Industry and Information Technology). The laboratory will formulate AI computing power and chip standards, develop AI chip evaluation tools, and provide support such as AI computing centers, chip testing and verification services, and talent training.

In the future, the laboratory will become a neutral third-party evaluation body for AI chips and AI servers, providing reference standards for the industry and encouraging hardware makers to improve their products.

In the long run, because SenseTime itself has a large number of downstream industry applications, it will spare no effort to integrate the better domestic AI chips and servers into its own and its partners' solutions and bring them to market quickly.

Third, six technical highlights that show the hard power behind AIDC's construction

SenseTime's AIDC supports R&D through its large-scale data processing and high-performance computing capabilities.

Yang Fan stressed that AIDC's computing power is not simply piled up; it involves many leading technologies on the communication and storage sides. AIDC has achieved multiple breakthroughs in high-performance computing, distributed scheduling, data I/O, hardware and software collaboration, and system security.

(1) High-performance computing: SenseTime has developed a high-performance computing engine that includes a wealth of highly optimized computing programs, compilers and runtime environments. Compared with the computing engines provided by chip suppliers, SenseTime's engine significantly improves end-to-end efficiency through optimized operators and whole-graph optimization techniques, covering not only neural network computation but also the pre-processing and post-processing stages.

(2) Efficient distributed scheduling: AIDC has a distributed task scheduling system, which can dynamically dispatch tens of thousands of computing tasks on thousands of GPUs. The system schedules more than 20 million tasks per year, ensuring that R&D activities are carried out in a timely and efficient manner. With the support of multiple scheduling strategies, the scheduling system can maintain a high utilization of computing power and greatly reduce the average cost required to train a model.
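Twenty million tasks a year works out to roughly 0.6 task submissions per second on average. The article does not describe the scheduling strategies themselves, so the following is only a toy sketch of one common policy: tasks are considered in priority order, and smaller tasks are backfilled onto GPUs that a larger waiting task cannot yet use. The `Task` fields and the `schedule` function are hypothetical names, not SenseTime's system.

```python
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    gpus: int       # GPUs the task needs at the same time
    duration: int   # run time in arbitrary ticks
    priority: int   # lower number = more urgent

def schedule(tasks, total_gpus):
    """Return {task name: start tick} under a priority-first, backfilling policy."""
    pending = sorted(tasks, key=lambda t: (t.priority, t.name))
    running = []  # min-heap of (finish tick, GPUs held)
    free, now, starts = total_gpus, 0, {}
    while pending:
        still_waiting = []
        for task in pending:
            if task.gpus > total_gpus:
                raise ValueError(f"{task.name} can never fit on this cluster")
            if task.gpus <= free:
                # Enough GPUs are idle: start the task now
                free -= task.gpus
                starts[task.name] = now
                heapq.heappush(running, (now + task.duration, task.gpus))
            else:
                still_waiting.append(task)
        pending = still_waiting
        if pending:
            # Nothing more fits: jump to the next completion and release its GPUs
            now, released = heapq.heappop(running)
            free += released
    return starts

demo = [
    Task("A", gpus=8, duration=4, priority=0),
    Task("B", gpus=4, duration=2, priority=1),
    Task("C", gpus=4, duration=3, priority=1),
    Task("D", gpus=2, duration=1, priority=2),
]
starts = schedule(demo, total_gpus=8)
print(starts)
```

A production scheduler would also have to handle preemption, fairness between teams and placement across nodes, none of which this sketch attempts.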

(3) High-speed data I/O: When a model is trained on a dataset, each data sample is loaded and processed many times, at high frequency and in random order. SenseTime's AIDC provides very high I/O throughput, allowing training tasks to load more than 2 million images per second so that they can run at full speed without waiting for data.

"In 2018, we did a pre-research project for a prototype that connected 1,000 GPU cards to the same network to load data and compute. Today we are scaling that up to 5,000 to 10,000 cards connected to the same network for computing," Yang Fan said.

(4) Hardware/software co-design: In a distributed environment, the complex work of coordinating GPUs across computing nodes to communicate with one another and frequently fetch data from the distributed storage system can easily cause significant losses in runtime performance. To address this, SenseTime takes a hardware/software co-design approach: it configures hardware according to its understanding of AI tasks, and designs the software stack and optimizes it across layers. With this design, SenseTime's AIDC can produce tens of thousands of models per year.
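The data-loading pattern described in item (3), which drives the storage traffic mentioned in item (4), can be sketched as follows. Each epoch visits every sample once in a fresh random order, and the storage system must sustain the resulting read rate. The average image size used here (100 KB) is purely an assumption for illustration, as are the function names.

```python
import random

def epoch_order(num_samples, epoch, seed=0):
    """Return a random permutation of sample indices for one training epoch."""
    order = list(range(num_samples))
    # Derive the generator seed from both values so that each epoch is
    # shuffled independently but reproducibly
    random.Random(seed * 1_000_003 + epoch).shuffle(order)
    return order

def required_bandwidth_gb_s(images_per_second, avg_image_kb):
    """Read bandwidth in GB/s, using 1 GB = 1,000,000 KB."""
    return images_per_second * avg_image_kb / 1_000_000

first = epoch_order(10, epoch=0)
second = epoch_order(10, epoch=1)
print(first)
print(second)

# 2 million images per second at an assumed 100 KB per image
bandwidth = required_bandwidth_gb_s(2_000_000, 100)
print(bandwidth, "GB/s")
```

Because the access order is random, the reads cannot be served as long sequential scans, which is why aggregate throughput of this size is a storage and network design problem rather than a matter of adding disks.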

(5) High standard of system security: SenseTime ensures system security at multiple levels when designing its architecture. For example, SenseTime has developed comprehensive guidelines to classify data by different levels of security and grant appropriate access rights; SenseTime's storage systems include advanced access control systems; sensitive data is stored and transmitted in encrypted form; and computing resources assigned to different authorization groups are reasonably isolated. SenseTime's security team monitors the operation of the AIDC in real time and takes action when potential risks arise.

(6) Green, low-carbon data center construction: AIDC adopts a range of cutting-edge energy optimization measures. Once AIDC is running, its power consumption is expected to be about 10% lower than the average for data centers in China, saving about 45 million kWh of electricity per year. AIDC expects its carbon emissions to peak around 2025, at an estimated maximum of no more than 350,000 metric tons of carbon dioxide equivalent, and to reach net zero around 2050.
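These energy figures imply a rough scale for the center's annual consumption. The sketch below assumes that the 45 million kWh saving corresponds exactly to the 10% reduction against an average data center of the same size. That linkage is an assumption, since the article gives the two numbers without stating how they relate.

```python
# Back-of-envelope estimate from the quoted energy figures.
savings_kwh = 45_000_000   # annual saving quoted in the article
reduction_percent = 10     # "about 10% lower" than the industry average
HOURS_PER_YEAR = 8760      # 365 days * 24 hours

# If the saving is 10% of a baseline, the baseline is saving * 100 / 10
baseline_kwh = savings_kwh * 100 / reduction_percent
aidc_kwh = baseline_kwh - savings_kwh

# Average electrical draw implied by the annual consumption, in megawatts
average_power_mw = aidc_kwh / HOURS_PER_YEAR / 1000

print(f"Implied baseline: {baseline_kwh / 1e6:.0f} million kWh per year")
print(f"Implied AIDC consumption: {aidc_kwh / 1e6:.0f} million kWh per year")
print(f"Implied average draw: {average_power_mw:.1f} MW")
```

Under this assumption the implied average draw is in the mid-40s of megawatts, the same order of magnitude as the 40 MW that 5,000 cabinets at 8,000 watts each would draw.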

Fourth, building an intelligent computing center should start with assessing the needs of regional industrial upgrading

Can AI computing centers really bring value to industrial applications? How can the resources of an AI computing center be used efficiently?

On these questions, Yang Fan said that SenseTime is very confident about AIDC's future application scenarios. SenseTime not only uses the center itself but has also carefully calculated and evaluated the intensity and scale of demand from local industrial upgrading.

In his view, the first consideration when building an intelligent computing center in a given place is to assess the local industrial base and its industrial upgrading needs over the next three years, and then work out whether today's AI technology and product suppliers can meet those needs. Only then is it clear how large the intelligent computing center should be.

Data governance is also a major challenge in the development of the AI industry. The means of production in the agricultural age was land, in the industrial age it was energy, and in the digital age it is data.

For energy, one liter of oil plus one liter of oil is two liters of oil. For land, one acre of land plus one acre of land is two acres of land.

Data is different: 1 TB of data plus 1 TB of data makes 2 TB of data, but its actual value is greater than the sum of the two parts. Putting more data together brings non-linear growth in value.

"This is an extremely important new characteristic that sets data apart from the means of production of the agricultural and industrial ages," Yang Fan said. In his view, the greatest value of data lies in its low cost, its replicability and the non-linear growth in value achieved after aggregation.

How can more data be connected while ensuring data security and privacy control, and with ownership clearly defined along the way? The industry needs to keep exploring to find clear answers.

Yang Fan said that SenseTime's construction of AIDC is also an exploration. Over the next one to two years, after AIDC's trial operation phase begins, SenseTime will focus on thinking, exploration and experimentation in this area, because he believes it is one of the most central issues for the future.

Conclusion: AIDCs will be established in more regional markets in the future

In Yang Fan's view, SenseTime's core strength lies not only in technological leadership but also in its ability to keep commercializing innovative technologies.

Previously, the path from the starting point of an innovation to final customer value was long and involved many steps. Shortening that cycle from three or four years to three or four months is SenseTime's long-term core competitiveness for the industry.

Technology companies alone cannot make innovation succeed. Traditional enterprises need to cooperate on iterative experiments, commit to the corresponding cooperation and investment, and even accept a certain amount of sunk cost. Many industries in China are now undergoing digital transformation and intelligent upgrading, so customers' willingness to spend time, open up and share in doing this together is also very important.

Beyond Shanghai, SenseTime also plans to build AIDCs in China's four top-tier cities and in core regional hubs, extending its AI-as-a-Service offerings to more regions.

Yang Fan believes that AIDC will keep iterating, evolving toward making technological innovation cheaper and more efficient, sharing SenseTime's accumulated capabilities with more partners and customers, and bringing greater value to the AI industry.
