In the era of large models, how can supercomputing applications be better implemented?

Author: Titanium Media APP
To most people, supercomputing is both unfamiliar and familiar. News coverage regularly reports which country has taken first place in the rankings or launched a new supercomputing project. Yet supercomputing still feels remote from everyday life, because unlike traditional data centers, almost all supercomputing projects were originally built for national high-tech scientific research.

Over the past two years, however, as supercomputing has continued to converge with artificial intelligence, big data and other emerging technologies, its application boundaries have expanded and its scenarios have multiplied, making it an important driver of technological innovation and industrial transformation. Examples include AI for Science and high-performance data analysis (HPDA), and adoption has accelerated in fields such as identifying high-risk individuals, autonomous driving, new drug research and development, and severe weather prediction.

What is the difference between supercomputing, intelligent computing, and traditional data centers?

According to the data, the scale of computing power in mainland China has grown at an average annual rate of nearly 30% over the past five years, and using computing power has become an essential skill for scientific research and enterprise innovation. This computing power is concentrated mainly in supercomputing centers, intelligent computing centers, and traditional data centers. However, these "siblings" differ greatly in application scenarios and technical architecture.
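As a rough check on what that growth rate implies, the short Python sketch below compounds an annual rate over five years. The function name is hypothetical, and the assumption of a constant 30% rate compounding once a year is illustrative only; the figure quoted above is "nearly 30%", so the result is an upper-end estimate.

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """Return the total growth multiple after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    # Assumption: a constant 30% annual rate, the upper end of "nearly 30%".
    multiple = growth_multiple(0.30, 5)
    print(f"Five years at 30% per year: about {multiple:.2f}x the starting scale")
```

Under these assumptions, five years of growth would leave the total scale of computing power at roughly 3.7 times its starting level.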

The biggest difference among supercomputing, intelligent computing, and traditional data centers lies in their application scenarios. Supercomputing is mainly used in large-scale scientific computing, engineering simulation, weather forecasting, bioinformatics and similar fields, which involve massive data and highly complex calculations and place extremely high demands on computing performance.

Intelligent computing, on the other hand, is mainly used in artificial intelligence, machine learning, image processing, speech recognition, and similar fields, which require rapid iteration and optimization of models and place high demands on computing efficiency. Miao Hui, product manager for HPC and AI computing power at Qingyun Technology, said that the two also differ conceptually. Supercomputing, or high-performance computing, usually consists of a large number of compute nodes connected by a high-speed interconnection network, and it can run a large number of parallel computing tasks at the same time. Intelligent computing, that is, artificial intelligence computing, is capable of independent learning, reasoning and decision-making, can simulate and solve complex problems, and has a certain level of intelligence. "The two differ in terms of computing power, processing and applications," Miao Hui emphasized.

Compared with supercomputing and intelligent computing, traditional data centers have a wider range of applications, including cloud computing, big data analysis, and enterprise-level applications. Data centers need to meet the needs of a variety of applications, while also providing flexible IT services and reliable data storage services.

From the perspective of the supercomputing industry, supercomputing has provided flexible, fast, efficient, safe and reliable computing power support for many industries such as automobile manufacturing, meteorology and oceanography, gene sequencing, new drug research and development, chip manufacturing, and oil exploration.

On the other hand, from the perspective of technical architecture, there are also big differences between supercomputing, intelligent computing, and traditional data centers. The technical characteristics of supercomputing are mainly reflected in high performance, high throughput and low latency. In order to meet the needs of large-scale scientific computing and engineering simulation, supercomputing needs to have powerful computing and storage capabilities, as well as efficient network communication capabilities.

The technical characteristics of intelligent computing are mainly reflected in adaptivity, intelligence, and distributed computing. An intelligent computing system can automatically adjust computing resources according to application requirements, optimize the computing process, and improve computing efficiency.

Traditional data centers typically use distributed architectures, with computing and storage resources spread across servers. This architecture can provide flexible IT services and reliable data storage services, but its computing power and storage capacity are limited.

In addition, cabinet footprint is another major difference between supercomputing, intelligent computing, and data centers. For example, Sunway TaihuLight occupies 605 square meters, about the size of 10 badminton courts, and has to be housed in a dedicated building. Tianhe-2 is even larger, covering an area of 720 square meters. This is significantly different from the cabinet footprint of intelligent computing centers and general data centers.

China's supercomputing is at an important stage of "rapid development" and "catching up with the world's best"

In 2022, the emergence of large models drove the development of intelligent computing centers and made more people aware of them. Supercomputing centers, by contrast, have been "quietly developing".

In recent years, China's supercomputing has entered a stage of rapid development, and domestic supercomputing platforms, chiefly the national supercomputing centers, are stepping up their pursuit of sustainable development.

From a policy perspective, China's supercomputing policy has shifted in three steps: from promoting the construction of supercomputing centers, to strengthening the overall planning and intelligent scheduling of computing power, and then to a tiered layout of computing infrastructure.

Back in 2016, China's "Outline of the National Innovation-Driven Development Strategy" proposed building digital infrastructure such as supercomputing centers. The national "14th Five-Year Plan" went further, calling for faster construction of a national integrated big data center system, stronger overall planning and intelligent scheduling of computing power, and the construction of E-level (exascale) and 10E-level supercomputing centers...

In February 2023, the "Overall Layout Plan for the Construction of Digital China" issued by the Central Committee of the Communist Party of China and the State Council proposed systematically optimizing the layout of computing infrastructure, promoting efficient complementarity and coordination of computing power between the east and the west, and guiding a reasonable tiered layout of general data centers, supercomputing centers, intelligent computing centers, and edge data centers.

In terms of layout, by the end of 2023 China had established 14 supercomputing centers, located in Tianjin, Guangzhou, Changsha, Shenzhen, Jinan, Wuxi, Zhengzhou, Kunshan, Chengdu, Xi'an, Wuzhen and other places. Guided by increasingly favorable policies, China's supercomputing has entered the fast lane of development.

According to the latest TOP500 list, released at the SC23 supercomputing conference, a top international event in the field, the Frontier supercomputer deployed at Oak Ridge National Laboratory in the United States ranked first for the fourth consecutive time and is still the only E-level (exascale) system on the list. Another US system, Aurora, made the list for the first time and ranked second. In third place is the Eagle system installed in the Microsoft cloud in the United States, the highest ranking ever achieved by a cloud system on the TOP500 list. Japan's Fugaku slipped to fourth place from second place last year, and LUMI, Europe's largest supercomputing system, ranked fifth.

It is understood that China has not submitted test results for new systems to the TOP500 for a long time and so has not taken part in the ranking with them. On this list, Sunway TaihuLight and Tianhe-2 ranked 11th and 14th respectively.

To sum up, China's supercomputing is at an important stage of rapid development and of catching up with the world's first-class level. Progress at this stage depends on the joint efforts of industry, academia, research and application.

On the one hand, these forces use science and technology to advance supercomputing; on the other hand, the steady emergence of new application scenarios gives supercomputing more stages on which to "show its strengths".

At this stage, a key step is to move supercomputing centers from "fighting separately" to "interconnection" and to achieve "computing power interconnection". This step is also the focus of the next phase of work for many vendors and local authorities.

At the local level, the National Supercomputing Internet Consortium was established in May 2023, and a first batch of 15 regional and university supercomputing centers plan to join the network. Yang Guangwen, director of the National Supercomputing Center in Wuxi, has said that building a supercomputing Internet is imperative in order to address the operational challenges facing China's supercomputing centers, raise the level of supercomputing applications in China, and move supercomputing centers from providing bare metal to providing application services across many fields.

It is understood that as early as 2020, the National Supercomputing Center in Wuxi undertook the national high-performance computing special project "Research and Construction of National High-performance Computing Environment Application Platform and Service System". Together with the National Supercomputing Center in Guangzhou, the Computer Network Information Center of the Chinese Academy of Sciences and other national supercomputing centers, as well as core application units such as Tsinghua University and Zhijiang Laboratory, it has carried out research on the technology system and application model of the supercomputing Internet.

Among technology service providers, companies such as Huawei, Qingyun Technology, and Inspur are also actively investing in supercomputing interconnection. Qingyun Technology, for example, has drawn on its accumulated technology and deployment experience to officially release its AI intelligent computing platform and AI computing cloud service. Miao Hui told Titanium Media that for multiple heterogeneous computing centers of different scales, the AI intelligent computing platform can provide unified management, intelligent operation and maintenance, and efficient user self-service. It can match computing resources to applications and needs at any time and switch between them automatically, standardize the operation of services across a variety of computing scenarios, and, through an open application framework and model services, comprehensively improve the operational efficiency and platform capabilities of the computing center.

Beyond the active moves of major vendors, efforts are also under way at the national level to promote the development of the supercomputing industry. Recently, the National Supercomputing Internet launched an "experience officer" recruitment plan. It is understood that the plan will invite 1,500 application developers and users nationwide, from scientific research, manufacturing, artificial intelligence and other fields, to serve as the first batch of public beta users of the National Supercomputing Internet. They will help it optimize and iterate on platform functions and improve the end-to-end delivery experience of computing power products.

In addition to the industry side and the user side, academia and research are also a vital part of the development of supercomputing, which depends on them for a steady stream of talent and technical support.

Talent development is a top priority

As mentioned above, talent development is a top priority for any industry, and the same is true for supercomputing.

Take the Yancheng Supercomputing Center as an example. It is understood that the center will strengthen cooperation with Tsinghua University, Peking University and other universities to carry out real-time maintenance of network communication and storage reads and writes, improve the overall operating efficiency of the supercomputing Internet, and build a "supercomputing industrialization talent training base" serving the whole country.

According to relevant sources, through programs such as the Intelligent Base, industry-education integration education bases, and crowd intelligence, Kunpeng and Ascend have carried out a number of collaborations with 10 leading universities in Hubei, covering 30,000 students. More than 10 local universities, including Huazhong University of Science and Technology, have completed more than 50 project matches, covering thousands of scientific researchers.

According to the plan, over the next three years the talent program will be extended to more than 50 colleges and universities in Hubei and will train 100,000 university computing professionals, providing a steady stream of momentum for the development of Hubei's digital economy.

Similarly, the Jinan Supercomputing Center jointly built the School of Cyberspace Security at Qilu University of Technology, which the Ministry of Education has approved as a national doctoral degree authorization point in computer science and technology. The center supports the development of the university's computer science and technology discipline, which has become one of the 13 "peak disciplines" given priority among provincial universities in Shandong Province and has an international joint doctoral/master's training program.

In addition to the support of major universities, talent training also needs to be appropriately forward-looking. Most of these future professionals are still students, and it will be a few years before they graduate and join the industry, so looking ahead is particularly important in training them. In this regard, Liu Jun, a member of the ASC Organizing Committee, said that the committee takes both representativeness and forward-looking relevance into account when setting the competition questions.

It is worth noting that this emphasis on being forward-looking and representative is why large-model applications, a hot topic in 2023, were chosen as one of the questions in this year's ASC preliminaries. "This year's competition uses the LLaMA model, which has high industry awareness, a high degree of openness and broad application prospects, and centers on accelerating LLaMA inference for large language models," Liu Jun pointed out. "With strong backing such as access to global resources and code support, the competition tests how to obtain relevant resources more effectively and accelerate large-model inference. It also paves the way for the integration of large models and supercomputing in the future."

It is conceivable that in 2024, driven jointly by industry, academia, research and application, and by the continued spread of large models into various industries, China's supercomputing field will see a year of rapid development. At the same time, it is also necessary to strengthen international cooperation and exchanges and to take an active part in international competition, so as to keep raising the international influence of China's supercomputing.
