Telecommunications Day Special Article | ZTE: Accelerate the prosperity of the intelligent computing ecosystem

Author: Communications Industry News
The end-to-end open and decoupled intelligent computing solution is the key to the healthy development of the industry.

The emergence of large models has created demand for massive computing power, and end-to-end open, decoupled intelligent computing solutions are key to the healthy development of the industry. ZTE is committed to being a provider of such solutions, accelerating AI technology innovation, R&D, and commercialization, and working toward a win-win business ecosystem with industry partners. Communications Industry News recently interviewed Chen Xinyu, Vice President of ZTE, about how ZTE is meeting these challenges and supporting the development and application of large models.

Communications Industry News: What work has ZTE done on end-to-end open and decoupled intelligent computing solutions, and what results has it achieved?

Chen Xinyu: ZTE adheres to the principle of openness and decoupling. Drawing on its strengths in software, hardware, and engineering, it works with partners to build a multi-source supply chain and provides users with end-to-end open, decoupled intelligent computing solutions through innovation in hardware, software, and the capability platform.

Chen Xinyu, Vice President of ZTE, said that an open technology ecosystem can build a win-win business ecosystem, and that end-to-end open, decoupled intelligent computing solutions are key to the healthy development of the industry.

In terms of hardware, ZTE uses a flexible base that adapts to a variety of CPU platforms and GPU modules, so that CPUs and accelerator cards can be swapped without replacing the base. It supports three CPU platforms and is adapted to mainstream GPUs, giving users diversified computing power that they can choose flexibly according to cost, policy, supply, and power consumption.

In terms of software, hardware and software are decoupled through heterogeneous resource management, training and inference job scheduling, and heterogeneous collective communication. This layer shields the differences between chips from different vendors and adapts upward to mainstream AI frameworks, providing a high-performance, highly reliable, easy-to-migrate environment for running models. In-depth software-hardware co-optimization maximizes resource efficiency. In addition, we continue to research computing offload and in-network computing technologies to improve computing power utilization.

In terms of the platform, it adapts to mainstream frameworks such as PyTorch and TensorFlow and automatically compiles and optimizes models for the back-end platform. It provides an end-to-end engineering toolset covering data processing, model development, training, optimization, evaluation, and deployment, supporting assurance and management across the whole life cycle. Migration tools are also provided to support seamless application migration that is imperceptible to users, reducing their migration costs.

End-to-end open and decoupled intelligent computing platform. ZTE believes that an open technology ecosystem can build a win-win business ecosystem, and that end-to-end open, decoupled intelligent computing solutions are key to the healthy development of the industry. Through software-hardware decoupling, training-inference decoupling, and model decoupling, ZTE promotes the componentization and sharing of capabilities, accelerates AI technology innovation, R&D, and commercialization, and builds an open technology ecosystem. By combining the complementary strengths of chip vendors, hardware manufacturers, model developers, and application developers, the industry can grow bigger and stronger together and achieve the vigorous development of the intelligent computing ecosystem.

Communications Industry News: The emergence of large models has created demand for massive computing power, which challenges the infrastructure. What measures has ZTE taken in response?

Chen Xinyu: At present, cluster scale cannot support the training of super-large models with more than a trillion parameters, and breaking through the upper limit of large-scale cluster networking in China is imperative. From the hundred-billion-parameter GPT-3 to the trillion-parameter GPT-4, model parameters have increased about 10 times per year, and the total training compute required has increased by dozens of times. However, the performance of computing chips improves by only 2 to 4 times per generation, so a single cluster needs more GPU cards to meet the training needs of trillion-parameter models.
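The gap described above can be illustrated with simple arithmetic. The following is a minimal sketch, not ZTE's sizing method: it assumes training time and hardware utilization stay constant, and the 30x compute growth and 1,000-card starting cluster are illustrative placeholders for "dozens of times", not figures from the interview. The function names are hypothetical.

```python
import math


def gpu_scale_factor(compute_growth: float, chip_speedup: float) -> float:
    """Factor by which the GPU count must grow when compute demand
    outpaces per-chip performance (assumes equal training time and
    equal utilization before and after)."""
    if compute_growth <= 0 or chip_speedup <= 0:
        raise ValueError("growth factors must be positive")
    return compute_growth / chip_speedup


def gpus_needed(current_gpus: int, compute_growth: float, chip_speedup: float) -> int:
    """GPU cards needed in the next generation, rounded up."""
    return math.ceil(current_gpus * gpu_scale_factor(compute_growth, chip_speedup))


if __name__ == "__main__":
    # Illustrative assumption: compute demand grows 30x, chips improve 2x to 4x.
    for speedup in (2, 3, 4):
        print(speedup, gpus_needed(1000, 30, speedup))
```

Even at the optimistic end of the 2 to 4 times range, the card count in this sketch still has to grow several-fold, which is why cluster scale becomes the bottleneck.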

To build larger computing clusters, ZTE continues to research and optimize high-speed interconnection between GPU cards along two dimensions, within a machine and between machines, to meet the training needs of models with more than a trillion parameters. Within the machine, ZTE has proposed the open OLink interconnection protocol, which breaks the in-machine TP8 limit (tensor parallelism across at most 8 GPUs) and supports large-TP supernodes of 16 to 128 GPUs. Between machines, drawing on the continued evolution of large-capacity switch chips, ZTE provides a box-switch interconnection solution based on the standard RoCEv2 protocol to meet flexible networking requirements for ultra-large-scale clusters of 1,000 to 10,000 cards.
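To make the TP figures concrete, here is a minimal sketch of how the tensor-parallel (TP) group size determines how many groups a cluster splits into. It does not model OLink or RoCEv2. It only assumes that each TP group must sit inside one high-speed interconnect domain, and the function name and the 10,240-card cluster size are hypothetical.

```python
def tp_groups(total_gpus: int, tp_size: int) -> int:
    """Number of tensor-parallel groups when every group holds
    exactly tp_size GPUs. Each group is assumed to live inside a
    single high-speed interconnect domain (one server or supernode)."""
    if total_gpus <= 0 or tp_size <= 0:
        raise ValueError("total_gpus and tp_size must be positive")
    if total_gpus % tp_size != 0:
        raise ValueError("total_gpus must be a multiple of tp_size")
    return total_gpus // tp_size


if __name__ == "__main__":
    # Hypothetical 10,240-card cluster: TP8 (one 8-GPU server per group)
    # versus larger TP domains of 16 and 128 GPUs.
    for tp in (8, 16, 128):
        print(f"TP{tp}: {tp_groups(10240, tp)} groups")
```

With a larger TP domain, more of a model's tensor-parallel traffic stays on the in-machine interconnect, and fewer groups have to be coordinated over the inter-machine network.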

Communications Industry News: With intelligent computing infrastructure in place and large model training maturing, deploying industry applications has become the biggest challenge. What solutions does ZTE offer, and how can it promote a commercial closed loop for AI applications?

Chen Xinyu: Enterprises have gaps in their ability to apply AI technology, and the need to protect private data limits the effectiveness of model training. In addition, the individual needs of different industries and enterprises add to the complexity of deploying applications.

To solve these problems, ZTE has launched the AiCube training-and-inference all-in-one machine. In terms of software and hardware, it provides a high-computing-power hardware base in multiple configurations and an easy-to-use training and inference platform, with mainstream large models and AI applications built in. In terms of service, ZTE provides customization and training services.

To promote the deployment of industry applications, ZTE works closely with industry partners to launch a variety of integrated solutions. For example, for industrial quality inspection it provides a machine vision all-in-one machine, and for the medical industry it has launched an intelligent Q&A all-in-one machine for patient guidance. Users can build their own dedicated large models and use AI to improve productivity without specialized technical expertise, large-scale investment, a professional computer room, or a professional team. Integrating intelligence, computing, and application in a single deployment greatly lowers the threshold for AI adoption and accelerates commercialization in industry markets.

ZTE AiCube training-and-inference all-in-one machine. In commercializing AI applications, combining training and inference can accelerate the business closed loop. China has a large number and rich variety of application scenarios and private-domain data, which is one of its biggest advantages in global AI competition. In terms of application, ZTE uses self-developed or open-source foundation models, combined with rich industry data and knowledge engineering, to build domain models and industry models, achieving the breakthrough from "0" to "1". Based on the domain model and combined with different scenarios, applications are then expanded from "1" to "N".

In terms of the market, demand in the consumer (To C) market is relatively uniform, so cloud deployment is more appropriate. In the enterprise (To B) market, application scenarios are still being explored and incubated, and customers are concerned about the security of private-domain data, so they prefer private deployment. ZTE therefore advocates building a full-link service from the central cloud to the dedicated cloud. The central cloud supports large model pre-training and cloud inference based on general data, while the dedicated cloud provides local fine-tuning and inference based on private-domain data. Under this service model, operators build and maintain the infrastructure and enterprises lease it, which better meets customer needs and accelerates the deployment of large models in different scenarios.
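The division of labor between the two cloud tiers can be written down as a small placement rule. This is a minimal sketch under stated assumptions, not ZTE's scheduling logic. It covers only the four task and data combinations named above and rejects the rest, and all names (`Cloud`, `PLACEMENT`, `place`) are hypothetical.

```python
from enum import Enum


class Cloud(Enum):
    CENTRAL = "central cloud"
    DEDICATED = "dedicated cloud"


# (task, data scope) -> cloud tier, limited to the cases the text names.
PLACEMENT = {
    ("pretrain", "general"): Cloud.CENTRAL,
    ("inference", "general"): Cloud.CENTRAL,
    ("finetune", "private"): Cloud.DEDICATED,
    ("inference", "private"): Cloud.DEDICATED,
}


def place(task: str, data_scope: str) -> Cloud:
    """Return the cloud tier for a workload, or raise ValueError if the
    combination is not one of the four described in the text."""
    try:
        return PLACEMENT[(task, data_scope)]
    except KeyError:
        raise ValueError(f"no placement rule for {task!r} on {data_scope!r} data") from None


if __name__ == "__main__":
    print(place("pretrain", "general").value)
    print(place("finetune", "private").value)
```

Keeping the rule as an explicit lookup table makes it obvious that private-domain data never leaves the dedicated cloud in this sketch.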

Communications Industry News: What are ZTE's practices in large models and applications?

Chen Xinyu: In large models and applications, ZTE follows a "1+N+X" strategy. The "1" is the foundation large model, which ZTE pursues through both in-house development and ecosystem cooperation. On this basis, "N" domain large models are pre-trained by adding domain knowledge, including an R&D large model, an industrial large model, a communications large model, and a government affairs large model. From these, "X" applications are derived, building a new engine for the digital and intelligent transformation of industry.

In R&D, the code model helps the company's developers increase their coding efficiency by 30%. In manufacturing, at the company's Nanjing Binjiang base, the industrial model shortens order scheduling time by 88% and improves the efficiency of process file generation by 50%. In communications, at the World Internet Conference in Wuzhen, the key-event assurance assistant based on the communications model generated assurance plans for key events with one click, improving assurance efficiency by 80% compared with the traditional support process.

The "Zhiyu" anti-fraud system, the industry's first such application based on a large model, is incrementally trained on millions of fraudulent SMS samples. Using contextual semantic associations, it can accurately identify spam messages that have been mutated or obfuscated in various ways, raising interception accuracy to 99%. In water conservancy, the water conservancy model supports multi-turn dialogue, intent recognition, and knowledge Q&A to help build a water conservancy knowledge platform, and the accuracy of river knowledge Q&A reaches 90%. In urban lifeline safety engineering, ZTE is the first to use large models for visual intelligent identification of risks such as gas leaks, water accumulation, and road hazards, and to automatically generate emergency response plans to protect people's lives and property.

Written by: Cui Liangliang

Editor: Cui Liangliang

Guidance: Xin Wen
