Cloud Leopard Intelligence and Enflame Technology jointly develop a large-scale, high-performance AI computing platform

Jiwei Network reports that Cloud Leopard Intelligence and Enflame Technology recently reached a strategic cooperation. Drawing on the two companies' respective hardware and software strengths in DPUs (Data Processing Units) and AI computing, they will jointly develop a large-scale, high-performance AI computing platform to provide more efficient solutions for AI computing in the cloud.

Computing power is one of the three pillars of the artificial intelligence industry, and AI chips are its carrier. As the industry's key hardware, AI chips are the core compute engines for training and inference in AI-accelerated servers, and they are widely used in artificial intelligence, cloud computing, data centers, edge computing, mobile terminals, and other fields. China's AI chip industry is still in its infancy, and much of the market remains to be explored and developed. According to iResearch statistics and forecasts, China's AI chip market was worth 19.7 billion yuan in 2020 and will reach 138.5 billion yuan by 2025, a CAGR of roughly 47% over 2021-2025 and rapid growth for the market as a whole.

For many enterprises, however, independently building an AI hardware cluster means a heavy one-time investment plus ongoing O&M costs, a long construction cycle, and low utilization. Cloud-based AI computing power, allocated on demand, improves efficiency and reduces cost. The cloudification of AI computing power is therefore one of the trends in AI's future development.

Cloudified AI computing power demands not only high-performance AI compute but also high-performance data processing, including data communication between AI chips and the retrieval of data from storage by AI chips. With the explosive growth of AI data, models are iterating faster and growing ever larger, and the requirements on data processing capability keep rising.

AI chip manufacturers, GPU manufacturers, and AI algorithm developers are all exploring hardware and software solutions. Mainstream GPU and AI chip vendors offer their own proprietary stacks, such as NVIDIA's NVLink and GPUDirect with the corresponding NCCL software, but the AI market and its users are looking forward to an open technology and correspondingly high-performance solutions.
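
For context, the sketch below shows what that proprietary stack looks like to a developer: a minimal multi-GPU all-reduce written against NCCL's public C API, the collective at the heart of data-parallel training. The device count and buffer size are illustrative. The collective call itself is generic; NVLink and GPUDirect determine the transport underneath it, and that transport layer is exactly where an open, DPU-based design would plug in differently.

/* A minimal sketch of a multi-GPU all-reduce on NCCL's public C API.
 * All sizes, and the use of every visible GPU, are illustrative.
 * Build (assuming CUDA + NCCL are installed): nvcc allreduce.c -lnccl */
#include <nccl.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define NCCL_CHECK(cmd) do {                                        \
    ncclResult_t r = (cmd);                                         \
    if (r != ncclSuccess) {                                         \
        fprintf(stderr, "NCCL error: %s\n", ncclGetErrorString(r)); \
        exit(1);                                                    \
    }                                                               \
} while (0)

int main(void) {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);            /* use every visible GPU */
    const size_t count = 1 << 20;         /* 1M floats per device */

    ncclComm_t   *comms   = (ncclComm_t *)malloc(ndev * sizeof(ncclComm_t));
    cudaStream_t *streams = (cudaStream_t *)malloc(ndev * sizeof(cudaStream_t));
    float **sendbuf = (float **)malloc(ndev * sizeof(float *));
    float **recvbuf = (float **)malloc(ndev * sizeof(float *));

    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaMalloc((void **)&sendbuf[i], count * sizeof(float));
        cudaMalloc((void **)&recvbuf[i], count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    /* One communicator per GPU in this process; NULL = devices 0..ndev-1. */
    NCCL_CHECK(ncclCommInitAll(comms, ndev, NULL));

    /* Sum gradients across all devices -- the core collective of
     * data-parallel training. The transport underneath (NVLink, PCIe,
     * an RDMA NIC...) is chosen by the library, not by this code. */
    NCCL_CHECK(ncclGroupStart());
    for (int i = 0; i < ndev; i++)
        NCCL_CHECK(ncclAllReduce(sendbuf[i], recvbuf[i], count,
                                 ncclFloat, ncclSum, comms[i], streams[i]));
    NCCL_CHECK(ncclGroupEnd());

    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    return 0;
}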

Against this backdrop, Cloud Leopard Intelligence and Enflame Technology have reached a strategic cooperation to jointly develop and deliver large-scale, high-performance AI computing platform solutions. The two parties have proposed a new technology, DataDirectPath, for high-performance distributed data communication: building on the flexibility and performance of the DPU, it enables direct, high-performance communication both among CloudBlazer T20 accelerators and between the accelerators and storage, improving AI training efficiency while reducing training cost. DataDirectPath comprises two parts: DataDirectPath RDMA, for data communication between AI accelerator chips, and DataDirectPath Storage, for high-speed data reads and writes between AI accelerator chips and storage.

Building on Cloud Leopard Intelligence's DPU and Enflame's CloudBlazer T20, the two companies have launched the DataDirectPath Storage solution first, providing more efficient storage access for AI training. In the traditional approach, when the T20 accesses storage, data is first moved into system memory and then from system memory to the target device. With DataDirectPath Storage, the T20 fetches data directly through the DPU, bypassing system memory and the CPU, so data access is faster, latency is lower, and overhead is smaller. DataDirectPath Storage bypasses the CPU not only on the data channel but also on the control channel, which makes the control path more efficient and greatly reduces CPU overhead.
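
Neither company has published a programming interface for DataDirectPath Storage, so as a point of comparison the sketch below uses NVIDIA's GPUDirect Storage (cuFile), the proprietary analogue named earlier, to show what such a direct path looks like in code: a file read that lands in accelerator memory with no bounce buffer in system RAM. The file name data.bin and the 1 MiB transfer size are placeholders.

/* A hedged sketch of a direct storage-to-accelerator read using NVIDIA's
 * GPUDirect Storage (cuFile) API. DataDirectPath Storage's own API is not
 * public; this only illustrates the shape of such a direct path.
 * Build (assuming CUDA + GDS are installed): nvcc gds_read.c -lcufile */
#define _GNU_SOURCE                     /* for O_DIRECT */
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const size_t size = 1 << 20;        /* 1 MiB, illustrative */

    cuFileDriverOpen();                 /* bring up the direct-I/O driver */

    /* O_DIRECT keeps the page cache out of the data path. */
    int fd = open("data.bin", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    CUfileDescr_t descr;
    memset(&descr, 0, sizeof(descr));
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);

    void *devbuf = NULL;
    cudaMalloc(&devbuf, size);
    cuFileBufRegister(devbuf, size, 0); /* pin the device buffer for DMA */

    /* The DMA engine moves file data straight into device memory: no CPU
     * copy and no staging buffer in system RAM -- the same bypass that
     * DataDirectPath Storage performs through the DPU. */
    ssize_t n = cuFileRead(fh, devbuf, size, 0 /*file off*/, 0 /*dev off*/);
    printf("read %zd bytes directly into device memory\n", n);

    cuFileBufDeregister(devbuf);
    cuFileHandleDeregister(fh);
    cudaFree(devbuf);
    close(fd);
    cuFileDriverClose();
    return 0;
}

The instructive part is what is absent: there is no host staging buffer and no memcpy. Setup runs once on the control path and the data path is pure DMA; the control-channel bypass described above aims to take the host CPU out of that setup step as well.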

(Proofreading/Andy)
