
Fujitsu uses the Fugaku supercomputer to train LLMs

Author: Semiconductor Industry Vertical

This article was compiled by Semiconductor Industry Vertical (ID: ICVIEWS).

Before Monaka, Fujitsu used Fugaku to train LLMs.


Although Fujitsu's Fugaku supercomputer is no longer the fastest machine on the TOP500 list, it remains a very powerful system, and the versatility of the A64FX processor allows it to be used for a variety of workloads, including AI. This week, Fujitsu released Fugaku-LLM, a large language model with advanced Japanese-language capabilities designed for both research and commercial applications.

Fujitsu's Fugaku-LLM was trained on 13,824 nodes of the Fugaku supercomputer, which is based on the A64FX processor, a chip that supports FP64, FP32, FP16, and INT8 arithmetic for a wide range of AI and traditional supercomputing applications. Training Fugaku-LLM relied on distributed parallel learning techniques optimized for the supercomputer's architecture and interconnect.
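As a rough illustration of the general idea only (the article does not detail Fujitsu's actual Fugaku training stack or its A64FX-optimized communication layer), a minimal distributed data-parallel training loop in PyTorch might look like the sketch below; the model and hyperparameters are placeholders:

```python
# Minimal sketch of distributed data-parallel training; gradients are
# averaged across ranks after each backward pass. Launch with torchrun,
# which sets RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # "gloo" is a CPU-only backend, chosen here because A64FX nodes have no GPUs.
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()

    model = torch.nn.Linear(1024, 1024)   # stand-in for a transformer block
    ddp_model = DDP(model)                # wraps the model for gradient all-reduce
    optim = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(8, 1024)          # placeholder batch
        loss = ddp_model(x).pow(2).mean()
        loss.backward()                   # triggers the cross-rank all-reduce
        optim.step()
        optim.zero_grad()
        if rank == 0:
            print(f"step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Real large-scale runs layer tensor and pipeline parallelism on top of this data parallelism, which is where supercomputer-specific interconnect optimizations matter most.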

With 13 billion parameters, Fugaku-LLM pales in comparison to the 175 billion of OpenAI's GPT-3, but it is the largest LLM ever trained in Japan. Fujitsu says a 13-billion-parameter model does not demand large computing resources for inference, making it a practical option for Japanese companies and researchers. About 60% of the training data is Japanese; the remaining 40% is English, mathematics, and code.
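A quick back-of-envelope calculation (our own arithmetic, not Fujitsu's figures) shows why a 13-billion-parameter model is far easier to serve: raw weight memory is roughly the parameter count times the bytes per parameter.

```python
# Rough weight-memory estimate for a dense LLM (ignores activations and KV cache).
def weight_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    """FP16/BF16 uses 2 bytes per parameter; INT8 would use 1."""
    return params * bytes_per_param / 1e9

print(f"13B model  @ FP16: {weight_memory_gb(13e9):.0f} GB")   # ~26 GB
print(f"175B model @ FP16: {weight_memory_gb(175e9):.0f} GB")  # ~350 GB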

This Japanese-centric training corpus sets it apart from other Japanese models, which are trained primarily on English datasets. As a result, Fugaku-LLM shows strong Japanese proficiency, achieving an average score of 5.5 on the Japanese MT-Bench, the highest among openly available models trained on original Japanese data. According to Fujitsu, it performs especially well in the humanities and social sciences, with a benchmark score of 9.18.

The Fugaku-LLM project is a collaboration among leading Japanese institutions: Tokyo Institute of Technology, Tohoku University, Fujitsu, RIKEN, Nagoya University, CyberAgent, and Kotoba Technologies. One motivation for the collaboration is the shortage of the GPUs typically used to train and run inference on AI models. Another is that the model can later run on Fujitsu's next-generation 150-core Monaka data center CPU, which is optimized for AI and HPC workloads.

Fugaku-LLM is now available for academic and commercial use via GitHub and Hugging Face under the designated license terms (although Fujitsu has not provided any links). It will also be offered through the Fujitsu Research Portal from May 10, 2024.
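Assuming the released weights follow the usual Hugging Face conventions, loading them might look like the sketch below; the repository ID is a guess, since the article provides no links, so check Hugging Face for the official one.

```python
# Hypothetical usage sketch with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Fugaku-LLM/Fugaku-LLM-13B"  # assumed repo ID, not confirmed by the article
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "富士山について教えてください。"  # "Tell me about Mt. Fuji."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```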

Introduction to the Fugaku Supercomputer

Fugaku is a supercomputer jointly developed by Fujitsu and RIKEN as the successor to the K computer. Development began in 2014, and the system entered full operation in 2021. Fugaku is housed at the RIKEN Center for Computational Science on Port Island, an artificial island in Chuo-ku, Kobe, Hyogo Prefecture. Its computing power is roughly 100 to 120 times that of the K computer, at a power draw of 30 to 40 megawatts versus the K computer's 12.7 megawatts. "Fugaku" (富岳) is another name for Mount Fuji.

Fugaku was the world's first Arm-based supercomputer to top the TOP500, built on Fujitsu's 48-core A64FX SoC rather than the mainstream x86 platforms from Intel or AMD used in most previous supercomputers. With a total of 158,976 nodes, Fugaku has a theoretical peak of roughly 1 exaFLOPS (1,000 petaFLOPS) at single precision. Beyond its strong Linpack score, Fugaku also achieved 1.421 exaFLOPS on the HPL-AI mixed-precision benchmark.
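That roughly 1 exaFLOPS peak can be sanity-checked from published A64FX specifications (our own arithmetic; the clock and per-cycle throughput below are the commonly cited values, not figures from this article):

```python
# Rough peak-FLOPS estimate for Fugaku from A64FX specs:
# 48 compute cores per node, two 512-bit SVE pipelines per core
# (8 FP64 lanes x 2 for FMA x 2 pipes = 32 FP64 FLOPs/core/cycle), ~2.2 GHz boost.
nodes = 158_976
cores_per_node = 48
flops_per_core_per_cycle_fp64 = 32
clock_hz = 2.2e9

peak_fp64 = nodes * cores_per_node * flops_per_core_per_cycle_fp64 * clock_hz
print(f"FP64 peak: {peak_fp64 / 1e15:.0f} PFLOPS")       # ~537 PFLOPS
print(f"FP32 peak: {peak_fp64 * 2 / 1e18:.2f} EFLOPS")   # ~1.07 EFLOPS, the '1 exaFLOPS' figure
```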

On June 23, 2020, Fugaku was officially certified as the No. 1 supercomputer on the TOP500 with a Linpack speed of 415 petaFLOPS, and it retained the top spot when the list was updated on November 17 of the same year.

On May 22, 2023, Tokyo Institute of Technology, Fujitsu, RIKEN, and Tohoku University announced that they would use Fugaku to develop Japanese-focused generative AI. The partners will jointly build foundational technologies centered on the Japanese language and provide them to domestic companies free of charge from 2024. The move is intended to foster home-grown technology and counter the generative AI dominance of U.S. players such as OpenAI and Google.

Officials said that RIKEN and Tohoku University will also cooperate on generative AI research and development, independently building the large language models that underpin generative AI. The project's training will use Japanese data, including material published on Wikipedia, to improve Japanese conversational ability. The project will also work with CyberAgent, Japan's largest online advertising agency, which is developing its own generative AI.

According to Nikkei's Chinese edition, the language models built in Japan, whose parameter counts largely determine AI performance, often had only a few billion parameters; the large language model CyberAgent disclosed on May 17, 2023 tops out at 6.8 billion. For comparison, OpenAI's GPT-3 has 175 billion parameters. Going forward, Tokyo Institute of Technology and its partners aim to build large language models with about 100 billion parameters.

Fugaku fell to fourth on the global supercomputing list

In November 2023, RIKEN announced that Fugaku, jointly developed with Fujitsu, ranked fourth on the TOP500 list of the world's fastest supercomputers. In the ranking released in May 2023 it had placed second, but it has since been surpassed by state-of-the-art supercomputers launched in the United States.

The global supercomputer performance ranking is published twice a year by an international group of experts. Starting in June 2020, Fugaku was ranked No. 1 in the world four times in a row. After being dethroned by the American Frontier system in May 2022, it ranked second three times in a row. With several U.S. research institutions bringing state-of-the-art machines online, American supercomputers now occupy the top three spots on the list.

According to reports, Frontier, which has now been ranked No. 1 for four consecutive lists, achieved a measured speed of 119.4 kei (京, the Japanese unit for 10^16, i.e. 10,000 times one trillion) operations per second, or about 1.194 exaFLOPS, while fourth-place Fugaku achieved 44.2 kei, or about 442 petaFLOPS.
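For readers unused to the kei unit, a one-line conversion (our own arithmetic, assuming 1 kei = 10^16 operations per second) reproduces the familiar figures:

```python
# Convert the 'kei' (10^16) figures from the original report into FLOPS.
KEI = 1e16

frontier_kei, fugaku_kei = 119.4, 44.2
print(f"Frontier: {frontier_kei * KEI / 1e18:.3f} exaFLOPS")  # ~1.194 EFLOPS
print(f"Fugaku:   {fugaku_kei * KEI / 1e15:.0f} petaFLOPS")   # ~442 PFLOPS
```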

