Microsoft Open-Sources DeepSpeed Chat: The Era of ChatGPT for Everyone Is Here

  Shin Ji Won reports  

Editors: Aeneas, Sleepy

Microsoft's open-source DeepSpeed Chat lets developers realize the dream of a ChatGPT in everyone's hands!

Is that dream about to come true?

Just now, Microsoft open-sourced DeepSpeed Chat, a system framework that adds the complete RLHF pipeline to model training.

In other words, high-quality ChatGPT-like models of all sizes are now within everyone's reach!

Project address: https://github.com/microsoft/DeepSpeed

Unlock ChatGPT with one click, at 15x lower cost

As we all know, because OpenAI is just not that open, the open-source community has successively launched models such as LLaMA, Alpaca, Vicuna, and Databricks-Dolly so that more people can use ChatGPT-like models.

However, for lack of an end-to-end RLHF training system that scales, training ChatGPT-like models has remained very difficult. DeepSpeed Chat fills exactly this gap.

Even better, DeepSpeed Chat drives the cost way down.

Previously, expensive multi-GPU setups were beyond the reach of many researchers, and even those with access to multi-GPU clusters could not afford to train ChatGPT models with hundreds of billions of parameters using existing methods.

Now, for $1620, you can train an OPT-66B model in 2.1 days with the hybrid engine DeepSpeed-HE.

If a multi-node, multi-GPU system is used, DeepSpeed-HE can train an OPT-13B model in 1.25 hours for $320 and an OPT-175B model in less than a day for $5120.

Elvis, a former Meta AI researcher, retweeted the news excitedly, calling it a big deal and wondering how DeepSpeed Chat compares with ColossalChat.

Let's take a look at how it works.

After DeepSpeed-Chat training, the 1.3-billion-parameter version of "ChatGPT" performs very well in Q&A. It not only grasps the context of a question, but the answers it gives are also quite convincing.

In multi-turn conversations, this 1.3-billion-parameter "ChatGPT" performs far beyond what one would expect from a model of that size.

A piece of code to generate your first ChatGPT

Of course, before trying it out, you need to set up the environment:
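Going by the DeepSpeed-Chat repository's README at the time of release, the setup looks roughly like this (a sketch; check the repo for the current steps):

    # Assumed setup steps from the DeepSpeed-Chat README
    pip install deepspeed>=0.9.0
    git clone https://github.com/microsoft/DeepSpeedExamples.git
    cd DeepSpeedExamples/applications/DeepSpeed-Chat/
    pip install -r requirements.txt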

The time for a cup of coffee, a 1.3-billion-parameter ChatGPT

If you only have about 1-2 hours for a coffee or lunch break, you can try training a "little toy" with DeepSpeed-Chat.

The team has prepared a training example for a 1.3B model that can be tested on a consumer GPU. Best of all, by the time you come back from your lunch break, everything will be ready.

Consumer NVIDIA A6000 GPU with 48GB of memory:
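Per the repository README, this case is a single command (shown as published; treat it as a sketch if the interface has since changed):

    # Train a 1.3B actor model with a 350M reward model on one consumer GPU
    python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --deployment-type single_gpu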

One GPU Node, 13 billion parameters in half a day

If you have only half a day and a single server node, you can use a pre-trained OPT-13B as the actor model and OPT-350M as the reward model to generate a 13-billion-parameter ChatGPT-like model:

Single DGX node with 8 NVIDIA A100-40G GPUs:
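The corresponding launch command from the README (again, as published):

    # Train a 13B actor model with a 350M reward model on one 8x A100-40G node
    python train.py --actor-model facebook/opt-13b --reward-model facebook/opt-350m --deployment-type single_node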

Ultra-money-saving cloud solution, training a 66-billion-parameter model

If you have access to a multi-node cluster or cloud resources and want to train a larger, higher-quality model, you only need to specify the model size (e.g., 66B) and the number of GPUs (e.g., 64) in a single line:

8 DGX nodes, each equipped with 8 NVIDIA A100-80G GPUs:
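The multi-node variant from the README looks like this:

    # Train a 66B actor model with a 350M reward model across 8 nodes / 64 A100-80G GPUs
    python train.py --actor-model facebook/opt-66b --reward-model facebook/opt-350m --deployment-type multi_node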

Specifically, the time and cost required by the DeepSpeed-RLHF system for different scale models and hardware configurations are as follows:

(Table: training time and cost of DeepSpeed-RLHF for each model scale and hardware configuration.)

What is DeepSpeed Chat?

DeepSpeed Chat is a general system framework that enables end-to-end RLHF training of ChatGPT-like models, helping us generate our own high-quality ChatGPT-like models.

DeepSpeed Chat has the following three core features:

1. Simplified training and inference experience for ChatGPT-like models

Developers can complete all the training steps with a single script, and can then use the inference API for conversational interactive testing.
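For example, the repository ships a small serving script for chatting with the finished model. Per the README, it is invoked roughly like this (the path placeholder is left as written there):

    # Chat with your trained model in an interactive loop
    python chat.py --path ${PATH-to-your-actual-model}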

2. DeepSpeed-RLHF module

DeepSpeed-RLHF replicates the training pipeline from the InstructGPT paper and provides data abstraction and blending capabilities that let developers train with data from multiple different sources.

3. DeepSpeed-RLHF system

The team integrated DeepSpeed's training engine and inference engine into a unified hybrid engine (DeepSpeed Hybrid Engine or DeepSpeed-HE) for RLHF training. Since DeepSpeed-HE is able to seamlessly switch between inference and training modes, it can take advantage of various optimizations from DeepSpeed-Inference.

The DeepSpeed-RLHF system has unmatched efficiency in large-scale training, making complex RLHF training fast, economical, and easy to scale up:

Efficient and economical:

DeepSpeed-HE is more than 15 times faster than existing systems, making RLHF training fast and affordable.

For example, DeepSpeed-HE can train an OPT-13B model in just 9 hours on the Azure cloud and an OPT-30B model in just 18 hours. These two trainings cost less than $300 and $600, respectively.

Excellent scalability:

DeepSpeed-HE can support the training of models with hundreds of billions of parameters and show excellent scalability on multi-node and multi-GPU systems.

Therefore, even a model with 13 billion parameters can be trained in just 1.25 hours. For a model with 175 billion parameters, training with DeepSpeed-HE takes less than a day.

Democratize RLHF training:

With a single GPU, DeepSpeed-HE supports training models with more than 13 billion parameters. This enables data scientists and researchers without access to multi-GPU systems to create not only lightweight RLHF models but also large, powerful ones for different use cases.

Complete RLHF training process

To provide a seamless training experience, the researchers followed InstructGPT and included a complete end-to-end training process in DeepSpeed-Chat.

DeepSpeed-Chat's RLHF training flowchart includes a number of optional features

The process consists of three main steps:

Step 1:

Supervised fine-tuning (SFT), which uses curated human responses to fine-tune a pre-trained language model for a variety of queries.

Step 2:

Reward model fine-tuning, which uses a dataset of human-ranked answers to the same queries to train a separate reward model (RW), usually smaller than the SFT model.

Step 3:

RLHF training, in which the SFT model is further fine-tuned with reward feedback from the RW model using the Proximal Policy Optimization (PPO) algorithm.

In Step 3, the researchers provide two additional features to help improve model quality:

- Exponential Moving Average (EMA) collection, where an EMA-based checkpoint can be chosen for the final evaluation (see the sketch after this list).

- Mixed training, which blends the pre-training objective (i.e., next-word prediction) with the PPO objective to prevent performance regressions on public benchmarks such as SQuAD2.0.
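To make the EMA idea concrete, here is a minimal PyTorch-style sketch (a hypothetical helper, not DeepSpeed's actual implementation; the decay value is illustrative):

    import torch

    @torch.no_grad()
    def update_ema(ema_model, model, decay=0.992):
        # Shadow copy of the weights: ema = decay * ema + (1 - decay) * current
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.mul_(decay).add_(p, alpha=1 - decay)

After each PPO update, the shadow weights are refreshed, and it is the EMA checkpoint, rather than the final raw weights, that is used for evaluation.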

EMA and mixed training are two training features often overlooked by other open-source frameworks, since omitting them does not prevent training from running.

However, according to InstructGPT, EMA checkpoints tend to yield better response quality than conventional final checkpoints, and mixed training helps the model retain its performance on pre-training benchmarks.

The researchers therefore expose both features, allowing users to fully reproduce the training experience described in InstructGPT.

In addition to being highly consistent with the InstructGPT paper, the researchers also provide features that allow developers to train their own RLHF models using a variety of data resources:

Data abstraction and blending capabilities:

DeepSpeed-Chat is equipped with (1) an abstract dataset layer to unify the format of different datasets, and (2) data splitting/blending capabilities, whereby multiple datasets are blended appropriately and then split across the three training phases.
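As an illustration, the training scripts in the repository expose flags along these lines (flag names as recalled from the repo's training scripts, and the datasets are examples; treat this as an assumption and check the scripts):

    # Blend two Hugging Face datasets and split them 2/4/4 across the three training phases
    deepspeed main.py \
        --data_path Dahoas/rm-static Dahoas/full-hh-rlhf \
        --data_split 2,4,4 \
        ...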

DeepSpeed hybrid engine

Steps 1 and 2 of the RLHF pipeline described above resemble the ordinary fine-tuning of large models; DeepSpeed training achieves scale and speed for them through a combination of ZeRO-based optimizations and flexible parallelism strategies.

Step 3 of the pipeline is the most complex part in terms of performance impact.

Each iteration must efficiently handle two phases: a) an inference phase for token/experience generation, which produces the inputs for training; and b) a training phase that updates the weights of the actor and reward models, along with the interaction and scheduling between the two phases.

This introduces two major difficulties: (1) memory cost, since several SFT and RW models must be kept running throughout the third phase; and (2) slow answer generation, which, if not properly accelerated, significantly slows down the entire third stage.

In addition, the two important features the researchers added in the third phase, exponential moving average (EMA) collection and mixed training, incur extra memory and training costs.

To address these challenges, the researchers put all of DeepSpeed's training and inference system capabilities into a unified infrastructure, the Hybrid Engine.

It uses the original DeepSpeed engine for fast training mode, while effortlessly applying the DeepSpeed inference engine for generation/evaluation mode, providing a faster training system for the third stage of RLHF training.

As shown in the figure below, the transition between the DeepSpeed training and inference engines is seamless: by enabling the typical eval and train modes for the actor model, DeepSpeed applies different optimizations when running inference and training, making the model run faster and improving the throughput of the whole system.

DeepSpeed hybrid engine design to accelerate the most time-consuming part of the RLHF process
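Conceptually, the step-3 loop alternates between the two modes. A minimal sketch, with the caveat that actor, reward_model, ppo_step, and prompt_loader are illustrative names rather than the actual DeepSpeed-Chat API:

    for prompts in prompt_loader:
        actor.eval()                          # inference mode: optimized kernels, KV cache, tensor parallelism
        experience = actor.generate(prompts)  # generate answers/experience for PPO
        actor.train()                         # training mode: ZeRO-sharded weights and optimizer states
        ppo_step(actor, reward_model, experience)  # PPO update driven by reward feedback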

During inference execution in the experience-generation stage of RLHF training, the DeepSpeed hybrid engine uses a lightweight memory-management system to handle the KV cache and intermediate results, and combines highly optimized inference CUDA kernels with tensor parallelism, achieving significantly higher throughput (tokens per second) than existing systems.

During training, the hybrid engine enables memory-optimization techniques such as DeepSpeed's ZeRO family of technologies and Low-Rank Adaptation (LoRA).

The researchers designed and implemented these system optimizations to be mutually compatible, so that they can be combined under the unified hybrid engine to deliver the highest training efficiency.

The hybrid engine can seamlessly change the model partitioning between training and inference, supporting tensor-parallel inference and ZeRO-based sharded training.

It can also reconfigure the memory system to maximize memory availability in each mode.

This avoids memory allocation bottlenecks and supports large batch sizes, greatly improving performance.

Together, these capabilities push the boundaries of modern RLHF training, delivering unparalleled scale and system efficiency for RLHF workloads.

Effectiveness evaluation

Compared with existing systems such as Colossal-AI or HuggingFace-DDP, DeepSpeed-Chat delivers more than an order of magnitude higher throughput: it can train larger actor models within the same latency budget, or train similarly sized models at lower cost.

For example, on a single GPU, DeepSpeed raises the throughput of RLHF training by more than 10x. And while CAI-Coati and HF-DDP can each run a model of at most 1.3B parameters, DeepSpeed can run a 6.5B model on the same hardware, a direct 5x increase in model size.

On multiple GPUs on a single node, DeepSpeed-Chat is 6-19 times faster than CAI-Coati in terms of system throughput and 1.4-10.5 times faster than HF-DDP.

According to the team, one of the key reasons why DeepSpeed-Chat has achieved such excellent results is the acceleration provided by the hybrid engine during the generation phase.

Resources:

https://github.com/microsoft/DeepSpeed
