Artificial Intelligence: Technical Details of Mainstream Large Language Models You Cannot Miss

Author: Architectural Thinking

Outline

First, the details of large language models

Transformer and LLM

1.1 Model structure

1.2 Training objectives

1.3 Tokenizer

1.4 Positional encoding

1.5 Layer normalization

1.6 Activation functions

1.7 Multi-query attention and grouped-query attention

1.8 Parallel Transformer block

1.9 Summary: training stability

Second, distributed pre-training of LLMs

Point-to-point communication vs. collective communication

2.1 Data parallelism

2.2 Tensor parallelism

2.3 Pipeline parallelism

2.4 3D parallelism

2.5 Mixed-precision training

2.6 Activation recomputation

2.7 ZeRO: Zero Redundancy Optimizer

2.8 CPU offload and ZeRO-Offload

2.9 FlashAttention

2.10 vLLM: PagedAttention

Third, parameter-efficient fine-tuning of LLMs

Why parameter-efficient fine-tuning?

3.1 Prompt tuning

3.2 Prefix tuning

3.3 Adapter

3.4 LLaMA-Adapter

3.5 LoRA

3.6 Experimental comparison