Details of the technical principles of mainstream large language models
1. Comparison of the details of LLaMA, ChatGLM, Falcon, and other large language models: tokenizer, positional encoding, layer normalization, activation function, etc.
2. Distributed training techniques for large language models: data parallelism, tensor (model) parallelism, pipeline parallelism, 3D parallelism, the zero-redundancy optimizer ZeRO, the CPU offloading technique ZeRO-Offload, mixed-precision training, activation recomputation, FlashAttention, and PagedAttention.
3. Parameter-efficient fine-tuning techniques for large language models: prompt tuning, prefix tuning, adapters, LLaMA-Adapter, and LoRA.
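Of the fine-tuning methods listed above, LoRA is compact enough to sketch directly. The NumPy illustration below (variable names and dimensions are ours, not any library's API) shows the core idea: the pretrained weight W stays frozen, and a low-rank update scaled by alpha/r is learned instead.

```python
import numpy as np

# Minimal LoRA sketch: the frozen pretrained weight W is left untouched;
# a low-rank update (alpha/r) * B @ A is trained instead, with r << d.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 16, 4, 8

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init

def lora_forward(x):
    # y = W x + (alpha / r) * B A x
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((2, d_in))
# With B zero-initialized, the adapted layer reproduces the frozen base
# layer exactly at the start of fine-tuning.
assert np.allclose(lora_forward(x), x @ W.T)
# Trainable parameters: r * (d_in + d_out) = 128, versus 256 in W itself.
print(A.size + B.size)
```

The zero initialization of B is what makes LoRA safe to attach to a pretrained model: training starts from the unmodified base behavior, and only the small A and B matrices receive gradient updates.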
0. Outline
1. Details of large language models
1.0 Transformer and LLM
1.1 Model structure
1.2 Training objectives
1.3 Tokenizer
1.4 Positional encoding
1.5 Layer normalization
1.6 Activation Functions