"Artificial intelligence": Technical details of mainstream large language models that you should not miss

Author: Architectural thinking 2023-09-14 15:20:00

Outline

First, the details of large language models

Transformer and LLM

1.1 Model structure

1.2 Training objectives

1.3 Tokenizer

1.4 Positional encoding

1.5 Layer normalization

1.6 Activation Functions

1.7 Multi-query Attention and Grouped-query Attention

1.8 Parallel transformer block

1.9 Summary - Training stability

Second, distributed pre-training of LLMs

Peer-to-peer communication vs. collective communication

2.1 Data parallelism

2.2 Tensor parallelism

2.3 Pipeline parallelism

2.4 3D parallelism

2.5 Mixed Precision Training

2.6 Activation recomputation

2.7 ZeRO: Zero Redundancy Optimizer

2.8 CPU offload and ZeRO-Offload

2.9 Flash Attention

2.10 vLLM: Paged Attention

Third, parameter-efficient fine-tuning of LLMs

Why parameter-efficient fine-tuning?

3.1 Prompt tuning

3.2 Prefix tuning

3.3 Adapter

3.4 LLaMA adapter

3.5 LoRA

3.6 Experimental comparison
