
Tech Explainer - An Overview of Large Language Models

Author: A Thousand Questions on Translation Technology

Definition

Large language models (LLMs) are very large deep learning models that are pre-trained on vast amounts of data. The underlying transformer is a set of neural networks consisting of an encoder and a decoder with self-attention capabilities. The encoder and decoder extract meaning from a sequence of text and understand the relationships between the words and phrases in it.

Transformer LLMs are capable of unsupervised training, although a more precise description is that transformers perform self-learning. Through this process, the transformer learns to understand basic grammar, languages, and knowledge.

Unlike earlier recurrent neural networks (RNNs) that processed inputs sequentially, transformers process entire sequences in parallel. This allows data scientists to train transformer-based LLMs using GPUs, dramatically reducing training time.
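To make self-attention concrete, here is a minimal sketch in plain Python with NumPy. It is an illustration only (real transformers add learned query/key/value projections, multiple attention heads, and positional information), but it shows how every token attends to every other token in a single parallel matrix computation:

    import numpy as np

    def self_attention(X):
        # X: (seq_len, d_model) matrix of token embeddings.
        # Real transformers derive separate Q, K, V matrices from X via
        # learned projections; we use X directly to keep the sketch short.
        d = X.shape[-1]
        scores = X @ X.T / np.sqrt(d)             # similarity of every token pair
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ X                        # each output mixes all tokens

    X = np.random.randn(5, 8)                     # a toy "sentence": 5 tokens, 8 dims
    print(self_attention(X).shape)                # (5, 8) -- all positions at once

Note that the whole sequence is handled in one matrix product rather than token by token, which is why GPU parallelism pays off so dramatically here.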

The transformer neural network architecture makes it possible to use very large models, often with hundreds of billions of parameters. Models at this scale can ingest huge amounts of data, often from the internet, but also from sources such as Common Crawl (more than 50 billion web pages) and Wikipedia (about 57 million pages).

History

According to Wikipedia, the history of large language models is closely tied to the development of natural language processing (NLP). Here are some of the key moments:

(1) Early exploration:

The history of machine translation dates back to the 17th century, when philosophers such as Leibniz and Descartes proposed codes that would relate words between languages. In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence", proposing what is now known as the Turing test as a criterion of intelligence.

(2) Early success of NLP systems:

In the 1960s, some notable NLP systems emerged, such as SHRDLU, a natural-language system that worked in a restricted "blocks world".

In 1970, William A. Woods introduced augmented transition networks (ATNs) to represent natural language input.

(3) Introduction of machine learning:

In the late 1980s, the field of NLP underwent a revolution with the introduction of machine learning algorithms for language processing.

During this period, research increasingly focused on statistical models, which make soft, probabilistic decisions by attaching real-valued weights to the features of the input data.

(4) Recent research trends:

Recent research has increasingly focused on unsupervised and semi-supervised learning algorithms that are able to learn from data that is not manually labeled.

(5) International platform development

Google's BERT (2018):

BERT (Bidirectional Encoder Representations from Transformers) is a model introduced by Google. Built on the Transformer architecture, it achieved a breakthrough particularly in understanding linguistic context.

OpenAI's GPT series (from 2018):

The GPT (Generative Pre-trained Transformer) series, from the original GPT through GPT-3, progressively scaled up model size and capability; GPT-3 in particular is known for its enormous parameter count and broad applicability.

GitHub Copilot (2021):

GitHub Copilot is an AI programming assistant developed jointly by GitHub and OpenAI. It is based on OpenAI's Codex model, which was trained specifically on programming languages, and can automatically generate code snippets from comments in multiple languages.

Anthropic's Claude:

Claude is a large language model developed by Anthropic. It was designed with a particular focus on safety and interpretability, with the aim of creating more reliable and ethical AI.

(6) Platform development in China

Baidu's ERNIE series:

ERNIE (Enhanced Representation through Knowledge Integration) is a series of models introduced by Baidu that performs especially well on Chinese NLP tasks.

Wenxin Yiyan (Baidu):

Wenxin Yiyan (known in English as ERNIE Bot) is a chatbot developed by Baidu that can converse with users, answer questions, and assist with creative work; it is widely seen as a Chinese competitor to ChatGPT.

Tongyi Qianwen (Alibaba):

Tongyi Qianwen is a large language model launched by Alibaba Cloud; it supports multilingual understanding and generation, including machine translation and cross-language tasks.

Tiangong (Kunlun Tech):

Tiangong is a large-scale pre-trained language model launched by Kunlun Tech, aimed at improving machines' ability to understand and generate natural language.

iFLYTEK Spark:

iFLYTEK Spark (Xinghuo) is a large language model launched by iFLYTEK. Combined with the company's established strength in speech recognition and speech synthesis, it reinforces iFLYTEK's leading position in speech technology.

How LLMs work

LLMs are neural network models trained through deep learning that can perform a variety of language tasks, such as text generation, translation, summarization, and question answering.

(1) Model core

Most LLMs are based on a neural network architecture called the Transformer. In this architecture, an encoder captures the contextual semantics of the input, and a decoder then produces output through multiple rounds of attention-based decoding; together these enable language understanding and generation.

(2) Training process

Pre-training: the model is trained without supervision on massive amounts of text to learn the statistical regularities and representations of language. Pre-training gives the model its basic ability to understand language.

Fine-tuning: the pre-trained model is then trained further for specific tasks (e.g., translation or summarization) using a small amount of labeled data, so that it performs better in a particular domain. Toy sketches of both stages follow below.
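First, self-supervised pre-training in miniature: a toy bigram model that learns next-word statistics purely from raw text, with no labels. This is vastly simpler than an LLM, but the training signal comes from the text itself in the same way:

    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat . the dog slept on the rug .".split()

    # "Pre-training": count which word follows which, using the raw
    # text itself as the supervision signal (no manual labels needed).
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def predict_next(word):
        # Most likely next word under the learned statistics.
        return counts[word].most_common(1)[0][0]

    print(predict_next("the"))   # e.g. "cat" -- learned purely from the corpus

Second, a schematic fine-tuning step. This sketch assumes the Hugging Face transformers library and PyTorch are installed; the model name and the two-example "dataset" are illustrative choices, not anything prescribed by this article:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    name = "distilbert-base-uncased"              # a small pre-trained model
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

    # A tiny labeled batch for a sentiment task (1 = positive, 0 = negative).
    batch = tokenizer(["great movie", "terrible movie"],
                      padding=True, return_tensors="pt")
    labels = torch.tensor([1, 0])

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    loss = model(**batch, labels=labels).loss     # HF models return a loss when labels are given
    loss.backward()                               # one gradient step of fine-tuning
    optimizer.step()

In practice fine-tuning loops over many batches and epochs, but each step has exactly this shape: a small amount of labeled data nudging weights that pre-training has already set.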

(3) Working principle

Input: the user enters a piece of text, such as a question or a prompt.

Processing: the model processes the input text through its self-attention mechanism to understand the context and semantics.

Output: the model generates a response, which may be an answer, a continuation of the text, or other relevant information.
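This input-process-output loop takes only a few lines with an off-the-shelf model. The sketch below assumes the Hugging Face transformers library; GPT-2 is an illustrative (and by current standards tiny) choice:

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")    # input -> process -> output
    result = generator("Large language models are", max_new_tokens=20)
    print(result[0]["generated_text"])                       # the prompt plus a continuation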

Application scenarios

LLMs have many practical applications.

(1) Text generation

LLMs can generate natural-language text from user prompts: copywriting, fiction, scripts, questionnaires, and other written content. The more detailed the user's prompt, the higher the quality of the generated content.

(2) Knowledge base answers

This technique, often referred to as knowledge-intensive natural language processing (KI-NLP), uses LLMs to answer specific questions based on information held in a digital archive.
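A minimal KI-NLP sketch, again assuming the Hugging Face transformers library: a naive word-overlap retriever picks the most relevant archive entry, and an extractive question-answering model reads the answer out of it. The archive contents here are invented for illustration:

    from transformers import pipeline

    qa = pipeline("question-answering")   # loads a default extractive QA model
    archive = [
        "The Eiffel Tower is 330 metres tall and stands in Paris.",
        "Mount Everest, at 8849 metres, is Earth's highest mountain.",
    ]
    question = "How tall is the Eiffel Tower?"

    def retrieve(question, docs):
        # Naive retrieval: the document sharing the most words with the question.
        q = set(question.lower().split())
        return max(docs, key=lambda d: len(q & set(d.lower().split())))

    print(qa(question=question, context=retrieve(question, archive))["answer"])

Production systems replace the word-overlap retriever with vector search over embeddings, but the retrieve-then-read shape is the same.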

(3) Search

LLMs understand language more deeply than traditional search engines and can therefore surface more relevant results. They accept not only keywords but also sentences of any length, as well as concrete, specific questions.

(4) Machine translation

LLMs can automatically extract keywords, phrases, and other features from source-language text, capturing the semantics and structure of sentences more fully and improving the accuracy and fluency of machine translation.
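As a sketch of model-based translation (assuming the Hugging Face transformers library; the Chinese-to-English model name is an illustrative pick from the public model hub):

    from transformers import pipeline

    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")
    result = translator("大语言模型可以提高机器翻译的准确性。")
    print(result[0]["translation_text"])   # e.g. "Large language models can improve ..."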

(5) Code generation

LLMs are good at generating code from natural-language prompts, in programming languages such as JavaScript, Python, PHP, Java, and C#.
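A hedged sketch of the same idea, again assuming the transformers library; the code model named here is an illustrative public checkpoint, not the one behind any product mentioned above:

    from transformers import pipeline

    codegen = pipeline("text-generation", model="Salesforce/codegen-350M-mono")
    prompt = "# Python function that returns the factorial of n\ndef factorial(n):"
    print(codegen(prompt, max_new_tokens=40)[0]["generated_text"])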

(6) Text classification

Using clustering, LLMs can group texts with similar meanings or sentiments. Applications include measuring customer sentiment, determining relationships between texts, and document search.
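A minimal clustering sketch, assuming the sentence-transformers and scikit-learn libraries (the embedding model name and the four example texts are illustrative):

    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    model = SentenceTransformer("all-MiniLM-L6-v2")   # maps each text to a vector
    texts = [
        "I love this product, it works perfectly.",
        "Fantastic quality, highly recommended.",
        "Awful experience, it broke after one day.",
        "Very disappointed with the customer service.",
    ]
    embeddings = model.encode(texts)                  # one vector per text
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
    for text, label in zip(texts, labels):
        print(label, text)   # similar sentiments land in the same cluster

Texts with similar meaning end up close together in the embedding space, so a generic clustering algorithm can separate them by sentiment without any labels.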

Special note: this article is for learning and exchange only. If anything is inaccurate, please contact the editors via private message.

- END -

Original source: CAT Course Showcase - Zhang Entong, Luo Shaowen - 2023

Tweet editors: Zhang Entong, Luo Shaowen
