
Tech Explainer - An Overview of Large Language Models

Author: A Thousand Questions on Translation Technology

Definition

Large language models (LLMs) are very large deep learning models that are pre-trained on vast amounts of data. The underlying transformer is a set of neural networks consisting of an encoder and a decoder with self-attention capabilities. The encoder and decoder extract meaning from a sequence of text and understand the relationships between the words and phrases in it.

Transformer LLMs are capable of unsupervised training, although a more precise description is that transformers perform self-learning. Through this process, the transformer learns to understand basic grammar, languages, and knowledge.

Unlike earlier recurrent neural networks (RNNs) that processed inputs sequentially, transformers process entire sequences in parallel. This allows data scientists to train transformer-based LLMs using GPUs, dramatically reducing training time.
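To make self-attention concrete, here is a minimal sketch in plain Python with NumPy. It is an illustration only (real transformers add learned query/key/value projections, multiple attention heads, and positional information), but it shows how every token attends to every other token in a single parallel matrix computation:

    import numpy as np

    def self_attention(X):
        # X: (seq_len, d_model) matrix of token embeddings.
        # Real transformers derive separate Q, K, V matrices from X via
        # learned projections; we use X directly to keep the sketch short.
        d = X.shape[-1]
        scores = X @ X.T / np.sqrt(d)             # similarity of every token pair
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ X                        # each output mixes all tokens

    X = np.random.randn(5, 8)                     # a toy "sentence": 5 tokens, 8 dims
    print(self_attention(X).shape)                # (5, 8) -- all positions at once

Note that the whole sequence is handled in one matrix product rather than token by token, which is why GPU parallelism pays off so dramatically here.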

The transformer neural network architecture makes it possible to use very large models, often with hundreds of billions of parameters. Models at this scale can ingest huge amounts of data, often from the internet, but also from sources such as Common Crawl (more than 50 billion web pages) and Wikipedia (about 57 million pages).

History

According to Wikipedia, the history of large language models is closely tied to the development of natural language processing (NLP). Here are some of the key moments:

(1) Early exploration:

The history of machine translation dates back to the 17th century, when philosophers such as Leibniz and Descartes proposed codes that would relate words between languages. In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence", proposing what is now known as the Turing test as a criterion of intelligence.

(2) Early success of NLP systems:

In the 1960s, some notable NLP systems emerged, such as SHRDLU, a natural-language system that worked in a restricted "blocks world".

In 1970, William A. Woods introduced augmented transition networks (ATNs) to represent natural language input.

(3) Introduction of machine learning:

In the late 1980s, the field of NLP underwent a revolution with the introduction of machine learning algorithms for language processing.

During this period, research increasingly focused on statistical models, which make soft, probabilistic decisions by attaching real-valued weights to the features of the input data.

(4) Recent research trends:

Recent research has increasingly focused on unsupervised and semi-supervised learning algorithms that are able to learn from data that is not manually labeled.

(5) International platform development

Google's BERT (2018):

BERT (Bidirectional Encoder Representations from Transformers) is a model introduced by Google. Built on the Transformer architecture, it achieved a breakthrough particularly in understanding linguistic context.

OpenAI's GPT series (from 2018):

The GPT (Generative Pre-trained Transformer) series, from the original GPT through GPT-3, progressively scaled up model size and capability; GPT-3 in particular is known for its enormous parameter count and broad applicability.

GitHub Copilot (2021):

GitHub Copilot is an AI programming assistant developed jointly by GitHub and OpenAI. It is based on OpenAI's Codex model, which was trained specifically on programming languages, and can automatically generate code snippets from comments in multiple languages.

Anthropic's Claude:

Claude is a large language model developed by Anthropic. It was designed with a particular focus on safety and interpretability, with the aim of creating more reliable and ethical AI.

(6) Platform development in China

Baidu's ERNIE series:

ERNIE (Enhanced Representation through Knowledge Integration) is a series of models introduced by Baidu that performs especially well on Chinese NLP tasks.

Wenxin Yiyan (Baidu):

Wenxin Yiyan (known in English as ERNIE Bot) is a chatbot developed by Baidu that can converse with users, answer questions, and assist with creative work; it is widely seen as a Chinese competitor to ChatGPT.

Tongyi Qianwen (Alibaba):

Tongyi Qianwen is a large language model launched by Alibaba Cloud; it supports multilingual understanding and generation, including machine translation and cross-language tasks.

Tiangong (Kunlun Tech):

Tiangong is a large-scale pre-trained language model launched by Kunlun Tech, aimed at improving machines' ability to understand and generate natural language.

iFLYTEK Spark:

iFLYTEK Spark (Xinghuo) is a large language model launched by iFLYTEK. Combined with the company's established strength in speech recognition and speech synthesis, it reinforces iFLYTEK's leading position in speech technology.

How LLMs work

LLMs are neural network models trained through deep learning that can perform a variety of language tasks, such as text generation, translation, summarization, and question answering.

(1) Model core

Most LLMs are based on a neural network architecture called the Transformer. In this architecture, an encoder captures the contextual semantics of the input, and a decoder then produces output through multiple rounds of attention-based decoding; together these enable language understanding and generation.

(2) Training process

Pre-training: the model is trained without supervision on massive amounts of text to learn the statistical regularities and representations of language. Pre-training gives the model its basic ability to understand language.

Fine-tuning: the pre-trained model is then trained further for specific tasks (e.g., translation or summarization) using a small amount of labeled data, so that it performs better in a particular domain. Toy sketches of both stages follow below.
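First, self-supervised pre-training in miniature: a toy bigram model that learns next-word statistics purely from raw text, with no labels. This is vastly simpler than an LLM, but the training signal comes from the text itself in the same way:

    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat . the dog slept on the rug .".split()

    # "Pre-training": count which word follows which, using the raw
    # text itself as the supervision signal (no manual labels needed).
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def predict_next(word):
        # Most likely next word under the learned statistics.
        return counts[word].most_common(1)[0][0]

    print(predict_next("the"))   # e.g. "cat" -- learned purely from the corpus

Second, a schematic fine-tuning step. This sketch assumes the Hugging Face transformers library and PyTorch are installed; the model name and the two-example "dataset" are illustrative choices, not anything prescribed by this article:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    name = "distilbert-base-uncased"              # a small pre-trained model
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

    # A tiny labeled batch for a sentiment task (1 = positive, 0 = negative).
    batch = tokenizer(["great movie", "terrible movie"],
                      padding=True, return_tensors="pt")
    labels = torch.tensor([1, 0])

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    loss = model(**batch, labels=labels).loss     # HF models return a loss when labels are given
    loss.backward()                               # one gradient step of fine-tuning
    optimizer.step()

In practice fine-tuning loops over many batches and epochs, but each step has exactly this shape: a small amount of labeled data nudging weights that pre-training has already set.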

(3) Working principle

Input: the user enters a piece of text, such as a question or a prompt.

Processing: the model processes the input text through its self-attention mechanism to understand the context and semantics.

Output: the model generates a response, which may be an answer, a continuation of the text, or other relevant information.
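This input-process-output loop takes only a few lines with an off-the-shelf model. The sketch below assumes the Hugging Face transformers library; GPT-2 is an illustrative (and by current standards tiny) choice:

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")    # input -> process -> output
    result = generator("Large language models are", max_new_tokens=20)
    print(result[0]["generated_text"])                       # the prompt plus a continuation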

Application scenarios

LLMs have many practical applications.

(1) Text generation

LLMs can generate natural-language text from user prompts: copywriting, fiction, scripts, questionnaires, and other written content. The more detailed the user's prompt, the higher the quality of the generated content.

(2) Knowledge base answers

This technique, often referred to as knowledge-intensive natural language processing (KI-NLP), uses LLMs to answer specific questions based on information held in a digital archive.
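A minimal KI-NLP sketch, again assuming the Hugging Face transformers library: a naive word-overlap retriever picks the most relevant archive entry, and an extractive question-answering model reads the answer out of it. The archive contents here are invented for illustration:

    from transformers import pipeline

    qa = pipeline("question-answering")   # loads a default extractive QA model
    archive = [
        "The Eiffel Tower is 330 metres tall and stands in Paris.",
        "Mount Everest, at 8849 metres, is Earth's highest mountain.",
    ]
    question = "How tall is the Eiffel Tower?"

    def retrieve(question, docs):
        # Naive retrieval: the document sharing the most words with the question.
        q = set(question.lower().split())
        return max(docs, key=lambda d: len(q & set(d.lower().split())))

    print(qa(question=question, context=retrieve(question, archive))["answer"])

Production systems replace the word-overlap retriever with vector search over embeddings, but the retrieve-then-read shape is the same.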

(3) Search

LLMs understand language more deeply than traditional search engines and can therefore surface more relevant results. They accept not only keywords but also sentences of any length, as well as concrete, specific questions.

(4) Machine translation

LLMs can automatically extract keywords, phrases, and other features from source-language text, capturing the semantics and structure of sentences more fully and improving the accuracy and fluency of machine translation.
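As a sketch of model-based translation (assuming the Hugging Face transformers library; the Chinese-to-English model name is an illustrative pick from the public model hub):

    from transformers import pipeline

    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")
    result = translator("大语言模型可以提高机器翻译的准确性。")
    print(result[0]["translation_text"])   # e.g. "Large language models can improve ..."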

(5) Code generation

LLMs are good at generating code from natural-language prompts, in programming languages such as JavaScript, Python, PHP, Java, and C#.
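A hedged sketch of the same idea, again assuming the transformers library; the code model named here is an illustrative public checkpoint, not the one behind any product mentioned above:

    from transformers import pipeline

    codegen = pipeline("text-generation", model="Salesforce/codegen-350M-mono")
    prompt = "# Python function that returns the factorial of n\ndef factorial(n):"
    print(codegen(prompt, max_new_tokens=40)[0]["generated_text"])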

(6) Text classification

Using clustering, LLMs can group texts with similar meanings or sentiments. Applications include measuring customer sentiment, determining relationships between texts, and document search.
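A minimal clustering sketch, assuming the sentence-transformers and scikit-learn libraries (the embedding model name and the four example texts are illustrative):

    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    model = SentenceTransformer("all-MiniLM-L6-v2")   # maps each text to a vector
    texts = [
        "I love this product, it works perfectly.",
        "Fantastic quality, highly recommended.",
        "Awful experience, it broke after one day.",
        "Very disappointed with the customer service.",
    ]
    embeddings = model.encode(texts)                  # one vector per text
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
    for text, label in zip(texts, labels):
        print(label, text)   # similar sentiments land in the same cluster

Texts with similar meaning end up close together in the embedding space, so a generic clustering algorithm can separate them by sentiment without any labels.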

Special note: this article is for learning and exchange only. If anything is inaccurate, please contact the editors via private message.

- END -

Original source: CAT Course Showcase - Zhang Entong, Luo Shaowen - 2023

Tweet editors: Zhang Entong, Luo Shaowen
