
Google releases a paper on LaMDA, its 137-billion-parameter dialogue AI model with near-human conversation quality

Report from Heart of the Machine

Editors: Du Wei, Chen Ping

Google's LaMDA achieves conversation quality close to the human level.

Language models can accomplish many different tasks, such as translating one language into another or summarizing long documents into short summaries. Among these tasks, open-domain dialogue may be one of the most difficult, because it requires the model to cover a wide range of topics. In dialogue tasks, the model should also follow Responsible AI practices and avoid making factual statements that are not backed by external information sources.

Recently, more than 50 Google researchers co-authored the paper "LaMDA: Language Models for Dialog Applications," presenting the latest progress on the LaMDA language model. The paper describes how they have advanced toward safe, reliable, and high-quality dialogue applications. LaMDA is built by fine-tuning a family of Transformer-based neural language models specialized for dialogue, with up to 137B parameters, and the models can also draw on external knowledge sources during conversation.


Paper address: https://arxiv.org/pdf/2201.08239.pdf

Romal Thoppilan, one of the paper's authors from Google Brain, said: "The LaMDA model is trained with up to 137B parameters, and it demonstrates near-human conversation quality along with significant improvements in safety and factual grounding."


Goals and metrics

Guiding the training of a dialogue model involves two crucial factors: goals and metrics. LaMDA has three main goals: Quality, Safety, and Groundedness.

Quality: Google breaks quality down into three dimensions, Sensibleness, Specificity, and Interestingness (SSI), which are evaluated by human evaluators.

Sensibleness refers to whether the model produces responses that make sense in the dialogue context (e.g., no common-sense errors, no absurd responses, and no contradictions with earlier responses);

Specificity measures whether a response is specific to the preceding dialogue context, rather than a generic reply that would fit most contexts;

Interestingness measures whether the model produces responses that are insightful, unexpected, or witty, and therefore more likely to make the conversation better.

Safety: Google has also made significant progress in developing and deploying Responsible AI. Its safety metric consists of an illustrative set of safety objectives that capture the behavior the model should exhibit in a dialogue. These objectives attempt to constrain the model's output to avoid any unintended outcomes that could harm users and to avoid reinforcing unfair bias.

Groundedness: The current generation of language models often produces statements that seem plausible but actually contradict known external facts, which motivated Google's groundedness work on LaMDA. Casual responses that carry no real-world information affect informativeness, but not groundedness. While grounding LaMDA's generated responses in known sources does not by itself guarantee factual accuracy, it allows users or external systems to judge the validity of a response based on the reliability of its sources.

LaMDA pre-training and fine-tuning

After defining the goals and metrics, Google describes LaMDA's two-stage training: pre-training and fine-tuning.

LaMDA pre-training

During the pre-training phase, Google first collected and created a dataset of 1.56T words from public dialogue data and other public web documents, nearly 40 times the number of words used to train previous dialogue models. After tokenizing the dataset into 2.81T SentencePiece tokens, Google pre-trained the model using GSPMD to predict every next token in a sentence. The pre-trained LaMDA model has been widely used in Google's natural language processing research, including program synthesis, zero-shot learning, style transfer, and more.
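To make the pre-training objective concrete, here is a minimal illustrative sketch (not Google's implementation) of how SentencePiece tokenization feeds next-token prediction; "tokenizer.model" is a hypothetical placeholder for a trained SentencePiece model file.

```python
# Illustrative sketch only: SentencePiece tokenization + next-token targets.
# "tokenizer.model" is a hypothetical placeholder, not a released artifact.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

def next_token_examples(text):
    """Turn raw text into (context, target) pairs for next-token prediction."""
    ids = sp.encode(text)
    # The model learns to predict token t+1 from tokens 0..t.
    return [(ids[: t + 1], ids[t + 1]) for t in range(len(ids) - 1)]

for context, target in next_token_examples("Dialog models predict the next token."):
    print(context, "->", target)
```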

LaMDA fine-tuning

During the fine-tuning phase, Google trains LaMDA on a mix of generative tasks, which produce natural-language responses to a given context, and classification tasks, which judge whether a response is safe and high-quality, yielding a single multi-task model that can do both. The LaMDA generator is trained to predict the next token on a dialogue dataset restricted to back-and-forth conversation between two authors, while the LaMDA classifier is trained on annotated data to predict safety and quality (SSI) ratings for responses in context.

During a conversation, the LaMDA generator first produces several candidate responses given the current multi-turn dialogue context, and the LaMDA classifier then predicts SSI and safety scores for each candidate. Candidates with low safety scores are filtered out first; the remaining candidates are re-ranked by SSI score, and the top-scoring one is chosen as the final response. Google also uses the LaMDA classifier to filter the training data for the generative task, increasing the density of high-quality candidate responses.


LaMDA generates and scores a candidate response.
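This generate-filter-re-rank step can be summarized in a few lines. Below is a minimal sketch, assuming hypothetical generate_candidates, safety_score, and ssi_score helpers that stand in for the trained generator and classifier heads; the candidate count and threshold are illustrative values, not numbers from the paper.

```python
# Minimal sketch of the generate -> filter -> re-rank decoding step.
# generate_candidates, safety_score, and ssi_score are hypothetical
# stand-ins for the trained generator and classifier heads.
def respond(context, generate_candidates, safety_score, ssi_score,
            num_candidates=16, safety_threshold=0.5):
    """Pick the best safe candidate response for a dialogue context."""
    candidates = generate_candidates(context, num_candidates)
    # 1. Drop candidates the safety classifier scores below the threshold.
    safe = [c for c in candidates
            if safety_score(context, c) >= safety_threshold]
    if not safe:
        return None  # no acceptable response for this context
    # 2. Re-rank the survivors by SSI (quality) score; keep the best one.
    return max(safe, key=lambda c: ssi_score(context, c))
```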


LaMDA handles arbitrary user input in a sensible, specific, and interesting way.

Factual groundedness

While people can use tools and consult established knowledge bases to ground their statements in facts, many language models draw knowledge only from their internal parameters. To improve the groundedness of LaMDA's original responses, Google collected and created a dataset of dialogues between humans and LaMDA, annotated with search queries and retrieved results where applicable. Google then fine-tuned LaMDA's generator and classifier on this dataset to learn to call an external information-retrieval system during user interactions, improving the groundedness of responses. Although this work is still at a very early stage, Google is seeing promising results.
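As a rough illustration of that loop, here is a sketch assuming hypothetical generate, needs_grounding, and search helpers; Google's actual retrieval toolset is not a public API, so all names here are placeholders.

```python
# Sketch of retrieval-augmented response generation, under stated assumptions:
# generate, needs_grounding, and search are hypothetical stand-ins.
def grounded_respond(context, generate, needs_grounding, search):
    """Generate a response, consulting external retrieval for factual claims."""
    draft = generate(context)        # base response from the generator
    if needs_grounding(draft):       # e.g., the draft makes factual claims
        evidence = search(draft)     # query the external IR system
        # Regenerate conditioned on retrieved evidence, so the final
        # response can be checked against a known source.
        draft = generate(context, evidence=evidence)
    return draft
```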


Zero-shot domain adaptation: an example of LaMDA realistically role-playing Mount Everest. The dialogue shows the model giving educational and factually correct responses about its assigned subject, Mount Everest.

Evaluation

To quantify progress on its key metrics, Google collected responses to multi-turn two-author dialogues from the pre-trained model, the fine-tuned model, and human evaluators (i.e., human-generated responses), then asked a different set of human evaluators a series of questions to score those responses against the quality, safety, and groundedness metrics.
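As a toy illustration of how such judgments roll up into per-metric scores, the sketch below averages binary ratings; the two records are fabricated placeholders showing the format only, not data from the paper.

```python
# Toy illustration: averaging binary rater judgments per metric.
# The rating records below are fabricated placeholders, not paper data.
from statistics import mean

ratings = [
    {"sensible": 1, "specific": 1, "interesting": 0, "safe": 1},
    {"sensible": 1, "specific": 0, "interesting": 0, "safe": 1},
]

for metric in ("sensible", "specific", "interesting", "safe"):
    score = mean(r[metric] for r in ratings)
    print(f"{metric}: {score:.2f}")
```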

Google observed that LaMDA significantly outperforms the pre-trained model in every dimension and at every model size. Quality metrics such as sensibleness, specificity, and interestingness generally improve with the number of model parameters, with or without fine-tuning. Safety does not seem to benefit from model scaling alone, but it does improve with fine-tuning. Groundedness improves as model size increases, perhaps because larger models have a greater capacity to memorize uncommon knowledge, but fine-tuning lets the model access external knowledge sources, effectively shifting the burden of remembering knowledge onto those sources. Fine-tuning can also close the quality gap with humans, although the model still falls short of human performance in safety and groundedness.


Comparing the pre-trained model (PT), the fine-tuned model (LaMDA), and human-evaluator-generated dialogues (Human) on sensibleness, specificity, interestingness, safety, groundedness, and informativeness.

Use Python to quickly build an NVIDIA Riva-based intelligent Q&A bot

NVIDIA Riva is a GPU-accelerated SDK for rapidly deploying high-performance conversational AI services and developing speech AI applications. Riva is designed to make conversational AI capabilities easy and fast to use out of the box, letting you build advanced conversational AI services with a few simple commands and API calls.
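As a taste of the API style, here is a minimal sketch of calling a Riva text-to-speech service from Python. It assumes the nvidia-riva-client package is installed and a Riva server is listening at localhost:50051; the voice name is a placeholder that should be checked against the voices deployed on your server.

```python
# Minimal sketch: text-to-speech with a running Riva server.
# Assumptions: nvidia-riva-client installed, server at localhost:50051,
# and the voice name is a placeholder for one deployed on your server.
import wave

import riva.client

auth = riva.client.Auth(uri="localhost:50051")
tts = riva.client.SpeechSynthesisService(auth)

response = tts.synthesize(
    "Hello! How can I help you today?",
    voice_name="English-US.Female-1",  # placeholder voice name
    language_code="en-US",
    sample_rate_hz=44100,
)

# The response carries raw 16-bit PCM; wrap it in a WAV container to play it.
with wave.open("answer.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)    # 16-bit samples
    out.setframerate(44100)
    out.writeframes(response.audio)
```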

On January 26, 2022, from 19:30 to 21:00, the latest online sharing session will cover:

Introduction to Conversational AI and NVIDIA Riva

Build a speech recognition module with NVIDIA Riva

Build intelligent Q&A modules with NVIDIA Riva

Build speech synthesis modules with NVIDIA Riva
