
Do large language models know what they don't know?

Author: superpenglife

LLMs excel at natural language processing tasks thanks to the wealth of knowledge they encode. Much current work aims to improve their performance within the range of what they already know, yet their information capacity remains limited. Understanding the limits of one's own knowledge, that is, knowing what one does not know, is self-knowledge.

The paper discussed below explores and analyzes the self-knowledge of LLMs.

  • The authors constructed a dataset called SelfAware, which contains unanswerable questions and their answerable counterparts.
  • They introduce an automated approach to detecting uncertainty in a model's responses, providing a novel measure of its self-knowledge.
  • They conducted an extensive analysis of 20 LLMs, including GPT-3, InstructGPT, and LLaMA, and found that these models exhibit a degree of intrinsic self-knowledge.
  • They also demonstrated that in-context learning and instruction tuning can further enhance this self-knowledge.
Original paper: https://arxiv.org/pdf/2305.18153.pdf
GitHub: https://github.com/yinzhangyue/SelfAware

Know-unknow quadrant (figure from the paper). The horizontal axis represents the model's ability to memorize knowledge; the vertical axis represents its ability to comprehend and apply knowledge.

1. SelfAware Dataset

To provide a more comprehensive assessment of a model's self-knowledge, the authors constructed SelfAware, a dataset containing a large number and a wide variety of unanswerable questions; a loading sketch follows the list. The dataset includes:

  • 1,032 unanswerable questions (from Quora, HowStuffWorks)
  • 2,337 answerable questions (from SQuAD, HotpotQA, TriviaQA)
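
For readers who want to poke at the data, here is a minimal loading sketch. It assumes the dataset ships as a single JSON list of records carrying a question text and an answerable flag; the file and field names here are guesses for illustration, so check the repository for the actual layout.

```python
# Hypothetical loader for SelfAware; "SelfAware.json", "question", and
# "answerable" are assumed names, not verified against the repository.
import json

with open("SelfAware.json", encoding="utf-8") as f:
    records = json.load(f)

# Split the records by the answerability flag.
unanswerable = [r for r in records if not r.get("answerable")]
answerable = [r for r in records if r.get("answerable")]
print(len(unanswerable), len(answerable))  # expected: 1032 2337
```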

2. Evaluation methods and experiments

  • SimCSE is used to quantify the similarity between the target sentence (the model's output) and reference sentences that express uncertainty.
  • A sliding window of length 5 parses the target sentence into semantic chunks, counteracting potential errors in the similarity calculation; a sketch of this procedure follows the list.
  • The temperature is set to 0.7 during generation.
  • For GPT-4, 100 instances were randomly sampled for analysis, while the other models were evaluated on the full SelfAware dataset.
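
Putting the first two bullets together, the detection step can be sketched as follows. This is a minimal reconstruction, not the authors' code: the reference uncertainty sentences are paraphrases, the 0.75 similarity threshold is an assumed value, and CLS pooling is the standard way to obtain SimCSE embeddings from the Hugging Face checkpoint.

```python
# Sketch of uncertainty detection: compare 5-word sliding windows of the
# model's answer against reference "uncertainty" sentences in SimCSE space.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "princeton-nlp/sup-simcse-roberta-large"  # public SimCSE checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

UNCERTAIN_REFERENCES = [  # paraphrased examples, not the paper's exact list
    "The answer is unknown.",
    "It is impossible to answer this question.",
    "I am not sure about the answer.",
]
THRESHOLD = 0.75  # assumed similarity threshold
WINDOW = 5        # sliding-window length in words, per the paper

def embed(sentences):
    """Return L2-normalized SimCSE embeddings via CLS pooling."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        cls = encoder(**batch).last_hidden_state[:, 0]
    return torch.nn.functional.normalize(cls, dim=-1)

def is_uncertain(answer: str) -> bool:
    """Flag an answer as uncertain if any 5-word window is close to a reference."""
    words = answer.split()
    chunks = [" ".join(words[i:i + WINDOW])
              for i in range(max(1, len(words) - WINDOW + 1))]
    sims = embed(chunks) @ embed(UNCERTAIN_REFERENCES).T  # cosine similarities
    return bool((sims.max() >= THRESHOLD).item())

print(is_uncertain("The answer is currently unknown to science."))  # likely True
```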

For model comparison, the authors conducted a series of experiments to assess the degree of self-knowledge exhibited by various LLMs, including the GPT-3 and InstructGPT series as well as the more recent LLaMA and its derivative models, Alpaca and Vicuna. Their evaluation used three different forms of input: direct input, instruction input, and in-context learning (ICL); illustrative templates are sketched below.
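
To make the three input forms concrete, here is a hedged sketch of what the prompt templates could look like. The wording is illustrative only, not the paper's exact prompts.

```python
# Illustrative prompt templates for the three input forms.
def direct_prompt(q: str) -> str:
    """Direct input: just the question."""
    return f"Q: {q}\nA:"

def instruction_prompt(q: str) -> str:
    """Instruction input: prepend guidance on admitting ignorance."""
    return ("When you reply, if you do not know the answer, "
            f"state that the question is unanswerable.\nQ: {q}\nA:")

def icl_prompt(q: str, examples: list[tuple[str, str]]) -> str:
    """ICL input: prepend worked (question, answer) pairs, including
    unanswerable ones answered with an admission of ignorance."""
    shots = "\n".join(f"Q: {eq}\nA: {ea}" for eq, ea in examples)
    return f"{shots}\nQ: {q}\nA:"
```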

The F1 score serves as the measure of an LLM's self-knowledge. Unanswerable questions are classified as positive examples and answerable questions as negative examples; a minimal scoring sketch follows.
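
Under this convention, with unanswerable mapped to label 1, scikit-learn's f1_score reproduces the metric; the labels below are made-up for illustration.

```python
# F1 with "unanswerable" as the positive class: the score rewards
# correctly flagging questions the model cannot answer.
from sklearn.metrics import f1_score

y_true = [1, 1, 0, 0, 1]  # 1 = unanswerable (positive), 0 = answerable
y_pred = [1, 0, 0, 0, 1]  # predictions from the uncertainty detector
print(f1_score(y_true, y_pred, pos_label=1))  # 0.8
```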

3. Conclusion

  • Model size: Larger models achieve higher F1 scores, indicating stronger self-knowledge, as shown in Figure 2 of the paper.
  • Instruction tuning: The InstructGPT models demonstrate better self-knowledge than the GPT-3 models, and instruction tuning notably enhances self-knowledge in the Alpaca and Vicuna models.
  • Input form: Incorporating instructions and examples enhances the self-knowledge of the GPT-3 and InstructGPT series. In particular, the ICL input form, which provides richer contextual information, significantly improves the models' self-knowledge.
  • Comparison with humans: GPT-4 performs well but still falls short of human level, highlighting the room language models have to improve in self-knowledge.
