
OpenAI taught GPT-3 to go online, and an "omniscient and all-powerful" AI model is here

Report from Heart of the Machine

Editor: Chen Ping

It's called WebGPT, and OpenAI believes that letting the model browse the web the way humans do improves the accuracy of its answers.

Once an AI learns to go online, it has virtually unlimited access to knowledge, and what happens after that becomes harder to predict. Now the well-known AI research lab OpenAI has taught GPT-3, the huge model often described as opening the door to general artificial intelligence, to browse the web.


In May 2020, OpenAI released GPT-3, a powerful 175-billion-parameter model trained on a text corpus of roughly 45 TB before filtering. It can not only answer questions, translate, and write articles more convincingly than its predecessors, but also handle some mathematical calculations. Such a powerful deep learning model inevitably gives people the illusion that "real" AI has arrived.

After GPT-3, large language models became an important research trend at technology companies: some combine large models with knowledge graphs, while others simply keep pushing in the direction of "bigger." In December, Google's GLaM pushed the parameter count up to 1.2 trillion.

Language models like GPT-3 are useful for many different tasks, but they tend to "hallucinate" information when performing tasks that require real-world knowledge. They share a notable drawback: a lack of common sense. For example, when asked "How many eyes does my foot have?", GPT-3 will answer "two." This flaw has been dubbed the "Achilles heel of GPT-3" in the industry, and in practice it causes models to perform poorly on tasks that involve logical reasoning and cognition.

To solve this problem, OpenAI taught GPT-3 to use a text-based web browser.

Now, the model correctly handles some tricky questions. For example, when someone asks the deliberately flawed question "When did Shakespeare write the Harry Potter series of novels?",

the model answers: Shakespeare did not write the Harry Potter novels. The novels were written by J.K. Rowling...

It now seems that WebGPT, which can surf the internet, will no longer blindly answer an obviously flawed question like "How many eyes does my foot have?", but will instead correct it for you.

[Screenshot: WebGPT's answer to the Shakespeare / Harry Potter question, with citations and links]

Judging from its content, the answer is completely correct. In addition, the model provides the reader with citations, shown as blue superscript numbers, and lists the corresponding links at the end of the answer; clicking each link takes you to the source web page.

Another example: someone asks whether there are interconnections within the hippocampus. The model's answer feels more professional than a professional's, and here too it provides reference links.

[Screenshot: WebGPT's answer to the hippocampus question, with reference links]

WebGPT also handles more specialized questions, such as: what is a sparse Transformer in machine learning? Researchers outside of AI may not be able to answer this, but the model gives an accurate answer, and it even comes with a formula.

[Screenshot: WebGPT's answer explaining sparse Transformers, including a formula]

Here is the model's search process:

[Screenshot: the model's browsing and search process]

How are these capabilities implemented? Specifically, OpenAI fine-tuned GPT-3 to answer open-ended questions more accurately by using a text-based web browser, which lets the model search and browse the web. The setup replicates the way humans research answers online: submitting search queries, following links, and scrolling up and down web pages. The model is also trained to cite its sources, which makes it easier for humans to give feedback and thereby improves factual accuracy.

In addition, the model is given the open-ended question and a summary of the browser state, and it must issue commands such as "Search ...", "Find in page: ...", or "Quote: ...". In this way, the model collects passages from web pages and then uses those passages to compose its answer.
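To make that setup concrete, here is a minimal sketch of what such a text-based browsing loop could look like. It only follows the commands named above; the function names, the `search_api`/`fetch_page` callables, and all other details are hypothetical illustrations, not OpenAI's actual environment.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class BrowserState:
    query: str                                        # the user's question
    page_text: str = ""                               # text of the currently open page
    quotes: List[str] = field(default_factory=list)   # passages collected for the answer

def step(state: BrowserState,
         command: str,
         search_api: Callable[[str], List[Tuple[str, str]]],   # query -> [(title, url), ...]
         fetch_page: Callable[[str], str]) -> str:             # url -> page text
    """Apply one model-issued command and return a text observation."""
    if command.startswith("Search "):
        results = search_api(command[len("Search "):])
        return "\n".join(f"[{i}] {title} ({url})" for i, (title, url) in enumerate(results))
    if command.startswith("Open "):
        state.page_text = fetch_page(command[len("Open "):])
        return state.page_text[:2000]                          # show the top of the page
    if command.startswith("Find in page: "):
        needle = command[len("Find in page: "):]
        idx = state.page_text.find(needle)
        return state.page_text[max(0, idx - 200): idx + 200] if idx >= 0 else "Not found."
    if command.startswith("Quote: "):
        state.quotes.append(command[len("Quote: "):])          # keep the passage as a citation
        return f"Collected {len(state.quotes)} quoted passages."
    if command.startswith("Answer:"):
        return "End of episode: compose the final answer from the collected quotes."
    return "Unknown command."
```

In a loop, the model would read the observation returned by `step`, emit its next command, and stop once it issues an answer built from the quoted passages.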

By framing the task this way, OpenAI can train models on it with imitation learning and then optimize answer quality using human feedback. OpenAI trains and evaluates the models on ELI5, a set of open-ended questions asked by Reddit users (from the "Explain Like I'm Five" subreddit).


Address of the paper: https://cdn.openai.com/WebGPT.pdf

Model training

How does such an intelligent model come about?

Overall, OpenAI fine-tuned models from the GPT-3 family, focusing on models with 760M, 13B, and 175B parameters. Starting from these models, OpenAI used four main training methods:

Behavior cloning (BC): OpenAI fine-tuned on the demonstrations using supervised learning, with the commands issued by the human demonstrators as labels;

Reward modeling (RM): starting from the BC model with its final unembedding layer removed, OpenAI trained a model that takes a question and an answer with references and outputs a scalar reward; the reward model is trained with a cross-entropy loss;

Reinforcement learning (RL): OpenAI fine-tuned the BC model using PPO, proposed by Schulman et al. For the environment reward, OpenAI takes the reward model's score at the end of each episode and adds a per-token KL penalty toward the BC model to mitigate over-optimization of the reward model;

Best-of-n: OpenAI samples a fixed number of answers (4, 16, or 64) from the BC model or the RL model (the BC model if not otherwise specified) and selects the one the reward model ranks highest (a rough sketch follows below).
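To illustrate the last two methods, here is a rough sketch of best-of-n reranking with a reward model, together with the KL-penalized episode reward described for RL. The function names, interfaces, and the `beta` coefficient are hypothetical stand-ins, not OpenAI's code.

```python
from typing import Callable, List

def best_of_n(question: str,
              sample_answer: Callable[[str], str],    # e.g. the BC or RL policy
              rm_score: Callable[[str, str], float],  # reward model: (question, answer) -> scalar
              n: int = 16) -> str:
    """Sample n candidate answers and return the one the reward model ranks highest."""
    candidates = [sample_answer(question) for _ in range(n)]
    return max(candidates, key=lambda ans: rm_score(question, ans))

def episode_reward(rm_score_at_end: float,
                   logprobs_rl: List[float],     # per-token log-probs under the RL policy
                   logprobs_bc: List[float],     # per-token log-probs under the BC policy
                   beta: float = 0.02) -> float:
    """RL episode reward as described above: the reward model's score at the end of the
    episode minus a per-token KL penalty toward the BC policy (beta is a hypothetical
    coefficient, not a value reported by OpenAI)."""
    kl_penalty = sum(rl - bc for rl, bc in zip(logprobs_rl, logprobs_bc))
    return rm_score_at_end - beta * kl_penalty
```

The KL penalty keeps the RL policy close to the behavior-cloned policy, so the optimizer cannot simply exploit weaknesses in the learned reward model.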

For BC, RM, and RL, OpenAI uses question sets that do not overlap with one another. For BC, OpenAI holds out about 4% of the demonstrations as a validation set. For RM, OpenAI samples answers from models of different sizes (primarily the 175B model) on the comparison questions, trained with different combinations of methods and hyperparameters, and combines the comparisons into a single dataset. The final reward model was trained on approximately 16,000 comparisons, with the remaining 5,500 held out for evaluation. For RL, a mixture of questions is used: 90% from ELI5 and 10% from TriviaQA.
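For reference, the cross-entropy loss on comparisons mentioned above is usually written in the following standard pairwise form; the exact details in the WebGPT paper may differ:

```latex
% Standard pairwise comparison loss for a reward model r_\theta
% (a^{+} is the answer the labeler preferred, a^{-} the rejected one)
\mathcal{L}(\theta) \;=\; -\,\mathbb{E}_{(q,\,a^{+},\,a^{-})}
  \Big[ \log \sigma\big( r_\theta(q, a^{+}) - r_\theta(q, a^{-}) \big) \Big]
```

Here sigma is the sigmoid function; pushing the preferred answer's score above the rejected one's is what lets the reward model output a scalar usable for best-of-n ranking and RL.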

Results

ELI5 results

The models are trained to answer questions from ELI5, and OpenAI trained three different models (760M, 13B, and 175B) corresponding to three different inference-time compute budgets. OpenAI's best-performing model (175B with best-of-64) produces answers that are preferred over those written by human demonstrators 56% of the time. Even though these are the same kind of demonstrations used to train the model, OpenAI was able to outperform them by using human feedback to optimize the model's answers.


On the ELI5 test set, OpenAI's models are compared against human demonstrators.

TruthfulQA results

For questions drawn from the training distribution, the answers from OpenAI's best model are, on average, about as factually accurate as those written by human demonstrators. However, robustness on out-of-distribution questions remains a challenge. To probe this, OpenAI evaluated the models on the TruthfulQA dataset. OpenAI's models outperform GPT-3 on TruthfulQA and exhibit more favorable scaling characteristics. However, they still lag behind human performance, in part because they sometimes cite unreliable sources. OpenAI hopes to reduce these problems with techniques such as adversarial training.


TruthfulQA results.

Evaluating factual accuracy

In order to provide feedback that improves factual accuracy, humans must be able to evaluate the claims in the answers the model generates. This can be extremely challenging, because claims can be technical, subjective, or vague. For this reason, the researchers have the model cite the sources for its answers.

After testing, OpenAI found that WebGPT still cannot recognize many nuances, and it expects such judgment calls to become more important as AI systems improve, requiring interdisciplinary research to develop criteria that are both practically and epistemically sound. Methods such as debate may help alleviate these problems.

Risks of deployment and training

Because it is less likely to generate false statements, WebGPT is clearly better than GPT-3, but it still carries risks. Answers with citations are often perceived as authoritative, which may obscure the fact that the new model still makes basic errors. The model also tends to reinforce users' existing beliefs, and the researchers are exploring how best to address these issues.

Beyond errors and misdirection, training methods that give AI models access to the web introduce new risks. OpenAI says that its browsing environment does not allow full web access; instead, the model sends queries to the Microsoft Bing Web Search API and follows links that already exist on the web, which can still have side effects.

OpenAI says that, based on existing experience with GPT-3, the model does not appear capable enough to dangerously exploit these ways of connecting with the outside world. However, the risk grows as model capabilities increase, and the researchers are working to establish internal safeguards against it.

OpenAI believes that human feedback and tools such as web browsers offer a promising path toward robust and trustworthy general-purpose AI systems. Although today's language models still face many unknowns and challenges, significant progress has been made in this direction.

Reference Links:

https://openai.com/blog/improving-factual-accuracy/
