Author | Echo Tang | Produced by AI Technology Base Camp (ID: rgznai100)
When ChatGPT came out, many people debated whether AI would replace programmers, and where the technology would ultimately lead.
I have discussed this topic with software engineering experts before, and their view was that the end point would be reached when GPT gains the ability to improve itself.
I just didn't expect it to arrive so suddenly.
In the early morning of June 28, Beijing time, shortly after Google officially released Gemma 2, OpenAI launched CriticGPT, a model based on GPT-4 that is designed to help humans evaluate and detect errors in code generated by large language models (LLMs). CriticGPT is trained to produce natural-language feedback that points out problems in code, and when it comes to detecting naturally occurring LLM errors, the critiques it generates are preferred over human-written critiques 63% of the time.
In short, OpenAI has gotten GPT-4 to find bugs in GPT-4's own output, and in many cases it does so better than humans.
In OpenAI's tests, when people used CriticGPT to review ChatGPT's code, they outperformed unassisted reviewers 60% of the time. OpenAI said: "We are integrating CriticGPT-like models into our RLHF labeling pipeline to provide explicit AI assistance to our trainers. This is a step towards being able to evaluate outputs from advanced AI systems that can be difficult for people to rate without better tools."
Why was CriticGPT built?
According to OpenAI, as it has made advances in reasoning and model behavior, ChatGPT has become more accurate and its errors more subtle. This makes it harder for AI trainers to spot inaccuracies when they occur, which in turn makes the comparison task at the heart of RLHF more difficult. This is a fundamental limitation of RLHF: as models gradually become more knowledgeable than anyone who can provide feedback, aligning them becomes harder and harder.
To solve this challenge, OpenAI trained CriticGPT to write criticisms that highlight inaccuracies in ChatGPT's answers.
CriticGPT's suggestions are not always correct, but they help trainers catch many more problems in model-written answers than they would without AI assistance. Moreover, when people use CriticGPT, the AI augments their skills: the resulting critiques are more comprehensive than those people write on their own, and contain fewer hallucinated errors than those the model writes on its own. In OpenAI's experiments, a second random trainer preferred critiques from the Human+CriticGPT team over those from an unassisted person more than 60% of the time.
In short, CriticGPT helps trainers write more comprehensive critiques than they could alone, while hallucinating fewer issues than the model critiquing by itself.
The method behind CriticGPT
CriticGPT was also trained with RLHF, just like ChatGPT. Unlike ChatGPT, however, it sees a large number of inputs that contain errors, which it must then critique. OpenAI asked AI trainers to insert these errors by hand into code written by ChatGPT and then write example feedback as if they had just discovered the bug they themselves had inserted. The same person then compared multiple critiques of the modified code, so they could easily tell whether a critique caught the inserted bug.
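The tamper-then-critique pipeline described above can be sketched in a few lines. This is a minimal illustration, not OpenAI's actual implementation: the data structure, the string-matching stand-in for human judgment, and all names are assumptions for the sake of the example.

```python
from dataclasses import dataclass

@dataclass
class TamperedSample:
    """One training example for the critique model (fields are illustrative)."""
    original_code: str       # code originally written by ChatGPT
    tampered_code: str       # the same code with a human-inserted bug
    inserted_bug: str        # the trainer's description of the inserted bug
    reference_critique: str  # trainer-written feedback "discovering" that bug

def catches_inserted_bug(critique: str, sample: TamperedSample) -> bool:
    # Crude stand-in for the human judgment of whether a critique
    # actually points at the bug the trainer inserted.
    return sample.inserted_bug.lower() in critique.lower()

def rank_critiques(critiques: list[str], sample: TamperedSample) -> list[str]:
    # Rank model critiques so that bug-catching ones come first, mimicking
    # the pairwise preferences used to train the RLHF reward model.
    return sorted(critiques, key=lambda c: not catches_inserted_bug(c, sample))

sample = TamperedSample(
    original_code="def mean(xs): return sum(xs) / len(xs)",
    tampered_code="def mean(xs): return sum(xs) / (len(xs) - 1)",
    inserted_bug="divides by len(xs) - 1 instead of len(xs)",
    reference_critique="The function divides by len(xs) - 1 instead of len(xs).",
)
ranked = rank_critiques(
    ["Looks fine overall.",
     "Bug: it divides by len(xs) - 1 instead of len(xs)."],
    sample,
)
```

Because the trainer inserted the bug themselves, they can grade critiques cheaply and reliably, which is the point of the setup.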
OpenAI evaluated whether CriticGPT could spot both the inserted errors and "naturally occurring" ChatGPT errors that previous trainers had found. For naturally occurring errors, trainers preferred CriticGPT's critiques in 63% of cases, in part because the new model produced fewer "nitpicks" (small, unhelpful complaints) and hallucinated fewer problems.
OpenAI also found that applying additional test-time search against the critique reward model yields longer and more comprehensive critiques. This search procedure makes it possible to tune how aggressively the model hunts for problems in the code, trading precision against recall between hallucinated issues and the number of real errors found, so that the resulting critiques are as helpful to RLHF as possible.
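The precision-recall knob described above can be illustrated with a toy best-of-n selection against a reward score. Everything here is a hypothetical sketch: the scoring function, the penalty term, and the candidate format are assumptions, not OpenAI's search procedure.

```python
def best_of_n_critique(candidates, reward, hallucination_penalty):
    """Pick the critique maximizing reward minus a penalty per flagged issue.

    `candidates` is a list of (critique_text, num_issues_flagged) pairs and
    `reward` is a stand-in for the critique reward model. Raising
    `hallucination_penalty` favors precision (fewer, more certain issues);
    lowering it favors recall (more issues flagged, some possibly spurious).
    """
    return max(
        candidates,
        key=lambda c: reward(c[0]) - hallucination_penalty * c[1],
    )[0]

# Toy reward: longer critiques score higher (a crude proxy for comprehensiveness).
toy_reward = len

candidates = [
    ("Bug: off-by-one in loop bound.", 1),
    ("Bug: off-by-one in loop bound. Also, variable x is unused. "
     "Also, maybe the API could fail? Also, a style nit on naming.", 4),
]

# A low penalty selects the sprawling critique (recall-leaning)...
aggressive = best_of_n_critique(candidates, toy_reward, hallucination_penalty=1.0)
# ...a high penalty selects the terse, confident one (precision-leaning).
conservative = best_of_n_critique(candidates, toy_reward, hallucination_penalty=50.0)
```

The single penalty parameter is what lets one model serve both goals: a critique used to train a reward model can afford more speculative flags than one shown directly to a trainer.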
CriticGPT's limitations
First, there is the question of how well the model generalizes. According to OpenAI's own disclosure, CriticGPT was trained on ChatGPT answers that are relatively short. To supervise future agents, methods will need to be developed that help trainers understand long and complex tasks.
Second, hallucinations remain a problem. The model still sometimes fabricates issues, and trainers sometimes make labeling mistakes after seeing those hallucinations.
In addition, real-world errors are sometimes spread across many parts of an answer, and handling such dispersed errors remains future work.
Finally, the help CriticGPT can offer is limited: if a task or answer is extremely complex, even an expert assisted by the model may be unable to evaluate it correctly.
The 2024 Global Software R&D Conference (SDCon), co-hosted by CSDN and Boolan, will be held on July 4-5 at The Westin Beijing.
Led by Chris Richardson, a world-renowned software architect and pioneer in cloud native and microservices, and Daniel Jackson, associate director of MIT's Computer Science and AI Laboratory (CSAIL) and ACM Fellow, technical experts from Huawei, BAT, Microsoft, ByteDance, JD.com, and other companies will gather to discuss the latest trends and practices in software development.