
OpenAI's GPT-4 can autonomously exploit real-world vulnerabilities by reading security advisories

Author: cnBeta

According to academics, AI agents that combine large language models with automation software can successfully exploit real-world security vulnerabilities by reading security advisories. Four computer scientists at the University of Illinois Urbana-Champaign (UIUC)—Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang—report in a newly published paper that OpenAI's GPT-4 is capable of doing exactly this: given a CVE advisory describing a vulnerability, the large language model (LLM) can autonomously exploit that vulnerability in a real-world system.


To demonstrate this, the researchers collected a dataset of 15 one-day vulnerabilities, including several classified as critical severity in their CVE descriptions.

"When given a CVE description, GPT-4 was able to exploit 87% of these vulnerabilities, while the other models we tested (GPT-3.5, open-source LLMs) and open-source vulnerability scanners (ZAP and Metasploit) had 0% utilization."

A "single-day vulnerability" is a vulnerability that has been disclosed but has not yet been patched. The CVE description by the team refers to CVE tagging advisories shared by NIST—for example, this advisory for CVE-2024-28859.

The models that failed include GPT-3.5, OpenHermes-2.5-Mistral-7B, Llama-2 Chat (70B), LLaMA-2 Chat (13B), LLaMA-2 Chat (7B), Mixtral-8x7B Instruct, Mistral (7B) Instruct v0.2, Nous Hermes-2 Yi 34B, and OpenChat 3.5. Notably absent were GPT-4's two main commercial competitors, Anthropic's Claude 3 and Google's Gemini 1.5 Pro; the UIUC researchers would have liked to test them but were unable to obtain access to these models.

The researchers' work builds on previous findings that LLMs can be used to automate attacks on websites in a sandbox environment.


Daniel Kang, an assistant professor at UIUC, said in an email that GPT-4 "can actually autonomously carry out the steps to perform certain exploits that open-source vulnerability scanners (at the time of writing) cannot."

Kang and his colleagues expect that LLM agents—created in this case by wiring a chatbot model to the ReAct automation framework implemented in LangChain—will make exploitation much easier for everyone. These agents can reportedly follow the links in CVE descriptions to gather more information.
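For readers unfamiliar with that setup, the sketch below shows the general pattern being described: a chat model wired into LangChain's ReAct agent loop with a single read-only tool that fetches a public advisory page. This is a generic illustration only; the researchers did not release their agent or prompts, so the model name, tool, and query here are placeholder assumptions, not their implementation.

    # Minimal sketch of a chat model wired to LangChain's ReAct agent loop.
    # Generic illustration only -- not the UIUC researchers' agent, whose
    # prompts were withheld. The single tool here merely fetches a public page.
    import requests
    from langchain import hub
    from langchain.agents import AgentExecutor, Tool, create_react_agent
    from langchain_openai import ChatOpenAI

    def fetch_page(url: str) -> str:
        """Return the (truncated) text of a public web page, e.g. an NVD advisory."""
        return requests.get(url, timeout=10).text[:4000]

    tools = [
        Tool(
            name="fetch_page",
            func=fetch_page,
            description="Fetch the text of a public URL, such as an NVD advisory page.",
        )
    ]

    llm = ChatOpenAI(model="gpt-4", temperature=0)    # the chatbot model
    prompt = hub.pull("hwchase17/react")              # a standard ReAct prompt template
    agent = create_react_agent(llm, tools, prompt)    # reason-and-act loop
    executor = AgentExecutor(agent=agent, tools=tools, max_iterations=5)

    result = executor.invoke(
        {"input": "Summarize the advisory at https://nvd.nist.gov/vuln/detail/CVE-2024-28859"}
    )
    print(result["output"])

The point of the ReAct pattern is that the model alternates between reasoning steps and tool calls, which is what allows an agent to follow links and act on what it reads rather than simply answering a single prompt.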

In addition, if you extrapolate to the capabilities of GPT-5 and future models, it is likely they will be far more capable than anything script kiddies have access to today.

Denying the LLM agent (GPT-4) access to the relevant CVE description reduced its success rate from 87% to just 7%. Even so, Kang said he does not see restricting the public disclosure of security information as a viable way to defend against LLM agents. He explained: "I personally think security through obscurity is untenable, which seems to be the prevailing wisdom among security researchers. I'm hoping my work, and other work, will encourage proactive security measures, such as regularly updating software packages when security patches are released."

The LLM agent failed to exploit just two of the 15 samples: Iris XSS (CVE-2024-25640) and Hertzbeat RCE (CVE-2023-51653). According to the paper, the former proved problematic because the Iris web app's interface is extremely difficult for the agent to navigate. The latter features a detailed description written in Chinese, which likely confused an LLM agent operating under an English-language prompt.

Of the vulnerabilities tested, 11 emerged after GPT-4's training cutoff, meaning the model had seen no data about them during training. Its success rate on these CVEs was slightly lower, at 82%, or 9 out of 11.

As for the nature of the vulnerabilities, they are all listed in the paper, which tells us: "Our vulnerabilities span website vulnerabilities, container vulnerabilities, and vulnerable Python packages. Over half are categorized as 'high' or 'critical' severity by the CVE description."

Kang and his colleagues calculated the cost of a successful LLM agent attack and arrived at a figure of $8.80 per exploit, which they say is about 2.8 times less than it would cost to hire a human penetration tester for 30 minutes.
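As a back-of-the-envelope check on that comparison (the implied human rate below is derived from the article's own figures rather than quoted from the paper):

    # Rough arithmetic behind the reported cost comparison. The $8.80 figure and
    # the 2.8x ratio come from the article; the "implied" values are simply what
    # those two numbers work out to, not figures quoted by the researchers.
    llm_cost_per_exploit = 8.80                               # USD per successful exploit
    ratio = 2.8                                               # human tester reportedly ~2.8x more expensive
    implied_human_cost_30min = llm_cost_per_exploit * ratio   # about $24.64 for 30 minutes
    implied_hourly_rate = implied_human_cost_30min * 2        # about $49 per hour

    print(f"Implied cost of 30 minutes of human pen testing: ${implied_human_cost_30min:.2f}")
    print(f"Implied hourly rate: ${implied_hourly_rate:.2f}")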

According to Kang, the agent consists of just 91 lines of code and 1,056 prompt tokens. OpenAI, the maker of GPT-4, asked the researchers not to release their prompts publicly, though they say they will provide them upon request.

OpenAI did not immediately respond to a request for comment.
