Using large language models to automate threat intelligence analysis workflows in security operations centers

2024-08-02 10:50:00

Paper shared in this post: Using LLMs to Automate Threat Intelligence Analysis Workflows in Security Operation Centers

Basic Information

Original Authors: PeiYu Tseng, ZihDwo Yeh, Xushu Dai, Peng Liu

Author affiliation: Penn State University, State College, PA, 16801

Keywords: LLMs, agent, threat intelligence analysis

Original link: https://arxiv.org/pdf/2407.13093

Open source code: Not available

Paper Highlights

Introduction: SIEM systems play a key role in the Security Operations Center (SOC), which monitors and analyzes cyber threats. However, current SIEM systems are unable to automate the processing of cyber threat intelligence (CTI) reports written in natural language, resulting in analysts having to spend a lot of time on manual analysis. This paper proposes an AI agent that utilizes large language models (LLMs, such as GPT-4) to automate repetitive tasks in CTI reports. Through a four-step filtering process, the agent generates accurate regular expressions and provides a graph that helps SOC analysts respond to threats faster and more accurately. This innovation significantly reduces analysts' workload and improves the efficiency and responsiveness of the SOC.

Objective: This paper addresses the inability of current SIEM systems to automatically process cyber threat intelligence (CTI) reports written in natural language. It develops an AI agent that uses large language models (LLMs, such as GPT-4) to automate the analysis of CTI reports and reduce analysts' workload. By extracting important information, generating regular expressions, and building a threat intelligence graph, the agent helps security operations centers (SOCs) work more efficiently and respond to cyberattacks faster.

Research Contributions

1. A new AI agent is proposed to automatically extract important information from CTI reports and generate regular expressions (Regex).

2. To ensure the accuracy of the generated Regex, the researchers employ a four-step filtering process to rule out potential false positives and false negatives.

3. The AI agent also provides a diagram that depicts the connections between the different pieces of threat intelligence in the CTI report.

4. This project proposes for the first time an AI agent that does not require any human intervention, leveraging the revolutionary capabilities of LLMs to achieve a high degree of automation of CTI analysis workflows.

Introduction

Cybercrime causes huge economic losses to the world every year, with consumers and businesses in the United States losing more than $12.5 billion in 2023 alone. To combat these threats, organizations are increasingly relying on security operations centers (SOCs), with SIEM systems becoming their core tools. SIEM systems help detect attacks with a real-time correlation engine, but they still rely on analysts for extensive manual analysis when it comes to cyber threat intelligence (CTI) reports written in natural language. This process is not only time-consuming, but also increases the response time to attacks.

While there have been some studies that use machine learning techniques to automatically extract information from security documents, these domain-specific AI models are limited in their ability to deal with diverse and ever-changing attack techniques. Therefore, this paper proposes an AI agent that uses large language models (such as GPT-4) to automate the processing of repetitive tasks in CTI reports, thereby improving the efficiency of SOC and reducing the workload of analysts.

Research Methods

The AI agent proposed in this paper automates the processing of cyber threat intelligence (CTI) reports in eight steps.

First, the CTI report is segmented by paragraph, and large language models (LLMs) are used to extract the indicators of compromise (IOCs) in each paragraph.
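The segmentation step can be sketched as follows. This is a minimal illustration, not the authors' code: `split_paragraphs`, `extract_per_paragraph`, and `placeholder_extractor` are hypothetical names, and the placeholder uses a simple pattern match where the paper's agent would call an LLM. The sample report text is made up.

```python
import re
from typing import Callable, Dict, List


def split_paragraphs(report: str) -> List[str]:
    """Split a report on blank lines and drop empty segments."""
    return [p.strip() for p in re.split(r"\n\s*\n", report) if p.strip()]


def placeholder_extractor(paragraph: str) -> List[str]:
    """Stand-in for an LLM call (assumption): finds IPv4-looking strings
    and 32-character hex strings. A real agent would prompt a model here."""
    pattern = r"\b(?:\d{1,3}\.){3}\d{1,3}\b|\b[a-fA-F0-9]{32}\b"
    return re.findall(pattern, paragraph)


def extract_per_paragraph(
    report: str, extractor: Callable[[str], List[str]]
) -> Dict[int, List[str]]:
    """Run the extractor on each paragraph and key results by paragraph index."""
    return {i: extractor(p) for i, p in enumerate(split_paragraphs(report))}


# Made-up report text for illustration only.
report = """The dropper contacts 203.0.113.7 over HTTPS.

The payload hash is d41d8cd98f00b204e9800998ecf8427e.

No indicators appear in this paragraph."""

result = extract_per_paragraph(report, placeholder_extractor)
print(result)
```

Keeping the paragraph index alongside each result makes it possible to trace an extracted IOC back to the text it came from, which is useful when later steps need surrounding context.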

Second, the LLMs are run multiple times, and a voting mechanism combined with retrieval-augmented filtering is used to sanitize the extraction results. A regular expression (Regex) is then generated through a retrieval-augmented matching mechanism that distinguishes capture groups from non-capture groups.
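A voting mechanism over repeated runs might look like the sketch below. The summary does not state the paper's exact voting rule, so the strict-majority threshold here is an assumption, `majority_vote` is a hypothetical helper, and the candidate sets are made-up data. The retrieval-augmented filtering part is not covered.

```python
from collections import Counter
from typing import Iterable, List, Set


def majority_vote(runs: Iterable[Set[str]], min_fraction: float = 0.5) -> List[str]:
    """Keep candidates that appear in more than min_fraction of the runs.

    The 0.5 default (strict majority) is an assumed threshold.
    """
    runs = list(runs)
    counts = Counter(ioc for run in runs for ioc in run)
    threshold = min_fraction * len(runs)
    return sorted(ioc for ioc, n in counts.items() if n > threshold)


# Made-up outputs of three independent extraction runs on the same paragraph.
runs = [
    {"evil.example.com", "203.0.113.7", "update.exe"},
    {"evil.example.com", "203.0.113.7"},
    {"evil.example.com", "readme.txt"},
]

kept = majority_vote(runs)
print(kept)
```

Candidates that show up in only one of the three runs are dropped, which is one simple way to suppress inconsistent or hallucinated extractions.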

Third, LLMs are used to identify dependencies between IOCs and to classify and verify each dependency.

Finally, a threat intelligence graph is constructed to show the connections between IOCs. Through these steps, the AI agent automatically extracts key information from CTI reports, generates accurate Regex, and provides a relationship graph that helps security operations centers (SOCs) respond quickly and efficiently to cyber threats.
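One way to picture the graph construction is as an adjacency list built from dependency triples. The sketch below is an assumption about the data shape, not the paper's implementation: `build_graph`, the `(source, relation, target)` triple format, the relation labels, and the sample IOCs are all hypothetical. As a stand-in for the verification step, it drops any dependency whose endpoints are not in the validated IOC set.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (source IOC, relation label, target IOC); the labels are illustrative.
Edge = Tuple[str, str, str]


def build_graph(
    iocs: Set[str], edges: Iterable[Edge]
) -> Dict[str, List[Tuple[str, str]]]:
    """Build an adjacency list, keeping only edges between validated IOCs."""
    graph = defaultdict(list)
    for source, relation, target in edges:
        if source in iocs and target in iocs:
            graph[source].append((relation, target))
    return dict(graph)


# Made-up IOCs and dependencies for illustration.
iocs = {"invoice.docm", "loader.exe", "evil.example.com"}
edges = [
    ("invoice.docm", "drops", "loader.exe"),
    ("loader.exe", "connects_to", "evil.example.com"),
    ("loader.exe", "writes", "unknown.dll"),  # target was never validated
]

graph = build_graph(iocs, edges)
for source, neighbors in graph.items():
    for relation, target in neighbors:
        print(f"{source} --{relation}--> {target}")
```

An analyst reading this structure can follow the chain from the initial document to the network endpoint without rereading the report.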

Overview of the AI agent

The researchers' AI agent workflow is divided into two parts. First, the researchers segment the CTI report and use LLMs (such as GPT-4) to extract the indicators of compromise (IOCs) in each segment. Second, the responses are sanitized through multiple LLM runs and retrieval-augmented filtering. Third, the researchers distinguish between capture and non-capture groups in IOC strings and generate regular expressions (Regex) for SIEM rules, which are verified by a Regex tester. The researchers then identify the dependencies between the IOCs, classify them, and validate them. Finally, the researchers construct a graph to show the connections between the IOCs. This process addresses several technical challenges in automating the processing of CTI reports and improves the efficiency and accuracy of the SOC.
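To make the capture versus non-capture distinction concrete, here is a small sketch using Python's `re` module. The IOC, the sample paths, and the `check_regex` helper are hypothetical, and the helper is only a simplified stand-in for the Regex tester mentioned above. The idea shown: the part of the string that varies per victim is matched with a non-capturing group `(?:...)`, while the part that identifies the threat is placed in a capturing group `(...)`.

```python
import re

# Hypothetical IOC: a file dropped under any user's AppData directory.
# The user name varies per victim, so it is matched but not captured;
# the file name is the indicator of interest, so it is captured.
pattern = re.compile(
    r"C:\\Users\\(?:[^\\]+)\\AppData\\Roaming\\(loader\.exe)",
    re.IGNORECASE,
)


def check_regex(compiled, should_match, should_not_match):
    """Return True only if every positive sample matches and no negative one does."""
    ok_pos = all(compiled.search(s) for s in should_match)
    ok_neg = not any(compiled.search(s) for s in should_not_match)
    return ok_pos and ok_neg


# Made-up samples: positives guard against false negatives,
# negatives guard against false positives.
positives = [
    r"C:\Users\alice\AppData\Roaming\loader.exe",
    r"c:\users\bob\appdata\roaming\LOADER.EXE",
]
negatives = [
    r"C:\Users\alice\Documents\loader.exe",
    r"C:\Users\alice\AppData\Roaming\loaderXexe",
]

passed = check_regex(pattern, positives, negatives)
captured = pattern.search(positives[0]).group(1)
print(passed, captured)
```

Escaping the dot in `loader\.exe` matters here: without it, the second negative sample would match and the check would report a false positive.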

Study evaluation

The researchers tested the AI agent on more than 50 cyber threat intelligence (CTI) reports, and the results showed that the agent was effective at identifying and processing a large number of indicators of compromise (IOCs). In the experiment, the LLMs identified more than 2,900 potential IOCs, of which about 2,300 remained as valid IOCs after sanitization, including file names, domain names, hashes, IP addresses, command lines, and registry keys.

The researchers found that the AI agent generated about 2,200 regular expressions (Regex) and successfully constructed a threat intelligence graph. Compared with a manually identified ground truth, the AI agent missed only 3% of IOCs. These results indicate that the AI agent makes CTI report processing more efficient and reduces analysts' workload, helping security operations centers (SOCs) respond more quickly to cyberattacks.
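A miss rate of this kind is the fraction of ground-truth IOCs that the agent failed to extract. The sketch below shows the calculation on made-up toy sets; `miss_rate` is a hypothetical helper, the values are not the paper's data, and the paper's exact matching criteria are not described in this summary.

```python
def miss_rate(ground_truth: set, extracted: set) -> float:
    """Fraction of ground-truth IOCs that are absent from the extracted set."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    missed = ground_truth - extracted
    return len(missed) / len(ground_truth)


# Toy data for illustration only.
ground_truth = {"evil.example.com", "203.0.113.7", "loader.exe", "HKCU\\Software\\Run"}
extracted = {"evil.example.com", "203.0.113.7", "loader.exe", "readme.txt"}

rate = miss_rate(ground_truth, extracted)
print(f"miss rate: {rate:.0%}")
```

Note that an extra extracted item such as `readme.txt` does not affect the miss rate; it would count as a false positive, which is measured separately.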

Conclusion of the paper

This paper proposes an innovative AI agent designed to automate the processing of repetitive tasks in cyber threat intelligence (CTI) reports. By harnessing the power of large language models (LLMs, such as GPT-4), the agent is able to accurately extract important information from CTI reports, generate regular expressions (Regex), and build threat intelligence graphs. This not only reduces the workload of security analysts, but also significantly improves the efficiency and responsiveness of the security operations center (SOC).

Experimental results show that the AI agent is efficient and accurate in identifying and processing indicators of compromise (IOCs). Overall, the research in this paper provides an effective solution for automating CTI analysis workflows and has broad application prospects.

Original author: Paper Interpretation Agent

Proofreading: Little Coconut Wind