天天看點

kestrel 威脅狩獵——通過流量和程序發現異常

Practicing Backward And Forward Tracking Hunts on A Windows Host

Xiaokui Shu and Ian Molloy  ·  August 16, 2021  ·  15 min read

In our previous blog post, we showed how to get started with the Kestrel Threat Hunting Language, such as connecting to data sources and performing your first hunts using the 

GET

 and 

FIND

 commands. In this post, we’ll introduce the 

APPLY

 keyword, which adds powerful analytics and enrichment capabilities to hunts.

We will show a Kestrel hunt performing backward and forward tracking on a Windows host to unearth the root cause and impact of activities related to an IP address. We will walk the process tree and extend analysis to network traffic and IP addresses, and we will use both pattern matching and analytic hunting steps to build the huntflow.

Table of Content

  1. Hunting Environment
  2. Start From An IP
  3. From IP to Process
  4. Backward Tracking
  5. Forward Tracking
  6. Applying An Analytics

Kestrel Installation And Monitoring System Setup

If you’re unfamiliar with setting up Kestrel for your environment, start by reviewing our previous blog Building A Huntbook to Discovery Persistent Threats from Scheduled Windows Tasks. There you will find the steps to monitor a Windows host with Sysmon, stream the log data to Elasticsearch via winlogbeat, install Kestrel in a Python virtual environment, and install and configure the STIX-shifter elastic_ecs connector to access the log data.

Entity-based Reasoning From An IP Address

Sysmon traces Windows activities and stores individual events as records in Elasticsearch. Reasoning on records directly requires excessive knowledge of record semantics, which gets worse when querying multiple data sources with different record formats. The Kestrel runtime identifies entities in records and automatically collects all available information about entities in related records before presenting the complete view to the user. Kestrel enables us to hunt through entities, e.g., establishing an IP address entity in a Kestrel variable from hundreds of records, each of which describes one aspect or appearance of the entity, finding all process entities connected to that IP address, and tracing other related entities such as network-traffic, file, or registry-key. This entity-based hunting approach makes it easy for a human to organize hunts, pivot from one hunt step to another, and enables composable huntflow development.

In this tutorial, we will start from the IP address 

72.21.81.200

 and walk the associated process tree to reason about whether the IP address is suspicious.

First, let’s create a Kestrel variable starting from the IP address as an entity:

ip200 = GET ipv4-addr FROM stixshifter://host101
        WHERE [ipv4-addr:value = '72.21.81.200']
        START t'2021-04-01T00:00:00Z' STOP t'2021-04-06T00:00:00Z'
           

We specify the type of entity 

ipv4-addr

 we would like to get from the monitored Windows host 

host101

 and describe the matching criteria using the STIX Pattern 

[ipv4-addr:value = '72.21.81.200']

. We limit the time of the search for the first 5 days in April to limit our search. The time range is required in our first 

GET

 command in a huntflow to set the scope. Otherwise, STIX-shifter defaults to the past five minutes.

If you have setup multiple data sources besides 

host101

, you can use auto-complete in Kestrel to list all available data sources by pressing 

TAB

 after 

stixshifter://

.

kestrel 威脅狩獵——通過流量和程式發現異常

After putting the command in a Jupyter cell and executing it, we get a summary for the code block—one IP address entity and 350 records/logs, each of which has some information about the IP. We don’t bother with the number of records since we are doing entity-based reasoning and Kestrel will assemble the entity from the 350 records for us.

Obtaining Associated Network-Traffic And Processes

Finding hosts connecting to the IP address is good, and pinning down the processes on the hosts that communicate with the IP is even better. To achieve this, we need to connect to an endpoint monitoring system such as Sysmon mentioned in our previous blog post. From an endpoint’s view, a process creates a network-traffic that reaches a remote host or an IP address entity. We can use the FIND command to navigate through connected entities; the Kestrel runtime will generate/execute corresponding data source queries and assemble entities from returned records.

According to the relation chart in the FIND command syntax, we use 

ACCEPTED BY

 relation to describe the IP addresses in 

ip200

 are the destination IPs of the network-traffic returned. We then use 

CREATED

 relation to describe the network-traffic is associated with the process to be returned. The 

DISP

 command will print select attributes of the entities in a Kestrel variable without side effects. Now let’s put all four commands together into a code block and execute it in a Jupyter Notebook cell:

# obtain network traffic that has ip200 as the destination IP
ip200nt = FIND network-traffic ACCEPTED BY ip200
DISP ip200nt ATTR src_ref.value, src_port, dst_ref.value, dst_port

# obtain processes creating the network traffic
p = FIND process CREATED ip200nt
DISP p ATTR pid, name, command_line
           

We get back 350 network-traffic entities in 

ip200nt

 and 34 process entities in 

p

:

kestrel 威脅狩獵——通過流量和程式發現異常

Here is a partial list of the network-traffic entities in 

ip200nt

 out of the 350 entities:

kestrel 威脅狩獵——通過流量和程式發現異常

Are all network-traffic HTTPS (port 443)? If we only display the destination port attribute of entities in 

ip200nt

DISP

 will deduplicate the results before output, so it is easy to find out the answer using another 

DISP

 command in a new cell:

DISP ip200nt ATTR dst_ref.value, dst_port
           

After executing the cell, it is clear we guess it right: all network-traffic to the IP 

72.21.81.200

 are pointing to its port 443:

kestrel 威脅狩獵——通過流量和程式發現異常

Nothing is explicitly suspicious about 

ip200nt

, and let’s move to the results we show about 

p

 in our previous executed block: the 

DISP p

 command shows an abridged list of the 34 associated processes:

kestrel 威脅狩獵——通過流量和程式發現異常

We know all 34 processes in the Kestrel variable 

p

 connected to 

72.21.81.200:443

. Next, we pick up the first process 

BackgroundDownload.exe

 to start walking the process tree to further our understanding of the activities on the Windows host.

Walking Up The Process Tree of BackgroundDownload.exe

Backward tracking is a hunting strategy to walk back the control-flow or data-flow of entities and understand their origin or provenance. The most common task for process entities is to backtrack their control-flow and walk up the process tree to check whether given processes are created by a malicious or potentially compromised process.

Let’s take a close look at a subset of processes with name 

BackgroundDownload.exe

 in 

p

. From their command line we can guess they are benign and belong to Microsoft Visual Studio. Let’s check their parent process to verify this.

# get a subset of entities from variable `p`
bgdownloads = GET process FROM p WHERE [process:name = 'BackgroundDownload.exe']

# obtain parent processes of bgdownloads
bgdp = FIND process CREATED bgdownloads
DISP bgdp ATTR pid, name, command_line
           
kestrel 威脅狩獵——通過流量和程式發現異常

We get four 

BackgroundDownload.exe

 processes from 

p

 and three processes as their parent processes. From the parent process names and executable paths, we are more confident 

BackgroundDownload.exe

 processes are part of Microsoft Visual Studio and spawned by processes from the suite.

Next, let’s see if we can trace back one level to find the grandparent processes:

# grandparent processes of `BackgroundDownload.exe`
bgdpp = FIND process CREATED bgdp
           
kestrel 威脅狩獵——通過流量和程式發現異常

Good, we see two processes, and Kestrel also gets back some records with network activities when trying to get the most complete information of the entities—there are 146 network-traffic records related to the two processes. However, this is only a summary of the command execution, and we are not sure whether the network-traffic entities are directly or indirectly linked to the grandparent processes. Let’s print out details of the grandparent processes and the network traffic:

DISP bgdpp ATTR pid, name, command_line

bgdppnt = FIND network-traffic CREATED BY bgdpp
DISP bgdppnt ATTR src_ref.value, src_port, dst_ref.value, dst_port
           
kestrel 威脅狩獵——通過流量和程式發現異常

Zero network activities: the 146 related 

network-traffic

 records cached could be indirectly associated with the process in 

bgdpp

 such as their parent or child processes.

It is easy to guess the 

devenv.exe

bgdp

 is spawned from 

explorer.exe

bgdpp

 (likely a double click by a human user); 

svchost.exe

services.exe

. We can choose the former to verify:

bgdp_devenv = GET process FROM bgdp WHERE [process:name = 'devenv.exe']
bgdp_devenv_parent = FIND process CREATED bgdp_devenv
DISP bgdp_devenv_parent ATTR pid, name, command_line
           
kestrel 威脅狩獵——通過流量和程式發現異常

Bingo. And we can walk the process chain up of 

bgdp_devenv_parent

# Let's go further up to pull out parent process of `bgdp_devenv_parent`.
ppp = FIND process CREATED bgdp_devenv_parent
DISP ppp ATTR pid, name, command_line
           
kestrel 威脅狩獵——通過流量和程式發現異常

The 

explorer.exe

 is spawned by 

svchost.exe -k DcomLaunch -p

, which is the Windows DCOM Server Process Launcher and this behavior is expected. The DCOM launcher is the great grandparent of 

BackgroundDownload.exe

, and it is one of the core Windows system services. We could stop here, but we can also try to trace back further to see what is the limit of the Sysmon monitor regarding its visibility into the very early phases of system bootup—of course, Sysmon can only see things after it is started/spawned itself by a Windows service process.

# Let's see how far we can go to the origin of the process tree in sysmon
pppp = FIND process CREATED ppp
DISP pppp ATTR pid, name, command_line
           
kestrel 威脅狩獵——通過流量和程式發現異常

OK. We just hit the limit of Sysmon, which does not log the birth of 

ppp

Forward Tracking From iexplore.exe

After backward tracking and finding some interesting branches in the process tree, we can perform forward tracking, or walk down the tree, to check other activities from the entities. This is a common hunting strategy to understand the impacts of an entity.

We start from processes that talk to IP address 

72.21.81.200

, and we already have all such processes in variable 

p

. Let’s list all process names in 

p

DISP p ATTR name
           
kestrel 威脅狩獵——通過流量和程式發現異常

In the last section, we find the parent process of 

BackgroundDownload.exe

. We could find the siblings of 

BackgroundDownload.exe

 by walking down the process tree from 

bgdp

 (the parent processes of 

BackgroundDownload.exe

). We can also go beyond the process tree and forward track files, network-traffic, registry-keys and even further to other processes via files (data-flow analysis).

Let’s try a simple task: start from the 

iexplore.exe

 processes in 

p

 to (i) walk down their process tree if they fork processes, and (ii) go beyond the process tree to network activities at leaf processes.

# first walk down the tree from the iexplore processes in `p`
ie = GET process FROM p WHERE [process:name = 'iexplore.exe']
DISP ie ATTR pid, name, command_line

ie_children = FIND process CREATED BY ie
DISP ie_children ATTR pid, name, command_line
           
kestrel 威脅狩獵——通過流量和程式發現異常

As shown in the execution summary, the Kestrel variable 

ie

 contains two process entities with pids 7356 and 11368. There is only one process in 

ie_children

 with pid 8508. Let’s check its network-traffic as we discussed:

# second let's go beyond the process tree for network activities of the child process
ient = FIND network-traffic CREATED BY ie_children
DISP ient ATTR dst_ref.value, dst_port
           
kestrel 威脅狩獵——通過流量和程式發現異常

Entity Enrichment With Kestrel Analytics

We find 76 network traffic entities from the IE process with pid 8508 shown above, all of which look like web connections. However, could a malicious C&C sever hide in the list? Usually attackers do not use IP addresses directly for C&C, but domain names created by domain generation algorithms (DGA). If we can enrich the IP addresses with their domain names, we may discover something malicious.

Enriching entities is another type of hunting steps besides pattern matching, and Kestrel supports such hunting steps as analytics. A Kestrel analytics is given all records of a list of entities, runs pre-programmed logic on the entities, checks with external threat intelligence, or matches entities with a pre-trained machine learning model. Finally, the analytics generates new attributes for the entities and gives them back to the Kestrel runtime to complete enrichment.

The analytic we use here is domainnamelookup in the Kestrel analytics repository. It is a Kestrel analytic executed via the docker interface. To use the analytics, first clone the repo and go to the domainnamelookup analytics directory. Then do docker build:

$ docker build -t kestrel-analytics-enrichdomain .
           

The analytic is now available as enrichdomain in Kestrel. Kestrel calls analytics using the 

APPLY

 command (more details in documentation). After typing 

APPLY docker://

, we can press 

TAB

 to list all available analytics. If enrichdomain does not show up, restart the Jupyter kernel to re-initialize the Kestrel kernel and its analytics interface manager.

# next let's apply the analytics to all entities in `ient`.
APPLY docker://enrichdomain ON ient

# print out all attributes including the ones added/enriched by `enrichdomain`
INFO ient
           
kestrel 威脅狩獵——通過流量和程式發現異常

Looking for attributes about domain names, we find two attributes newly added by the analytic: x_domain_name and x_domain_organization. We can now 

DISP

DISP ient ATTR dst_ref.value, dst_port, x_domain_name, x_domain_organization
           

No domain names here appear to be generated by naive DGA (examples from the article Domain Generation Algorithms – Why so effective?). Not bad to rule out a threat.

Stretch Hunts

We hope you enjoyed this tutorial on how to use the Kestrel Threat Hunting Language to extend your searches and pivot between entity types to perform provenance tracking and impact analysis. More can be done by backward and forward tracking control-flow (through the process tree) and data-flow (through files) and applying other analytics in the hunts, and we hope to bring more powerful capabilities in future releases.

  1. Kestrel 1.0 has just process–executable relation for files, which is not very powerful for data-flow tracking. In the future, Kestrel will support STIX 2.1 with SROs including universal file type support for more powerful data-flow tracking.
  2. In the Kestrel analytics repository, there is an example analytic SANS IP Enrichment that enriches network traffic entities with IOC information provided by SANS.
  3. It is easy to build your own analytics, especially the ones run as docker containers. Check out the analytics template to start, and watch out for a future blog post to guide you in detail!

Until next time, happy threat hunting!

繼續閱讀