Latte: Large-Scale Lateral Movement Detection
Abstract—The frequency of recent headlines indicates that attacks on governmental and corporate computer networks are increasing. Once they infect one computer, the attackers are quite likely to explore the network by accessing additional computers. Such “lateral movement”, i.e., the process attackers use to move from one computer to the next in a compromised network, increases the difficulty of preventing data exfiltration. To deal with the challenges of large-scale data and little knowledge of the attackers, we propose Latte, a graph-based detection system to discover potential malicious lateral movement paths. We model computers and accounts as nodes, and computer-to-computer connections or user logon events as edges. We address the lateral movement problem in two ways. Starting with an infected computer or account, forensic analysis quickly identifies other compromised computers. To discover a new attack, general detection identifies unknown lateral movement across nodes which are not known to be compromised. A key component for general detection is a remote file execution detector which filters out the majority of the rare paths in the network. We provide separate algorithms for these subproblems and validate their effectiveness and efficiency on two large-scale datasets, including one with a confirmed attack and one from a penetration test.

Index Terms—Lateral Movement, Advanced Persistent Threat

I. INTRODUCTION

Attackers are successfully penetrating governmental and corporate computer networks with the intent of exfiltrating sensitive data at an alarming rate. Attacks that are directed towards organizations often begin with a spearphishing campaign aimed at targeted individuals which attempts to coerce the potential victims into installing malware on their computers by opening an attachment or clicking a malicious link. Once executed, the malware typically drops a backdoor providing the attacker with remote control over the infected computer. Next, the attacker begins to use lateral movement techniques to explore the network, i.e., the process an attacker uses to move from one computer to the next in a compromised network [1].

Lateral movement can be viewed as a set of malicious paths corresponding to the attacker’s activity within a large graph of benign connections created by the daily operation of the organization’s users. Prior studies have shown that graph-based algorithms are promising for monitoring the network to discover vulnerabilities [2], [3] or detect anomalies [4], [5], [8]. However, they often make strong assumptions, either about the network, e.g., each computer has a certain defined vulnerability state [1], or about the targeted scenario, e.g., anomalous behaviors will result in unusual substructural patterns [5]. Also, most previous studies employ datasets that are either small or contain simulated attacks.

The two most important problems that organizations face related to lateral movement are forensic analysis and general detection.

Forensic Analysis Problem. Forensic analysis is needed when an organization determines that its network includes a compromised account or computer which is exhibiting confirmed evidence of a severe threat. In this case, the infected organization must quickly mobilize to identify and disable all compromised accounts and computers. For the forensic analysis problem depicted in Figure 1, we consider both inbound and outbound paths including the compromised computer. For inbound path analysis, we search for the malicious path or paths that lead to the infected computer. Similarly, for outbound paths, we search for the malicious path or paths originating from the known infected computer.

General Detection Problem. While the forensic analysis problem deals with the case where an infected computer is already known, general detection involves identifying infected computers or compromised user accounts and the malicious lateral movement paths in the network connection graph without any prior confirmed detections. Without prior detections, analysts spend much of their time hunting for evidence of possible intrusion including lateral movement. In other words, the task is to first discover the initial infected computer or compromised user account.

To address both the forensic analysis and general detection problems, we propose Latte, a new graph-based lateral movement detection system to discover malicious lateral movement paths. Latte analyzes large-scale event logs collected from operational networks. In our system, we model computers and accounts as nodes, and computer-to-computer connections or user logon events as directed edges.

The system first uses Kerberos [6] service ticket request events to construct the network connection graph. Kerberos is a computer network authentication protocol which allows nodes communicating over a non-secure network to prove their identity to one another in a secure manner. For the forensic analysis scenario, Latte uses this connection graph in combination with a list of confirmed malicious nodes to identify possible malicious lateral movement. For general detection, which is illustrated in Figure 2, Latte first correlates a number of system
and security events to indicate possible remote file execution. It then combines these possible remote file execution detections with rare, anomalous paths in the network connection graph to significantly filter out benign lateral movement activity. To aid in the detection of malicious lateral movement, the figure indicates that one suspicious computer in the network has generated an alert indicating a potential remote file execution (RFE), a key component of lateral movement. When operating in isolation on a large graph, both the RFE and the rare path detectors can lead to many false positives. However, by combining these separate detection event streams, Latte can significantly reduce the number of false positives related to detecting the initial compromised node. The lateral movement path is then recovered from the rare paths which include the computer involved with the remote file execution detection. While monitoring the entire network, Latte generates alerts when there is suspicious lateral movement detected by our algorithm. In some conditions, Latte could be used to automatically disable an account or a computer’s network access if the detection confidence is high enough.

[Figure 1. Legend: Uncompromised Nodes; Undiscovered Compromised Nodes; Known Compromised Node; Legitimate Access; Malicious Access. Annotations: Inbound Paths, Outbound Paths. Horizontal axis: Time.]
Fig. 1. Forensic Analysis. A known compromised node (denoted as a red star) is discovered and the task is to identify the unknown compromised computers and user accounts (blue nodes) along the malicious access paths (red dashed lines). Searching for potentially compromised nodes which connect to the known compromised node considers inbound paths, while discovering downstream nodes which are connected to by the known compromised node studies outbound paths.

[Figure 2. Legend: Uncompromised Nodes; Undiscovered Compromised Nodes; Legitimate Access; Malicious Access; Remote File Execution Detection. Horizontal axis: Time.]
Fig. 2. General Detection. The figure represents all the user accounts and computers in a large-scale network without any prior confirmed intrusions. The figure indicates that one computer in the graph is suspicious due to the detection of a possible remote file execution. The general detection task is to identify the malicious lateral movement path involving the blue nodes.

Unlike an earlier graph-based system designed to detect potential lateral movement [7], an important design goal for Latte is not to require any kernel mode components to be installed on each host computer. This allows our system to be more easily deployed in large-scale networks without the increased attack surface, instability, and maintenance introduced by a kernel mode component. To achieve this goal, all of Latte’s inputs are system and security events generated by the production operating system. The previous system proposed in [7] requires potential malicious lateral movement subgraphs to be detected in a kernel mode agent. These subgraphs are then aggregated by a central system. As noted by the authors in [7] and confirmed by our experiments in Section IV, the number of potential graphs increases exponentially with the path length. Latte instead runs in a reasonable amount of time because the most time-consuming graph processing blocks are implemented on a large-scale MapReduce platform.

We evaluate Latte on two datasets. Both datasets include one instance of lateral movement. The first dataset contains events from an actual attack and was provided to us by an anonymized organization. In some cases, the Microsoft Threat Intelligence Center (MSTIC) conducts joint operations with customers to better understand targeted attacks. The second dataset contains data from an internal penetration test on Microsoft’s computer network. Both datasets contain at least 90 days of Kerberos service ticket request events indicating user-computer logons and computer-to-computer connections. Moreover, the second dataset also contains additional Windows system and security events which we use to detect possible remote file execution activity.

We validate the effectiveness and efficiency of our system on these datasets separately. For forensic analysis, our method can promote the malicious lateral movement paths as the most suspicious paths in most cases. For general detection, Latte is able to initially discover nodes that were infected, and the corresponding lateral movement paths, during the penetration test. Key contributions of our work include the following:

• We utilize graph concepts for the problem of tracking malicious lateral movements across computers and accounts.
• We provide an algorithm to aid analysts in the forensic analysis scenario to help identify lateral movement paths into and out of a newly discovered compromised computer or account.
• We introduce a general detection algorithm for identifying previously unknown malicious lateral movement on a network by a combination of a remote file execution detector and a rare path anomaly detection algorithm.
• We validate our algorithms on two large-scale datasets with known lateral movement collected on operational networks.

II. SYSTEM DESIGN

We now describe Latte, our lateral movement detection system which is illustrated in Figure 3. System and security events are used to construct the network connection graph and detect potential remote file executions. Latte does not process the events stored on the local machine. To reduce the likelihood of log tampering on the host by the attackers, these events are instead forwarded to Windows Event Forwarding
(WEF) servers and stored in separate logs in a MapReduce file system. The system and security events are analyzed to identify potential Remote File Executions (RFEs) which can be a key component of lateral movement. A detailed discussion of the Remote File Execution Detector is presented later in Section III. From the network connection graph, we propose a path-rate score which provides a measure of how rare (i.e., anomalous) a path is in a computer network. Given a list of one or more compromised computers or accounts, this path-rate score is used to assist analysts in discovering additional compromised nodes in the Forensic Analysis Module. In the absence of any known detections, the General Detection Module combines the output of the Remote File Execution Detector and the path-rate score to discover new compromised nodes. Ranking the outputs from the Forensic Analysis Module or the General Detection Module can be used to aid analysts in the discovery of unknown compromised nodes. In addition, the results of the General Detection Module can lead to automatic disabling of compromised user accounts or computers. Latte is fully implemented on a MapReduce platform to efficiently process the large-scale system and security event inputs. We next discuss the details of several of the high-level blocks in the figure in the remainder of this section.

[Figure 3. Blocks: Windows System and Security Events (106, 129, 4624, 4688, 4769, 5140, 7045); Windows Event Forwarding Servers; Remote File Execution Detector; Network Graph Construction; Path-Rate Score; General Detection Module; Forensic Analysis Module; Compromised Computer and Account List. Outputs: Ranking, Discovery, Automatic Account Disabling.]
Fig. 3. Latte system overview. Each box with a solid outline represents a high-level block, while the two boxes with a blue background color represent our two detection modules. The box with the dashed boundary is a required input feature for the forensic analysis module. The thick black arrows correspond to system outputs.

Network Graph Construction. The network connection graph is $G = (V, E)$ where $V$ denotes the set of accounts or computers $(v_1, v_2, \ldots)$, called nodes. $E$ is the set of connection relationships $(e_1, e_2, \ldots)$ between pairs of nodes, called edges, constructed from the Kerberos Service Ticket Request events (e.g., 4769 Windows security events). Each Kerberos record includes a source node, a destination node, and a timestamp indicating when this connection occurred. Since there is an order between the connection of a pair of nodes, we employ directed edges in the model. We then model lateral movement as a path which is a sequence of edges connecting a set of nodes, e.g., $v_1, e_1, v_2, e_2, \ldots, e_K, v_{K+1}$. Since the number of edges on this path is $K$, this path is called a $K$-hop path.

The main purpose of Latte is to identify malicious lateral movement subgraphs in the overall network connection graph. Lateral movement paths can take several forms including linear paths, directed acyclic graphs (DAGs), or cyclic graphs. To facilitate accurate and efficient processing, we first search for malicious subpaths (i.e., rare $K$-hop paths), and then construct the overall malicious graph by joining these subpaths together based on an exact match of the source node, destination node, and timestamp. It should be noted that since the timestamps are required to reconstruct the malicious lateral movement graph, we cannot aggregate multiple paths to reduce the scale of the collected data.

Thus, one input parameter to the system is $K$, the desired subpath length. From experiments in Section V, we show that the number of $K$-hop paths in the network increases exponentially as $K$ increases. From our experience, longer subpaths with length $K \geq 3$ miss shorter lateral movement activity and make it more difficult to identify malicious lateral movement paths. As a result, we set $K = 2$. Latte also generates all results for 1-hop paths in case the attacker fails to make a second movement to a third node.

Some servers located on the network contribute an extremely large number of connections. In our data, the computers with such a high indegree (inbound) or outdegree (outbound) are domain controllers (DCs), AppLocker servers and WEF servers. To prune the search space further in the general detection scenario, we introduce an optional filtering mechanism, Node Filtering, to remove nodes with an inbound or outbound degree which is greater than or equal to the threshold $F$. Node Filtering is not required in the forensic analysis case because the graph is already filtered based on the known compromised node. It can be disabled in the general detection scenario, but Latte then requires more time to generate and process the connection graph. It should be noted that by including Node Filtering, Latte will fail to detect any rare connections to or from the high degree node. In our experiments, setting $F = 10000$ is a good compromise value between reducing the size of the graph and the false negative rate.

Path-Rate Score. To facilitate the discovery of rare paths in the network connection graph, Latte computes a path-rate score which represents the probability of occurrence of the entire path over some period of time. We first assign a weight to each directed edge, $w(v, v') = \frac{C(v, v')}{X}$, which represents the probability of a daily connection from $v$ to $v'$ over an $X$-day history of data, where $C(v, v')$ denotes the number of days $v$ connects to $v'$ during a period of $X$ days. Ideally, $X$ will be as large as possible to search for activity in the distant past. In practice, $X$ is chosen based on the costs associated with storing the connection logs and the computation time required to construct and process the graph.

The path-rate score, $p$, which represents the probability of a $K$-hop path, is then calculated by the multiplication of the weights of all of its constituent edges:

$$p = w(v_1, e_1, v_2, e_2, \ldots, e_K, v_{K+1}) = \prod_{i=1}^{K} w(v_i, v_{i+1}). \tag{1}$$

For mathematical simplicity, we compute the path-rate score in (1) assuming the edges are independent. An idea which is similar to the path-rate score was previously proposed in [8].

Since paths are intended to model lateral movement across nodes within a network, we add a time constraint to filter out impossible paths. Specifically, we require edges to be created in sequence with $\mathrm{time}(e_i) < \mathrm{time}(e_{i+1})$ for $i = 1, 2, \ldots, K-1$, where $\mathrm{time}(e_i)$ denotes the timestamp when edge $e_i$ is created. Given enough computational and storage resources, no additional constraints are required. However, even after eliminating impossible paths, the graph can still be extremely large.
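The path-rate computation can be illustrated with a small single-machine sketch. It is not the Cosmos implementation used by Latte; the function names, the tuple layout `(source, destination, timestamp)`, and the sample host names are assumptions made for illustration. It computes the edge weights $w(v, v') = C(v, v')/X$, enumerates 2-hop paths that satisfy the ordering constraint $\mathrm{time}(e_1) < \mathrm{time}(e_2)$, and scores each path with the product in (1).

```python
from collections import defaultdict
from datetime import datetime


def edge_weights(events, num_days):
    """Return w(v, v') = C(v, v') / X for every observed directed edge.

    events   -- iterable of (source, destination, timestamp) tuples (assumed layout)
    num_days -- X, the length of the history window in days
    """
    active_days = defaultdict(set)
    for src, dst, ts in events:
        # C(v, v') counts distinct days with at least one connection.
        active_days[(src, dst)].add(ts.date())
    return {edge: len(days) / num_days for edge, days in active_days.items()}


def two_hop_paths(events, weights):
    """Return (path_rate, (v1, v2, v3), t1, t2) for each time-ordered 2-hop path."""
    outgoing = defaultdict(list)
    for src, dst, ts in events:
        outgoing[src].append((dst, ts))

    results = []
    for v1, v2, t1 in events:
        for v3, t2 in outgoing.get(v2, []):
            if t1 < t2:  # ordering constraint: the first hop must precede the second
                rate = weights[(v1, v2)] * weights[(v2, v3)]  # product of edge weights, Eq. (1)
                results.append((rate, (v1, v2, v3), t1, t2))
    return sorted(results, key=lambda item: item[0])  # rarest paths first


# Hypothetical sample data: two days of connections, scored over an X = 90 day window.
sample_events = [
    ("ws1", "srv1", datetime(2024, 1, 1, 9, 0)),
    ("srv1", "db1", datetime(2024, 1, 1, 10, 0)),
    ("ws1", "srv1", datetime(2024, 1, 2, 9, 0)),
]
sample_weights = edge_weights(sample_events, num_days=90)
sample_paths = two_hop_paths(sample_events, sample_weights)
```

In this toy input, only the Day 1 connection from `ws1` to `srv1` can be followed by the `srv1` to `db1` connection; the Day 2 connection occurs after it and is rejected by the ordering constraint.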
To further prune the graph, Latte includes an optional time constraint for each pair of edges. This constraint is motivated by discussions with analysts who believe that, to avoid detection, attackers do not typically remain active on the network for an extended period of time. Formally, this constraint requires that each pair of edges in a path must be created within a certain period of time, $|\mathrm{time}(e_i) - \mathrm{time}(e_{i-1})| < T$, where $|\cdot|$ represents the absolute value, and $T$ is a user-defined input threshold representing the maximum amount of time the attacker uses to make a pair of consecutive lateral movement hops on the network. Since $T$ only constrains a pair of edges, this model is able to capture much longer lateral movement activity within a single session. For example, consider 2-hop paths $e_1 \to e_2$ and $e_2 \to e_3$. If $e_1 \to e_2$ is determined to be malicious, $e_2 \to e_3$ can quickly be evaluated based on an exact match of $e_2$.

Varying $T$ is a tradeoff between potential false positives, false negatives, and the number of results returned by Latte. When $T$ is small, the filtered subgraph is small and will only detect quick lateral movements. However, Latte will miss any lateral movement (i.e., a false negative) if the attacker introduces large delays between hops. On the other hand, using large values of $T$ results in a large number of potential lateral movement paths which can lead to increased false positives.

Forensic Analysis Module. The forensic analysis scenario assumes we have a list of compromised nodes as an additional input feature, which contains at least one known compromised computer or user account. The goal in forensic analysis is to quickly identify all malicious lateral movement paths into and out of each compromised node. To do so, we set this infected node as either the starting or ending node in the 2-hop path, and apply the path-rate score to generate all possible outbound or inbound paths. Because we only consider a single malicious node, the majority of the rare 2-hop paths are filtered out, revealing an extremely limited number of suspicious paths to investigate manually. Once we confirm a malicious lateral movement subgraph, we add the two newly discovered nodes to the compromised node list. When there are multiple infected nodes in the list, Latte repeats the process for each additional node. In some cases, these individually confirmed subpaths can be combined to reveal the entire malicious lateral movement graph.

General Detection Module. For the general detection problem, the task is to initially detect malicious lateral movement without the knowledge of any previously discovered compromised nodes. Initially, we tried to address the general detection problem by simply ranking the path-rate score. However, this approach generates too many rare, but legitimate, connection paths to discover malicious activity.

Instead, we first analyze the event logs for signs of remote file execution, a key component of lateral movement. While the Remote File Execution Detector generates far fewer false positives, it still requires significant manual analysis. Furthermore, identifying remote file execution alone does not identify the potential lateral movement paths. To overcome these issues, the General Detection Module searches for lateral movement using the path-rate score in combination with possible remote file execution detections on a network. The core idea is that, by combining these remote file execution detections with the rare connection path detections, we can increase the probability of detecting malicious lateral movement without generating a large number of false positives. There can be a feedback loop from the general detection module to the forensic analysis module to further analyze potential malicious lateral movements.

Implementation. Given the large-scale datasets, Latte cannot be implemented using standard techniques and efficiently executed on a single computer. Since there are millions of nodes and hundreds of millions of edges, we implement all of our algorithms in a large cluster using Microsoft’s Cosmos MapReduce framework. This framework supports a SQL-like syntax, and includes a distributed storage component. After parsing the users’ input code, it generates a parallel, optimized “execution plan” for the defined queries. Latte requires 44 minutes to build the Network Connection Graph and compute the Path-Rate Scores, 31 minutes to execute the Remote File Execution Detector, and 3 minutes to correlate the RFE Detector and Path-Rate Score.

III. REMOTE FILE EXECUTION DETECTION

We next discuss how we combine the Windows system and security events to detect possible remote file execution. Remote file execution occurs when a user on one computer runs a program on a second computer and is often used by attackers during lateral movement. The PsExec tool is commonly used by IT administrators on Windows computer networks to execute a file on a remote computer. An account that has administrative privileges on the remote computer (typically a domain administrator) logs on and mounts the ADMIN$ share, writes an executable to this share or one of its subdirectories, remotely installs a service pointing to the executable, and remotely starts the service. The result is the remote execution of a file performing the configured actions of the IT administrator. While mechanisms such as the PsExec example exist to facilitate remote administration of computers on a domain, this and similar techniques are also used by attackers to achieve lateral movement on a computer network. Once an attacker has obtained the credentials to a privileged account, such as a domain administrator, Remote File Execution (RFE) can be used to infect other computers on the network at will.

Classes of RFE extend beyond service installation. Misuse of the remote registration of scheduled tasks, the WMI Win32_Process class and the Windows Remote Management (WinRM) have also been observed in the pursuit of lateral movement.

Remote service installations or task registrations are immediately preceded by a network logon from the user performing the action because of the associated RPC endpoint interaction. Additionally, when the service or task is started following the installation, we may observe a process creation related to the service or task that was installed. At the core of this detection technique are the service installation (7045) and task registration (106) events. From these events, we can correlate backward-in-time with the logon event to identify whether
there were any network logon success events (4624) by the user who installed the service or task. Finding a match between these within a short time-frame is highly correlated with a remote service installation or task registration. To bring further context to these events, we then extend the correlation by looking for a process creation event (4688) and any share access event (5140) that may be associated with this activity. In summary, we are looking for a combination of the following: a remote logon, a service installation or task registration resulting from that logon, a process creation event of the executable pointed to by the service or task, and optionally, an ADMIN$ share access attempt.

Logon Correlation. For a remote service installation event (7045), the Event/System/Security@UserID contains the security identifier (SID) of the user account that installed the service. We search backward-in-time from the installation event to find a matching Network Logon Success (4624) event where the TargetUserSid values match. In the case of a task registration, we are provided with the user account that registered the task in the UserContext field of the event. We use this information in a similar way to trace the registration back to any remote network logon success event for that user by correlating UserContext with the TargetUserName and TargetDomainName fields.

Task Correlation. In the case of the task registration, we are left with some extra work to do as the path to the executable is not provided in this event. The Event/System/Correlation@ActivityID field in the XML provides a robust mechanism to find any associated “Task Created Process” (106) events for this task. Performing a forward-in-

IV. DATASETS

We utilize two datasets in this study to analyze the proposed Latte system. Both datasets consist of Windows security and system events plus labels from analysts for malicious activity.

An anonymized organization provided us with the first dataset, the Attack Dataset, which contains a confirmed attack exhibiting lateral movement among a collection of four computers in a large network. This dataset only contains the Kerberos Service Ticket Request (4769) event logs with a total of 1,190,639 active nodes and 250,614,631 connections (edges) collected over a 90-day period leading up to the detected attack. Since this dataset contains only the connection events from the 4769 Kerberos Service Ticket Request logs, it enables us to evaluate only the Forensic Analysis Module.

The second dataset, the Penetration Test Dataset, was generated by Microsoft and includes an attack with lateral movement conducted by a penetration tester. This dataset includes all of the Windows events found in Section III. As a result, this dataset allows us to evaluate the performance of Latte’s General Detection Module. The penetration tester was not aware of how Latte detects lateral movement. For a six-month period leading up to the penetration test, the number of source nodes is 3,412,030, and the minimum, maximum, and mean out-degree are 1, 1,631,308, and 336, respectively. Similarly, the minimum, maximum, and mean in-degree are 1, 3,273,616, and 346, respectively.
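The logon correlation step described in Section III can be sketched as follows. This is a minimal illustration, not the detector used in the paper: the flattened dictionary keys (`event_id`, `time`, `computer`, `user_sid`, `target_user_sid`, `logon_type`), the function name, and the five-minute window are assumptions, since the text only states that the match must fall within a short time-frame. The sketch relies on the fact that Windows records network logons with logon type 3 in event 4624.

```python
from datetime import datetime, timedelta


def correlate_service_installs(events, window=timedelta(minutes=5)):
    """Pair each service installation (7045) with the latest preceding network logon (4624).

    events -- list of dicts with assumed keys: event_id, time, computer,
              user_sid (7045), target_user_sid and logon_type (4624)
    window -- assumed maximum gap between the logon and the installation
    """
    network_logons = [
        e for e in events
        if e["event_id"] == 4624 and e.get("logon_type") == 3  # 3 = network logon
    ]
    matches = []
    for install in (e for e in events if e["event_id"] == 7045):
        candidates = [
            logon for logon in network_logons
            if logon["computer"] == install["computer"]
            and logon["target_user_sid"] == install["user_sid"]
            # search backward in time: the logon must precede the installation
            and timedelta(0) <= install["time"] - logon["time"] <= window
        ]
        if candidates:
            matches.append((install, max(candidates, key=lambda e: e["time"])))
    return matches


# Hypothetical sample: one installation with a recent matching logon, one without.
sample_log = [
    {"event_id": 4624, "time": datetime(2024, 1, 2, 10, 0, 0), "computer": "pc3",
     "target_user_sid": "S-1-5-21-1", "logon_type": 3},
    {"event_id": 7045, "time": datetime(2024, 1, 2, 10, 0, 30), "computer": "pc3",
     "user_sid": "S-1-5-21-1"},
    {"event_id": 4624, "time": datetime(2024, 1, 2, 8, 0, 0), "computer": "pc9",
     "target_user_sid": "S-1-5-21-2", "logon_type": 3},
    {"event_id": 7045, "time": datetime(2024, 1, 2, 11, 0, 0), "computer": "pc9",
     "user_sid": "S-1-5-21-2"},
]
sample_matches = correlate_service_installs(sample_log)
```

The same pattern would extend to task registrations by matching the UserContext field against TargetUserName and TargetDomainName instead of the SID; that variant is omitted here.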
connection, $e_1$, was made on the first day and the third day. The connections represented by $e_2$, $e_3$ and $e_4$ occurred on the second day while the $e_5$ and $e_6$ connections were generated on the third day. We first evaluate the detection results of the graph’s 2-hop path results ranked by the path-rate score and the Remote File Execution Detector in isolation. Then we present Latte’s results, which process all the events in the full dataset. While we did not use the optional Node Filtering for forensic analysis, we do include it for these general detection experiments, with a connection threshold $F = 10000$, to prune the graph.

[Figure 5. Nodes: Account1, Computer1, Computer2, Computer3, Computer4. Edges: e1 through e6.]
Fig. 5. Connection graph for the Penetration Test Dataset.

Case Study 2: Path-Rate Score Ranking and RFE Detections in Isolation. First, we consider graph-based detection by only ranking the path-rate score, and the results are summarized in Table III. After filtering all potential paths as described in Section II, this dataset has 264 million 2-hop paths on Day 2 and 608 million 2-hop paths on Day 3. Among them, 88,175,654 paths are tied on Day 2 for the first rank and share the same minimum path-rate score, $1/X^2 = 1.235 \times 10^{-4}$, since both edges on the path have just one connection during the observation window ($X = 90$ days).

The results from this experiment indicate that the path-rate score in isolation is not a good candidate for general detection. One reason is that many of the paths are tied for the top rank. While we found that the penetration tester’s movements were rare in general, many other paths on the network were equally rare or even more rare. Even though one 2-hop path was ranked first, it was tied with millions of other paths with the same path-rate score. Therefore, we conclude that only ranking the path-rate score does not have high enough precision to be used for general detection.

Next, we report the performance of the Remote File Execution Detector. The Remote File Execution detections on the network significantly reduce the amount of raw data that analysts need to manually inspect to detect the penetration testers. As shown in Table IV, there were 45 alerts of possible remote file execution on the network on the second day of the exercise, and two of these were due to the penetration tester’s activity on nodes Computer3 and Computer4 in Figure 5. There were no indications that any of the other alerts were due to malicious lateral movement activity, so they are considered to be false positives. It is worth noting that the attack graph in Figure 4 does not include any user nodes that the analysts could identify. Presumably this is because the attacker had already installed a backdoor on the computer sometime in the past. This result demonstrates that the Remote File Execution Detector can be used for general detection with daily manual inspection by analysts.

Case Study 3: General Detection Results Using All the Data. While the previous results indicate that the RFE detector can be used to identify malicious remote file execution with manual inspection, there are still too many false positives to use it for automatic account disabling. In addition, analysts often tire from doing repetitive manual analysis day after day without discovering malicious activity. For the twenty days of logs preceding and including the lateral movement in Figure 5, 777 Remote File Execution events were generated on the network. To further improve the system, we next investigate combining the RFE detection results with the potential lateral movement graph in the General Detection Module.

We correlated all the RFE events on Day 2 with Latte’s daily network connection graph. A summary of the results for this experiment is provided in Table IV. Of the 45 potential RFE events that were generated on the second day, we manually inspected all the 2-hop paths which included any of these nodes. As noted in Table III, we found that the two 2-hop paths ($e_2 \to e_3$, $e_4 \to e_3$) with lateral movement had the smallest path-rate score, i.e., they ranked at the top of the list. In addition, these malicious 2-hop paths were the only two items that were tied with the smallest path-rate score; none of the 2-hop paths involving the other RFE false positive nodes had the smallest path-rate score. Although not conclusive, this exercise provides evidence that it is possible to automatically discover a detection by intersecting rare RFE events with rare paths in Latte’s connection graph. In other words, this experiment demonstrates that we need to combine the RFE detection with the path-rate model to overcome the high false positive rate for the RFE detector. RFE detection alone is not sufficient.

TABLE IV
Ranking of the target paths by the General Detection Module for the penetration test with all the Windows system and security events found in Section III.

Metric | Value
# of potential remote file execution events generated in the second day | 45
Ranking of malicious path $e_2 \to e_3$ | 1
Ranking of malicious path $e_4 \to e_3$ | 1
# of paths tied for the top rank which include a node with an RFE detection | 2
Combined Top-1 Precision / Recall | 1.0 / 0.2
Combined Top-5 Precision / Recall | 0.4 / 0.4

VII. RELATED WORK

Lateral Movement Detection. The detection of lateral movement is an understudied problem, although a few papers have explored the topic. Neil, et al. [8] propose a graph-based model to detect anomalous paths in a graph. An anomaly score based on the connection’s p-value is first learned. Then a hidden Markov model is used to identify anomalous paths. To make our system scalable, we instead use probabilities to construct the path score. Also, we add the RFE detector to more effectively detect intrusions. We tested our system on actual attack data and red team activity with confirmed
lateral movement, whereas Neil et al. tested their algorithm on simulated and network data. No attacks were confirmed in 38 detections in their network data during a 20-day period. Authentication data is modeled in [9] using a bipartite graph, and the authors measure the risk of credential hopping by computing the largest connected component of this graph. In [1], Purvine et al. assume each computer has a vulnerability state, construct a reachability graph of the network, and then define an impact metric to monitor the network’s security. The authors then experiment on datasets with artificially injected vulnerabilities. In [10], the authors propose a defense-based zero-sum game to prevent, or at least slow down, an attacker from reaching a target node on a computer network. Fawaz et al. [7] propose a hierarchical system to detect lateral movement on a graph. The lowest-level agents in the system propose subgraphs for each host of potential lateral movement. A central agent then combines these local subgraphs across hosts to recover potential lateral movement paths. Therefore, this system requires that a custom agent be installed on each host. Latte, on the other hand, only requires standard system and security events already included in recent releases of the Windows operating system. In addition, their work does not specifically address the forensic or general detection scenarios and does not detect remote file execution. Soria-Machado et al. [11] suggest rules for possibly detecting pass-the-hash attacks for the NTLM protocol and pass-the-ticket attacks for the Kerberos protocol based on system and security events on Windows Vista, Windows 7, and Windows Server 2008 environments. These rules attempt to identify potential lateral movement as anomalous connections to high-value computers (e.g., domain controllers).

Cybersecurity. Several graph-based detection algorithms have been proposed for computer security, and many of them focus on detecting certain network vulnerabilities. Heat-ray [2] tries to determine potential configuration changes in a network, regarding computers and user accounts as nodes and the privileges of source nodes on destination nodes as edges. Attack graphs are generated in [3], where nodes represent possible attack states and edges represent changes of state. More recent studies focus on malware detection on real large-scale data, targeting the security problem in large enterprises. Examples include [12], [13], [14], [15], [16], [17].

VIII. CONCLUSIONS

In this paper, we propose Latte, a new graph-based system to discover malicious lateral movement paths in a compromised computer network. The network connection graph is constructed from Windows security events generated by Kerberos service ticket requests. Our findings suggest the effectiveness and efficiency of graph-based algorithms in detecting lateral movement. For the forensic analysis problem, we show that using the network connection graph processed with the path-rate score and the Forensic Analysis Module is sufficient for generating the lateral movement graph given the detection of one or more confirmed malicious computers or user accounts. For the general detection problem, relying on the path-rate score alone leads to too many potential, but legitimate, rare paths for manual investigation by analysts. To overcome this problem, we propose the General Detection Module, which combines the path-rate score with possible Remote File Execution detections to filter out most of the benign lateral-movement paths. We demonstrate this idea by using a remote file execution detector which correlates six additional Windows system and security events caused by the installation of a new service or the creation of a new process. These new tools for detecting lateral movement activity, as well as the initial detection of a compromised node, are very helpful for analysts in combatting this extremely challenging problem.

REFERENCES

[1] E. Purvine, J. R. Johnson, and C. Lo, “A graph-based impact metric for mitigating lateral movement cyber attacks,” in Proceedings of the ACM Workshop on Automated Decision Making for Active Cyber Defense (SafeConfig), October 2016.
[2] J. Dunagan, A. X. Zheng, and D. R. Simon, “Heat-ray: combating identity snowball attacks using machine learning, combinatorial optimization and attack graphs,” in Proceedings of the 22nd symposium on Operating systems principles (SOSP), October 2009.
[3] L. P. Swiler, C. Phillips, D. Ellis, and S. Chakerian, “Computer-attack graph generation tool,” in Proceedings of DARPA Information Survivability Conference & Exposition II, 2001.
[4] L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: a survey,” Data Mining and Knowledge Discovery, vol. 29, no. 3, pp. 626–688, 2015.
[5] C. C. Noble and D. J. Cook, “Graph-based anomaly detection,” in Proceedings of the 9th International Conference on Knowledge discovery and data mining (KDD), August 2003.
[6] Massachusetts Institute of Technology, “Kerberos: The network authentication protocol,” https://web.mit.edu/kerberos/, Last accessed February 16, 2017.
[7] A. Fawaz, A. Bohara, C. Cheh, and W. H. Sanders, “Lateral movement detection using distributed data fusion,” in Proceedings of IEEE 35th Symposium on Reliable Distributed Systems (SRDS), 2016.
[8] J. Neil, C. Hash, A. Brugh, M. Fisk, and C. B. Storlie, “Scan statistics for the online detection of locally anomalous subgraphs,” Technometrics, pp. 403–414, 2013.
[9] A. Hagberg, N. Lemons, A. Kent, and J. Neil, “Connected components and credential hopping in authentication graphs,” in Proceedings of the 10th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS), November 2014.
[10] M. A. Noureddine, A. Fawaz, W. H. Sanders, and T. Başar, “International conference on decision and game theory for security,” in Proceedings of International Conference on Decision and Game Theory for Security (GameSec), 2016, pp. 294–313.
[11] M. Soria-Machado, D. Abolins, C. Boldea, and K. Socha, “Detecting lateral movements in windows infrastructure,” http://cert.europa.eu/static/WhitePapers/CERT-EU_SWP_17-002_Lateral_Movements.pdf, Last accessed August 7, 2017.
[12] T.-F. Yen, A. Oprea, K. Onarlioglu, T. Leetham, W. Robertson, A. Juels, and E. Kirda, “Beehive: Large-scale log analysis for detecting suspicious activity in enterprise networks,” in Proceedings of the 29th Annual Computer Security Applications Conference (ACSAC), 2013.
[13] T.-F. Yen, V. Heorhiadi, A. Oprea, M. K. Reiter, and A. Juels, “An epidemiological study of malware encounters in a large enterprise,” in Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS), 2014.
[14] A. Oprea, Z. Li, T.-F. Yen, S. H. Chin, and S. Alrwais, “Detection of early-stage enterprise infection by mining large-scale log data,” in Proceedings of the 45th International Conference on Dependable Systems and Networks (DSN), 2015.
[15] S. T. King, Z. M. Mao, D. G. Lucchetti, and P. M. Chen, “Enriching intrusion alerts through multi-host causality,” in Proceedings of the Network and Distributed Systems Security Symposium, 2005.
[16] J. Rennie and A. McCallum, “Using reinforcement learning to spider the web efficiently,” in Proceedings of the International Conference on Machine Learning, 1999.
[17] J. Cho, H. Garcia-Molina, and L. Page, “Efficient crawling through url ordering,” in Proceedings of the Seventh International Conference on World Wide Web, 1998.