Sec21 Alrawi Forecasting
Omar Alrawi*, Moses Ike*, Matthew Pruett, Ranjita Pai Kasturi, Srimanta Barua,
Taleb Hirani, Brennan Hill, Brendan Saltaformaggio
Georgia Institute of Technology
transitions the program’s context to $BB_{i+1}$ and state $s_{i+1}$. The set $\mathit{All\_Ops}_i$ is partitioned into 2 disjoint sets, $\mathit{Sym\_Ops}_i$ and $\mathit{Con\_Ops}_i$, such that:

\[ \mathit{Sym\_Ops}_i \cup \mathit{Con\_Ops}_i = \mathit{All\_Ops}_i \tag{2} \]

and

\[ \mathit{Sym\_Ops}_i \cap \mathit{Con\_Ops}_i = \emptyset \tag{3} \]

For a state $s_n$, we define the $DC(s_n)$ function as follows:

\[ DC(s_n) = 1 - \frac{\sum_{i=j}^{n} \frac{|\mathit{Sym\_Ops}_i|}{|\mathit{All\_Ops}_i|}}{|\tau_n|} \tag{4} \]

where $|\mathit{Sym\_Ops}_i|$ is the cardinality of the $\mathit{Sym\_Ops}$ performed to reach state $s_i$ and $|\mathit{All\_Ops}_i|$ is the cardinality of $\mathit{All\_Ops}$ performed to reach state $s_i$. Further, $|\tau_n|$ is the cardinality of the state transitions from state $s_j$ to $s_n$.

Tracking the cumulative ratio of $\mathit{Sym\_Ops}_i$ to $\mathit{All\_Ops}_i$ for each state transition enables us to calculate $DC(s)$ instantaneously without iterating through the previous states $s_j$ to $s_n$. An extended form of $DC(s)$ that allows us to calculate its instantaneous value is given as follows:

\[ DC(s_n) = 1 - \frac{\delta}{\delta T}\,\mathit{Cumul\_Ratio}(s_n) \tag{5} \]

where, for all transition states $T$, $\mathit{Cumul\_Ratio}(s_n)$ is the sum of the states’ ratio for states $s_j$ to $s_n$, and defined as:

\[ \left\{ \forall s_i \in T : \mathit{Cumul\_Ratio}(s_n) := \sum_{i=j}^{n} \frac{|\mathit{Sym\_Ops}_i|}{|\mathit{All\_Ops}_i|} \right\} \tag{6} \]

An Example of $DC(s)$ Computation. Figure 2 is a working example to show the computation of $DC(s)$. Figure 2a depicts a recovered CFG and memory and register values from the memory image. Symbolic execution starts at basic block $BB_1$ and ends at $BB_4$. We annotate each basic block to show which instructions are $\mathit{Sym\_Ops}$ based on the register or memory values when the basic block is being executed. Notice that because register edx at $BB_2$ and memory address 0x732460 at $BB_2$ have concrete values, only one branch is taken by the conditional jump instructions at the end of $BB_2$. For this reason, $BB_5$ is not explored.

Symbolic data can be introduced by I/O-related function calls and calls to functions that are simulated based on Forecast’s function models. Such function calls create symbolic variables within the memory dump which causes a mixing of symbolic and concrete data.

Following along with Figure 2a, Figure 2b computes $DC(s)$ for each state (basic block) transition. For example, $DC(s_1) = 0.67$ when we transition to state $s_2$, then it increases to 0.83 as we transition from $s_2$ to $s_3$. For each $DC(s_i)$ value derived in Figure 2b, we plot them against the transition states in Figure 2d.

Figure 2c plots the $\mathit{Cumul\_Ratio}(s_i)$ for each state (shown in black). The instantaneous $\mathit{Cumul\_Ratio}(s_n)$ function is a straight line ($\mathit{Cumul\_Ratio}(s_n) = mT$) drawn from origin to the point $s_n \in T$, where $m$ is the slope. The derivative of $\mathit{Cumul\_Ratio}(s_n) = mT$ gives the instantaneous $DC(s_n)$ (Equation 5).

Path Probability. Given $m$ current states, the path probability of a path $p$, with current state $s$, is derived by dividing $s$’s $DC(s)$ by the summation of the $DC(s)$
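The DC(s) definition in Equations 4 and 6 can be illustrated with a minimal Python sketch. It assumes that |τ_n| equals the number of transitions processed so far and that counting starts at j = 1. The function name `dc_values`, the `op_counts` input format, and the per-state operation counts are hypothetical: the counts are not read from Figure 2 and were chosen only because they reproduce the two DC values quoted in the text (0.67 and 0.83).

```python
from fractions import Fraction


def dc_values(op_counts):
    """Return DC(s_n) after each transition.

    op_counts is a list of (sym_ops, all_ops) pairs, one per state
    transition (a hypothetical input format, not Forecast's own).
    """
    cumul_ratio = Fraction(0)  # running Cumul_Ratio(s_n), Equation 6
    results = []
    for n, (sym_ops, all_ops) in enumerate(op_counts, start=1):
        if all_ops <= 0 or not 0 <= sym_ops <= all_ops:
            raise ValueError("need 0 <= sym_ops <= all_ops and all_ops > 0")
        # Update the running sum instead of re-iterating earlier states.
        cumul_ratio += Fraction(sym_ops, all_ops)
        # Equation 4, assuming |tau_n| == n transitions so far.
        results.append(1 - cumul_ratio / n)
    return results


# Assumed counts: 1 symbolic op out of 3, then 0 symbolic ops out of 4.
example = dc_values([(1, 3), (0, 4)])
print([round(float(v), 2) for v in example])  # prints [0.67, 0.83]
```

Keeping `cumul_ratio` as a running sum mirrors the point made above: each new DC value needs only the previous cumulative ratio and the current transition count, not a second pass over earlier states.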
Exfiltration, the constraints on the input file (buf_3)

ExW) by the base API name but our plugins cover all variants.
Malware     C&C Communication  File Exfiltration  Code Injection  Dropper      Spy          Persistence  Anti-analysis  O_FP  O_FN
            P_F O_M O_F        P_F O_M O_F        P_F O_M O_F     P_F O_M O_F  P_F O_M O_F  P_F O_M O_F  P_F O_M O_F
Bokbot      38% 2   2          5%  3   3          57% 1   1       -            -            -            -              0     0
AcridRain   23% 3   3          19% 4   4          -               28% 2   2    -            30% 1   1    -              0     0
AthenaGo    -                  11% 4   4          -               22% 3   2    -            33% 2   3    34% 1   1      2     0
Rokrat      30% 1   1          26% 2   2          22% 3   3       -            17% 4   4    -            15% 5   5      0     0
AdamLocker  22% 3   3          0   4   ∅          45% 1   1       -            -            33% 2   2    -              0     1
Marap       -                  46% 3   3          40% 1   1       -            14% 2   2    -            -              0     0
ATI         -                  -                  -               41% 2   2    -            42% 1   1    17% 3   3      0     0
TeslaAgent  11% 4   4          14% 3   3          32% 1   1       -            13% 2   3    30% 3   3    -              0     0
Andromeda   25% 2   2          -                  14% 3   3       -            -            61% 1   1    -              0     0
AveMaria    28% 3   4          29% 2   2          28% 4   3       -            25% 1   1    0   3   ∅    -              2     1
Aveo        22% 3   3          -                  -               40% 1   1    -            38% 2   2    0   4   ∅      0     1
7Honest     -                  16% 3   3          51% 1   1       11% 4   4    -            22% 2   2    -              0     0
Abaddon     -                  26% 2   2          -               -            -            84% 1   1    -              0     0
AVCrypt     51% 1   1          -                  -               -            -            19% 3   3    30% 2   2      0     0

Table 1: Capability Forecasts of 14 Select Recent Samples. P_F: Forecast percentage, O_M: Ground truth manual ordering, O_F: Forecast ordering, O_FP: Ordering false positive, O_FN: Ordering false negative.
4.1 Evaluating Capability Forecasts

Table 1 presents the capability forecasts of 14 recent samples⁵ for which we manually collected ground truth. Forecast output 49 distinct capability forecasts. Manual analysis validated 45 of them; we found 4 false positives (FP) and 3 false negatives (FN), with an accuracy of 86.5%. FPs were due to over-approximating symbolic constraints when simulating undocumented APIs such as RtlCreateUserThread. The FNs were due to rare unresolved symbolic targets.

Ground Truth. Validating each forecast involves 2 checks: (1) the presence or absence of the identified capability, and (2) the accuracy of the forecast percentage. For ground truth on the presence or absence of a capability, we leveraged malware reports from security vendors [31], [32] and our own manual analysis. We also used the MITRE ATT&CK Framework [33] for our initial ground truth mappings. To validate our ground truth forecast percentages (i.e., rank each outcome according to the “difficulty” or “constraints required” of arriving at an outcome), we modeled the difficulty metric of executing capabilities from the memory image capture point based on the number of branch constraints to reach a given capability. We can obtain this metric via manual analysis of the memory image since we know the addresses of the individual capabilities. Using Bokbot as an example, Table 1 shows its 3 capabilities: Code Injection, C&C Communication, and File Exfiltration. For these, Forecast reports forecast percentages of 57%, 38%, and 5% respectively (listed in the P_F columns of Table 1). Based on manual analysis of its memory image, the numbers of branch constraints to reach these capabilities are 166, 195, and 257, respectively. Thus, Code Injection is less difficult to reach and hence has the highest forecast.

Next, we validate capability ordering. We assign an increasing number, starting at 1, to each capability identified by manual checking (defined as O_M) and ordered by increasing difficulty. We assign an increasing number to each capability identified by Forecast (O_F) up to the number of identified capabilities. For Bokbot, both manual checking and Forecast report an ordering of 1, 2, and 3 for Code Injection, C&C Communication, and File Exfiltration respectively. As shown in Table 1, because Forecast’s forecast for Bokbot’s Code Injection is the highest (i.e., 57%), Code Injection’s ordering or O_F is 1. Similarly, the ordering by manual checking or O_M is also 1, which validates Forecast’s forecast for Bokbot’s Code Injection. In another example, Forecast’s prediction for AthenaGo’s Dropper is 22%, which is the second highest forecast (i.e., O_F is 2). However, manual checking shows Persistence as the second highest instead, resulting in FP for AthenaGo (listed in the O_FP column of Table 1). Forecast missed Aveo’s Anti-analysis capability, resulting in a FN (listed in the O_FN column), and a forecast of 0 (P_F column).

Overall, Persistence reported the highest forecast percentages, as high as 84% for Abaddon. We found that most malware persists by modifying the registry. Conversely, File Exfiltration reports the lowest forecasts, as low as 5% for Bokbot. Reasonably, File Exfiltration can be seen as an “end goal” capability, which malware deploys in deep code under several constraints. By integrating capability analysis plugins, Forecast was able to automatically identify them.

C&C Communication. Table 1 shows 7 C&C domains identified with 1 FP. We focused on network APIs such as WinINet’s InternetOpenUrl and Winsock’s socket. In particular, we concretized their domain and IP address arguments. Forecast revealed Rokrat and

⁵ Their hashes are presented in Table 8 in Appendix A.
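The ordering check described in Section 4.1 can be sketched in a few lines of Python using the Bokbot numbers quoted in the text. This is an illustrative reading, not Forecast's implementation: the helper names `rank_descending` and `rank_ascending` are hypothetical, and the accuracy formula (validated forecasts divided by validated plus FP plus FN) is an assumption that happens to reproduce the reported 86.5%.

```python
def rank_descending(scores):
    """Rank 1 goes to the highest score (used for forecast percentages)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def rank_ascending(costs):
    """Rank 1 goes to the lowest cost (fewest branch constraints)."""
    ordered = sorted(costs, key=costs.get)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Bokbot values as quoted in the text.
forecast_pct = {
    "Code Injection": 57,
    "C&C Communication": 38,
    "File Exfiltration": 5,
}
branch_constraints = {
    "Code Injection": 166,
    "C&C Communication": 195,
    "File Exfiltration": 257,
}

o_f = rank_descending(forecast_pct)       # Forecast ordering (O_F)
o_m = rank_ascending(branch_constraints)  # manual ordering (O_M)
mismatches = [cap for cap in o_f if o_f[cap] != o_m[cap]]
print(o_f, o_m, mismatches)

# Assumed accuracy formula: validated / (validated + FP + FN).
validated, false_pos, false_neg = 45, 4, 3
accuracy = validated / (validated + false_pos + false_neg)
print(round(accuracy * 100, 1))  # prints 86.5
```

For Bokbot the two orderings agree, so `mismatches` is empty. An ordering disagreement such as the AthenaGo Dropper and Persistence case would show up as entries in that list.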
Table 2: Packed malware evaluation results based on packer taxonomy found in Ugarte-Pedrero et al. [34].
Table 4: Average Capability Forecasts and Metrics, featuring the 11 most prevalent malware families.
Tools        Exploration techniques  Explored paths  Identified capabilities  Path explosion instances  Explore time(s)  Basic blocks covered
Forecast     Data-Guided             877             32                       28                        301              12488
angr [22]    Pure Symbolic           1292            11                       521                       236              14567
S2E [6]      Concolic                602             7                        57                        98               10007
Triton [23]  Concolic                229             3                        N/A                       522              4309

Table 6: Forecast Compared to Existing Techniques.

Forecast. We used 50 samples for this experiment.⁸ As shown in Table 6, Forecast identified more than twice the capabilities compared to angr, S2E, and Triton. Forecast explored 877 paths per sample on average. By leveraging prior execution state to optimize paths, only 28 paths were terminated due to path explosion compared to 521 by angr and 57 by S2E. Although angr explored the most paths (1292), it terminated 521 due to path explosion. We observed that angr could not concretize paths when faced with early symbolic control flow, causing state explosion. The exploration time for angr was relatively low (236s) because many paths quickly became unconstrained and terminated. Forecast reported a higher runtime of 301s due to the overhead of computing probability scores for each state.

S2E requires symbolic variables to be manually induced for multi-path exploration. When we initially tested S2E with malware, we traced only a single path. However, to enable S2E to explore multiple paths, we symbolized the arguments of the malware’s local functions and only traced paths that originated from the malware code. This led to an exploration of 602 paths, where 57 became unconstrained and terminated. S2E had the fastest average runtime (98s), because it executes code natively on the CPU. Triton uses a per-input iterative approach to code exploration, hence the path explosion metric is not applicable. To trace multiple paths with Triton, we manually pushed new constraints to each path predicate, but Triton was heavily hindered by input requirements to explore new paths. Triton traced 229 paths on average, 3 of which identified capabilities. Due to its iterative nature and instruction-level emulation, it incurred the highest runtime of 522 seconds.

on-line and off-line concolic execution to manage path exploration. However, Forecast reduces path explosion by using the DC(s) framework to identify capability-relevant paths. Additionally, Forecast does not require an intact binary file or prior knowledge of a program’s input and environment, which avoids restrictive assumptions for symbolic execution.

For malware applications, prior works use full-system emulation [4], dynamic analysis [5], [45], and Win API simulation [46] to identify malware capabilities. Yadegari et al. [47] study the robustness of symbolic analysis techniques against malware obfuscation. In contrast, Forecast is a post-detection approach that combines both symbolic analysis and memory forensics to identify staged malware capabilities. Prior work on memory forensics focuses on kernel objects [48], [49], access patterns to kernel objects [50]–[54], and dynamic memory traces [55], [56] to detect and remediate rootkit malware. DSCRETE [57] leverages memory image code reuse for interpreting single data structures. Similarly, for mobile security, prior works [58]–[61] analyze a mobile application’s memory to recover artifacts related to recent activities. However, Forecast relies on memory artifacts to contextualize malware behavior through symbolic analysis and surgically analyzes a single target malicious process.

Provenance-based investigation techniques are also related to Forecast. NoDoze [62] and Hassan et al. [63] utilize Windows and Linux system events to prioritize alerts through a network diffusion approach using temporal ordering. Similarly, HOLMES [64] correlates suspicious events by examining information flows and TARDIS [65] identifies compromised websites through a spatial-temporal approach to present attack tactics for analysts. Attack2Vec [66] uses system event embedding to derive emerging attack tactics. Forecast uses the DC(s) model to predict in-progress malware capabilities using a similar network diffusion approach [62]–[64] but instead identifies relevant paths based on the execution context of the malware.

6 Limitations and Discussion
[23] Triton: A Dynamic Symbolic Execution Framework, SSTIC, 2015, pp. 31–54.
[24] Volatility: Open Source Memory Forensics Framework, https://www.volatilityfoundation.org, 2019.
[39] W. E. Howden, “DISSECT - A symbolic evaluation and program testing system,” IEEE Transactions on Software Engineering, no. 4, pp. 266–278, 1978.
[40] C. Cadar and D. Engler, “Execution generated test cases: How to make systems code crash itself,” in Proceedings of the International SPIN Workshop on Model Checking of Software, San Francisco, CA, Aug. 2005.
Table 7: Capability identification and tracking is a modular component of Forecast. Analysts can build additional
capability plugins to help in future investigations by identifying APIs and parameter constraints that make up the
capability. The parameter constraints are tracked through data flow analysis and backward slicing.
Table 9: Malware Samples And Parameters Used In The Empirical Evaluation (§4.6).