0% found this document useful (0 votes)
166 views12 pages

Unleashing Malware Analysis and Understanding With Generative AI

Uploaded by

Kelner Xavier
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
166 views12 pages

Unleashing Malware Analysis and Understanding With Generative AI

Uploaded by

Kelner Xavier
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

SYNTHETIC REALITIES AND ARTIFICIAL INTELLIGENCE-GENERATED CONTENTS

Unleashing Malware Analysis and


Understanding With Generative AI

Yeali S. Sun and Zhi-Kang Chen | National Taiwan University


Yi-Ting Huang | National Taiwan University of Science and Technology
Meng Chang Chen | Academia Sinica

Dissecting low-level malware behaviors into human-readable reports, such as cyber threat intelligence,
is time-consuming and requires expertise in systems and cybersecurity. This work combines dynamic
analysis and artificial intelligence-generative transformation for malware report generation, providing
detailed technical insights and articulating malware intentions.

T he ever-evolving and dynamic cybersecurity land-


scape has thrust organizations into an ongoing
battle to cope with an escalating array of vulnerabili-
learn from, and respond to cyberattacks. Unfortunately,
the production of these reports currently relies heavily
on human experts.
ties, thus amplifying systemic risk. Meanwhile, adver- The current practice to comprehend the intentions and
saries have been continually advancing their skills and behaviors of a malicious program predominantly involves
tactics to exploit vulnerabilities within the system. The dynamic analysis, which entails placing the malicious pro-
relentless disclosure of vulnerabilities affecting various gram within an isolated environment, often referred to as
IT infrastructures continues to burden security teams. a sandbox, to meticulously record its execution process.1
In response to these threats, the cybersecurity commu- However, this approach comes with various limitations.
nity, industry, and academia are collectively focused on Generally, sandbox technology typically records every
developing innovative detection methods. action performed during malware execution. Such a com-
Understanding the malicious sequences of opera- prehensive execution trace typically includes function calls
tional activities provides first-hand knowledge of adver- [such as system calls (syscalls) or application programming
sary techniques, but dissecting these low-level activities interface (API) calls] invoked at both the operating system
is time-consuming and intricate, demanding extensive and user levels, which may contain specific evidence of
knowledge of the workings of the system. Automating malicious activities. However, the semantics of the execu-
the translation of execution traces into human-readable, tion trace are at the machine level, and the size of the trace
high-level reports would serve as an invaluable resource is usually large, often comprising hundreds of thousands
for cybersecurity professionals to develop detection of API calls/syscalls. For security analysts, especially those
methods and anyone seeking insight to comprehend, engaged in digital forensics, swiftly understanding the
intentions behind malicious actions and gradual attacks on
Digital Object Identifier 10.1109/MSEC.2024.3384415
the target system demands a profound level of expertise,
Date of current version: 10 May 2024 making it a time-consuming task.

12 May/June 2024 Copublished by the IEEE Computer and Reliability Societies  1540-7993/24©2024IEEE
Recent research on large language models (LLMs) natural language articles. Our research findings shed
in the cybersecurity field has demonstrated promising light on the potential of LLMs to improve malware
applications, such as malware static analysis2 and mal- understanding and CTI report generation.
ware generation.3,4 For example, Pearce et al.2 explored
prompting LLMs to identify purposes, capabilities, and Malware Activity Trace
variable names/values from code. Trained in vast amounts While dynamic analysis is crucial to comprehending
of unstructured text, including websites, books, and open malware activity, two challenges arise when directly
source code, these LLMs can understand and generate processing the execution trace of malware with an LLM.
natural language in response to input prompts. Inspired by The first challenge stems from the substantial number
this work, we aim to investigate approaches for understand- of syscalls, forming an unfavorable input to LLMs. Spe-
ing malware behaviors and generating descriptive reports. cifically, truncating a prompt due to a restricted maxi-
However, applying standard LLMs, such as ChatGPT, mum input token length may result in inaccurate and
directly to understanding malware activity and the genera- incoherent responses. The second challenge arises from
tion of threat intelligence reports poses challenges. Standard the inherent nature of LLMs, primarily designed for
LLMs are designed for processing text and code rather than processing natural language but exposed to a variety of
sequences of syscalls, potentially limiting their ability to programming languages during training. This suggests
generate requested results. Further details are discussed in that the functionality of LLMs may not seamlessly align
the “Quantitative Analysis” section. To date, there is no pre- with the requirements of dynamic analysis.
trained LLM tailored specifically for dynamic analysis and Our primary objective in this work is to delve into the
cyber threat intelligence (CTI) report generation. Develop- realm of understanding malware activity using off-the-shelf
ing domain-specific LLMs requires significant investments LLMs. To achieve this, we introduce an attack scenario
in computing power, effort, and specialized datasets. graph (ASG) construction to reduce the size of malware
In this study, our objective is to assess the effective- execution while retaining essential information. The pro-
ness of off-the-shelf LLMs in comprehending malware cess involves parsing the trace and mapping the resulting
behaviors and producing CTI reports that delineate the data onto a graph, enabling a more concise representation
malware lifecycle in natural language. Our investigation of the behavior of the malware. Additionally, we employ
focuses on capturing the core essence of the behavior an natural language description (NLD) transformation to
and operations of malware. To accomplish this goal, we convert low-level syscalls into high-level descriptions suit-
conduct dynamic analysis to capture the syscall execu- able for LLMs. Subsequently, a prompt is formulated with
tion trace of the malware. This trace is subsequently the transformed NLDs to generate a coherent natural lan-
translated into a series of structured descriptions. Lever- guage article about the given malware. The architecture of
aging a generative natural language processing model, our proposed system is illustrated in Figure 1.
such as ChatGPT, we enable the automatic generation Our research focuses on driven by the rising preva-
of informative and easily understandable high-level lence of Unix-like operating systems, notably Linux, and

Subject
Malware Sample(s) Syscall Trace ASG I ASG II

Dynamic Redundancy
Analysis Generate Reduction

Extract

Malware List of
Malware List of A Narrative Essay of
NL Transformer NL (Verb, Object)
Syscall Steps Malware Behavior
Descriptions
ChatGPT
Input

(Arrange From
Syscall NL_Synonym
man7 Website)
Base

Figure 1. System overview.

www.computer.org/security 13
SYNTHETIC REALITIES AND ARTIFICIAL INTELLIGENCE-GENERATED CONTENTS

Linux malware, especially considering the expanding parsing module called syscallParser to extract both the
Internet of Things (IoT) landscape. Cozzi et al.5 highlight name of the syscall and the value of the parameter that
the widespread use of the Linux operating system on var- is the direct object of the function for every syscall invo-
ious devices with varying CPU architectures, including cation within the trace. This extraction process results
x86 and ARM. Although significant attention has been in the creation of a file referred to as the malware activ-
devoted to understanding Windows-based malware, ity trace (MAT). Given that Linux system versions vary,
an understanding of and analytical infrastructure for our study targeted Ubuntu 18.04 LTS (kernel version
Linux-based malware have remained relatively limited. of at least 4.15), which encompasses a total of 415 sys-
This underscores the critical importance of our research calls. Among them, 10 of these calls are either unimple-
in comprehending the operations of Linux malware and mented or not unlisted in the manual page, leaving us
the strategies it employs to achieve its attack objectives. with a total of 405 syscalls in our implementation.6
Therefore, our research is focused on investigating Linux To identify the specific argument serving as the direct
malware in the expanding landscape of interconnected object of each syscall, we conduct a manual review of
devices. It is essential to highlight that our proposed each one. For instance, in the case of the write() syscall,
framework is inherently versatile, featuring graph reduc- the direct object is positioned as the first parameter in
tion and prompting techniques for generating articles the output from Strace,7 representing a file descriptor.
that describe malware activity. As such, its applicability We have devised a parser that scans the entire execution
extends beyond a specific platform. trace file, constructing a symbol table that correlates file
descriptors with their respective file names or paths,
Step 1: Generating a List of <Actor, Syscall as well as socket descriptors and associated network
Name, Direct Object> Triplets address data. Consider the mmap2() function, which
There are several components that are integrated and work is used to map or unmap files or devices into memory.
as a pipeline to provide a new, high-level way of under- In this instance, the direct object is situated as the fifth
standing malware intentions and for malware behavior argument of the call. Following the identification of
analysis. First, we take a malware sample. Using strace, a the direct object argument for each syscall, we have
syscall trace that provides a history of operations that the developed a set of regular expressions to automatically
process performed, is generated for each process spawned extract the syscall’s name and the value associated with
by the malware as well as the main process. Strace is a diag- its direct object for each individual syscall invocation.
nostic and debugging tool that intercepts and records the To enable human analysts to inspect and visualize the
syscalls invoked by a process and the signals received by operations of the malware, we have developed a visualizer
the process. The name of each syscall, its arguments, and module designed to transform the MAT trace into a prov-
its return value will be saved to a specified file. In practice, enance graph G = (V, E), referred to as the ASG. In the
a malware execution, besides the main process, may spawn graph, the source node of a directed edge signifies the ini-
zero or more child processes. After the execution, we com- tiator or actor of the syscall, the edge itself corresponds to
bine all of the execution traces of a malware into one single the syscall and is associated with the respective step num-
trace file using the strace-log-merge command. For exam- ber in the trace, and the destination node denotes the direct
ple, in the malware Dofloo, it spawns 16 child processes. object of the operation. Figure 3 shows an example of an
Figure 2 provides an illustrative example of the execution ASG graph of the malware Dofloo directly derived from
trace pertaining to the Dofloo malware. the trace. The ASG graph incorporates nine distinct node
types: file, process, network, memory, ID, permission, exit
Step 2: ASG status, timestamp, and resource types. Data provenance
Our primary interest is to comprehend the intentions analysis techniques have found widespread application in
and behaviors of malware. We focus on the semantics of the analysis of system logs, allowing the parsing of a log into
the operations carried out in each discrete step. To facil- provenance graphs that encapsulate the entirety of the sys-
itate this understanding, we have developed a syscall tem execution. These graphs serve to facilitate causal analy-
sis, revealing the entities and information flows involved in
an attack campaign.
As an example, in Milajerdi et al.,8 a high-level prov-
enance graph is generated to summarize the actions of
an attacker. This approach utilizes data sourced from
syscalls captured in the Linux system audit log and
represents attack activities using five system entities:
Figure 2. An illustrative example of the execution trace
process, files, network connections, memory objects,
pertaining to the Dofloo malware.
and users within the graph. In our case, we draw from

14 IEEE Security & Privacy May/June 2024


sed
1,526
255. mprotect

www.computer.org/security
271. read()
247. mprotect 262. brk 265. read() 266. read() 267. read() 270. read()
0x7f2d2488f000 0x558971622000 0x7f2d250a6000 eth1 eth2 eth4 eth8 eth9

1,527
1,527
385. time 388. time 453. time
372. read 376. read
373. read 375. read 377. read
2022-09-20 2022-09-20 2022-09-20 378. read
374. read
05:05:39 05:05:45 05:07:56
/proc/net/dev

sed 1,527 1,526 1,527

262. brk 282. time 265. read() 372. read()


247. mprotect 255. mprotect
Memory Address Timestamp
NIC /proc/net/dev

(a) (b) (c) (d)

Figure 3. Part of Dofloo’s ASG. We zoomed in on four ASGs (in orange rectangles) from this attack for presentation purposes. Four reduced ASGs are in (a) memory operations reduction,
(b) time operations reduction, (c) IP address enumeration reduction, and (d) file operations reduction, respectively. NIC: network interface controller.

15
SYNTHETIC REALITIES AND ARTIFICIAL INTELLIGENCE-GENERATED CONTENTS

syscall execution traces as well but introduce five addi- considerable number of such call invocations in the trace
tional types. (The ID type here corresponds to the user file. Similarly, we employ a sink node to collectively rep-
in Milajerdi et al.8) Hassan et al.9 develop a tactical resent these time() operations, effectively reducing the
provenance graph to analyze the causal dependencies graph size; an example is shown in ­Figure 3(b).
between threat alerts generated by endpoint detection ■■ IP address enumeration reduction: Malware samples often
and response systems. In our methodology, we initiate attempt to get information such as the IP addresses
the process by converting the MAT trace into an ASG assigned to Linux interfaces (e.g., eth0, eth1, eth2,
graph, which serves as a preliminary step prior to con- etc.) by exhaustively searching with the read() func-
ducting a more comprehensive analysis. tion. This leads to a proliferation of call invocations
Given that a typical malware execution trace is often in the trace. We use a sink node to collectively signify
extensive, comprising thousands of lines of call invoca- these IP address enumeration operations; an example
tions, the resulting ASG graph can become challeng- is shown in Figure 3(c).
ing to visualize and inspect due to its size. Through our ■■ File operations reduction: Malware frequently searches for
experiments, we have identified four areas where we can target files within a host system, resulting in multiple
implement redundancy reduction techniques, effectively open(), read(), and write() calls to files. In some scenar-
reducing the graph’s size without a loss of information: ios, the number of these calls can extend into the hun-
dreds. To address this redundancy, we combine these
■■ Memory operations reduction: Malware often executes duplicated identical operations into one single edge
multiple memory operations when spawning a new within the ASG; an example is shown in Figure 3(d).
process. Those operations include syscalls like brk()
for adjusting the data segment’s end as well as set_ In the context of these sink destination nodes, a
thread_area(), set_tid_address(), set_robust_list(),
dedicated data structure is established to store the data
futex(), mprotect(), arch_prctl(), and munmap(), all
associated with the objects of the original syscall func-
of which involve memory address manipulation. How- tions to ensure that no information is lost during the
ever, the specific values of these memory addresses do
process of reducing redundancy in the graph nodes.
not inherently reveal much about the malicious intent Table 1 shows the size reduction results of five mal-
ware samples. Dofloo (also known as AESDDoS) is a mal-
of the malware. To address this, we consolidate these
memory operations under a single sink (destination)ware that is used to create large-scale botnets to launch
node, collectively denoting them as memory addresses.
distributed denial-of-service (DDoS) attacks and to load
This approach can significantly reduce the graph size;
cryptocurrency miners to the infected machines. Gafgyt
an example is shown in Figure 3(a). is a backdoor malware that affects IoT devices to launch
■■ Time operations reduction: In numerous malware execu-
DDoS attacks. Darlloz is a malware that targets the IoT
tion traces, the time() function is frequently invoked
and infects routers, security cameras, and set-top boxes
to obtain the current time in seconds. This results in a
by exploiting a Hypertext Preprocessor (PHP) vulner-
ability. LuaBot is a trojan that
is completely coded in the Lua
Table 1. The size reduction results of five malware samples. language, targeting Linux plat-
forms to recruit them in a DDoS
Number of botnet. Tsunami, also known as
Number Number Number of Destination Number Kaiten, is a type of DDoS bot that
Malware of Syscalls ASG of Steps Source Nodes Nodes of Edges uses Internet relay chat to com-
Dofloo 11,696 I 7,577 11 392 7,577
municate with the threat actor.
As expected, botnets for
II 7,577 11 54 284 DDoS attacks have dominated the
Gafgyt 864 I 404 13 36 404 Linux-based malware landscape in
II 404 13 35 79 the last few years. These five mal-
Darlloz 5,019 I 2,075 27 524 2,075
wares belong to the most popu-
lar malware families that harvest
II 2,075 27 315 596 poorly protected IoT devices. The
LuaBot 13,801 I 9,746 3 700 9,746 table presents the sizes of both the
II 9,746 2 288 349 original ASG graph (referred to
Tsunami 764 I 344 2 42 344
as ASG I) and the resulting ASG
graph (referred to as ASG II)
II 344 2 16 38 after the redundancy reduction

16 IEEE Security & Privacy May/June 2024


process. The ASG graph provides a detailed sequence of Step 3: Basic NLDs of Syscalls
the operational steps taken by a malware, with each directed Syscalls are low-level function calls used by programs
edge indicating the step number of the corresponding sys- to request services from the operating system. Because
call invocation, along with information about the actor and of the low-level semantics of syscalls, it is often difficult
the direct object of the operation. When traversing the for researchers or security analysts to quickly grasp the
graph according to the step number, we derive a list of the important behaviors or execution attempts of a mal-
execution steps of an ASG, denoted by C ware program. Furthermore, it is time-consuming and
requires deep expertise to quickly comprehend the
C = " l ; l = < k, actor, op - verb, op - dirObj > , (1) intention and tactical steps of the entire operation. In
this work, we propose a method to translate and convert
where l is a line of syscall invocation, k denotes the step a malware’s ASG graph into an easy-to-read and infor-
number, actor is the source node of a directed edge, op- mative English article. This can provide a very useful
verb is the name of the syscall associated with the edge, tool for researchers and security analysts.
and op-dirObj is the name of the object of the syscall In this step, we first develop a conversion module
operation. Figure 4 shows an example of the reduced referred to as the Linux syscall NL transformer (linuxSyscall_
ASG graph of the malware Dofloo as shown in Table 2. NLTransformer). This module makes use of a synonym

Malware

1. exec 7. ugetrlimit

8. readlink 8,192 × 1,024 B


uname 11. waitpid
9. getcwd
10. clone /proc/self/exe
4. set_tid_address
sh
3. set_thread_area 5. set_robust_list /prober
2. brk 6. futex
Memory Address

0x8fe7000 0x8fe7860 0x8fe78c8 0x8fe78d0 0xffe57718

Figure 4. The example ASG of the malware Dofloo.

Table 2. Transformed basic NLDs of the ASG.

Step Syscall Invocation NLD


1 <malware, exec(), uname> Malware execute program: uname
2 <malware, brk(), memoryAddr_0x8fe7000> Malware change the location of the program break: 0x8fe7000
3 <malware, set_thread_area(), memoryAddr_0x8fe7860> Malware set thread local storage area: 0x8fe7860
4 <malware, set_tid_address(), memoryAddr_0x8fe78c8> Malware set pointer to thread ID: 0x8fe78c8
5 <malware, set_robust_list, memoryAddr_0x8fe78d0> Malware set futexes: 0x8fe78d0
6 <malware, futex(), memoryAddr_0xffe57718> Malware lock: Oxffe57718
7 <malware, getrlimit(), 8192*1024 bytes> Malware get resource limits: 8,192 × 1,024 B
8 <malware, realink(), /proc/self/exe> Malware read symbolic link: /proc/self/exe
9 <malware, getcwd(), /prober> Malware get current working directory: /prober
10 <malware, clone(), sh> Malware create child process: sh
11 <malware, waitpid(), sh> Malware wait for process: sh
The example transformation of the syscall invocations list of execution steps into the corresponding list of the NL steps of the malware Dofloo.

www.computer.org/security 17
SYNTHETIC REALITIES AND ARTIFICIAL INTELLIGENCE-GENERATED CONTENTS

base denoted as linuxSyscall_SynonymBase and is respon- the original <Step 1, malware, exec(), uname> is
sible for transforming each line of data within C into a basic translated into “Step 1: ‘malware’ execute program:
NLD. The synonym base is a structured table-like file that uname.” In line 2, the original <Step 2, malware, brk(),
comprises a collection of tuples, each consisting of three memoryAddr_0x8fe7000> is translated into “‘malware’
elements: the syscall name, verb or verb phrase, and direct change the location of the program break: 0x8fe7000.”
object noun or noun phrase. To establish the Linux syscalls In lines 3 and 4, the malware sets up an entry in the cur-
NL synonym base, we refer to both the name and descrip- rent thread’s thread-local storage array and subsequently
tion sections of a syscall in the Linux man pages.10 We assigns the pointer to a thread ID. Combining these
extract the verb or verb phrase along with its corresponding steps, one can understand that the malware is perform-
direct object noun or noun phrase from the descriptions. ing memory location manipulation. We consider that
For example, for clone(), we extract the verb “create” and this is a significant advance in readability and provides
the noun phrase “a child process.” Table 3 lists some con- higher level semantic presentation of the syscall trace
tents of the linuxSyscall_SynonymBase. from the reader’s perspective. For instance, a forensic
By referencing the linuxSyscall_SynonymBase, analyst can perform dynamic analysis of a suspicious,
the linuxSyscall_NLTransformer module takes a list possibly malicious program and obtain its syscall trace,
of the quadruplets of the ASG graph as the input and which can then be fed to the module to obtain a more
outputs a list of the basic NLDs of the attack steps of easy-to-understand translation of the low-level trace.
the malware, as shown in Table 2. In line 1, i.e., step 1, This pioneering approach represents a notable leap
in enhancing the legibility of syscall traces, offering
readers a more semantically enriched perspective. In
Table 3. Example contents of the Linux syscall NLD base. the next phase of our work, we delve into the potential
to uncover the inherent intentions behind sequences
Syscall Name Verb Phase Direct Object Noun Phrase
of consecutive operations. We aim to enhance our cur-
rent capabilities by not only offering succinct NLDs of
clone() Create A child process syscall traces but also by enriching these descriptions
exit() Terminate The calling process with relevant details about the operations involved. Our
linkat() Create A file link objective is to empower readers with a more profound
understanding and deeper insights into the behavior of
rename() Change Name or location
malicious software, further enriching their comprehen-
brk() Change The location of the program break sion of the subject matter. The algorithm of the Linux
open() Open A file or device syscall NLTransformer is presented in Algorithm 1.
getpid() Get Process identification
Malware Activity Report
exec() Execute A program
wait4() Wait for A process Step 4: Prompting LLMs
Recent pretrained LLMs have performed impressively
and successfully on diverse NL processing tasks, such as
creative writing and crosswords.11 We had a trial explor-
Algorithm 1: linuxSyscall_ ing LLM technology, where an input is a list of basic
NLTransformer NLDs of an ASG graph of a malware, and the expected
output is a coherent and informative passage of the given
Input: stepList, synonymBase input. We had two observations based on the generated
Output: NL_stepList
output. One is that LLMs have the capability to gener-
1:  NL_stepList = [];
2:  for each step in stepList do
ate more natural text with lexical variety aligning with
3:    sourceNode = step["sourceNode"]; the given input. The other is that LLMs are able to pro-
4:    syscall = step["edge"] vide relevant information for the given input because
5:    destNode = step["destNode"] they have domain knowledge (without fine-tuning) and
6:    verb = synonymBase[syscall]["verb"] infer the context. Thus, LLMs facilitate the generation of
7:    object = synonymBase[syscall]["object"] comprehensive malware activity articles.
8:    NL_step = sourceNode + verb + object + We develop a prompting approach to leverage the
":" + destNode NL_stepList.append creative potential of LLMs to generate a malware activity
(NL_step)
report automatically. Here, the latest version of ChatGPT
9:  end for
(version 3.5)12 is used for text generation. We employ a
chain-of-thought13 style approach to prompt ChatGPT

18 IEEE Security & Privacy May/June 2024


using instruction prompting with one in-context example the remaining four malware samples as a test set to
(Dofloo). ChatGPT is steered to write a malware activity respond to the following research ­questions (RQs):
report, where the inputs are the given instructions and a
list of the basic NLDs describing malware activity, and ■■ RQ1: How effectively do the ASG reduction and the
the expected output is a coherent passage that specifies NLD transformation perform in the report generation?
the given steps in the generated paragraphs. ■■ RQ2: Does the content generated by LLMs encom-
Prompting ChatGPT involves three phases: pass information obtained from dynamic analysis?
role-playing, demonstration, and task description (the list ■■ RQ3: To what extent do the LLMs contribute addi-
of decomposed steps). First, ChatGPT is asked to play tional information in the generated report?
the role of a malware analyst since LLMs have the capa-
bility to mimic various personas.14 When assigning a role
to play, ChatGPT is provided with the context about the Table 4. An example of prompting ChatGPT to generate a
given identity and background so that it is able to gener- malware activity report for Gafgyt in three phases: role-playing,
ate more natural and in-character responses tailored to demonstration, and task description.
that role. Next, a demonstration is given to endow the lan-
guage model with the ability to generate a similar output. A Now, you are a malware analyst. I will give you a series of malware behaviors.
desired output is an natural language narrative created from Please analyze each step and write an article. The example is as follows:
the basic descriptions of the execution traces of a malware Input:
program, with each step of the malware’s lifecycle clearly Step 186: “sh” change the location of the program break: 0x565075d1f000
marked within parentheses. This provides an exemplar for Step 187: “sh” set region of memory: 0x7fb46b1c1000
few-shot prompting, which suggests how LLMs might have Step 188: “sh” set thread state: 0x7fb46b5d8540
completed the task and aligned with the given input when Step 189: “sh” unmap files or devices: 0x7fb46b5d9000
generating responses. In addition, we manually labeled ...
steps in parentheses to specify the corresponding given Output:
steps. The ultimate expected output is a series of natural Memory and Thread Control: The process “sh(PID = 1522)” demonstrates
language descriptions that ChatGPT generates based on precise control over memory and threads. It begins by changing the
the system call execution trace from a malware program. location of the program break to 0x565075d1f000 (Step 186), establishing
a specific memory region (0x7fb46b1c1000) for its operations (Step 187).
These descriptions provide a coherent narrative that offers
Additionally, it sets the thread state to 0x7fb46b5d8540 (Step 188), ensuring
a thorough portrayal of the malware’s operational activities. efficient management of threads. To optimize resource allocation, the
Table 4 illustrates an example of a prompt for Gafgyt. process unmaps files or devices at 0x7fb46b5d9000 (Step 189).
...
LLM-Generated Report Analysis As the above example, you need to mark the corresponding step after the
The goal of this study is to leverage LLMs generate an corresponding paragraphs. Please follow this format. If you understand,
intelligence report to comprehend malware activity please answer yes.
when provided with an execution trace. Notably, there is Step 1: “malware” check user permissions: /usr/bin/python
a lack of benchmarks for making comparisons with gen- Step 2: “malware” get user identity: UID: 0
erated CTI reports on a large-scale dataset. To address Step 3: “malware” get time: Timestamp
this gap, we utilize Dofloo as an in-context example in Step 4: “malware” get process identification: PID: 1525
our prompt for ChatGPT, as illustrated in Table 4, and ...

Table 5. The statistical descriptions on traces, trace-based generated reports, and ASG–NLD-based generated reports.

Trace Trace-Based ASG–NLD-Based

Average Number of Average Number of


Malware Syscall Sentences Words (SD) Sentences Words (SD)
Gafgyt 864 181 27.02 (13.27) 59 24.73 (8.26)
Darlloz 5,019 1,202 26.2 (28.47) 286 27.3 (26.23)
LuaBot 13,801 5,439 25.88 (15.97) 208 24.6 (9.86)
Tsunami 764 158 24.66 (9.22) 50 23.24 (8.76)
SD: standard deviation.

www.computer.org/security 19
SYNTHETIC REALITIES AND ARTIFICIAL INTELLIGENCE-GENERATED CONTENTS

■■ RQ4: How do LLMs contribute to the analysis and and our generated reports. Notably, our generated
generation of reports? outputs demonstrate significant brevity compared
to the others. This shows the advantages of our pro-
Quantitative Analysis posed ASG and the NLD transformation. We empha-
For RQ1, we conduct a comparison between our gen- size the collective representation of behavior through
erated responses and those generated directly from multiple syscalls.
the original execution traces of the provided samples. In this study, LLMs demonstrate the ability to merge
To ensure a fair comparison, we use the same prompt and condense pertinent NLDs linked to multiple sys-
as presented in Table 4, with the only difference being calls, encapsulating the intent of the operation into con-
the replacement of the input with the original traces. cise statements. This discovery underscores that LLMs
In Table 5, we provide a statistical description of the play a crucial role in enhancing our comprehension of
four malware samples, comparing the original execu- malware behaviors. For example, LLMs succinctly sum-
tion traces with both trace-based generated reports marized 174 NLDs of Darlloz into one sentence, stating,
“Similarly, the process repeats these actions for pro-
cesses 2 through 1882, sequentially opening their cor-
Table 6. The coverage and complementarity
responding stat files in the ‘/proc’ directory and reading
ratios of four malware samples.
the file descriptors (Steps 24 to Steps 197).” Moreover,
LLMs go beyond mere transformation, providing con-
Malware Coverage (%) Complementarity (%) cise insights into the intent of operations. For instance,
Gafgyt 89.54 58.98 in the case of Tsunami, LLMs inferred the achievement
of persistence as the sample attempted to access files
Darlloz 64.04 34.69
associated with system initialization and configuration
LuaBot 44.4 42.46 on Unix, as articulated in the statement, “Persistence
Tsunami 94.74 64.82 Mechanisms: Intriguingly, the ‘malware’ opens the files/
etc/rc.d/rc.local (Step 8) and/etc/rc.conf (Step 9). . . .
By accessing these files, the malware is likely attempting
Table 7. An excerpt of the given prompt and its generated to achieve persistence. . . .”
response to describe Gafgyt. Note that when taking traces as inputs, we made three
observations: 1) LLMs might deviate from instructions
Prompt: to produce a report; 2) LLMs might generate some irrel-
Step 1: “malware” check user permissions: /usr/bin/python evant technical descriptions; and 3) LLMs might refuse
Step 2: “malware” get user identity: UID:0 to comply with requests, providing responses such as,
... “I’m sorry, but as an AI language model, I cannot per-
Step 31: “Process 1526” open file or device: /dev/watchdog
form the analysis for the provided trace. It appears to be
Step 32: “Process 1526” open file or device: /dev/misc/ watchdog
Step 33: “Process 1526” open file or device: /sbin/watchdog a syscall trace, and I would recommend using a special-
... ized tool for analyzing syscall traces, such as strace or a
similar tool, to interpret this data,” in some cases. This
Response:
User Permissions and Identity: The malware (“malware”) begins by
suggests that the current capability of LLMs might be
checking the user permissions for the /usr/bin/python file (step 1). It then insufficient to directly analyze malware activity and gen-
retrieves the user identity and identifies it as UID:0, indicating root privileges erate an intelligence report.
(step 2). This privileged user status grants the malware significant control For RQ2 and RQ3, we analyzed our generated out-
over the system. puts to assess our system’s capabilities in coverage and
... complementarity, respectively. The coverage ratio is
File and Device Operations: Process 1526, a child process created by measured as the set of words in the NLDs covered
the “malware,” performs a series of file and device operations. It starts by within the generated reports, while the complementary
opening the file or device /dev/watchdog (step 31) and /dev/misc/watchdog
ratio is determined as the set of words in the generated
(step 32), which are related to watchdog functionality. It continues by
opening /sbin/watchdog (step 33) and /bin/watchdog (step 34), indicating
reports not present in the NLDs. To compute coverage
the execution of specific watchdog programs. . . and complementarity, two texts are treated as bags of
Watchdog Functionality and System Monitoring: The file and device words, with the words transformed into stems using the
operations performed by process 1526 indicate its engagement with Porter stemming algorithm.
watchdog functionality. Watchdog timers are used in systems to monitor the Table 6 indicates that most of the NLDs are included
health and availability of critical processes. By opening watchdog-related files in the generated reports of Gafgyt (89.54%) and Tsu-
and devices, process 1526 suggests an intention to control or manipulate the nami (94.74%), where reports convey almost the same
watchdog mechanism. . . message in more natural language forms. On the other

20 IEEE Security & Privacy May/June 2024


hand, one possible reason why less than half of LuaBot’s Darlloz in natural language. Darlloz dropped TCP pack-
generated output is covered is that LuaBot repeat- ets on port 32764 and attempted to propagate across
edly manipulated a large number of file operations in networks. As indicated in Table 9, it repeatedly estab-
directories, and these behaviors are summarized in the lished initial connections with various IP addresses but
report. The complementary ratios in Table 6 reveal that through the same port. Remarkably, when provided
LLMs can play a complementary role to dynamic anal- solely with the description of process 1921 initiating
ysis. LLMs provide complements in two ways. One is connections, the language model effectively distilled
by offering additional technical details, as seen in the the steps into the introductory sentences of the para-
example of “Watchdog timers are used in systems to graph: “. . . Each process initiates connections with dif-
monitor the health and availability of critical processes” ferent IP addresses on the specified port 58455. . . .”
in Table 7. The other is by making inferences about the Moreover, we noticed that ChatGPT was able to accu-
underlying purpose of the observed behavior, exempli- rately and chronologically repeat the numbers across
fied by the explanation of process 1518 connecting to sentences, such as, “117.201.16.1, 117.201.16.20, and
the Domain Name System (DNS) service on the local 117.201.16.30 (Steps 567 to 569).” This revealed that
machine via 127.0.0.53 in Table 8. ChatGPT might be able to generate reliable and infor-
mative content for cybersecurity.
Case Study
To answer RQ4, a qualitative analysis was conducted LuaBot. When provided with 349 steps of basic descrip-
on the four generated malware activity reports. This tions, ChatGPT produced 50 paragraphs (4,300 words)
section provides excerpts from each of these reports to to depict the lifecycle of LuaBot. It conducts DoS attacks
offer insights into the quality and comprehensiveness by employing flooding techniques. More specifically,
of the information captured. Meantime, we invited a LuaBot listened on its port 11833 an HTTP GET request
security expert to check the generated passages, and the to 217.23.3.47:1085. Table 10 presents that ChatGPT
results showed that the reports correctly covered each described LuaBot as connecting to the C2 server to
step of the ASG. receive instructional information. This shows that the
language model tends to generate verbose explanations
Gafgyt. When provided with 17 basic NLDs, Chat- when concise and insufficient inputs are given.
GPT generated 16 paragraphs, totaling 1,185 words, to In summary, our approach effectively addresses
outline the lifecycle of Gafgyt. ChatGPT successfully RQ1 by leveraging ASG reduction and NLD trans-
transformed the basic description into a narrative for formation, resulting in concise yet comprehensive
malware activity. Taking step 1 and step 2 of Table 7 as reports. Addressing RQ2, LLMs showcase their
instances, Gafgyt attempted to gain elevated root privi-
leges through a vulnerability in the PHP framework.
ChatGPT processed the messages “UID:0” and inferred Table 8. An excerpt of the given prompt and its generated
the behavior to escalate privileges. Moreover, ChatGPT response to describe Tsunami.
provided additional information to introduce the func-
tionality of watchdog and the intention of the manipula- Prompt: . . .
tion, which were not included in the given input. Step 28: “Process 1518” open file or device: /etc/hosts
Step 29: “Process 1518” get file status: /etc/hosts
Tsunami (Kaiten). When 38 steps were given, ChatGPT Step 30: “Process 1518” read file descriptor: /etc/hosts
Step 31: “Process 1518” initiate connection on a socket: 127.0.0.53:53
generated 17 paragraphs (946 words) to describe the Tsu-
Step 32: “Process 1518” send message on socket: 127.0.0.53:53 . . .
nami. The variant read the contents of the file “/etc/hosts”
and built a connection to send messages, as illustrated in Response: . . .
the steps of Table 8. Based on the context of the given Hosts File Interaction: Lastly, the malware interacts with the /etc/hosts
file. It opens the file, retrieves its status, and reads the file descriptor (steps
inputs, ChatGPT was able to return additional responses;
28–30). Manipulating the host file can have serious implications, including
that is, 127.0.0.53:53 refers to a communication with a potential redirection of network traffic.
DNS resolver service. This shows that ChatGPT not only Network Communication: The malicious “process 1518” demonstrates
translated the basic NL form into a narrative description activities related to network communication. It initiates a connection on a
but also retrieved the relevant knowledge within the con- socket at 127.0.0.53:53 (step 31), which indicates communication with a DNS
text of the pretrained language model. resolver service on the local machine. This communication might be used
for DNS resolution, possibly to evade detection by communicating locally.
Darlloz. When presented with 596 steps of basic descrip- Message Sending and File Status: “Process 1518” sends a message on the
tions as input, ChatGPT generated 93 paragraphs, socket at 127.0.0.53:53 (step 32), implying that it is actively exchanging data
with the DNS resolver. . .
amounting to 5,688 words, to describe the lifecycle of

www.computer.org/security 21
SYNTHETIC REALITIES AND ARTIFICIAL INTELLIGENCE-GENERATED CONTENTS

proficiency in translating descriptions from dynamic


analysis into detailed and coherent malware activ-
ity reports in natural language. This underscores the
T he work presents a framework to analyze the essence
of malware behavior and operations and trans-
form them into informative and easily comprehensible
adaptability and interpretive capabilities of LLMs high-level natural language articles. The results show that
within the cybersecurity domain. RQ3 and RQ4 our proposed approach produces more concise reports,
reveal the substantial value addition brought by our enriched with additional technical details, effectively cap-
method. The generated reports not only cover the turing malware behaviors and intentions. With several
malware activity within the given execution traces insights from our process of developing the framework, we
but also furnish additional technical details and pro- believe that the emergence of evolving AI techniques has
found insights into the underlying intentions behind the potential to automatically, accurately, and efficiently
observed behaviors. In essence, our approach, when enhance the analysis and understanding of malware:
coupled with the right prompts, has the potential to
provide security analysts with enriched perspectives, ■■ Highlighting malware behavior: This work provides
aiding them in the intricate task of malware analy- a comprehensive, chronologically ordered article
sis. Overall, our study demonstrates the feasibility of on malware activities. Since different interpreta-
integrating LLMs with well-designed methodologies tions of a report lead to various actionable strat-
for dynamic analysis, paving the way for a promising egies, the point of threat intelligence could be
avenue in the realm of more nuanced and insightful elaborated across execution traces by comparing
cybersecurity practices. and contrasting syscalls. For example, a generated
malware activity report could highlight distinct
behaviors by comparing a particular variant with
others within its family.
Table 9. An excerpt of the given prompt and its generated
■■ Mapping intelligence to MITRE adversarial tactics, tech-
response to describe Darlloz.
niques, and common knowledge (ATT&CK)15: The
Prompt: . . . MITRE ATT&CK framework categorizes offensive
Step 567: “Process 1921” initiate connection on a socket: 117.201.16.1:58455 and defensive techniques and is commonly referenced
Step 568: “Process 1921” initiate connection on a socket: 117.201.16.20:58455 in CTI reports. In pursuit of a threat-informed defense
Step 569: “Process 1921” initiate connection on a socket: 117.201.16.30:58455 strategy, mapping adversary action to ATT&CK, includ-
... ing their tactics and techniques, can serve as a valuable
Response: . . . resource for defenders. A pretrained LLM is likely to be
Socket Connection Initiation: In this series of steps, multiple child well acquainted with the structure of ATT&CK frame-
processes created earlier (process 1921 to process 1930) demonstrate works and adept at identifying the techniques and tac-
the behavior of initiating connections on sockets. Each process initiates tics associated with malware behavior.
connections with different IP addresses on the specified port 58455.
■■ Reproducibility to other platforms: Our conceptual
The connections initiated by each process are as follows:
• Process 1921 initiates connections on sockets with IP addresses: framework is inherently generic, employing graph
117.201.16.1, 117.201.16.20, and 117.201.16.30 (steps 567–569) . . . reduction and prompting techniques to generate
articles describing malware activity. Its applicability
extends beyond any specific platform. To adapt this
framework to other environments, the monitoring of
Table 10. An excerpt of the given prompt and its generated malware execution can be achieved using tools like
response to describe LuaBot. Process Monitor for Windows. Subsequently, ASG
reduction can be tailored for the corresponding sys-
Prompt: . . .
Step 348: “Process 1873” set options on sockets: 217.23.3.47:1085 tem entities, and the NL transformation can consult
Step 349: “Process 1873” bind name to socket: 0.0.0.0:11833 . . . official documents to comprehend the semantics of
system functions. Finally, by adhering to the proposed
Response: . . .
Socket Options: “Process 1873” sets options on sockets with the address
prompt, a comprehensive report detailing the lifecy-
217.23.3.47:1085 (step 348). Socket options allow fine-grained control over cle of malware can be generated.
socket behavior, such as setting timeouts, enabling or disabling specific ■ ■ Fine-tuning specialized LLMs: While off-the-shelf
features, or configuring socket-level protocols. LLMs demonstrate impressive capabilities at a broader
Socket Binding: Finally, “process 1873” binds a name to a socket with scale, they are not without limitations. There exists a
the address 0.0.0.0:11833 (Step 349). This socket-binding operation serves risk that ChatGPT, for instance, may generate inaccu-
to associate the specified address and port with the socket, allowing the rate information or engage in hallucinatory responses
process to listen for incoming network connections on that address and port when prompted. Recognizing this, there is a growing
combination. . .
demand for a fine-tuned, domain-specific LLM that

22 IEEE Security & Privacy May/June 2024


possesses a deeper understanding of malware behav- NJ, USA: IEEE Press, 2020, pp. 1172–1189, doi: 10.1109/
iors, thereby enhancing its utility in malware analysis SP40000.2020.00096.
applications. Initiating the fine-tuning process with a 10. “Linux manual page: Clone(2).” Jambit GmbH. Accessed:
pretrained model, such as GPT and LLaMa, allows us Mar. 30, 2023. [Online]. Available: https://man7.org/
to bolster its performance by refining its weights with linux/man-pages/man2/clone.2.html
a substantial volume of execution traces. It is crucial 11. S. Yao et al., “Tree of thoughts: Deliberate problem solv-
to underscore that the preparation of high-quality, ing with large language models,” 2023, arXiv:2305.10601.
well-annotated training data is paramount and plays a 12. “CHATGPT: Optimizing language models for dialogue.”
critical role in this fine-tuning process. OpenAI. Accessed: Mar. 30, 2023. [Online]. https://
openai.com/blog/chatgpt/
Acknowledgment 13. J. Wei et al., “Chain-of-thought prompting elicits reason-
We would like to express our gratitude to OpenAI for ing in large language models,” in Proc. 35th Conf. Neural
providing access to ChatGPT, which was instrumental Inf. Process. Syst. (NeurIPS), 2022, pp. 24,824–24,837.
in proofreading the manuscript. 14. A. Kong et al., “Better zero-shot reasoning with role-play
prompting,” 2023, arXiv:2308.07702.
References 15. “Adversary tactic technique common knowledge.”
1. O. Or-Meir, N. Nissim, Y. Elovici, and L. Rokach, MITRE ATT&CK. [Online]. Available: https://attack.
“Dynamic malware analysis in the modern era—A state mitre.org/
of the art survey,” ACM Comput. Surv., vol. 52, no. 5, pp.
1–48, 2019, doi: 10.1145/3329786. Yeali S. Sun is a professor in the Department of Informa-
2. H. Pearce, B. Tan, P. Krishnamurthy, F. Khorrami, R. tion Management, National Taiwan University, Taipei
Karri, and B. Dolan-Gavitt, “Pop quiz! Can a large lan- 106, T
­ aiwan. Her research interests include Internet
guage model help with reverse engineering?” 2022, security and forensics, quality of service, cloud com-
arXiv:2202.01142. puting and services, and performance modeling and
3. S. Sharma. “ChatGPT creates mutating malware that evaluation. Sun received her Ph.D. in computer science
evades detection by EDR.” CSO Online. Accessed: Sep. from the University of California, Los Angeles. She is
22, 2023. [Online]. Available: https://www.csoonline. a Member of IEEE. Contact her at [email protected].
com/article/575487/chatgpt-creates-mutating-malware
-that-evades-detection-by-edr.html Zhi-Kang Chen was a master student and graduated from
4. Y. M. Pa Pa, S. Tanizaki, T. Kou, M. Van Eeten, National Taiwan University, Taipei 106, Taiwan. His
K. Yoshioka, and T. Matsumoto, “An attacker’s dream? research interests include cybersecurity, malware analy-
Exploring the capabilities of ChatGPT for develop- sis, and prompting engineering. Chen received his M.S.
ing malware,” in Proc. 16th Cyber Secur. Experimenta- in information management from National Taiwan
tion Test Workshop, 2023, pp. 10–18, doi: 10.1145/ University. Contact him at [email protected].
3607505.3607513.
5. E. Cozzi, M. Graziano, Y. Fratantonio, and D. Balzarotti, Yi-Ting Huang is an assistant professor with the National
“Understanding Linux malware,” in Proc. IEEE Symp. Taiwan University of Science and Technology, T
­ aipei 106,
Secur. Privacy (SP), Piscataway, NJ, USA: IEEE Press, Taiwan. Her research interests include malware analy-
2018, pp. 161–175, doi: 10.1109/SP.2018.00054. sis, deep learning, and natural language processing in
6. “Linux manual page.” Jambit GmbH. Accessed: Jun. 24, educational applications. Huang received her Ph.D. in
2023. [Online]. Available: https://man7.org/linux/ information management from National Taiwan Uni-
man-pages/man2/syscalls.2.html versity. She is a Member of the IEEE Computer Soci-
7. “strace.” Jambit GmbH. Accessed: Mar. 30, 2023. [Online]. ety. Contact her at [email protected].
Available: https://man7.org/linux/man-pages/man1/
strace.1.html Meng Chang Chen is a research fellow/professor with
8. S. M. Milajerdi, R. Gjomemo, B. Eshete, R. Sekar, and V. the Institute of Information Science and Research
Venkatakrishnan, “HOLMES: Real-time APT detection Center for Information Technology Innovation, Aca-
through correlation of suspicious information flows,” demia Sinica, Taipei 115, Taiwan. His research inter-
in Proc. IEEE Symp. Secur. Privacy (SP), Piscataway, NJ, ests include computer and network security, wireless
USA: IEEE Press, 2019, pp. 1137–1152, doi: 10.1109/ networks, deep learning for complicated applications,
SP.2019.00026. and data and knowledge engineering. Chen received
9. W. U. Hassan, A. Bates, and D. Marino, “Tactical prove- his Ph.D. in computer science from the University
nance analysis for endpoint detection and response sys- of California, Los Angeles. He is a Member of IEEE.
tems,” in Proc. IEEE Symp. Secur. Privacy (SP), Piscataway, Contact him at [email protected].

www.computer.org/security 23

You might also like