Explainable Machine Learning for API Call
Sequence Analysis
Muhammad Khan
Independent Researcher
[email protected]

Abstract—Although deep learning achieves state-of-the-art performance in tasks like vulnerability detection and classification, it has drawbacks. A significant disadvantage of deep learning methods is their inexplicability. Many deep learning models, especially sequential models like long short-term memories and gated recurrent units, operate as black boxes. The outputs of these black-box models are uninterpretable to security analysts and software developers. This inexplicability of deep learning models hinders their acceptance in enterprises. It also prevents knowledgeable system experts from removing spurious correlations that a model might have learned. Thus, explainable deep learning models are highly desirable to promote their acceptance in real-world systems. Another major drawback of deep learning is adversarial machine learning. Cyber attackers can utilize adversarial machine learning to fool deep learning-based cyber-attack detectors. Prior research has shown that cyber-defenders can use explainable artificial intelligence to defend against adversarial machine learning attacks. Therefore, adding explainability to deep learning models for cyber-security is highly desirable. This article proposes a method that enhances the explainability of system-call sequence analysis-based vulnerability detection. Our method can potentially pinpoint the precise instruction calls that have triggered a vulnerability. This insight is valuable for security analysts, who can then evaluate the sequence of system calls more efficiently.

1 INTRODUCTION

Organizations throughout the globe have been targets of adversarial cyber-campaigns in recent years. Hackers have accessed millions of people's secret credentials and private information through malicious campaigns. In retrospect, cyber-researchers have discovered that most of these attacks were due to unpatched vulnerabilities and zero-day attacks [1]. Thus, it is necessary to always be vigilant and monitor our systems to detect vulnerability exploits.

Prior research has devoted significant effort to detecting and classifying vulnerable systems. Sequential machine learning (ML) based models like long short-term memories (LSTMs), gated recurrent units (GRUs), and transformers have been the most successful in achieving state-of-the-art performance in these tasks. Other ML-based cyber-systems, like intrusion and anomaly detection systems, handle sequential data such as network traffic and log files. Thus, sequential ML-based models are also used for these tasks.

All state-of-the-art sequence models like LSTMs, GRUs, and transformers have deep neural networks at the core of their architecture. However, a significant drawback of deep neural networks is their inexplicability. Researchers often refer to neural networks as black boxes because it is challenging to reason about their outputs. This lack of interpretability limits the adoption of neural networks in regulation-dominated applications like cyber-security and law [2].

Another major drawback of ML is the advent of adversarial ML. Adversarial ML is a relatively new technique that malicious actors can utilize to fool neural networks. This drawback can potentially render neural network-based cyber-attack detectors ineffective. Prior research has exploited adversarial ML to minimally modify malware so that it evades state-of-the-art malware detectors while preserving its malicious behavior. Wang et al. [3] have shown that developers can utilize explainable ML to analyze the weaknesses of ML-based malware detectors. Interestingly, Marino et al. [4] have used adversarial ML to explain the ML models of intrusion detection systems. These variegated insights are valuable for security analysts to generate vulnerability patches and be more vigilant.

Recent advancements in explainable artificial intelligence (XAI) for computer vision models have inspired researchers to investigate XAI in cybersecurity. For example, explainable graph neural networks (GNNs) for vulnerability detection have successfully shown that graph-specific explanations of vulnerability discovery GNNs provide deeper insights to security analysts than traditional methods [5]. Another work, by Pirch et al. [6], demonstrates the use of XAI in vetting malware tags for better organization and categorization of malware. However, since XAI for vulnerability detection is a relatively young area of research, efficient benchmarks for comparing XAI models are still not well-established. Therefore, Warnecke et al. [7] propose novel criteria for evaluating XAI models in cybersecurity.

Traditional ML-based vulnerability detection models can predict whether a sequence is malicious or benign. However, they cannot explain their decisions. Furthermore, the series of instruction calls obtained from execution traces are enormous, often amounting to millions of instruction calls. The large size of these sequences makes it difficult for a security analyst to analyze a potential threat predicted by a traditional ML model. Our framework mitigates the analyst's burden by pinpointing the exact elements of the sequence, and their execution timestamps, that trigger the exploit. Thus, the analyst does not have to analyze the entire exploit sequence.
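The saving from reviewing only flagged calls can be illustrated with a toy sketch. Everything here is hypothetical — the trace layout, the `flagged_indices`, and the API labels are illustrative names, not part of our framework's implementation:

```python
# Hypothetical toy illustration: a long execution trace and a handful of
# flagged exploit-triggering positions reported by an explanation method.
trace = [("t%06d" % i, "API_%d" % (i % 500)) for i in range(1_000_000)]

# Suppose indices 17, 32, and 97 are flagged as exploit-triggering.
flagged_indices = [17, 32, 97]

# The analyst inspects only these (timestamp, instruction) pairs.
explanation = [trace[i] for i in flagged_indices]

# Fraction of the trace that still needs manual review.
review_fraction = len(explanation) / len(trace)
```

Even in this toy setting, the analyst's workload shrinks from a million calls to three, which is the kind of search-space reduction the framework targets.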
Instead, the analyst can examine only the sequence of exploit-triggering instruction calls predicted by our method. This significant reduction of the search space mitigates the extent of human evaluation.

2 BACKGROUND

This section discusses background material.

2.1 Seeker

Seeker is an anomaly detection system that uses artificial intelligence to efficiently detect abnormal system behavior. We designed Seeker to classify instruction series efficiently at run-time. Unlike traditional methods, Seeker does not classify entire series of instruction calls. Instead, Seeker analyzes individual instruction calls and predicts the potential exploits they might trigger. Then, we use these predictions to update a dynamic state table at run-time. We raise the alarm when one or more states in the state table represent a signature. This combination of ML and non-ML-based techniques enables Seeker to be accurate and efficient at the same time.

First, we describe a high-level overview of the training pipeline. We begin by extracting instruction series from the databases and from publicly available programs. Then, we convert the individual instruction calls into labeled feature vectors. Finally, we use these labeled feature vectors to train the Seeker classifier.

Next, we describe a high-level overview of the Seeker inference pipeline. First, we extract the instruction calls sequentially from the test program during inference. Then, we convert the current instruction call into a feature vector. The exploit classifier then analyzes this feature vector to predict the potential exploits that the current instruction call can trigger. Finally, this list is input to the series checker. The series checker tracks the order of the instruction calls and raises the alarm if the current instruction completes a signature. If the current instruction does not complete a signature, we analyze the next instruction of the test program's execution trace.

Since Seeker analyzes the individual instructions of the execution trace, it is easier to extract low-level insights. This low-level information about particular series elements facilitates the explainability of Seeker's decisions.

2.1.1 End-to-end inference pipeline

We have designed Seeker to be capable of detecting exploits in real time. Therefore, we input each executed instruction into our framework at run-time. We proceed to the next executed instruction if the current one is present in our white-list of non-sensitive API calls. Otherwise, we analyze the instruction to determine whether it triggers any undesirable behavior. First, we obtain the feature vector for the executed instruction. The Seeker classifier processes this feature vector. Next, the classifier outputs the list of potential exploits that the executed instruction might trigger. If this list is empty, we terminate our analysis of the current instruction and proceed to the next executed instruction. Otherwise, we input the list to the Seeker series checker. The series checker has a dynamic state table that stores the states in our threat model. From the state table, we extract the states predicted by the classifier. Next, we compute the cosine similarities between these states and the feature vector of the executed instruction. This comparison allows us to detect semantically similar instructions. Finally, we update the exploit states that are similar to the feature vector of the executed instruction with the feature vectors of the following instruction in the corresponding exploit signatures. However, if the state table indicates that the executed instruction has completed a potential signature, we stop the execution and raise the alarm. Otherwise, we proceed to analyze the next executed instruction.

2.1.2 Series Checker

Cyberattackers can exploit a vulnerability by executing a set of malicious instructions in a particular series. However, running the same set of instructions in a different order may not trigger the vulnerability [8, 9]. Thus, we need to track the series of program statements and instruction calls to detect exploits. Traditional series models, namely LSTMs and transformers, can track the order implicitly. However, this leads to high computation overheads. Therefore, we propose a state table-based approach to track series of program instructions.

First, we discuss a naive pattern-matching approach that can be implemented with a state table [10, 11, 12]. Then, we discuss the drawbacks of this method and show how Seeker overcomes them. Every row in the state table corresponds to a unique exploit. For the naive approach, we can initialize the exploit states with the first instruction call in the respective exploit signatures. When we encounter an instruction call that is in the current state table during program execution, we update the corresponding states with the following instruction call of the exploit signature. If the encountered instruction call was the last one in the signature series, we raise the alarm, indicating that the API call has executed the corresponding exploit(s).

Although the naive method is much more lightweight than series models, it has two drawbacks that limit its applicability in real-world scenarios. First, the naive pattern-matching approach cannot detect semantically similar instruction calls [13]. As a result, the attacker may avoid detection by executing different APIs with the same functionality as the instruction in the signature. Second, checking for the presence of every instruction in the state table is a time-intensive task. This timing overhead reduces the usefulness of naive pattern matching in real-time scenarios [8, 14].

Our solution, Seeker, addresses these drawbacks of naive pattern matching. Instead of storing the instructions themselves in the state table, we store the feature vectors of the instruction calls. Then, we compare the feature vectors of the executed APIs with the entries in the state table, using cosine similarity to measure the similarity between them. This method enables us to capture semantic similarities between APIs. If the cosine similarity is above a threshold value, we update the state table with the feature vector of the following API in the exploit chain. We experimentally observed that a threshold value of 0.9 enables the model to achieve the highest F1 score.
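The cosine-similarity update described above can be sketched as follows. This is a minimal illustration under assumed names (`advance_state` and the toy two-dimensional feature vectors are hypothetical), not Seeker's actual implementation; the 0.9 threshold is the value reported above:

```python
import math

SIM_THRESHOLD = 0.9  # threshold we observed to maximize the F1 score

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def advance_state(state_vec, next_vec, exec_vec, is_last_step):
    """Compare the executed instruction's feature vector with one exploit
    state; advance the state, raise the alarm, or leave it unchanged."""
    if cosine(state_vec, exec_vec) >= SIM_THRESHOLD:
        if is_last_step:
            return "ALARM", state_vec   # the signature is complete
        return "ADVANCE", next_vec      # move to the next signature step
    return "NO_MATCH", state_vec        # semantically dissimilar call

# A semantically similar call (nearly parallel vectors) advances the state.
status, new_state = advance_state([1.0, 0.0], [0.0, 1.0], [0.99, 0.05], False)
```

Because the comparison is on feature vectors rather than instruction names, a different API with the same functionality as the signature instruction still matches, which is exactly what defeats the first weakness of naive pattern matching.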
Note that the F1 score is a better performance measure than accuracy when the dataset is imbalanced.

To address the second drawback, considerable time overhead, we do not compare the feature vector of the executed API with all the states in the state table. Instead, we compare it with only the states of the categories predicted by the Seeker classifier. For example, if the Seeker classifier predicts that instruction call i triggers exploits 23 and 71, we compare the feature vector of i with only states 23 and 71. Our experiments show that this method requires nearly 10× fewer comparisons per instruction call.

3 PROPOSED METHODOLOGY

We describe our proposed methodology, which is based on Seeker, in this section. We aim to modify the Seeker inference pipeline minimally so that the explainability module integrates into Seeker easily. We are also mindful that explainability should not deteriorate Seeker's efficiency.

Firstly, the class hierarchy of API calls makes it challenging to detect anomalous call sequences. We demonstrate an example of a class hierarchy in Fig. 1. Our method of API call hierarchy extraction is based on methods for malware detection that analyze API call sequences [15, 16, 17].

Fig. 1: An example of class hierarchy

However, the class hierarchy information can be extracted from source code. Then, we generate the sequence of API calls. We demonstrate examples of API call sequence extraction methodologies in Fig. 2 and Fig. 3.

Fig. 2: An example of API sequence extraction from source code

Fig. 3: An example of API call sequence extraction from call graphs

We propose to create a state table for each signature. We update each of these state tables with the series indices of the potential instructions that may trigger the exploit. For example, let us consider that the instruction series {a, b, c} can trigger exploit X. Let us represent the instructions that are semantically similar to a, b, c by ai, bi, ci, respectively. We show an example of the proposed state table of X in Table 1.

TABLE 1: An example of a state table of X with signature {a, b, c}. pi refers to an instruction semantically identical to p.

signID  instrName  execInstr  seqID
0       start      start      0
1       a          a1         17
                   a2         108
2       b          b1         32
                   b2         83
3       c          c1         25
                   c2         77
                   c3         97

It is possible to construct a chain from the state table demonstrated in Table 1. We present the algorithm for creating the list of series from a state table in Algorithm 1.

As shown in Algorithm 1, there are a few constraints for generating a series from a state table. The constraints are as follows.

1) The signID of an instruction determines its relative position in the signature. If an instruction with a lower signID executes after an instruction with a higher signID, the exploit is not triggered.
2) The generated trace should contain instruction calls whose seqIDs are in strictly increasing order. This property establishes feasibility.

The potential series extracted from Table 1 are {a1, b1, c2}, {a1, b1, c3}, and {a1, b2, c3}. (The candidate {a1, b1, c1} is infeasible because c1's seqID of 25 is smaller than b1's seqID of 32, violating the second constraint.)

The analysts can thus examine only the sub-series output by our method instead of the entire series. This minimized search space reduces the burden on the analysts.
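The feasibility constraints can be made concrete with a short sketch. This is not Algorithm 1 itself: instead of its recursive formulation, it exhaustively enumerates one executed instruction per signature step and keeps only the series whose trace indices strictly increase; the data mirrors Table 1, and `feasible_series` is an illustrative name:

```python
from itertools import product

# State table mirroring Table 1: signID -> list of (execInstr, seqID) pairs.
state_table = {
    1: [("a1", 17), ("a2", 108)],
    2: [("b1", 32), ("b2", 83)],
    3: [("c1", 25), ("c2", 77), ("c3", 97)],
}

def feasible_series(st):
    """Pick one executed instruction per signature step in signID order
    (constraint 1) and keep only the series whose trace seqIDs strictly
    increase (constraint 2)."""
    rows = [st[sign_id] for sign_id in sorted(st)]
    feasible = []
    for combo in product(*rows):
        seq_ids = [seq_id for _, seq_id in combo]
        if all(x < y for x, y in zip(seq_ids, seq_ids[1:])):
            feasible.append([name for name, _ in combo])
    return feasible
```

Running this on the Table 1 data yields the three feasible series and silently discards candidates such as {a1, b1, c1}, whose seqIDs (17, 32, 25) do not increase; exhaustive enumeration is fine at this scale, while Algorithm 1's recursion avoids materializing every combination.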
Function exploitSeqGen(currInstr, indexSeq, instrSeq, st):
    Data: Instruction of the signature being analyzed from the state table, currInstr; set of potential series in terms of execution-trace indices, indexSeq; set of potential series in terms of instruction calls, instrSeq; state table, st
    Result: Set of potential series in terms of execution-trace indices; set of potential series in terms of instruction calls
    for i = 0; i < len(st[currInstr][seqID]); i++ do
        if st[currInstr][seqID][i] > indexSeq[-1] then
            instrSeq.append(st[currInstr][execInstr][i]);
            indexSeq.append(st[currInstr][seqID][i]);
            if currInstr == len(st) then
                return indexSeq, instrSeq;
            else
                return exploitSeqGen(currInstr+1, indexSeq, instrSeq, st);
            end
        else
            return indexSeq.append(-1), instrSeq.append("-1");
        end
    end

indicesSeqSet, instrSeqSet = exploitSeqGen(1, [0], [start], stateTable)
for j = 0; j < len(indicesSeqSet); j++ do
    if indicesSeqSet[j][-1] == -1 then
        delete indicesSeqSet[j];
        delete instrSeqSet[j];
    end
end

ALGORITHM 1: Series generation for a category from its state table

It also provides insights into the model's working, thus developing the practitioners' trust in the model. Finally, the analysts can detect errors and spurious correlations and thereby correct them. For example, if Seeker incorrectly predicts an instruction v to be semantically similar to w, the analyst can make a note of it and ignore such spurious predictions in the future. Ignoring spurious correlations in this way makes the model more robust through a human-in-the-loop mechanism.

4 CONCLUSION

In this article, we have proposed a methodology that extends Seeker to explain its decisions with minimal overhead. We believe that the explainability of Seeker's decisions will increase the trust of developers in deploying Seeker in enterprise systems. Explainability also makes it feasible to include analysts in the loop to make Seeker more robust.

REFERENCES

[1] V. Sehwag and T. Saha, "TV-PUF: A fast lightweight analog physical unclonable function," in 2016 IEEE International Symposium on Nanoelectronic and Information Systems (iNIS). IEEE, 2016, pp. 182–186.
[2] A. Patil, A. Wadekar, T. Gupta, R. Vijan, and F. Kazi, "Explainable LSTM model for anomaly detection in HDFS log file using layerwise relevance propagation," in IEEE Bombay Section Signature Conference. IEEE, 2019, pp. 1–6.
[3] W. Wang, R. Sun, T. Dong, S. Li, M. Xue, G. Tyson, and H. Zhu, "Exposing weaknesses of malware detectors with explainability-guided evasion attacks," 2021.
[4] D. L. Marino, C. S. Wickramasinghe, and M. Manic, "An adversarial approach for explainable AI in intrusion detection systems," in IECON 44th Annual Conference of the IEEE Industrial Electronics Society. IEEE, 2018, pp. 3237–3243.
[5] T. Ganz, M. Härterich, A. Warnecke, and K. Rieck, "Explaining graph neural networks for vulnerability discovery," in 14th ACM Workshop on Artificial Intelligence and Security, 2021, pp. 145–156.
[6] L. Pirch, A. Warnecke, C. Wressnegger, and K. Rieck, "TagVet: Vetting malware tags using explainable machine learning," in 14th European Workshop on Systems Security, 2021, pp. 34–40.
[7] A. Warnecke, D. Arp, C. Wressnegger, and K. Rieck, "Evaluating explanation methods for deep learning in computer security," in 5th IEEE European Symposium on Security and Privacy, 2020.
[8] T. Saha, N. Aaraj, N. Ajjarapu, and N. K. Jha, "SHARKS: Smart hacking approaches for risk scanning in Internet-of-Things and cyber-physical systems based on machine learning," IEEE Transactions on Emerging Topics in Computing, 2021.
[9] T. Saha, N. Aaraj, and N. K. Jha, "Machine learning assisted security analysis of 5G-network-connected systems," IEEE Transactions on Emerging Topics in Computing, 2022.
[10] F. Kerschbaum and N. Oertel, "Privacy-preserving pattern matching for anomaly detection in RFID anti-counterfeiting," in International Workshop on Radio Frequency Identification: Security and Privacy Issues. Springer, 2010, pp. 124–137.
[11] S. Y. Lim and A. Jones, "Network anomaly detection system: The state of art of network behaviour analysis," in 2008 International Conference on Convergence and Hybrid Information Technology. IEEE, 2008, pp. 459–465.
[12] Z. A. Baig, "On the use of pattern matching for rapid anomaly detection in smart grid infrastructures," in 2011 IEEE International Conference on Smart Grid Communications (SmartGridComm). IEEE, 2011, pp. 214–219.
[13] J. Brown, T. Saha, and N. K. Jha, "GRAVITAS: Graphical reticulated attack vectors for Internet-of-Things aggregate security," IEEE Transactions on Emerging Topics in Computing, 2021.
[14] T. Saha, N. Aaraj, and N. K. Jha, "System and method for security in Internet-of-Things and cyber-physical systems based on machine learning," Jun. 23, 2022, US Patent App. 17/603,453.
[15] R. Sihwail, K. Omar, and K. Z. Ariffin, "A survey on malware analysis techniques: Static, dynamic, hybrid and memory analysis," Int. J. Adv. Sci. Eng. Inf. Technol., vol. 8, no. 4-2, pp. 1662–1671, 2018.
[16] D.-J. Wu, C.-H. Mao, T.-E. Wei, H.-M. Lee, and K.-P. Wu, "DroidMat: Android malware detection through manifest and API calls tracing," in 2012 Seventh Asia Joint Conference on Information Security. IEEE, 2012, pp. 62–69.
[17] A. Pektaş and T. Acarman, "Deep learning for effective Android malware detection using API call graph embeddings," Soft Computing, vol. 24, no. 2, pp. 1027–1043, 2020.