Control Flow Graphs Against Malware: Methods of Analysis and Detection
2024 26th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC) | 979-8-3315-3283-3/24/$31.00 ©2024 IEEE | DOI: 10.1109/SYNASC65383.2024.00074
Abstract—One key component of a cyberattack is malware. If a malware program is detected and blocked, the cyberattack may stall or fail; thus bad actors design their malware programs with additional characteristics and functionalities that make the analysis of the malware program harder and make the malware undetectable by antivirus solutions. With the appearance of more advanced malware, new detection methods are needed. With the help of reverse engineering techniques and software engineering concepts, one model that analysts can work with is the Control Flow Graph. Used for software optimization, control flow graphs offer the advantage of graph properties for analysts to detect malicious particularities in malware samples. This paper explores some methods of detection and analysis based on control flow graphs, groups them into four categories and highlights different particularities of these approaches.
Index Terms—Malware, Malware Detection, Control Flow Graph.

I. INTRODUCTION

The number of cyberattacks is increasing and the complexity of cyberthreats is rising. One framework for cyberattacks is MITRE’s ATT&CK Matrix, which focuses on adversary techniques based on observations from the real world [1]. This framework defines a cyberattack as a series of techniques and additionally profiles APT groups and malware families with these techniques. Under this framework a cyberattack does not necessarily follow a fixed pattern and is described more explicitly, as specific techniques can be missing or can be added. Based on this framework we define a cyberattack as a series of stages in which bad actors deploy their malicious actions and assets to achieve their objectives. In this series of stages one key element that may appear is malware.

Malware is a software program designed to execute malicious actions inside a computer system. Malware is classified based on various factors such as functionality, characteristics, target system or iterations. Based on this, the malware kingdom is vast, with many types and families of malicious programs. As bad actors need to achieve multiple objectives, malware programs are designed to have multiple malicious functionalities. Nowadays we see more and more hybrid malware which inherits the functionalities of other malware types. Moreover, malware programs are designed to be evasive from detection systems and to be persistent. Evasion can be achieved through mutation mechanisms such as polymorphism, metamorphism or obfuscation. In the case of malware persistence the focus is on executing the malicious code for longer periods of time, with the malware being harder to disrupt and detect. In the case of more advanced malware, which usually has evasion capabilities, it is necessary to analyze the malware in order to understand its capabilities and eventually build rules and update the signatures of the protection software.

Malware analysis consists of observing the behaviour of a malware sample through static and dynamic means. In static analysis the analyst extracts different information and characteristics from a malware sample without executing the malicious program. In dynamic analysis the malware is executed and observed inside a safe environment. Malware analysis is a complex and time-consuming task, as advanced malware employs anti-analysis methods such as code obfuscation techniques, packing (compression and encryption), information hiding, and detection of debuggers, virtual machines and monitoring tools [2]. The process of malware analysis is being automated for faster detection and response inside defensive solutions. One of the concepts on which automation in malware analysis focuses is the Control Flow Graph.

Control Flow Graphs (CFGs) are directed graph representations of a program’s code. In these graph representations the nodes represent blocks of code and the edges represent the flow transitions of the program. Control Flow Graphs have found use in code analysis and optimization, compilers and software plagiarism checking [3]. In malware analysis these control flow graphs give the analyst a bird’s-eye view of the malware samples; moreover, the analyst can use different graph properties to their advantage to find different malware properties. CFGs are structures with explicit information which present good potential for automated detection and analysis.

II. RESEARCH METHODOLOGY

In our study we mainly try to project a taxonomy of CFG-based malware methods for analysis and detection and
Authorized licensed use limited to: RV University. Downloaded on March 17,2025 at 05:26:19 UTC from IEEE Xplore. Restrictions apply.
[Figure: timeline of the surveyed approaches (references [4] to [37]) by publication year, 2006 to 2024. Legend: Algorithmic Approaches, Formal Approaches, Machine Learning Approaches, Deep Learning Approaches.]
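
As an illustration of the CFG definition given in the Introduction (nodes are blocks of code, edges are flow transitions), the following minimal Python sketch splits a linear instruction listing into blocks and connects them. The toy instruction format, the `PROGRAM` listing and the `build_cfg` helper are assumptions made for this example only; they are not taken from any of the surveyed methods, which operate on disassembled binaries.

```python
# Hypothetical toy instruction format: (mnemonic, jump_target_index or None).
# "jmp" is an unconditional jump, "jcc" a conditional jump, "ret" a return.
PROGRAM = [
    ("mov", None),  # 0
    ("cmp", None),  # 1
    ("jcc", 5),     # 2: jump to 5 if the condition holds, else fall through
    ("add", None),  # 3
    ("jmp", 6),     # 4
    ("sub", None),  # 5
    ("ret", None),  # 6
]


def build_cfg(program):
    """Split a linear instruction list into blocks and connect them.

    Returns (blocks, edges): blocks maps a block's start index to the list
    of instruction indices it contains; edges maps a block's start index
    to the start indices of its successor blocks.
    """
    # A leader starts a block: the entry point, every jump target, and
    # every instruction that directly follows a jump or a return.
    leaders = {0}
    for i, (op, target) in enumerate(program):
        if op in ("jmp", "jcc"):
            leaders.add(target)
        if op in ("jmp", "jcc", "ret") and i + 1 < len(program):
            leaders.add(i + 1)
    starts = sorted(leaders)

    # Each block runs from its leader up to (not including) the next leader.
    blocks = {}
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(program)
        blocks[start] = list(range(start, end))

    # Outgoing edges are decided by the last instruction of each block.
    edges = {start: [] for start in starts}
    for start, body in blocks.items():
        last = body[-1]
        op, target = program[last]
        if op in ("jmp", "jcc"):
            edges[start].append(target)    # taken branch
        if op not in ("jmp", "ret") and last + 1 < len(program):
            edges[start].append(last + 1)  # fall-through to the next block
    return blocks, edges


if __name__ == "__main__":
    blocks, edges = build_cfg(PROGRAM)
    for start in blocks:
        print(f"block {start}: instructions {blocks[start]} -> {edges[start]}")
    print("nodes:", len(blocks), "edges:", sum(len(v) for v in edges.values()))
```

For the toy listing, `build_cfg(PROGRAM)` yields four blocks starting at indices 0, 3, 5 and 6, and the conditional jump gives the entry block two outgoing edges. Node and edge counts of this kind are among the simplest graph properties that can be read off such a structure.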
more concise feature for detection. In this case it is the sequence of assembly instructions.

One other approach [29] focuses on IoT malware and uses multiple detection models, one for CFG metrics and another for action sequences extracted from the suspicious file. For the graph metrics they use different graph properties which indicate the complexity of the CFG (size, diameter, shortest path in the graph, etc.). The action sequences are built from an action graph derived from the CFG, which represents the malware behaviour concept the models work with. The models chosen for the experimental phase were C-Means, SVM, kNN, Naive Bayes, Random Forest and Rotation Forest, which showed good overall accuracy.

D. Malware Detection by Deep Learning

Neural networks have proved their efficiency and opened the gates to a new field of automated problem solving. The many types of neural networks have found their way into malware detection methods, and CFGs are also used with these types of models to detect malware fast and accurately.

One approach [22] makes use of Convolutional Neural Networks (CNNs) to detect malware from adjacency matrices of CFGs. Moreover, the CFGs are augmented with additional edges to instructions. Besides CNNs, Graph Neural Networks (GNNs) and Graph Isomorphism Networks (GINs) are used. In one such approach [33], attributed CFGs are used with a GNN-based classifier to generate node embeddings which are later passed to an explainer model for detailed analysis.

IV. SPECIALIZED DETECTION

Besides general detection of malware, control flow graphs have seen usage in other areas of malware analysis and detection. Some methods specialize in detecting malware characteristics (such as metamorphism, attack techniques, infected code, etc.) while other approaches specialize in certain types of malware.

For example, malware mutation was treated early in the timeline of CFG detection methods. As polymorphic and metamorphic malware represent an advanced threat that can evade hash-based signature detection, methods based on CFGs show promise. One strategy, proposed by Bruschi et al. [4], consists of detecting self-mutating malware based on CFG transformations and comparisons. The normalization process focuses on getting rid of the trash code that a metamorphic malware would normally produce through mutation; it involves translating the machine code to a more explicit representation and applying code transformations based on control flow and data flow analysis. The code comparator phase relies on graph isomorphism and on node and edge labeling on the CFG of the whole program.

One other malware characteristic detection approach focused on infected code. Viruses are malware programs that leave a payload inside an executable, modifying its behavior. CFGs proved useful in this case too, with one formal approach [8] based on tree automata and subgraph isomorphism and another approach based on data mining [14].

With the appearance of deep learning models, specialized detection saw a few new methods. For cyberthreat intelligence, automated malware homology analysis [25] and identification of attack techniques [27] are approached using CFGs and custom architectures composed of CNNs or GNNs.

V. DISCUSSION

In Figure 2 we enumerate the problems approached, the additional procedures used on CFGs and the detection components for the studied methods in our proposed categories.

Fig. 2. Category table for malware detection approaches on CFGs

CFGs are useful structures when it comes to malware analysis. With the need for automated detection methods for faster detection and response, CFGs proved their versatility. In terms of performance, however, CFGs tend to be heavy and complex structures. A malware’s complexity is determined by many factors such as the malware type, target platform and malware design. For example, IoT-specialized malware generates less complex graphs, while Android malware has more nodes and edges and topological differences [39]. Additionally, for Windows malware the number of nodes may range from a few hundred to tens of thousands, as Windows malware has a bigger variety and due to the Windows system particularities and the Windows APIs, which justifies the need for optimization methods. In every type of approach we see a form of optimization by code or graph reduction. For example, for graph reduction we see methods such as node merging [19] and node removal by tree conversion [17]. Other approaches rely on code removal from the nodes for different purposes. Some approaches take measures in case the malware program employs anti-analysis techniques such as
obfuscation [4] [9], while other approaches remove most of the code in order to analyze relevant instructions such as API calls [15].

We observe differences in these malware detection methods. Formal and algorithmic methods mainly revolve around pattern-matching detection [11]. As we are dealing with graphs, subgraph isomorphism is also a detection component for malware [8] [4]. With the introduction of Artificial Intelligence and machine learning models, malware detection methods are now focused heavily on feature extraction and malicious feature detection. The detection component is focused on the classifier, the machine learning model responsible for telling whether the binary file is malicious or not. Quality measures should be taken for the classifiers in order to enhance their malware detection capabilities, such as adversarial example resistance and richer and more balanced data sets for different malware families and benign samples.

CFG-based approaches show promise in specialized detection, with approaches that demonstrate detection of some types of more advanced malware. Regarding malware types, algorithmic and formal approaches addressed the threat trends of their time. For instance, the focus was on viruses, worms, and metamorphic and polymorphic malware. Now these approaches, along with machine-learning-based ones, are focusing on new threats that emerged with the advancement of computer systems and devices. Specifically, we see methods focused on Android and IoT malware.

Malicious characteristics are also being addressed with such approaches, but predominantly metamorphic [19] and polymorphic mechanisms [5] were the focus, along with virus infections [8]. Other malware characteristics addressed through these approaches seem to be cyberthreat related, such as MITRE ATT&CK technique identification [27] and malware homology [25].

VI. CONCLUSIONS

In this study we categorized approaches to malware detection based on CFGs into four broad categories. We built a timeline of the proposed approaches to better view the trends and numbers. We also gathered information about their inner workings, pointing out particularities of the additional procedures used and of the detection components. CFGs proved their performance and versatility in detecting malicious programs and characteristics while being transmuted into different models and objects. As the research trend is moving towards machine learning and deep learning approaches, we expect future approaches to use more graph-like neural network architectures and more AI-based models to assist in the additional procedures of these methods.

In the future we propose to extend this study by highlighting the graph properties used in malware analysis and detection through these CFGs and by evaluating the performance of each category of approach with respect to what these approaches focus on detecting, as each method contains many particularities regarding the experimental phases.

REFERENCES

[1] “Att&ck matrix for enterprise,” https://attack.mitre.org/, last accessed on 9th of June 2024.
[2] Y. Gao, Z. Lu, and Y. Luo, “Survey on malware anti-analysis,” in Fifth International Conference on Intelligent Control and Information Processing, 2014, pp. 270–275.
[3] D.-K. Chae, J. Ha, S.-W. Kim, B. Kang, and E. G. Im, “Software plagiarism detection: a graph-based approach,” in Proceedings of the 22nd ACM international conference on Information & Knowledge Management, 2013, pp. 1577–1580.
[4] D. Bruschi, L. Martignoni, and M. Monga, “Detecting self-mutating malware using control-flow graph matching,” in International Conference on Detection of intrusions and malware, and vulnerability assessment, 2006. [Online]. Available: https://api.semanticscholar.org/CorpusID:6148086
[5] C. Kruegel, E. Kirda, D. Mutz, W. Robertson, and G. Vigna, “Polymorphic worm detection using structural information of executables,” in Recent Advances in Intrusion Detection: 8th International Symposium,
RAID 2005, Seattle, WA, USA, September 7-9, 2005. Revised Papers 8. Springer, 2006, pp. 207–226.
[6] J. Shin and D. F. Spears, “The basic building blocks of malware,” Technical Report, University of Wyoming, Tech. Rep., 2006.
[7] G. R. Thompson and L. A. Flynn, “Polymorphic malware detection and identification via context-free grammar homomorphism,” Bell Labs Technical Journal, vol. 12, no. 3, pp. 139–147, 2007.
[8] G. Bonfante, M. Kaczmarek, and J.-Y. Marion, “Morphological detection of malware,” in 2008 3rd International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2008, pp. 1–8.
[9] V. P., V. Laxmi, M. S. Gaur, G. P. Kumar, and Y. S. Chundawat, “Static cfg analyzer for metamorphic malware code,” in Proceedings of the 2nd International Conference on Security of Information and Networks, ser. SIN ’09. New York, NY, USA: Association for Computing Machinery, 2009, pp. 225–228. [Online]. Available: https://doi.org/10.1145/1626195.1626251
[10] H. Guo, J. Pang, Y. Zhang, F. Yue, and R. Zhao, “Hero: A novel malware detection framework based on binary translation,” in 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems, vol. 1. IEEE, 2010, pp. 411–415.
[11] S. Cesare and Y. Xiang, “Classification of malware using structured control flow,” in Proceedings of the Eighth Australasian Symposium on Parallel and Distributed Computing-Volume 107. Citeseer, 2010, pp. 61–70.
[12] Z. Zhao, “A virus detection scheme based on features of control flow graph,” in 2011 2nd International Conference on Artificial Intelligence, Management Science and Electronic Commerce (AIMSEC). IEEE, 2011, pp. 943–947.
[13] F. Song and T. Touili, “Efficient malware detection using model-checking,” in International Symposium on Formal Methods. Springer, 2012, pp. 418–433.
[14] M. Eskandari and S. Hashemi, “Ecfgm: enriched control flow graph miner for unknown vicious infected code detection,” Journal in Computer Virology, vol. 8, pp. 99–108, 2012.
[15] P. Faruki, V. Laxmi, M. S. Gaur, and P. Vinod, “Mining control flow graph as api call-grams to detect portable executable malware,” in Proceedings of the Fifth International Conference on Security of Information and Networks, 2012, pp. 130–137.
[16] S. Cesare, Y. Xiang, and W. Zhou, “Control flow-based malware variant detection,” IEEE Transactions on Dependable and Secure Computing, vol. 11, no. 4, pp. 307–317, 2013.
[17] Y. Ding, W. Dai, S. Yan, and Y. Zhang, “Control flow-based opcode behavior analysis for malware detection,” Computers & Security, vol. 44, pp. 65–74, 2014.
[18] G. Suarez-Tangil, J. E. Tapiador, P. Peris-Lopez, and J. Blasco, “Dendroid: A text mining approach to analyzing and classifying code structures in android malware families,” Expert Systems with Applications, vol. 41, no. 4, pp. 1104–1117, 2014.
[19] S. Alam, I. Traore, and I. Sogukpinar, “Annotated control flow graph for metamorphic malware detection,” The Computer Journal, vol. 58, no. 10, pp. 2608–2621, 2015.
[20] M. A. Atici, S. Sagiroglu, and I. A. Dogru, “Android malware analysis approach based on control flow graphs and machine learning algorithms,” in 2016 4th International Symposium on Digital Forensic and Security (ISDFS). IEEE, 2016, pp. 26–31.
[21] M. Leslous, V. V. T. Tong, J.-F. Lalande, and T. Genet, “Gpfinder: tracking the invisible in android malware,” in 2017 12th International Conference on Malicious and Unwanted Software (MALWARE). IEEE, 2017, pp. 39–46.
[22] M. H. Nguyen, D. Le Nguyen, X. M. Nguyen, and T. T. Quan, “Auto-detection of sophisticated malware using lazy-binding control flow graph and deep learning,” Computers & Security, vol. 76, pp. 128–155, 2018.
[23] H. Alasmary, A. Khormali, A. Anwar, J. Park, J. Choi, A. Abusnaina, A. Awad, D. Nyang, and A. Mohaisen, “Analyzing and detecting emerging internet of things malware: A graph-based approach,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8977–8988, 2019.
[24] J. Yan, G. Yan, and D. Jin, “Classifying malware represented as control flow graphs using deep graph convolutional neural network,” in 2019 49th annual IEEE/IFIP international conference on dependable systems and networks (DSN). IEEE, 2019, pp. 52–63.
[25] J. Liu, Y. Shen, and H. Yan, “Functions-based cfg embedding for malware homology analysis,” in 2019 26th International Conference on Telecommunications (ICT). IEEE, 2019, pp. 220–226.
[26] H. Alasmary, A. Abusnaina, R. Jang, M. Abuhamad, A. Anwar, D. Nyang, and D. Mohaisen, “Soteria: Detecting adversarial examples in control flow graph-based malware classifiers,” in 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2020, pp. 888–898.
[27] J. Fairbanks, A. Orbe, C. Patterson, J. Layne, E. Serra, and M. Scheepers, “Identifying att&ck tactics in android malware control flow graph through graph representation learning and interpretability,” in 2021 IEEE International Conference on Big Data (Big Data). IEEE, 2021, pp. 5602–5608.
[28] B. Wu, Y. Xu, and F. Zou, “Malware classification by learning semantic and structural features of control flow graphs,” in 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). IEEE, 2021, pp. 540–547.
[29] K. Bobrovnikova, S. Lysenko, B. Savenko, P. Gaj, and O. Savenko, “Technique for iot malware detection based on control flow graph analysis,” Radioelectronic and Computer Systems, no. 1, pp. 141–153, 2022.
[30] Q. Sun, E. Abdukhamidov, T. Abuhmed, and M. Abuhamad, “Leveraging spectral representations of control flow graphs for efficient analysis of windows malware,” in Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security, 2022, pp. 1240–1242.
[31] Y. Gao, H. Hasegawa, Y. Yamaguchi, and H. Shimada, “Malware detection by control-flow graph level representation learning with graph isomorphism network,” IEEE Access, vol. 10, pp. 111830–111841, 2022.
[32] Y. Gao, H. Hasegawa, Y. Yamaguchi, and H. Shimada, “Malware detection using attributed cfg generated by pre-trained language model with graph isomorphism network,” in 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE, 2022, pp. 1495–1501.
[33] J. D. Herath, P. P. Wakodikar, P. Yang, and G. Yan, “Cfgexplainer: Explaining graph neural network-based malware classification from control flow graphs,” in 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 2022, pp. 172–184.
[34] X. Ling, L. Wu, W. Deng, Z. Qu, J. Zhang, S. Zhang, T. Ma, B. Wang, C. Wu, and S. Ji, “Malgraph: Hierarchical graph neural networks for robust windows malware detection,” in IEEE INFOCOM 2022-IEEE Conference on Computer Communications. IEEE, 2022, pp. 1998–2007.
[35] F. Ullah, S. Ullah, G. Srivastava, and J. C.-W. Lin, “Droid-mcfg: Android malware detection system using manifest and control flow traces with multi-head temporal convolutional network,” Physical Communication, vol. 57, p. 101975, 2023.
[36] P. K. Tiwari, “Malware detection using control flow graphs,” in 2024 2nd International Conference on Device Intelligence, Computing and Communication Technologies (DICCT). IEEE, 2024, pp. 216–220.
[37] Y. Gao, H. Hasegawa, Y. Yamaguchi, and H. Shimada, “Malware self-supervised graph contrastive learning with data augmentation.”
[38] M. Christodorescu and S. Jha, “Static analysis of executables to detect malicious patterns,” in 12th USENIX Security Symposium (USENIX Security 03), 2003.
[39] H. Alasmary, A. Anwar, J. Park, J. Choi, D. Nyang, and A. Mohaisen, “Graph-based comparison of iot and android malware,” in Computational Data and Social Networks, X. Chen, A. Sen, W. W. Li, and M. T. Thai, Eds. Cham: Springer International Publishing, 2018, pp. 259–272.