Graph-Oriented Modelling of Process Event Activity For The Detection of Malware
Abstract—This paper presents an approach to malware detection using Graph Neural Networks (GNNs) to capture the complex relationships and dependencies between different components of an operating system (OS). Traditional methods for malware detection rely on known signatures of malware and may fail to detect new or modified malware variants. GNNs offer a promising solution by analyzing graph-structured data and identifying malicious behavior patterns. Specifically, this paper investigates the use of GNNs for malware detection based on the API call sequences of different event types, including File System, Registry, and File and Thread activity. The paper presents a representative dataset of host process activity of malware collected in a custom sandbox environment, comprising over 239 malware executions with randomly executed benignware samples. The paper then describes the GNN model trained on the dynamic process behavior generated from process execution graphs, with independent models developed for each category of API events. Finally, the paper presents a trained model that maximizes generalization performance, demonstrating the applicability of GNNs for malware detection. This paper presents one of the first applications of GNN classification based on process hierarchy during malware execution that includes interaction with benignware.

Index Terms—Malware Detection, Graph Neural Networks, Operating Systems, Application Programming Interface, Artificial Intelligence, Security

I. INTRODUCTION

Malware detection is important due to the devastating consequences that malware can have on individuals, businesses, and governments. For instance, malware attacks can compromise sensitive data, steal financial information, and disrupt critical infrastructure. According to Malwarebytes, 71% of companies worldwide were targeted by some form of ransomware attack in 2022, and malware variants such as Emotet have on average cost state, local, tribal, and territorial governments up to 1 million USD per incident to remediate [1].

A shortcoming of traditional signature-based and heuristic-based malware detection methods is that they rely on known signatures of malware in order to detect future variants. For instance, signature-based methods may fail to detect new or modified malware variants, while heuristic-based methods may generate many false positives. As malware attacks have become more sophisticated and evasive, this lends credence to the need for a more robust and accurate malware detection approach.

Graph Neural Networks (GNNs) offer a promising solution to this problem by combining the ability to process large datasets with ease with discriminative power, all the while generalizing well to unseen samples. GNNs are a type of neural network architecture that has gained popularity in recent years due to its ability to process and analyze graph-structured data, which is increasingly prevalent in various domains, including malware detection [2]. GNNs can be used to leverage the graph structure of an Operating System (OS) to capture the interactions between different system components and identify malicious behavior patterns. OSs are constantly under threat from malware, and traditional methods for malware detection often fail to keep up with the increasing sophistication of malware attacks. GNNs offer a promising approach for detecting and classifying malware based on its behavior, as they can capture the complex relationships and dependencies between different components of the system.

One such component is the event activity of a process, which can include networking activity, edits of the registry, or the creation of a new file on disk, a new thread, or a Mutex. The motivation for investigating event activity is that the behavior of malware is best characterized by the independent steps that it carries out, and these are best captured through investigation of the Application Programming Interface (API) call sequence [3, 4]. More specifically, this study can reveal the behavioral interaction fingerprints that interacting processes leave in the graph structure, which can be used to differentiate and classify different processes. In this paper we explore the use of GNNs for malware detection based on the API call sequences of different event types. Our work makes the following overarching contributions:

• a representative dataset is developed of host OS process activity collected in a custom sandbox environment that

† Corresponding Author; kbrezinski.github.io
environment in Section III-A and how the malware samples were collected; followed by information on the sampling procedure used to generate a representative dataset in Section III-B. This follows into discussions on the feature vectorization of the API sequence in Section III-C, followed by the GNN theory and architecture design in Section III-D.
A. Sandbox Process Execution Collection

Malware and benignware samples are collected in a custom sandbox environment so as to simulate a real host environment. Careful attention is taken to ensure the dynamic behavior of malware is captured and cached for downstream tasks. To this end, we capture process event activity belonging to File System, Process and Thread Activity, as well as Registry events. Networking was not included, as networking activity uses sockets similar to those of benignware, and many malware samples, as a part of their anti-emulation procedure, fail to reach out to the network at all. In virtual environments many malware variants become Virtual Machine aware and fail to execute their viral payloads; therefore, the signatures of the malware which probe the current environment for signs of virtualization are as much a part of the malicious payload as the payload itself. Additionally, we wish to capture and detect anomalous behavior before secondary payloads are fetched from the internet, which further complicates the process of labelling processes. For the sake of brevity, we refer readers to our previous work in [13], which provides a complete overview of the sandbox environment and configuration.
B. Malware Sampling and Run-time Configuration
The malware samples used for execution were drawn from a repository of recent malware samples obtained from VirusTotal¹. Through an Academic License, VirusTotal provides researchers with a repository of tens of thousands of malware samples identified and bundled in the last quarter to represent new and unique infections submitted to VirusTotal. In this work 239 malware samples were retrieved from Q4 2022. This dataset of malware samples was filtered to remove non-Windows malware, and includes both 32-bit and 64-bit executables. A full list of malicious executables, complete with file sizes and MD5 hashes, can be found in the GitHub repository for this work.

Alongside malware, benignware is executed in tandem to simulate a real host environment. During malware execution, 3 - 5 processes are randomly selected from the benignware list and run sequentially. This populates the execution graph with noise and negative training samples, and ensures the dataset mirrors the OS environment of a real host as closely as possible. In total, 300 benignware samples were collected from the cnet.com Apps for Windows category, representing popular Windows applications. All benignware was confirmed to have an AV score of 0 according to VirusTotal.

A systematic approach was used to stochastically sample negative training examples from the benignware category of executables. First, consider the set of malware samples M and benignware samples C. If each malware executable were to be executed once, then M total trials or executable graphs are populated with process activity. Let $c_m$ represent the subset of benignware executables $c_m \subset C$ for execution m. At each trial m, the chance of drawing a particular benignware sample is $(1/|C|)^{E[p_c]}$ with replacement, where $E[p_c]$ is the expectation of drawing samples with probability $p_c$. Therefore, for M trials the chances of not drawing a particular benign sample are governed by Eq. 1, where the $|C| - i$ term takes into account the fact that replacement cannot occur, as no executable can be executed twice on a clean snapshot. This exercise simply demonstrates that some benignware samples are not used to train the model under the sampling technique used. This provides good generalization across potential host environments, without the bias that would be introduced if the same processes were chosen manually for each malware execution graph.

$$\prod_{i=1}^{M}\left(1-\frac{1}{|C|-i}\right)^{E[p_c]}\tag{1}$$
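As a rough illustration, Eq. 1 can be evaluated numerically. The following is a minimal sketch in which the concrete values of |C|, M and E[p_c] are assumptions chosen to mirror the setup described above (300 benignware samples, 239 executions, 3 - 5 benign draws per trial), not figures reported in this paper.

from math import prod

def prob_never_drawn(num_benign: int, num_trials: int, e_pc: float) -> float:
    """Eq. 1: probability that one particular benignware sample is never
    drawn across all malware execution trials."""
    return prod((1.0 - 1.0 / (num_benign - i)) ** e_pc
                for i in range(1, num_trials + 1))

# Assumed values: |C| = 300 benignware samples, M = 239 executions,
# E[p_c] = 4 draws per trial (midpoint of the 3 - 5 range above).
print(f"{prob_never_drawn(300, 239, 4.0):.4f}")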
In the sandbox environment the samples were automatically run through the use of a batch script, which helps to automate the process and provide consistent executions between malware samples in terms of time window. A short script is used to run Procmon, load the relevant configuration files and filters, and begin collection. The filter files (.pmc files) are used to exclude some Procmon-specific events from appearing in the list of captured events, but all other events, including Windows operating system behaviour, are captured. This ensures there is no bias introduced in the collected event activity by the author. A list of the rules used in the Procmon filter can be found in Table IV, Appendix V-0c.

¹www.virustotal.com/
C. API Call Sequence Vectorization

In this work process APIs were vectorized according to a simple scheme combining N-grams with tf-idf. To begin, N-grams enumerate all of the unique length-n sequences of APIs and create a feature vector indexed by each unique sequence. N-grams improve on Bag of Words by accounting for unique API sequences of length n: pairs in the case of bigrams, or triples in the case of trigrams. These sequences are taken from the stack traces, where they appear in order from low-memory space to high-memory space (0xfffff) in the stack. In Fig. 1 we see an illustration of a stack trace that has been resolved using the Windows Symbol Table to produce the names of the API calls [3]. In this example LoadLibraryExa and LoadLibraryA appear one after another, therefore the tuple (LoadLibraryExa, LoadLibraryA) is added to the corpus. The next element in the sequence would be BaseThreadInitThunk, which would be added to the corpus with the previous element as (LoadLibraryA, BaseThreadInitThunk). This captures all unique combinations of Windows APIs, and maintains some of the order in the sequences. When considering trigrams we would incorporate more information about the sequence by considering all combinations of three (n = 3) APIs in succession.
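A minimal sketch of this vectorization using scikit-learn's TfidfVectorizer follows; the two API sequences are toy stand-ins (the first follows the Fig. 1 example), not samples from the paper's corpus.

from sklearn.feature_extraction.text import TfidfVectorizer

# Each process is represented by its ordered API call sequence joined into a string.
sequences = [
    "LoadLibraryExa LoadLibraryA BaseThreadInitThunk",
    "LoadLibraryA RegOpenKeyExW RegQueryValueExW RegCloseKey",
]

# ngram_range=(2, 2) enumerates bigrams such as "LoadLibraryExa LoadLibraryA";
# token_pattern keeps each whole API name as a single token.
vectorizer = TfidfVectorizer(ngram_range=(2, 2), token_pattern=r"\S+", lowercase=False)
X = vectorizer.fit_transform(sequences)  # sparse (processes x bigrams) tf-idf matrix

print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))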
where N is the number of nodes in the graph and F is the
number of features for each node. In Fig. 2 we observe four
nodes that are neighbors. For example, if h̄2 is the node under
consideration it shares a neighbour with h̄2 and h̄3 but not
with h̄4 . Therefore, for any ith node we want to consider the
influence of its neighboring nodes j ∈ Ni .
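To make the neighborhood notion concrete, the sketch below performs a generic 1-hop mean aggregation over a four-node toy graph echoing the Fig. 2 description; the exact topology beyond node $\bar{h}_1$'s neighborhood is an assumption, and the paper's GAT layers instead weight each neighbor by a learned attention coefficient (Eq. 3).

import torch

# Four nodes with F = 8 features each; node 0 (h̄1) neighbors nodes 1 and 2
# (h̄2, h̄3) but not node 3 (h̄4). The remaining adjacency is assumed.
h = torch.randn(4, 8)
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [2]}

def aggregate_mean(h: torch.Tensor, neighbors: dict, i: int) -> torch.Tensor:
    """Average the feature vectors of node i's neighborhood N_i."""
    return torch.stack([h[j] for j in neighbors[i]]).mean(dim=0)

h1_update = aggregate_mean(h, neighbors, 0)  # influence of N_1 on node h̄1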
Fig. 3. Creation of the node embeddings via the dot product between the BoW API matrix and the weight matrix W.

along the second axis, and LeakyReLU() applies the leaky ReLU activation function.

$$a_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\bar{w}^{T}\left[W\bar{h}_{i} \,\|\, W\bar{h}_{j}\right]\right)\right)}{\sum_{k \in N_{i}} \exp\left(\mathrm{LeakyReLU}\left(\bar{w}^{T}\left[W\bar{h}_{i} \,\|\, W\bar{h}_{k}\right]\right)\right)}\tag{3}$$
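A minimal sketch of Eq. 3 for a single node, written directly against the equation rather than against the paper's implementation; h, W and w̄ here are randomly initialized stand-ins.

import torch
import torch.nn.functional as F

N, feat, d = 4, 8, 16
h = torch.randn(N, feat)   # node features h̄_i ∈ R^F
W = torch.randn(feat, d)   # shared linear projection
w = torch.randn(2 * d)     # attention vector w̄

def attention_coefficients(i: int, neighborhood: list) -> torch.Tensor:
    """a_ij over j ∈ N_i per Eq. 3: LeakyReLU of w̄ᵀ[Wh̄_i || Wh̄_j], softmax-normalized."""
    Wh = h @ W
    logits = torch.stack([
        F.leaky_relu(w @ torch.cat([Wh[i], Wh[j]]), negative_slope=0.2)
        for j in neighborhood
    ])
    return torch.softmax(logits, dim=0)

print(attention_coefficients(0, [1, 2]))  # attention of node 0 over its neighbors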
ancestor with $p_{i}^{2}$. As a result, all $p_{i}^{2}$ processes that exist would share the same embedding information from $p_{0}$, leading to all of the process nodes $p_{n}^{2}$ becoming saturated with similar information. This would impact performance negatively, and is best resolved through domain knowledge of the dataset used and by carefully considering the capacity of the model. A visualization of this effect is shown in Fig. 4. We can see the difference between a 1-hop aggregation (Fig. 4, left) and a 2-hop aggregation (Fig. 4, right), whereby the malicious red node $p_{1}^{2}$ is smoothed over with information from $p_{0}$, leading to a node embedding that is more similar to that of its benign counterpart $p_{2}^{2}$. This is illustrated by the proportion of red, orange and green in its node embedding, which geometrically would coincide with the vectors $p_{1}^{2}$ and $p_{2}^{2}$ being closer or farther apart in d-dimensional space.
TABLE I
Summary of the number of API calls belonging to the corpus for different n-grams. The event type All refers to a combination of Registry, File and Thread event APIs into a single corpus.

Event      1-gram    2-gram    3-gram
Registry    1917      6517     10985
File        2017      6018      9669
Thread       612      1506      2200
All         2921     10656     18267

TABLE II
Summary network statistics for the sandbox malware execution graphs. Values are averaged over all malware executions (n = 239).

Metric                       Value
# Nodes                         40
# Edges                        112
Avg. Node Degree              6.70
Min. Node Degree              2.33
Max. Node Degree                11
Avg. Degree Connectivity      4.75
Degree Centrality             0.14
Degree Assortativity         -0.20
Density                       0.07
Square Clustering             0.32
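Statistics like those in Table II can be reproduced for any execution graph with NetworkX; the random graph below is a stand-in sized roughly like an averaged execution graph (40 nodes, density near 0.07), not one of the paper's graphs.

import networkx as nx

# Stand-in process graph: nodes are spawned processes, edges parent/child
# and interaction events. Sized to echo Table II, not drawn from the dataset.
G = nx.gnp_random_graph(40, 0.07, seed=0, directed=True)
U = G.to_undirected()

stats = {
    "# Nodes": G.number_of_nodes(),
    "# Edges": G.number_of_edges(),
    "Avg. Node Degree": sum(d for _, d in U.degree()) / U.number_of_nodes(),
    "Degree Assortativity": nx.degree_assortativity_coefficient(U),
    "Density": nx.density(G),
    "Square Clustering": sum(nx.square_clustering(U).values()) / U.number_of_nodes(),
}
print(stats)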
h) Square Clustering: This is a measure of the clustering coefficient of the network; whereas the standard clustering coefficient counts triangles, square clustering counts squares (cycles of length four). In this case, the square clustering is 0.32, which means that there is a relatively high level of clustering in the network. This follows from the Degree Connectivity metric, which noted the high inter-connectivity of the network.

TABLE III
Summary model performance metrics for the validation set. Metrics are computed over 20 iterations. Bolded elements signify the best performing model for each dataset. The naming scheme model-l-d refers to the model (GAT - Graph Attention; GCN - Graph Convolution w/o attention; ANN - Linear Layer) with l layers and d hidden neurons in each layer. Unless otherwise noted, the models use the File System event type with a unigram vectorization and l = 2, d = 64.
graphs are populated with benignware activity in which some samples are executed twice, appearing in two different graphs, and some samples are not present at all. This has the effect of the model learning very different execution graphs, which adds to the robustness of the model but also leads to large deviations between iterations. Secondly, the largest source of model variance is the selection of the training and validation set. To overcome effects due to sampling bias (i.e. the manual selection of samples to validate on, which introduces bias), the positive and negative training examples that fall into the validation set are randomly selected for each iteration in which a new model is trained. These iterations are different from epochs: at each iteration a new model is initialized with random weights, a randomized training and validation set, and a new random seed, whereas during each epoch the same model is trained and validated on the same training and validation set. So for each iteration the model experience is very different, and this can lead to significant model variance. This is because, depending on the iteration, a hard-to-classify malicious executable may be presented in the validation set, in which case it would not have been trained on and would thus be hard to classify. In another iteration it would be present in the training set, in which case it would not belong to the validation set and the performance scores would improve. Repeated iterations tend to smooth over this effect as it averages out, but it remains a persistent effect due to the nature of sampling over 200 independent process execution graphs with heterogeneous topologies. This work already applies Regularization and Early Stopping to keep the validation performance as close as possible to the training performance and to avoid over-fitting [15, 16]. One other remedy is to increase the size of the dataset, which is always a solution to over-fitting leading to poor generalization in deep learning research; but largely due to the time commitment, and in other cases the cost of manually labelling training examples, this is infeasible in practice [17, 18, 19].
V. ACKNOWLEDGEMENTS

This research has been financially supported by Mitacs Accelerate (IT15018) in partnership with Canadian Tire Corporation, and is supported by the University of Manitoba.
REFERENCES

[1] Malwarebytes. 2023 State of Malware. Technical report, Santa Clara, CA, 2023.
[2] Yakang Hua, Yuanzheng Du, and Dongzhi He. Classifying Packed Malware Represented as Control Flow Graphs using Deep Graph Convolutional Neural Network. In 2020 International Conference on Computer Engineering and Application (ICCEA), pages 254-258, March 2020.
[3] Kenneth Brezinski and Ken Ferens. Transformers - Malware in Disguise. In Transactions on Computational Science & Computational Intelligence. Springer Nature, ACC4507; Accepted, In Press, 2021.
[4] Cheng Wang, Jianmin Pang, Rongcai Zhao, Wen Fu, and Xiaoxian Liu. Malware Detection Based on Suspicious Behavior Identification. In Proceedings of the 2009 First International Workshop on Education Technology and Computer Science - Volume 02, ETCS '09, pages 198-202, USA, March 2009. IEEE Computer Society.
[5] Kenneth Brezinski and Ken Ferens. Metamorphic Malware and Obfuscation - A Survey of Techniques, Variants and Generation Kits. Security and Communication Networks, 2023.
[6] Daniel Gibert, Carles Mateu, and Jordi Planes. The rise of machine learning for detection and classification of malware: Research developments, trends and challenges. Journal of Network and Computer Applications, 153:102526, March 2020.
[7] Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. Graph neural networks: A review of methods and applications. AI Open, 1:57-81, January 2020.
[8] Shanxi Li, Qingguo Zhou, Rui Zhou, and Qingquan Lv. Intelligent malware detection based on graph convolutional network. The Journal of Supercomputing, 78(3):4182-4198, 2022.
[9] Cagatay Catal, Hakan Gunduz, and Alper Ozcan. Malware Detection Based on Graph Attention Networks for Intelligent Transportation Systems. Electronics, 10(20):2534, January 2021.
[10] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph Attention Networks. arXiv:1710.10903 [cs, stat], February 2018.
[11] Tristan Bilot, Nour El Madhoun, Khaldoun Al Agha, and Anis Zouaoui. A Survey on Malware Detection with Graph Representation Learning. arXiv:2303.16004 [cs], March 2023.
[12] Youngjoon Ki, Eunjin Kim, and Huy Kang Kim. A Novel Approach to Detect Malware Based on API Call Sequence Analysis. International Journal of Distributed Sensor Networks, 11(6):659101, June 2015.
[13] Kenneth Brezinski and Ken Ferens. Sandy Toolbox: A Framework for Dynamic Malware Analysis and Model Development. In Transactions on Computational Science & Computational Intelligence. Springer Nature, SAM4213; Accepted, In Press, 2021.
[14] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is All you Need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
[15] Hwanjun Song, Minseok Kim, Dongmin Park, and Jae-Gil Lee. How does Early Stopping Help Generalization against Label Noise? arXiv:1911.08059 [cs, stat], September 2020.
[16] Rich Caruana, Steve Lawrence, and C. Giles. Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping. In Advances in Neural Information Processing Systems, volume 13. MIT Press, 2000.
[17] Xue-Wen Chen and Xiaotong Lin. Big Data Deep Learning: Challenges and Perspectives. IEEE Access, 2:514-525, 2014.
[18] Douglas Heaven. Why deep-learning AIs are so easy to fool. Nature, 574(7777):163-166, October 2019.
[19] John D. Kelleher. Deep Learning. MIT Press, September 2019.
[20] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. arXiv:1512.03385 [cs], December 2015.
[21] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249-256. JMLR Workshop and Conference Proceedings, March 2010.
[22] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15:1929-1958, 2014.
[23] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs], January 2017.
[24] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. arXiv:1711.05101 [cs, math], January 2019.
APPENDIX A - GRAPH NEURAL NETWORK ARCHITECTURE AND TRAINING CONFIGURATION

Model Weights and Initialization

Weight matrices were all initialized using He initialization [20, 21], which randomly initializes the weights ($W_{i,j}^{l} \sim \mathcal{N}(\mu = 0, \sigma^{2} = 0.01)$) in a range determined by the number of input and output units in the layer. Additionally, Dropout is used to regularize the network and prevent the model from over-fitting [22]. Unlike in [10], where the authors implemented dropout of the original feature vector $\bar{h}_j$, in this work the sparsity of the vectorized API calls means such dropout would have a minimal effect. Dropout is, however, applied after the linear projection and after determining the attention coefficients, as is typical in a deep neural network to regulate over-fitting and prevent the network from relying on any particular set of weights. So for each forward propagation and at each layer l, a binary mask is drawn for each input unit j as $r_{j}^{l} \sim \mathrm{Bernoulli}(1-p)$. Each input layer is then masked as $\tilde{x}^{l} := x^{l} \odot r_{j}^{l}$ to produce an output with a fraction p of units set to 0. The dropout probability for this work was tested at $p \in \{0.2, 0.5\}$. Since the model learns with a proportion p of units set to 0, during inference the model output needs to be scaled by a proportional amount, $\tilde{x}^{l} := x^{l}(1-p)$, to account for the scaled-down activations the model is accustomed to during training.
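A minimal sketch of the masking and inference-time rescaling just described, using the convention above that a fraction p of units is zeroed during training:

import torch

def dropout_train(x: torch.Tensor, p: float = 0.2) -> torch.Tensor:
    # Binary mask r ~ Bernoulli(1 - p): each unit is kept with probability 1 - p,
    # so on average a fraction p of units is set to 0.
    r = torch.bernoulli(torch.full_like(x, 1.0 - p))
    return x * r                      # x̃ = x ⊙ r

def dropout_eval(x: torch.Tensor, p: float = 0.2) -> torch.Tensor:
    # No units are dropped at inference; activations are scaled by (1 - p)
    # to match the expected magnitude seen during training.
    return x * (1.0 - p)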
The other considerations for model capacity are the depth of the GNN, n (illustrated in Fig. 4), and the number of hidden neurons in each layer, d. A sample sequential layer stack for a GNN which encodes information for the 2-hop neighborhood (n = 2) is shown in Code Listing 1; it represents the sample architecture upon which all of the models outlined in Table III are based.

GNN(
  (layers): ModuleList(
    (1): GATConv(in_features=input_dims, out_features=d)
    (2): LeakyReLU(alpha=0.2)
    (3): Dropout(p=0.2)
    (4): GATConv(in_features=d, out_features=d)
    (5): LeakyReLU(alpha=0.2)
    (6): Linear(in_features=d, out_features=2)
  )
)

Code Listing 1. Sequential GNN architecture for evaluating malware detection performance for the experimental results. input_dims is the size of the input corpus, which is tabulated in Table I; d is the number of hidden neurons; p is the dropout probability; and alpha is the negative slope in the LeakyReLU activation.
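For readers who prefer a constructor over a module printout, the following is one way the architecture of Code Listing 1 might be expressed with PyTorch Geometric's GATConv; this is a sketch consistent with the listing, not the authors' released code.

import torch.nn as nn
from torch_geometric.nn import GATConv

class GNN(nn.Module):
    """Two GAT layers (a 2-hop receptive field, n = 2) with a linear head."""

    def __init__(self, input_dims: int, d: int = 64, p: float = 0.2, alpha: float = 0.2):
        super().__init__()
        self.conv1 = GATConv(input_dims, d)
        self.conv2 = GATConv(d, d)
        self.act = nn.LeakyReLU(alpha)
        self.drop = nn.Dropout(p)
        self.head = nn.Linear(d, 2)   # benign vs. malicious logits per node

    def forward(self, x, edge_index):
        x = self.drop(self.act(self.conv1(x, edge_index)))
        x = self.act(self.conv2(x, edge_index))
        return self.head(x)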
Additionally, the LeakyReLU activation function was set with a negative slope α = 0.2; the piece-wise function evaluates to αx when the input value x < 0, and to x otherwise. This function has the advantage of producing large gradients when needed, similar to ReLU, with the added advantage that gradients do not die out, as the gradient can still recover when x < 0.

Performance Metrics

a) Loss: is defined in this work as the Cross-Entropy loss, which is used to measure the difference between the predicted probability distribution and the true probability distribution of the target variables. The equation to calculate Cross-Entropy is shown in Eq. 5 for C classes, with true and predicted probability $y_i$ and $p(y_i)$, respectively, for class i. In this case $y_i$ is the one-hot encoded label vector. Cross-Entropy as in Eq. 5 is a generalized form of Binary Cross-Entropy, which only applies to predictions with 2 classes.

$$L_{CE} = -\sum_{i=1}^{C} y_i \log(p(y_i))\tag{5}$$

b) Accuracy: in this work is calculated as the macro-average of the accuracies, whereby the arithmetic mean of the per-class accuracies is taken into account. Therefore, for C classes, the macro-average accuracy is calculated according to Eq. 6. The expression in Eq. 6 has the advantage of calculating the accuracy for each class independently, meaning class imbalance is accounted for in the metric according to the number of training examples for each class, $N_c$.

$$\mathrm{Average}_{macro} = N_1 \cdot Acc_1 + N_2 \cdot Acc_2 + \cdots + N_C \cdot Acc_C\tag{6}$$
c) F1-Score: is a performance metric, with $F1\text{-}Score \in (0, 1)$, that is used to evaluate the accuracy of a classifier. It is defined as the harmonic mean of precision and recall, where precision is the fraction of True Positive (TP) predictions among all positive predictions, and recall is the fraction of TP predictions among all actual positive instances. The F1-Score can be calculated according to Eq. 7, where $precision = TP/(TP + FP)$ and $recall = TP/(TP + FN)$; FN is False Negative and FP is False Positive. This metric accounts for class imbalance, providing a meaningful indicator of model performance on both the malicious and benign training examples.

$$F1\text{-}Score = 2 \times \frac{precision \times recall}{precision + recall}\tag{7}$$
TABLE IV
Process Monitor configuration for filtering out relevant events. With the exception of the final entry, all other entries are disallow filters.

Entity          Relation       Value
Process Name    is             Procmon.exe
Process Name    is             Procexp.exe
Process Name    is             Autoruns.exe
Process Name    is             Procmon64.exe
Process Name    is             Procexp64.exe
Operation       begins with    IRP_MJ_
Operation       begins with    FASTIO_
Result          begins with    FAST IO
Path            ends with      pagefile.sys
Path            ends with      $Mft
Path            ends with      $MftMirr
Path            ends with      $LogFile
Path            ends with      $Volume
Path            ends with      $AttrDef
Path            ends with      $Root
Path            ends with      $Bitmap
Path            ends with      $Boot
Path            ends with      $BadClus
Path            ends with      $Secure
Path            ends with      $UpCase
Path            ends with      $Extend
Event Class     is             Profilingᵃ

ᵃ Alternates between Process, Network, File System and Registry depending on which events are to be collected.

Training Configuration

The Adam optimizer was used as the training optimizer; it combines stochastic gradient descent with momentum and RMSProp [23]. Readers can refer to the original paper for the procedure, or to the PyTorch documentation for an overview of the algorithm. The optimizer's β parameters were set to (0.9, 0.999), with a learning rate $\eta \in \{5 \times 10^{-2}, 1 \times 10^{-2}, 5 \times 10^{-3}\}$. L2 regularization was also implemented through weight decay λ at 10% of the learning rate [24]; so for an η of $5 \times 10^{-3}$ the decay rate would be $5 \times 10^{-4}$. In the Adam optimizer, the
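As a concrete reference, the configuration above might be expressed as follows; this is a sketch in which the model is a placeholder standing in for the GNN of Code Listing 1, and since the text does not state whether the decoupled decay of [24] was used, Adam's built-in (coupled) weight decay is shown.

import torch

model = torch.nn.Linear(2921, 2)   # placeholder; 2921 = unigram "All" corpus size (Table I)

lr = 5e-3                          # one of the swept learning rates η
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=lr,
    betas=(0.9, 0.999),
    weight_decay=0.1 * lr,         # λ at 10% of the learning rate
)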