cognitive computing
Article
MalBERTv2: Code Aware BERT-Based Model for Malware
Identification
Abir Rahali and Moulay A. Akhloufi *
Abstract: To proactively mitigate malware threats, cybersecurity tools, such as anti-virus and anti-
malware software, as well as firewalls, require frequent updates and proactive implementation.
However, processing the vast amounts of dataset examples can be overwhelming when relying solely
on traditional methods. In cybersecurity workflows, recent advances in natural language processing
(NLP) models can aid in proactively detecting various threats. In this paper, we present a novel
approach for representing the relevance and significance of the Malware/Goodware (MG) datasets,
through the use of a pre-trained language model called MalBERTv2. Our model is trained on publicly
available datasets, with a focus on the source code of the apps by extracting the top-ranked files that
present the most relevant information. These files are then passed through a pre-tokenization feature
generator, and the resulting keywords are used to train the tokenizer from scratch. Finally, we apply
a classifier using bidirectional encoder representations from transformers (BERT) as a layer within
the model pipeline. The performance of our model is evaluated on different datasets, achieving
a weighted f1 score ranging from 82% to 99%. Our results demonstrate the effectiveness of our
approach for proactively detecting malware threats using NLP techniques.
Citation: Rahali, A.; Akhloufi, M.A. MalBERTv2: Code Aware BERT-Based Model for Malware
Identification. Big Data Cogn. Comput. 2023, 7, 60. https://doi.org/10.3390/bdcc7020060
Academic Editors: Tim Schlippe and Matthias Wölfel
Received: 30 January 2023
Revised: 4 March 2023
Accepted: 10 March 2023
Published: 24 March 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open
access article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
A series of studies have been conducted to tackle cybersecurity threats using machine
learning (ML) and deep learning (DL) tools. Multiple studies have been directed toward
analyzing malware at static, dynamic, and hybrid levels [1] to extract diverse features,
such as application programming interface (API) calls, permissions, and binaries. For
instance, the use of DL algorithms has assisted security specialists in analyzing complex
cyberattacks. DL models comprise several layers of an ML algorithm that are capable
of learning high-level abstractions from enormously complex datasets. This makes DL
algorithms more effective in identifying patterns and detecting malicious activities [2].
Consequently, DL has enabled many cybersecurity companies to improve the accuracy of
their malware detection systems. With the exponential growth of NLP applications, DL
algorithms, and specifically transformer-based (TB) architectures, have gained popularity
in solving complex problems. These algorithms have shown significant advantages over
traditional ML techniques due to their ability to learn the context of the data and their
capacity to process large datasets [3].
NLP is an area of artificial intelligence (AI) that works with text-based language data.
It provides a multitude of tools for putting innovative cybersecurity-related
solutions into practice. To find weaknesses in the infrastructure, NLP can find overlaps in
data from a company’s tech stack and threat streams. The ultimate goal of NLP is to locate
the keyword, read the text, and comprehend any relevant context. Because these tasks are
still largely performed manually, an automated system is badly needed. Recently, using
attention weights [4] in language modeling improved the transfer learning tasks in AI-based
domains. Bidirectional encoder representation from transformers (BERT) [5] is one of
2. Related Work
One of the major threats on the internet today is malicious software attacks. Malware
detection, visualization, and classification are among the main areas of research to solve this
issue. There is now a need for a system that automatically classifies malware without
de-compiling or obfuscating the code. This paper reviews DL methods used to classify
malware with NLP. These methods focus on parsing and extracting useful information from
natural languages to simplify human–computer interaction. The key to the success of
NLP in cybersecurity is the availability of large datasets. Textual data in cybersecurity
come from various sources, such as emails [8,9], transaction logs from different systems,
source code and online social networks. Using NLP techniques has a direct impact on
situational awareness from logs of different network events and user activities. Several
methods are implemented for representing text in digital form, such as vector space models
and distributed representation. Encoding text at the word or character level comprises
preprocessing, followed by encoding as an initial step. This includes data cleaning and
the transformation of unnecessary and unknown words or characters. Non-sequential and
sequential inputs are the two main types of text representation. Bag of words (BoWs) [10],
term-document matrices (TDMs) [11] and term frequency-inverse document frequency
(TFIDF) matrices belong to the non-sequential representation. N-gram, Keras embedding,
Word2vec [12], Neural-Bag-of-words, and FastText [13] belong to sequential representation,
which can extract similarities in word meaning. In the cybersecurity domain, capturing
sequential information is more important than word sense similarities, as most data contain
temporal and spatial information. Therefore, DL approaches are adopted for effective
malware detection.
Malicious software attacks are a significant threat on the internet, and malware de-
tection, visualization, and classification are important areas of research in addressing this
issue. With the increasing need for automatic malware classification without de-compiling
or obfuscating the code, deep learning (DL) methods have been applied to classify malware
in natural language processing (NLP). These methods focus on parsing and extracting
useful information from natural language to facilitate human–computer interaction. The
availability of large datasets is crucial for the success of NLP in cybersecurity, as textual data
in cybersecurity come from diverse sources, including emails, transaction logs, source code,
and online social networks. Various methods for representing text in digital form, such
as vector space models and distributed representation, are implemented. Preprocessing
involves encoding text at the word or character level, which includes data cleaning and
transformation of unnecessary and unknown words or characters. Two main types of text
representation are non-sequential and sequential inputs. While non-sequential representa-
tion techniques, such as bag of words, term document matrices, and term frequency-inverse
document frequency matrices are useful, sequential representation techniques, such as N-
gram, Keras embedding, Word2vec, neural bag of words, and FastText, are better suited for
capturing sequential information that is essential in the cybersecurity domain. Therefore,
DL approaches are adopted for effective malware detection.
Table 1 shows a comprehensive overview of malware-related works that have uti-
lized natural language processing (NLP) and deep learning models. The table contains
information on the methods used, authors, descriptions, data types, and highlights of
each work. The methods used include various tokenization and embedding techniques,
pre-trained models, and customized learning models. The data types include malware and
goodware samples, as well as URLs and executable files. These works demonstrate the
effectiveness of NLP and deep learning models in malware detection and classification
tasks. The use of pre-trained models and customized learning models has shown promising
results in identifying different types of malware with high accuracy. Additionally, the use
of attention-based mechanisms and GAN-based methods has improved the ability of these
models to extract meaningful features from the data. Despite these advancements, there are
still challenges to overcome, including the limitation of the maximum sequence length and
the lack of benchmarks for malware/goodware identification. Further research is needed
to address these challenges and improve the performance of these models. Overall, the
works listed in this overview provide a valuable resource for researchers and practitioners
in the field of cybersecurity.
Table 1. List of NLP and deep learning models and methods used for malware-related tasks.
identify the danger. By producing different network locations, DGA is utilized to confirm
the responsible points to the command-and-control servers. The weight of the collected
deep information from the domain names is assigned by the attention layer. The CNN
and BiLSTM neural network layers extract the features from the domain sequence data.
Lao et al. [35] proposed DeepRan, which uses a fully connected layer and an attention-
based BiLSTM for the classification of ransomware. By adding a conditional random field
(CRF) model to the attention-based BiLSTM, it additionally labels anomalous activity as
one of the potential ransomware attacks. They took high-dimensional host logging data
and extracted semantic information using the term frequency-inverse document frequency
(TFIDF) approach. The attention mechanism is the main architectural emphasis of TB
models, and the attention block performs its computations repeatedly in parallel. Each of
these is called an attention head.
Figure 1. Overview of the data collection process. We collected the goodware APKs from Google
Play [40] and set two levels of preprocessing. Level 2 handles the data extracted from the collection
of state-of-the-art sets: we passed these data through VirusTotal [41] to check the labeling, then
extracted the Manifest.xml files. Level 1 handles the feature-based datasets (FBs), which we passed
directly to the reformatting phase. The final output is one text file per sample.
• Second, depending on the extraction method they suggested, the dataset authors share
the preprocessed features. These characteristics were primarily displayed as CSV files.
Figure 2. Overview of the proposed MalBERTv2 approach. Level 1 handles feature-based datasets,
where reformatting of the samples is performed. Level 2 handles the data extracted from the
collection of state-of-the-art sets. The pretraining dataset is used to train the tokenizer. We set the
predicted class probability threshold to 0.5. We label the classes as 1 for malware and 0 for goodware.
We configured our feature generator on two levels to handle the various formats and
thereby cover larger datasets. The first stage in NLP analysis is the extraction of features
and representative keywords. TFIDF is the most frequently employed keyword extraction
algorithm. A variety of other techniques are available, ranging from word embeddings to
tokenization with learned language models, such as BERT. To assess the performance of
our proposed strategy and compare it against these alternatives, we built several levels
of feature representation in this work. In addition to developing our own feature
generator, we used TFIDF, Fasttext word embeddings trained on Wikipedia datasets, and
the pre-trained BERT representation.
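As an illustration of the TFIDF baseline representation mentioned above, the following minimal sketch ranks the keywords of each sample by TFIDF weight. It assumes whitespace tokenization and the unsmoothed idf = log(N/df) variant; the function names and the two manifest-like samples are hypothetical and are not taken from our implementation.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumes whitespace tokenization, length-normalised term frequency and
    the plain idf = log(N / df) variant (libraries often smooth this).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_keywords(docs, k=3):
    """Top-k terms per document by TFIDF weight (ties broken alphabetically)."""
    return [
        [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for w in tfidf(docs)
    ]


# Hypothetical manifest-like samples, invented for illustration only.
samples = [
    "permission internet permission send_sms activity main",
    "permission internet activity main activity settings",
]
print(top_keywords(samples, k=1))
```

Terms shared by every sample receive a weight of zero under this variant, so only the terms that distinguish a sample survive as keywords.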
3.2.1. Tokenization
During the preprocessing and tokenization phase, raw text is first split into words or
subwords, which are then converted to unique integer IDs through a lookup table. Tok-
enization is a crucial step in natural language processing and machine learning tasks that
involve text. There are three main types of tokenizers used in transformer-based models,
which include byte-pair encoding (BPE) [42], WordPiece (linear time WordPiece tokeniza-
tion and fast WordPiece tokenization [43]), and SentencePiece [44]. The BertTokenizer class
provides a higher-level interface that includes the BERT token splitting algorithm and a
WordPieceTokenizer. It takes sentences as input and returns token IDs. One limitation of
BERT is the maximum sequence length of 512 tokens. Sequences shorter than the maximum
length require padding with [PAD] tokens, while longer sequences must be truncated. Our
previous work [7] addresses this limitation. Contextualized embeddings in BERT provide
a representation of each word that depends on its position and context in the sentence, leading
to distinct clusters corresponding to word senses. This characteristic has proven successful in word sense
disambiguation tasks. However, the extent to which BERT can capture patterns in malware
datasets requires further investigation.
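The lookup, padding, and truncation steps described above can be illustrated with a minimal sketch. The function names, the unknown-token ID, and the attention mask are illustrative assumptions rather than the BertTokenizer interface, and the special tokens that BERT adds around each sequence are omitted for brevity.

```python
MAX_LEN = 512        # BERT's maximum sequence length, as stated in the text
PAD_TOKEN = "[PAD]"


def pad_or_truncate(tokens, max_len=MAX_LEN, pad_token=PAD_TOKEN):
    """Return (tokens, attention_mask), both exactly max_len long."""
    kept = tokens[:max_len]              # longer sequences are truncated
    n_pad = max_len - len(kept)          # shorter sequences are padded
    mask = [1] * len(kept) + [0] * n_pad
    return kept + [pad_token] * n_pad, mask


def encode(tokens, vocab, unk_id=0):
    """Map tokens to integer IDs through a lookup table (hypothetical vocab)."""
    return [vocab.get(tok, unk_id) for tok in tokens]
```

The mask marks which positions hold real tokens (1) and which hold padding (0), so that padded positions can be ignored downstream.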
Table 2. Example of sample after applying the preprocessing without the occurrences remover
module.
Next, BPE creates a base vocabulary comprising all symbols that occur in the set
of unique words and learns merge rules that form a new symbol from two symbols in
the base vocabulary. BPE is already used by the BERT tokenizer, so our pre-tokenizer
acts as a first-level, domain-specific word filter. The proposed tokenizer, as shown in Figure 3,
applies preprocessing methods to clean the code text and keeps only the useful keywords.
Additionally, we use the MaxMatch algorithm, as shown in the algorithm of the MalBERTv2
tokenizer. Then, an occurrences remover is added. This module is detailed in Section 4.
The full algorithm of the tokenizer is presented in Algorithm 1.
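The way BPE learns merge rules from a base vocabulary can be sketched as follows. This is a character-level toy version of the standard procedure, with hypothetical function names and a deterministic tie-break chosen for illustration; it is not the byte-level tokenizer used in our pipeline.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to num_merges merge rules; returns the ordered list of pairs."""
    # Base vocabulary: each word starts as a tuple of single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair; ties broken by the pair itself for determinism.
        best = min(pair_counts, key=lambda p: (-pair_counts[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges
```

Each learned rule adds one new symbol to the vocabulary, so the number of merges directly controls the final vocabulary size.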
Figure 3. Overview of the feature creation phase using the feature generator then applying the
MalBERTv2 tokenizer.
Algorithm 1 Proposed feature generator pre-tokenizer. We used both the MaxMatch [45]
algorithm and BPE [42] in the tokenization process. We collected the given dictionary
manually after processing the unique words in the collected pretraining datasets.
procedure SEGMENT string C into unique word list W using dictionary D
    C ← input string
    W ← output tokens list
    D ← given dictionary
    BPE ← Byte-Pair Encoding
    OR ← OccurrencesRemover Module
    Loop:
    while C is not empty do
        Find longest match w in D from start of C
        Condition:
        if w is not empty then
            C ← C − w
            W ← W + w
        else
            Remove first character from C and add to W
        end if
        BPE(W)
    end while
    return OR(W)
end procedure
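The MaxMatch loop of Algorithm 1 can be sketched in a few lines. The function names are hypothetical, the BPE step is left out, and the occurrences remover is assumed here to simply keep the first occurrence of each token, which may be simpler than the module described in Section 4.

```python
def max_match(text, dictionary):
    """Greedy longest-match-first segmentation of `text` using `dictionary`."""
    max_word_len = max((len(w) for w in dictionary), default=0)
    tokens, i = [], 0
    while i < len(text):
        match = None
        # Try the longest candidate first and shrink until a dictionary hit.
        for end in range(min(len(text), i + max_word_len), i, -1):
            if text[i:end] in dictionary:
                match = text[i:end]
                break
        if match is None:
            match = text[i]  # no dictionary word here: emit one character
        tokens.append(match)
        i += len(match)
    return tokens


def remove_occurrences(tokens):
    """Assumed behaviour: keep the first occurrence of each token, in order."""
    return list(dict.fromkeys(tokens))


# Hypothetical dictionary entries, invented for illustration.
dictionary = {"uses", "permission", "internet"}
print(remove_occurrences(max_match("usespermissioninternetuses", dictionary)))
```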
Figure 4. Percentage of embeddings coverage for the text of all dataset samples after preprocessing.
Figure 5. Percentage of embeddings coverage for the datasets’ vocabulary after preprocessing.
Figure 6. Overview of the pre-training proposed approach based on BERT-like block embeddings
and representations. N is the number of transformer blocks. For the classification, we use only the
encoder block of the transformer architecture.
The model takes the generated features as input and is then fine-tuned. The BERT base
model uses transformer blocks and a number of self-attention heads. For every input
token in a sequence, each head computes key, value, and query vectors used to create a
weighted representation. The outputs of all heads in the same layer are combined and
run through a fully connected layer. Each layer is wrapped with a skip connection and
followed by layer normalization. The first layer of BERT receives as input a combination
of the token, segment, and positional embeddings. Let X be the total number of instances
with M malware and G goodware samples, where the M samples possess a label L = 1
denoting malware and the G samples from X possess the label L = 0 denoting goodware
or benign. Each of the X samples is extracted as a full-text file containing its features. We applied
the level 2 feature generator to these features. Finally, each sample is represented as a text
file F.
MalBERT focuses on attention layers as presented in Figure 7. Attention layers treat their
input as a set of vectors with no sequential order. The model also does not contain any
recurrent or convolutional layers. Transformers are built to process sequential input data,
much like recurrent neural networks (RNNs). However, unlike RNNs, transformers do
not process data in order but use positional encoding (PE). Embeddings represent a token
in a d-dimensional space where tokens with similar meaning will be closer to each other,
but they do not encode the relative position of tokens in a sentence. The PE vector is
therefore added to the embedding vector to give the model some information about the
relative position of the tokens in the sentence. The formula for calculating the
PE is shown in Equations (1) and (2). The attention function used by the transformer
takes three inputs, Q (query), K (key), and V (value), as shown in Equation (3). The dot-product
attention is scaled by the inverse square root of the depth d_k. This is done because, for large
values of depth, the dot product grows large in magnitude, pushing the softmax function
to where it has small gradients. BERT uses the Adam optimizer with a custom learning
rate schedule, as shown in Equation (4):
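$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}_k\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V \tag{3}$$

$$\mathit{lrate} = d_{\mathrm{model}}^{-0.5} \cdot \min\left(\mathit{step\_num}^{-0.5},\ \mathit{step\_num} \cdot \mathit{warmup\_steps}^{-1.5}\right) \tag{4}$$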
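Equations (3) and (4) can be made concrete with a small pure-Python sketch. It handles a single attention head on lists of row vectors; the default values d_model = 768 and warmup_steps = 4000 are illustrative assumptions and are not settings reported in this paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention, Equation (3), for lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


def lrate(step_num, d_model=768, warmup_steps=4000):
    """Learning-rate schedule of Equation (4); step_num starts at 1."""
    return d_model ** -0.5 * min(step_num ** -0.5, step_num * warmup_steps ** -1.5)
```

Under this schedule the rate grows linearly during the warm-up steps and then decays with the inverse square root of the step number, peaking at step_num = warmup_steps.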
Figure 7. Overview of the attention mechanism used in MalBERT, where h is the number of heads.
4. Implementation
In this section, we detail the implementation steps for our proposed approach. We
start with the dataset details, then describe the feature engineering steps, and finally
detail the occurrences removal method.
4.1. Datasets
The performance of pre-trained models is largely determined by the scale and
quality of the datasets used for training. Therefore, we need a large-scale dataset
to pre-train the MalBERTv2 model. We selected 85,000 apps from the
following datasets, distributed as shown in Figure 8.
• AMD dataset [47]. It contains 24,553 samples categorized in 135 varieties among 71
malware families ranging from 2010 to 2016. The dataset is publicly shared with the
research community.
• Drebin dataset [48]. It contains 5560 applications from 179 different malware families.
The samples were collected from August 2010 to October 2012 and were made available
by the MobileSandbox project [48]. The authors made the datasets from the project
publicly available to foster research on AM and to enable a comparison of different
detection approaches.
• VirusShare dataset [49]. It is a repository of malware samples that gives security
researchers access to live malicious code samples. The dataset consists of static data
obtained from the VirusShare malware repository. The samples were gathered following
the AM dataset debiasing guidelines [50].
• Androzoo dataset [51]. It is a growing collection of Android apps gathered from
various sources, including the official Google Play app market, with the goal of
facilitating Android-related research. It currently contains over 15 million APKs, each
of which was or will be analyzed by tens of different anti-virus products to determine
which applications are malware. Each app contains more than 20 different types of
metadata, such as VirusTotal reports.
The fine-tuning datasets fall into two types: state-of-the-art datasets,
including MixG-Androzoo, MixG-VirusShare, MixG-AMD, and MixG-Derbin, and feature-
based datasets (FBs), including D01, D02, D03, D04, and D05.
• Android permissions and API calls during dynamic analysis [52]. This dataset includes
50,000 Android apps and 10,000 malware apps gathered from various sources. We
note this dataset as D01.
• Android malware detection [53]. These data contain APKs from various sources,
including malicious and benign applications. They were created after selecting a
sufficient number of apps. Using the pyaxmlparser and Androguard [54] framework,
we analyze each application in the array. On the set of each feature, we used a binary
vector format, and in the last column labeled class, we marked it 1 (Malicious) or 0
(Benign). We note this dataset as D02.
• Android malware dataset for machine learning [55]. These data contain 215 feature
vectors extracted from 15,036 applications: 5560 malware apps from the Drebin project
and 9476 benign apps. The dataset was used to develop and test the multilevel
classifier fusion approach for AM detection. The supporting file contains a more
detailed description of the feature vectors or attributes discovered through static code
analysis of Android apps. We note this dataset as D03.
• Android malware and normal permissions dataset [56]. These data contain 18,850
normal android application packages and 10,000 malware android packages, which
are used to identify the behavior of malware applications on the permission they need
at run time. We note this dataset as D04.
• Android permission dataset [57]. These data contain android apps and their per-
missions. They are classified as 1 (Malicious) or 0 (Benign). We note this dataset as
D05.
We collected samples from the state-of-the-art datasets as suggested by the DADA [58]
labeling guideline. These samples are a mix of goodware and malware samples. The
authors proposed these APK lists to solve the problem of biased malware datasets. The
collected datasets are MixG-Androzoo, MixG-VirusShare, MixG-AMD, and MixG-Derbin.
Figure 9 shows the distribution of all the test datasets, their main features, and
the publication dates of the apps.
preprocessing phase. In this step, we remove unimportant, mostly repeated words.
We manually analyzed different examples and created a list of words and
expressions that do not provide additional information. The purpose of the preprocessing
is to reduce the size of the input to the token limit imposed by the transformer. The final
dataset format has three columns: the ID column, given by the APK hash name; the
text column, containing the manifest file after preprocessing; and the label column, a binary
value equal to 1 if the app is malware and 0 otherwise. For the remaining datasets, since
the features in the CSV files are binary, we concatenate the names of the features into the
text whenever the feature is present.
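The reformatting of the binary feature-based datasets can be sketched as follows. The column names `id` and `class`, the helper names, and the two sample rows are hypothetical and chosen only for illustration; the actual CSV layouts differ between D01 and D05.

```python
import csv
import io


def row_to_text(row, id_col="id", label_col="class"):
    """Concatenate the names of the features whose value is 1."""
    present = [
        name for name, value in row.items()
        if name not in (id_col, label_col) and value.strip() == "1"
    ]
    return " ".join(present)


def reformat(csv_text, id_col="id", label_col="class"):
    """Turn a binary-feature CSV into (id, text, label) records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row[id_col], row_to_text(row, id_col, label_col), int(row[label_col]))
        for row in reader
    ]


# Hypothetical two-row sample; feature names are invented for illustration.
SAMPLE = """id,SEND_SMS,INTERNET,READ_CONTACTS,class
a1,1,1,0,1
b2,0,1,0,0
"""
print(reformat(SAMPLE))
```

The resulting records follow the same three-column layout (ID, text, label) as the manifest-based samples, so both kinds of datasets can be fed to the same tokenizer.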
5. Experimental Results
This section focuses on the evaluation of the conducted experiment’s results using
proposed evaluation metrics. To compare our proposed approach, we define baselines
based on related studies. Evaluation metrics are defined to assess the performance of the
proposed approach and the baselines. The chosen metrics are carefully selected to ensure
that they accurately reflect the performance of the models. Finally, the experimental results
are presented and analyzed to evaluate the proposed approach’s effectiveness in detecting
malware samples.
5.1. Baselines
In this study, we compare the performance of our MalBERTv2 model to several state-
of-the-art malware detection models previously reported in the literature. Specifically, the
models selected for comparison include an SVM model with TFIDF feature representation,
a CNN model with Fasttext pre-trained embeddings, MalBERTv1, a transformer layer
model with TFIDF features, and our proposed MalBERTv2 model. The chosen models
represent a variety of machine learning approaches and feature representations commonly
employed in malware detection tasks. By comparing the performance of our model to these
established approaches, we aim to provide a comprehensive evaluation of its effectiveness
and highlight any potential advantages or limitations.
5.1.3. MalBERTv1
MalBERT is a fine-tuned BERT model that is specifically designed for malware classifi-
cation. The model is trained on a large corpus of malware and goodware samples, with the
goal of identifying and differentiating between the two classes. MalBERT achieves state-of-
the-art performance on several benchmark datasets, including Androzoo, Drebin, AMD,
and VirusShare. MalBERT’s architecture is based on BERT, a pre-trained transformer-based
language model that is widely used in natural language processing tasks. However, unlike
BERT, MalBERT is trained on a dataset of malware and goodware samples, which makes it
more effective for malware classification tasks. MalBERT’s training process involves fine
tuning the pre-trained BERT model on a large corpus of malware and goodware samples,
followed by training a classification layer on top of the fine-tuned BERT model. The Mal-
BERT model has several advantages over traditional signature-based malware detection
methods. It can detect previously unknown types of malware and is effective in identifying
different types of malware with high accuracy. Furthermore, the model can handle large
amounts of unstructured text-like data, making it a useful tool for cybersecurity professionals.
Figure 12 gives an overview of the MalBERTv1 methodology steps, a graphical
representation of the process of identifying and classifying malware samples using the
MalBERTv1 approach, and Figure 13 shows the model fine-tuning process in MalBERTv1.
In our study, we utilized the MalBERT implementation described in [7]. The MalBERT
model fine tunes the BERT model using a specific feature representation at the beginning
of the process. To address the 512-token limit of BERT, our approach reorders the text by
prioritizing the most important words and selecting the first 512 tokens. BERT [5] is a
neural network architecture that consists of a stack of transformer blocks. The transformer
blocks utilize self-attention to establish relationships between words in the input sequence
and generate meaningful embeddings.
both the encoder and decoder of our proposed MalBERTv2 model. As our data domain
is application specific, it is essential to include relevant words and symbols related to
application descriptions, while avoiding irrelevant general vocabulary from other domains.
To achieve this, we created a code-aware specific vocabulary for the tokenizer. The training
phase of our model can be described in four main steps:
1. Apply the feature generator on the dataset and use the generated features as input to
the tokenizer. The feature generator can be considered as an initial tokenizer since it
applies tokenization to the original text to obtain the most relevant and English-related
words without losing the most important keywords in the files.
2. Create and train a byte-level byte-pair encoding tokenizer with the same special
tokens as RoBERTa.
3. Train the defined RoBERTa model from scratch using masked language modeling
(MLM).
4. Save the tokenizer to map the features of the test datasets later to fine-tune the
MalBERTv2 classifier.
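The masking used in step 3 can be illustrated with a minimal sketch of how MLM training examples are typically built. The 15% selection rate and the 80/10/10 replacement split follow the original BERT recipe and are assumptions here, since the exact masking settings are not stated above; the function name and the use of string tokens instead of IDs are also illustrative.

```python
import random

MASK = "<mask>"  # mask token in the RoBERTa special-token set


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (corrupted_tokens, labels) for masked language modeling.

    labels[i] holds the original token at selected positions, else None.
    """
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() >= mask_prob:
            corrupted.append(tok)   # position not selected for prediction
            labels.append(None)
            continue
        labels.append(tok)
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(MASK)               # 80%: replace with the mask
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: random vocabulary token
        else:
            corrupted.append(tok)                # 10%: keep the original token
    return corrupted, labels
```

The model is then trained to recover the original token at every position whose label is not None.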
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$\mathrm{F1} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \tag{8}$$

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP) \cdot (TP + FN) \cdot (TN + FP) \cdot (TN + FN)}} \tag{9}$$
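Equations (5) to (9) translate directly into code. The sketch below computes the plain binary versions from confusion-matrix counts; the function name and the convention of returning 0 when a denominator is zero are assumptions, and the weighted averaging used for the reported scores is not reproduced.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1 and MCC from confusion-matrix counts."""
    def safe_div(num, den):
        # Assumed convention: a metric with a zero denominator is reported as 0.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }
```

For example, `binary_metrics(50, 40, 10, 0)` describes a classifier that misses no malware but raises ten false alarms, which gives a recall of 1 while precision and MCC stay below 1.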
Table 3. Table of the metrics results for the models for the state-of-the-art collected test sets.
Model / Data        Accuracy   f1 (mc)    mcc        Precision (mc)  Recall (mc)  auc
TFIDF + SVM
  MixG-Androzoo     0.969589   0.969805   0.939478   0.957895        0.982014     0.969655
  MixG-VirusShare   0.858803   0.857143   0.717704   0.864964        0.849462     0.858778
  MixG-AMD          0.931127   0.932751   0.863799   0.906621        0.960432     0.931283
  MixG-Derbin       0.935599   0.935018   0.871212   0.938406        0.931655     0.935578
Fasttext + CNN
  MixG-Androzoo     0.953488   0.955403   0.906829   0.958692        0.952137     0.953554
  MixG-VirusShare   0.801609   0.763326   0.644415   0.962366        0.632509     0.803596
  MixG-AMD          0.927549   0.933113   0.856619   0.902556        0.965812     0.925683
  MixG-Derbin       0.929338   0.930396   0.860468   0.96            0.902564     0.930644
MalBERT
  MixG-Androzoo     0.975689   0.976183   0.971568   0.98651         0.966068     0.976158
  MixG-VirusShare   0.924039   0.924712   0.84808    0.927176        0.922261     0.92406
  MixG-AMD          0.970483   0.971478   0.941157   0.982517        0.960684     0.970961
  MixG-Derbin       0.966905   0.968076   0.933909   0.977352        0.958974     0.967292
TFIDF + Transformer From Scratch
  MixG-Androzoo     0.9558     0.954981   0.943421   0.952922        0.963211     0.954092
  MixG-VirusShare   0.9231125  0.925467   0.884563   0.923224        0.927892     0.92343
  MixG-AMD          0.9576809  0.9540987  0.945896   0.977345        0.973099     0.967554
  MixG-Derbin       0.9567821  0.9560983  0.958763   0.967812        0.964398     0.968989
MalBERTv2 = FeatureAnalyzer + MalBERT
  MixG-Androzoo     0.990744   0.998341   0.991149   0.99765         0.999033     0.998901
  MixG-VirusShare   0.956782   0.957819   0.945887   0.957164        0.956292     0.944226
  MixG-AMD          0.988787   0.989742   0.961892   0.999834        0.988987     0.985977
  MixG-Derbin       0.988954   0.989645   0.974889   0.998328        0.978884     0.987329
Table 3 presents the evaluation results of five models (TFIDF + SVM, Fasttext + CNN,
MalBERT, TFIDF + Transformer From Scratch, and MalBERTv2) for malware identification
using four datasets (MixG-Androzoo, MixG-VirusShare, MixG-AMD, and MixG-Derbin).
The metrics used to evaluate the models were accuracy, F1-score, Matthews correlation
coefficient (MCC), precision, recall, and area under the curve (AUC). Overall, MalBERTv2
had the highest performance in all the datasets, with an average accuracy of 97.1%, f1 score
of 97.2%, MCC of 95.8%, precision of 98.4%, recall of 96.7%, and AUC of 98.6%. MalBERTv2
outperformed all other models in terms of accuracy, f1 score, precision, and recall. TFIDF +
Transformer From Scratch had the second-best performance, followed by MalBERT, Fasttext
+ CNN, and TFIDF + SVM. It is worth noting that the MalBERTv2 model was fine tuned on
the mixed collected dataset, which could have contributed to its superior performance in
the evaluation. Additionally, the dataset split ratios were 90%:10% for training–validation
and testing, which could also have an impact on the performance of the models. Overall,
the table presents a clear and organized comparison of the performance of five models for
malware identification, providing valuable insights for researchers in the field.
Table 4 presents the performance metrics of different deep learning models applied
to classify malware and goodware samples in five collected datasets. MalBERTv2, which
combines feature analysis with the MalBERT model, achieved the best performance on
all five datasets, with high accuracy, F1-score, precision, recall, and area under the curve
(AUC) values. The traditional machine learning approach, TFIDF with SVM, showed
lower performance than the deep learning models. The Fasttext + CNN model achieved
high accuracy and F1-score on some datasets but relatively low precision and recall rates.
The MalBERT model alone performed well on some datasets, but its performance varied
depending on the dataset. Overall, the results suggest that combining deep learning models
with feature analysis can improve the performance of malware detection systems, and
MalBERTv2 is a promising approach in this regard.
Table 4. Table of the metrics results for the models for the feature-based collected test sets.
Model                                  Data  Accuracy  f1 (mc)   mcc       Precision (mc)  Recall (mc)  auc
TFIDF + SVM                            D01   0.570881  0.721393  0.188245  0.653153        0.805556     0.427469
TFIDF + SVM                            D02   0.582226  0.728216  0.122358  0.576176        0.989259     0.520577
TFIDF + SVM                            D03   0.582226  0.728216  0.122358  0.576176        0.989259     0.520577
TFIDF + SVM                            D04   0.814212  0.838955  0.619591  0.832186        0.845834     0.808882
TFIDF + SVM                            D05   0.599373  0.74552   0.123743  0.641096        0.89058      0.463673
Fasttext + CNN                         D01   0.617084  0.727308  0.158007  0.627276        0.865297     0.562497
Fasttext + CNN                         D02   0.886327  0.989815  0.872379  0.876578        0.844387     0.834099
Fasttext + CNN                         D03   0.681754  0.628078  0.59842   0.586972        0.614239     0.607119
Fasttext + CNN                         D04   0.888516  0.889971  0.876653  0.883587        0.896437     0.887252
Fasttext + CNN                         D05   0.664133  0.798173  0.562563  0.664133        0.758021     0.758021
MalBERT                                D01   0.694449  0.734708  0.656146  0.698335        0.951598     0.615904
MalBERT                                D02   0.799747  0.799775  0.799485  0.699775        0.899775     0.899743
MalBERT                                D03   0.798821  0.79815   0.797286  0.698766        0.897535     0.898479
MalBERT                                D04   0.899875  0.79989   0.699745  0.79978         0.899855     0.898855
MalBERT                                D05   0.759333  0.794697  0.659333  0.679373        0.669332     0.568801
TFIDF + Transformer From Scratch       D01   0.623949  0.664798  0.554896  0.688923        0.551898     0.593549
TFIDF + Transformer From Scratch       D02   0.783359  0.742259  0.669237  0.682342        0.789149     0.778833
TFIDF + Transformer From Scratch       D03   0.824719  0.829188  0.779938  0.738336        0.793799     0.812268
TFIDF + Transformer From Scratch       D04   0.903338  0.894773  0.823442  0.813492        0.813457     0.848735
TFIDF + Transformer From Scratch       D05   0.775727  0.764993  0.754489  0.749271        0.773246     0.735923
MalBERTv2 = FeatureAnalyzer + MalBERT  D01   0.824623  0.793342  0.784459  0.824836        0.821458     0.813454
MalBERTv2 = FeatureAnalyzer + MalBERT  D02   0.883678  0.857334  0.782653  0.773456        0.889922     0.879653
MalBERTv2 = FeatureAnalyzer + MalBERT  D03   0.894577  0.889882  0.848883  0.928921        0.893939     0.881948
MalBERTv2 = FeatureAnalyzer + MalBERT  D04   0.937643  0.894388  0.799922  0.923562        0.973252     0.928798
MalBERTv2 = FeatureAnalyzer + MalBERT  D05   0.834465  0.834781  0.835549  0.872873        0.873984     0.833654
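As a reading aid for the columns of Table 4, the following minimal sketch shows how accuracy, precision, recall, F1, and the Matthews correlation coefficient (MCC) follow from the counts of a binary confusion matrix. It is not the evaluation code used for the table: the counts are made up, the averaging scheme behind the "(mc)" columns is not assumed, and AUC is omitted because it requires per-sample scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts (malware is the positive class), for illustration only.
m = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is a harmonic mean, for a single class it always lies between the precision and recall of that class, which is a quick sanity check when reading per-model rows.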
Figure 14. General entities knowledge graph of the MalBERTv2 malware datasets collected from
Androzoo, Drebin, AMD, and VirusShare.
Figure 15. Example of a single connection, of the MalBERTv2 malware dataset knowledge graph for
"ConfigChanges" entity.
Figure 16. General entities knowledge graph of the MalBERTv2 goodware dataset collected from
third-party providers.
Figure 17. Example of the same connection by the "ConfigChanges" entity for the MalBERTv2
goodware dataset knowledge graph.
6. Conclusions
In this paper, we proposed a novel approach to tokenize the data sources extracted
from malware and goodware datasets. This approach helps to focus on the relevance and
significance of unstructured code from different programming languages that is of
substantive importance to the app features. We believe that our approach could address
the problem of processing a massive amount of unstructured text-like malware/goodware
samples for the cybersecurity domain. The idea was to use a feature generator to play the
role of a pre-tokenizer for our main classification. We applied this code-aware tokenization
process during training and then in the testing phase for mapping the features. The novel
feature representation considers the software applications’ source code as a set of features.
We apply text preprocessing on these features to keep the important information, such as
permissions, intents, and activities. We trained our BERT-based tokenizer from scratch
with the extracted features of 85,000 samples from different datasets, namely Androzoo,
Drebin, AMD, VirusShare, and a collection of goodware samples, where the list is provided
by DADA [58]. Finally, we trained the MalBERTv2 classifier; it has a BERT layer block with
the same weights as the pre-trained BERT, and we added a fully connected layer for the
prediction probabilities. Combining all these pieces, we present MalBERTv2, an end-to-end
malware/goodware language model for malware classification. The combination of the
proposed methods produced promising results, with weighted F1 scores ranging from 82%
to 99% across the evaluated datasets. Because of constraints such as the lack
of benchmarks for malware/goodware identification, we could not effectively compare
the model with existing methods. Instead, we selected the best approaches
from previous works and set them as baselines for comparison. Additionally, for
numerous datasets, researchers provided the extracted features in formats other than logs.
We reformatted the ones that include the feature names and occurrences, but the remaining
datasets that did not fit our requirements were eliminated at the dataset collection step.
Besides addressing these flaws, in the future we will extend the research by reducing the
execution time of the full platform, in particular the feature generation and prediction
steps, so that the whole system can run a full cycle in less time.
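To make the pre-tokenization idea summarized above more concrete, the following minimal sketch extracts permissions, activities, and intent actions from a decoded Android manifest and joins them into a keyword string that a tokenizer could then be trained on. This is an illustration under stated assumptions, not the feature generator used in this work: the sample manifest, the function names `extract_keywords` and `to_pretokenized_text`, and the choice of the manifest as the only input file are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace Android uses for attributes such as android:name.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical decoded manifest, used only for illustration.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application>
    <activity android:name=".MainActivity"
              android:configChanges="orientation|keyboardHidden">
      <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
      </intent-filter>
    </activity>
  </application>
</manifest>
"""


def extract_keywords(manifest_xml):
    """Collect permission, activity, and intent-action names from a manifest."""
    root = ET.fromstring(manifest_xml.strip())
    name_attr = ANDROID_NS + "name"
    return {
        "permissions": [e.get(name_attr) for e in root.iter("uses-permission")],
        "activities": [e.get(name_attr) for e in root.iter("activity")],
        "intents": [e.get(name_attr) for e in root.iter("action")],
    }


def to_pretokenized_text(features):
    """Join the kept keywords into one whitespace-separated string."""
    groups = ("permissions", "activities", "intents")
    return " ".join(v for g in groups for v in features[g] if v)


features = extract_keywords(SAMPLE_MANIFEST)
text = to_pretokenized_text(features)
print(text)
```

Keeping each keyword as a single whitespace-delimited unit lets a subword tokenizer trained from scratch learn frequent identifiers such as permission names as whole vocabulary entries.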
Author Contributions: Conceptualization, M.A.A. and A.R.; methodology, A.R. and M.A.A.;
validation, A.R. and M.A.A.; formal analysis, A.R. and M.A.A.; writing—original draft preparation, A.R.;
writing—review and editing, M.A.A.; funding acquisition, M.A.A. All authors have read and agreed
to the published version of the manuscript.
Funding: This research was enabled in part by support provided by the Natural Sciences and
Engineering Research Council of Canada (NSERC), funding reference number RGPIN-2018-06233.
Data Availability Statement: All data used in this work are available publicly. See Section 4.1 for the
sources of the used data.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
AM Android Malware
API Application programming interface
APK Android Package File
BERT Bidirectional Encoder Representations from Transformers
BiLSTM Bidirectional Long Short-Term Memory
BoW Bag of Words
BPE Byte Pair Encoding
CNN Convolutional Neural Networks
CRF Conditional Random Field
DL Deep Learning
FB Feature-Based
GAN Graph Attention Network
HIN Heterogeneous Information Network
MG Malware/Goodware
ML Machine Learning
NLP Natural Language Processing
RF Random Forest
SVM Support Vector Machine
TB Transformer Based
TDM Term Document Matrices
TFIDF Term Frequency Inverse Document Frequency
References
1. Damodaran, A.; Di Troia, F.; Visaggio, C.A.; Austin, T.H.; Stamp, M. A comparison of static, dynamic, and hybrid analysis for
malware detection. J. Comput. Virol. Hacking Tech. 2017, 13, 1–12. [CrossRef]
2. Mahdavifar, S.; Ghorbani, A.A. Application of deep learning to cybersecurity: A survey. Neurocomputing 2019, 347, 149–176.
[CrossRef]
3. Karita, S.; Chen, N.; Hayashi, T.; Hori, T.; Inaguma, H.; Jiang, Z.; Someki, M.; Soplin, N.E.Y.; Yamamoto, R.; Wang, X.; et al. A
comparative study on transformer vs rnn in speech applications. In Proceedings of the 2019 IEEE Automatic Speech Recognition
and Understanding Workshop (ASRU), Singapore, 14–18 December 2019; pp. 449–456.
4. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In
Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017;
pp. 5998–6008. Available online: https://papers.nips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
(accessed on 13 March 2023).
5. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding.
arXiv 2018, arXiv:1810.04805.
6. Rahali, A.; Akhloufi, M.A. MalBERT: Using transformers for cybersecurity and malicious software detection. arXiv 2021,
arXiv:2103.03806.
7. Rahali, A.; Akhloufi, M.A. MalBERT: Malware Detection using Bidirectional Encoder Representations from Transformers. In
Proceedings of the 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia, 17–20
October 2021; pp. 3226–3231.
8. Swetha, M.; Sarraf, G. Spam email and malware elimination employing various classification techniques. In Proceedings of
the 2019 4th International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT),
Bengaluru, India, 17–18 May 2019; pp. 140–145.
9. Mohammad, R.M.A. A lifelong spam emails classification model. Appl. Comput. Informatics 2020, ahead-of-print. [CrossRef]
10. Zhang, Y.; Jin, R.; Zhou, Z.H. Understanding bag-of-words model: A statistical framework. Int. J. Mach. Learn. Cybern. 2010,
1, 43–52. [CrossRef]
11. Antonellis, I.; Gallopoulos, E. Exploring term-document matrices from matrix models in text mining. arXiv 2006, arXiv:cs/0602076.
12. Church, K.W. Word2Vec. Nat. Lang. Eng. 2017, 23, 155–162. [CrossRef]
13. Mahoney, M.V. Fast Text Compression with Neural Networks. In Proceedings of the FLAIRS Conference, Orlando, FL, USA,
22–24 May 2000; pp. 230–234.
14. Rudd, E.M.; Abdallah, A. Training Transformers for Information Security Tasks: A Case Study on Malicious URL Prediction.
arXiv 2020, arXiv:2011.03040.
15. Han, L.; Zeng, X.; Song, L. A novel transfer learning based on albert for malicious network traffic classification. Int. J. Innov.
Comput. Inf. Control. 2020, 16, 2103–2119.
16. Li, M.Q.; Fung, B.C.; Charland, P.; Ding, S.H. I-MAD: Interpretable Malware Detector Using Galaxy Transformer. Comput. Secur.
2021, 108, 102371. [CrossRef]
17. Jusoh, R.; Firdaus, A.; Anwar, S.; Osman, M.Z.; Darmawan, M.F.; Ab Razak, M.F. Malware detection using static analysis in
Android: A review of FeCO (features, classification, and obfuscation). PeerJ Comput. Sci. 2021, 7, e522. [CrossRef]
18. Niveditha, V.; Ananthan, T.; Amudha, S.; Sam, D.; Srinidhi, S. Detect and classify zero day Malware efficiently in big data
platform. Int. J. Adv. Sci. Technol. 2020, 29, 1947–1954.
19. Choi, S.; Bae, J.; Lee, C.; Kim, Y.; Kim, J. Attention-based automated feature extraction for malware analysis. Sensors 2020, 20, 2893.
[CrossRef]
20. Catal, C.; Gunduz, H.; Ozcan, A. Malware Detection Based on Graph Attention Networks for Intelligent Transportation Systems.
Electronics 2021, 10, 2534. [CrossRef]
21. Hei, Y.; Yang, R.; Peng, H.; Wang, L.; Xu, X.; Liu, J.; Liu, H.; Xu, J.; Sun, L. Hawk: Rapid android malware detection through
heterogeneous graph attention networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 1–15. [CrossRef]
22. Pathak, P. Leveraging Attention-Based Deep Neural Networks for Security Vetting of Android Applications. Ph.D. Thesis,
Bowling Green State University, Bowling Green, OH, USA, 2021; Volume 8, Number 29. [CrossRef]
23. Chen, J.; Guo, S.; Ma, X.; Li, H.; Guo, J.; Chen, M.; Pan, Z. SLAM: A Malware Detection Method Based on Sliding Local Attention
Mechanism. Secur. Commun. Netw. 2020, 2020, 6724513. [CrossRef]
24. Ganesan, S.; Ravi, V.; Krichen, M.; Sowmya, V.; Alroobaea, R.; Soman, K. Robust Malware Detection using Residual Attention
Network. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 10–12
January 2021; pp. 1–6.
25. Ren, F.; Jiang, Z.; Wang, X.; Liu, J. A DGA domain names detection modeling method based on integrating an attention mechanism
and deep neural network. Cybersecurity 2020, 3, 4. [CrossRef]
26. Komatwar, R.; Kokare, M. A Survey on Malware Detection and Classification. J. Appl. Secur. Res. 2021, 16, 390–420. [CrossRef]
27. Singh, J.; Singh, J. A survey on machine learning-based malware detection in executable files. J. Syst. Archit. 2021, 112, 101861.
[CrossRef]
28. Kouliaridis, V.; Kambourakis, G.; Geneiatakis, D.; Potha, N. Two Anatomists Are Better than One—Dual-Level Android Malware
Detection. Symmetry 2020, 12, 1128. [CrossRef]
29. Imtiaz, S.I.; ur Rehman, S.; Javed, A.R.; Jalil, Z.; Liu, X.; Alnumay, W.S. DeepAMD: Detection and identification of Android
malware using high-efficient Deep Artificial Neural Network. Future Gener. Comput. Syst. 2021, 115, 844–856. [CrossRef]
30. Amin, M.; Tanveer, T.A.; Tehseen, M.; Khan, M.; Khan, F.A.; Anwar, S. Static malware detection and attribution in android
byte-code through an end-to-end deep system. Future Gener. Comput. Syst. 2020, 102, 112–126. [CrossRef]
31. Karbab, E.B.; Debbabi, M. PetaDroid: Adaptive Android Malware Detection Using Deep Learning. In Proceedings of the
International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Online, 14–16 July 2021;
pp. 319–340.
32. Yadav, P.; Menon, N.; Ravi, V.; Vishvanathan, S.; Pham, T.D. EfficientNet convolutional neural networks-based Android malware
detection. Comput. Secur. 2022, 115, 102622. [CrossRef]
33. Yuan, C.; Cai, J.; Tian, D.; Ma, R.; Jia, X.; Liu, W. Towards time evolved malware identification using two-head neural network.
J. Inf. Secur. Appl. 2022, 65, 103098. [CrossRef]
34. Weng Lo, W.; Layeghy, S.; Sarhan, M.; Gallagher, M.; Portmann, M. Graph Neural Network-based Android Malware Classification
with Jumping Knowledge. In Proceedings of the 2022 IEEE Conference on Dependable and Secure Computing (DSC), Edinburgh,
UK, 22–24 June 2022. [CrossRef]
35. Roy, K.C.; Chen, Q. Deepran: Attention-based bilstm and crf for ransomware early detection and classification. Inf. Syst. Front.
2021, 23, 299–315. [CrossRef]
36. Korine, R.; Hendler, D. DAEMON: Dataset/Platform-Agnostic Explainable Malware Classification Using Multi-Stage Feature
Mining. IEEE Access 2021, 9, 78382–78399. [CrossRef]
37. Lu, T.; Du, Y.; Ouyang, L.; Chen, Q.; Wang, X. Android malware detection based on a hybrid deep learning model. Secur. Commun.
Networks 2020, 2020, 8863617. [CrossRef]
38. Yoo, S.; Kim, S.; Kim, S.; Kang, B.B. AI-HydRa: Advanced hybrid approach using random forest and deep learning for malware
classification. Inf. Sci. 2021, 546, 420–435. [CrossRef]
39. Yousefi-Azar, M.; Varadharajan, V.; Hamey, L.; Tupakula, U. Autoencoder-based feature learning for cyber security applications.
In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017;
pp. 3854–3861. [CrossRef]
40. Viennot, N.; Garcia, E.; Nieh, J. A measurement study of google play. In Proceedings of the 2014 ACM International Conference
on Measurement and Modeling of Computer Systems, Austin, TX, USA, 16–20 June 2014; pp. 221–233.
41. Peng, P.; Yang, L.; Song, L.; Wang, G. Opening the blackbox of virustotal: Analyzing online phishing scan engines. In Proceedings
of the Internet Measurement Conference, Amsterdam, The Netherlands, 21–23 October 2019; pp. 478–485.
42. Shibata, Y.; Kida, T.; Fukamachi, S.; Takeda, M.; Shinohara, A.; Shinohara, T.; Arikawa, S. Byte Pair Encoding: A Text Compression
Scheme That Accelerates Pattern Matching. Researchgate. 1999. Available online:
https://www.researchgate.net/publication/2310624_Byte_Pair_Encoding_A_Text_Compression_Scheme_That_Accelerates_Pattern_Matching (accessed on 12 March 2023).
43. Song, X.; Salcianu, A.; Song, Y.; Dopson, D.; Zhou, D. Fast WordPiece Tokenization. In Proceedings of the 2021 Conference on
Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 2089–2103.
44. Kudo, T.; Richardson, J. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text
processing. arXiv 2018, arXiv:1808.06226.
45. Chang, P.C.; Galley, M.; Manning, C.D. Optimizing Chinese word segmentation for machine translation performance. In
Proceedings of the Third Workshop on Statistical Machine Translation, Columbus, OH, USA, 19 June 2008; pp. 224–232.
46. Sennrich, R.; Haddow, B.; Birch, A. Neural machine translation of rare words with subword units. arXiv 2015, arXiv:1508.07909.
47. Li, Y.; Jang, J.; Hu, X.; Ou, X. Android malware clustering through malicious payload mining. In Proceedings of the International
Symposium on Research in Attacks, Intrusions, and Defenses, Atlanta, GA, USA, 18–20 September 2017; pp. 192–214.
48. Arp, D.; Spreitzenbarth, M.; Hubner, M.; Gascon, H.; Rieck, K.; Siemens, C. Drebin: Effective and explainable detection of android
malware in your pocket. In Proceedings of the Network and Distributed System Security Symposium (NDSS) ’14, San Diego, CA,
USA, 23–26 February 2014.
49. Roberts, J.M. Automatic Analysis of Malware Behaviour using Machine Learning. J. Comput. Secur. 2011, 19, 639–668.
50. Miranda, T.C.; Gimenez, P.F.; Lalande, J.F.; Tong, V.V.T.; Wilke, P. Debiasing Android Malware Datasets: How Can I Trust Your
Results If Your Dataset Is Biased? IEEE Trans. Inf. Forensics Secur. 2022, 17, 2182–2197. [CrossRef]
51. Li, L.; Gao, J.; Hurier, M.; Kong, P.; Bissyandé, T.F.; Bartel, A.; Klein, J.; Traon, Y.L. Androzoo++: Collecting millions of android
apps and their metadata for the research community. arXiv 2017, arXiv:1709.05281.
52. Arvind, M. Android Permissions and API Calls during Dynamic Analysis. Available online:
https://data.mendeley.com/datasets/vng8wg9n65/1 (accessed on 12 March 2023).
53. Colaco, C.W.; Bagwe, M.D.; Bose, S.A.; Jain, K. DefenseDroid: A Modern Approach to Android Malware Detection. Strad Res.
2021, 8, 271–282. [CrossRef]
54. Desnos, A.; Gueguen, G. Androguard-Reverse Engineering, Malware and Goodware Analysis of Android Applications. Available
online: https://androguard.readthedocs.io/en/latest/ (accessed on 12 March 2023).
55. Yerima, S. Android Malware Dataset for Machine Learning. Figshare. 2018. Available online:
https://figshare.com/articles/dataset/Android_malware_dataset_for_machine_learning_2/5854653 (accessed on 12 March 2023).
56. Arvind, M. A Android Malware and Normal Permissions Dataset. 2018. Available online:
https://data.mendeley.com/datasets/958wvr38gy/5 (accessed on 27 July 2022).
57. Arvind, M. Android Permission Dataset. 2018. Available online: https://data.mendeley.com/datasets/8y543xvnsv/1 (accessed
on 27 July 2022).
58. Concepcion Miranda, T.; Gimenez, P.F.; Lalande, J.F.; Viet Triem Tong, V.; Wilke, P. Dada: Debiased Android Datasets. 2021.
Available online: https://ieee-dataport.org/open-access/dada-debiased-android-datasets (accessed on 27 July 2022).
59. Hozan, E. Android APK Reverse Engineering: Using JADX, October 4, 2019. Available online:
https://www.secplicity.org/2019/10/04/android-apk-reverse-engineering-using-jadx/ (accessed on 30 March 2021).
60. Winsniewski, R. Apktool: A Tool for Reverse Engineering Android apk Files. Available online:
https://ibotpeaches.github.io/Apktool/ (accessed on 27 July 2022).
61. Harrand, N.; Soto-Valero, C.; Monperrus, M.; Baudry, B. Java decompiler diversity and its application to meta-decompilation.
J. Syst. Softw. 2020, 168, 110645. [CrossRef]
62. Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching Word Vectors with Subword Information. arXiv 2016, arXiv:1607.04606.
63. Zhang, B.; Xiao, W.; Xiao, X.; Sangaiah, A.K.; Zhang, W.; Zhang, J. Ransomware classification using patch-based CNN and
self-attention network on embedded N-grams of opcodes. Future Gener. Comput. Syst. 2020, 110, 708–720. [CrossRef]
64. Rahali, A.; Lashkari, A.H.; Kaur, G.; Taheri, L.; Gagnon, F.; Massicotte, F. DIDroid: Android Malware Classification and
Characterization Using Deep Image Learning. In Proceedings of the 2020 the 10th International Conference on Communication
and Network Security, Tokyo, Japan, 27–29 November 2020; pp. 70–82. [CrossRef]
65. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly
optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692.
66. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary
classification evaluation. BMC Genom. 2020, 21, 6. [CrossRef]
67. Narkhede, S. Understanding auc-roc curve. Towards Data Sci. 2018, 26, 220–227.
68. Jia, Y.; Qi, Y.; Shang, H.; Jiang, R.; Li, A. A practical approach to constructing a knowledge graph for cybersecurity. Engineering
2018, 4, 53–60. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.