1 Introduction
The integration of the Internet into our daily lives has opened up a new era of op-
portunities, products, and access to vast amounts of data. However, as our reliance
on the Internet grows, so does the threat it poses to our security [1] [2] [3]. This
increased connectivity has led to the exploitation of cyber security vulnerabilities,
resulting in various threats such as phishing, malware, ransomware, social engineer-
ing, identity theft, and denial-of-service attacks. One particularly concerning threat in the current era is the botnet [4]. A botnet is a network
of compromised machines, controlled by one or multiple remote servers known as
command and control servers. These infected machines often remain inconspicuous,
operating normally until they receive a command to launch an attack. There are
various types of botnets, but they all function in a similar fashion [5]. For exam-
ple, the Zeus botnet [6], also known as Zbot, specifically targets financial market
data, resulting in significant global financial losses. Another example is the Windigo botnet [7], which emerged in 2011, targeting approximately 10,000 Linux servers and producing a remarkable 35 million spam emails. The Mirai botnet [8], which surfaced in
2016, targeted high-end embedded and IoT devices, carrying out large-scale dis-
tributed denial-of-service (DDoS) attacks. These examples highlight the immense
scale and impact of botnets, emphasizing the critical need for effective measures to
detect and mitigate their activities.
Botnet detection is challenging due to the evolving nature of these attacks. Techniques
for botnet detection [9] can be broadly categorized into two main types: active tech-
niques and passive techniques. Passive techniques [10] primarily rely on monitoring
data collected from sources such as honeypots, packet inspection, and analysis of
spam records. On the other hand, active techniques involve direct communication
with information sources, often employing deeper probing and analysis. One ex-
ample of an active detection technique is the sinkhole method, which disrupts the
communication between the bots and the command and control server by severing
their connection. Another active technique is DNS cache spoofing, which involves
manipulating DNS cache entries to redirect botnet traffic toward detection systems.
Tracking flux networks is yet another active technique used for botnet detection. De-
spite the continuous advancements in botnet detection, the increasing complexity of
botnet threats has rendered conventional methods ineffective [11]. As botnets evolve
and employ sophisticated techniques, there is a need for innovative and adaptive
approaches to effectively identify and mitigate these threats.
Effective detection and prevention of botnets play a vital role in safeguarding the
security and integrity of computer systems. To assess the effectiveness of machine
learning in the field of cybersecurity, it is necessary to apply machine learning tech-
niques to existing datasets or generate new datasets. There are multiple approaches
for generating datasets that can be used for botnet detection. One such approach is
the utilization of the EMBER dataset [12], which involves extracting static features
from botnet samples. This dataset provides valuable insights into the characteristics
and behaviors of botnets, enabling the development of machine-learning models for
accurate detection. Another approach is the use of the CTU-13 dataset [13], where
bots are deliberately deployed and analyzed in a controlled laboratory environment.
By analyzing the network traffic generated by these bots, meaningful features can be
derived. This dataset allows researchers to study the real-world behavior of botnets
and develop machine learning algorithms to effectively detect and mitigate their
activities. By leveraging these datasets and applying machine learning techniques,
researchers can explore and evaluate the potential of machine learning in addressing
the challenges of botnet detection and prevention, ultimately enhancing the security
of computer systems.
This paper addresses three main limitations in botnet attack detection using ma-
chine learning. Firstly, there is a lack of a comprehensive botnet ground truth
dataset that covers modern attack techniques. Existing research has relied on
outdated and unbalanced datasets, often with a limited number of botnet types
[divekar2018benchmarking] [song2006description] [shiravi2012toward]. This lack of
benchmark datasets hinders the ability to compare different machine learning mod-
els and features, resulting in fragmented research and a lack of clarity regarding the
effectiveness of proposed detection models [koroniotis2019towards].
Secondly, the proposed machine learning models often struggle to withstand the
evasive strategies employed by hackers. Botnets continuously evolve and adapt to
avoid detection, rendering some detection models ineffective. This poses a significant
challenge in developing robust machine-learning models that can effectively detect
botnet attacks and overcome evasion techniques.
Thirdly, the current feature extraction techniques predominantly rely on static
features and do not capture the complete profiles of botnet behaviors. This limited
feature coverage may lead to missed detections and insufficient understanding of
the intricate characteristics and dynamics of botnets.
While classical machine learning and deep learning models have been utilized
for botnet detection [popoola2021smote] [saeed2022survey] [miller2016role], the ap-
plication of Natural Language Processing (NLP) has opened up new possibilities
for malware detection systems. ML-based text analyzers, leveraging NLP tech-
niques, have shown promise in enhancing the detection capabilities of botnet attacks
[8233569] [zhang2021hybrid] [lu2019malware] [mimura2022applying]. Addressing
these limitations and exploring the potential of NLP-based approaches can con-
tribute to the development of more effective and robust botnet detection systems,
enhancing the overall security of computer networks.
In this research, we propose an alternative approach to the conventional feature
selection method for detecting botnet samples from benign samples. The raw data is
generated by sandboxing real-world botnet samples and generating reports. Our aim
is to explore further methods for extracting features from these reports, particularly
those leveraging NLP approaches, which focus on the processing of human language
[eisenstein2019introduction].
The objective of our research is to create a dataset by extracting features from
sandbox-generated reports, identify NLP approaches applicable to this scenario,
and train machine learning algorithms for botnet detection. Additionally, we aim
to evaluate the viability of employing NLP for this task. To accomplish this, we
examine the performance of several machine learning algorithms on dataset AUB02,
while also investigating ways to increase the robustness of our models.
This study contributes to addressing the difficulties in botnet detection by propos-
ing novel techniques that leverage NLP and sandbox-generated reports. By focusing
on feature extraction, dataset creation, and machine learning algorithm training,
we aim to advance the field of botnet detection and enhance the effectiveness of
detection systems.
1.1 Contributions
Major contributions of our research are listed below:
1 Release of a ground-truth botnet dataset. A new, real-world botnet dataset that includes both botnet and legitimate samples (65.75% malicious, 34.24% benign), drawn from the past three years and covering 12 botnet families. The samples are collected from multiple industry sources, including VirusTotal and abuse.ch, and ground truth is ensured by collecting all malicious samples from the databases of well-known cybersecurity companies. The dataset is generated by post-processing malware analysis reports and is released for security researchers.
2 ML-based botnet detection analysis. Many suitable algorithms for botnet detection are evaluated for performance using a variety of methods.
2 Literature Review
In this section, we review the literature on machine learning for botnet detection. Each technique presented in the literature shows impressive efficacy in combating this pervasive threat. However, a direct comparison of our study’s findings with others may not always be feasible due to variations in the methodologies employed.
Diverse approaches exist within the realm of botnet detection. Some focus on de-
tecting botnets through the analysis of network traffic, while our primary objective
lies in identifying malicious bots themselves. As such, comparing our results directly
with studies adopting different methodologies might yield contrasting outcomes.
To facilitate a quick and comprehensive overview, we present a summarized version
of Table 1 at the conclusion of this section. This summary enables us to readily
compare our methods with those employed by other researchers, shedding light on
the distinctiveness and potential contributions of our approach to the broader field
of botnet detection.
Extensive research was conducted by Ugarte-Pedrero et al. [14], wherein a complete
one-day feed from public and private sources was dissected. The focus was on ex-
amining Windows executables in detail and comparing the outcomes generated by
static and dynamic analysis tools. Clustering techniques were employed to identify
similar sandbox findings and classify them based on different priorities. The study
aimed to evaluate the computational resources and human effort required by orga-
nizations to comprehend daily malware feed, while also addressing the challenges
encountered during sample feed analysis. DBod [15] introduced a technique for identifying and grouping botnets that communicate with their command and control (C&C) servers using domain generation algorithms (DGAs). By monitoring DNS traffic, the research aimed to detect patterns commonly associated with DGA-based botnets. The underlying assumption of the DBod approach was that the majority of domains generated by the DGA do not possess valid IP addresses, resulting in an NXDomain response. The study reported an average accuracy of 99.69% across
four distinct datasets under controlled conditions. However, it is important to note
that real-time scenarios revealed limitations, as bot activity does not always align
with the collection of DNS messages required for accurate identification. In a sep-
arate research effort [16], a methodology based on word embedding and the Long
Short-Term Memory (LSTM) network model was proposed for packet-based ma-
licious traffic classification. The evaluation demonstrated comparable performance.
3 Methodology
The proposed methodology encompasses several steps, including data collection,
feature extraction, application of machine learning algorithms, and evaluation of
results. A flow chart illustrates the sequence of these processes, which will be dis-
cussed in detail in the subsequent sections.
The first step in the methodology is data collection, where relevant datasets are
acquired for the botnet detection task. These datasets may consist of network traffic
data, malware samples, or reports generated by sandboxing techniques.
Next, the feature extraction process is employed to extract meaningful information
from the collected data. This involves techniques such as NLP, wavelet decomposi-
tion, or other feature engineering methods to identify and extract relevant features
that can distinguish between botnet and benign samples.
Once the features are extracted, machine learning algorithms are applied to train
models for botnet detection. Various algorithms, including Naive Bayes, k-nearest
neighbor, decision trees, multi-layer perceptrons, support vector machines, and ran-
dom forest, may be utilized based on their suitability and performance for the given
task.
After training the models, the evaluation phase assesses their performance. This
includes measuring metrics such as true positive rate, false positive rate, accuracy,
F1 score, and other relevant evaluation criteria. The results are analyzed to deter-
mine the effectiveness and efficiency of the proposed methodology.
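To make this evaluation step concrete, the following minimal sketch computes these metrics with scikit-learn from a confusion matrix; the label arrays here are hypothetical stand-ins, not the paper's data.

```python
# A minimal sketch of the evaluation metrics; the labels below are
# hypothetical stand-ins (1 = malicious, 0 = benign).
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)  # true positive rate (recall on the malicious class)
fpr = fp / (fp + tn)  # false positive rate

print(f"accuracy={accuracy_score(y_true, y_pred):.4f} "
      f"F1={f1_score(y_true, y_pred):.4f} TPR={tpr:.4f} FPR={fpr:.4f}")
```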
Throughout the discussion, accompanying figures, including the flow chart, pro-
vide a visual representation of the proposed methodology, illustrating the sequence
of steps and processes involved in the detection of botnets using machine learning
techniques.
The availability of both raw data and structured information enables subsequent
steps in the botnet detection process, such as feature extraction, machine learning
algorithms, and evaluation of results, to be conducted effectively. It ensures that
the detection methods can leverage the combination of raw data and structured
insights to accurately identify and classify botnet activity. Therefore, this phase
plays a crucial role in ensuring that the necessary data and information are available
for the successful detection of botnets.
Table 2 JSON report attributes that are intended for NLP models

Title      | Representation                          | Criticality        | NLP contribution
info       | sandbox platform, analysis information  | nil                | no
network    | tcp/udp, dns, domains                   | bad IPs/URLs       | yes
static     | static and embedded properties          | entropy            | yes
behavior   | system calls, files, registry keys      | file modifications | yes
signatures | pattern match                           | malicious patterns | yes
strings    | extracted strings from binaries         | strings match      | yes
debug      | error & analysis logs                   | nil                | no
metadata   | name, hash                              | nil                | no
Table 3 Number of malicious and benign samples after data cleanse

Ser. | Type of samples | Number
a.   | malicious       | 5,027
b.   | benign          | 2,652
     | total           | 7,679
class. Then, we remove symbols and punctuation marks, such as commas, parentheses, and quotation marks, from the text, merging all the words into a single cohesive phrase. Since our focus is not on contextual meaning but rather on feeding the words to the algorithm, this consolidation suffices. For implementation, we use the scikit-learn CountVectorizer(), which incorporates a built-in English stop word list to remove unnecessary terms, and we restrict the vocabulary to the 1,000 most frequent features. The resulting token-count matrix is then organized into a data frame, together with the previously stored labels, and saved as a CSV file, so that the extracted features can be reused for further analysis and classification.
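As an illustration of this step, the sketch below mirrors the described CountVectorizer configuration; the two toy report strings, the label values, and the output file name features_bow.csv are illustrative assumptions.

```python
# Sketch of the bag-of-words extraction described above; the toy report
# strings and the output file name are illustrative assumptions.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

reports = ["connects remote host creates registry key",
           "opens document writes temp file"]          # stand-ins for merged report text
labels = [1, 0]                                        # previously stored labels

vectorizer = CountVectorizer(stop_words="english",     # built-in English stop word list
                             max_features=1000)        # keep the 1000 most frequent tokens
counts = vectorizer.fit_transform(reports)             # sparse token-count matrix

df = pd.DataFrame(counts.toarray(), columns=vectorizer.get_feature_names_out())
df["label"] = labels
df.to_csv("features_bow.csv", index=False)             # saved for later training
```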
$$\bar{s}_{i,j} = \frac{1}{K} \sum_{k=1}^{K} S_{k,j}$$

where $\bar{s}_{i,j}$ is the j-th component of the vector for report i, obtained by averaging the j-th components of its K token vectors $S_k$.
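Under this reading of the equation as token-vector averaging, a minimal gensim sketch might look as follows; the tokenized reports and parameter values are illustrative assumptions.

```python
# Sketch of token-vector averaging with gensim Word2Vec; the tokenized
# reports and the vector size are illustrative assumptions.
import numpy as np
from gensim.models import Word2Vec

tokenized_reports = [["connects", "remote", "host"],
                     ["writes", "registry", "key"]]
w2v = Word2Vec(sentences=tokenized_reports, vector_size=100, min_count=1, seed=0)

def report_vector(tokens, model):
    # s_bar[j] = (1/K) * sum_{k=1..K} S[k, j] over the report's K token vectors
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

X_w2v = np.vstack([report_vector(t, w2v) for t in tokenized_reports])
```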
The dataset consists of 7,679 samples, including both benign and malicious sam-
ples. To train the models, the first step is to split the data into two parts: a training
set and a testing set. The training set will be used for hyperparameter tuning and
model training, while the testing set will be reserved solely for evaluating the final
model’s performance.
The training set is used for model training and hyperparameter tuning and is essential for optimizing the model’s performance. The testing set is kept separate and used only after training, serving as an unbiased assessment of the model’s ability to generalize to new, unseen data. As Table 4 shows, roughly 75% of the samples (5,759) are used for training and 25% (1,920) for testing. By splitting the dataset in this way, the models are trained on a representative sample of the data and evaluated on independent data, so their performance can be assessed accurately. Details of the training and testing samples are given in Table 4; a sketch of the split follows the table.
Table 4 Training and testing split samples for ML models

Ser. | Type of samples | Malicious | Benign | Total
a.   | train set       | 3,740     | 2,019  | 5,759
b.   | test set        | 1,287     | 633    | 1,920
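A sketch of this split is shown below, assuming the features and labels come from the CSV written earlier; the 25% test fraction matches the Table 4 counts, though the paper's exact procedure (e.g., seed or stratification) is not specified.

```python
# Sketch of the train/test split; the seed is arbitrary and the exact
# procedure used in the paper (e.g., stratification) is not specified.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("features_bow.csv")   # file written in the earlier sketch
y = df.pop("label")
X = df

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)  # 5,759 train / 1,920 test of 7,679
```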
During the hyperparameter tuning process, different algorithms and their respec-
tive hyperparameters were evaluated to identify the best-performing models. For
the MLP classifier, GridSearchCV was utilized with a manual selection of param-
eter values. Six distinct choices were made for the hidden layer size, three for the
maximum number of iterations, two for activation functions, solvers, alpha values,
and learning rate settings, resulting in a total of 36 combinations. This was further
multiplied by 10 for 10-fold cross-validation, resulting in 360 model trainings to find
the optimal parameter combination.
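A hedged sketch of such a grid search is shown below; the concrete parameter values are illustrative choices, arranged so that the grid yields 36 combinations as described, and are not the paper's actual grid.

```python
# Hedged sketch of the MLP grid search: 6 hidden-layer sizes x 3 iteration
# limits x 2 activations = 36 combinations, each fit 10 times under
# 10-fold CV (360 trainings). The concrete values are illustrative.
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

param_grid = {
    "hidden_layer_sizes": [(50,), (100,), (200,), (50, 50), (100, 50), (100, 100)],
    "max_iter": [200, 400, 600],
    "activation": ["relu", "tanh"],
}
search = GridSearchCV(MLPClassifier(random_state=0), param_grid,
                      cv=10, n_jobs=-1, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)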
For the remaining models, BayesSearchCV was employed. Unlike GridSearchCV,
which exhaustively explores all parameter combinations, BayesSearchCV uses a
probabilistic function to guide the search and prioritize promising parameter sets.
By considering prior assessments, it focuses on areas of the parameter space likely
to yield higher accuracy. This approach reduces the number of iterations required
to find the optimal hyperparameters and avoids exploring irrelevant regions of the
parameter space [49].
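A minimal sketch of Bayesian search with scikit-optimize's BayesSearchCV [49] follows; the RandomForest search space shown is an assumption, not the paper's.

```python
# Minimal sketch of Bayesian hyperparameter search with skopt's
# BayesSearchCV [49]; the RandomForest search space is an assumption.
from skopt import BayesSearchCV
from skopt.space import Categorical, Integer
from sklearn.ensemble import RandomForestClassifier

search = BayesSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": Integer(100, 1000),
     "max_depth": Integer(3, 30),
     "max_features": Categorical(["sqrt", "log2"])},
    n_iter=32,          # far fewer evaluations than an exhaustive grid
    cv=10, n_jobs=-1, random_state=0)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```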
Cross-validation was employed to assess model performance and detect overfitting.
If there is a significant difference between the validation accuracy and the test
accuracy, it indicates overfitting. However, in this study, no substantial differences
were observed, suggesting that the models were able to generalize well.
It is worth noting that hyperparameter tuning and cross-validation can be com-
putationally intensive and time-consuming processes, as they involve training and
evaluating multiple models with different parameter settings. Therefore, the choice of a guided search such as BayesSearchCV can substantially reduce this cost.
The numbers of true positives, true negatives, false positives, and false negatives are summarised in the confusion matrix in Figure 1.
Figure 2 shows the ROC curves: RF and XGBoost trace nearly perfect ROC curves, indicating that these models perform very well.
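The sketch below shows how such ROC curves and AUC scores can be produced with scikit-learn, assuming the split from above; the hyperparameters are left at defaults for illustration.

```python
# Sketch of producing ROC curves and AUC scores for RF and XGBoost,
# assuming X_train/X_test from the split above; defaults for illustration.
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import RocCurveDisplay, roc_auc_score
from xgboost import XGBClassifier

rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
xgb = XGBClassifier(random_state=0).fit(X_train, y_train)

ax = plt.gca()
for name, clf in [("RandomForest", rf), ("XGBoost", xgb)]:
    RocCurveDisplay.from_estimator(clf, X_test, y_test, name=name, ax=ax)
    print(name, "AUC =", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
plt.show()
```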
The DET (detection error tradeoff) curves for our models are shown in Figure 3. In our specific scenario, we prefer fewer false negatives even at the cost of more false positives, because our chief concern is misclassifying a malicious sample as benign. With this priority in mind, the decision threshold can be used to evaluate and optimize the model’s performance. Upon examining the curves, we observe a relatively uniform distribution, indicating the absence of any significant bias in the model’s predictions. This suggests that the model is not favoring any specific class and is making balanced predictions across both benign and malicious samples.
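This threshold analysis can be reproduced with scikit-learn's det_curve, as in the sketch below; y_scores is assumed to hold a fitted model's malicious-class probabilities.

```python
# Sketch of the threshold analysis behind the DET curves; y_scores is
# assumed to hold a fitted model's malicious-class probabilities.
import numpy as np
from sklearn.metrics import det_curve

y_scores = rf.predict_proba(X_test)[:, 1]
fpr, fnr, thresholds = det_curve(y_test, y_scores)

ok = fnr <= 0.05                 # cap the false negative rate at 5%
idx = np.argmin(fpr[ok])         # lowest false positive rate under that cap
print("threshold:", thresholds[ok][idx],
      "FNR:", fnr[ok][idx], "FPR:", fpr[ok][idx])
```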
Calibration curves in Figure 4 demonstrate the degree of alignment between the
model’s estimated probabilities and the actual probabilities of different events. A
well-calibrated model should have a curve that closely resembles the diagonal line,
indicating a match between expected and observed probabilities. Deviation from
the diagonal line suggests poor calibration. Despite our models exhibiting high ac-
curacy, precision, and recall, the predicted probabilities do not align well with the
actual probabilities. This discrepancy may be attributed to issues such as overfit-
ting, class bias, or suboptimal decision criteria [62]. Notably, the RandomForest
model appears to exhibit closer alignment with the well-calibrated line, implying a
higher level of reliability for this model.
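Such calibration curves can be generated as in the sketch below, where y_prob is assumed to be a model's predicted malicious-class probability on the test set.

```python
# Sketch of a reliability (calibration) curve; y_prob is assumed to be the
# model's predicted malicious-class probability on the test set.
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

y_prob = rf.predict_proba(X_test)[:, 1]
frac_pos, mean_pred = calibration_curve(y_test, y_prob, n_bins=10)

plt.plot(mean_pred, frac_pos, marker="o", label="model")
plt.plot([0, 1], [0, 1], "--", label="perfectly calibrated")  # diagonal reference
plt.xlabel("mean predicted probability")
plt.ylabel("observed fraction of positives")
plt.legend()
plt.show()
```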
positive rates and low false positive rates. Compared with Figure 5, which shows the performance of the models using the Bag of Words (BoW) technique, there is a noticeable difference: the BoW-based models perform better than the models in the current evaluation.
The DET curve in Figure 9 illustrates that even with adjustments to the threshold
chosen by a particular model, the outcomes remain unfavorable. This is evident
when examining the RF graph, where reducing the false negative rate to 5% results
in a false positive rate of approximately 35%. In both cases, the results fall below
the desired level of performance.
Table 7 Performance analysis of Machine Learning Algorithms using Word2Vec as Feature model

Conventional two-class classifiers (Supervised)
Classifier             | Accuracy | Precision | Recall | F1-score
KNeighborsClassifier   | 0.8486   | 0.8474    | 0.8486 | 0.8478
LogisticRegression     | 0.7823   | 0.7765    | 0.7823 | 0.7763
DecisionTreeClassifier | 0.8347   | 0.8341    | 0.8347 | 0.8344
RandomForestClassifier | 0.8555   | 0.8548    | 0.8555 | 0.8551
SVM                    | 0.8132   | 0.8108    | 0.8132 | 0.8116
XGBClassifier          | 0.8548   | 0.8540    | 0.8549 | 0.8544
MLPClassifier          | 0.8013   | 0.8231    | 0.8013 | 0.8058

One-Class Classification (Unsupervised)
OneClassSVM            | 0.6381   | 0.5925    | 0.6381 | 0.6043

PU Learning (Semi-supervised)
PUBagging              | 0.8543   | 0.9120    | 0.8646 | -
Table 8 Performance analysis of Machine Learning Algorithms using GloVe as Feature model

Conventional two-class classifiers (Supervised)
Classifier             | Accuracy | Precision | Recall | F1-score
KNeighborsClassifier   | 0.8454   | 0.8440    | 0.8454 | 0.8445
LogisticRegression     | 0.7905   | 0.7853    | 0.7905 | 0.7842
DecisionTreeClassifier | 0.8347   | 0.8349    | 0.8347 | 0.8348
RandomForestClassifier | 0.8524   | 0.8517    | 0.8524 | 0.8520
SVM                    | 0.8088   | 0.8052    | 0.8088 | 0.8057
XGBClassifier          | 0.8543   | 0.8539    | 0.8543 | 0.8541
MLPClassifier          | 0.8145   | 0.8108    | 0.8145 | 0.8108

One-Class Classification (Unsupervised)
OneClassSVM            | 0.6366   | 0.5970    | 0.6366 | 0.6083

PU Learning (Semi-supervised)
PUBagging              | 0.8454   | 0.9100    | 0.8522 | -
4.5 Performance Evaluation of State-of-the-Art Neural and Deep Learning Classifiers
Modern text classification techniques heavily rely on neural networks as they have
shown great effectiveness in handling complex language patterns and capturing se-
mantic information. One popular neural network library used for text classification
is FastText, developed by Facebook. FastText is designed for both supervised text
categorization and unsupervised word embeddings. It introduces a lightweight and
efficient approach to training text classifiers by using a hierarchical structure, reduc-
ing the training and testing time complexity from linear to logarithmic with respect
to the number of classes. FastText also leverages the Huffman algorithm to optimize
computational efficiency, particularly when dealing with imbalanced classes, such
as in our dataset, where some classes occur more frequently than others [65]
[56]. Deep learning architectures have emerged as powerful tools in various domains,
surpassing the performance of traditional shallow machine learning methods. This
holds true for natural language processing (NLP), image classification, and other
tasks. Unlike shallow machine learning, deep learning models can automatically
learn meaningful features directly from high-dimensional raw data. As a result, we
considered several state-of-the-art deep learning models that have been successfully
employed by researchers for text classification and spam detection. These models
include recurrent neural networks (RNNs), convolutional neural networks (CNNs),
transformer models, and more. The choice of architecture depends on the specific
learning objectives and the nature of the data being analyzed [66] [55] [59] [67].
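For reference, a minimal supervised FastText run looks like the sketch below; the file names and hyperparameters are illustrative, and the input files must follow FastText's one-line-per-sample __label__ format.

```python
# Minimal supervised FastText sketch; file names and hyperparameters are
# illustrative. Each input line must look like:
#   __label__malicious connects remote host creates registry key ...
import fasttext

model = fasttext.train_supervised(input="train.txt",
                                  epoch=25, lr=0.5,
                                  loss="hs")   # hierarchical softmax, as in the text
n, p_at_1, r_at_1 = model.test("test.txt")     # (sample count, precision@1, recall@1)
print(n, p_at_1, r_at_1)
```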
Table 9 Performance Evaluation of State-of-the-Art Neural and Deep Learning Classifiers

Classifiers        | Accuracy | Precision | Recall | F1-score
FastText           | 0.821    | 0.8200    | 0.8209 | 0.8204
BERT               | 0.8544   | 0.8912    | 0.8904 | 0.8908
DistilBERT         | 0.8550   | 0.8893    | 0.8927 | 0.8910
RoBERTa            | 0.8544   | 0.8896    | 0.8921 | 0.8909
LSTM               | 0.758    | 0.75      | 0.76   | 0.75
BiLSTM             | 0.715    | 0.73      | 0.72   | 0.72
CNN (Random)       | 0.76     | 0.76      | 0.77   | 0.76
CNN (GloVe)        | 0.722    | 0.77      | 0.72   | 0.73
ENSEMBLE (Random)  | 0.824    | 0.82      | 0.82   | 0.82
ENSEMBLE (Static)  | 0.786    | 0.78      | 0.79   | 0.78
ENSEMBLE (Dynamic) | 0.782    | 0.78      | 0.78   | 0.77
tures for that particular model. It is important to keep in mind that the relevance
of features may vary across different models.
Additionally, it is essential to consider the presence of correlations among features.
In cases where features are highly correlated, the importance of individual features
may be underestimated. Even if one of the correlated features is removed or altered,
the other feature may still drive the model to make similar predictions as before.
To address this, a clustering technique could have been employed using Spearman
rank-order correlations. By establishing a threshold and selecting a representative
feature from each cluster, the impact of correlated features could have been better
managed.
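A sketch of this clustering idea, modeled on the common SciPy/scikit-learn recipe, is shown below; the feature matrix X and the distance threshold are assumptions.

```python
# Sketch of clustering correlated features via Spearman rank correlations;
# X is assumed to be the feature matrix, and the threshold is illustrative.
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

corr = spearmanr(X).correlation              # pairwise Spearman correlations
corr = np.nan_to_num((corr + corr.T) / 2)    # symmetrize; guard constant columns
np.fill_diagonal(corr, 1.0)

dist = squareform(1 - np.abs(corr), checks=False)  # correlation -> distance
linkage = hierarchy.ward(dist)
clusters = hierarchy.fcluster(linkage, t=1.0, criterion="distance")

# keep one representative feature per cluster
keep = [np.flatnonzero(clusters == c)[0] for c in np.unique(clusters)]
```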
Furthermore, eliminating features that were estimated to have zero relevance may
not always be the best approach. It is possible that the zero relevance estimation
could be an indication of a correlation between those features. Taking this into
consideration, it would have been prudent to investigate and address correlations
among features before discarding them based on relevance estimation alone.
We also conducted an evaluation of the feature importance for the two highest
performing algorithms. During this analysis, we discovered that some of the features
on the list may not seem logical or intuitively important. It would be interesting to
investigate further and understand why these features are considered important by
the algorithms, as there could be instances where they are mistakenly identified as
such.
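Such an inspection can be performed directly from the fitted models' impurity-based importances, as in the sketch below; rf, xgb, and the vectorizer are assumed to come from the earlier steps.

```python
# Sketch of inspecting impurity-based feature importances for the two best
# models; rf, xgb, and the vectorizer are assumed from earlier steps.
import numpy as np

feature_names = vectorizer.get_feature_names_out()
for name, clf in [("RandomForest", rf), ("XGBoost", xgb)]:
    top = np.argsort(clf.feature_importances_)[::-1][:10]
    print(name, [(feature_names[i], round(float(clf.feature_importances_[i]), 4))
                 for i in top])
```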
6 Conclusion
In conclusion, our investigation into botnet malware detection using machine learning opens promising possibilities for enhanced cybersecurity. We examined feature extraction from sandbox-generated reports and devised two approaches, both of which performed well.
By combining a Bag of Words representation with careful feature selection, we constructed a robust dataset that captures the essential traits of malicious behavior. Trained on this data, our machine learning algorithms learned patterns that allow them to predict with high accuracy whether a file harbors botnet malware. The XGBoost classifier performed best, with an accuracy of 99.17% and a ROC/AUC score of 0.9995, closely followed by the RandomForest classifier with an accuracy of 98.38% and a ROC/AUC score of 0.9985. These results demonstrate the potential of machine learning in the realm of cybersecurity. No endeavor comes without challenges and room for improvement; we carefully examined the limitations and drawbacks of our approach, paving the way for future enhancements. Sandbox hardening and extending the scope to other forms of malware are promising avenues for future work.
In a world where cyber threats loom large, pushing the boundaries of feature extraction and harnessing machine learning is a significant step toward securing the digital landscape.
Competing interests
The authors declare that they have no competing interests.
Author details
1 Department of Electrical Engineering, NUST, Islamabad, Pakistan. 2 Institute of Computer Science, Denmark Technical University, DTU, Kiel, Germany.
References
1. Baraz, A., Montasari, R.: Law enforcement and the policing of cyberspace. In: Digital Transformation in Policing: The Promise, Perils and Solutions, pp. 59–83. Springer (2023)
2. Rane, S., Devi, G., Wagh, S.: Cyber threats: Fears for industry. In: Cyber Security Threats and Challenges Facing Human Life, pp. 43–54. Chapman and Hall/CRC (2023)
3. Kaur, J., Ramkumar, K.: The recent trends in cyber security: A review. Journal of King Saud
University-Computer and Information Sciences 34(8), 5766–5781 (2022)
4. Thanh Vu, S.N., Stege, M., El-Habr, P.I., Bang, J., Dragoni, N.: A survey on botnets: Incentives, evolution,
detection and current trends. Future Internet 13(8), 198 (2021)
5. Negash, N., Che, X.: An overview of modern botnets. Information Security Journal: A Global Perspective
24(4-6), 127–132 (2015)
6. Falliere, N., Chien, E.: Zeus: King of the bots. Symantec Security Response (http://bit.ly/3VyFV1) (2009)
7. Prasad, R., Rohokale, V.: BOTNET, pp. 43–65. Springer, Cham (2020). doi:10.1007/978-3-030-31703-4_4
8. Antonakakis, M., April, T., Bailey, M., Bernhard, M., Bursztein, E., Cochran, J., Durumeric, Z., Halderman,
J.A., Invernizzi, L., Kallitsis, M., et al.: Understanding the mirai botnet. In: 26th USENIX Security Symposium
(USENIX Security 17), pp. 1093–1110 (2017)
9. Gaonkar, S., Dessai, N.F., Costa, J., Borkar, A., Aswale, S., Shetgaonkar, P.: A survey on botnet detection
techniques. In: 2020 International Conference on Emerging Trends in Information Technology and Engineering
(ic-ETITE), pp. 1–6 (2020). IEEE
10. Khattak, S., Ramay, N.R., Khan, K.R., Syed, A.A., Khayam, S.A.: A taxonomy of botnet behavior, detection,
and defense. IEEE communications surveys & tutorials 16(2), 898–924 (2013)
11. Owen, H., Zarrin, J., Pour, S.M.: A survey on botnets, issues, threats, methods, detection and prevention.
Journal of Cybersecurity and Privacy 2(1), 74–88 (2022)
12. Anderson, H.S., Roth, P.: Ember: an open dataset for training static pe malware machine learning models.
arXiv preprint arXiv:1804.04637 (2018)
13. Garcia, S., Grill, M., Stiborek, J., Zunino, A.: An empirical comparison of botnet detection methods. computers
& security 45, 100–123 (2014)
14. Ugarte-Pedrero, X., Graziano, M., Balzarotti, D.: A close look at a daily dataset of malware samples. ACM
Transactions on Privacy and Security (TOPS) 22(1), 1–30 (2019)
15. Wang, T.-S., Lin, H.-T., Cheng, W.-T., Chen, C.-Y.: Dbod: Clustering and detecting dga-based botnets using
dns traffic analysis. Computers & Security 64, 1–15 (2017)
16. Hwang, R.-H., Peng, M.-C., Nguyen, V.-L., Chang, Y.-L.: An lstm-based deep learning approach for classifying
malicious traffic at the packet level. Applied Sciences 9(16), 3414 (2019)
17. Cui, J., Long, J., Min, E., Mao, Y.: Wedl-nids: improving network intrusion detection using word
embedding-based deep learning method. In: International Conference on Modeling Decisions for Artificial
Intelligence, pp. 283–295 (2018). Springer
18. Oak, R., Du, M., Yan, D., Takawale, H., Amit, I.: Malware detection on highly imbalanced data through
sequence modeling. In: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, pp.
37–48 (2019)
19. Alauthaman, M., Aslam, N., Zhang, L., Alasem, R., Hossain, M.A.: A p2p botnet detection scheme based on
decision tree and adaptive multilayer neural networks. Neural Computing and Applications 29(11), 991–1004
(2018). doi:10.1007/s00521-016-2564-5
20. Faculty of Engineering. https://www.uvic.ca/ecs/ece/isot/assets/docs/isot-datase.pdf
21. Shiravi, A., Shiravi, H., Tavallaee, M., Ghorbani, A.A.: Toward developing a systematic approach to generate
benchmark datasets for intrusion detection. Computers and Security 31(3), 357–374 (2012).
doi:10.1016/j.cose.2011.12.012
22. Feizollah, A., Anuar, N.B., Salleh, R., Amalina, F., Shamshirband, S., et al.: A study of machine learning
classifiers for anomaly-based mobile botnet detection. Malaysian Journal of Computer Science 26(4), 251–265
(2013)
23. Zago, M., Pérez, M.G., Pérez, G.M.: Umudga: A dataset for profiling dga-based botnet. Computers & Security
92, 101719 (2020)
24. Wojnowicz, M., Chisholm, G., Wolff, M., Zhao, X.: Wavelet decomposition of software entropy reveals
symptoms of malicious code. Journal of Innovation in Digital Ecosystems 3(2), 130–140 (2016)
25. Shiravi, A., Shiravi, H., Tavallaee, M., Ghorbani, A.A.: Toward developing a systematic approach to generate
benchmark datasets for intrusion detection. computers & security 31(3), 357–374 (2012)
26. Alauthaman, M., Aslam, N., Zhang, L., Alasem, R., Hossain, M.A.: A p2p botnet detection scheme based on
decision tree and adaptive multilayer neural networks. Neural Computing and Applications 29(11), 991–1004
(2018)
27. Botnet and Ransomware Detection Datasets - University of Victoria. https://www.uvic.ca/ecs/ece/isot/datasets/botnet-ransomware/index.php/. Accessed: 2023-01-23
28. Zhou, Y., Jiang, X.: Dissecting android malware: Characterization and evolution. In: 2012 IEEE Symposium on
Security and Privacy, pp. 95–109 (2012). IEEE
29. Cylance. https://github.com/orgs/cylance/repositories?type=all. Accessed: 2023-01-24
30. Online malware analysis platform. http://www.virustotal.com/. Accessed: 2022-11-29
31. Fighting malware and botnets. https://abuse.ch/. Accessed: 2022-11-29
32. Interactive Online Malware Sandbox. https://any.run/. Accessed: 2022-11-29
33. Sebastián, M., Rivera, R., Kotzias, P., Caballero, J.: Avclass: A tool for massive malware labeling. In: Research
in Attacks, Intrusions, and Defenses: 19th International Symposium, RAID 2016, Paris, France, September
19-21, 2016, Proceedings 19, pp. 230–253 (2016). Springer
34. Automated Malware Analysis. https://cuckoosandbox.org/. Accessed: 2022-11-29
35. JSON - JSON encoder and decoder. https://docs.python.org/3/library/json.html
36. Soviany, S., Soviany, C.: Feature engineering. Principles of Data Science, 79–103 (2020). doi:10.1007/978-3-030-43981-1_5
37. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv
preprint arXiv:1301.3781 (2013)
38. Mikolov, T., Le, Q.V., Sutskever, I.: Exploiting similarities among languages for machine translation. arXiv
preprint arXiv:1309.4168 (2013)
39. Řehůřek, R., Sojka, P.: Gensim—statistical semantics in Python. Retrieved from gensim.org (2011)
40. Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In: Proceedings of
the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
41. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information.
Transactions of the association for computational linguistics 5, 135–146 (2017)
42. Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018)
43. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for
language understanding. arXiv preprint arXiv:1810.04805 (2018)
44. Papers with code - bert: Pre-training of deep bidirectional Transformers for language understanding.
https://paperswithcode.com/paper/bert-pre-training-of-deep-bidirectional
45. PyTorch Team: PyTorch: An open source machine learning framework. Facebook Inc. (2021)
46. NVIDIA Corporation: CUDA Toolkit 11.2. https://developer.nvidia.com/cuda-11.2.0-download-archive (2021)
47. NVIDIA Corporation: cuDNN. https://developer.nvidia.com/cudnn (2021)
48. Wolf, T., Debut, L., Sanh, V., Chaplot, D.S., Delangue, J., Moi, A., Cistac, P., Louf, R., Funtowicz, M.,
Gardner, M., et al.: huggingface/transformers. https://github.com/huggingface/transformers
49. BayesSearchCV. https://scikit-optimize.github.io/stable/modules/generated/skopt.BayesSearchCV.html
50. Feldman, R., Sanger, J.: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data.
Cambridge university press, ??? (2007)
51. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
52. Manevitz, L.M., Yousef, M.: One-class svms for document classification. Journal of machine Learning research
2(Dec), 139–154 (2001)
53. Hu, W., Le, R., Liu, B., Ji, F., Chen, H., Zhao, D., Ma, J., Yan, R.: Learning from positive and unlabeled data
with adversarial training (2020)
54. DeBarr, D., Wechsler, H.: Spam detection using clustering, random forests, and active learning. In: Sixth
Conference on Email and Anti-Spam. Mountain View, California, pp. 1–6 (2009). Citeseer
55. Mandic, D., Chambers, J.: Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and
Stability. Wiley, ??? (2001)
56. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. arXiv preprint
arXiv:1607.01759 (2016)
57. Sanh, V., Debut, L., Chaumond, J., Wolf, T.: Distilbert, a distilled version of bert: smaller, faster, cheaper and
lighter. arXiv preprint arXiv:1910.01108 (2019)
58. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.:
Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
59. Lai, S., Xu, L., Liu, K., Zhao, J.: Recurrent convolutional neural networks for text classification. In: Proceedings
of the AAAI Conference on Artificial Intelligence, vol. 29 (2015)
60. Bai, S., Kolter, J.Z., Koltun, V.: An empirical evaluation of generic convolutional and recurrent networks for
sequence modeling. arXiv preprint arXiv:1803.01271 (2018)
61. Chen, Y.: Convolutional neural network for sentence classification. Master’s thesis, University of Waterloo
(2015)
62. Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: 34th International Conference on Machine Learning (ICML 2017), vol. 3, pp. 2130–2143 (2017)
63. Goldberg, Y., Levy, O.: word2vec explained: deriving mikolov et al.’s negative-sampling word-embedding
method. arXiv preprint arXiv:1402.3722 (2014)
64. Ma, L., Zhang, Y.: Using word2vec to process big text data. In: 2015 IEEE International Conference on Big
Data (Big Data), pp. 2895–2897 (2015). IEEE
65. Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., Brown, D.: Text classification
algorithms: A survey. Information 10(4), 150 (2019)
66. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
67. Tay, Y., Dehghani, M., Bahri, D., Metzler, D.: Efficient transformers: A survey. ACM Computing Surveys
55(6), 1–28 (2022)
68. Martin, D.: Hardening cuckoo sandbox against VM aware malware.
https://cybersecurity.att.com/blogs/labs-research/hardening-cuckoo-sandbox-against-vm-aware-malware