Identification and Analysis of Ransomware Transactions in The Bitcoin Network
Int. J. Advance Soft Compu. Appl, Vol. 16, No. 2, July 2024
Print ISSN: 2710-1274, Online ISSN: 2074-8523
Copyright © Al-Zaytoonah University of Jordan (ZUJ)
Abstract
1 Introduction
The term cryptocurrency has now become a household name. Bitcoin (2021), the forerunner of
all cryptocurrencies, is a peer-to-peer digital currency that can be transmitted over the internet to
carry out the kinds of financial transactions that would otherwise use hard cash. This enables two parties
to transact funds directly without the need for a third party (Nakamoto 2008). Such transactions
using digital currencies, or cryptocurrencies, are carried out in a decentralized manner: no
regulating authority maintains or manages either the funds or the corresponding transactions,
which are instead recorded in a public ledger called the Blockchain.
The Blockchain is maintained in a trustless environment and stores all transactions in chronological
order, marked by timestamps (Nakamoto 2008). Any participant in the network can verify the
transactions that take place in the network via cryptographic Proof of Work (PoW) (Wu et al.
2008). The idea of the Blockchain was first implemented in 2009. Transaction details are stored
in blocks, and a chain of blocks makes up the Blockchain. New blocks can be appended to the
chain of existing blocks, whereas deleting or modifying information in an existing block is
practically infeasible because each block stores the hash of its predecessor, as depicted in Figure 1.
A Blockchain network also offers several features, such as decentralized nodes that store and manage
the transaction data, information persistence on a public ledger, participant anonymity, and public auditability.
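The role of linked hashes can be illustrated with a minimal sketch. It is not Bitcoin's actual block format: the block fields, the JSON encoding and the helper names (`block_hash`, `build_chain`, `is_valid`) are assumptions chosen for illustration, and Proof of Work is omitted.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(transaction_batches):
    """Link blocks by storing the previous block's hash in each new block."""
    chain = []
    prev = "0" * 64  # placeholder hash for the first block
    for index, txs in enumerate(transaction_batches):
        block = {"index": index, "transactions": txs, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain


def is_valid(chain) -> bool:
    """A chain is consistent if every prev_hash matches the recomputed hash."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = build_chain([["a->b: 1"], ["b->c: 0.5"], ["c->d: 0.2"]])
print(is_valid(chain))                     # True
chain[0]["transactions"] = ["a->b: 100"]   # tamper with an early block
print(is_valid(chain))                     # False
```

Changing an early block alters its hash, so the `prev_hash` stored in the next block no longer matches. In a real network an attacker would also have to redo the Proof of Work for every later block, which this sketch does not model.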
Anonymity of the participants is well maintained in the Bitcoin network, which
makes it extremely difficult to trace the actual identity of the sender or the receiver (Reid and
Harrigan 2013). A major advantage of the Blockchain ecosystem is the use of digital addresses
to provide anonymity to the participants in the network. Each participant is identified by a
unique Bitcoin address, which is generated by a one-way function from a
pair of keys, public and private (Herrera-Joancomartí 2015). The private key is visible only to
the user and is used to authorize transactions that spend the Bitcoins
held by the user. Elliptic curve multiplication is applied to the private key to generate
the public key. Applying a double hash function to the public key yields the public key hash.
This hash is then encoded (the step labelled "Encryption" in Figure 2) to generate the publicly
visible address to which other users can send, or from which they can receive, Bitcoins.
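The last step of this pipeline, turning a public key hash into a printable address, can be sketched as follows. The sketch assumes the common Base58Check encoding with version byte 0x00; the helper names are hypothetical. The elliptic curve step is omitted, and the 20-byte input is a stand-in, not a real public key hash (Bitcoin derives that with SHA-256 followed by RIPEMD-160).

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice; its first 4 bytes serve as the checksum."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def base58_encode(data: bytes) -> str:
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))  # leading zero bytes map to '1'
    return "1" * pad + out


def base58_decode(text: str) -> bytes:
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    return b"\x00" * pad + body


def address_from_pubkey_hash(pubkey_hash: bytes, version: bytes = b"\x00") -> str:
    """Base58Check: version byte + hash + 4-byte double-SHA-256 checksum."""
    payload = version + pubkey_hash
    return base58_encode(payload + double_sha256(payload)[:4])


def checksum_ok(address: str) -> bool:
    raw = base58_decode(address)
    payload, checksum = raw[:-4], raw[-4:]
    return double_sha256(payload)[:4] == checksum


# Stand-in 20-byte value (assumption): NOT a real public key hash.
fake_hash = hashlib.sha256(b"example public key").digest()[:20]
address = address_from_pubkey_hash(fake_hash)
print(address, checksum_ok(address))
```

The embedded checksum lets a wallet reject a mistyped address before any funds are sent. It does not reveal anything about the owner of the address.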
Figure 2 gives a diagrammatic representation of the address generation process. The users’
personal data remains anonymous in the Blockchain ecosystem. Although various tools can
de-anonymize a participant to some extent, there are few means of tracing and
recognizing the identity of anyone participating in a Bitcoin transaction. While this has some
advantages, the major drawback is that hackers and attackers also exploit the network.
Moreover, because there is no cap on the number of addresses a user can create,
a single user can control many addresses and transactions and thereby engage
in illegal activities on the network. It is worth mentioning, however, that tracing multiple
activities to one user remains possible, since all the transactions taking place in the network are
publicly available (Reid and Harrigan 2013).
[Figure 2: Bitcoin address generation. Private key → elliptic curve multiplication → public key → double hash functions → public key hash → encryption → public address.]
G. Somasundaram et al.. 50
One of the biggest challenges faced by governments and security firms across the globe is the
use of cryptocurrency to pay ransomware attackers in order to recover encrypted information.
Ransomware is a form of malicious software, or malware, designed to take control of the
victim’s machine, encrypt information, and hold it for ransom (McAfee 2021). The attacker
demands a ransom for restoring the computer to its previous usable state. Ransomware spreads over
the internet to infect other systems. Once it has gained access to a system, its binary is
executed to encrypt the sensitive files. Since it uses asymmetric encryption,
a public-private key pair is generated. Each victim requires the unique private key held
by the attacker to reclaim their files. The key is shared only after the victim pays the
demanded ransom via the specified online payment mechanism.
Currently, there are three types of ransomware (Nieuwenhuizen 2017):
1. Locker ransomware – Works by blocking access to the computer.
2. Crypto ransomware – Works by making the victim’s data unusable by means of
encryption algorithms.
3. Locker/Crypto ransomware – A combination that blocks the user from using their
computer while all their data is being encrypted by the malware.
Crypto ransomware is more prevalent because it uses strong encryption algorithms that
make it nearly impossible to decrypt and retrieve the data without the key. This
malware has flourished due to the availability of cryptocurrencies such as Bitcoin and
Ethereum (2021). These greatly benefit the attackers, since the ransom
money can be moved quickly anywhere across the globe with little to no ability to track it. Once these
attackers are identified in the network, their addresses can be reported on public portals such as the Bitcoin
Abuse Database (2021) and Scam Alert (2021), where addresses used by hackers and criminals
are made available to the public so that monetary transfers to these accounts can be checked and prevented.
Users can also report cases to local law enforcement and government agencies such
as the FBI. In a few cases, government agencies have shown that they were able to seize the
Bitcoins paid to these attackers, as in the Colonial Pipeline ransomware attack (Office of
Public Affairs 2021).
Detecting these ransomware payments in the Bitcoin network is a challenge, and one tool that
can aid in this process is Machine Learning (ML). Machine Learning is a subset of
Artificial Intelligence (AI) that involves building a mathematical model to learn the
diverse nature of sample data, usually referred to as training data, which is a portion of the
entire dataset (Zhang 2020). The model so developed captures the
relationships between the data attributes and uses that knowledge to make predictions or decisions
on its own (Zhang 2020). During the training phase, the algorithms adapt
themselves based on experience or repetition so that their results become more accurate (El Naqa and
Murphy 2015).
To establish an effective way to detect ransomware payments made to fraudsters’ Bitcoin
addresses, and thereby block those addresses from receiving future payments, classification
algorithms can be applied to identify and classify ransomware payments in the Bitcoin network.
This will help mitigate financial losses from ransomware attacks. The central goal of this work is to
discover and classify payments on the Bitcoin network based on transaction patterns.
To this end, a publicly available dataset from UCI, which tags Bitcoin transactions as normal
or ransomware payments, was downloaded. The dataset was subjected to various classification
models to assess their performance and find the model that best classifies the data. Three ML
algorithms were chosen, namely Random Forest, XGBoost and the Balanced Bagging classifier. These
models were selected for this study due to their performance in our pilot comparative studies.
Though the three algorithms are variations of the decision tree algorithm, their diverse
functionality provides a range of results, which are compared in this work. This paper also gives
detailed information on how each of these algorithms works and enables easy comparison based
on the pseudocode.
The rest of the paper is organized as follows: Section 2 reviews
related works on the identification and classification of ransomware and other attacks on
digital financial transaction platforms. Section 3 deals with the methodology adopted in this
research work. The dataset is described in Section 4, and the results and evaluation are discussed in Section
5. Lastly, the conclusion and future work are provided in Section 6.
2 Literature Review
Several researchers have proposed ways to identify ransomware attacks and alleviate
the impact they cause. A few related works are discussed here, highlighting the problem statement, the
mechanism used to solve the problem and the limitations of these works. Table 1 shows a consolidated
description of the reviewed papers.
Yin and Vatrapu (2017) tried to estimate the proportion of cybercrime activities that take place
in the Bitcoin ecosystem. The Bitcoin transaction data obtained was subjected to various
preprocessing techniques. Selected features were supplied to the following thirteen models:
Linear Regression, Linear Discriminant Analysis, k-Nearest Neighbor Classifier (KNN),
Classification And Regression Trees (CART), Support Vector Machine Classifier (SVM),
Multi-Layer Perceptron Classifier (MLP), Naïve Bayes Classifier (NB), Random Forest Classifier
(RFC), Extremely Randomized Forests Classifier (ETC), Bagging Classifier (BGC), Gradient
Boosting Classifier (GBC), AdaBoost Classifier and Stochastic Gradient Descent Classifier (SGD)
(Yin and Vatrapu 2017). Among these thirteen classifiers, the four best performing
classifiers and their Cross Validation (CV) accuracies were reported to be RFC with 77.38%, ETC
with 76.47%, BGC with 78.46%, and GBC with 80.76%. Finally, the proportion of
cybercrime-related transactions among the total transactions in the Bitcoin environment was predicted as
29.81% by BGC and as 10.95% by GBC. With respect to ransomware, BGC classified
19.15% of the transactions in the network as ransomware payments, while GBC classified
5.28% as such.
Al-rimy et al. (2019) proposed an ensemble model, made up of three modules, to detect
crypto ransomware attacks: an iBagging module, an Enhanced Semi-Random Subspace
module and base classifiers. iBagging (Al-rimy et al. 2019) is an incremental bagging approach that
builds data subsets from the existing dataset. This helps build up the data that can be supplied to
the classifier even when a previously unknown crypto ransomware attack takes place. Joint
Mutual Information was used to rank and thereby select the primary features required to make
predictions (Alshemmari, 2024). Enhanced Semi-Random Subspace (ESRS) selection (Al-rimy et al.
2019) extracts informative features from the selected primary
features to enable the model to make accurate predictions. ESRS ensures that the diversity of the
data is maintained in the selected subspace. Its output is fed to the ensemble classifier,
which is made up of an SVM Classifier, Logistic Regression, RFC, a Decision Tree Classifier,
an AdaBoost Classifier, and an MLP Classifier. The authors used Grid Search to select the best
combination of classifiers. They also implemented a majority voting scheme to make the final
decision. The overall average accuracy of this model was 96.8%. The only drawback of this model
is the repetition of features across different subspaces, which could have a considerable impact on the
detection accuracy.
Yazdinejad et al. (2020) proposed a model that utilizes Long Short-Term Memory (LSTM),
a form of Recurrent Neural Network (RNN). This deep learning model
used opcodes of various cryptocurrency applications that can be executed on Windows systems to
identify and classify ransomware payments. The acquired opcodes were filtered using tokenization
and converted to corresponding numeric values using embedding. The reduced data (448 hidden
units and 512 unique opcodes) were fed to the LSTM, with the Adam optimizer used to update the weights
of the neural network layers. Finally, 10-fold CV was applied to evaluate the model. Among
the various configurations of the LSTM model, the best accuracy achieved was 98.25%. The
result was also compared with traditional ML models such as Random Forest, SVM, NB, MLP,
KNN, AdaBoost and Decision Tree, and was found to be higher.
Dalal et al. (2021) proposed a model to identify miscreants involved in ransomware and
gambling in the Bitcoin ecosystem. The authors devised a transaction graph modeling the
address-to-address data. This is then converted to an entity graph, which models actor-to-actor transactions
by identifying all the addresses belonging to a particular actor through clustering of local data. Supervised
ML algorithms are then applied. The generated graph is broken into six sub-graphs and each
sub-graph is fitted to six different classifiers. The results are then combined using
Stacking, an ensemble technique that yields a stacking probability. In the final stage, a
stacking-bagging model, called the meta classifier, is created. This model uses Linear Regression, and the
output from stacking is fed into the meta classifier to obtain the final prediction. CV is used to
determine the accuracy of this model, which was reported to be around 96% and 99%.
Agarwal et al. (2021) proposed measures to identify
and classify malicious accounts in permissionless Blockchain networks. Their study focused on
the Ethereum main net Blockchain (Ethereum 2021), from which the authors collected
transaction data pertaining to gambling. They performed a time-series analysis on various features
to identify the graph-based temporal features that describe the behavior of malicious agents.
Primitive features such as in-degree, out-degree, balance and neighbors are used in this process. As a
result, they identified 28 features among the total of 400. Random under-sampling was used to
balance the highly imbalanced data. The data was then divided into multiple sub-datasets, which were
fed to TPOT, an AutoML tool that was supplied with all the supervised ML algorithms. The tool
selected as the best algorithm the one that gave the best balanced accuracy. From the experiment,
they found that the Extra Trees Classifier performed best, with 88.7% balanced accuracy.
The classifier was validated on unseen data, for which it provided accuracy as low as 50%. This
was attributed to the evolving characteristics of the malicious accounts. The sub-datasets
were also tested on a few unsupervised algorithms, and k-means outperformed the others in correctly
clustering malicious accounts. Moreover, it was able to cluster unseen data better than the
supervised learning algorithm. Finally, the authors were also able to model the behavior changes
of Ethereum accounts to identify malicious and benign actors.
Al-Haija and Alsulami (2021) presented a Bitcoin transaction predictive system that utilizes
a Shallow Neural Network (SNN) and an Optimizable Decision Tree (ODT). To detect verified
and anomalous transactions in a binary classification task, an ensemble classifier for the ODT
model or a Sigmoid classifier for the SNN model is used. For multi-class classification,
an ensemble ODT (Al-Haija and Alsulami 2021) and a Softmax classifier for the SNN
(Al-Haija and Alsulami 2021) model are used. The accuracy of the binary classifier was 99.9% and that
of the multiclass classifier was 99.4%.
Another notable work is the Host-based Intrusion Detection System (HIDS) built using
Modified Vector Space Representation (MVSR) N-gram and Multilayer Perceptron (MLP) model
for securing the Internet of Things (IoT), based on lightweight techniques and using Fog
Computing devices (Khater et al., 2021). To maintain the lightweight criteria, the feature extraction
stage considers a combination of 1-gram and 2-gram for the system call encoding. In addition, a
Sparse Matrix is used to reduce the space by keeping only the weight of the features that appear in
the trace, thus ignoring the zero weights. Subsequently, Linear Correlation Coefficient (LCC) is
utilized to compensate for any missing N-gram in the test data. In the feature selection stage, the
Mutual Information (MI) method and Principal Component Analysis (PCA) are utilized and then
compared to reduce the number of input features. Following the feature selection stage, the
modeling and performance evaluation of various Machine Learning classifiers are conducted using
a Raspberry Pi IoT device.
A recent work presents an automated behavior-based detection model using Particle Swarm
Optimization (PSO), a wrapper-based feature selection algorithm (Abbasi et al., 2023).
This model is used to efficiently classify ransomware transactions. The proposed method gave
results similar to those of the base work in binary classification. However, it showed improved
performance in multiclass classification problems.
DBSCAN, HDBSCAN, OneClassSVM, K-Means for clustering | 0.356 in clustering
(Al-Haija and Alsulami 2021) | Yes | Yes | Classification | Binary and Multiclass classification | SNN and ODT | 99.9% and 99.4%
(Khater et al., 2021) | No | No | Classification | PCA, LCC | MVSR N-gram and MLP | 96%
(Abbasi et al., 2023) | Yes | No | Classification | Wrapper-based feature selection | PSO | 97.48%
3 Research Methodology
This section discusses the methodology adopted in this work. The ransomware identification
process consists of two stages, namely data acquisition and ML classifier modeling. The overall
flow of work is depicted in Figure 3.
3.2 Dataset
The Bitcoin Heist dataset was downloaded from UCI Machine Learning Repository (2020)
and analyzed. The dataset was constructed using addresses that were mined within a 24-hour
window to better track the movement of the coins in the network. Six important features pertaining
to an address ‘u’ were estimated from the data. Each of these features was carefully chosen to
explain the obscure behavior of ransomware payments. The six features are listed below (Akcora
et al. 2019).
• Length – indicates whether the specified address ‘u’ is the output address of a starter
transaction (length = 0) or not (length ≥ 1). A length of 1 or more indicates how many
non-starter (intermediate) transactions have taken place before the coin ended up in this address.
• Count – indicates the number of starter transactions that are associated with the address
‘u’.
• Loop – indicates how many starter transactions are connected to the specified address
‘u’ by more than one directed path.
• Weight – is the sum of the fractions of coins that originated from some starter
transaction and ended up in ‘u’. This parameter is not concerned with the amount of coins
being transacted.
• Neighbors – indicates the number of transactions that have ‘u’ as their output address.
• Income – denotes the total number of coins ‘u’ has received from various transactions.
Finally, the dataset also includes the dependent feature, label, which denotes what type of
address ‘u’ is, that is, whether the address is used by ransomware attackers or not. The downloaded
dataset has 2,916,697 records with 10 columns. 8 of the 10 columns are numerical while 2 (address
and label) are categorical in nature. The label column indicates whether a record is a white (normal) or a
ransomware transaction.
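A minimal sketch of how such records could be read and reduced to a binary target is given below. The column names (in particular `year`, `day` and `looped`), the ransomware family names and all values in the inline sample are assumptions made for illustration; they are not taken from the paper.

```python
import csv
import io
from collections import Counter

# Tiny synthetic sample in the assumed column layout; values are made up.
SAMPLE_CSV = """address,year,day,length,weight,count,looped,neighbors,income,label
addrA,2016,11,0,1.0,1,0,2,100050000,white
addrB,2016,132,44,0.0002,1,0,1,100000000,princetonCerber
addrC,2017,246,2,0.5,1,0,2,200000000,white
addrD,2016,322,72,0.0039,1,0,2,71200000,montrealCryptoLocker
"""

# The six address features described above (assumed column names).
FEATURES = ["length", "weight", "count", "looped", "neighbors", "income"]


def load_rows(handle):
    """Return (feature_vector, binary_label) pairs; 0 = white, 1 = ransomware."""
    rows = []
    for record in csv.DictReader(handle):
        features = [float(record[name]) for name in FEATURES]
        target = 0 if record["label"] == "white" else 1
        rows.append((features, target))
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(Counter(target for _, target in rows))
```

For the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle. Collapsing every non-white label into class 1 matches the binary setting used later in the paper.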
[Figure 3: Overall flow of work. Box labels: Bitcoin Heist Dataset; Analyze Dataset; Data Preprocessing (Data Transformation, Data Standardization, Training set / Testing set, Data Balancing using SMOTE); Data Mining (Build a Machine Learning Classifier).]
4. Data Analysis
The experiment is performed on the obtained dataset, which includes normal and ransomware
transactions. Three ML classifiers were built using Python’s scikit-learn and imblearn libraries,
namely random forest, extreme gradient boosting and balanced bagging. Binary classification was
performed using all three classifiers to classify each transaction in the dataset as normal or
ransomware, i.e., 0 or 1.
The entire dataset was employed in training and testing the three models. The data was split
in the ratio 70:30 to create training and testing sets. To ensure that the class distribution is
maintained in both the training and testing sets, stratified splitting is used. SMOTE is applied to
the training dataset to enable the classifier to learn from a balanced dataset (shown in Figure 5).
The trained classifier is then applied to the test dataset for classification. The results of the three classifiers’
performance are compared based on the common model evaluation criteria specified in Section 4.1.
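The split-then-balance order described above can be sketched from scratch as follows. This is an illustrative simplification for two classes and does not call scikit-learn or imblearn: the function names, the neighbour count `k=3`, the seeds and the synthetic data are all assumptions.

```python
import math
import random


def stratified_split(X, y, test_ratio=0.3, seed=0):
    """Split per class so that both parts keep the class proportions."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_ratio)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def smote(X, y, minority=1, k=3, seed=0):
    """Add synthetic minority points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_pts)
        others = [p for p in minority_pts if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        X_new.append([b + gap * (t - b) for b, t in zip(base, target)])
        y_new.append(minority)
    return X_new, y_new


# Synthetic, imbalanced two-class data (90 normal, 10 ransomware-like points).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

(X_train, y_train), (X_test, y_test) = stratified_split(X, y)
X_bal, y_bal = smote(X_train, y_train)
print(len(y_test), sum(y_test), y_bal.count(0), y_bal.count(1))
```

Oversampling only the training part keeps synthetic points out of the test set, so the test metrics still reflect the original class imbalance.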
• Sensitivity or Recall – The proportion of actual positive instances that were identified
correctly, i.e., the number of actual positive values that were correctly predicted out of all the
actual positive values.
$$\text{Sensitivity/Recall} = \frac{\text{No. of correct positive predictions}}{\text{Total no. of actual positive values}} = \frac{TP}{TP + FN}$$
• Balanced Accuracy Score – In binary classification, the average of the recall of
both classes; it is used when there is an imbalance in the dataset.
$$\text{Balanced Accuracy Score} = \frac{\text{recall}_{\text{class1}} + \text{recall}_{\text{class2}}}{2}$$
• Geometric Mean Score (G-measure or G-mean) – The geometric mean of
precision and recall.
$$\text{G-mean} = \sqrt{\text{precision} \times \text{recall}}$$
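As a worked illustration of these three definitions, the sketch below computes them from confusion-matrix counts. The counts are hypothetical and are not results from this study; the function names are likewise illustrative.

```python
import math


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN): share of actual positives that were found."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP): share of positive predictions that were right."""
    return tp / (tp + fp)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of the recall of the positive class and of the negative class."""
    return (recall(tp, fn) + recall(tn, fp)) / 2


def g_mean(tp: int, fp: int, fn: int) -> float:
    """Geometric mean of precision and recall, as defined above."""
    return math.sqrt(precision(tp, fp) * recall(tp, fn))


# Hypothetical confusion-matrix counts.
TP, FN, FP, TN = 40, 10, 20, 130

print(f"recall            = {recall(TP, FN):.4f}")
print(f"balanced accuracy = {balanced_accuracy(TP, FN, TN, FP):.4f}")
print(f"G-mean            = {g_mean(TP, FP, FN):.4f}")
```

Note that the recall of the negative class is obtained by swapping roles, i.e. TN / (TN + FP), which is why `recall(tn, fp)` appears inside `balanced_accuracy`.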
5. Results
This section presents the results of the three classifiers and their performance in accurately
identifying and classifying Bitcoin transactions. The results are given in Table 4.
Fig 6. ROC-AUC Curve for Random Forest Classifier
Fig 7. ROC-AUC Curve for XGBoost Classifier
Due to the class imbalance in the dataset, the above-mentioned evaluation metrics
alone are not sufficient to determine the performance of these classifiers. Hence, evaluation
metrics suited to imbalanced datasets, such as Balanced Accuracy, MCC and G-Mean,
are also calculated and compared in Table 5.
Table 5: Model performance considering the imbalanced nature of the Bitcoin Heist Ransomware Address Dataset
Model    Balanced Accuracy (%)    Matthews Correlation Coefficient (MCC)    G-Mean
RFC      79.149771                0.148331                                  0.785226
XGB      76.023707                0.251986                                  0.737791
BBC      68.701046                0.397220                                  0.615233
6. Discussions
This section discusses the results and what they imply. The metrics in Table 4 and the ROC
curves shown in Figures 7, 8, 9 show that, among the three classifiers, balanced bagging performs
well, since it has the highest accuracy as well as precision, indicating that this model got more of
the predictions right than the other two models. This is also evident from the high
value of its F1-score. Looking at the ROC curves, we can see that the curve of BBC lies
closer to the upper-left corner of the plot than those of RFC and XGB. This region is
preferred because it is where the sensitivity of the model approaches 1 while the false positive rate
approaches 0 (Nahm, 2022). Yet the low recall value of BBC indicates that a considerable
number of positive instances are being predicted as negative. This, along with the straight line in the ROC
curve, suggests that the model is probably subject to some overfitting.
According to balanced accuracy and G-mean, the Random Forest classifier
performs better. But, like F1-score, balanced accuracy and G-mean tend to ignore the impact
of True Negative classifications and focus only on the majority class, which is labelled as the positive
class. This is overcome by MCC, which uses all four entries of the confusion matrix in
determining the correlation coefficient (Chicco et al., 2021). Thus, in this work, we attribute more
weight to MCC than to the other metrics.
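The difference between MCC and balanced accuracy can be seen in a small sketch using the standard MCC formula, (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The confusion-matrix counts are hypothetical, chosen so that the positive class is rare and false positives are numerous; they are not results from this study.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from the four confusion-matrix cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of the per-class recalls."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


# Hypothetical counts: 100 positives, 10,000 negatives, many false positives.
TP, FN, FP, TN = 80, 20, 2000, 8000

print(f"balanced accuracy = {balanced_accuracy(TP, TN, FP, FN):.3f}")
print(f"MCC               = {mcc(TP, TN, FP, FN):.3f}")
```

With these counts both classes have a recall of 0.8, so balanced accuracy looks respectable, yet only 80 of the 2,080 positive predictions are correct and MCC stays close to 0.15. This is the kind of gap that motivates giving MCC more weight on imbalanced data.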
Statistical Inferences
Statistical inference is vital in research as it allows scientists to draw conclusions about a
population based on sample data. By generalizing findings, testing hypotheses, and estimating
parameters, it transforms raw data into meaningful insights. This process ensures that decisions
and conclusions are backed by rigorous quantitative analysis, enhancing the reliability and validity
of research outcomes.
In this study, we use the analysis of variance (ANOVA) technique. The following parameters
are used for this computation: accuracy, precision, recall, F1-score, balanced accuracy, MCC and
G-Mean. The three classifiers were also ranked in order of their performance as shown in Table 5.
The alpha value (significance level) is fixed at 0.05 (95% confidence level). The null (H0) and
alternate (H1) hypotheses chosen for this analysis are as follows.
H0: There is no significant difference in the performance of the classifiers.
H1: There is a significant difference in the performance of the classifiers.
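The mechanics of a one-way ANOVA can be sketched as below. For illustration only, each classifier's group holds just the three Table 5 metrics (balanced accuracy rescaled to a fraction) rather than the seven metrics used in the study, so the resulting number is not the paper's reported ANOVA result. The function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Table 5 values per classifier: balanced accuracy (as a fraction), MCC, G-mean.
rfc = [0.79149771, 0.148331, 0.785226]
xgb = [0.76023707, 0.251986, 0.737791]
bbc = [0.68701046, 0.397220, 0.615233]

f_stat, df1, df2 = one_way_anova_f([rfc, xgb, bbc])
F_CRITICAL = 5.14  # tabulated F critical value for alpha = 0.05 with (2, 6) df
print(f"F({df1}, {df2}) = {f_stat:.4f}, exceeds critical value: {f_stat > F_CRITICAL}")
```

An F statistic below the critical value gives no evidence, at the 0.05 level, that the group means differ. An exact p-value would require the F distribution, which a statistics library can supply.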
owing to the various benefits it provides such as transparent transactions, absence of any
intermediary authority, security, and user anonymity. Cyber criminals, especially ransomware
attackers, exploit this space to profit without getting caught. Ransomware victims are
required to transfer the ransom amount to the attacker’s address. Often, these transactions
are routed through different addresses before reaching the final destination. This makes the money
trail untraceable in most cases.
This paper aims to identify a classifier that can detect the Bitcoin addresses of ransomware
attackers and help prevent these addresses from receiving payments in the future. A publicly
available Bitcoin ransomware transaction dataset was employed in this study. Three different ML
classifiers (Random Forest, XGBoost and Balanced Bagging) were developed and trained using a
part of the given dataset. These were then tested to check the efficacy of the models in correctly
classifying transactions as normal or ransomware. The models were evaluated not only on a
traditional ML metric such as accuracy but also using recall, precision, and F1-score. Evaluation
metrics specially designed for imbalanced classes were also employed. Based on the results, it
can be concluded that the Balanced Bagging classifier outperformed the other two in terms of
accuracy, F1-score, and MCC. Though this classifier had an MCC of almost 0.4, this can still be
improved. Its balanced accuracy and AUC are still low compared to the other classifiers
and can be improved through tuning.
Moreover, the highly imbalanced nature of this dataset also makes accurate prediction a
challenge, since the use of a large amount of data could well result in the model
overfitting the data and thereby providing inaccurate results. In the future, more ensemble models can
be tested to classify the transactions. To improve the predictions, hyper-parameter tuning
can be employed with Grid Search to identify and select the best hyper-parameters to attain
maximum accuracy.
Declaration of interest
The authors declare no conflicts of interest.
Acknowledgment
The third author wants to thank Babeș-Bolyai University, Cluj-Napoca, Romania for financial
support.
References
[1] Agarwal, R., Barve, S. & Shukla, S.K. Detecting malicious accounts in permissionless
blockchains using temporal graph properties. Appl Netw Sci 6, 9 (2021).
https://doi.org/10.1007/s41109-020-00338-3
[2] Al-Haija, Qasem Abu, and Abdulaziz A. Alsulami. 2021. "High Performance Classification
Model to Identify Ransomware Payments for Heterogeneous Bitcoin Networks" Electronics 10,
no. 17: 2113. https://doi.org/10.3390/electronics10172113
[3] Al-rimy BAS, Mohd AM, Syed ZMS (2019) Crypto-ransomware early detection model using
novel incremental bagging with enhanced semi-random subspace selection. Future Generation
Computer Systems, Vol. 101, 476-491, DOI: https://doi.org/10.1016/j.future.2019.06.005
[4] Akcora CG, et al. (2019) BitcoinHeist: Topological data analysis for ransomware detection on
the bitcoin blockchain. DOI: https://arxiv.org/abs/1906.07852v1
[5] Bitcoin (2021) Bitcoin - Open Source P2P Money. Available in: https://bitcoin.org/en/
Accessed on: December 31, 2021.
[6] Bitcoin Abuse Database (2021) Available in: https://www.bitcoinabuse.com/ Accessed on:
December 28, 2021.
[7] Breiman, L. (1996) Bagging predictors. Mach Learn 24, 123–140.
https://doi.org/10.1007/BF00058655
[8] Breiman, L. (2001) Random Forests. Machine Learning 45, 5–32.
https://doi.org/10.1023/A:1010933404324
[9] Chawla NV et al. (2002) SMOTE: synthetic minority over-sampling technique. Journal of
Artificial Intelligence Research. DOI: https://doi.org/10.1613/jair.953
[10] Chen T et al. (2015) Xgboost: extreme gradient boosting.
DOI: https://mran.microsoft.com/
[11] Chicco, D., Tötsch, N. & Jurman, G. The Matthews correlation coefficient (MCC) is more
reliable than balanced accuracy, bookmaker informedness, and markedness in two-class
confusion matrix evaluation. BioData Mining 14, 13 (2021).
https://doi.org/10.1186/s13040-021-00244-z.
[12] Dalal S, Zihe W, Siddhanth S (2021) Identifying Ransomware Actors in the Bitcoin Network.
DOI: https://arxiv.org/abs/2108.13807v1
[13] El Naqa, I., Murphy, M.J. (2015). What Is Machine Learning?. In: El Naqa, I., Li, R., Murphy,
M. (eds) Machine Learning in Radiation Oncology. Springer, Cham.
https://doi.org/10.1007/978-3-319-18305-3_1
[14] Ethereum (2021) Available in: https://ethereum.org/en/ Accessed on: December 31, 2021.
[15] Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The
Annals of Statistics, 29(5), 1189–1232. http://www.jstor.org/stable/2699986
[16] He H, Yunqian M (2013) Imbalanced learning: foundations, algorithms, and applications.
John Wiley & Sons.
[17] Herrera-Joancomartí, J. (2015). Research and Challenges on Bitcoin Anonymity. In:
Garcia-Alfaro, J., et al. Data Privacy Management, Autonomous Spontaneous Security, and Security
Assurance. Lecture Notes in Computer Science, vol 8872. Springer, Cham.
https://doi.org/10.1007/978-3-319-17016-9_1
[18] Hossin M, Md NS (2015) A review on evaluation metrics for data classification evaluations.
DOI: http://dx.doi.org/10.5121/ijdkp.2015.5201
[19] Khater, B.S.; Abdul Wahab, A.W.; Idris, M.Y.I.; Hussain, M.A.; Ibrahim, A.A.; Amin, M.A.;
Shehadeh, H.A. (2021) Classifier Performance Evaluation for Lightweight IDS Using Fog
Computing in IoT Security. Electronics, 10, 1633.
DOI: https://doi.org/10.3390/electronics10141633
[20] McAfee (2021) What is Ransomware? Available in:
https://www.mcafee.com/enterprise/en-in/security-awareness/ransomware.html Accessed on:
December 21, 2021.
[21] Nakamoto S (2008) Bitcoin: A peer-to-peer electronic cash system.
DOI: https://bitcoin.org/bitcoin.pdf
[22] Nieuwenhuizen D (2017) A behavioural-based approach to ransomware detection.
DOI: https://labs.f-secure.com/
[23] Office of Public Affairs (2021) Department of Justice Seizes $2.3 Million in Cryptocurrency
Paid to the Ransomware Extortionists Darkside. Available in:
https://www.justice.gov/opa/pr/department-justice-seizes-23-million-cryptocurrency-paid-ransomware-extortionists-darkside
Accessed on: December 28, 2021.
[24] Parsa AB et al. (2020) Toward safer highways, application of XGBoost and SHAP for
real-time accident detection and feature analysis. Accident Analysis & Prevention, 136,
105405. DOI: https://doi.org/10.1016/j.aap.2019.105405
[25] Random Forest Classifier (2021) Available in:
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
Accessed on: November 26, 2021.
[26] Reid, F., Harrigan, M. (2013). An Analysis of Anonymity in the Bitcoin System. In: Altshuler,
Y., Elovici, Y., Cremers, A., Aharony, N., Pentland, A. (eds) Security and Privacy in Social
Networks. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4139-7_10
[27] Scam Alert (2021) Available in: https://scam-alert.io/ Accessed on: December 28, 2021.
[28] Shah K et al. (2020) A comparative analysis of logistic regression, random forest and KNN
models for the text classification. DOI: https://doi.org/10.1007/s41133-020-00032-0
[29] Shehadeh, H. A., Jebril, I. H., Jaradat, G. M., Ibrahim, D., Sihwail, R., Al Hamad, H., ... &
Alia, M. A. (2023). Intelligent Diagnostic Prediction and Classification System for Parkinson's
Disease by Incorporating Sperm Swarm Optimization (SSO) and Density-Based Feature
Selection Methods. Int. J. Advance Soft Compu. Appl, 15(1).
[30] Standard Scaler Documentation (2021) Available in:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
Accessed on: December 28, 2021.
[31] UCI Machine Learning Repository (2020) Bitcoin Heist Ransomware Address Dataset.
Available in:
https://archive.ics.uci.edu/ml/datasets/BitcoinHeistRansomwareAddressDataset Accessed on:
November 19, 2021.
[32] Wu X, Kumar V, Quinlan J R, Ghosh J, Yang Q, Motoda H, Steinberg D. et al. (2008) Top
10 algorithms in data mining. DOI: https://doi.org/10.1007/s10115-007-0114-2
[33] XGBoost Documentation (2021) Available in: https://xgboost.readthedocs.io/en/stable/
Accessed on: December 1, 2021.
[34] Yazdinejad A et al. (2020) Cryptocurrency malware hunting: A deep recurrent neural network
approach. Applied Soft Computing, Vol. 96, 106630,
DOI: https://doi.org/10.1016/j.asoc.2020.106630
[35] H. Sun Yin and R. Vatrapu, "A first estimation of the proportion of cybercriminal entities in
the bitcoin ecosystem using supervised machine learning," 2017 IEEE International
Conference on Big Data (Big Data), Boston, MA, USA, 2017, pp. 3690-3699, DOI:
https://doi.org/10.1109/BigData.2017.8258365
[36] Zhang, X.-D. (2020). Machine Learning. In: A Matrix Algebra Approach to Artificial
Intelligence. Springer, Singapore. https://doi.org/10.1007/978-981-15-2770-8_6
[37] Alshemmari, M. (2024). Semiotics of the Images on Social Media Signifying the Boycott of
Western Products During Al-Aqsa Flood Crisis in 2023: An Analytical Study, Arab Journal
for the Humanities: 167, 133-186. https://doi.org/10.34120/ajh.v42i167.619
Notes on contributors
S. Sasikala holds the position of Professor in the Department
of Computer Science at the Institute of Distance Education
(IDE), University of Madras. Her academic qualifications
include B.Sc. in Computer Science, M.C.A., SLET in
Computer Science, M.Phil. in Computer Science, and Ph.D.
in Computer Science. She has contributed significantly to her
field with a focus on ML and Big Data Analytics. She has an
impressive publication record, with 44 works featured in both
Indian and international journals. Additionally, she holds two
patents and has presented papers at various national and
international seminars. Her active involvement in academia
is evident through her organization and participation in 79
workshops and seminars. Professor Sasikala's expertise lies
at the intersection of ML and Big Data Analytics.