Ransom

The document discusses the increasing threat of ransomware attacks and the application of machine learning (ML) techniques for detection. It details the development of a robust ML model that achieved 97.85% accuracy with a 2% false positive rate using a dataset of 1165 samples, including both benign and ransomware examples. The methodology involved data collection, preprocessing, feature extraction, and the application of various ML classifiers to enhance ransomware detection capabilities.

Uploaded by

201sc004

Ransomware detection using ML

Ransomware attacks have escalated recently and are affecting essential infrastructure and
enterprises across the globe. Unfortunately, ransomware uses sophisticated encryption techniques to
encrypt important files on the targeted machine and then demands payment to decrypt the data.

Artificial intelligence techniques, including machine learning, have been increasingly applied in the field
of cybersecurity and have greatly contributed to detecting and preventing different kinds of attacks.
However, studies that apply machine learning to ransomware detection are still limited
by the obfuscation of malware, the difficulty of setting up a proper analysis environment, the accuracy of
models, and the high false-positive rate. Thus, it is crucial to develop effective ransomware detection
based on machine learning techniques. This study aims to build a robust machine-learning model
that can recognize unknown samples using memory dumps to detect ransomware with high accuracy
and minimal false positives, providing an extensive analysis of how memory traces can assist in the
detection of ransomware. This goal was achieved by building a new dataset composed of recent
ransomware group attack samples like REvil, LockBit, and BlackCat, as well as a number of benign
samples, including office applications, Windows applications, and compression applications, which
were dynamically analyzed within an enhanced Cuckoo sandbox to ensure the most reliable results.
Then, a set of machine learning models was developed, and a comparative performance analysis
was conducted. Among the various models evaluated, XGBoost was the best-performing model,
using only 47 features out of 58. It achieved 97.85% accuracy with a 2% false positive rate.

Dataset
The dataset consists of 1165 benign and ransomware samples. The benign samples were collected
from legitimate public sources and cross-verified with VirusTotal. A sample was excluded from the
dataset if it was flagged as malicious by at least one Anti-Virus (AV) engine. Most samples were collected
from Informer and CNET.
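
The exclusion rule for benign samples can be illustrated with a minimal sketch. It assumes the scan results have already been reduced to a count of flagging engines per sample; the function name, the dictionary layout, and the file names are hypothetical and do not come from the study or from any VirusTotal client.

```python
from typing import Dict, List


def keep_clean_benign(av_hits: Dict[str, int]) -> List[str]:
    """Return the names of samples that no antivirus engine flagged."""
    # A single detection is enough to exclude a sample from the benign set.
    return sorted(name for name, hits in av_hits.items() if hits == 0)


# Hypothetical scan summary: sample name -> number of engines that flagged it.
scan_summary = {
    "editor_setup.exe": 0,
    "archiver_setup.exe": 0,
    "toolbar_setup.exe": 1,
}
print(keep_clean_benign(scan_summary))
```

The rule is deliberately strict: it trades a smaller benign set for lower risk of mislabeled samples.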

1. CICIDS 2017 (Canadian Institute for Cybersecurity):

• Dataset: Contains a variety of network traffic data, including benign and malicious activities,
with annotations that include ransomware traffic.

• Details: It features a mix of normal and attack traffic data, including ransomware-related
behaviors.

• Link: CICIDS 2017

2. CICIDS 2018 (Canadian Institute for Cybersecurity):

• Dataset: This dataset also includes network traffic data with labeled benign and malicious
traffic. Some ransomware behaviors are included.

• Details: Network traffic and system logs, along with labeled features.

• Link: CICIDS 2018


Implementation
1. Data collection.

2. Then comes the data pre-processing. Irrelevant features such as year, day, and Bitcoin address were
dropped: year and day add no value to the classification, and each Bitcoin address is
unique, so it is of no use in the task of detecting Bitcoin ransomware
addresses.

3. Binary label encoding was performed on the target label, neighbours, length, count, and looped
columns. In the target label column, the white label was encoded as 1 and the remaining labels, grouped
as the black label, were encoded as 0. In the neighbours column, values greater than 2
were encoded as 0, else 1; in the length column, values greater than 8 were encoded
as 0, else 1; in the count and looped columns, values greater than 1 were
encoded as 0, else 1. An encoding of 1 indicates a non-ransomware address, whereas
an encoding of 0 indicates a ransomware address.

4. Model building: The dataset was divided into 80% train and 20% test splits. Various scaling
techniques such as StandardScaler, MinMaxScaler, and RobustScaler were applied to both the train
and test data to scale them appropriately.

5. After that, various supervised classification machine learning models were applied, such as logistic
regression, KNN, SVM, Decision Tree, Random Forest, AdaBoost, XGBoost, and Neural Networks.
To evaluate our models, we used metrics such as accuracy, precision, recall, F1 score, and
ROC.
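
Steps 2 to 4 above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the column keys ("neighbours", "length", "count", "looped", "label", "year", "day", "address") are assumed names, the shuffle seed is arbitrary, and the min-max scaler is a hand-written stand-in for the library scalers named in step 4. The scaling range is learned from the training split only, which is a common convention; the text does not say how the scalers were fitted.

```python
import random
from typing import Any, Dict, List, Tuple

# Thresholds from step 3: a value greater than the threshold encodes as 0, else 1.
THRESHOLDS = {"neighbours": 2, "length": 8, "count": 1, "looped": 1}


def encode_row(row: Dict[str, Any]) -> Dict[str, int]:
    """Binary-encode one record; year, day, and address are dropped implicitly."""
    encoded = {col: 0 if row[col] > limit else 1 for col, limit in THRESHOLDS.items()}
    # The white label becomes 1; every other label is grouped as black and becomes 0.
    encoded["label"] = 1 if row["label"] == "white" else 0
    return encoded


def train_test_split(rows: List[Any], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[Any], List[Any]]:
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def min_max_scale(train: List[float],
                  test: List[float]) -> Tuple[List[float], List[float]]:
    """Scale both splits using the minimum and maximum of the training split."""
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column

    def scale(values: List[float]) -> List[float]:
        return [(v - lo) / span for v in values]

    return scale(train), scale(test)


# Hypothetical record, for illustration only.
example = {"label": "white", "neighbours": 1, "length": 10, "count": 1,
           "looped": 0, "year": 2016, "day": 5, "address": "example-address"}
print(encode_row(example))
```

Because the test split is scaled with the training range, test values can fall outside the interval from 0 to 1.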

• True positive rate (TPR) = TP / (TP + FN)    (1)

• False positive rate (FPR) = FP / (FP + TN)    (2)

• Precision = TP / (TP + FP)    (3)

• Recall = TP / (TP + FN)    (4)

• F-measure = 2 ∗ Precision ∗ Recall / (Precision + Recall)    (5)

• Accuracy = (TP + TN) / (TP + TN + FP + FN)    (6)
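
Equations (1) to (6) can be computed directly from the four confusion-matrix counts. The following is a minimal sketch; the function and helper names are illustrative, the example counts are made up and are not results from the study, and returning 0.0 for an empty denominator is an assumption made here for convenience.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Evaluate equations (1) to (6) from confusion-matrix counts."""
    tpr = _ratio(tp, tp + fn)          # (1) true positive rate
    fpr = _ratio(fp, fp + tn)          # (2) false positive rate
    precision = _ratio(tp, tp + fp)    # (3)
    recall = tpr                       # (4) same quantity as the true positive rate
    f_measure = _ratio(2 * precision * recall, precision + recall)  # (5)
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)                   # (6)
    return {
        "tpr": tpr,
        "fpr": fpr,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Made-up counts, for illustration only.
for name, value in classification_metrics(tp=90, fp=5, tn=95, fn=10).items():
    print(f"{name}: {value:.4f}")
```

Recall and the true positive rate are the same ratio, so equations (1) and (4) always agree.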

Methodology
First, the methodology starts with identifying criteria for the sample collection. Second, samples are
collected based on the predefined criteria. Third, the samples are pre-processed and analyzed using
sandboxing techniques, which let us recognize program behaviors during execution.
Next, feature extraction turns the sandbox memory dumps into a set of features that can be
categorized under the memory level. After that, the best set of features is selected to reduce noise and
obtain features relevant to ransomware behaviors, since choosing the right features is a critical
step in improving detection, reducing computational resources, and increasing processing speed. The
resulting set is then subjected to a machine-learning model with various classifiers.
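
The feature selection step can be illustrated with a simple filter method. The text does not state which selection criterion was used, so this sketch assumes one for illustration: features are ranked by the absolute Pearson correlation between each feature column and the binary label, and the top k are kept. The feature names and values below are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(features: Dict[str, List[float]], labels: List[int], k: int) -> List[str]:
    """Return the k feature names most correlated (in absolute value) with the label."""
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], labels)),
                    reverse=True)
    return ranked[:k]


# Hypothetical memory-level features for four samples (1 = ransomware, 0 = benign).
features = {
    "feature_a": [0.0, 0.0, 1.0, 1.0],   # tracks the label exactly
    "feature_b": [1.0, 1.0, 1.0, 1.0],   # constant, carries no signal
    "feature_c": [0.0, 1.0, 0.0, 1.0],   # uncorrelated with the label
}
labels = [0, 0, 1, 1]
print(select_top_k(features, labels, k=1))
```

A filter like this scores each feature independently, so it is cheap but cannot see interactions between features; model-based importance scores are a common alternative.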
Related work
Ransomware detection can be achieved by understanding ransomware behavior, monitoring, and
contracting indicators. While previous research has primarily focused on using memory dumps
for malware detection, we expand our review to include memory-based malware detection in order
to apply it to ransomware detection. Several studies utilize memory-related features, such as memory
access patterns and memory dump scanning, for ransomware detection with machine learning. The
following is a summary of research studies grouped according to detection target: ransomware or
malware.

Hirano & Kobayashi [13] focused on using a hypervisor, which runs and monitors
multiple virtual machine instances. They added a new function to collect low-level
memory access patterns and then applied this information to machine learning, resulting in 95%
accuracy in distinguishing ransomware from benign samples.

Singh et al. [14] suggested a way to effectively detect ransomware based on process memory access
privileges by utilizing a sandboxing environment and multiple machine learning algorithms. This
technique achieves 96.28% accuracy and helps to detect ransomware before any
major harm can happen. However, memory access privileges alone are not a sufficient
indicator for detecting ransomware.

Medhat et al. [15] suggested a hybrid technique to detect obfuscated ransomware using the
enhanced Yara tool based on two features: scanning memory dumps and dropped files. It
reached a 96.2% detection rate and a 5% false positive rate. The technique's static component is
constrained, and adding more features will likely improve accuracy and reduce the false positive rate.

The following studies have explored the use of memory dumps for malware detection. One approach
is to convert memory dumps into images and apply machine learning techniques to analyze them.
Alternatively, features can be extracted directly from memory analysis tools like Volatility or based on
memory access patterns. These methods have shown promise in detecting and identifying malware
in computer systems.
