Final Synopsis
A Project Synopsis on
Submitted by:
BINDHU SHREE G V 1SJ20CS027
CHANDAN GOWDA N 1SJ20CS032
SANJANA S 1SJ20CS128
SHWETHASHREE K V 1SJ20CS140
Malware is one of the most significant issues facing internet users today. Malware
is any software intentionally designed to cause damage to a computer, server, client, or
computer network. A wide variety of malware types exist, including computer viruses,
worms, Trojan horses, ransomware, spyware, adware, rogue software, wipers, and
scareware. Polymorphic malware is a type of malicious software that is more
adaptable than earlier generations of viruses: it constantly modifies its signature
traits to avoid being identified by traditional signature-based malware detection
models. To identify malicious threats, we will use a number of machine learning
techniques, which can detect malware by identifying its behaviour and other
characteristics. The proposed approach is based on computing the difference in
correlation symmetry integrals. The aim is to demonstrate that machine learning
algorithms can effectively detect malware, including polymorphic malware, and
thereby help to improve the security of computer systems and networks.
TABLE OF CONTENTS
Sl. No. Chapters
1. Introduction
2. Literature Survey
3. Objectives
4. Study area and Methodology
5. Software and Hardware Requirements
6. Expected Outcome
References
Chapter 1
INTRODUCTION
Malware is a major threat to the security of computer systems and networks. Cyberattacks are currently
one of the most pressing concerns in modern technology. A cyberattack exploits a system’s
flaws for malicious purposes, such as stealing from it, changing it, or destroying it, and malware is one
example of a cyberattack. Malware is any program or set of instructions that is designed to harm a
computer, user, business, or computer system. The term “malware” encompasses a wide range of threats,
including viruses, Trojan horses, ransomware, spyware, adware, rogue software, wipers, and scareware.
Malicious software typically runs without the user’s knowledge or
consent. Traditional signature-based malware detection methods are becoming increasingly ineffective
against new and emerging malware strains. Machine learning (ML) algorithms have the potential to
overcome these limitations by detecting malware based on its behaviour and other characteristics. Both
static and dynamic analysis may be used to identify behavioural similarities between members of
the same malware family. Static analysis examines a suspicious file’s contents without
actually running it, whereas dynamic analysis observes its behaviour at run time by tracking data flows,
recording function calls, and adding monitoring code to dynamic binaries. Machine learning algorithms
can use such static and behavioural artefacts to describe the ever-evolving structure of
contemporary malware, allowing them to identify increasingly
complex malware attacks that could otherwise evade signature-based techniques. Because
machine learning-based solutions do not rely on signatures, they are more effective against newly
released malware. Deep learning algorithms, which can perform feature engineering on their own, can be
used to obtain and represent features more accurately.
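As a minimal illustration of why exact signature matching is brittle, the sketch below compares two byte
strings that differ by a single appended byte. The placeholder payload, the function names, and the choice
of SHA-256 digests and byte-frequency vectors are assumptions made for illustration only; they are not
part of the proposed approach.

```python
import hashlib
import math
from collections import Counter


def sha256_signature(payload: bytes) -> str:
    """Return an exact-match signature: the SHA-256 hex digest of the payload."""
    return hashlib.sha256(payload).hexdigest()


def byte_frequencies(payload: bytes) -> list[float]:
    """Return the relative frequency of each of the 256 possible byte values."""
    counts = Counter(payload)
    total = len(payload) or 1
    return [counts.get(value, 0) / total for value in range(256)]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Harmless placeholder bytes standing in for a sample and a slightly mutated variant.
original = b"harmless placeholder payload " * 20
variant = original + b"\x90"  # one appended byte

signatures_match = sha256_signature(original) == sha256_signature(variant)
similarity = cosine_similarity(byte_frequencies(original), byte_frequencies(variant))

print("signatures match:", signatures_match)
print("feature similarity:", round(similarity, 4))
```

The two digests differ even though the content is almost identical, while the byte-frequency vectors stay
nearly parallel. Feature-based learning relies on this kind of stability under small mutations.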
This synopsis presents a comprehensive survey of ML algorithms for malware analysis and detection. We
discuss the different types of ML algorithms that can be used for malware detection, as well as the
features that can be extracted from malware samples for classification. We also review
state-of-the-art ML-based malware detection systems and their performance.
In the introduction, we establish the need for ML-based malware detection and discuss the advantages of
ML over traditional signature-based methods. We then give a brief overview of the ML algorithms and
sample features used for classification. Finally, we review state-of-the-art ML-based
malware detection systems and their performance.
Chapter 3
OBJECTIVES
Dataset:
Collect a diverse dataset of malware samples. The collection contains many data files
with log data for various types of malware. The features recovered from these logs may
be used to train a broad variety of models.
Pre-Processing:
The data will be stored in the file system as binary code; the files themselves will be
unprocessed executables. Unpacking the executables requires a protected
environment, such as a virtual machine (VM).
Feature Extraction:
We will build a smaller set of features from a larger set; this technique is
commonly used to maintain the same degree of accuracy while using fewer features.
The goal is to refine the existing dataset of dynamic and static features by keeping
those that are most helpful and eliminating those that are not valuable for data
analysis.
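The idea of condensing a large raw representation into a few features can be sketched as follows. This
is a minimal example under stated assumptions: the chosen features (size, Shannon entropy, printable
string count, distinct byte count), the function names, and the sample bytes are hypothetical and are
not the feature set of this project.

```python
import math
import re
from collections import Counter

# Runs of at least four printable ASCII characters, similar to what string-dumping tools report.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def extract_static_features(data: bytes) -> dict[str, float]:
    """Summarise raw file bytes as a small static feature vector without running the file."""
    return {
        "size_bytes": float(len(data)),
        "entropy_bits": shannon_entropy(data),
        "printable_strings": float(len(PRINTABLE_RUN.findall(data))),
        "distinct_bytes": float(len(set(data))),
    }


# Hypothetical bytes standing in for the contents of an executable.
sample = b"MZ" + bytes(64) + b"kernel32.dll\x00LoadLibraryA\x00"
features = extract_static_features(sample)
print(features)
```

High entropy is often associated with packed or encrypted content, which is one reason it is a common
static feature; whether it helps on a given dataset has to be checked empirically.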
Feature Selection:
Feature selection is performed after feature extraction, which yields additional
candidate features. It is a crucial step for enhancing
accuracy, simplifying the model, and reducing overfitting, as it involves choosing
the most useful features from the pool of newly extracted ones.
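One common way to rank candidate features is by information gain with respect to the class label. The
sketch below assumes binary features and labels; the feature names and toy values are hypothetical, and
information gain is used here only as an illustrative criterion, since this synopsis does not fix a
particular selection method.

```python
import math
from collections import Counter


def entropy(labels: list[int]) -> float:
    """Shannon entropy of a label list, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(column: list[int], labels: list[int]) -> float:
    """Reduction in label entropy obtained by splitting the samples on one feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def select_top_k(rows, labels, names, k):
    """Return the names of the k features with the highest information gain."""
    gains = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda name: gains[name], reverse=True)[:k]


# Toy data: one row per sample; label 1 = malicious, 0 = benign.
names = ["calls_crypto_api", "writes_registry", "has_icon"]
rows = [(1, 1, 1), (1, 0, 0), (1, 1, 1), (0, 0, 1), (0, 1, 0), (0, 0, 1)]
labels = [1, 1, 1, 0, 0, 0]

selected = select_top_k(rows, labels, names, k=2)
print(selected)
```

In this toy data the first feature separates the classes perfectly, the second is weakly informative, and
the third carries no information, so the third is the one dropped.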
Chapter 5
SOFTWARE AND HARDWARE REQUIREMENTS
5.1 Hardware Requirements:
1. Processor: Intel Core Duo, 2.0 GHz or faster
2. RAM: 8 GB or more
3. Hard disk: 80 GB or more
4. Monitor: 15" CRT or LCD monitor
1. Malware Detection Algorithm: The system must employ machine learning techniques for
malware detection, including Naive Bayes, support vector machine (SVM), J48 decision tree,
random forest (RF), and the proposed approach.
2. High Detection Accuracy: The selected algorithm must achieve a high detection rate,
ensuring accurate identification of malicious threats.
3. Confusion Matrix: The system should generate a confusion matrix to measure false
positives and false negatives, providing additional performance insights.
4. Comparison of Classifiers: The system must compare the performance of the decision tree (DT),
convolutional neural network (CNN), and SVM algorithms in terms of detection accuracy,
particularly at a low false positive rate (FPR).
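The evaluation requirements above can be sketched as follows. This is a minimal example, assuming binary
labels (1 = malicious, 0 = benign); the function names, the two classifier names, and the toy predictions
are hypothetical and do not represent results of this project.

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = malicious)."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(y_true, y_pred):
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 0 and predicted == 1:
            fp += 1
        elif truth == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def summarize(cm):
    """Derive accuracy, detection rate (recall), and false positive rate from the counts."""
    total = cm["tp"] + cm["fp"] + cm["tn"] + cm["fn"]
    positives = cm["tp"] + cm["fn"]
    negatives = cm["fp"] + cm["tn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total if total else 0.0,
        "detection_rate": cm["tp"] / positives if positives else 0.0,
        "false_positive_rate": cm["fp"] / negatives if negatives else 0.0,
    }


# Toy ground truth and predictions from two hypothetical classifiers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = {
    "classifier_a": [1, 1, 1, 0, 0, 0, 0, 0, 0, 1],
    "classifier_b": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
}

results = {name: summarize(confusion_matrix(y_true, y_pred))
           for name, y_pred in predictions.items()}
for name, metrics in results.items():
    print(name, metrics)
```

Both toy classifiers reach the same accuracy but differ in detection rate and FPR, which is why the
comparison is made on the confusion matrix rather than on accuracy alone.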
5.4 Non-Functional Requirements:
REFERENCES
[1] Akhtar, M.S.; Feng, T., “Malware Analysis and Detection Using Machine Learning
Algorithms (2022)”, DOI: 10.3390/sym14112304.
[2] Akshit Kamboj, Priyanshu Kumar, Amit Kumar Bairwa, “Detection of Malware in
Downloaded Files Using Various Machine Learning Models (2022)”, DOI:
10.1016/j.eij.2022.12.002.
[3] Raj Sinha, “Study of Malware Detection Using Machine Learning”, DOI:
10.13140/RG.2.2.11478.16963.
[4] Souri, Hosseini, “A State-of-the-Art Survey of Malware Detection Approaches
Using Data Mining Techniques (2018)”, Hum. Cent. Comput. Inf. Sci.,
DOI: 10.1186/s13673.