Automation Detection of Malware and Stenographical Content Using Machine Learning

Uploaded by

dataprodcs
Copyright
© © All Rights Reserved

ABSTRACT

Malware attacks have increased in recent years. Image-based attacks in
particular are spreading worldwide: many people receive harmful images in which
malware has been hidden using a technique called steganography. The existing
system identifies only openly distributed malware and files from the internet.

Image-based malware is not detected, so many phishers use this technique to
exploit their targets, which makes social media platforms harmful to their
users. To address this, we apply machine learning to find steganographic
malware images.

Our proposed methodology develops automated detection of malware and
steganographic content using machine learning. Steganography is the field of
hiding messages in apparently innocuous media (e.g., images), and steganalysis
is the field of detecting this covert content.

We propose a machine learning (ML) approach to steganalysis. Because the
existing system identifies only open malware and files from the internet, images
that carry hidden malware currently reach users of social media platforms
undetected.

Our steganalysis method uses machine learning with logistic classification to
find steganographic malware images. With it, users can spot and avoid malware
images shared on social media such as WhatsApp and Facebook without downloading
them. It can also be used on photo-sharing sites such as Google Photos.

LIST OF FIGURES

Figure no.    Name of the Figure

4.1           Input JPG image
4.2           Output image
4.3           Change in Output image
4.4           Malware Detection simulation
4.5           RGB Layer Identification Step
5.1           LSB Graph
5.2           False rate graph
5.3           Output image
5.4           Output image
5.5           Binary code image

TABLE OF CONTENTS

CHAPTER NO.   TITLE

1             INTRODUCTION

2             LITERATURE SURVEY
              2.1 Survey Walkthrough
              2.2 TensorFlow
              2.3 OpenCV
              2.4 Keras
              2.5 NumPy
              2.6 Neural Networks
              2.7 Convolutional Neural Network

3             IMPLEMENTATION
              3.1 Image Processing
                  3.1.1 Digital Image Processing
                  3.1.2 Pattern Recognition
              3.2 Basic approaches to malware detection
              3.3 Machine learning
              3.4 Unsupervised Learning
              3.5 Supervised Learning
              3.6 Deep Learning
              3.7 Machine Learning Applications

4             METHODOLOGY
              4.1 Methodology
                  4.1.1 Training Model
                  4.1.2 Segmentation
              4.2 Classification
              4.3 Testing

5             RESULT
              5.1 Result
              5.2 Performance Analysis

6             CONCLUSION AND FUTURE SCOPE
              6.1 Future Scope
              6.2 Conclusion

7             APPENDIX
              a) Sample code

CHAPTER 1
INTRODUCTION

By definition, steganography is the technique or art of concealing one type of
data within another type of data. The word steganography derives from the Greek
words steganos (covered or concealed) and graphein (to write), thus meaning
"covered writing". The technique was historically used by governments to hide
sensitive information. One interesting form of steganography sends and receives
secret messages publicly.

Ideally, no one apart from the sender and the receiver can discover the hidden
message. Because the secret message is embedded in a cover file, anyone who
sees the cover file takes it for an ordinary file and does not notice that it
contains secret information, which is what makes steganography secure.

Only a person who knows that the cover file contains secret information can
attempt to steal it. Machine learning is now the main domain applied to modern
steganography problems, because a modern problem needs a modern solution. The
predictive power of machine learning algorithms helps to find stego content. It
can also be used to filter content while it is being transmitted.
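As a rough illustration of how a learned classifier can separate cover images from stego images, the sketch below fits a one-feature logistic regression with NumPy. Everything in it is an assumption made for illustration and is not taken from this project: the training images are synthetic toy data, the helper names (`make_cover`, `embed_random_lsb`, `lsb_agreement`) are hypothetical, and the single feature (how often neighbouring pixels share the same least significant bit) is just one simple choice that happens to work on such smooth toy images. Real photographs are far noisier and need richer features.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_cover(size=32):
    """Toy 'cover' image: 4x4 blocks of constant grey level (synthetic, not real data)."""
    offset = rng.integers(0, 200)
    rows, cols = np.indices((size, size))
    return (rows // 4 + cols // 4 + offset).astype(np.uint8)

def embed_random_lsb(image):
    """Overwrite every least significant bit with a random payload bit."""
    payload = rng.integers(0, 2, size=image.shape, dtype=np.uint8)
    return (image & 0xFE) | payload

def lsb_agreement(image):
    """Feature: fraction of horizontal neighbours whose LSBs are equal."""
    lsb = image & 1
    return float(np.mean(lsb[:, 1:] == lsb[:, :-1]))

# Build a small labelled data set: 0 = cover, 1 = stego.
covers = [make_cover() for _ in range(50)]
stegos = [embed_random_lsb(make_cover()) for _ in range(50)]
x = np.array([lsb_agreement(im) for im in covers + stegos])
y = np.array([0] * 50 + [1] * 50)

# Logistic regression with one feature, fitted by plain gradient descent
# on the mean cross-entropy loss.
x_centered = x - x.mean()
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x_centered + b)))
    w -= 1.0 * np.mean((p - y) * x_centered)
    b -= 1.0 * np.mean(p - y)

probabilities = 1.0 / (1.0 + np.exp(-(w * x_centered + b)))
predictions = (probabilities > 0.5).astype(int)
accuracy = float(np.mean(predictions == y))
print(f"weight={w:.2f} bias={b:.2f} training accuracy={accuracy:.2f}")
```

The learned weight comes out negative because random payload bits destroy the agreement between neighbouring LSBs, so a lower feature value points towards "stego". The accuracy printed here is measured on the training data of a toy problem and says nothing about performance on real images.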

Image steganography is one type of steganography. Several existing approaches
to it were reviewed:

In template-based detection, a common template describing the stego content is
programmed in advance, and the software identifies the hidden text by matching
against the template.[5]

"A review of LSB image steganography techniques": LSB techniques are used for
small payloads such as short texts and URLs. Compared with other techniques,
they cannot handle large texts. The work relies mostly on the LSB algorithm and
its accuracy is very low.[6]

"Detection of LSB alternate and LSB identical Steganography Using Gray Level
Run Length Matrix": this work uses an older model. Grayscale image recognition
is useful for hidden text alone and is not useful against malware attacks.[7]

"Enhance security and ability for Arabic text steganography using 'Kashida'
extensions": this approach is very time-consuming when hiding texts.
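To make the LSB technique mentioned above concrete, here is a minimal sketch of least-significant-bit embedding and extraction on an 8-bit image held in a NumPy array. The function names `embed_lsb` and `extract_lsb`, the random cover image, and the sample message are all assumptions made for illustration. This is the textbook form of LSB replacement, not the implementation used in any of the reviewed works.

```python
import numpy as np

def embed_lsb(cover: np.ndarray, message: bytes) -> np.ndarray:
    """Hide `message` in the least significant bits of a uint8 image."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    flat = cover.flatten()  # flatten() returns a copy, so `cover` is untouched
    if bits.size > flat.size:
        raise ValueError("message is too large for this cover image")
    # Clear the lowest bit of each used pixel, then write one message bit into it.
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(cover.shape)

def extract_lsb(stego: np.ndarray, n_bytes: int) -> bytes:
    """Read back `n_bytes` bytes from the least significant bits."""
    bits = stego.flatten()[:n_bytes * 8] & 1
    return np.packbits(bits).tobytes()

# A random 16x16 grayscale image stands in for a real cover image.
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)

secret = b"payload!"
stego = embed_lsb(cover, secret)
recovered = extract_lsb(stego, len(secret))

# Each pixel changes by at most one grey level, which is why the
# modification is invisible to the eye.
max_change = int(np.max(np.abs(stego.astype(int) - cover.astype(int))))
print(recovered, max_change)
```

The capacity limit is visible in the code: one bit per pixel value, so a 16x16 grayscale cover holds at most 32 bytes. That is consistent with the remark above that LSB methods suit short texts and URLs better than large payloads.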

CHAPTER 2
LITERATURE SURVEY

2.1 SURVEY WALKTHROUGH:

The domain analysis carried out for the project mainly involved
understanding neural networks.

2.2 TensorFlow:

TensorFlow is a free and open-source software library for dataflow and
differentiable programming across a range of tasks. It is a symbolic math
library, and is also used for machine learning applications such as neural
networks. It is used for both research and production at Google.

Features: TensorFlow provides stable Python (for version 3.7 across all
platforms) and C APIs. APIs without a backwards-compatibility guarantee are
available for C++, Go, Java, JavaScript and Swift (early release). Third-party
packages are available for C#, Haskell, Julia, MATLAB, R, Scala, Rust, OCaml
and Crystal. "New language support should be built on top of the C API.
However, not all functionality is available in C yet." Some more functionality
is provided by the Python API.

Application: Among the applications built on TensorFlow is automated
image-captioning software, such as DeepDream.

2.3 OpenCV:

OpenCV (Open Source Computer Vision Library) is a library of programming
functions mainly aimed at real-time computer vision.[1] Originally developed
by Intel, it was later supported by Willow Garage and then Itseez (which was
later acquired by Intel[2]). The library is cross-platform and free for use
under the open-source BSD license.

OpenCV's application areas include:

• 2D and 3D feature toolkits
• Egomotion estimation
• Facial recognition system
• Gesture recognition
• Human–computer interaction (HCI)
• Mobile robotics
• Motion understanding
• Object identification
• Segmentation and recognition
• Stereopsis stereo vision: depth perception from 2 cameras
• Structure from motion (SFM)
• Motion tracking
• Augmented reality

To support some of the above areas, OpenCV includes a statistical machine
learning library that contains:

• Boosting
• Decision tree learning
• Gradient boosting trees
• Expectation-maximization algorithm
• k-nearest neighbor algorithm
• Naive Bayes classifier
• Artificial neural networks
• Random forest
• Support vector machine (SVM)
• Deep neural networks (DNN)

Related libraries and tools include:

• AForge.NET, a computer vision library for the Common Language Runtime
  (.NET Framework and Mono).
• ROS (Robot Operating System). OpenCV is used as the primary vision
  package in ROS.
• VXL, an alternative library written in C++.
• Integrating Vision Toolkit (IVT), a fast and easy-to-use C++ library with an
  optional interface to OpenCV.
• CVIPtools, a complete GUI-based computer-vision and image-processing
  software environment, with C function libraries, a COM-based DLL, along with
  two utility programs for algorithm development and batch processing.
• OpenNN, an open-source neural networks library written in C++.

OpenCV Functionality:

• Image/video I/O, processing, display (core, imgproc, highgui)
• Object/feature detection (objdetect, features2d, nonfree)
• Geometry-based monocular or stereo computer vision (calib3d,
  stitching, videostab)
• Computational photography (photo, video, superres)
• Machine learning & clustering (ml, flann)
• CUDA acceleration (gpu)

Image-Processing:

Image processing is a method of performing operations on an image in order to
get an enhanced image or to extract useful information from it.

A basic definition is: "Image processing is the analysis and manipulation of a
digitized image, especially in order to improve its quality".

Digital-Image:

An image may be defined as a two-dimensional function f(x, y), where x and y
are spatial (plane) coordinates, and the amplitude of f at any pair of
coordinates (x, y) is called the intensity or grey level of the image at that
point.

In other words, an image is nothing more than a two-dimensional matrix
(three-dimensional in the case of coloured images) defined by the function
f(x, y), whose value at any point is the pixel value at that point of the
image. The pixel value describes how bright that pixel is and what colour it
should be.
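The matrix view of an image described above can be shown directly with NumPy arrays. The snippet is only a small illustration: the pixel values are made up, the helper `f` is a hypothetical name chosen to mirror the notation f(x, y), and the R, G, B channel order is an assumption (OpenCV, for example, loads images in B, G, R order).

```python
import numpy as np

# Grayscale image: a 2-D array with one intensity value per (row, column).
gray = np.array([[0, 64, 128],
                 [32, 96, 255]], dtype=np.uint8)

# Colour image: a 3-D array whose third axis holds the colour channels.
# Channel order R, G, B is assumed here.
colour = np.zeros((2, 3, 3), dtype=np.uint8)
colour[0, 1] = (255, 0, 0)  # the pixel at row 0, column 1 is pure red

def f(image, x, y):
    """Return the pixel value at column x, row y (arrays are indexed [row, column])."""
    return image[y, x]

print(gray.shape, colour.shape)  # 2-D versus 3-D
print(f(gray, 2, 1))             # intensity of the bottom-right pixel
print(f(colour, 1, 0))           # three channel values of the red pixel
```

Note that the array index order is [row, column], that is [y, x], which is the reverse of the (x, y) order used in the mathematical definition.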

Image processing is essentially signal processing in which the input is an
image and the output is either an image or characteristics associated with
that image, according to requirement. Image processing basically includes the
following three steps:

• Importing the image.
• Analysing and manipulating the image.
• Output, in which the result can be an altered image or a report that is
  based on image analysis.
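The three steps listed above can be sketched end to end with NumPy. To keep the example self-contained, the "import" step parses a tiny plain-text PGM image held in a string instead of reading a file with an imaging library. The function names (`import_pgm`, `stretch_contrast`, `report`) and the choice of contrast stretching as the manipulation are assumptions made for illustration.

```python
import numpy as np

def import_pgm(text: str) -> np.ndarray:
    """Step 1: parse a plain-text (P2) PGM image into a 2-D uint8 array.

    Simplified: PGM comment lines are not handled.
    """
    tokens = text.split()
    if tokens[0] != "P2":
        raise ValueError("only plain-text PGM (P2) is handled in this sketch")
    width, height = int(tokens[1]), int(tokens[2])
    # tokens[3] is the maximum grey value; the pixel values follow it.
    pixels = np.array([int(t) for t in tokens[4:]], dtype=np.uint8)
    return pixels.reshape(height, width)

def stretch_contrast(image: np.ndarray) -> np.ndarray:
    """Step 2: linearly rescale intensities to the full 0..255 range."""
    lo, hi = int(image.min()), int(image.max())
    if hi == lo:
        return image.copy()  # a flat image cannot be stretched
    scaled = (image.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return np.rint(scaled).astype(np.uint8)

def report(image: np.ndarray) -> dict:
    """Step 3 (report form): summarise the image instead of returning pixels."""
    return {
        "shape": image.shape,
        "min": int(image.min()),
        "max": int(image.max()),
        "mean": float(image.mean()),
    }

PGM = """P2
3 2
255
50 100 150
60 110 100
"""

original = import_pgm(PGM)             # step 1: import
enhanced = stretch_contrast(original)  # step 2: analyse and manipulate
summary = report(enhanced)             # step 3: output as a report
print(enhanced)
print(summary)
```

Here step 3 appears in both of the forms named above: `enhanced` is the altered image, and `summary` is a report based on analysing it.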

Applications of Computer Vision:

Listed below are some of the major domains where computer vision is heavily
used.

• Robotics Application
    • Localization − Determine robot location automatically
    • Navigation
    • Obstacles avoidance
    • Assembly (peg-in-hole, welding, painting)
    • Manipulation (e.g. PUMA robot manipulator)
    • Human Robot Interaction (HRI) − Intelligent robotics to interact with and
      serve people

• Medicine Application
    • Classification and detection (e.g. lesion or cells classification and tumor

coding necessary for writing deep neural network code. The code is hosted on
GitHub, and community support forums include the GitHub issues page, and a
Slack channel.

In addition to standard neural networks, Keras has support for convolutional
and recurrent neural networks. It supports other common utility layers like
dropout, batch normalization, and pooling.

Keras allows users to productize deep models on smartphones (iOS and
Android), on the web, or on the Java Virtual Machine. It also allows use of
distributed training of deep-learning models on clusters of graphics
processing units (GPUs) and tensor processing units (TPUs), principally in
conjunction with CUDA.

The Keras applications module provides pre-trained models for deep neural
networks. These models are used for prediction, feature extraction and
fine-tuning. This section explains Keras applications in detail.

Pre-trained models

A trained model consists of two parts: the model architecture and the model
weights. The model weights are large files, so they have to be downloaded
separately; the weights of the models below were trained on the ImageNet
database. Some of the popular pre-trained models are listed below:

• ResNet
• VGG16
• MobileNet
• InceptionResNetV2
• InceptionV3

2.5 NumPy:

NumPy (pronounced /ˈnʌmpaɪ/ (NUM-py) or sometimes /ˈnʌmpi/ (NUM-pee)) is a


library for the Python programming language, adding support for large, multi-
