An Overview of Machine Learning in Security

The document provides an overview of machine learning in security. It defines machine learning as the process of applying computing resources to implement algorithms that can learn from data to predict outcomes. Various machine learning techniques are discussed, including linear regression, logistic regression, Bayesian classification, support vector machines, decision trees, neural networks, and ensemble learning. Machine learning in security involves using these algorithms to identify patterns in data that can help detect anomalies and threats.

Uploaded by

Pankaj Singha

Chap1: Machine Learning in Security: An Overview #1

Devesh C Jinwala, Professor, CSE, SVNIT, Surat

January 6, 2023

Devesh C Jinwala,
Professor, SVNIT and Adjunct Prof., CSE, IIT Jammu
Department of Computer Science and Engineering,
Sardar Vallabhbhai National Institute of Technology, SURAT
Devesh C Jinwala, Professor, CSE, SVNIT, Surat Chap1: Machine Learning in Security 1 / 63
Chap 1: An Overview of Machine Learning in Security: Topics
Introduction to the Course Contents, Review of the Basic Machine Learning
Concepts. Foundations of Machine Learning for Security: Artificial
Intelligence and Machine Learning.
Review of the ML techniques. Machine Learning problems viz. Classification,
Regression, Clustering, Association rule learning, Structured output,
Ranking. Linear Regression. Logistic Regression and Bayesian Classification.
Support Vector Machines, Decision Tree and Random Forest, Neural
Networks, DNNs, Ensemble learning. Principal Components Analysis.
Unsupervised learning algorithms: K-means for clustering problems, K-NN
(k nearest neighbours). Apriori algorithm for association rule learning
problems. Generative vs Discriminative learning. [4 hours]

What is Machine Learning?
Learning is the process of building a scientific model after discovering
knowledge from a sample data set or data sets.
Machine learning
is considered to be the process of applying a computing-based resource to
implement learning algorithms.
refers to algorithms and processes that learn in the sense of being able to
generalize from past data and experiences in order to predict future outcomes.
at its core, is a set of mathematical techniques, implemented on computer
systems, that enables a process of information mining, pattern discovery, and
drawing inferences from data.

Formal definition
Formally, machine learning is defined as the complex computation process of
automatic pattern recognition and intelligent decision making based on
training sample data.

What is Machine Learning (ML) ?...
This also means that we need to equip the machine with the ability to mimic
human behavior.
ML is concerned with giving computers the ability to perform a task without
being explicitly commanded/programmed.
As an example, suppose we want to classify emails into promotional and
non-promotional emails.
With explicit programming, we would write the rules by hand. For example,
classify an email as promotional if it contains the words
“Discount”, “Sale”, or “Free Gift”, and
classify an email as non-promotional if the email address includes “.gov” or
“.edu”.
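
The two hand-written rules above can be sketched in Python as follows. This is only an illustration: the function name, the fallback label "unknown", and the choice to check the sender rule first are assumptions, since the slide does not say which rule wins when both apply.

```python
# Keywords and address fragments taken from the slide.
PROMO_WORDS = ("discount", "sale", "free gift")
TRUSTED_FRAGMENTS = (".gov", ".edu")


def classify_by_rules(sender: str, text: str) -> str:
    """Classify one email using fixed, manually engineered rules."""
    sender = sender.lower()
    text = text.lower()
    # Assumption: the sender rule takes priority over the keyword rule.
    if any(fragment in sender for fragment in TRUSTED_FRAGMENTS):
        return "non-promotional"
    # Plain substring matching, so "sale" also matches "wholesale".
    if any(word in text for word in PROMO_WORDS):
        return "promotional"
    # Assumption: emails matching no rule are left undecided.
    return "unknown"


if __name__ == "__main__":
    print(classify_by_rules("shop@example.com", "Big SALE this weekend"))
    print(classify_by_rules("office@example.edu", "Free gift for students"))
    print(classify_by_rules("friend@example.com", "Lunch tomorrow?"))
```

Every new kind of promotional email needs another hand-written rule here, which is the limitation the ML approach tries to remove.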

What is Machine Learning? Motivation
From the other angle, why would the ML based approach be required in such
a scenario ?
ML can super-charge the classification program by identifying each email’s
unique attributes.
It would derive robust rules to automate the classification process, thereby
preventing the need for manually engineered rules.
But, for a machine to do that, we need to provide it with data.
Thus, the goal is for the machine to learn the rules directly from the data,
using what are known as ML algorithms.

What is Machine Learning? Motivation...

Figure: Comparison of Input command vs Input Data [Src: Oliver Theobald]

It would do so, using what are known as machine learning algorithms.

The formulas and procedures, derived from mathematical concepts
for the appropriate ML task (say, classification), would be built into the ML
algorithms.
The ML algorithms would be implemented in programming code to perform
calculations on our data,
after which the algorithm/program typically generates an output known as a
model.
The process of generating the model is known as training the model.
This model describes the rules, numbers, and any other algorithm-specific
data structures that our machine learned from the data.
Our machine can then use the model to perform its task.
Thus, a key characteristic of ML is the concept of self-learning,
i.e. the application of statistical modeling to detect patterns and improve
performance based on data and empirical information, all without direct
programming commands.
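
To make "training produces a model" concrete, here is a minimal sketch. It assumes a simplified word log-odds scorer in the spirit of naive Bayes, with class priors ignored; the training emails, labels, and function names are made up for illustration and do not come from the slides.

```python
import math
from collections import Counter

# Made-up labelled data: (text, label) pairs.
TRAINING = [
    ("huge discount on shoes", "promo"),
    ("free gift with every sale", "promo"),
    ("sale ends today", "promo"),
    ("meeting agenda for monday", "other"),
    ("lecture notes for monday", "other"),
    ("project report attached", "other"),
]


def train(emails):
    """Learn one weight per word; the returned dict is the 'model'."""
    counts = {"promo": Counter(), "other": Counter()}
    for text, label in emails:
        counts[label].update(text.lower().split())
    vocab = set(counts["promo"]) | set(counts["other"])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    model = {}
    for word in vocab:
        # Add-one smoothing avoids log(0) for words seen in only one class.
        p_promo = (counts["promo"][word] + 1) / (totals["promo"] + len(vocab))
        p_other = (counts["other"][word] + 1) / (totals["other"] + len(vocab))
        model[word] = math.log(p_promo / p_other)
    return model


def predict(model, text):
    """Use the trained model: sum the learned weights of the known words."""
    score = sum(model.get(word, 0.0) for word in text.lower().split())
    return "promo" if score > 0 else "other"


if __name__ == "__main__":
    model = train(TRAINING)
    print(predict(model, "discount sale today"))
    print(predict(model, "notes for the lecture"))
```

No rule such as "flag the word sale" is written by hand here; the positive weight for that word comes out of the data.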
Self-learning, in summary
ML uses data as input to build a decision model.
Decisions are generated by deciphering relationships and patterns in the data using probabilistic
reasoning, trial and error, and other computationally intensive techniques.
This means that the output of the decision model is determined by the contents of the input data
rather than any pre-set rules defined by a human programmer.
the human programmer is still responsible

Lineage of ML

Anatomy of Machine Learning: Where does it fit in

Figure: Lineage of ML [Src: Oliver Theobald]

Relationship between data related fields

Data Science comprises methods and systems to extract knowledge and insights
from data with the aid of computers.
AI encompasses the ability of machines to perform intelligent and cognitive tasks.
An analogy:
Industrial Revolution → an era of machines simulating physical tasks,
AI → development of machines capable of simulating cognitive abilities.
AI includes the subfields search and planning, reasoning and knowledge
representation, perception, natural language processing (NLP), and machine learning.

Figure: Visual representation of the relationship between data-related fields
[Src: Theobald, Oliver. Machine Learning for Absolute Beginners]

Machine learning overlaps with data mining

ML overlaps with data mining, a discipline based on discovering and unearthing
patterns in large datasets.
Both techniques rely on inferential methods, i.e. predicting outcomes based on
other outcomes and probabilistic reasoning.
Both draw from a similar assortment of algorithms, including principal component
analysis, regression analysis, decision trees, and clustering techniques.

Figure: Visual representation of the relationship between data-related fields
[Src: Theobald, Oliver. Machine Learning for Absolute Beginners]

Machine learning overlaps with data mining

ML emphasizes the incremental process of self-learning and
automatically detecting patterns through experience derived from
exposure to data,
whereas data mining is a less autonomous technique of
extracting hidden insight.
It seeks out patterns and relationships that are yet to be
mined and is well-suited for understanding large datasets with
complex patterns.

Figure: Visual representation of the relationship between data-related fields
[Src: Theobald, Oliver. Machine Learning for Absolute Beginners]

What Machine Learning is Not?
Artificial intelligence
indicates algorithmic solutions to complex problems typically solved by humans.
AI systems are loosely defined to be machine-driven decision engines that can
achieve near-human-level intelligence.
Machine learning
is a core building block for AI.
helps us create AI, but is not the only way to achieve it.
refers to statistical learning algorithms that are able to create generalizable
abstractions (models) by seeing and dissecting a dataset.

Figure: ML is part of AI [Src: Clarence and Chio, ML and Security..., O’Reilly Media]

Data Mining

Figure: Data Mining

Data mining requires machine learning
Data mining to machine learning
the goal is to develop predictive models that enable a real-time cyber
response after a sequence of cybersecurity processes,
the response must include real-time data sampling, selection, analysis and
query, and mining peta-scale data, with the goal
to classify and detect attacks and intrusions on a computer network.
learning user patterns and/or behaviors is critical for intrusion detection
and attack prediction.

Overview of ML tasks and Examples

Three of the most common tasks that ML models perform
Three of the most common tasks ML models perform are
classification - e.g., classifying emails into promotional and non-promotional
prediction - e.g., predicting stock prices.
regression - e.g., predicting how much a used car would sell for given historical
data on recent used car sales in the area
There are other tasks that include
making recommendations,
image recognition,
natural language processing,
transcription,
machine translation,
anomaly detection,
synthesis & sampling,
estimation of probability density and probability mass functions,
similarity matching,
co-occurrence grouping,
causal modeling,
link profiling, and so on.
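
As a rough illustration of the regression task (predicting a number rather than a label), the sketch below fits a straight line by ordinary least squares. The car ages and prices are made-up toy values chosen to lie exactly on a line, the function name is hypothetical, and the final price band only shows how a classification output differs from a regression output.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: y ~ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: car age in years and sale price.
AGES = [1, 2, 3, 4, 5]
PRICES = [9000, 8000, 7000, 6000, 5000]

intercept, slope = fit_line(AGES, PRICES)

# Regression output: a numeric estimate for a 6-year-old car.
predicted_price = intercept + slope * 6

# Classification output: a discrete label (the 5000 cut-off is arbitrary).
price_band = "budget" if predicted_price < 5000 else "premium"

if __name__ == "__main__":
    print(intercept, slope, predicted_price, price_band)
```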

The key development phases in ML used to solve the different tasks

Following are the key development phases that are used to solve the different
tasks listed in the previous slide.
These form the key phases of the machine learning models’ (MLM)
development lifecycle.
Data gathering
Data preprocessing
Exploratory data analysis (EDA)
Feature engineering including feature creation/extraction, feature selection,
dimensionality reduction
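
The four phases listed above can be sketched end to end on a tiny example. Everything here is an assumption made for illustration: the in-memory records stand in for real gathered data, the labels "spam" and "ham" are made up, and keeping only the k most frequent words is just one simple form of feature selection that also reduces dimensionality.

```python
from collections import Counter

# Phase 1, data gathering: a made-up sample stands in for real email logs.
RAW = [
    ("  FREE gift inside!! ", "spam"),
    ("Meeting at 10am", "ham"),
    ("free FREE discount", "spam"),
    ("", "ham"),  # empty record that preprocessing should drop
    ("Project meeting notes", "ham"),
]


def preprocess(records):
    """Phase 2: lowercase, strip punctuation, drop empty records."""
    cleaned = []
    for text, label in records:
        tokens = ["".join(ch for ch in tok if ch.isalnum())
                  for tok in text.lower().split()]
        tokens = [tok for tok in tokens if tok]
        if tokens:
            cleaned.append((tokens, label))
    return cleaned


def explore(records):
    """Phase 3 (EDA): class balance and word frequencies."""
    labels = Counter(label for _, label in records)
    words = Counter(tok for tokens, _ in records for tok in tokens)
    return labels, words


def select_features(words, k):
    """Phase 4a (feature selection): keep the k most frequent words."""
    return [word for word, _ in words.most_common(k)]


def to_vector(tokens, vocabulary):
    """Phase 4b (feature creation): bag-of-words counts over the vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    cleaned = preprocess(RAW)
    labels, words = explore(cleaned)
    vocabulary = select_features(words, k=2)
    print(labels, vocabulary)
    print([to_vector(tokens, vocabulary) for tokens, _ in cleaned])
```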

A Few Example Applications of some ML tasks
How the power of ML can be exploited....e.g. for analyzing YouTube viewing habits
Say in one application, the decision model identifies a significant relationship
among data scientists who like watching cat videos
the machine analyzes which videos data scientists enjoy watching on YouTube
based on user engagement

A Few Examples of ML tasks
How the power of ML can be exploited....e.g. for predicting a baseball MVP
Say in another application, the decision model identifies patterns among the
physical traits of baseball players and
identifies their likelihood of winning the season’s Most Valuable Player
(MVP) award.
Here, the machine assesses the physical attributes of previous baseball MVPs
among other features such as age and education.
This model is different from the one in the previous slide and hence
produces different output. But at no stage is the decision model explicitly
programmed to produce the two required outcomes.

ML: Improving predictions based on experience
ML also has the ability to improve predictions based on experience,
mimicking the way humans base their decisions on experience.
For this purpose, ML utilizes exposure to data to improve its decision making.
The exposure to the data points
provides experience and enables the model to familiarize itself with patterns in
the data.
deepens the model’s understanding of patterns, including the significance of
changes in the data.

Another Example: An ML model for detecting spam email messages.

Initial data used to develop a model
- model learns to flag emails as spam
this is based on the identified suspicious subject lines and body text
containing keywords from the mails flagged by users as spam in the past.
e.g. words like dear friend, free, invoice, PayPal, Viagra, casino,
payment, bankruptcy, and winner
however, as more data is analyzed, the model might also find
exceptions and incorrect assumptions
this could render the model susceptible to bad predictions.

Figure: Spam Mail Detection using ML [Src: Eman M. Bahgat et al]

Another Example: An ML model for detecting spam email messages...

The false positives (or even the false negatives) predicted by the model
depend on the quality and the quantity of the data supplied during training.
e.g. if there is limited data to inform its decision, an email with the subject
“PayPal has received your payment for Casino Royale purchased on eBay.”
might be wrongly classified as spam.
Traditional programming is highly susceptible to this problem.

Figure: Spam Mail Detection using ML (source: Eman M. Bahgat et al.)
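
The false positives and false negatives mentioned above can be counted directly once true labels and predictions are available. The following is a minimal sketch, not taken from the slides; the function name `confusion_counts` and the five example labels are hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives for a binary spam filter."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            # Predicted spam: correct if it really was spam, otherwise a false positive.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted ham: a false negative if it really was spam.
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Hypothetical labels for five messages (illustrative, not real data).
actual = ["spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "spam", "ham", "ham", "ham"]
counts = confusion_counts(actual, predicted)
```

In this toy run the second message is a false positive (legitimate mail flagged as spam) and the fourth is a false negative (spam that got through).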

Another Example: An ML model for detecting spam email messages...

ML, however, uses exposure to data as a way to refine the model and
adjust weak assumptions.
ML thereby responds appropriately to unique data points such as the
scenario just described.
Does more data lead to better predictions?
No.
While data is used to source the self-learning process, more data does
not always equate to better decisions.

Figure: Spam Mail Detection using ML
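
As a rough illustration of how exposure to relevant data can correct a weak assumption, the sketch below trains a tiny word-count naive Bayes classifier twice: once on four messages, and once with two additional legitimate messages. The classifier choice, the function names `train` and `classify`, and all training messages are assumptions made for this example; the slides do not prescribe a particular algorithm.

```python
import math
from collections import Counter


def train(messages):
    """Build per-label word counts and message counts from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, model):
    """Return the label with the higher naive Bayes log-score."""
    word_counts, label_counts = model
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)  # log prior
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical, hand-written training messages (not real data).
small = [
    ("win casino payment now", "spam"),
    ("casino royale winner claim payment", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on friday", "ham"),
]
extra = [
    ("paypal has received your payment", "ham"),
    ("your ebay order has shipped", "ham"),
]
subject = "paypal has received your payment for casino royale purchased on ebay"

before = classify(subject, train(small))
after = classify(subject, train(small + extra))
```

With only the four initial messages, the words "payment", "casino" and "royale" push the subject towards spam; once the two relevant legitimate messages are added, the same subject is labelled ham. The improvement comes from the relevance of the added examples, not merely from their number.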

ML tasks: more data does not always lead to better decisions
While data is used to source the self-learning process, more data does not
always equate to better decisions;
the input data must be relevant to realize the objective.
In Data and Goliath: The Hidden Battles to Collect Your Data and Control
Your World, Bruce Schneier writes that
“When looking for the needle, the last thing you want to do is pile lots more
hay on it.”

ML dataset: Training Data & Test Data
In ML, the input data is typically split into training data and test data.
Training data
is the initial reserve of data used to develop the model: the subset of the
original dataset that is fed into the ML model so it can discover and learn patterns.
is generally larger than the testing dataset.
is well known to the model, as it is used to train the model.
e.g. in the spam email detection example, false positives similar to the PayPal
auto-response message discussed earlier might be detected from the
training data;
modifications must then be made to the model, e.g., email notifications
issued from the sending address "[email protected]" should be excluded
from spam filtering.
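
One simple way to express the exclusion described above is an allowlist that is checked before the classifier runs. This is only a sketch: the address `notifications@example.com` is a hypothetical placeholder (the address on the slide is redacted), and `keyword_classifier` is a toy stand-in for a trained model, not the model used in the lecture.

```python
# Hypothetical placeholder address; the address on the slide is redacted.
TRUSTED_SENDERS = {"notifications@example.com"}


def keyword_classifier(subject):
    """Toy stand-in for a trained model: flags subjects containing spam-like words."""
    spam_words = {"payment", "bankruptcy", "winner"}
    return "spam" if spam_words & set(subject.lower().split()) else "ham"


def filter_message(sender, subject, classify):
    """Return 'ham' for allowlisted senders; otherwise defer to the classifier."""
    if sender.strip().lower() in TRUSTED_SENDERS:
        return "ham"
    return classify(subject)


subject = "PayPal has received your payment for Casino Royale purchased on eBay."
trusted = filter_message("Notifications@Example.com", subject, keyword_classifier)
unknown = filter_message("someone@example.org", subject, keyword_classifier)
```

Because sender addresses can be spoofed, a check like this is usually paired with sender authentication rather than relied on alone.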

ML dataset: Training Data & Test Data
Testing data
is used to check the accuracy of the model.
is the unseen data used to test the ML model.
is thus used to evaluate the performance of the trained ML algorithm,
so that the model can be adjusted or optimized for improved results.
must represent the actual dataset.
must be large enough to generate meaningful predictions.
the model already "knows" the training data, but how it performs on
new test data shows whether it is working accurately or whether it requires more
training data.
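
The split into training and testing data, and the accuracy check on unseen data, can be sketched as follows. The helper names `train_test_split` and `accuracy`, the 25% test fraction, and the generated records are assumptions for illustration only; they are not taken from the slides.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, test_set):
    """Fraction of held-out (text, label) pairs that the model labels correctly."""
    correct = sum(1 for text, label in test_set if model(text) == label)
    return correct / len(test_set)


# Hypothetical labelled subjects (illustrative only).
data = [(f"subject {i}", "spam" if i % 2 else "ham") for i in range(20)]
train, test = train_test_split(data)

# A trivial baseline "model" that labels everything as ham.
baseline_accuracy = accuracy(lambda text: "ham", test)
```

The training portion is the larger one, and the test records never appear in the training list, so the accuracy reflects performance on data the model has not seen.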

ML dataset: Training, Validation & Testing Data Sets