An Overview of Machine Learning in Security
January 6, 2023
Devesh C Jinwala,
Professor, SVNIT and Adjunct Prof., CSE, IIT Jammu
Department of Computer Science and Engineering,
Sardar Vallabhbhai National Institute of Technology, SURAT
Devesh C Jinwala, Professor, CSE, SVNIT, Surat Chap1: Machine Learning in Security 1 / 63
Chap 1: An Overview of Machine Learning in Security: Topics
Introduction to the Course Contents, Review of the Basic Machine Learning
Concepts. Foundations of Machine Learning for Security: Artificial
Intelligence and Machine Learning.
Review of the ML techniques. Machine Learning problems viz. Classification,
Regression, Clustering, Association rule learning, Structured output,
Ranking. Linear Regression. Logistic Regression and Bayesian Classification.
Support Vector Machines, Decision Tree and Random Forest, Neural
Networks, DNNs, Ensemble learning. Principal Components Analysis.
Unsupervised learning algorithms: K-means for clustering problems. K-NN
(k nearest neighbours, a supervised instance-based method). Apriori algorithm
for association rule learning problems. Generative vs Discriminative learning. [4 hours]
What is Machine Learning?
Learning is the process of building a scientific model after discovering
knowledge from a sample data set or data sets.
Machine learning
is considered to be the process of applying a computing-based resource to
implement learning algorithms.
refers to algorithms and processes that learn in the sense of being able to
generalize from past data and experiences in order to predict future outcomes.
at its core, is a set of mathematical techniques, implemented on computer
systems, that enables a process of information mining, pattern discovery, and
drawing inferences from data.
An example: classifying animals as reptiles or mammals using the feature
"gives birth to live offspring".
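A minimal sketch of the animal example, assuming a hypothetical five-animal training set and a one-feature "majority label per feature value" learner (a decision stump). The point is that the rule mapping the feature to a class is derived from the labeled data rather than written by hand; the data and the function names `learn_rule` and `classify` are illustrative only.

```python
from collections import Counter

# Hypothetical toy training set: (gives_birth_to_live_offspring, label).
training_data = [
    (True, "mammal"),    # e.g., a dog
    (True, "mammal"),    # e.g., a whale
    (False, "reptile"),  # e.g., a lizard
    (False, "reptile"),  # e.g., a snake
    (False, "mammal"),   # e.g., a platypus, an egg-laying mammal
]

def learn_rule(data):
    """For each value of the single feature, remember the most frequent label."""
    counts = {}
    for feature_value, label in data:
        counts.setdefault(feature_value, Counter())[label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

def classify(rule, gives_birth_to_live_offspring):
    """Apply the learned rule to a new animal."""
    return rule[gives_birth_to_live_offspring]

rule = learn_rule(training_data)
print(rule)  # {True: 'mammal', False: 'reptile'}
```

Note that the learned rule misclassifies the platypus: a model generalizes only as well as its features and training data allow.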
Formal definition
Formally, machine learning is defined as the complex computation process of
automatic pattern recognition and intelligent decision making based on
training sample data.
What is Machine Learning (ML)?...
This also means that we need to equip the machine with the ability to mimic
human behavior.
ML is concerned with giving computers the ability to perform a task without
being explicitly commanded/programmed.
As an example, suppose we want to classify emails as promotional or
non-promotional.
A rule-based approach might classify an email as promotional if it contains the words
“Discount”, “Sale”, or “Free Gift”,
and as non-promotional if the sender's email address includes “.gov” or
“.edu”.
How would one do this with conventional programming?
What are the issues with the approach followed in conventional
programming?
It is challenging to come up with the rules. How so?
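In conventional programming, the rules above are hard-coded by a person. A minimal sketch, assuming that the sender-domain rule takes precedence over the keyword rule and that an email matching no rule is treated as non-promotional (neither choice is specified on the slide); the function name and sample addresses are hypothetical.

```python
# Hand-written rules taken from the slide; precedence and default are assumptions.
PROMOTIONAL_WORDS = ("discount", "sale", "free gift")
NON_PROMOTIONAL_DOMAINS = (".gov", ".edu")

def classify_email(sender: str, body: str) -> str:
    sender, body = sender.lower(), body.lower()
    # Assumed precedence: the sender-domain rule overrides the keyword rule.
    if any(domain in sender for domain in NON_PROMOTIONAL_DOMAINS):
        return "non-promotional"
    # Plain substring matching, exactly as a naive keyword rule would do.
    if any(word in body for word in PROMOTIONAL_WORDS):
        return "promotional"
    # Assumed default when no rule fires.
    return "non-promotional"

print(classify_email("offers@shop.example", "Big Discount this weekend"))
print(classify_email("registrar@univ.edu", "Free Gift for new students"))
print(classify_email("hr@company.example", "Wholesale supplier list attached"))
```

The third email is labeled promotional only because "wholesale" contains "sale". Every such false positive needs yet another hand-written exception, which is one reason rule engineering does not scale.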
What is Machine Learning? Motivation
From the other angle, why would an ML-based approach be needed in such
a scenario?
ML can super-charge the classification program by identifying the attributes
that distinguish promotional from non-promotional emails.
But for a machine to do that, we need to provide it with data.
Thus, the goal is for the machine to learn the rules directly from the data,
using what are known as ML algorithms.
ML can autonomously derive robust rules to automate the classification
process, thereby removing the need for manually engineered rules.
Again, how could it derive robust rules to automate the classification?
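One possible answer, sketched under stated assumptions: a multinomial Naive Bayes classifier (one of the Bayesian classifiers listed in the chapter topics) estimates from labeled emails how strongly each word indicates each class, instead of relying on keyword lists written by hand. The six training emails are hypothetical, tokenization is a simple lowercase whitespace split, and words never seen in training are ignored at prediction time; a production filter would use far more data and richer features.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train_naive_bayes(emails):
    """emails: list of (text, label). Learns class priors and per-class word counts."""
    label_counts = Counter()
    word_counts = {}
    for text, label in emails:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    n = sum(label_counts.values())
    priors = {label: c / n for label, c in label_counts.items()}
    return priors, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior score."""
    priors, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        counts = word_counts[label]
        total = sum(counts.values())
        score = math.log(prior)
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing keeps unseen-in-class words from zeroing the score.
            score += math.log((counts[word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled training data.
train = [
    ("huge discount on shoes", "promotional"),
    ("flash sale ends tonight", "promotional"),
    ("claim your free gift now", "promotional"),
    ("minutes of the faculty meeting", "non-promotional"),
    ("your exam schedule is attached", "non-promotional"),
    ("meeting moved to friday", "non-promotional"),
]
model = train_naive_bayes(train)
print(predict(model, "weekend sale with free shipping"))  # promotional
print(predict(model, "faculty meeting on friday"))        # non-promotional
```

Retraining on new labeled emails updates the "rules" (the word statistics) without anyone editing code.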
What is Machine Learning? Motivation...
Lineage of ML
Anatomy of Machine Learning: Where does it fit in?
It
Relationship between data-related fields
Machine learning overlaps with data mining
What Machine Learning is Not?
Artificial intelligence
refers to algorithmic solutions to complex
problems that are typically solved by humans.
AI systems are loosely defined as
machine-driven decision engines that can
achieve near-human-level intelligence.
Machine learning
is a core building block for AI.
helps us create AI, but is not the only way
to achieve it.
refers to statistical learning algorithms
that are able to create generalizable
abstractions (models) by seeing and
dissecting a dataset.
e.g., the functions of a car in a self-driving system.
Figure: ML is part of AI (source: Chio and Freeman, ML and Security..., O’Reilly Media)
Data Mining
Data Mining...
Data mining requires machine learning
Data mining to machine learning
The goal is to develop predictive models that enable a real-time cyber
response after a sequence of cybersecurity processes.
The response must include real-time data sampling, selection, analysis and
query, and mining of peta-scale data, with the goal of classifying and
detecting attacks and intrusions on a computer network.
Learning user patterns and/or behaviors is critical for intrusion detection
and attack prediction.
ML can automatically identify and describe structural patterns in the data,
theoretically explain the data, and predict patterns.
Such automatic and theoretical learning requires complex computation, which
calls for many machine-learning algorithms.
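To illustrate "learning user patterns and/or behaviors" for intrusion detection, here is a minimal sketch of one simple statistical technique, a per-user z-score baseline. It assumes hypothetical history data (daily failed-login counts per user) and an assumed threshold of 3 standard deviations; it is a teaching sketch, not the specific pipeline described on the slide.

```python
import statistics

def learn_baseline(history):
    """Learn each user's normal behavior as (mean, population std dev) of past daily counts."""
    return {user: (statistics.mean(counts), statistics.pstdev(counts))
            for user, counts in history.items()}

def is_anomalous(baseline, user, todays_count, threshold=3.0):
    """Flag counts more than `threshold` standard deviations above the user's mean."""
    mean, std = baseline[user]
    if std == 0:
        # No variation in history: any count above the usual value is suspicious.
        return todays_count > mean
    return (todays_count - mean) / std > threshold

# Hypothetical history of daily failed-login counts.
history = {
    "alice": [2, 1, 3, 2, 2, 1, 3],
    "bob": [0, 1, 0, 0, 1, 0, 0],
}
baseline = learn_baseline(history)
print(is_anomalous(baseline, "alice", 3))   # False: within alice's normal range
print(is_anomalous(baseline, "alice", 25))  # True: possible brute-force attempt
```

The baseline is learned from data per user, so the same count can be normal for one user and anomalous for another.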
Overview of ML Tasks and Examples
Common tasks that ML models perform
Three of the most common tasks ML models perform are
classification - e.g., classifying emails as promotional or non-promotional
prediction - e.g., predicting stock prices
regression - e.g., predicting how much a used car would sell for, given historical
data on recent used-car sales in the area
Other tasks include
synthesis & sampling
estimation of probability density and probability mass function
similarity matching
co-occurrence grouping
causal modeling
link profiling
making recommendations
image recognition
natural language processing
transcription
machine translation
anomaly detection, and so on.
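The difference between regression and classification can be seen in a minimal Python sketch: regression returns a number (a price), while classification returns one label from a fixed set. The used-car figures are made up, and the promotional keyword list (`PROMO_WORDS`) is a hypothetical hand-set rule standing in for a classifier that would normally be learned from labelled emails.

```python
from typing import List, Tuple

# Hypothetical keyword list; a real classifier would learn such cues from labelled data.
PROMO_WORDS = {"sale", "discount", "offer", "free"}


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Regression: ordinary least-squares fit of price = slope * age + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_price(slope: float, intercept: float, age: float) -> float:
    """Regression output is a continuous number."""
    return slope * age + intercept


def classify_email(text: str, threshold: int = 2) -> str:
    """Classification output is one of a fixed set of labels."""
    words = (w.strip(".,:;!?") for w in text.lower().split())
    hits = sum(1 for w in words if w in PROMO_WORDS)
    return "promotional" if hits >= threshold else "non-promotional"


# Made-up sales history: (age in years, sale price).
sales = [(1, 9000), (2, 8000), (3, 7000), (4, 6000)]
slope, intercept = fit_line(sales)
print(predict_price(slope, intercept, 5))                   # 5000.0
print(classify_email("Huge SALE: 50% discount on shoes!"))  # promotional
```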
The key development phases in ML used to solve the different tasks
The following key development phases are used to solve the tasks listed in
the previous slide.
Together, they form the machine learning model (MLM) development lifecycle.
Data gathering
Data preprocessing
Exploratory data analysis (EDA)
Feature engineering, including feature creation/extraction, feature selection
and dimensionality reduction
Training machine learning models
Model / algorithm selection
Testing and matching
Model monitoring
Model retraining
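Several of these phases can be strung together on a toy problem, as in the minimal sketch below. It rests on stated assumptions: the records are hypothetical (number of links and exclamation marks per email, already extracted as features), a nearest-centroid classifier stands in for whichever model would actually be selected, and EDA and model selection are left out for brevity.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# (num_links, num_exclamations, label); None marks a missing value.
Record = Tuple[Optional[float], Optional[float], str]
Point = Tuple[float, float]


def gather() -> List[Record]:
    """Data gathering: hypothetical hard-coded records standing in for a real source."""
    return [
        (5.0, 4.0, "spam"), (6.0, 5.0, "spam"), (7.0, 3.0, "spam"), (None, 2.0, "spam"),
        (0.0, 0.0, "ham"), (1.0, 1.0, "ham"), (0.0, 1.0, "ham"), (1.0, 0.0, "ham"),
        (6.0, 4.0, "spam"), (0.0, 0.0, "ham"),
    ]


def preprocess(rows: List[Record]) -> List[Record]:
    """Data preprocessing: drop records with missing feature values."""
    return [r for r in rows if r[0] is not None and r[1] is not None]


def split(rows: List[Record], train_fraction: float) -> Tuple[List[Record], List[Record]]:
    """Hold out the tail of the data as the test set (no shuffling, for determinism)."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def train(rows: List[Record]) -> Dict[str, Point]:
    """Training: a nearest-centroid model stores the mean feature vector per label."""
    centroids: Dict[str, Point] = {}
    for label in sorted({r[2] for r in rows}):
        pts = [r for r in rows if r[2] == label]
        centroids[label] = (mean(p[0] for p in pts), mean(p[1] for p in pts))
    return centroids


def predict(model: Dict[str, Point], x: Point) -> str:
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda lab: (model[lab][0] - x[0]) ** 2 + (model[lab][1] - x[1]) ** 2)


def evaluate(model: Dict[str, Point], rows: List[Record]) -> float:
    """Testing: accuracy on the held-out rows."""
    correct = sum(predict(model, (r[0], r[1])) == r[2] for r in rows)
    return correct / len(rows)


def needs_retraining(accuracy: float, threshold: float = 0.9) -> bool:
    """Monitoring: flag the model for retraining if accuracy drops below a threshold."""
    return accuracy < threshold


rows = preprocess(gather())
train_rows, test_rows = split(rows, train_fraction=0.8)
model = train(train_rows)
accuracy = evaluate(model, test_rows)
print(f"test accuracy = {accuracy:.2f}, retrain = {needs_retraining(accuracy)}")
```

The 0.9 retraining threshold is an arbitrary illustrative choice; in practice it would be set from the application's tolerance for errors.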
A Few Example Applications of some ML tasks
How the power of ML can be exploited: e.g., for analyzing YouTube viewing habits
Say that, in one application, the decision model identifies a significant relationship
between being a data scientist and liking cat videos
the machine analyzes which videos data scientists enjoy watching on YouTube,
based on user engagement
user engagement is measured in likes, subscriptions and repeat views.
the model developed would be specific to this application, as opposed to
any other application with different objectives.
(continued on the next slide)
A Few Examples of ML tasks
How the power of ML can be exploited: e.g., for predicting baseball MVP awards
Say that, in another application, the decision model identifies patterns among the
physical traits of baseball players and
estimates their likelihood of winning the season's Most Valuable Player
(MVP) award.
here, the machine assesses the physical attributes of previous baseball MVPs,
along with other features such as age and education.
This model differs from the one in the previous slide, and hence
produces a different output. However, at no stage is the decision model explicitly
programmed to produce either of the two outcomes.
the model only decodes the complex patterns in the input data and uses
machine learning to find connections without human help.
this also means that a related dataset collected from another time period,
with fewer or more data points, might push the model to produce a
slightly different output.
ML: Improving predictions based on experience
ML can also improve its predictions based on experience, mimicking the way
humans base their decisions on experience.
For this purpose, ML uses exposure to data to improve its decision making.
Exposure to the data points
provides experience and enables the model to familiarize itself with patterns in
the data,
deepens the model's understanding of those patterns, including the significance of
changes in the data,
and, based on this, allows an effective self-learning model to be constructed.
A common example of self-learning is a system for detecting spam email
messages.
Let us investigate this scenario.
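How "experience" can change a spam filter's decisions is shown in the minimal sketch below. It uses an online multinomial naive Bayes classifier with Laplace smoothing; this is a commonly used technique chosen here for illustration, not necessarily the model in the spam-detection example that follows. The class name `OnlineSpamFilter` and the training sentences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set

LABELS = ("spam", "ham")


class OnlineSpamFilter:
    """Multinomial naive Bayes whose counts are updated as each labelled email arrives."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = {lab: Counter() for lab in LABELS}
        self.doc_counts: Counter = Counter()
        self.vocab: Set[str] = set()

    def learn(self, text: str, label: str) -> None:
        """Experience: fold one labelled example into the counts."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def _log_score(self, words: List[str], label: str) -> float:
        total_docs = sum(self.doc_counts.values())
        # Laplace-smoothed class prior.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(LABELS)))
        total_words = sum(self.word_counts[label].values())
        denom = total_words + max(len(self.vocab), 1)  # guard against an empty vocabulary
        for w in words:
            # Laplace-smoothed word likelihood, so unseen words never give log(0).
            score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        words = text.lower().split()
        return max(LABELS, key=lambda lab: self._log_score(words, lab))


f = OnlineSpamFilter()
f.learn("win cash now", "spam")
f.learn("lunch meeting now", "ham")
print(f.predict("claim your lunch prize"))  # ham: only "lunch" is known, from a ham email
f.learn("claim your prize", "spam")
f.learn("prize draw winner", "spam")
print(f.predict("claim your lunch prize"))  # spam: more examples shift the decision
```

No rule about "prize" or "claim" is ever written by hand; the changed decision comes entirely from the additional labelled examples.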
Another Example: An ML model for detecting spam email messages.
[Figure: spam email detection model, annotated "assumptions". Source: Eman M. Bahgat et al.]
This could render the model susceptible to bad predictions.
(continued)
Another Example: An ML model for detecting spam email messages (continued)
[Figure. Source: Eman M. Bahgat et al.]
Another Example: An ML model for detecting spam email messages...
ML tasks: more data does not always mean better decisions
While data is the source of the self-learning process, more data does not
always equate to better decisions;
the input data must be relevant to the objective.
In Data and Goliath: The Hidden Battles to Collect Your Data and Control
Your World, Bruce Schneier writes that,
“When looking for the needle, the last thing you want to do is pile lots more
hay on it.”
that is, adding irrelevant data can be counter-productive to achieving the
desired result.
In addition, the amount of input data should be compatible with the available
processing resources and time.
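Schneier's point about piling on hay can be illustrated with a 1-nearest-neighbour classifier, a technique chosen here only because its sensitivity to irrelevant features is easy to see. The data are hypothetical: feature 0 (number of links) is assumed relevant to spam, feature 1 (an arbitrary numeric message ID) is assumed irrelevant. Adding the irrelevant feature flips the prediction for a query assumed to be spam.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[Tuple[float, float], str]


def nearest_label(rows: List[Row], query: Sequence[float], use: Sequence[int]) -> str:
    """1-nearest-neighbour label, measuring distance only on the feature indices in `use`."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in use))
    best = min(rows, key=lambda row: dist(row[0], query))
    return best[1]


# Hypothetical data. Feature 0: number of links in the email (relevant).
# Feature 1: an arbitrary numeric message ID (irrelevant "hay").
training_rows: List[Row] = [((8.0, 0.0), "spam"), ((0.0, 100.0), "ham")]
query = (7.0, 90.0)  # assume its true label is "spam"

print(nearest_label(training_rows, query, use=[0]))     # spam
print(nearest_label(training_rows, query, use=[0, 1]))  # ham
```

Selecting only relevant features, or scaling them, is part of the feature engineering phase listed earlier and is the usual remedy.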
ML dataset: Training Data & Test Data
In ML, the input data is typically split into training data and test data.
Training data
is the initial subset of the original dataset that is used to develop the model;
it is fed into the ML model to discover and learn patterns.
is generally larger than the test dataset.
is well known to the model, as it is used to train the model.
e.g., in the spam email detection example, false positives similar to the PayPal
auto-response message discussed earlier might be detected in the
training data.
modifications must then be made to the model, e.g., email notifications
issued from PayPal's sending address should be excluded
from spam filtering.
the model can then be trained to detect such errors automatically (by
analyzing historical examples of spam messages and deciphering their patterns),
without direct human intervention.
subsequently, after developing the model based on patterns extracted from the
training data, one can test the model on the remaining data, known as the test
data.
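The fix described above for false positives found in the training data (excluding a trusted sender from spam filtering) can be sketched as an allowlist wrapped around a classifier. This is a minimal illustration under stated assumptions: `keyword_model` is a hypothetical stand-in for a trained model, and every email address is a placeholder, not a real sender.

```python
from typing import Callable, Iterable, List, Set, Tuple

Email = Tuple[str, str, str]  # (sender, body, true_label)


def keyword_model(body: str) -> str:
    """Hypothetical stand-in for a trained model: flags any mention of "payment"."""
    return "spam" if "payment" in body.lower() else "ham"


def with_allowlist(model: Callable[[str], str], allowlist: Set[str]) -> Callable[[str, str], str]:
    """Wrap a model so that mail from allowlisted senders is never filtered as spam."""
    def classify(sender: str, body: str) -> str:
        if sender in allowlist:
            return "ham"
        return model(body)
    return classify


def false_positives(classify: Callable[[str, str], str], emails: Iterable[Email]) -> int:
    """Count legitimate (ham) emails that the classifier wrongly marks as spam."""
    return sum(1 for sender, body, label in emails
               if label == "ham" and classify(sender, body) == "spam")


# Hypothetical training emails; the addresses are placeholders.
training: List[Email] = [
    ("alerts@trusted-sender.example", "You received a payment", "ham"),
    ("promo@unknown.example", "Urgent payment required, click here", "spam"),
    ("friend@mail.example", "Lunch tomorrow?", "ham"),
]

before = with_allowlist(keyword_model, set())
after = with_allowlist(keyword_model, {"alerts@trusted-sender.example"})
print(false_positives(before, training), false_positives(after, training))  # 1 0
```

A hand-written allowlist is the simplest such correction; the slide's further step is to let the model learn these exceptions from historical examples instead.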
ML dataset: Training Data & Test Data
Testing data
is used to check the accuracy of the model.
is the unseen data used to test the ML model.
thus, is used to evaluate the performance and progress of the training of ML algorithms, and to adjust or optimize them for improved results.
must be representative of the actual dataset.
must be large enough to generate meaningful predictions.
the model already "knows" the training data; how it performs on new test data shows whether it is working accurately or requires more training data.
hence, test data provides a final, real-world check on an unseen dataset to confirm that the machine learning algorithm was trained effectively.
normally, the dataset is split 80% for training and 20% for testing.
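The 80/20 split above can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming the samples fit in a list; the function name `split_train_test` and the fixed seed are hypothetical choices made only for reproducibility.

```python
import random

def split_train_test(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the samples, then hold out `test_fraction` of them as test data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> the same split on every run
    n_test = round(len(shuffled) * test_fraction)
    n_train = len(shuffled) - n_test
    return shuffled[:n_train], shuffled[n_train:]

data = list(range(100))  # stand-in for 100 labeled examples
train, test = split_train_test(data)
print(len(train), len(test))  # 80 20
```

Shuffling before splitting matters: if the data is ordered (for example, all records of one class first), an unshuffled split would give the test set a different distribution from the training set. In practice, scikit-learn's `sklearn.model_selection.train_test_split` with `test_size=0.2` does the same job.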
ML dataset: Training, Validation & Testing Data Sets...
Training dataset
As seen already, this type of data is used to build (fit) the machine learning model.
The data scientist feeds the algorithm input data, each input paired with an expected output.
The model evaluates the data repeatedly to learn more about the data's behavior,
and then adjusts itself to serve its intended purpose.
Testing dataset
After the model is built, testing data once again validates that it can make accurate predictions.
while training and validation data include labels that are used to monitor the model's performance metrics, the labels of the testing data are withheld from the model; they are used only to score its predictions.
Test data provides a final, real-world check on an unseen dataset to confirm that the ML algorithm was trained effectively.
ML dataset: Training, Validation & Testing Data Sets...
Validation dataset
during training, validation data introduces new data to the model that it has not evaluated before.
provides the first test against unseen data, allowing data scientists to evaluate how well the model makes predictions on new data.
not all data scientists use a separate validation set, but it can provide helpful information for optimizing hyperparameters, which control how the model learns from the data.
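A three-way split can be sketched the same way. The 60/20/20 proportions, the seed, and the function name `split_three_way` are illustrative assumptions, not values prescribed by the slides.

```python
import random

def split_three_way(samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then carve out test, validation and training partitions."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test = round(n * test_fraction)
    n_val = round(n * val_fraction)
    test = shuffled[:n_test]                  # kept hidden until the very end
    val = shuffled[n_test:n_test + n_val]     # used to compare hyperparameter settings
    train = shuffled[n_test + n_val:]         # used to fit the model's parameters
    return train, val, test

train, val, test = split_three_way(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```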
ML dataset: Implications of using differing datasets
Model built using just the training dataset
becomes highly biased toward that dataset.
most likely will not generalize to unseen data, unless the dataset used for training represents the entire population.
thus, it overfits the training dataset.
Model built with training & validation datasets:
when evaluated on the validation dataset, it performs much better than the earlier model, which was trained on the entire dataset with nothing held out.
however, when tuned for a long time, the model becomes biased toward the validation dataset:
the hyperparameters are adjusted in each iteration so that the model performs better on the validation dataset.
thus, details of the validation dataset leak into the training process.
ML dataset: Implications of using differing datasets...
Model built with training, validation & test datasets:
uses a third split of the original dataset, which is kept hidden from both the training and the evaluation (tuning) process.
thus, it has a greater likelihood of generalizing to unseen data than the two cases mentioned above.
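The workflow above (tune on validation data, touch the test data only once) can be illustrated with a toy k-nearest-neighbour classifier whose hyperparameter is k. Everything here is a hypothetical sketch: the synthetic 1-D data (label 1 when x > 5, with 10% label noise), the split sizes, and the candidate values of k are assumptions chosen for illustration.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, data, k):
    correct = sum(knn_predict(train, x, k) == y for x, y in data)
    return correct / len(data)

rng = random.Random(1)
# Hypothetical data: label is 1 when x > 5, then 10% of labels are flipped as noise
data = []
for _ in range(300):
    x = rng.uniform(0, 10)
    y = int(x > 5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

train, val, test = data[:180], data[180:240], data[240:]

# Choose the hyperparameter k using ONLY the validation set (odd k avoids tied votes)
candidate_ks = [1, 3, 5, 9, 15]
best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))

# The test set is used exactly once, after tuning is finished
test_acc = accuracy(train, test, best_k)
print(best_k, round(test_acc, 2))
```

Because `best_k` is chosen by looking at `val`, the validation accuracy is an optimistic estimate; `test_acc` is the number to report, and it stays honest only if the test set is not reused for further tuning.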
Categories of ML methods
Categories of Machine Learning methods/mechanisms
Figure: Machine Learning Techniques
ML methods based on training patterns
Supervised Learning
Supervised learning methods
imitate our own ability to extract patterns from known examples and use that extracted insight to engineer a repeatable outcome.
focus on understanding known input-output combinations in order to learn the underlying patterns.
here, the ML model analyzes and deciphers the relationship between input and output data to learn those patterns.
Figure: Supervised Learning
Supervised Learning...
Supervised learning methods
An analogy: how Toyota designed its first car prototype from a Chevrolet car.
Toyota created its first vehicle prototype after taking apart a Chevrolet car and unlocking its design, which Chevrolet had kept secret in America.
this process of understanding a known input-output combination is what is seen in supervised learning.
the model analyzes and deciphers the relationship between input and output data to learn the underlying patterns.
Input data → independent variable (uppercase "X"); output data → dependent variable (lowercase "y").
Figure: Supervised Learning
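A minimal sketch of learning the X → y relationship from labeled pairs: ordinary least-squares fitting of a straight line y ≈ w·x + b. It assumes a single input feature, so X is just a list here rather than a feature matrix; the function name and the toy data (generated from y = 2x + 1) are hypothetical.

```python
def fit_simple_linear_regression(X, y):
    """Least-squares estimates of slope w and intercept b for y ≈ w*x + b."""
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(X, y))
    var_x = sum((xi - mean_x) ** 2 for xi in X)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

# Hypothetical labeled examples (known input-output pairs) from y = 2x + 1
X = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_simple_linear_regression(X, y)
prediction = w * 5.0 + b  # predict y for an unseen input x = 5
print(w, b, prediction)   # 2.0 1.0 11.0
```

Once w and b are learned from the known input-output pairs, the model can produce y for inputs it has never seen, which is the repeatable outcome described above.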
Supervised Learning...: Another view
[Ref: https://www.crayondata.com/machine-learning-explained-understanding-supervised-unsupervised-and-reinforcement-learning/]
Supervised Learning algorithms: use cases...
The most common use cases of supervised learning are as follows:
Spam detection, discussed before
Biometrics
used to store biological information of human beings, such as fingerprints, iris textures, eyes, swabs, and so on.
every time one wants to unlock a device, it asks one to authenticate, either through a fingerprint or facial recognition.
Object Recognition
e.g., CAPTCHA, where one has to choose multiple images as per an instruction to confirm that one is a human.
Supervised Learning algorithms...
Supervised Learning algorithms
are categorized based on the structures and objective functions of the learning algorithms.
are commonly characterized by two types of problems, viz. Classification and Regression.
Popular categories of algorithms include
Linear and Logistic Regression,
Artificial Neural Networks (ANN),
Support Vector Machines (SVM), and
Decision trees.
Some (e.g., Bayesian classifiers) adopt a Bayesian approach to knowledge discovery, using probabilities of previously observed events to infer the probabilities of new events.
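The Bayesian approach can be illustrated with a tiny multinomial Naive Bayes spam filter, which also ties back to the spam-detection use case. This is a sketch under simplifying assumptions: whitespace tokenization, add-one (Laplace) smoothing, and a four-message hypothetical training set; the function names are illustrative, not from any library.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Count word frequencies per class and how often each class occurs."""
    word_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, class_counts, vocab

def predict_naive_bayes(model, doc):
    """Pick the class with the highest log P(class) + sum of log P(word | class)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Prior: probability of the class among previously observed messages
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in doc.lower().split():
            # Add-one smoothing so an unseen word does not zero out the whole product
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = ["win money now", "claim your free prize now",
        "meeting at noon tomorrow", "project report attached"]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)
print(predict_naive_bayes(model, "free money prize"))         # spam
print(predict_naive_bayes(model, "meeting report tomorrow"))  # ham
```

Log-probabilities are summed instead of multiplying raw probabilities to avoid floating-point underflow on long messages.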
Supervised Learning algorithms...: Advantages
the meaning of the labels (classes) is unambiguous, since they are defined in advance.
the results produced by supervised methods are generally more accurate and reliable than those of other ML approaches, such as unsupervised learning.
Supervised Learning algorithms...: Disadvantages
Computation (training) time can be very large for supervised learning.
Unwanted (irrelevant) data reduces efficiency and adds a lot of computation time to training.
Pre-processing of data is a major challenge.
Models always need updates (retraining) as new data arrives.
Supervised algorithms can easily overfit.
Supervised Learning algorithms...: Real world Applications
Active Learning, i.e., Smart Data Labeling with ML
In ML, Data Labeling (DaL) is the process of identifying raw data (images, text files, videos, etc.) and adding one or more meaningful and informative labels to provide context so that an ML model can learn from it.
In conventional DaL, label tags are attached to data points by a human, either an in-house labeler or outsourced personnel.
However, considering the massive volume of data, manual labeling can be time-consuming, costly, and difficult to coordinate.
Figure: Active Learning for Smart Labeling
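One common active-learning strategy, uncertainty sampling, can be sketched as follows: a model trained on a few labeled points ranks the unlabeled pool by how unsure it is, and only the most uncertain items are sent to a human labeler. The nearest-centroid model, the 1-D feature, the "benign"/"malicious" labels, and the labeling budget are all hypothetical choices for illustration; real labeling services may use different models and query strategies.

```python
def fit_centroids(labeled):
    """Nearest-centroid model: mean feature value of each class (1-D feature)."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def margin(centroids, x):
    """Gap between the two closest centroids; a small gap means the model is unsure."""
    d = sorted(abs(x - c) for c in centroids.values())
    return d[1] - d[0]

def select_queries(labeled, pool, budget):
    """Return the `budget` unlabeled points the current model is least sure about."""
    centroids = fit_centroids(labeled)
    return sorted(pool, key=lambda x: margin(centroids, x))[:budget]

labeled = [(1.0, "benign"), (2.0, "benign"), (8.0, "malicious"), (9.0, "malicious")]
pool = [0.5, 3.0, 4.9, 5.2, 7.5, 10.0]  # hypothetical unlabeled data
queries = select_queries(labeled, pool, budget=2)
print(queries)  # [4.9, 5.2]: the points nearest the decision boundary
```

In a full loop, the queried points would be labeled by a human, added to `labeled`, and the model refit before the next round, so labeling effort is spent where it most improves the model.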
Supervised Learning algorithms...: Real world Applications...
[Ref: https://aws.amazon.com/sagemaker/data-labeling/what-is-data-labeling/]
Supervised Learning algorithms...: Real world Applications...
An ethical credit scoring system for banks and financial institutions
Banking the unbanked, i.e., developing a credit rating for those who do not have credit cards and hence have no formal credit score.
In one implementation, the transactions made by different account numbers (region, mode of transaction, etc.), the per capita income per area, and the job titles of the account holders were used to develop such a system.
Understanding Youth Sentiments Through Artificial Intelligence
a real-world application in which a data analysis pipeline was developed for sentiment analysis.
its aim was to understand youth sentiments by analyzing the aspirations, fears, and thoughts of young people, collected by scraping the web and youth-led media.
Supervised Learning algorithms...: Real world Applications...
Medical applications
an application was developed to predict patient risk (e.g., identifying high-risk patients) or to forecast the likelihood of congestive heart failure.
Public safety application
a tool was built for analyzing and classifying cases of sexual abuse in the workplace to identify patterns of such behavior.
Unsupervised Learning...
Unsupervised Learning methods
here, one does not have to guide the model with pre-labeled input/output data.
the model is allowed to work on its own to discover patterns and information that were previously undetected.
are the ones where no target or label is given for the sample data.
are designed to summarize the key features of the data and to form natural clusters of input patterns, given a particular cost function.
Figure: Un-Supervised Learning
Unsupervised Learning...
Unsupervised Learning methods
thus, draw abstractions from unlabeled datasets and apply these to new data.
the example methods include
Hierarchical and K-Means clustering.
K-NN (k nearest neighbours); note that k-NN classification itself requires labels and is therefore usually classed as supervised.
Principal Component Analysis.
Singular Value Decomposition.
Independent Component Analysis.
Self-organizing maps.
are difficult to evaluate, because no labels (ground truth) are available to check the results against.
Figure: Un-Supervised Learning
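K-means, one of the clustering methods listed above, can be sketched with Lloyd's algorithm: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, until the centroids stop changing. The 2-D points, k = 2, random initialization from the data, and the iteration cap are illustrative assumptions.

```python
import random

def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=100, seed=0):
    """Plain (Lloyd's) k-means on 2-D points given as (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled points forming two visible groups
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))  # [(1.5, 1.5), (8.5, 8.5)]
```

No labels are supplied: the two groups emerge purely from distances between points. The result can depend on the initial centroids, which is why practical implementations typically run several random restarts.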
Un-Supervised Learning...: Another view
[Ref: https://www.crayondata.com/machine-learning-explained-understanding-supervised-unsupervised-and-reinforcement-learning/]
Un-Supervised Learning algorithms...: Advantages
less complexity in comparison with supervised learning.
no one is needed to understand and then label the data inputs.
it is often easier to obtain unlabeled data.
[Ref: https://omdena.com/blog/supervised-and-unsupervised-machine-learning/]
Un-Supervised Learning algorithms...: Disadvantages
Less accurate results.
The results of the analysis cannot be ascertained, since there are no labels against which to check them.
Source: https://omdena.com/blog/supervised-and-unsupervised-machine-learning/
Un-Supervised Learning algorithms...: Real-world Applications
An anomaly detection system developed using USML (unsupervised machine learning).
The system is capable of capturing sudden vegetation changes, which can be
used as an alert mechanism to provide immediate relief to the people and
communities in need.
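A minimal sketch of the unsupervised idea behind such an alert mechanism, assuming the input is an unlabeled time series of a vegetation index such as NDVI (the slides do not name the input, and the values below are made up): a reading is flagged when its robust z-score, computed from the median and the median absolute deviation (MAD) of the series itself, exceeds a threshold. The MAD rule, the threshold of 3.5 and the function name `detect_anomalies` are illustrative choices, not the method of the system described above.

```python
import statistics


def detect_anomalies(series, threshold=3.5):
    """Return indices whose robust (median/MAD-based) z-score exceeds `threshold`.

    No labels are used: what counts as "normal" is inferred from the data itself.
    """
    med = statistics.median(series)
    mad = statistics.median([abs(x - med) for x in series])
    if mad == 0:                      # all readings identical: nothing stands out
        return []
    # 0.6745 rescales MAD so the score is comparable to a standard z-score for normal data.
    return [i for i, x in enumerate(series) if abs(0.6745 * (x - med) / mad) > threshold]


# Hypothetical weekly vegetation-index readings with a sudden drop at week 6.
ndvi = [0.71, 0.69, 0.72, 0.70, 0.68, 0.71, 0.22, 0.70, 0.69]
print(detect_anomalies(ndvi))   # -> [6]
```

The median and MAD are used instead of the mean and standard deviation so that the anomaly itself does not distort the notion of "normal".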
Besides, USML is commonly used for:
Optical character recognition (OCR)
Search engines
Computer vision
Classifying DNA sequences
Detecting fraud, e.g., credit card and internet fraud
Medical diagnosis
Natural language processing
Speech and handwriting recognition
Economics and finance
Recommendation engines, such as those used by Netflix and Amazon
Supervised & Un-Supervised Learning algorithms
Supervised learning uses labeled data.
Unsupervised learning uses unlabeled data.
Another difference often cited is that supervised learning uses off-line analysis, whereas unsupervised learning uses real-time analysis of data.
In supervised learning (SL) the number of classes is known, whereas in unsupervised learning the number of classes is unknown.
The results of supervised learning are accurate and reliable; on the other hand, the results of unsupervised learning are only moderately accurate and reliable.
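To contrast the two settings, here is a minimal sketch of a simple supervised learner: a nearest-centroid classifier that computes one centroid per class from labeled data, so the number of classes comes directly from the labels instead of being chosen in advance as k is in K-Means. The class names `benign`/`malicious`, the feature values and the function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_nearest_centroid(samples, labels):
    """Supervised training: average the feature vectors of each labeled class."""
    groups = defaultdict(list)
    for x, label in zip(samples, labels):
        groups[label].append(x)
    # One centroid per class; the set of classes is read off the labels.
    return {label: tuple(sum(col) / len(xs) for col in zip(*xs)) for label, xs in groups.items()}


def predict(centroids, x):
    """Predict the label whose class centroid is closest to `x`."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Labeled training data (hypothetical features, e.g. request rate and payload size).
X = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
y = ["benign", "benign", "benign", "malicious", "malicious", "malicious"]

model = fit_nearest_centroid(X, y)
print(sorted(model))                 # the classes (2 of them) are known from the labels
print(predict(model, (8.8, 8.5)))    # -> malicious
```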
Supervised & Un-Supervised Learning algorithms
Reinforcement Learning
Unlike SL and USL, reinforcement learning builds its prediction model by
gaining feedback from random trial and error and leveraging insight from
previous iterations.
The aim is to achieve a specific goal (output) by randomly trialling a vast number of possible input combinations and grading their performance.
This can best be explained using a video game analogy.
The algorithms train the model through continuous learning.
A standard reinforcement learning model has measurable performance criteria against which outputs are graded.
In the case of self-driving vehicles, avoiding a crash earns a positive score; in the case of chess, avoiding defeat likewise receives a positive assessment.
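The trial-and-error loop can be sketched with an epsilon-greedy multi-armed bandit. This is a swapped-in, simpler setting than full reinforcement learning (there are no states), but it shows actions being tried at random, graded by a reward, and the estimates from previous iterations steering later choices. The three actions, their hidden reward probabilities, epsilon = 0.1 and the function name are hypothetical.

```python
import random


def epsilon_greedy_bandit(reward_probs, steps=5000, epsilon=0.1, seed=42):
    """Learn which action pays best purely from trial-and-error feedback."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions   # running average reward per action
    counts = [0] * n_actions        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:                  # explore: random trial
            a = rng.randrange(n_actions)
        else:                                       # exploit: best action so far
            a = max(range(n_actions), key=lambda i: estimates[i])
        # The environment grades the action: reward 1 with the action's hidden probability.
        reward = 1.0 if rng.random() < reward_probs[a] else 0.0
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]   # incremental mean
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print([round(e, 2) for e in estimates], counts)
```

The small epsilon keeps some random exploration going, so an action that looked poor early on still gets re-tested.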
Q Learning
Q-learning is a specific algorithmic example of reinforcement learning.
It can be understood through the Pac-Man game, as follows.
Three main components:
States could be the challenges, obstacles or pathways that exist in the video game.
"A" could depict the set of possible actions in response to these states, limited to left, right, up, and down movements, as well as combinations thereof.
"Q" could depict the model's value, which starts at an initial value of 0.
As the game progresses, two main things happen:
Q drops as negative things occur after a given state/action.
Q increases as positive things occur after a given state/action.
In Q-learning, the machine learns to choose, for a given state, the action that generates or preserves the highest Q value.
The model records its results (rewards and penalties) and how they impact its Q level, and stores those values to inform and optimize its future actions.
This is computationally expensive, since a Q value must be stored and updated for every state/action pair.
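A minimal sketch of tabular Q-learning, assuming a drastically simplified "Pac-Man" corridor of five cells: the agent starts in cell 0, can only move left or right, earns +1 for reaching the pellet in cell 4, and pays -0.01 per move. The environment, the rewards and the hyperparameters (learning rate alpha = 0.5, discount gamma = 0.9, exploration rate epsilon = 0.2) are hypothetical. Each step applies the standard update Q(s, a) += alpha * (r + gamma * max over a' of Q(s', a') - Q(s, a)), so Q rises after rewards and drops after penalties.

```python
import random

N_STATES = 5                 # corridor cells 0..4; cell 4 holds the pellet (hypothetical)
ACTIONS = ("left", "right")  # reduced action set "A"


def step(state, action):
    """Hypothetical environment: return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == "left" else min(N_STATES - 1, state + 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True            # reward: pellet eaten, episode ends
    return nxt, -0.01, False             # small penalty for every other move


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=1):
    rng = random.Random(seed)
    # Q starts at 0 for every state/action pair.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:   # explore: random action
                action = rng.choice(ACTIONS)
            else:                        # exploit: action with the highest Q for this state
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            # Terminal states have no future value.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update: move Q(s, a) toward reward + discounted best future Q.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)   # greedy action in each non-terminal cell
```

The dictionary `q` holds one entry per state/action pair; with the states of a real game this table grows very large, which is what makes the approach expensive.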
An Overview of ML tasks ...to be continued