
An Overview of Machine Learning in Security

The document provides an overview of machine learning in security. It defines machine learning as the process of applying computing resources to implement algorithms that can learn from data to predict outcomes. Various machine learning techniques are discussed, including linear regression, logistic regression, Bayesian classification, support vector machines, decision trees, neural networks, and ensemble learning. Machine learning in security involves using these algorithms to identify patterns in data that can help detect anomalies and threats.

Uploaded by Pankaj Singha
Copyright © All Rights Reserved

Chap1: Machine Learning in Security: An Overview #1

Devesh C Jinwala, Professor, CSE, SVNIT, Surat

January 6, 2023

Devesh C Jinwala,
Professor, SVNIT and Adjunct Prof., CSE, IIT Jammu
Department of Computer Science and Engineering,
Sardar Vallabhbhai National Institute of Technology, SURAT
Devesh C Jinwala, Professor, CSE, SVNIT, Surat Chap1: Machine Learning in Security 1 / 63
Chap 1: An Overview of Machine Learning in Security: Topics
Introduction to the Course Contents, Review of the Basic Machine Learning
Concepts. Foundations of Machine Learning for Security: Artificial
Intelligence and Machine Learning.
Review of the ML techniques. Machine Learning problems, viz. Classification,
Regression, Clustering, Association rule learning, Structured output,
Ranking. Linear Regression. Logistic Regression and Bayesian Classification.
Support Vector Machines, Decision Tree and Random Forest, Neural
Networks, DNNs, Ensemble learning. Principal Component Analysis.
Unsupervised learning algorithms: K-means for clustering problems. K-NN
(k-nearest neighbours). Apriori algorithm for association rule learning
problems. Generative vs Discriminative learning. [4 hours]

What is Machine Learning?
Learning is the process of building a scientific model after discovering
knowledge from a sample data set or data sets.
Machine learning
is considered to be the process of applying a computing-based resource to
implement learning algorithms.
refers to algorithms and processes that learn, in the sense of being able to
generalize from past data and experiences in order to predict future outcomes.
at its core, is a set of mathematical techniques, implemented on computer
systems, that enables a process of information mining, pattern discovery, and
drawing inferences from data.
An example: classifying animals as reptiles or mammals using the feature
“gives birth to live offspring”.
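To make the animal example concrete, here is a minimal Python sketch in which the classification rule is not written by hand but derived from labelled data. It assumes a tiny hypothetical dataset and a deliberately simple learning strategy (majority vote over the single feature “gives birth to live offspring”); the data, the function names `learn_rule` and `classify`, and the strategy itself are illustrative assumptions, not the method used in the course.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (gives_birth_to_live_offspring, label).
# These rows are illustrative assumptions, not data from the slides.
TRAINING_DATA = [
    (True, "mammal"),
    (True, "mammal"),
    (True, "mammal"),
    (False, "reptile"),
    (False, "reptile"),
    (False, "mammal"),  # e.g. a platypus, an egg-laying mammal (a noisy case)
    (False, "reptile"),
]


def learn_rule(data):
    """Learn a rule that maps each feature value to its most frequent label."""
    counts = defaultdict(Counter)
    for feature_value, label in data:
        counts[feature_value][label] += 1
    # For each observed feature value, keep the majority label.
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def classify(rule, gives_birth):
    """Apply the learned rule to a new animal."""
    return rule[gives_birth]


rule = learn_rule(TRAINING_DATA)
print(rule)  # {True: 'mammal', False: 'reptile'}
```

The single noisy row does not change the learned rule because the majority of egg-laying examples are reptiles; the rule comes from the data, not from the programmer.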

Formal definition
Formally, machine learning is defined as the complex computational process of
automatic pattern recognition and intelligent decision-making based on
training sample data.

What is Machine Learning (ML)?...
This also means that we need to equip the machine with the ability to mimic
human behavior.
ML is concerned with giving computers the ability to perform a task without
being explicitly commanded/programmed.
As an example, suppose we want to classify emails into promotional and
non-promotional emails.
For instance, one rule might classify an email as promotional if it contains the words
“Discount”, “Sale”, or “Free Gift”.
Another might classify an email as non-promotional if the email address includes “.gov” or
“.edu”.
How would one do so with conventional programming?
What are the issues with the approach followed in conventional
programming?
It is challenging to come up with the rules. How so?
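The conventional-programming approach to the email example can be sketched as a set of hand-written rules. This is a minimal illustration only: the two rules come from the slide, but the precedence (checking the sender's domain first), the default label, the lower-case substring matching, and the function name `classify_email` are assumptions made for this sketch.

```python
# Hand-written rules in the conventional-programming style.
# Assumption: the .gov/.edu rule takes precedence over the keyword rule,
# and an email that matches no rule defaults to "non-promotional".
PROMO_KEYWORDS = ("discount", "sale", "free gift")
NON_PROMO_DOMAINS = (".gov", ".edu")


def classify_email(sender: str, body: str) -> str:
    sender = sender.lower()
    text = body.lower()
    # Rule from the slide: ".gov" / ".edu" addresses are non-promotional.
    if any(domain in sender for domain in NON_PROMO_DOMAINS):
        return "non-promotional"
    # Rule from the slide: promotional keywords mark the email as promotional.
    if any(keyword in text for keyword in PROMO_KEYWORDS):
        return "promotional"
    # Hypothetical default when no rule fires.
    return "non-promotional"


print(classify_email("deals@shop.com", "Big Discount today"))    # promotional
print(classify_email("registrar@uni.edu", "Free gift for alumni"))  # non-promotional
```

Even this tiny version exposes the issues the slide asks about: “sale” also matches inside words such as “wholesale”, new spam phrasing needs new rules, and every precedence decision has to be made by hand.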

What is Machine Learning? Motivation
From the other angle, why would an ML-based approach be required in such
a scenario?
ML can supercharge the classification program by identifying each email’s
unique attributes.
It would autonomously derive robust rules to automate the classification process, thereby
removing the need for manually engineered rules.
But for a machine to do that, we need to provide it with data.
Thus, the goal is for the machine to learn the rules directly from the data,
using what are known as ML algorithms.
Again, how could it derive robust rules to automate the classification?
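One way to picture “learning the rules directly from the data” is a toy sketch in which the keyword list is derived from labelled emails instead of being typed in by a programmer. It assumes a tiny hypothetical labelled corpus and a simple counting heuristic (keep any word that appears in at least `min_count` promotional emails and in no non-promotional one); this heuristic is an illustrative stand-in chosen for brevity, not one of the ML algorithms covered later in the course, such as Bayesian classification.

```python
from collections import Counter

# Hypothetical labelled examples (1 = promotional, 0 = non-promotional).
EMAILS = [
    ("huge discount on shoes this weekend", 1),
    ("flash sale ends tonight discount inside", 1),
    ("claim your free gift with every order", 1),
    ("meeting notes from the project review", 0),
    ("your exam timetable has been published", 0),
    ("project review moved to friday", 0),
]


def learn_keywords(emails, min_count=2):
    """Return words seen in >= min_count promotional emails and in no
    non-promotional email (a deliberately simple learned rule)."""
    promo, other = Counter(), Counter()
    for text, label in emails:
        words = set(text.split())  # count each word once per email
        (promo if label == 1 else other).update(words)
    return {w for w, n in promo.items() if n >= min_count and other[w] == 0}


def classify_text(text, keywords):
    """Label an email promotional (1) if it contains any learned keyword."""
    return 1 if keywords & set(text.split()) else 0


keywords = learn_keywords(EMAILS)
print(keywords)  # {'discount'}
```

With more (and more varied) labelled data, the same procedure would pick up further keywords without anyone editing the rule set, which is the point the slide makes.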

What is Machine Learning? Motivation...

Figure: Comparison of Input command vs Input Data [Src: Oliver Theobald]

It would do so using what are known as machine learning algorithms.

The formulas and procedures, derived from mathematical concepts
for the appropriate ML task (say, classification), would be built into the ML
algorithms.
The ML algorithms would be implemented in programming code to perform
calculations on our data,
after which the algorithm/program typically generates an output known as a
model.
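The algorithm-to-model pipeline can be sketched with one of the techniques listed in the course topics, linear regression, fitted by ordinary least squares. In this minimal sketch the training algorithm (`fit_least_squares`) performs calculations on the data, and its output, the model, is nothing more than two learned numbers that are then used for prediction. The toy data and all names here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """The learned 'model': just the numbers the algorithm produced."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def fit_least_squares(xs, ys):
    """Training algorithm: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return LinearModel(slope=slope, intercept=mean_y - slope * mean_x)


# Hypothetical training data generated from y = 2x + 1.
model = fit_least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(model)              # LinearModel(slope=2.0, intercept=1.0)
print(model.predict(10))  # 21.0
```

Separating `fit_least_squares` (the algorithm) from `LinearModel` (the model) mirrors the distinction on the slide: the algorithm runs once during training, while the model is what the machine keeps and uses afterwards.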
What is Machine Learning? Motivation...

Figure: Comparison of Input command vs Input Data [Src: Oliver Theobald]

The process of generating the model is known as training the model.
This model describes the rules, numbers, and any other algorithm-specific
data structures that our machine learned from the data.
Our machine can then use the model to perform its task.
Thus, a key characteristic of ML is the concept of self-learning,
i.e., the application of statistical modeling to detect patterns and improve
performance based on data and empirical information, all without direct
programming commands.
What is Machine Learning? Motivation...

Figure: Comparison of Input command vs Input Data [Src: Oliver Theobald]

Self-learning, in summary
ML uses data as input to build a decision model.
Decisions are generated by deciphering relationships and patterns in the data using probabilistic
reasoning, trial and error, and other computationally intensive techniques.
This means that the output of the decision model is determined by the contents of the input data
rather than by any preset rules defined by a human programmer.
However, the human programmer is still responsible
for feeding the data into the model,
selecting an appropriate algorithm, and
tweaking its settings (called hyperparameters) in a bid to reduce prediction error.
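To illustrate the last responsibility, tuning a hyperparameter to reduce prediction error, here is a minimal sketch assuming a hypothetical 1-D dataset and k-nearest neighbours (K-NN, listed in the course topics) as the chosen algorithm; the number of neighbours `k` is the hyperparameter. Picking `k` by its error rate on a held-out validation set is one common strategy, used here as an illustrative assumption; the data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical 1-D labelled data: (feature value, class label).
# The point (2.0, "B") is a deliberately mislabelled/noisy training example.
TRAIN = [(1.0, "A"), (1.5, "A"), (2.0, "B"), (2.5, "A"), (3.0, "A"),
         (7.0, "B"), (7.5, "B"), (8.0, "B")]
VALIDATION = [(2.1, "A"), (1.9, "A"), (6.0, "B"), (7.8, "B")]


def knn_predict(train, x, k):
    """k-nearest-neighbour majority vote; k is the hyperparameter being tuned."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_rate(train, data, k):
    """Fraction of held-out examples the model gets wrong for a given k."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)


# The human's role from the slide: try several settings and keep the one
# with the lowest prediction error on held-out data (ties go to the first k tried).
errors = {k: error_rate(TRAIN, VALIDATION, k) for k in (1, 3, 5)}
best_k = min(errors, key=errors.get)
print(errors)  # {1: 0.5, 3: 0.0, 5: 0.0}
print(best_k)  # 3
```

With `k = 1` the single mislabelled training point decides both nearby validation points, so a larger `k` generalises better on this toy data; on real data the best value has to be found empirically in the same way.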

Lineage of ML

Anatomy of Machine Learning: Where does it fit in
It

Figure: Lineage of ML [Src: Oliver Theobald]

Relationship between data-related fields

Data Science comprises methods and systems to extract knowledge and insights
from data with the aid of computers.
AI encompasses the ability of machines to perform intelligent and cognitive tasks.
A simile:
Industrial Revolution → an era of machines simulating physical tasks;
AI → development of machines capable of simulating cognitive abilities.
AI includes the subfields of search and planning, reasoning and knowledge
representation, perception, natural language processing (NLP), and machine learning.
Figure: Visual representation of the relationship between data-related fields
[Src: Theobald, Oliver. Machine Learning for Absolute Beginners]

Machine learning overlaps with data mining

ML overlaps with data mining, a discipline based on discovering and unearthing
patterns in large datasets.
Both rely on inferential methods, i.e., predicting outcomes based on other outcomes
and probabilistic reasoning,
and both draw from a similar assortment of algorithms, including principal component
analysis, regression analysis, decision trees, and clustering techniques.
Figure: Visual representation of the relationship between data-related fields
[Src: Theobald, Oliver. Machine Learning for Absolute Beginners]

Machine learning overlaps with data mining

ML emphasizes the incremental process of self-learning and automatically detecting patterns through experience derived from exposure to data,
whereas data mining is a less autonomous technique of extracting hidden insight:
it seeks out patterns and relationships that are yet to be mined, and is well-suited for understanding large datasets with complex patterns.
Figure: Visual representation of the relationship between data-related fields (Theobald, Oliver. Machine Learning for Absolute Beginners)
An Example: Excavation operations on sites by two different teams of archaeologists...

What Machine Learning is Not?
Artificial intelligence:
indicates algorithmic solutions to complex problems typically solved by humans;
AI systems are loosely defined as machine-driven decision engines that can achieve near-human-level intelligence.
Machine learning:
is a core building block for AI;
helps us create AI, but is not the only way to achieve it;
refers to statistical learning algorithms that are able to create generalizable abstractions (models) by seeing and dissecting a dataset;
e.g. the functions of a self-driving car within an autonomous driving system.
Figure: ML is part of AI (Clarence Chio and David Freeman, ML and Security..., O'Reilly Media)
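To illustrate what a "generalizable abstraction (model)" learned from a dataset can look like, here is a minimal nearest-centroid classifier. Training condenses the labelled examples into one mean vector per class, and that summary is then applied to inputs the model has never seen. The feature names, values, and labels are hypothetical, and the features are left unscaled for brevity.

```python
import math
from statistics import fmean


def fit_centroids(samples, labels):
    """Learn the model: one mean feature vector (centroid) per class label."""
    model = {}
    for label in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == label]
        model[label] = tuple(fmean(col) for col in zip(*rows))
    return model


def predict(model, x):
    """Generalize to an unseen input: return the label of the nearest centroid."""
    return min(model, key=lambda label: math.dist(x, model[label]))


# Hypothetical training data: (requests per minute, failed-login ratio).
X = [(5, 0.0), (8, 0.1), (6, 0.05), (120, 0.9), (150, 0.8), (130, 0.95)]
y = ["benign", "benign", "benign", "attack", "attack", "attack"]

model = fit_centroids(X, y)
print(predict(model, (7, 0.02)))    # expected: benign
print(predict(model, (140, 0.85)))  # expected: attack
```

The whole training set is reduced to two centroids. The model never stores the original rows, yet it can label new inputs, which is the sense in which it generalizes.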

Data Mining

Figure: Data Mining

Data Mining...

Figure: Data Mining

Data mining requires machine learning
Data mining to machine learning
the goal is to develop predictive models that enable a real-time cyber response after a sequence of cybersecurity processes;
the response must include real-time data sampling, selection, analysis and query, and mining of peta-scale data, with the goal of classifying and detecting attacks and intrusions on a computer network;
learning user patterns and/or behaviors is critical for intrusion detection and attack prediction;
ML can identify and describe structural patterns in the data automatically, and can theoretically explain the data and predict patterns;
such automatic and theoretical learning requires complex computation, which calls for a wide range of machine-learning algorithms.
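One simple way to "learn user patterns and behaviors" for intrusion detection is a per-user statistical baseline. The sketch below assumes the monitored signal is a hypothetical count of failed logins per hour. It learns each user's mean and standard deviation, then flags a new observation whose z-score exceeds an assumed threshold of 3. Real intrusion-detection systems use far richer features and models; the function names and data here are illustrative only.

```python
from statistics import mean, stdev


def learn_baseline(history):
    """Learn each user's normal behavior as (mean, standard deviation) of a count."""
    return {user: (mean(counts), stdev(counts)) for user, counts in history.items()}


def is_anomalous(baseline, user, count, threshold=3.0):
    """Flag a new observation whose z-score exceeds the (assumed) threshold."""
    mu, sigma = baseline[user]
    if sigma == 0:
        return count != mu  # no variation seen so far: any change is unusual
    return abs(count - mu) / sigma > threshold


# Hypothetical history: failed logins per hour observed for each user.
history = {
    "alice": [0, 1, 0, 2, 1, 0, 1],
    "bob": [3, 4, 2, 5, 3, 4, 3],
}
baseline = learn_baseline(history)
print(is_anomalous(baseline, "alice", 1))   # False: typical for alice
print(is_anomalous(baseline, "alice", 12))  # True: far outside alice's baseline
```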

Overview of ML tasks and Examples

Common tasks that ML models perform
Three of the most common tasks ML models perform are:
classification, e.g. classifying emails as promotional or non-promotional;
prediction, e.g. predicting stock prices;
regression, e.g. predicting how much a used car would sell for, given historical data on recent used-car sales in the area.
Other tasks include:
making recommendations,
image recognition,
natural language processing,
transcription,
machine translation,
anomaly detection,
synthesis and sampling,
estimation of probability density and probability mass functions,
similarity matching,
co-occurrence grouping,
causal modeling,
link profiling, and so on.
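To make the regression task concrete, the following sketch fits a straight line by ordinary least squares to predict a used car's price from its age. The single feature (age), the toy sales figures, and the function name `fit_line` are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the factors of 1/n cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical recent sales in the area: (car age in years, sale price).
ages = [1, 2, 3, 5, 7]
prices = [18500, 17000, 15500, 12500, 9500]

intercept, slope = fit_line(ages, prices)
predicted = intercept + slope * 4  # price estimate for a 4-year-old car
print(round(predicted))
```

The toy prices lie exactly on the line price = 20000 - 1500 × age. The fitted slope is therefore about -1500, and the estimate for a 4-year-old car is about 14,000, up to floating-point rounding. Real sales data would scatter around such a line rather than sit on it.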

The key development phases in ML used to solve the different tasks

The following key development phases are used to solve the tasks listed on the previous slide.
Together, they form the development lifecycle of machine learning models (MLMs):
Data gathering
Data preprocessing
Exploratory data analysis (EDA)
Feature engineering, including feature creation/extraction, feature selection, and dimensionality reduction
Training machine learning models
Model / algorithm selection
Testing and matching
Model monitoring
Model retraining
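A few of these phases can be sketched end to end in plain Python. The example covers:

- data gathering, using a hard-coded toy list;
- preprocessing, by dropping incomplete rows and applying min-max scaling;
- a train/test split;
- training, by placing a single threshold between the class means;
- testing, by measuring accuracy on the held-out rows.

The data, the every-third-row split, and the threshold model are hypothetical simplifications. EDA, feature engineering, model selection, monitoring, and retraining are not shown.

```python
# 1. Data gathering: hypothetical (feature, label) records; None marks a missing value.
records = [(2, 0), (3, 0), (None, 0), (4, 0), (5, 0), (20, 1),
           (22, 1), (25, None), (24, 1), (30, 1), (3, 0), (28, 1)]


def preprocess(records):
    """2. Preprocessing: drop incomplete rows and min-max scale the feature to [0, 1]."""
    clean = [(x, y) for x, y in records if x is not None and y is not None]
    lo = min(x for x, _ in clean)
    hi = max(x for x, _ in clean)
    return [((x - lo) / (hi - lo), y) for x, y in clean]


def split(rows, every=3):
    """Deterministic split: every `every`-th row is held out for testing."""
    test_rows = rows[::every]
    train_rows = [row for i, row in enumerate(rows) if i % every != 0]
    return train_rows, test_rows


def train(rows):
    """3. Training: threshold midway between the class means (assumes class 1 has larger values)."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def evaluate(threshold, rows):
    """4. Testing: fraction of held-out rows classified correctly."""
    correct = sum((x > threshold) == (y == 1) for x, y in rows)
    return correct / len(rows)


rows = preprocess(records)
train_rows, test_rows = split(rows)
threshold = train(train_rows)
accuracy = evaluate(threshold, test_rows)
print(f"threshold={threshold:.3f}, test accuracy={accuracy:.2f}")
```

In a monitored deployment, the same `evaluate` function could be rerun periodically on fresh labelled data. A drop in accuracy would then be one signal that the model needs retraining.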

A Few Example Applications of some ML tasks
How the power of ML can be exploited, e.g. for analyzing YouTube viewing habits:
Say that, in one application, the decision model identifies a significant relationship among data scientists who like watching cat videos:
the machine analyzes which videos data scientists enjoy watching on YouTube, based on user engagement;
user engagement is measured in likes, subscriptions, and repeat viewing;
the resulting model is specific to this application; an application with different objectives would produce a different model.

(continued on the next slide)

A Few Examples of ML tasks
How the power of ML can be exploited (continued):
Say that, in another application, the decision model identifies patterns among the physical traits of baseball players and estimates their likelihood of winning the season's Most Valuable Player (MVP) award.
Here, the machine assesses the physical attributes of previous baseball MVPs, along with other features such as age and education.
This model differs from the one on the previous slide and hence produces a different output. Yet at no stage is the decision model explicitly programmed to produce either of the two outcomes.
The model only decodes complex patterns in the input data and uses machine learning to find connections without human help.
This also means that a related dataset collected from another time period, with fewer or more data points, might push the model to produce a slightly different output.
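The points that the model is never programmed with its outcome, and that a different dataset can shift its output, can be illustrated with a one-feature decision stump. The sketch learns the height threshold that best separates hypothetical MVP winners from non-winners. It is fitted to two made-up seasons. The feature (height in cm), the data, and the tie-breaking rule are all assumptions for illustration.

```python
def fit_stump(xs, ys):
    """Learn a one-feature decision stump that predicts 1 when x >= threshold.

    Candidate thresholds are the observed feature values. The one with the
    fewest training errors wins, and ties go to the smallest threshold.
    """
    best_threshold, best_errors = None, None
    for t in sorted(set(xs)):
        errors = sum((x >= t) != bool(y) for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


# Hypothetical data: player height in cm, and whether the player won MVP (1) or not (0).
season_a = ([175, 180, 185, 190, 195], [0, 0, 0, 1, 1])
season_b = ([172, 178, 183, 188, 193], [0, 0, 1, 1, 1])

print(fit_stump(*season_a))  # 190
print(fit_stump(*season_b))  # 183
```

Neither threshold is written into the code: each one is chosen from the data. The two toy seasons yield different thresholds, mirroring the slide's remark that data from another period can change the output.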

ML: Improving predictions based on experience
ML can also improve its predictions based on experience, much as humans base their decisions on experience.
To this end, ML uses exposure to data to improve its decision making. Exposure to data points
provides experience and enables the model to familiarize itself with patterns in the data,
deepens the model's understanding of these patterns, including the significance of changes in the data,
and, on that basis, makes it possible to construct an effective self-learning model.
A common example of self-learning is a system for detecting spam email messages.
Let us first investigate this scenario.....

Devesh C Jinwala, Professor, CSE, SVNIT, Surat Chap1: Machine Learning in Security 25 / 63
Another Example: An ML model for detecting spam email messages.

Initial data used to develop a model:
- the model learns to flag emails as spam,
based on suspicious subject lines and body text containing keywords from mails that users flagged as spam in the past,
e.g. words like dear friend, free, invoice, PayPal, Viagra, casino, payment, bankruptcy, and winner;
however, as more data is analyzed, the model might also find exceptions and incorrect assumptions,
which could make the model susceptible to bad predictions.

Figure: Spam Mail Detection using ML (source: Eman M. Bahgat et al.)
.....continued
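The keyword-learning step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions. The sample mails are invented and the function names are hypothetical. The rule used here is deliberately crude: a word counts as a spam keyword if it occurs in at least two user-flagged mails and in no legitimate mail. It is a stand-in, not the model used by Bahgat et al.

```python
from collections import Counter

def learn_spam_keywords(flagged_spam, legitimate, min_count=2):
    """Hypothetical rule: keep words seen in >= min_count flagged spam mails
    and in no legitimate mail."""
    spam_doc_freq = Counter()
    for text in flagged_spam:
        # Count each word at most once per mail (document frequency).
        spam_doc_freq.update(set(text.lower().split()))
    legit_words = set()
    for text in legitimate:
        legit_words.update(text.lower().split())
    return {word for word, count in spam_doc_freq.items()
            if count >= min_count and word not in legit_words}

def is_spam(text, keywords):
    """Flag a mail if it contains any learned keyword."""
    return any(word in keywords for word in text.lower().split())

# Invented example data: mails users flagged as spam vs. normal mails.
flagged_spam = ["Dear friend claim your free casino bonus",
                "Winner claim your free prize now",
                "Urgent invoice payment required dear friend"]
legitimate = ["Meeting notes for Monday",
              "Your invoice for the course is attached"]

keywords = learn_spam_keywords(flagged_spam, legitimate)
print(sorted(keywords))                                # ['claim', 'dear', 'free', 'friend']
print(is_spam("Free casino chips for you", keywords))  # True
print(is_spam("Monday meeting moved", keywords))       # False
```

The word "your" is not learned because it also appears in a legitimate mail. As users flag more mails, the keyword set changes without anyone rewriting the rules, which is the self-learning behaviour described above.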
Another Example: An ML model for detecting spam email messages...

The false positives (or even the false negatives) predicted by the model depend on the quality and the quantity of the data supplied during training,
e.g. if the model has only limited data on which to base its decision, an email with the subject “PayPal has received your payment for Casino Royale purchased on eBay.” might be wrongly classified as spam;
traditional programming is highly susceptible to this problem,
because it relies on rigidly defined, pre-set rules.

Figure: Spam Mail Detection using ML (source: Eman M. Bahgat et al.)
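For contrast, here is a hypothetical rule-based filter built from the keyword list given earlier. It shows how rigid, pre-set rules misclassify the PayPal receipt. The two-keyword threshold and the plain substring matching are assumptions made for illustration.

```python
# Fixed keyword list taken from the earlier slide; a hand-written
# rule-based filter never revises it, however much mail it sees.
SPAM_KEYWORDS = {"dear friend", "free", "invoice", "paypal", "viagra",
                 "casino", "payment", "bankruptcy", "winner"}

def rule_based_is_spam(subject, threshold=2):
    """Return (is_spam, matched_keywords) using plain substring matching."""
    text = subject.lower()
    hits = sorted(k for k in SPAM_KEYWORDS if k in text)
    return len(hits) >= threshold, hits

subject = "PayPal has received your payment for Casino Royale purchased on eBay."
print(rule_based_is_spam(subject))
# (True, ['casino', 'payment', 'paypal'])  -> a false positive
print(rule_based_is_spam("Lecture slides for Monday"))
# (False, [])
```

Substring matching is itself fragile: "free" would also match "freedom". Fixing that means editing the rules by hand again.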

Another Example: An ML model for detecting spam email messages...

ML, however, uses exposure to data as a way to refine the model and adjust weak assumptions;
ML thereby responds appropriately to unusual data points such as the scenario just described.
Does more data lead to better predictions?
Not necessarily: while data is used to source the self-learning process, more data do not always equate to better decisions.

Figure: Spam Mail Detection using ML
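A small multinomial naive Bayes filter can illustrate how exposure to relevant data corrects a weak assumption. Naive Bayes is a standard text classifier swapped in here for illustration; the slides do not name the model. The training mails are invented. Words unseen during training are simply ignored at prediction time, which is one common design choice among several.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.
    Words never seen in training are ignored when predicting."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)                  # mails per class
        self.word_counts = {c: Counter() for c in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_score(self, text, label):
        """log P(label) + sum of log P(word | label) over known words."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:
                score += math.log((counts[word] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.doc_counts, key=lambda c: self.log_score(text, c))

subject = "PayPal has received your payment for Casino Royale purchased on eBay."

# Small training set: PayPal, payment and casino occur only in spam.
texts = ["free casino payment", "paypal payment winner",
         "lecture notes monday", "project meeting monday"]
labels = ["spam", "spam", "ham", "ham"]
small_model = NaiveBayesSpamFilter().fit(texts, labels)
print(small_model.predict(subject))    # spam (false positive)

# Add relevant, correctly labelled legitimate PayPal notifications.
more_texts = texts + ["paypal received your payment",
                      "your ebay purchase paypal payment receipt"]
more_labels = labels + ["ham", "ham"]
larger_model = NaiveBayesSpamFilter().fit(more_texts, more_labels)
print(larger_model.predict(subject))   # ham
```

The second model is not simply given more data. The two extra mails are relevant, correctly labelled examples, and that is why the prediction improves.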

ML tasks: more data do not always mean better decisions
While data is used to source the self-learning process, more data do not always equate to better decisions;
the input data must be relevant to the objective.
In Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World, Bruce Schneier writes,
“When looking for the needle, the last thing you want to do is pile lots more hay on it.”
That is, adding irrelevant data can be counter-productive to achieving a desired result.
In addition, the amount of input data should be compatible with the available processing resources and time.

ML dataset: Training Data & Test Data
In ML, the input data is typically split into training data and test data.
Training data
is the initial reserve of data used to develop the model: the subset of the original dataset that is fed into the ML model so that it can discover and learn patterns;
is generally larger than the test dataset;
is well known to the model, since it is used to train the model.
e.g. in the spam email detection example, false positives similar to the PayPal auto-response message discussed earlier might be detected in the training data;
the model must then be modified, e.g. email notifications issued from the sending address “[email protected]” should be excluded from spam filtering;
the model can then be trained to detect such errors automatically (by analyzing historical examples of spam messages and deciphering their patterns), without direct human intervention.
Subsequently, after developing the model from patterns extracted from the training data, one can test the model on the remaining data, known as the test data.
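Exempting a trusted sender, as described above, can be sketched as an allowlist that is checked before the trained classifier is consulted. The sender addresses below are hypothetical placeholders. `keyword_model` is a hypothetical stand-in for any trained model.

```python
# Hypothetical allowlist. Real systems should also verify that the sender
# is genuine, because the From header of a mail can be forged.
TRUSTED_SENDERS = {"notifications@payments.example"}

def keyword_model(subject):
    """Stand-in for a trained model that wrongly distrusts 'casino'."""
    return "spam" if "casino" in subject.lower() else "ham"

def filter_mail(sender, subject, classify=keyword_model):
    """Skip spam filtering for allowlisted senders; otherwise ask the model."""
    if sender.strip().lower() in TRUSTED_SENDERS:
        return "ham"
    return classify(subject)

subject = "PayPal has received your payment for Casino Royale purchased on eBay."
print(filter_mail("Notifications@Payments.example", subject))  # ham
print(filter_mail("promo@unknown.example", subject))           # spam
```

An allowlist is a manual patch. The point of the self-learning approach is that, given such examples in the training data, the model itself stops flagging these mails.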

ML dataset: Training Data & Test Data
Testing data
is used to check the accuracy of the model;
is the unseen data used to test the ML model;
thus, is used to evaluate the performance of the trained ML algorithm, indicating whether it needs to be adjusted or optimized for improved results;
must be representative of the actual dataset;
must be large enough to yield meaningful results.
The model already “knows” the training data; how it performs on new test data tells us whether it is working accurately or requires more training data.
Hence, test data provides a final, real-world check on an unseen dataset to confirm that the ML algorithm was trained effectively.
A common choice is to use 80% of the data for training and 20% for testing.
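A minimal sketch of the 80%/20% split mentioned above, using only the Python standard library. The dataset, the fixed seed and the function names are assumptions. Libraries such as scikit-learn provide an equivalent helper.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of `records` and return (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical labelled data: (mail_id, label) pairs.
data = [(i, "spam" if i % 3 == 0 else "ham") for i in range(100)]
train, test = train_test_split(data)
print(len(train), len(test))   # 80 20

# Test labels are used only to score a model, never to fit it.
# A trivial baseline that always answers "ham" is scored like this:
baseline = ["ham"] * len(test)
print(accuracy(baseline, [label for _, label in test]))
```

Shuffling before splitting matters. Without it, any ordering in the collected data (for example, by date) would leak into the split.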

ML dataset: Training, Validation & Testing Data Sets

Very importantly, note that model performance depends on how the dataset is split during model building.
Hence, the dataset is sometimes split into
training data,
test data and
validation data.
Note that all three are typically split from one large dataset....
However, each one typically has its own distinct use in ML modeling.

Figure: Three Splits of ML dataset

But first, let us review the meaning of each dataset (again).....

ML dataset: Training, Validation & Testing Data Sets...
Training dataset
As seen already, this type of data is used to build the machine learning model.
The data scientist feeds the algorithm input data paired with the expected output.
The model evaluates the data repeatedly to learn more about the data's behavior,
and then adjusts itself to serve its intended purpose.
Testing dataset
After the model is built, the testing data once again checks that it can make accurate predictions.
Training and validation data include labels used to monitor the model's performance metrics; the labels of the testing data are withheld from the model and used only to score its predictions.
Test data provides a final, real-world check on an unseen dataset to confirm that the ML algorithm was trained effectively.

ML dataset: Training, Validation & Testing Data Sets...
Validation dataset
during training, validation data introduces new data into the model that it has not evaluated before;
provides the first test against unseen data, allowing data scientists to evaluate how well the model makes predictions on new data;
not all data scientists use validation data, but it can provide helpful information for optimizing hyperparameters, which influence how the model assesses data.
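The role of the validation set can be sketched as follows. k-nearest neighbours is used here only as an illustration. It has one hyperparameter, k, which is selected by accuracy on the validation set. The test set is consulted only once, at the end. The 60/20/20 proportions, the synthetic one-feature data and all names are assumptions.

```python
import random
from collections import Counter

def three_way_split(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle and split into (train, validation, test) by the given fractions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_val = round(len(shuffled) * fractions[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def knn_predict(train, x, k):
    """k-nearest-neighbour majority vote on a single numeric feature."""
    nearest = sorted(train, key=lambda record: abs(record[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

# Hypothetical data: one "suspiciousness" score per mail, with noisy labels.
rng = random.Random(1)
data = []
for _ in range(200):
    score = rng.random()
    label = "spam" if score + rng.gauss(0, 0.15) > 0.5 else "ham"
    data.append((score, label))

train, val, test = three_way_split(data)

# The hyperparameter k is chosen on the validation set only (odd k avoids ties)...
candidate_ks = [1, 3, 5, 7, 9]
best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))

# ...and the untouched test set is used once, for the final estimate.
test_accuracy = accuracy(train, test, best_k)
print(best_k, round(test_accuracy, 3))
```

If k were instead chosen by test accuracy, the final number would no longer be an honest estimate for unseen mail.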

ML dataset: Implications of using differing datasets
Model built using just the training dataset
gets highly biased towards that dataset.
most likely won’t be able to generalize to unseen data, unless the dataset
used for training represented the entire population.
thus, it overfits the training dataset.
Model built with training & validation datasets:
when evaluated on the validation dataset, the model performs much better than
the earlier model trained using the entire dataset.
however, when tuned for a long time, the model gets biased towards the validation dataset.
basically, the hyperparameters get adjusted in each iteration
so that the model performs better on the validation dataset.
Thus, details of the validation dataset get leaked into the training process.
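The first case (a model judged only on its own training data) can be illustrated with a minimal Python sketch. It assumes a hypothetical toy dataset, generated with a fixed seed and with 20% of its labels deliberately flipped as noise, and a deliberately overfitting "memorizer" model (1-nearest neighbour). Neither comes from the slides. The memorizer reproduces every training label, so its training accuracy is perfect, but that number says little about how it performs on data it has not seen.

```python
import random

def make_data(n, seed):
    """Hypothetical toy data: x in [0, 1), label 1 if x > 0.5, with 20% of labels flipped as noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.2:  # label noise that a good model should NOT learn
            y = 1 - y
        data.append((x, y))
    return data

class Memorizer:
    """Deliberately overfitting model: stores every training example and answers
    with the label of the closest stored x (i.e. 1-nearest neighbour)."""

    def fit(self, data):
        self.table = list(data)
        return self

    def predict(self, x):
        return min(self.table, key=lambda p: abs(p[0] - x))[1]

def accuracy(model, data):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(model.predict(x) == y for x, y in data) / len(data)

data = make_data(200, seed=0)
train, held_out = data[:150], data[150:]
model = Memorizer().fit(train)
train_acc = accuracy(model, train)
held_out_acc = accuracy(model, held_out)
print(f"training accuracy: {train_acc:.2f}, held-out accuracy: {held_out_acc:.2f}")
```

The gap between the two printed numbers is the problem described above: a score computed on the training data alone overstates how well the model generalizes.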

ML dataset: Implications of using differing datasets...
Model built with training, validation & test datasets:
uses a third dataset, split from the original dataset, which is kept hidden
from both the training and the evaluation (tuning) process.
thus, it has a greater likelihood of generalizing to unseen data than the
two cases mentioned above.
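Here is a minimal Python sketch of the three-way split. The data, the 60/20/20 split ratios and the use of k-nearest neighbours (with k as the hyperparameter being tuned) are all assumptions made for illustration. What matters is the order of operations: the model is fit on the training set, k is chosen on the validation set, and the test set is used exactly once, at the end.

```python
import random

def make_data(n, seed=0):
    """Hypothetical 1-D data: label 1 if x > 0.5, with 10% of labels flipped as noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data

def three_way_split(data, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle once, then carve the data into disjoint train / validation / test parts."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    return items[n_test + n_val:], items[n_test:n_test + n_val], items[:n_test]

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k odd, so no ties)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return int(2 * sum(y for _, y in nearest) > k)

def accuracy(train, data, k):
    """Fraction of `data` that k-NN (fit on `train`) labels correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

train, val, test = three_way_split(make_data(500))
candidate_ks = [1, 3, 5, 7, 9, 11, 13, 15]
# Hyperparameter tuning looks only at the validation set...
best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))
# ...and the hidden test set is consulted once, after tuning is finished.
test_acc = accuracy(train, test, best_k)
print(f"best k = {best_k}, validation acc = {accuracy(train, val, best_k):.2f}, test acc = {test_acc:.2f}")
```

Because `best_k` was picked to maximize validation accuracy, the validation score is optimistically biased, which is the leakage described for the second case. The test accuracy is the less biased estimate of performance on unseen data.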

Categories of ML methods

Categories of Machine Learning methods/mechanisms

Figure: Machine Learning Techniques

ML methods based on training patterns

ML methods use training patterns to build a classifier model.
Classifier model can be parametric or non-parametric.
The goal of using ML algorithms is to reduce the classification error on
the given training sample data, while still generalizing to unseen data.
ML algorithms are categorized into supervised learning and
unsupervised learning based on
how ML algorithms treat the input and output variables, and
the availability of training data and the desired outcome of
the learning algorithms.

Figure: Supervised and Unsupervised Learning
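Here is a minimal Python sketch on a hypothetical 1-D training sample that makes "parametric vs non-parametric" and "classification error" concrete. The two classifiers are illustrative choices, not ones named on the slide. A nearest-centroid classifier is parametric: it keeps one mean per class, however much data there is. A 1-nearest-neighbour classifier is non-parametric: it keeps the whole training sample as its model.

```python
from statistics import mean

# Hypothetical 1-D training sample: (feature value, class label)
train = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (2.8, "a"),
         (2.5, "b"), (3.0, "b"), (4.0, "b")]

def fit_centroids(data):
    """Parametric model: a fixed set of parameters, one mean per class."""
    labels = sorted({y for _, y in data})
    return {c: mean(x for x, y in data if y == c) for c in labels}

def predict_centroid(centroids, x):
    """Assign x to the class whose mean is closest."""
    return min(centroids, key=lambda c: abs(centroids[c] - x))

def predict_1nn(data, x):
    """Non-parametric model: the stored training sample itself; copy the nearest label."""
    return min(data, key=lambda p: abs(p[0] - x))[1]

def error_rate(predict, data):
    """Classification error: the fraction of examples that are misclassified."""
    return sum(predict(x) != y for x, y in data) / len(data)

centroids = fit_centroids(train)
param_err = error_rate(lambda x: predict_centroid(centroids, x), train)
nonparam_err = error_rate(lambda x: predict_1nn(train, x), train)
print(f"centroids = {centroids}")
print(f"training error: parametric = {param_err:.3f}, non-parametric = {nonparam_err:.3f}")
```

The non-parametric model reaches zero training error simply by memorizing the sample. That is why a low error on the training sample is not, on its own, a sufficient goal.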

Supervised Learning
Supervised learning methods
imitate our own ability to extract patterns from known examples and
use that extracted insight to engineer a repeatable outcome.
the focus here is the process of understanding a known input-output
combination to learn the underlying patterns.
the supervised learning ML model analyzes and deciphers the relationship
between input and output data to learn the underlying patterns.
Figure: Supervised Learning

Supervised Learning...
Supervised learning methods
The example of how Toyota designed its first car prototype from a
Chevrolet car:
Toyota created its first vehicle prototype after taking apart a Chevrolet
car and unlocking its design, which was kept secret by Chevrolet in America.
this process of understanding a known input-output combination is what is
seen in supervised learning.
the model analyzes and deciphers the relationship between input and output
data to learn the underlying patterns.
Input data → independent variable (uppercase “X”), Output data →
dependent variable (lowercase “y”).
Figure: Supervised Learning
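In the X → y notation above, the simplest supervised regression model learns a straight line from known input-output pairs. The following is a minimal Python illustration that assumes a single input feature and hypothetical (X, y) values. It fits y ≈ w * x + b by ordinary least squares and then applies the learned pattern to an unseen input.

```python
def fit_line(X, y):
    """Ordinary least squares for y ≈ w * x + b with one input feature.

    Closed form: w = Sxy / Sxx, b = mean(y) - w * mean(X).
    """
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, y))
    sxx = sum((xi - x_bar) ** 2 for xi in X)
    w = sxy / sxx
    b = y_bar - w * x_bar
    return w, b

# Hypothetical known input-output pairs: X is independent, y is dependent
X = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
w, b = fit_line(X, y)
print(f"learned pattern: y = {w:.2f} * x + {b:.2f}")
print("prediction for unseen x = 6:", round(w * 6 + b, 2))
```

For these values the fit gives w = 1.97 and b = 1.09, so the prediction for x = 6 is 12.91.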

Supervised Learning...: Another view

Figure: Supervised Learning

1 https://www.crayondata.com/machine-learning-explained-understanding-supervised-unsupervised-and-reinforcement-learning/

Supervised Learning algorithms: use cases...
The most common use cases of supervised learning are as follows:
Spam detection - discussed before
Biometrics
used for storing biological information of human beings, such as
fingerprints, iris textures, eyes, swabs, and so on.
every time one wants to unlock one's device, it asks one to authenticate
either through a fingerprint or through facial recognition.
Object Recognition
CAPTCHA - where one has to choose multiple images as per the instruction to
confirm that one is a human.

Supervised Learning algorithms...
Supervised Learning algorithms
are categorized based on the structures and objective functions of learning
algorithms.
are commonly characterized by two types of problems, viz. Classification
and Regression.
Popular categorizations of the algorithms include
Linear and Logistic Regression,
Artificial Neural Network (ANN),
Support Vector Machine (SVM), and
Decision trees.
Bayesian classifiers adopt a Bayesian approach to knowledge discovery, using
probabilities of previously observed events to infer the probabilities of new events.
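Here is a minimal multinomial naive Bayes classifier in Python that applies the Bayesian approach to the spam detection use case. The six training messages are hypothetical, and Laplace (add-one) smoothing is an assumption of this sketch. Class priors and word likelihoods are estimated from previously observed (labeled) messages and then combined to infer the most probable class of a new message.

```python
import math
from collections import Counter

# Hypothetical labeled training messages: the supervision signal
training = [
    ("win cash prize now", "spam"),
    ("claim your free prize", "spam"),
    ("free cash offer", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch on monday", "ham"),
    ("project report attached", "ham"),
]

def train_nb(examples):
    """Estimate class counts (for priors) and per-class word counts (for likelihoods)."""
    class_counts = Counter(label for _, label in examples)
    word_counts = {c: Counter() for c in class_counts}
    for text, label in examples:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def classify(text, model):
    """Return the class with the highest log posterior, log P(c) + sum of log P(w | c)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for c in class_counts:
        score = math.log(class_counts[c] / total)  # prior from previously observed messages
        denom = sum(word_counts[c].values()) + len(vocab)
        for w in text.split():
            # Laplace smoothing: unseen words get a small, non-zero probability
            score += math.log((word_counts[c][w] + 1) / denom)
        scores[c] = score
    return max(scores, key=scores.get)

model = train_nb(training)
print(classify("free prize cash", model))
print(classify("report for monday meeting", model))
```

Logarithms are summed instead of multiplying raw probabilities, which avoids numerical underflow on longer messages.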

Supervised Learning algorithms...: Advantages
allow one to be unambiguous about the meaning of the marks/labels.
the outcomes delivered by the supervised approach are more precise and reliable
as compared to those of other ML procedures.

Supervised Learning algorithms...: Disadvantages
Computation time for supervised learning can be vast.
Unwanted (irrelevant) data lowers efficiency, and training requires a lot of
computation time.
Pre-processing of data is a big challenge.
Models are always in need of updates.
Supervised algorithms can overfit easily.

Supervised Learning algorithms...: Real world Applications
Active Learning i.e. Smart Data Labeling with ML
In ML, Data Labeling (DaL) is the process of identifying raw data
(images, text files, videos, etc.) and adding one or more meaningful
and informative labels to provide context so that an ML model can
learn from it.
In conventional DaL, label tags are attached to data points by a human,
who is either an in-house labeler or outsourced personnel.
However, considering the massive volume of data, manual labeling
can be time-consuming, costly, and difficult to coordinate.
Figure: Active Learning for Smart Labeling

Supervised Learning algorithms...: Real world Applications...
The massive volume of data discourages manual labeling of the data...
Therefore, smart labeling or automatic labeling is employed:
here, a separate ML model can be trained to understand raw data
and then output appropriate label tags.
Figure: Smart Labeling

[Ref: https://aws.amazon.com/sagemaker/data-labeling/what-is-data-labeling/]
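The following minimal Python sketch shows one way smart labeling with an active-learning step might route raw items. The "labeling model" is a hypothetical, already-trained logistic scorer with made-up weights. The confidence threshold and the human-labeling budget are also assumptions. Confident predictions become automatic label tags. The least confident items (uncertainty sampling) go to a human labeler first.

```python
import math

def predict_proba(x, w=8.0, b=-4.0):
    """Hypothetical, already-trained labeling model: logistic score P(label = 1 | x).
    The weights w and b are made up for illustration."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def smart_label(raw_items, threshold=0.9, budget=2):
    """Auto-label confident items; queue the least confident ones for a human labeler."""
    auto, uncertain = {}, []
    for item in raw_items:
        p = predict_proba(item)
        confidence = max(p, 1.0 - p)
        if confidence >= threshold:
            auto[item] = int(p >= 0.5)  # machine-generated label tag
        else:
            uncertain.append((confidence, item))
    # Uncertainty sampling: humans see the items the model is least sure about first
    uncertain.sort()
    to_human = [item for _, item in uncertain[:budget]]
    deferred = [item for _, item in uncertain[budget:]]  # wait for a later round
    return auto, to_human, deferred

raw_items = [0.0, 0.1, 0.45, 0.6, 0.7, 1.0, 1.2]  # hypothetical 1-D feature values
auto, to_human, deferred = smart_label(raw_items)
print("auto-labeled:", auto)
print("send to human labeler:", to_human)
print("deferred:", deferred)
```

In an active-learning loop, the human-provided labels would be added to the labeling model's training data and the model retrained. In the next round, fewer items should then fall below the confidence threshold.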

Supervised Learning algorithms...: Real world Applications...
An ethical credit scoring system for banks and financial institutions
Banking the unbanked, i.e. developing a credit rating for those who do not
have credit cards and hence have no formal credit score.
In one of the implementations, the transactions made by different account
numbers, the region, the mode of transaction, etc., the per capita income per
area, and the job title of the account holders were used to develop such a
system.
Understanding Youth Sentiments Through Artificial Intelligence
a real world application in which a data analysis pipeline was developed for
sentiment analysis
this was to understand youth sentiments by analyzing the aspirations, fears, and
thoughts of the youth through scraping the web and youth-led media.

Supervised Learning algorithms...: Real world Applications...
Medical applications
an application was developed to anticipate patient risk (e.g. identifying
high-risk patients) or to predict the likelihood of congestive heart failure.
Public safety application
a tool was built for analysing and classifying cases of sexual abuse in the
workplace to identify patterns of such behaviors.

Unsupervised Learning...
Unsupervised Learning methods
here, one does not have to direct the model with pre-labeled
input/output data.
they permit the model to work on its own to discover patterns and
information that were previously undetected.
are the ones where no target or label of the data is given in the sample
data.
are designed to summarize the key features of the data and to form the
natural clusters of input patterns given a particular cost function.
Figure: Un-Supervised Learning
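K-Means clustering, one of the example methods of unsupervised learning, makes "natural clusters ... given a particular cost function" concrete. Below is a minimal Python sketch of Lloyd's k-means on hypothetical unlabeled 2-D points. The cost function is the within-cluster sum of squared distances. The deterministic farthest-first initialization is a simplification chosen for this sketch; random or k-means++ seeding is more common in practice.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=20):
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step."""
    centroids = farthest_first_init(points, k)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid (no labels used)
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    # The cost function k-means minimizes: within-cluster sum of squared distances
    cost = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, cost

# Hypothetical unlabeled 2-D points forming two visible groups
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, cost = kmeans(points, k=2)
print(sorted(centroids), round(cost, 4))
```

No target labels appear anywhere. The two groups emerge only from minimizing the cost function.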

Unsupervised Learning...
Unsupervised Learning methods
thus, they draw abstractions from unlabeled datasets and apply these
to new data.
the example methods include
Hierarchical and K-Means clustering.
K-NN (k nearest neighbours); note that k-NN classification itself is
supervised, while its underlying nearest-neighbour search needs no labels.
Principal Component Analysis.
Singular Value Decomposition.
Independent Component Analysis.
Self-organizing map.
are difficult to evaluate, because they do not have an explicit teacher,
i.e. they do not have labeled data for testing.
Figure: Un-Supervised Learning
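Principal Component Analysis, one of the listed methods, finds the direction along which unlabeled data varies most. The following minimal Python sketch uses hypothetical 2-D data and only the standard library. It computes the sample covariance matrix and extracts its leading eigenvector by power iteration. In practice, a linear-algebra library's eigendecomposition would normally be used instead, or the SVD of the centred data, whose right singular vectors are the same principal components.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1) of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)] for i in range(d)]

def first_principal_component(cov, iters=100):
    """Power iteration: repeatedly multiply a start vector by the covariance matrix
    and renormalize; it converges to the eigenvector with the largest eigenvalue
    (assuming the start vector is not orthogonal to it)."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the variance along v (the eigenvalue)
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, eigenvalue

# Hypothetical unlabeled 2-D points lying roughly along the line y = x
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.0)]
cov = covariance_matrix(points)
v, eigenvalue = first_principal_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(f"first principal component: ({v[0]:.3f}, {v[1]:.3f})")
print(f"variance along it: {eigenvalue:.3f} ({explained:.1%} of the total)")
```

Nearly all of the variance lies along one direction. The 2-D points could therefore be summarized by a single coordinate, which is the dimensionality-reduction use of PCA.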

Un-Supervised Learning...: Another view

Figure: Un-Supervised Learning

1 https://www.crayondata.com/machine-learning-explained-understanding-supervised-unsupervised-and-reinforcement-learning/

Un-Supervised Learning algorithms...: Advantages
less complexity in comparison with supervised learning.
nobody is needed to understand and then label the data inputs.
it is frequently easier to obtain unlabeled data.

1 https://omdena.com/blog/supervised-and-unsupervised-machine-learning/

Un-Supervised Learning algorithms...: Disadvantages
the results are less accurate.
the outcomes of the analysis cannot be verified, since there are no labels to
check them against.

1 https://omdena.com/blog/supervised-and-unsupervised-machine-learning/

Un-Supervised Learning algorithms...: Real-world Applications
An anomaly detection system developed using unsupervised ML (USML).
The system is capable of capturing sudden vegetation changes, which can be
used as an alert mechanism to provide immediate relief to the people and
communities in need.
Besides, USML is generally used for
Optical character recognition (OCR)
Search engines
Computer vision
Classifying DNA sequences
Detecting fraud, e.g., credit card and internet fraud
Medical diagnosis
Natural language processing
Speech and handwriting recognition
Economics and finance
Recommendation engines, such as those used by Netflix and Amazon
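One simple unsupervised way to catch a sudden change like the one described above is to score how unusual each step-to-step change is, using only the unlabeled series itself. The Python sketch below is a minimal illustration, not the system referred to on the slide. The vegetation-index (NDVI-like) values are hypothetical. The modified z-score, based on the median absolute deviation (MAD), with a 3.5 cut-off is a common robust rule assumed here.

```python
from statistics import median

def robust_anomalies(series, threshold=3.5):
    """Flag time steps whose change from the previous value is unusually large.

    Score each change d with the modified z-score 0.6745 * (d - median) / MAD.
    Median and MAD are used instead of mean and standard deviation so that
    the anomaly itself does not distort the baseline.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    med = median(diffs)
    mad = median([abs(d - med) for d in diffs])
    if mad == 0:
        return []  # changes have no spread, so this simple rule cannot score them
    flagged = []
    for i, d in enumerate(diffs, start=1):
        score = 0.6745 * (d - med) / mad
        if abs(score) > threshold:
            flagged.append(i)  # index in `series` where the sudden change lands
    return flagged

# Hypothetical vegetation-index readings with a sudden drop at index 6
ndvi = [0.61, 0.62, 0.60, 0.63, 0.62, 0.61, 0.35, 0.34, 0.36, 0.35]
print(robust_anomalies(ndvi))
```

No labeled examples of "normal" or "anomalous" readings are needed. The rule learns what a typical change looks like from the data it is given, which is what makes it unsupervised.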

Supervised & Un-Supervised Learning algorithms
Supervised learning = uses labeled data.
Unsupervised learning = uses unlabeled data.
Another commonly stated difference is that supervised learning uses off-line analysis,
whereas unsupervised learning is often used for real-time analysis of data.
In SL, the number of classes is known, but in unsupervised learning the
number of classes is unknown.
The results of supervised learning are accurate and reliable;
on the other hand, the results of unsupervised learning are moderately
accurate and reliable.

Supervised & Un-Supervised Learning algorithms

Figure: Machine Learning

Reinforcement Learning
Unlike SL and USL, reinforcement learning builds its prediction model by
gaining feedback from random trial and error and leveraging insight from
previous iterations.
the aim is to achieve a specific goal (output) by randomly trialling a vast
number of possible input combinations and grading their performance.
it can best be explained by using a video game analogy.
algorithms are set to train the model based on continuous learning.
a standard reinforcement learning model has measurable performance criteria
by which outputs are graded.
In the case of self-driving vehicles, avoiding a crash earns a positive score, and
in the case of chess, avoiding defeat likewise receives a positive assessment.

Q Learning
is a specific algorithmic example of reinforcement learning.
it can be understood through the Pac-Man game, as follows...
Three main components
states could be the challenges, obstacles or pathways that exist in the video
game.
“A” - could depict the set of possible actions to respond to these states,
limited to left, right, up, and down movements, as well as multiple
combinations thereof.
“Q” - could depict the model’s starting value, which has an initial value of
“0”.
as the game progresses, two main things happen:
Q drops as negative things occur after a given state/action.
Q increases as positive things occur after a given state/action.
In Q-learning, the machine learns to match, for a given state, the action that
generates or preserves the highest level of Q.
the model records its results (rewards and penalties) and how they impact its
Q level, and stores those values to inform and optimize its future actions.
this is computationally expensive, since a Q value must be learned and stored
for every state/action combination.
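The update loop described above can be sketched in Python as tabular Q-learning. This is a minimal illustration under stated assumptions. Instead of Pac-Man, it uses a hypothetical 5-cell corridor where the only actions are left and right. Reaching the last cell earns +1, and every other move costs -0.01. The learning rate alpha, discount gamma and exploration rate epsilon are illustrative values. Every Q value starts at 0, rises after rewards and falls after penalties, and most of the time the agent picks the action with the highest Q.

```python
import random

N_STATES = 5          # hypothetical corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # move left or right (a 1-D stand-in for Pac-Man's moves)

def step(state, action):
    """Hypothetical environment: +1 for reaching the goal, -0.01 for any other move."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, -0.01, False

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}  # every Q value starts at 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                          # explore: random trial
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])    # exploit: highest Q
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
            # Q moves towards reward + discounted best future Q:
            # it rises after rewards and falls after penalties
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = nxt
    return Q

Q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # the learned action (+1 = right) for each non-goal state
```

The table holds one entry per state/action pair, here 5 × 2. In a real game the number of states explodes, which is where the computational expense comes from.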
An Overview of ML tasks ...to be continued
