Machine Learning Models: by Mayuri Bhandari


Types of Machine Learning
Supervised Learning
Supervised learning is when you provide the machine with labeled training data to perform a specific task.
In the training data, you feed the machine many similar examples, and the computer predicts the answer for each one.
You then give the computer feedback on whether its prediction was right or wrong.
Supervised learning is task-specific, and that's why it's quite common.
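The loop described above — predict, receive feedback, adjust — can be sketched with a tiny perceptron trained on labeled examples of the logical AND function. This is a minimal illustration, not part of the slides; the data, learning rate, and model are all assumptions made for the example.

```python
# Minimal supervised-learning sketch: a perceptron learning AND.
# The labels act as the "feedback" described above.
def train_perceptron(examples, epochs=10, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in examples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = label - pred          # feedback: was the prediction right?
            w[0] += lr * error * x1       # adjust weights toward the label
            w[1] += lr * error * x2
            b += lr * error
    return w, b

def predict(w, b, x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

# Labeled training data for the AND function
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
```

After training, the model reproduces the labels it was supervised with.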
Unsupervised Learning
As the name suggests, unsupervised learning is the opposite of supervised learning.
In this case, you don't provide the machine with any labeled training data.
The machine has to reach conclusions from the data on its own, without any labels.
It is somewhat more challenging to implement than supervised learning.
It is used for clustering data and for finding anomalies.
It is also quite popular, as it is data-driven.
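As a sketch of clustering without labels, here is a minimal one-dimensional k-means in plain Python. The data and the initialization scheme are illustrative assumptions; no labels are given, and the algorithm groups points by proximity alone.

```python
# Minimal unsupervised-learning sketch: 1-D k-means clustering.
def kmeans_1d(points, k, iters=20):
    # Illustrative initialization: spread centroids across the sorted data
    pts = sorted(points)
    centroids = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

data = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
centroids, clusters = kmeans_1d(data, k=2)
```

With this data the two centroids settle near the two obvious groups, even though no group labels were ever supplied.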
Reinforcement Learning
Reinforcement learning is quite different from the other types of machine learning (supervised and unsupervised).
The relationship between the data and the machine is quite different from the other machine learning types as well.
In reinforcement learning, the machine learns from its mistakes.
You give the machine a specific environment in which it can perform a given set of actions, and it learns by trial and error.
Although reinforcement learning is quite challenging to implement, it finds applications in many industries.
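The trial-and-error idea can be sketched with tabular Q-learning in a made-up corridor environment: states 0 to 4, with a reward only at the right end. The environment, constants, and epsilon-greedy policy are all illustrative assumptions, not anything specified in the slides.

```python
import random

# Minimal reinforcement-learning sketch: Q-learning in a tiny corridor.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left, move right

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma = 0.5, 0.9
random.seed(0)
for episode in range(200):
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit, sometimes explore (trial and error)
        if random.random() < 0.2:
            a = random.randrange(2)
        else:
            a = 0 if Q[state][0] > Q[state][1] else 1
        nxt, reward, done = step(state, ACTIONS[a])
        # update the estimate from the mistake or success just observed
        Q[state][a] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][a])
        state = nxt

policy = [0 if q[0] > q[1] else 1 for q in Q]
```

After enough episodes the learned policy is "move right" in every non-terminal state, discovered purely from rewards rather than labeled examples.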
Semi-supervised Learning
A semi-supervised learning problem starts with a series of labeled data points as well as some data points for which labels are not known.
The goal of a semi-supervised model is to classify some of the unlabeled data using the labeled information set.
In other words, it aims to make effective use of all of the available data, not just the labeled data as in supervised learning.
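One common semi-supervised approach (not named in the slides) is self-training: repeatedly pseudo-label the unlabeled point closest to the labeled set and fold it into the labeled data. Here is a minimal one-dimensional sketch with illustrative data.

```python
# Minimal semi-supervised sketch: self-training with nearest-neighbour
# pseudo-labels. The data and labels are made up for illustration.
def self_train(labeled, unlabeled):
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # pick the unlabeled point closest to any labeled point
        best = min(remaining,
                   key=lambda u: min(abs(u - x) for x, _ in labeled))
        # pseudo-label it with its nearest labeled neighbour's label
        _, lab = min(labeled, key=lambda pair: abs(best - pair[0]))
        labeled.append((best, lab))
        remaining.remove(best)
    return labeled

labeled = [(0.0, "A"), (10.0, "B")]     # the few labeled points
unlabeled = [1.0, 2.0, 9.0, 8.0]        # points without known labels
result = dict(self_train(labeled, unlabeled))
```

The two labels propagate outward through the unlabeled points, so all of the data ends up usable, not just the initial labeled pair.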
Components of Generalization Error
Bias
Variance
Underfitting
Overfitting
Bias Error
Bias is defined as the average difference between a model's predictions and the true values.
Bias measures the deviation between the expected output of our model and the real values, so it indicates the fit of our model.
High bias results in under-fitting the data.
A high bias means our learning algorithm is missing important trends among the features.
High-bias algorithms are easier to learn but less flexible; because of this, they have lower predictive performance on complex problems.
Data is almost always noisy in reality, so some error is inevitable; this is called the irreducible error.
Variance
Variance measures the amount that the outputs of our model would change if a different training dataset were used.
A model is said to have high variance if its predictions are sensitive to small changes in the training data.
Generally, non-parametric machine learning algorithms that have a lot of flexibility have high variance.
Bias-Variance Trade-off
Example: Bias-Variance Trade-off
(Figure: the same model fit on training dataset 1 and training dataset 2, one fit illustrating high bias and the other high variance.)
How to achieve the Trade-off
Dimensionality reduction
Regularisation in linear models
Using mixture models and ensemble learning
Choosing the optimal value of K in KNN
Total Error
Total Error = Bias² + Variance + Irreducible Error
error(x) = bias(x)² + variance(x) + noise(x)
Bias(x) = E[f̂(x)] − f(x)
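The decomposition can be checked numerically for a single input x. The prediction values below are made up, standing in for the outputs of the same model trained on several different datasets; the target is taken as noiseless, so the irreducible term is zero.

```python
# Numeric sketch of the bias-variance decomposition at one input x.
preds = [2.8, 3.2, 3.0, 3.4, 2.6]   # model outputs for x across datasets
true_value = 4.0                     # noiseless target f(x)

mean_pred = sum(preds) / len(preds)
bias = mean_pred - true_value
variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
mse = sum((p - true_value) ** 2 for p in preds) / len(preds)

# bias**2 + variance reproduces the mean squared error exactly
# (the noise term is zero here because the target is noiseless)
```

Here the mean prediction is 3.0, so bias = −1.0 and variance = 0.08, and bias² + variance = 1.08 matches the mean squared error.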
Underfitting and Overfitting
Overfitting: good performance on the training data, poor results on other data.
Underfitting: poor performance on the training data and poor results on other data as well.
Underfitting implies that the model still has capacity to learn, so you would simply train for more iterations or collect more data.
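The contrast can be sketched with two deliberately bad models: a "memorizer" that overfits by storing the training points exactly, and a constant predictor that underfits by ignoring the input entirely. All the data here is made up for illustration.

```python
# Illustrative sketch: an overfit memorizer vs an underfit constant model,
# evaluated on training data and on held-out data.
train = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2)]   # roughly y = 2x
test = [(5, 10.1), (6, 11.8)]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Overfitting: memorize training points exactly, guess 0 elsewhere
memory = dict(train)
def overfit(x):
    return memory.get(x, 0.0)

# Underfitting: always predict the training mean, ignoring x entirely
mean_y = sum(y for _, y in train) / len(train)
def underfit(x):
    return mean_y
```

The memorizer scores a perfect zero error on the training set but fails badly on the held-out points, while the constant model is poor on both, mirroring the two definitions above.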
A learning system cycle
1. Ideation
The following prerequisites are essential for a successful
ideation:
1. Clear requirements regarding business objectives and
scope
2. Availability of historical data
3. Understanding of end-to-end IT infrastructure
requirements
2. Development
Once key metrics that correspond to the business
objectives are agreed upon and historical data is acquired,
the data scientist can start developing the initial model.
Data scientists have a wide array of tools available to solve
their puzzles:
1. Transforming data to a more useful format
2. Analysis of data to guide modeling approach
3. Writing of the actual machine learning model code
4. Creating numbers and visuals for initial reports towards
stakeholders
3. Production
When the development phase is over, the developed model
needs to be put in production to start generating value.
The complexity of getting a model in production depends
on the context of the problem, the autonomy of data
science teams and the overall maturity of the organization.
The context of the problem consists of a number of factors:
1. Data flow at prediction time
2. Sensitivity of the data
3. Maximum acceptable latency of delivery
4. Maintenance
Once a model is deployed, there are a number of
measures that can be taken to improve robustness and
quality of the machine learning model.
These measures can be roughly divided into four
areas. We call this post-production process
maintenance.
1. Lineage
2. Monitoring
3. Comparison
4. Model Drift
Evaluation Metrics : Accuracy
It is the ratio of the number of correct predictions to the total number of input samples.
It works well only if there are equal numbers of samples belonging to each class; otherwise, classification accuracy can give a false sense of achieving high performance.
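A short sketch of why accuracy misleads on imbalanced classes: with made-up data of 95 negatives and 5 positives, a classifier that always predicts the majority class still scores 95% accuracy while missing every positive.

```python
# Accuracy = correct predictions / total samples.
def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [0] * 95 + [1] * 5          # imbalanced: 95 negatives, 5 positives
y_pred = [0] * 100                   # always predict the majority class

acc = accuracy(y_true, y_pred)       # high accuracy despite a useless model
```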
Confusion Matrix
A confusion matrix, as the name suggests, gives us a matrix as output and describes the complete performance of the model.
Let's assume we have a binary classification problem, with samples belonging to two classes: YES or NO. We also have our own classifier, which predicts a class for a given input sample. Testing the model on 165 samples, we get the following result.
There are 4 important terms :
True Positives : The cases in which we predicted YES
and the actual output was also YES.
True Negatives : The cases in which we predicted NO
and the actual output was NO.
False Positives : The cases in which we predicted YES
and the actual output was NO.
False Negatives : The cases in which we predicted
NO and the actual output was YES.
Accuracy for the matrix can be calculated by taking the sum of the values on the "main diagonal" (true positives and true negatives) and dividing by the total number of samples.
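The four counts and the diagonal-based accuracy can be computed directly from predictions. The YES/NO samples below are illustrative, not the 165-sample result from the slides.

```python
# Derive TP/TN/FP/FN counts and accuracy from true and predicted labels.
def confusion(y_true, y_pred, positive="YES"):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

y_true = ["YES", "YES", "NO", "NO", "NO", "YES"]
y_pred = ["YES", "NO", "NO", "YES", "NO", "YES"]
tp, tn, fp, fn = confusion(y_true, y_pred)
acc = (tp + tn) / (tp + tn + fp + fn)   # main diagonal over the total
```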
F1 Score
The F1 Score is used to measure a test's accuracy.
It is the harmonic mean of precision and recall, and its range is [0, 1].
It tells you how precise your classifier is (how many of its positive predictions are correct), as well as how robust it is (whether it misses a significant number of positive instances).
F1 Score : Precision and Recall
Precision: the number of correct positive results divided by the number of positive results predicted by the classifier.
Precision = True Positives / (True Positives + False Positives)
Recall: the number of correct positive results divided by the number of all relevant samples.
Recall = True Positives / (True Positives + False Negatives)
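The two formulas, and their harmonic mean, can be sketched as a small function; the counts passed in are illustrative.

```python
# Precision, recall, and F1 (the harmonic mean of the first two).
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)               # TP / (TP + FP)
    recall = tp / (tp + fn)                  # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

With these counts precision is 0.8 and recall is 0.5; the F1 of about 0.615 sits between them but closer to the weaker score, which is exactly why the harmonic mean is used.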
Mean Absolute Error
Mean Absolute Error is the average of the absolute differences between the original values and the predicted values.
It gives us a measure of how far the predictions were from the actual output.
However, it doesn't give us any idea of the direction of the error, i.e. whether we are under-predicting or over-predicting the data.
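A one-function sketch of MAE, with made-up values; note how the absolute value discards the sign, so the direction of each error is lost.

```python
# Mean Absolute Error: average of |true - predicted|.
def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

err = mae([3.0, 5.0, 2.0], [2.5, 5.5, 4.0])   # errors -0.5, +0.5, +2.0
```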
Mean Squared Error
Mean Squared Error (MSE) is quite similar to Mean Absolute Error; the only difference is that MSE takes the average of the squares of the differences between the original values and the predicted values.
An advantage of MSE is that it is easier to compute the gradient.
As we take the square of the error, the effect of larger errors becomes more pronounced than that of smaller ones, so the model can focus more on the larger errors.
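The same illustrative values as in the MAE sketch show how squaring makes the single large error dominate the total.

```python
# Mean Squared Error: average of (true - predicted) ** 2.
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# The 2.0 error contributes 4.0 after squaring, dwarfing the 0.25 terms
err = mse([3.0, 5.0, 2.0], [2.5, 5.5, 4.0])
```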
Maximum likelihood Estimation(MLE)
Maximum likelihood estimation is a method that
determines values for the parameters of a model.
The parameter values are found such that they
maximize the likelihood that the process described by
the model produced the data that were actually
observed.
What are parameters?
For a linear model we can write this as y = mx + c. In this
example x could represent the advertising spend and y might be
the revenue generated. m and c are parameters for this model.
Different values for these parameters will give different lines.
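The idea that parameters pick out one model from a family can be sketched directly: the same form y = mx + c with different (m, c) pairs gives different lines, and hence different predictions for the same x. The numbers are illustrative.

```python
# Each (m, c) pair selects one line from the family y = m*x + c.
def line(m, c):
    def f(x):
        return m * x + c
    return f

f1 = line(2.0, 1.0)   # one choice of parameters
f2 = line(0.5, 3.0)   # another choice: same x, different prediction
```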
MLE : Example
Let’s suppose we have observed 10 data points from some process. For
example, each data point could represent the length of time in seconds
that it takes a student to answer a specific exam question. These 10 data
points are shown in the figure below
For these data we’ll assume that the data generation process can
be adequately described by a Gaussian (normal) distribution.

A Gaussian distribution has two parameters: the mean, μ, and the standard deviation, σ. Different values of these parameters result in different curves.
MLE
We want to know which curve was most likely responsible for creating the data points that we observed.
Maximum likelihood estimation is a method that will
find the values of μ and σ that result in the curve that
best fits the data.
The true distribution from which the data were
generated was f1 ~ N(10, 2.25), which is the blue curve
in the figure above.
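For a Gaussian, the maximum-likelihood estimates have a closed form: the sample mean for μ and the (population) standard deviation for σ. The answer times below are made-up stand-ins for the ten observations in the example; the sketch also includes a log-likelihood function so the "no other curve fits better" claim can be checked numerically.

```python
import math

# MLE for a Gaussian: sample mean and population standard deviation.
def gaussian_mle(xs):
    mu = sum(xs) / len(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return mu, sigma

# Log-likelihood of the data under N(mu, sigma^2)
def log_likelihood(xs, mu, sigma):
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

times = [8.0, 9.5, 10.0, 10.5, 12.0]   # illustrative answer times (seconds)
mu, sigma = gaussian_mle(times)
# Any other (mu, sigma) pair gives a lower log-likelihood than the MLE pair
```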
Posterior probability
A posterior probability, in Bayesian statistics, is the
revised or updated probability of an event occurring
after taking into consideration new information.
The posterior probability is calculated by updating
the prior probability using Bayes' theorem.
In statistical terms, the posterior probability is the
probability of event A occurring given that event B has
occurred.
Bayes' Theorem Formula
The formula to calculate the posterior probability of A occurring given that B occurred:
P(A|B) = P(B|A) × P(A) / P(B)
Bayes' theorem can be used in many applications, such as medicine,
finance, and economics.
In finance, Bayes' theorem can be used to update a previous belief
once new information is obtained.
Prior probability represents what is originally believed before new
evidence is introduced, and posterior probability takes this new
information into account.
Posterior probability distributions should be a better reflection of the underlying truth of a data-generating process than the prior probability, since the posterior includes more information.
 A posterior probability can subsequently become a prior for a new
updated posterior probability as new information arises and is
incorporated into the analysis.
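The update-and-reuse cycle can be sketched with illustrative numbers: a diagnostic test with a 99% true-positive rate and a 5% false-positive rate for a condition with 1% prevalence. The first posterior then serves as the prior for a second positive test, exactly as described above.

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B),
# with P(B) expanded over the two hypotheses.
def posterior(prior, likelihood, false_positive_rate):
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

p1 = posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
# The first posterior becomes the new prior once a second positive arrives
p2 = posterior(prior=p1, likelihood=0.99, false_positive_rate=0.05)
```

A single positive test only raises the probability to about 1/6 because the condition is rare; the second positive test, building on the updated prior, pushes it near 0.8.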
