Unit 2 PPT - Part 2
In statistical modeling, a common problem is how to estimate the joint probability distribution for a dataset.
What is EM Algorithm?
• The EM (Expectation-Maximization) algorithm was proposed in 1977 by Arthur Dempster, Nan Laird, and Donald Rubin.
• It is used to find local maximum likelihood estimates of the parameters of a statistical model in cases where latent variables are present or the data is missing or incomplete.
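To make this concrete, the sketch below runs EM on a two-component 1-D Gaussian mixture using NumPy. The data, initial values, and variable names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: two overlapping 1-D Gaussian clusters
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

# Initial guesses for the (latent) mixture parameters
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])
weight = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: responsibility of each component for each point
    dens = weight * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
           / (sigma * np.sqrt(2 * np.pi))
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities
    nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    weight = nk / len(x)

print(mu)  # means converge near -2 and 3
```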
Applications of EM Algorithm
Data Clustering in Machine Learning and Computer Vision
Used in Natural Language Processing
Used in Parameter Estimation in Mixture Models and Quantitative Genetics
Used in Psychometrics
Used in Medical Image Reconstruction, Structural Engineering
Support Vector Machine (SVM)
❑ SVM is based on statistical learning theory.
❑ Support vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.
❑ SVMs involve finding hyperplanes that segregate the data into classes.
❑ Support vectors are the data points that lie closest to the decision surface (or hyperplane).
❑ SVMs are very versatile and are capable of performing linear or nonlinear classification, regression, and outlier detection.
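As a quick illustration, the sketch below trains a linear SVM classifier with scikit-learn; the toy dataset and parameter values are assumptions chosen for demonstration.

```python
from sklearn import datasets
from sklearn.svm import SVC

# Toy linearly separable problem: two of the iris classes, two features
X, y = datasets.load_iris(return_X_y=True)
X, y = X[y != 2, :2], y[y != 2]

# Linear SVM: finds the maximum-margin separating hyperplane
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)       # points lying closest to the boundary
print(clf.coef_, clf.intercept_)  # w and b of the hyperplane w.x + b = 0
```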
Two Class Problem: Linearly Separable Case
❑ Linearly separable binary sets: one class denotes +1, the other denotes -1 (Class 1 and Class 2 in the figure).
❑ Many decision boundaries can separate these two classes. Which one should we choose?
Classifier Margin
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a data point.
Good Decision Boundary: Margin Should Be Large
f(x, w, b) = sign(w · x - b)
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called a Linear SVM).
Support vectors are those data points that the margin pushes up against.
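For a separating hyperplane w · x + b = 0, the margin width works out to 2/||w||, which is why maximizing the margin amounts to minimizing ||w||. A minimal sketch of computing it from a fitted linear SVM, with an assumed toy dataset:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny separable toy set (illustrative values)
X = np.array([[0, 0], [1, 1], [3, 3], [4, 4]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # very large C ~ hard margin

# The supporting hyperplanes are w.x + b = +1 and w.x + b = -1,
# so the margin width is 2 / ||w||.
w = clf.coef_[0]
print("margin width:", 2 / np.linalg.norm(w))  # ~2.83 = dist((1,1),(3,3))
```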
How Does It Work?
Identify the right hyperplane (Scenario 1):
In this scenario, hyperplane "B" has performed this job excellently.
How Does It Work?
Identify the right hyperplane (Scenario 2):
SVMs can be used for classification and regression problems, as support vector classification (SVC) and support vector regression (SVR).
The distance of the vectors from the hyperplane is called the margin.
Similarly, the function that classifies points in a higher dimension is called a hyperplane.
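The same machinery supports regression. Below is a minimal SVR sketch with scikit-learn; the synthetic data and hyperparameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80))[:, None]
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)  # noisy sine curve

# Support vector regression with an RBF kernel; epsilon sets the width
# of the tube within which prediction errors are ignored
reg = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print(reg.predict([[2.5]]))  # prediction near sin(2.5) ~ 0.60
```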
Hyperplane (Decision Surface)
Let's say there are 'm' dimensions; the equation of the hyperplane in m dimensions can then be given as:
y = w0 + w1x1 + w2x2 + … + wmxm = b + Σ(i=1..m) wi·xi
where,
wi = weights (w0, w1, w2, w3, …, wm)
b = bias term (w0)
xi = variables.
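A minimal numeric check of this definition, with assumed toy values for w, b, and x:

```python
import numpy as np

# Assumed toy hyperplane in m = 3 dimensions
w = np.array([2.0, -1.0, 0.5])  # weights w1..wm
b = -1.0                        # bias term (w0)
x = np.array([1.0, 2.0, 4.0])   # a point to classify

# Signed decision value w.x + b; its sign gives the predicted class
score = np.dot(w, x) + b
print(score, "-> class", "+1" if score >= 0 else "-1")  # 1.0 -> class +1
```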
Hard Margin SVM
Assume three hyperplanes, namely (π, π+, π−), such that 'π+' is parallel to 'π' and passes through the support vectors on the positive side, while 'π−' is parallel to 'π' and passes through the support vectors on the negative side.
So we can see that the hard-margin hyperplane can distinguish the classes only if the points are linearly separable; if any outlier is introduced, it is no longer able to separate them.
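In scikit-learn this trade-off is controlled by the C parameter: a very large C approximates a hard margin (violations heavily penalized), while a small C gives a soft margin that tolerates outliers. The data and values below are illustrative assumptions.

```python
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [3, 3], [4, 4], [0.5, 0.5]]
y = [0, 0, 1, 1, 1]  # last point is an outlier deep in class 0's region

hard = SVC(kernel="linear", C=1e9).fit(X, y)  # ~hard margin: tries to fit the outlier
soft = SVC(kernel="linear", C=0.1).fit(X, y)  # soft margin: tolerates it
print(hard.n_support_, soft.n_support_)
```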
Polynomial Kernel
Effective when the number of features is greater than the number of training examples.
The hyperplane is affected only by the support vectors, so outliers have less impact.
Selecting appropriate hyperparameters of the SVM allows for sufficient generalization performance.
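As a minimal sketch, a degree-2 polynomial kernel K(x, z) = (γ·x·z + r)^d lets an SVM separate XOR-style data that no straight line can split; the dataset and hyperparameter values are assumptions.

```python
from sklearn.svm import SVC

# XOR-style data: not linearly separable in the input space
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# Polynomial kernel K(x, z) = (gamma * x.z + coef0) ** degree
clf = SVC(kernel="poly", degree=2, coef0=1.0, gamma=1.0, C=10.0).fit(X, y)
print(clf.predict(X))  # recovers the XOR labels [0 1 1 0]
```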
❑ But what are we going to do if the dataset is just too hard?
Non-Linear SVM: Feature Space
❑ General idea: the original input space (x) can be mapped to some higher-dimensional feature space (φ(x)) where the training set is separable:
Φ: x → φ(x)
For example, x = (x1, x2) can be mapped to φ(x) = (x1², √2·x1·x2, x2²).
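This particular map is the one induced by the degree-2 polynomial kernel, since φ(x)·φ(z) = (x·z)²: the feature-space inner product can be computed without ever forming φ. A minimal numeric check, with assumed toy vectors:

```python
import numpy as np

def phi(v):
    # Explicit degree-2 feature map: (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

lhs = phi(x) @ phi(z)  # inner product in the feature space
rhs = (x @ z) ** 2     # kernel evaluated in the input space
print(lhs, rhs)        # both 16.0: the "kernel trick"
```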
Transformation to Feature Space
❑ Possible problems of the transformation
❑ High computational burden due to the high dimensionality, and hard to get a good estimate
❑ SVM solves these two issues simultaneously
❑ "Kernel tricks" for efficient computation
❑ Minimizing ||w||² can lead to a "good" classifier
[Figure: points mapped by φ(·) from the input space to the feature space]
Key idea: transform xi to a higher-dimensional feature space.
How to calculate the distance from a point to a line?
The form of the equation defining the decision surface separating the classes is a hyperplane of the form:
w·x + b = 0
where,
x – vector
w – normal vector
b – bias
The distance from a point x to this hyperplane is d(x) = |w·x + b| / ||w||.
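A minimal numeric sketch of that distance formula, with assumed toy values:

```python
import numpy as np

w = np.array([3.0, 4.0])  # normal vector of the hyperplane (assumed)
b = -5.0                  # bias (assumed)
x = np.array([2.0, 1.0])  # query point

# Distance from x to the hyperplane w.x + b = 0
d = abs(np.dot(w, x) + b) / np.linalg.norm(w)
print(d)  # |10 - 5| / 5 = 1.0
```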