ML Module 3 2022
Topics: Bayes Theorem, Support Vector Machine, Feature Selection
▪ Naïve Bayes is a supervised learning algorithm, based on Bayes' theorem and used for solving classification problems.
▪ The Naïve Bayes classifier is one of the simplest and most effective classification algorithms.
▪ It is an eager learning algorithm, so it can make quick predictions and can therefore be used for real-time prediction.
▪ It is a probabilistic classifier.
▪ Examples: spam filtering, sentiment analysis, and classifying articles.
Terminologies:
▪ Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Each feature individually contributes to identifying it as an apple, without depending on the others.
▪ Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.
NAÏVE BAYES EXAMPLE
Problem: Players will play if the weather is sunny. Is this statement correct?
STEPS
1. Convert the data set into a frequency table.
2. Create a likelihood table by finding the probabilities. For example, the probability of Overcast is 0.29 and the probability of playing is 0.64.
3. Now, use the Naïve Bayes equation to calculate the conditional probability for each class. The class with the highest conditional probability is the outcome of the prediction.
Solution:
P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14 = 0.64.
Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is the higher probability, so on a sunny day the players are predicted to play.
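The calculation above can be sketched in plain Python; this is a minimal illustration that hard-codes the counts quoted on the slide (assumed to come from the standard 14-row weather/play dataset) rather than deriving them from data.

```python
# Naive Bayes estimate of P(Yes | Sunny) for the weather/play example.
p_sunny_given_yes = 3 / 9    # P(Sunny | Yes)
p_yes = 9 / 14               # prior P(Yes)
p_sunny = 5 / 14             # evidence P(Sunny)

# Bayes' theorem: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # ~0.6, so "play" is the predicted class
```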
Naïve Bayes Algorithm
Pros:
• It is an eager learning algorithm. It is easy and fast to predict the class of a test data set.
• It performs well in multi-class prediction.
• When the assumption of independence holds, a Naïve Bayes classifier performs better compared to other models like logistic regression, and it needs less training data.
• It performs well with categorical input variables compared to numerical variable(s).
Cons:
• If a categorical variable has a category in the test data set which was not observed in the training data set, then the model will assign a zero probability and will be unable to make a prediction. This is often known as "Zero Frequency". To solve it, we can use a smoothing technique; one of the simplest smoothing techniques is called Laplace estimation.
• Another limitation of Naïve Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.
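As a sketch of how Laplace estimation avoids the zero-frequency problem, the snippet below uses scikit-learn's CategoricalNB, whose alpha parameter is the additive (Laplace/Lidstone) smoothing constant. The tiny dataset here is made up for illustration and is not from the slides.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Made-up categorical data: columns are (weather, wind), encoded as integers.
X_train = np.array([[0, 0], [0, 1], [1, 0], [2, 1], [2, 0], [1, 1]])
y_train = np.array([1, 1, 1, 0, 0, 1])

# alpha=1.0 applies Laplace smoothing, so a category never seen for a class
# (here weather=2 was never seen with class 1) still gets a small
# non-zero probability instead of zeroing out the whole prediction.
model = CategoricalNB(alpha=1.0)
model.fit(X_train, y_train)

print(model.predict([[2, 1]]))        # predicted class
print(model.predict_proba([[2, 1]]))  # smoothed class probabilities
```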
Support Vector Machine
Linear Classifiers
A linear classifier has the form f(x, w, b) = sign(w · x + b), where the two classes are denoted +1 and -1 (points with w · x + b < 0 are assigned -1). Many different separating lines would classify the training data correctly, but which one is best?
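The decision rule itself is tiny; here is a brief sketch in NumPy, where w, b, and the test points are made-up values, not taken from the slides.

```python
import numpy as np

def linear_classifier(x, w, b):
    # f(x, w, b) = sign(w . x + b): returns +1 or -1
    return 1 if np.dot(w, x) + b >= 0 else -1

w, b = np.array([2.0, -1.0]), 0.5
print(linear_classifier(np.array([1.0, 1.0]), w, b))   # +1
print(linear_classifier(np.array([-1.0, 2.0]), w, b))  # -1
```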
A poorly chosen boundary f(x, w, b) = sign(w · x + b) can misclassify training points, for example assigning a point from the -1 class to the +1 class.
Classifier Margin
For a linear classifier f(x, w, b) = sign(w · x + b), define the margin as the width by which the decision boundary could be increased before hitting a datapoint. The hyperplane can be thought of as the line (decision boundary) that linearly separates the two classes.
Maximum Margin
1. A hyperplane f(x, w, b) = sign(w · x + b) with the largest possible margin could be a good classifier.
2. This implies that only the support vectors are important; the other training examples are ignorable.
3. Empirically it works very, very well.
Support vectors are the datapoints that lie closest to the hyperplane. If these are removed, the position of the hyperplane may change, so they are critical.
The maximal margin hyperplane is the linear classifier with the maximum margin. This is the simplest kind of SVM (called a Linear SVM, or LSVM).
Terminologies: SVM
• Margin: It may be defined as the gap between the two lines through the closest data points of the different classes. It can be calculated as the perpendicular distance from the decision boundary to the support vectors. A large margin is considered a good margin and a small margin is considered a bad margin.
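Below is a minimal sketch of a linear SVM, assuming scikit-learn and its toy Iris data (neither is mentioned on the slides): it fits an SVC with a linear kernel, prints the support vectors, and computes the margin width 2/||w||.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

# Two Iris classes and two features as a small linearly separable example.
X, y = datasets.load_iris(return_X_y=True)
mask = y < 2                      # keep classes 0 and 1 only
X, y = X[mask, :2], y[mask]

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)  # points that define the margin
w = clf.coef_[0]
print("margin width = 2/||w|| =", 2 / np.linalg.norm(w))
```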
Types of SVM
▪ Linear SVM: used for linearly separable data, where the classes can be separated by a single straight line (hyperplane).
▪ Non-linear SVM: used for data that is not linearly separable, where a kernel is applied.
What is a Kernel:
▪ A kernel is a function that takes the original data as input and transforms it into the form needed to find a separating hyperplane, allowing the SVM to learn a non-linear decision boundary.
Linear Kernel:
▪ The simplest kernel; it is just the inner product of the two inputs, f(Xi, Xj) = Xi · Xj, and is suited to linearly separable data.
Polynomial Kernel:
▪ The polynomial kernel isn't used in practice very often because it isn't as computationally efficient as other kernels.
▪ f(Xi, Xj) = (Xi · Xj + 1)^d represents the polynomial decision boundary that will separate your data. Xi and Xj represent your data points and 'd' is the degree of the polynomial.
Gaussian RBF Kernel:
▪ One of the most powerful and commonly used kernels in SVMs. It is usually the choice for non-linear data.
▪ It is a general-purpose kernel, used when there is no prior knowledge about the data. The equation is f(Xi, Xj) = exp(-gamma * ||Xi - Xj||^2), where gamma is a parameter that must be specified to the learning algorithm. A good default value of gamma is 0.1.
Sigmoid Kernel:
▪ More useful in neural networks than in support vector machines, but there are occasional specific use cases. The equation is f(X, y) = tanh(alpha * X^T * y + C).
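To tie the kernels above together, here is a brief sketch, assuming scikit-learn and its toy Iris data (not part of the slides), that fits an SVC with each kernel discussed and compares training accuracy.

```python
from sklearn import datasets
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# One SVC per kernel from the slides; degree applies to 'poly', gamma to
# 'poly'/'rbf'/'sigmoid', and coef0 is the constant term in the polynomial
# and sigmoid formulas.
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel, degree=3, gamma=0.1, coef0=1.0)
    clf.fit(X, y)
    print(f"{kernel:8s} training accuracy: {clf.score(X, y):.3f}")
```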
Why are SVMs so popular?
▪ Principal Component Analysis (PCA) is an unsupervised learning algorithm that is used for dimensionality reduction in machine learning.
▪ It can be thought of as a projection method where data with m columns (features) is projected into a subspace with m or fewer columns, while retaining the essence of the original data.
For example, for a 3 × 2 data matrix
A = [[a11, a12],
     [a21, a22],
     [a31, a32]]
the reduced representation is B = PCA(A).
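As a small sketch of the B = PCA(A) idea, assuming scikit-learn and a made-up 3 × 2 matrix (neither is given on the slides), the snippet below projects the data onto its first principal component.

```python
import numpy as np
from sklearn.decomposition import PCA

# A 3 x 2 data matrix standing in for A = [[a11, a12], [a21, a22], [a31, a32]].
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 7.0]])

pca = PCA(n_components=1)   # keep one principal component
B = pca.fit_transform(A)    # B = PCA(A): the projected data

print(B)
print("explained variance ratio:", pca.explained_variance_ratio_)
```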
PRINCIPAL COMPONENT ANALYSIS
Step 1) The first step is to calculate the mean values of each column of A:
M = mean(A), i.e. M = (m11, m12) = ((a11 + a21 + a31) / 3, (a12 + a22 + a32) / 3)
Step 2) Next, center the values in each column by subtracting the column mean:
C = A - M
Step 3) The next step is to calculate the covariance matrix of the centered matrix C.
Step 4) Calculate the eigenvalues and eigenvectors of the covariance matrix; the eigenvectors give the directions of the principal components and the eigenvalues give their magnitudes.
Step 5) Sort the eigenvectors by decreasing eigenvalue, keep the top components, and project the centered data onto them to obtain B.
▪ Note: if all eigenvalues have a similar value, then the existing column representation may already be reasonably compressed or dense, and the projection is not required.
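The steps above can be sketched directly with NumPy; this is a minimal illustration using a small made-up matrix A and a single retained component (neither is specified on the slides).

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 7.0]])

# Step 1: column means.
M = A.mean(axis=0)

# Step 2: center the columns.
C = A - M

# Step 3: covariance matrix of the centered data.
V = np.cov(C.T)

# Step 4: eigendecomposition of the covariance matrix.
values, vectors = np.linalg.eig(V)

# Step 5: sort by eigenvalue and project the centered data onto the
# leading eigenvector to get the 1-D representation B.
order = np.argsort(values)[::-1]
top = vectors[:, order[:1]]   # first principal component (2 x 1)
B = C @ top                   # projected data, shape (3, 1)
print(B)
```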