
Machine Learning Techniques (KCS 055)

Regression Algorithm
House Price
Challenges in guessing the House price
Predicting the price with the help of ML model
Regression Model
Simple Linear Regression
Y = a + bX
where
• Y – dependent variable
• X – independent variable
• a – Y-intercept (the value of Y when X is 0)
• b – slope (how much Y changes for a unit change in X)
Linear Regression
Area (sq. feet) | Price (in Lakhs)
100 | 10
200 | 20
300 | 30

[Scatter plot: Price (in Lakhs) vs. Area (sq. feet)]

Y = a + bX, where Y -> Price and X -> Area
Linear Regression
Slope (b) = sum of products of deviations / sum of squared deviations of X
Y-intercept (a) = mean(Y) - b * mean(X)

Mean of X = 200, Mean of Y = 20

Area (X) (sq. feet) | Price (Y) (Lakhs) | Deviation (X) = X - mean(X) | Deviation (Y) = Y - mean(Y) | Product of deviations | Squared deviation of X
100 | 10 | -100 | -10 | 1000 | 10,000
200 | 20 | 0 | 0 | 0 | 0
300 | 30 | 100 | 10 | 1000 | 10,000

Sum of products of deviations = 2000; sum of squared deviations of X = 20,000.

If you have a 150 sq. feet house, predict the price.

Slope (b) = 2000 / 20,000 = 0.1
Y-intercept (a) = 20 - 0.1 * 200 = 0
Y = a + bX = 0 + 0.1 * 150 = 15 (Lakhs)
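The same calculation can be reproduced in a few lines of Python; this is a minimal sketch using NumPy (the array names are illustrative):

```python
import numpy as np

# Area in sq. feet (X) and price in Lakhs (Y) from the table above
X = np.array([100.0, 200.0, 300.0])
Y = np.array([10.0, 20.0, 30.0])

# Slope = sum of products of deviations / sum of squared deviations of X
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()   # Y-intercept

print(a, b)           # 0.0 0.1
print(a + b * 150)    # predicted price for a 150 sq. feet house -> 15.0
```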
[Scatter plot: Price (in Lakhs) vs. Area (sq. feet)]
[Scatter plot: Price vs. Area with outliers marked]
Outliers
An observation that lies an abnormal distance from other
values in a random sample from a population
Predict the price of the pizza whose
diameter is 20 inches.

Diameter in Inches (X) | Price in Dollars (Y)
8 | 10
10 | 13
12 | 16
Mean of X = 10, Mean of Y = 13

Diameter (X) (inches) | Price (Y) (dollars) | Deviation (X) = X - mean(X) | Deviation (Y) = Y - mean(Y) | Product of deviations | Squared deviation of X
8 | 10 | -2 | -3 | 6 | 4
10 | 13 | 0 | 0 | 0 | 0
12 | 16 | 2 | 3 | 6 | 4

Slope (b) = sum of products of deviations / sum of squared deviations of X = 12 / 8 = 1.5
Y-intercept (a) = mean(Y) - b * mean(X) = 13 - 1.5 * 10 = -2

Price when X is 20:
Price = a + bX = -2 + 1.5 * 20 = 28
[Scatter plot: Pizza price (dollars) vs. diameter (inches) with the fitted line]
The world is not so linear
Multiple Linear Regression
• Used when the data has more than one independent variable.

Y = a + b1X1 + b2X2 + b3X3 + ... + bnXn
Dataset
[Dataset: n = 8 observations of two predictors X1, X2 and a response y]
Use the following steps to fit a multiple linear regression model to this dataset.

Step 1: Calculate X1², X2², X1y, X2y and X1X2.


Step 2: Calculate the regression sums.
• Σx1² = ΣX1² - (ΣX1)²/n = 38,767 - (555)²/8 = 263.875
• Σx2² = ΣX2² - (ΣX2)²/n = 2,823 - (145)²/8 = 194.875
• Σx1y = ΣX1y - (ΣX1)(Σy)/n = 101,895 - (555)(1,452)/8 = 1,162.5
• Σx2y = ΣX2y - (ΣX2)(Σy)/n = 25,364 - (145)(1,452)/8 = -953.5
• Σx1x2 = ΣX1X2 - (ΣX1)(ΣX2)/n = 9,859 - (555)(145)/8 = -200.375
Step 3: Calculate b0, b1, and b2.

The formula to calculate b1 is:
b1 = [(Σx2²)(Σx1y) - (Σx1x2)(Σx2y)] / [(Σx1²)(Σx2²) - (Σx1x2)²]
Thus,
b1 = [(194.875)(1,162.5) - (-200.375)(-953.5)] / [(263.875)(194.875) - (-200.375)²] = 3.148

The formula to calculate b2 is:
b2 = [(Σx1²)(Σx2y) - (Σx1x2)(Σx1y)] / [(Σx1²)(Σx2²) - (Σx1x2)²]
Thus,
b2 = [(263.875)(-953.5) - (-200.375)(1,162.5)] / [(263.875)(194.875) - (-200.375)²] = -1.656

The formula to calculate b0 is:
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
Thus,
b0 = 181.5 - 3.148(69.375) - (-1.656)(18.125) = -6.867
Step 4: Place b0, b1, and b2 in the estimated linear regression equation.

The estimated linear regression equation is:
Y = b0 + b1*x1 + b2*x2

In our example, it is:
Y = -6.867 + 3.148x1 - 1.656x2
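A small sketch, assuming only the regression sums quoted above (the raw dataset table is not reproduced here), that recomputes b0, b1 and b2 in Python:

```python
# Regression sums quoted above (n = 8 observations)
n = 8
sum_x1, sum_x2, sum_y = 555.0, 145.0, 1452.0
sum_x1sq, sum_x2sq = 38767.0, 2823.0
sum_x1y, sum_x2y, sum_x1x2 = 101895.0, 25364.0, 9859.0

# Corrected (mean-centred) sums
Sx1x1 = sum_x1sq - sum_x1**2 / n          # 263.875
Sx2x2 = sum_x2sq - sum_x2**2 / n          # 194.875
Sx1y  = sum_x1y - sum_x1 * sum_y / n      # 1162.5
Sx2y  = sum_x2y - sum_x2 * sum_y / n      # -953.5
Sx1x2 = sum_x1x2 - sum_x1 * sum_x2 / n    # -200.375

den = Sx1x1 * Sx2x2 - Sx1x2**2
b1 = (Sx2x2 * Sx1y - Sx1x2 * Sx2y) / den  # ~3.148
b2 = (Sx1x1 * Sx2y - Sx1x2 * Sx1y) / den  # ~-1.656
b0 = sum_y / n - b1 * sum_x1 / n - b2 * sum_x2 / n   # ~-6.87

print(b0, b1, b2)
```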
Matrix Approach
Coefficients = (X^T X)^(-1) X^T Y

Data (Y, X1, X2): (1, 1, 4), (6, 2, 5), (8, 3, 8), (12, 4, 2)

X (4x3, with a leading column of 1s for the intercept):
1 1 4
1 2 5
1 3 8
1 4 2

X^T (3x4):
1 1 1 1
1 2 3 4
4 5 8 2

Y (4x1): [1, 6, 8, 12]^T

Dimension check: ((X^T)3x4 X4x3)^(-1)3x3 (X^T)3x4 Y4x1 = (result)3x1
Matrix Approach
Coefficients = (X^T X)^(-1) X^T Y

X^T X =
 4  10  19
10  30  46
19  46 109

(X^T X)^(-1) ≈
 3.15  -0.59  -0.30
-0.59   0.20   0.016
-0.30   0.016  0.054
Matrix Approach
Coefficients = (X^T X)^(-1) X^T Y

(X^T X)^(-1) X^T ≈
 0.05    0.47   -1.02    0.19
-0.32   -0.098   0.155   0.26
-0.065   0.005   0.185  -0.125
Matrix Approach
Coefficients = (X^T X)^(-1) X^T Y

(X^T X)^(-1) X^T Y ≈ [-1.69, 3.48, -0.05]^T = [b0, b1, b2]^T

b0 = -1.69, b1 = 3.48, b2 = -0.05
Matrix Approach
Coefficients = (X^T X)^(-1) X^T Y

So, the coefficients are: b0 = -1.69, b1 = 3.48, b2 = -0.05

Y = b0 + b1X1 + b2X2
Y = -1.69 + 3.48X1 - 0.05X2
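The matrix approach maps directly onto NumPy; this minimal sketch, using the same small dataset, solves the normal equations and should recover roughly the coefficients above (np.linalg.lstsq is the numerically preferred alternative to an explicit inverse):

```python
import numpy as np

# Design matrix X with a leading column of 1s (intercept) and the target Y
X = np.array([[1, 1, 4],
              [1, 2, 5],
              [1, 3, 8],
              [1, 4, 2]], dtype=float)
Y = np.array([1, 6, 8, 12], dtype=float)

# Normal-equation solution: coefficients = (X^T X)^(-1) X^T Y
coef = np.linalg.inv(X.T @ X) @ X.T @ Y
print(coef)   # approximately [-1.7, 3.48, -0.05] -> b0, b1, b2
```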
Polynomial Regression Model
It is an extension of the simple linear model.
Polynomial
• Zero-degree polynomial: Y = a*x^0 = a (a constant)
• One-degree polynomial: Y = a + b1*x (the simple linear equation)
• Two-degree polynomial: Y = a + b1*x + b2*x^2
• n-degree polynomial: Y = a + b1*x + b2*x^2 + b3*x^3 + ... + bn*x^n
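A brief sketch of polynomial regression with NumPy's polyfit; the data points below are made up purely to illustrate fitting a two-degree polynomial:

```python
import numpy as np

# Illustrative data with a curved trend (made-up values, roughly y = x^2 + 1)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.9, 10.2, 17.1, 26.0])

# Fit a 2-degree polynomial: y = a + b1*x + b2*x^2
b2, b1, a = np.polyfit(x, y, deg=2)        # polyfit returns the highest degree first
print(a, b1, b2)
print(np.polyval([b2, b1, a], 6.0))        # predict at x = 6
```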
Regression Model

Simple Linear Regression:    Y = a + bX
Multiple Linear Regression:  Y = a + b1X1 + b2X2 + b3X3 + ... + bnXn
Polynomial Regression:       Y = a + b1X + b2X^2 + b3X^3 + ... + bnX^n
Assumptions of Linear Regression
1 - Linear Relationship
    There is a linear relationship between the dependent and independent variables.
2 - Normal Distribution of Residuals
    The residuals should be normally distributed with mean zero.
3 - Very Low / No Multicollinearity
    There should be little or no correlation between the independent variables.
4 - No Auto-correlation
    When you plot the errors, you should not find any correlation between them.
5 - Homoscedasticity
    Homo -> same; scedasticity -> spread/scatter: the errors should have "the same scatter" (constant variance) across all values of X.
Application of Linear Regression
• House Price Prediction
• Bitcoin Price Prediction
• Stock Market Analysis
• Market Sales Prediction
• Rainfall Prediction
• Weather Prediction
Logistic Regression
Logistic Regression

Y = σ(a + bX), where σ is the sigmoid function:

Y = 1 / (1 + e^-(a + bX))
Logistic Regression
• Supervised classification model.
• The dependent variable (Y) is categorical or binary (0 or 1).
• The independent variable (X) is continuous.

Study Hours (X) | Exam Result (Y)
2 | 0
3 | 0
4 | 0
5 | 1
6 | 1
7 | 1
8 | 1
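A minimal sketch fitting a logistic regression to the study-hours data above, assuming scikit-learn is available; the 4.5-hour query is just an illustrative prediction:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Study hours (X) and exam result (Y) from the table above
X = np.array([[2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Predicted class and probability of passing for a student who studies 4.5 hours
print(model.predict([[4.5]]))
print(model.predict_proba([[4.5]]))   # [P(fail), P(pass)]
```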
Linear Regression Vs Logistic Regression
What is error?

• The difference between the predicted values and the actual values.

Error = Y - Ŷ

where Y is the observed (actual) value and Ŷ is the predicted value.
Mean Squared Error

MSE = (1/n) Σ (Yi - Ŷi)²

Note: MSE can be used to calculate the loss in linear regression, but it can't be used in logistic regression (with the sigmoid, the squared-error loss is non-convex).
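For concreteness, a tiny sketch of MSE on made-up actual and predicted values:

```python
import numpy as np

# Actual and predicted values (illustrative numbers only)
y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])

mse = np.mean((y_true - y_pred) ** 2)   # (4 + 4 + 9) / 3
print(mse)                              # 5.666...
```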
Support Vector Machine

• Supervised machine learning algorithm.
• Used for binary classification.
• "Vectors" here means the data points.
Basic Concepts in SVM

• Support Vectors – the data points closest to the hyperplane.
• Hyperplane – the line (or plane) that divides the data points into two different classes.
• Margin – the gap between the two lines drawn through the closest data points of the two different classes.
How to choose the Hyperplane
Scenario
SVM chooses the hyperplane with maximum margin
Non-Linearly Separable Data Points
Kernel Functions

• Mathematical functions.
• Take data as input and transform it into the required output.
• Different kernel functions are:
  – Linear Kernel
  – Polynomial Kernel
  – Gaussian Kernel
  – Radial Basis Function (RBF)
Linear Kernel

• When the data is linearly separable, the linear kernel is used.
• For example, for two vectors x1 and y1, the linear kernel K is:

K(x1, y1) = x1 · y1
Polynomial Kernel

• Allows for curved decision boundaries in the input space.

K(xi, yi) = (1 + xi · yi)^d
where d = degree of the polynomial.

• Very popular in image processing.
Gaussian Kernel

• Used when there is no prior knowledge of the data.

K(x, y) = e^(-|x - y|² / (2σ²))
Radial Basis Function (RBF)

K(xi, xj) = exp(-Gamma * Sum((xi - xj)²))

where Gamma (γ) is a constant parameter (0 < γ < 1).
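A short sketch, assuming scikit-learn, that tries the kernel families above on a standard toy dataset; the dataset choice and hyperparameters are illustrative only:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Binary classification on a standard toy dataset (breast cancer)
X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compare the kernels discussed above; gamma plays the role of the RBF constant
for kernel in ["linear", "poly", "rbf"]:
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, degree=3, gamma="scale"))
    clf.fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))
```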


Naïve Bayes Classifier
• Naïve means “untrained” or “without experience”.
• Based on Bayes Theorem.
• Supervised learning algorithm.
• Simple and powerful.
• The assumption made here is that every feature is class-conditionally independent of the others.
Marginal Probability
• The simplest form of probability.
• The probability of event A occurring, regardless of the outcomes of the other events.

P(A) = Favourable Events / Total Events

E.g. – the probability of a card being an Ace in a deck of 52 cards:
P(Ace) = 4/52 = 1/13
Joint Probability
• The probability of two events occurring at the same time.

P(A, B) = P(A ∩ B) = P(A and B)

E.g. – the probability that a card is an Ace and red:
P(Ace, Red) = P(Ace and Red) = 2/52 = 1/26
Conditional Probability
• The probability of event B occurring, given that event A has already occurred.

P(B|A) = P(B ∩ A) / P(A)

E.g. – given that you drew a red card, what is the probability that it is an Ace?
P(Ace|Red) = P(Ace and Red) / P(Red) = (2/52) / (26/52) = 1/13
Bayes Theorem
Outlook Play Tennis
Outlook | Play Tennis
Rainy | Yes
Sunny | Yes
Overcast | Yes
Overcast | Yes
Sunny | No
Rainy | Yes
Sunny | Yes
Overcast | Yes
Rainy | No
Sunny | No
Sunny | Yes
Rainy | No
Overcast | Yes
Overcast | Yes

Total No. of Yes = 10, Total No. of No = 4
P(Yes) = 10/14, P(No) = 4/14
P(Sunny) = 5/14, P(Rainy) = 4/14, P(Overcast) = 5/14
Step-1: Make a frequency table

Outlook | Yes | No
Overcast | 5 | 0
Rainy | 2 | 2
Sunny | 3 | 2
Total | 10 | 4
Step-2: Make a likelihood table

Outlook | Yes | No | Likelihood
Overcast | 5 | 0 | 5/14
Rainy | 2 | 2 | 4/14
Sunny | 3 | 2 | 5/14
All | 10/14 | 4/14
Outlook | P(Outlook|Yes) | P(Outlook|No)
Overcast | 5/10 | 0
Rainy | 2/10 | 2/4
Sunny | 3/10 | 2/4
Find the probability of playing tennis on the 15th day using the Naïve Bayes classifier, where Outlook is Sunny.

Step-3: Apply Bayes' Theorem:
P(A|B) = P(B|A) * P(A) / P(B)

• First, we find the probability of Yes when it is Sunny:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Yes|Sunny) = (3/10 * 10/14) / (5/14) = 3/5 = 0.60
• Second, we find the probability of No when it is Sunny:
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny) = (2/4 * 4/14) / (5/14) = 2/5 = 0.40
• So, P(Yes|Sunny) > P(No|Sunny), i.e. 0.60 > 0.40.
Therefore, we can say that the player can play tennis on a sunny day.
P(Play Tennis = yes) = 9/14 = 0.64
P(Play Tennis = no) = 5/14 = 0.36

Outlook | Prob.
Sunny | 5/14
Overcast | 4/14
Rain | 4/14

Temperature | Prob.
hot | 4/14
mild | 6/14
cool | 4/14

Humidity | Prob.
High | 7/14
Normal | 7/14

Windy | Prob.
true | 6/14
false | 8/14
Outlook | yes | no
Sunny | 2/9 | 3/5
Overcast | 4/9 | 0
Rain | 3/9 | 2/5

Temperature | yes | no
hot | 2/9 | 2/5
mild | 4/9 | 2/5
cool | 3/9 | 1/5

Humidity | yes | no
High | 3/9 | 4/5
Normal | 6/9 | 1/5

Windy | yes | no
true | 3/9 | 3/5
false | 6/9 | 2/5
Find the probability of playing tennis on the 15th day using the Naïve Bayes classifier, where the conditions are:
Outlook = Sunny, Temperature = Cool, Humidity = High and Wind = true.

P(Y|X) = argmax over Yj of [ P(Yj) ∏i P(Xi|Yj) ] / ∏i P(Xi)

P(yes|X) = P(yes) x P(Sunny|yes) x P(Cool|yes) x P(High|yes) x P(true|yes)
           / [P(Sunny) x P(Cool) x P(High) x P(true)]
         = (9/14 x 2/9 x 3/9 x 3/9 x 3/9) / (5/14 x 4/14 x 7/14 x 6/14)
         = 0.242
P(no|X) = P(no) x P(Sunny|no) x P(Cool|no) x P(High|no) x P(true|no)
          / [P(Sunny) x P(Cool) x P(High) x P(true)]
        = (5/14 x 3/5 x 1/5 x 4/5 x 3/5) / (5/14 x 4/14 x 7/14 x 6/14)
        = 0.9408

Thus, P(no|X) > P(yes|X), so the result is No.
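The same hand calculation can be checked with a few lines of Python; the priors and conditional probabilities below are the ones read off the tables above:

```python
# Class priors and conditional probabilities taken from the tables above
p_yes, p_no = 9/14, 5/14

cond_yes = {"Sunny": 2/9, "Cool": 3/9, "High": 3/9, "true": 3/9}
cond_no  = {"Sunny": 3/5, "Cool": 1/5, "High": 4/5, "true": 3/5}
evidence = (5/14) * (4/14) * (7/14) * (6/14)   # P(Sunny) P(Cool) P(High) P(true)

x = ["Sunny", "Cool", "High", "true"]

score_yes, score_no = p_yes, p_no
for feature in x:
    score_yes *= cond_yes[feature]
    score_no *= cond_no[feature]

print(score_yes / evidence)   # ~0.242
print(score_no / evidence)    # ~0.94
print("Play tennis?", "Yes" if score_yes > score_no else "No")
```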
Applications Of Naïve Bayes Classifier
• Real Time Prediction
• Text Classification
• Spam Filtering
• Sentiment Analysis
• Recommendation System
• Multiclass Classification
Advantages and Disadvantages

• Advantages:
  – Fast and easy algorithm.
  – Can be used for binary and multi-class classification.
  – Mostly used for text classification.
• Disadvantages:
  – Cannot learn relationships between the independent features (it assumes they are independent).
Bayesian Belief Network
• Probabilistic Graphical Model.
• Represents a set of variables and
their conditional dependencies
using a directed acyclic graph.
• Two major components:
• Directed Acyclic Graph (DAG)
• Table of Conditional
Probabilities
Bayesian Belief Network

• A node represents a random variable.
• An arc represents the causal relationship or conditional dependency between random variables.
Example 1

[Bayesian network: Burglary (B) and Earthquake (E) are parents of Alarm (A); Alarm is the parent of David calls (D) and Sophia calls (S), each node with its conditional probability table]

Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia called Harry.

• We calculate the joint probability of all the events:
P(¬B, ¬E, A, D, S) = P(¬B) * P(¬E) * P(A|¬B, ¬E) * P(S|A) * P(D|A)
                   = 0.998 * 0.999 * 0.001 * 0.75 * 0.91
                   = 0.00068045
What is the probability that David called?

P(D) = P(D|A)P(A) + P(D|¬A)P(¬A)

P(A) = P(A|B,E)P(B)P(E) + P(A|B,¬E)P(B)P(¬E) + P(A|¬B,E)P(¬B)P(E) + P(A|¬B,¬E)P(¬B)P(¬E)

P(¬A) = P(¬A|B,E)P(B)P(E) + P(¬A|B,¬E)P(B)P(¬E) + P(¬A|¬B,E)P(¬B)P(E) + P(¬A|¬B,¬E)P(¬B)P(¬E)

• P(A) = 0.00252
• P(¬A) = 0.99748
• P(D) = P(D|A)P(A) + P(D|¬A)P(¬A) = 0.91 * 0.00252 + 0.05 * 0.99748 ≈ 0.0522
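A small sketch that reproduces both numbers with the probabilities quoted above (the full conditional probability tables of the network are not reproduced here, so P(A) is taken as given):

```python
# Joint probability from Example 1: P(~B, ~E, A, D, S)
p_joint = 0.998 * 0.999 * 0.001 * 0.75 * 0.91
print(p_joint)                 # ~0.00068045

# Values quoted above: P(A) from summing over the parents B and E,
# and the call probabilities P(D|A), P(D|~A) from David's CPT.
p_A = 0.00252
p_not_A = 1 - p_A              # 0.99748
p_D_given_A = 0.91
p_D_given_not_A = 0.05

# Marginalise the alarm out: P(D) = P(D|A)P(A) + P(D|~A)P(~A)
p_D = p_D_given_A * p_A + p_D_given_not_A * p_not_A
print(p_D)                     # ~0.0522
```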
EM Algorithm
• E -> Expectation
• M -> Maximization
• Used to find latent variables.
• Latent variable – a variable that is not directly observed.
• Basically, it is used in many unsupervised clustering algorithms.
Steps involved in EM Algorithm
• Step 1 – A set of initial values is considered.
  – A set of incomplete data is given to the system.
• Step 2 – Expectation step or E-step
  – Use the observed data to estimate (guess) the values of the missing/latent variables.
• Step 3 – Maximization step or M-step
  – Update the values using the estimates generated in the E-step.
• Step 4 – Check whether the values are converging or not.
  – If converging – stop.
  – Otherwise, repeat steps 2 and 3 until convergence occurs.
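As a concrete illustration of these steps, scikit-learn's GaussianMixture is fitted with exactly this E-step/M-step loop; the 1-D data below are made up for the sketch:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Made-up 1-D data drawn from two hidden (latent) clusters
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)]).reshape(-1, 1)

# GaussianMixture is fitted with the EM algorithm: the E-step assigns soft cluster
# responsibilities, the M-step re-estimates means, variances and weights.
gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(data)

print(gmm.means_.ravel())            # close to the true means 0 and 5
print(gmm.weights_)                  # close to [0.5, 0.5]
print(gmm.converged_, gmm.n_iter_)   # convergence flag and number of EM iterations
```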
Usage of EM Algorithm
• Used to fill missing data.
• Used for unsupervised clustering.
• Used to discover values of latent
variable.
• Used to calculate Gaussian density of a
function.
• Used to estimate parameters of Hidden
Markov Model.
Advantages & Disadvantages

Advantages:
• Easy to implement, as it has only two steps: the E-step and the M-step.
• The likelihood increases after each iteration.
• The solution of the M-step often exists in closed form.

Disadvantages:
• Slow convergence.
• It may converge to a local optimum only.
• It requires both forward and backward probabilities.
Concept Learning
• “A task of acquiring potential
hypothesis (solution) that best fits the
given training examples”.

• Main goal – find all concepts/hypotheses that are consistent with the training examples.
• For each attribute, the hypothesis will either
  • indicate by a "?" that any value is acceptable for this attribute,
  • specify a single required value (e.g., Warm) for the attribute, or
  • indicate by a "ø" that no value is acceptable.
Most General and Specific Hypothesis
• The most general hypothesis (that every day is a positive example) is represented by
  (?, ?, ?, ?, ?, ?)
• The most specific possible hypothesis (that no day is a positive example) is represented by
  (ø, ø, ø, ø, ø, ø)
Types Of Concept Learning

• Find-S Algorithm
• List-Then-Eliminate Algorithm
• Candidate Elimination Algorithm
Find – S Algorithm
• Step 1: Initialize h with the most specific hypothesis (Փ):
  h0 = <Փ, Փ, Փ, Փ, Փ, Փ>
• Step 2: For each positive sample, for each attribute:
  – If the hypothesis value is Փ, replace it with the attribute value.
  – If the attribute value equals the hypothesis value, ignore it.
  – Otherwise, replace it with the most general value (?).
• h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
• h2 = <Sunny, Warm, ?, Strong, Warm, Same>
• h3 = h2
• h4 = <Sunny, Warm, ?, Strong, ?, ?>
• h4 -> the final, maximally specific hypothesis consistent with the positive examples
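A compact sketch of Find-S in Python; the training examples are assumed to be the standard EnjoySport data (Mitchell), which the trace above appears to follow:

```python
def find_s(examples):
    """Find-S: return the maximally specific hypothesis consistent with the positive examples."""
    h = None
    for x, label in examples:
        if label != "Yes":       # Find-S ignores negative examples
            continue
        if h is None:            # first positive example replaces the all-phi hypothesis
            h = list(x)
        else:                    # keep matching values, generalise mismatches to '?'
            h = [hi if hi == xi else "?" for hi, xi in zip(h, x)]
    return h

# Assumed EnjoySport training data (attribute order as in the slides)
data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(data))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```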
Disadvantage of Find-S algorithm

• Considers only positive examples.
• h4 may not be the only hypothesis that fits the complete data.
Candidate Elimination Algorithm
• Uses the concept of a version space.
• Considers both positive and negative examples.
• For positive samples, move from specific (Փ) to general (?).
• For negative samples, move from general (?) to specific (Փ).
Example
S0 = <Փ,Փ,Փ,Փ,Փ,Փ>
G0 = <?,?,?,?,?,?>

1) +ve
S1 = < Sunny, Warm, Normal,
Strong, Warm, Same>
G1 = <?,?,?,?,?,?>
2) +ve
S2 = < Sunny, Warm, ?, Strong, Warm, Same>
G2 = <?,?,?,?,?,?>

3) –ve
S3 = < Sunny, Warm, ?, Strong, Warm, Same>
G3 = <<Sunny,?,?,?,?,?>,<?,Warm,?,?,?,?>,<?,?,?,?,?,same>>
4) +ve
S4 = < Sunny, Warm, ?, Strong, ?, ?>
G4 = <<Sunny,?,?,?,?,?>,<?,Warm,?,?,?,?>>

S4 and G4 are the final hypotheses.
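A sketch of the candidate elimination algorithm for conjunctive hypotheses, again assuming the EnjoySport examples; the helper names are illustrative and the inconsistent-data checks are omitted for brevity:

```python
def covers(h, x):
    """A hypothesis covers an instance if every attribute is '?' or matches exactly."""
    return all(hi == "?" or hi == xi for hi, xi in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 is at least as general as h2."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples, n_attrs):
    S = ["phi"] * n_attrs            # maximally specific boundary (a single hypothesis here)
    G = [["?"] * n_attrs]            # maximally general boundary
    for x, label in examples:
        if label == "Yes":           # positive example: prune G, generalise S minimally
            G = [g for g in G if covers(g, x)]
            if S == ["phi"] * n_attrs:
                S = list(x)
            else:
                S = [si if si == xi else "?" for si, xi in zip(S, x)]
        else:                        # negative example: specialise G minimally w.r.t. S
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                for i in range(n_attrs):
                    if g[i] == "?" and S[i] != "?" and S[i] != x[i]:
                        spec = list(g)
                        spec[i] = S[i]
                        new_G.append(spec)
            # keep only the maximally general members of G
            G = [g for g in new_G
                 if not any(g2 != g and more_general_or_equal(g2, g) for g2 in new_G)]
    return S, G

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
S, G = candidate_elimination(data, n_attrs=6)
print(S)   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
print(G)   # [['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
```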


SNo Manufacturer Color Year Type Will Buy
1. Honda Blue 1970 Economy Yes
2. Toyota Green 1980 Sports No
3. Toyota Blue 1990 Economy Yes
4. BMW Red 2000 Economy No
5. Honda White 2010 Economy Yes
Find S algorithm.

S0 = <Փ,Փ,Փ,Փ>

1) +ve (Honda, Blue, 1970, Economy)


S1 = < Honda, Blue, 1970, Economy>
SNo Manufacturer Color Year Type Will Buy
1. Honda Blue 1970 Economy Yes
2. Toyota Green 1980 Sports No
3. Toyota Blue 1990 Economy Yes
4. BMW Red 2000 Economy No
5. Honda White 2010 Economy Yes
Find S algorithm.

2) -ve (Toyota, Green, 1980, Sports)


S1 = < Honda, Blue, 1970, Economy>
S2 = S1
S2 = < Honda, Blue, 1970, Economy>
SNo Manufacturer Color Year Type Will Buy
1. Honda Blue 1970 Economy Yes
2. Toyota Green 1980 Sports No
3. Toyota Blue 1990 Economy Yes
4. BMW Red 2000 Economy No
5. Honda White 2010 Economy Yes

Find S algorithm.

3) +ve (Toyota, Blue, 1990, Economy)


S2 = < Honda, Blue, 1970, Economy>
S3 = <?, Blue, ?, Economy>
SNo Manufacturer Color Year Type Will Buy
1. Honda Blue 1970 Economy Yes
2. Toyota Green 1980 Sports No
3. Toyota Blue 1990 Economy Yes
4. BMW Red 2000 Economy No
5. Honda White 2010 Economy Yes

Find S algorithm.

4) -ve (BMW, Red, 2000, Economy)


S3 = <?, Blue, ?, Economy>
S4 = S3
S4 = <?, Blue, ?, Economy>
SNo Manufacturer Color Year Type Will Buy
1. Honda Blue 1970 Economy Yes
2. Toyota Green 1980 Sports No
3. Toyota Blue 1990 Economy Yes
4. BMW Red 2000 Economy No
5. Honda White 2010 Economy Yes

Find S algorithm.

5) +ve (Honda, White, 2010, Economy)


S4 = <?, Blue, ?, Economy>
S5 = <?, ?, ?, Economy> Final Specific Hypothesis
SNo Manufacturer Color Year Type Will Buy
1. Honda Blue 1970 Economy Yes
2. Toyota Green 1980 Sports No
3. Toyota Blue 1990 Economy Yes
4. BMW Red 2000 Economy No
5. Honda White 2010 Economy Yes
Candidate Elimination Algorithm
S0 = <Փ,Փ,Փ,Փ>
G0 = <?,?,?,?>
1) +ve (Honda, Blue, 1970, Economy)
S1 = < Honda, Blue, 1970, Economy>
G1 = <?,?,?,?>
SNo Manufacturer Color Year Type Will Buy
1. Honda Blue 1970 Economy Yes
2. Toyota Green 1980 Sports No
3. Toyota Blue 1990 Economy Yes
4. BMW Red 2000 Economy No
5. Honda White 2010 Economy Yes
Candidate Elimination Algorithm

2) -ve (Toyota, Green, 1980, Sports)


S2 = S1
S2 = < Honda, Blue, 1970, Economy>
G2 = <<Honda,?,?,?>,<?,Blue,?,?>,<?,?,1970,?>,<?,?,?, Economy>>
SNo Manufacturer Color Year Type Will Buy
1. Honda Blue 1970 Economy Yes
2. Toyota Green 1980 Sports No
3. Toyota Blue 1990 Economy Yes
4. BMW Red 2000 Economy No
5. Honda White 2010 Economy Yes
Candidate Elimination Algorithm

3) +ve (Toyota, Blue, 1990, Economy)


S2 = < Honda, Blue, 1970, Economy >
S3 = < ?, Blue, ?, Economy>
G2 = <<Honda,?,?,?>,<?,Blue,?,?>,<?,?,1970,?>,<?,?,?, Economy >>
G3 = <<?,Blue,?,?>,<?,?,?, Economy >>
SNo Manufacturer Color Year Type Will Buy
1. Honda Blue 1970 Economy Yes
2. Toyota Green 1980 Sports No
3. Toyota Blue 1990 Economy Yes
4. BMW Red 2000 Economy No
5. Honda White 2010 Economy Yes

Candidate Elimination Algorithm

4) -ve (BMW, Red, 2000, Economy)


S4 = S3
S4 = < ?, Blue, ?, Economy>
G4 = <<?,Blue,?,?>,<?,?,?, Economy >>
SNo Manufacturer Color Year Type Will Buy
1. Honda Blue 1970 Economy Yes
2. Toyota Green 1980 Sports No
3. Toyota Blue 1990 Economy Yes
4. BMW Red 2000 Economy No
5. Honda White 2010 Economy Yes
Candidate Elimination Algorithm
5) +ve (Honda, White, 2010, Economy)
S4 = <?, Blue, ?, Economy>
S5 = <?, ?, ?, Economy>
G4 = <<?,Blue,?,?>, <?,?,?,Economy>>
G5 = <?,?,?,Economy>
S5 and G5 are the final hypotheses.
Consistent Hypothesis (H)

• A hypothesis h is consistent with a set of training examples D if and only if h(x) = C(x) for each example <x, C(x)> in D.
  Consistent(h, D) ≡ (∀ <x, C(x)> ∈ D) h(x) = C(x)
Example

Example | Citations | Size | In Library | Price | Editions | Buy
1 | Some | Small | No | Affordable | One | No
2 | Many | Big | No | Expensive | Many | Yes

h1 = (?, ?, No, ?, Many): Consistent
h2 = (?, ?, No, ?, ?): Not Consistent
Version Space (VSH,D )

• The version space with respect to a hypothesis space H and training examples D is the subset of hypotheses from H consistent with the training examples in D.
  VS(H,D) = {h ∈ H | Consistent(h, D)}
List then Eliminate

• Step 1 – Version space: a list containing every hypothesis in H.
• Step 2 – For each training example <x, c(x)>:
  remove from the version space any hypothesis h that is not consistent, i.e. h(x) ≠ c(x).
• Step 3 – Output the list of hypotheses remaining in the version space.
Example

• F1 -> A, B
• F2 -> X, Y
• Instance space: (A,X), (A,Y), (B,X), (B,Y) – 4 instances
• Hypothesis space: (A,X), (A,Y), (A,Փ), (A,?), (B,X), (B,Y), (B,Փ), (B,?), (Փ,X), (?,X), (Փ,Y), (?,Y), (Փ,Փ), (Փ,?), (?,Փ), (?,?) – 16 hypotheses
List then Eliminate

• Semantically distinct hypotheses: (A,X), (A,Y), (A,?), (B,X), (B,Y), (B,?), (?,X), (?,Y), (?,?), (Փ,Փ) – 10
• Now, using the list-then-eliminate algorithm:
• Step 1: Version space:
  – (A,X), (A,Y), (A,?), (B,X), (B,Y), (B,?), (?,X), (?,Y), (?,?), (Փ,Փ)
List then Eliminate
• Step 1: Version space:
  – (A,X), (A,Y), (A,?), (B,X), (B,Y), (B,?), (?,X), (?,Y), (?,?), (Փ,Փ)
• Training instances:
  F1 | F2 | Target
  A | X | Yes
  A | Y | Yes
• Step 2: Remove from the version space every hypothesis that is not consistent:
  – (A,?), (?,?) -> consistent hypotheses
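The whole procedure fits in a few lines for this toy hypothesis space; a minimal sketch:

```python
from itertools import product

F1, F2 = ["A", "B"], ["X", "Y"]

def covers(h, x):
    return all(hi == "?" or hi == xi for hi, xi in zip(h, x))

# Semantically distinct hypothesis space: each attribute is a value or '?',
# plus the single empty hypothesis (phi, phi) that covers nothing.
hypotheses = list(product(F1 + ["?"], F2 + ["?"])) + [("phi", "phi")]

training = [(("A", "X"), True), (("A", "Y"), True)]

def consistent(h, example):
    x, label = example
    predicted = covers(h, x) and "phi" not in h
    return predicted == label

# List-then-eliminate: keep only the hypotheses consistent with every example
version_space = [h for h in hypotheses if all(consistent(h, ex) for ex in training)]
print(version_space)   # [('A', '?'), ('?', '?')]
```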
Problem with List then Eliminate
Algorithm

• The hypothesis space must be finite.
• Enumerating all the hypotheses is rather inefficient, because listing every hypothesis is a waste of time.
Reference Books

• Tom M. Mitchell, Machine Learning, McGraw-Hill Education (India) Private Limited, 2013.
• Ethem Alpaydin, Introduction to Machine Learning (Adaptive Computation and Machine Learning), The MIT Press, 2004.
• Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC Press, 2009.
• Bishop, C., Pattern Recognition and Machine Learning, Berlin: Springer-Verlag.

Text Books

• Saikat Dutt, Subramanian Chandramouli, Amit Kumar Das, Machine Learning, Pearson.
• Andreas C. Müller and Sarah Guido, Introduction to Machine Learning with Python.
• John Paul Mueller and Luca Massaron, Machine Learning for Dummies.
• Dr. Himanshu Sharma, Machine Learning, S.K. Kataria & Sons, 2022.
