Presentation UNIT-2
• The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so that we
can easily put the new data point in the correct category in the future.
• In this scenario, hyper-plane “B” performs this job well.
• Scenario-2: Identify the right hyper-plane: Here, we
have three hyper-planes (A, B, and C), and all are
segregating the classes well. Now, how can we
identify the right hyper-plane?
• Above, you can see that the margin for hyper-plane C is high
compared to both A and B. Hence, we name C as the right
hyper-plane. Another striking reason for selecting the hyper-plane
with the higher margin is robustness: if we select a hyper-plane
with a low margin, there is a high chance of misclassification.
• Scenario-3: Identify the right hyper-plane:
• Advantages of SVM:
• Effective in high dimensional cases
• It is memory-efficient, as it uses only a subset of the training
points (called support vectors) in the decision function
• Different kernel functions can be specified for the decision
function, and it is possible to specify custom kernels
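A minimal sketch of these points using scikit-learn's SVC, assuming a small made-up 2-D dataset (the data, labels, and the my_kernel helper below are illustrative, not from the slides): the fitted model exposes the support vectors it uses, and the kernel parameter accepts either a built-in name or a custom callable.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-D points from two classes (illustrative only).
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel looks for the maximum-margin hyper-plane.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)      # subset of training points used in the decision function
print(clf.predict([[5, 5]]))     # classify a new point

# A custom kernel can be supplied as a callable.
def my_kernel(A, B):
    return (A @ B.T + 1) ** 2    # a simple polynomial-style kernel

clf_custom = SVC(kernel=my_kernel).fit(X, y)
print(clf_custom.predict([[5, 5]]))
```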
Decision Tree Classification Algorithm:
• Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems.
• It uses a flowchart-like tree structure to show the predictions that result
from a series of feature-based splits.
• The decisions or the test are performed on the basis of features of the given
dataset.
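To make the flowchart idea concrete, here is a brief sketch using scikit-learn's DecisionTreeClassifier on a tiny made-up, already-encoded dataset (the features and values are assumptions for illustration); export_text prints the learned feature-based splits.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical encoded data: columns are [outlook, humidity], target is play (1) / don't play (0).
X = [[0, 1], [0, 0], [1, 1], [2, 1], [2, 0], [1, 0]]
y = [0, 1, 1, 0, 1, 1]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2)   # entropy-based (information gain) splits
tree.fit(X, y)
print(export_text(tree, feature_names=["outlook", "humidity"]))   # the tree as a series of tests
print(tree.predict([[0, 1]]))                                     # classify a new example
```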
• For a two-outcome event such as a coin toss,
Entropy = -probability(a) * log2(probability(a)) - probability(b) * log2(probability(b)),
where probability(a) is the probability of getting a head and
probability(b) is the probability of getting a tail.
What is “Entropy”? and What is its function?
• In machine learning, entropy is a measure of the
randomness in the information being processed.
The higher the entropy, the harder it is to draw
any conclusions from that information.
• Example: for a set containing 3 examples of one class and 5 of the other (8 in total),
Entropy = [-3/8*log2(3/8) - 5/8*log2(5/8)]
= [-0.375 * (-1.415) - 0.625 * (-0.678)]
= (0.530 + 0.424)
= 0.954 bits
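The same arithmetic in a few lines of Python (a sketch; the 3-versus-5 split is the example above, and the entropy helper is ours, not a library function):

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([3, 5]), 3))   # 0.954, matching the hand calculation above
```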
Information Gain:
• The measure we will use called information gain,
is simply the expected reduction
in entropy caused by partitioning the data set
according to this attribute.
• It is given by:
Gain(A) = Info(D) - Info_A(D),
where Info_A(D) = Σ_j (|D_j| / |D|) * Info(D_j) is the expected information still
needed after partitioning D on the v values of attribute A.
• For the age category “youth,” there are 2 yes tuples and 3
no tuples.
• For the category “middle aged,” there are 4 yes tuples and
0 no tuples.
• For the category “senior,” there are 3 yes tuples and 2 no tuples.
Info_age(D) = (5/14)*E(2,3) + (4/14)*E(4,0) + (5/14)*E(3,2)
= (5/14)*0.971 + (4/14)*0 + (5/14)*0.971
= 0.694
• Hence, the gain in information from such a partitioning
would be
Gain(age) = 0.940 - 0.694
= 0.246 bits
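The same calculation for the age attribute as a short Python sketch (the class counts per age value are taken from the bullets above; the exact gain is 0.247, which the slides round to 0.246 via the intermediate values 0.940 and 0.694):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# (yes, no) counts for the age values youth, middle_aged, senior.
partitions = [(2, 3), (4, 0), (3, 2)]
n = sum(sum(p) for p in partitions)                       # 14 tuples in D

info_age = sum((sum(p) / n) * entropy(p) for p in partitions)
print(round(info_age, 3))                                 # 0.694
print(round(entropy([9, 5]) - info_age, 3))               # 0.247 -> Gain(age), i.e. 0.246 bits after rounding
```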
• Next, we need to compute the expected information
requirement for the attribute Income.
• For the income category “high,” there are 2 yes tuples and
2 no tuples; for “medium,” there are 4 yes tuples and 2 no
tuples; and for “low,” there are 3 yes tuples and 1 no tuple.
Info_income(D) = (4/14)*[-2/4*log(2/4) - 2/4*log(2/4)] + (6/14)*[-4/6*log(4/6) -
2/6*log(2/6)] + (4/14)*[-3/4*log(3/4) - 1/4*log(1/4)]
= (4/14)*[0.5 + 0.5] + (6/14)*[(0.666*0.5849) + (0.3333*1.5851)] +
(4/14)*[(0.75*0.4150) + (0.25*2.0)]
= 0.286 + 0.393 + 0.232
= 0.911
• Gain(income) = 0.940 - 0.911 = 0.029 bits
Info_student(D) = (7/14)*[-6/7*log(6/7) - 1/7*log(1/7)] + (7/14)*[-3/7*log(3/7) -
4/7*log(4/7)]
= (7/14)*[(0.8571*0.2224) + (0.1428*2.8079)] + (7/14)*[(0.4285*1.2226) +
(0.5714*0.8074)]
= 0.5*(0.1906 + 0.4009) + 0.5*(0.5238 + 0.4613)
= 0.2957 + 0.4925
= 0.7880
Gain(student) = 0.940-0.788
= 0.152 bits
• Similarly, we can compute Gain(credit_rating)= 0.048 bits.
• Node N is labeled with age, and branches are grown for each of
the attribute’s values.
• The final decision tree returned by the algorithm was shown earlier
in Figure 8.2.
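Putting the four gains side by side as a sketch (the credit_rating counts of 6 yes/2 no for “fair” and 3 yes/3 no for “excellent” are assumed here, chosen so that they reproduce the quoted 0.048 bits):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(partitions, base=(9, 5)):
    n = sum(sum(p) for p in partitions)
    info_a = sum((sum(p) / n) * entropy(p) for p in partitions)
    return entropy(base) - info_a

splits = {
    "age":           [(2, 3), (4, 0), (3, 2)],
    "income":        [(2, 2), (4, 2), (3, 1)],
    "student":       [(6, 1), (3, 4)],
    "credit_rating": [(6, 2), (3, 3)],          # assumed counts, consistent with Gain = 0.048
}
gains = {a: round(gain(p), 3) for a, p in splits.items()}
print(gains)   # {'age': 0.247, 'income': 0.029, 'student': 0.152, 'credit_rating': 0.048}
print(max(gains, key=gains.get))   # 'age' -> node N is labeled with age
```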
• Step 1: Calculate the Entropy of one attribute —
Prediction: Clare Will Play Tennis/ Clare Will Not
Play Tennis
• For this illustration, I will use this contingency table
to calculate the entropy of our target variable:
Played? (Yes/No). There are 14 observations (10
“Yes” and 4 “No”). The probability (p) of ‘Yes’ is
0.71428 (10/14), and the probability of ‘No’ is
0.28571 (4/14). You can then calculate the entropy
of our target variable using the equation above.
• Step 2: Calculate the Entropy for each feature using the
contingency table
• To illustrate, I use Outlook as an example to explain how to
calculate its Entropy. There are a total of 14 observations.
Summing across the rows, we can see that 5 of them belong
to Sunny, 4 belong to Overcast, and 5 belong to Rainy.
Therefore, we can find the probability of Sunny, Overcast,
and Rainy and then calculate their entropy one by one using
the above equation. The calculation steps are shown below.
• Definition: Information Gain is the reduction
in Entropy achieved when the node is split on
an attribute.
• The equation of Information Gain:
IG(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) * Entropy(S_v),
where the sum runs over the values v of attribute A.
• Probability of playing tennis:
Number of favourable events: 9
• Number of total events: 14
• Probability = (Number of favourable events) / (Number of total events)
= 9/14
= 0.642
• Now, we will see probability of not playing
tennis.
• Probability of not playing tennis:
Number of favourable events : 5
• Number of total events : 14
• Probability = (Number of favourable events) / (Number of total events)
=5/14
=0.357
• Entropy at source= -(Probability of playing
tennis) * log2(Probability of playing tennis) –
(Probability of not playing tennis) *
log2(Probability of not playing tennis)
• E(S) = -(9/14)*log2(9/14) - (5/14)*log2(5/14)
= -0.642*log2(0.642) - 0.357*log2(0.357)
= 0.940
So, the Entropy of the whole system, before we ask
our first question, is 0.940.
Note: here we typically take log to base 2.
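A quick check of this value in Python (a sketch using only the 9/5 class counts from above):

```python
import math

p_yes, p_no = 9 / 14, 5 / 14
entropy_source = -(p_yes * math.log2(p_yes) + p_no * math.log2(p_no))
print(round(entropy_source, 3))   # 0.940, the entropy of the whole system
```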
• From the above data for outlook we can arrive at the following
table easily:
Outlook     Yes   No   Total
Sunny         2     3     5
Overcast      4     0     4
Rain          3     2     5
• E(Sunny, Humidity)=(3/5)*E(0,3)+(2/5)*E(2,0)
=(3/5)*0+(2/5)*0
=0
IG(Sunny, Humidity) = E(Sunny) - E(Sunny, Humidity)
= 0.971 - 0
= 0.971
(Here E(Sunny) = E(2,3) = 0.971 is the entropy of the Sunny subset itself.)
For humidity from the above table, we can say that play will occur if
humidity is normal and will not occur if it is high. Similarly, find the
nodes under rainy.
Sunny branch: Temperature vs. Play Tennis?
Temperature   Yes   No   Total
Hot            0     2     2
Mild           1     1     2
Cool           1     0     1
Total                      5
E(Sunny, Temperature)=(2/5)*E(0,2)+(2/5)*E(1,1)+(1/5)*E(1,0)
=(2/5)*0+(2/5)*(-(1/2)log1/2-(1/2)log1/2)+ (1/5)*0
=0+(2/5)*1.0+0= 0.40
E(Sunny, Wind)=(2/5)*E(1,1)+(3/5)*E(1,2)
=(2/5)*[-(1/2)*log(1/2) - (1/2)*log(1/2)] + (3/5)*[-(1/3)*log(1/3) - (2/3)*log(2/3)]
=0.4*[0.5 + 0.5] + 0.6*[0.5278 + 0.3896]
=0.4 + 0.6*0.9174
=0.4 + 0.5504
=0.9504
Comparing the gains on the Sunny branch: IG(Sunny, Humidity) = 0.971,
IG(Sunny, Temperature) = 0.971 - 0.40 = 0.571, and IG(Sunny, Wind) = 0.971 - 0.9504 ≈ 0.02.
Humidity gives the largest gain, so it is chosen as the test node under Sunny.
E(Rain) = E(3,2)
= -(3/5)*log(3/5) - (2/5)*log(2/5)
= 0.6*0.7369 + 0.4*1.3219
= 0.970
Rain branch: Humidity vs. Play Tennis?
Humidity   Yes   No   Total
High        1     1     2
Normal      2     1     3
Total                   5
E(Rain, Humidity) = (2/5)*E(1,1) + (3/5)*E(2,1)
= (2/5)*[-(1/2)*log(1/2) - (1/2)*log(1/2)] + (3/5)*[-(2/3)*log(2/3) - (1/3)*log(1/3)]
= 0.4*[0.5 + 0.5] + 0.6*[(0.666*0.5851) + (0.333*1.5851)]
= 0.4*1.0 + 0.6*(0.3900 + 0.5283)
= 0.4 + 0.5509
= 0.950
IG(Rain, Humidity) = E(Rain) - E(Rain, Humidity)
= 0.970 - 0.950
= 0.020
Rain branch: Wind vs. Play Tennis?
Day   Outlook   Wind     Play Tennis?
D4 Rain Weak Yes
D5 Rain Weak Yes
D6 Rain Strong No
D10 Rain Weak Yes
D14 Rain Strong No
E(Rain, Wind) = (3/5)*E(3,0) + (2/5)*E(0,2)
= (3/5)*0 + (2/5)*0
= 0
IG(Rain, Wind) = E(Rain) - E(Rain, Wind)
= 0.970 - 0
= 0.970
Since IG(Rain, Wind) is much larger than IG(Rain, Humidity), Wind is chosen as the test
node under Rain: play occurs when the wind is weak and does not occur when it is strong.
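The same comparison for the Rain branch as a small Python sketch (the (yes, no) counts per attribute value are read off the two tables above):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Rain subset: 3 yes / 2 no overall; (yes, no) counts per attribute value.
rain_splits = {
    "Humidity": [(1, 1), (2, 1)],   # High, Normal
    "Wind":     [(3, 0), (0, 2)],   # Weak, Strong
}
e_rain = entropy([3, 2])
for attr, parts in rain_splits.items():
    n = sum(sum(p) for p in parts)
    e_attr = sum((sum(p) / n) * entropy(p) for p in parts)
    print(attr, "IG =", round(e_rain - e_attr, 3))
# Humidity IG = 0.02, Wind IG = 0.971 -> Wind is chosen as the test under Rain.
```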
Day   Outlook   Temp   Humidity   Wind     Class: Play Tennis?
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D10 Rain Mild Normal Weak Yes
D14 Rain Mild High Strong No
ID3 Algorithm
[Figure: partially grown ID3 tree. The root node tests Outlook on the full set
[D1, D2, …, D14] (9+, 5-); the Overcast branch is already a pure “Yes” leaf, while the
Sunny and Rain branches are still marked “?”, and a test must still be chosen for each
of those nodes.]
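Finally, a compact recursive ID3 sketch that reproduces this construction end to end. It assumes the standard 14-day PlayTennis dataset from the literature (only the Rain rows D4, D5, D6, D10 and D14 appear explicitly in the slides, so treat the full table below as an illustrative assumption):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Expected reduction in entropy from partitioning on attr."""
    n = len(labels)
    split = {}
    for row, lab in zip(rows, labels):
        split.setdefault(row[attr], []).append(lab)
    remainder = sum((len(ls) / n) * entropy(ls) for ls in split.values())
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:                       # pure node -> leaf
        return labels[0]
    if not attrs:                                   # nothing left to test -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in sorted(set(r[best] for r in rows)):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return tree

# Assumed PlayTennis data: (Outlook, Temp, Humidity, Wind, Play?).
attrs = ["Outlook", "Temp", "Humidity", "Wind"]
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
rows = [dict(zip(attrs, d[:4])) for d in data]
labels = [d[4] for d in data]
print(id3(rows, labels, attrs))
# Expected: Outlook at the root, a pure "Yes" leaf under Overcast,
# Humidity tested under Sunny and Wind tested under Rain.
```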