UNIT – 2 CLASSIFICATION
Classification in data mining is a common technique that separates data points
into different classes. It allows you to organize data sets of all sorts, including
complex and large datasets as well as small and simple ones.
Classification Techniques in Data Mining
Regression
Naive Bayes Classification
K-Nearest Neighbour(KNN)
Decision Trees
1. Bayesian Classification – It is a supervised learning algorithm based on Bayes' theorem. Bayesian classifiers exhibit high accuracy and speed when applied to large databases.
P(Y/X) = ( P(X/Y) * P(Y) ) / P(X)
P(Yes/X1, X2, ..., Xn) = [ P(X1/Yes) * P(X2/Yes) * ... * P(Xn/Yes) * P(Yes) ] / [ P(X1) * P(X2) * ... * P(Xn) ]   for the Yes class
P(No/X1, X2, ..., Xn) = [ P(X1/No) * P(X2/No) * ... * P(Xn/No) * P(No) ] / [ P(X1) * P(X2) * ... * P(Xn) ]   for the No class
In Bayesian classification, the output is predicted from prior knowledge.
Bayesian classification can predict class membership probabilities, such as the probability that a given tuple (record) belongs to a particular class or not.
Bayes classifiers are statistical classifiers, i.e., numerical or mathematical formulas are used to compute the classification.
Problem 1: Given the table below, find whether a person with Flu = Yes and Covid = Yes belongs to the class Fever = Yes or Fever = No.
Person Covid(yes/no) Flu(yes/no) Fever(yes/no)
1 Yes No Yes
2 No Yes Yes
3 Yes Yes Yes
4 No No No
5 Yes No Yes
6 No No Yes
7 Yes No Yes
8 Yes No No
9 No Yes Yes
10 No Yes No
Step 1: Prior probability
P(fever = yes) = 7 / 10
P(fever = no) = 3 /10
Step 2: Conditional probability
              Fever = Yes     Fever = No
Covid = Yes   4/7             1/3
Flu = Yes     3/7             1/3
Note: 4/7 = P(Covid = Yes / Fever = Yes), i.e., out of the 7 persons with fever, 4 have Covid; similarly, 1/3 = P(Covid = Yes / Fever = No), i.e., out of the 3 persons without fever, 1 has Covid.
P(Yes / Flu, Covid) = P(Flu/Yes) * P(Covid/Yes) * P(Yes)
= 3/7 * 4/7 * 7/10 ≈ 0.17
P(No / Flu, Covid) = P(Flu/No) * P(Covid/No) * P(No)
= 1/3 * 1/3 * 3/10 ≈ 0.03
Therefore, the given person (Flu = Yes, Covid = Yes) belongs to the Yes class because
P(Yes/Flu,Covid) > P(No/Flu,Covid).
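The same arithmetic can be reproduced in a few lines of Python. This is a minimal sketch written only for this example; the tuple layout and the naive_bayes_score helper are illustrative choices, not part of any library.

# Each record is (Covid, Flu, Fever) from the Problem 1 table.
data = [
    ("Yes", "No", "Yes"), ("No", "Yes", "Yes"), ("Yes", "Yes", "Yes"),
    ("No", "No", "No"),   ("Yes", "No", "Yes"), ("No", "No", "Yes"),
    ("Yes", "No", "Yes"), ("Yes", "No", "No"),  ("No", "Yes", "Yes"),
    ("No", "Yes", "No"),
]

def naive_bayes_score(fever_class, covid, flu):
    # Unnormalised posterior: P(class) * P(Covid = covid / class) * P(Flu = flu / class)
    rows = [r for r in data if r[2] == fever_class]
    prior = len(rows) / len(data)
    p_covid = sum(r[0] == covid for r in rows) / len(rows)
    p_flu = sum(r[1] == flu for r in rows) / len(rows)
    return prior * p_covid * p_flu

score_yes = naive_bayes_score("Yes", covid="Yes", flu="Yes")   # 7/10 * 4/7 * 3/7 ≈ 0.17
score_no = naive_bayes_score("No", covid="Yes", flu="Yes")     # 3/10 * 1/3 * 1/3 ≈ 0.03
print("Predicted Fever class:", "Yes" if score_yes > score_no else "No")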
PROBLEM 2: Given the table below
CAR NO.   COLOUR   TYPE     ORIGIN     STOLEN (CLASS)
1         Red      Sports   Domestic   Yes
2         Red      Sports   Domestic   No
3         Red      Sports   Domestic   Yes
4         Yellow   Sports   Domestic   No
5         Yellow   Sports   Imported   Yes
6         Yellow   SUV      Imported   No
7         Yellow   SUV      Imported   Yes
8         Yellow   SUV      Domestic   No
9         Red      SUV      Imported   No
10        Red      Sports   Imported   Yes
Given instance : Red,Suv,Domestic belongs to which class?
Step 1: Prior probability : P(yes)=5/10
P(no)=5/10
Step 2: Conditional Probability:
Color Yes No
Red 3/5 2/5
Yellow 2/5 3/5
Type Yes No
Sports 4/5 2/5
Suv 1/5 3/5
Origin Yes No
Domestic 2/5 3/5
Imported 3/5 2/5
P(Yes/Red,SUV,Domestic) = P(Red/Yes) * P(SUV/Yes) * P(Domestic/Yes) * P(Yes)
= 3/5 * 1/5 * 2/5 * 5/10 = 0.024
P(No/Red,SUV,Domestic) = P(Red/No) * P(SUV/No) * P(Domestic/No) * P(No)
= 2/5 * 3/5 * 3/5 * 5/10 = 0.072
Therefore (Red, SUV, Domestic) belongs to the “No” class because 0.072 > 0.024.
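The Problem 2 scores can be checked the same way, here using pandas just for the counting. The DataFrame layout and the score helper below are assumptions made for illustration, not part of any library.

import pandas as pd

# The car-theft table from Problem 2.
df = pd.DataFrame({
    "colour": ["Red", "Red", "Red", "Yellow", "Yellow",
               "Yellow", "Yellow", "Yellow", "Red", "Red"],
    "type":   ["Sports", "Sports", "Sports", "Sports", "Sports",
               "SUV", "SUV", "SUV", "SUV", "Sports"],
    "origin": ["Domestic", "Domestic", "Domestic", "Domestic", "Imported",
               "Imported", "Imported", "Domestic", "Imported", "Imported"],
    "stolen": ["Yes", "No", "Yes", "No", "Yes",
               "No", "Yes", "No", "No", "Yes"],
})

def score(label, colour, car_type, origin):
    # Unnormalised posterior for the given class label.
    subset = df[df["stolen"] == label]
    prior = len(subset) / len(df)
    likelihood = ((subset["colour"] == colour).mean()
                  * (subset["type"] == car_type).mean()
                  * (subset["origin"] == origin).mean())
    return prior * likelihood

print(score("Yes", "Red", "SUV", "Domestic"))   # ≈ 0.024
print(score("No", "Red", "SUV", "Domestic"))    # ≈ 0.072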
2. K-Nearest Neighbors algorithm:
Step #1 - Assign a value to K.
Step #2 - Calculate the distance between the new data entry and all other existing
data entries (you'll learn how to do this shortly). Arrange them in ascending order.
Step #3 - Find the K nearest neighbors to the new entry based on the calculated
distances.
Step #4 - Assign the new data entry to the majority class in the nearest neighbors.
K-Nearest Neighbors Classifiers and Model Example With Diagrams
Consider a data set consisting of two classes, red and blue, plotted on a graph. A new data entry, represented by a green point, is introduced to the data set.
We'll then assign a value to K which denotes the number of neighbors to consider
before classifying the new data entry. Let's assume the value of K is 3.
Since the value of K is 3, the algorithm will only consider the 3 nearest neighbors to the green point (the new entry). Out of these 3 nearest neighbors, the majority class is red, so the new data entry is classified as red.
K-Nearest Neighbors Classifiers and Model Example With Data Set
We calculate the distance between a new entry and the existing entries using the Euclidean distance formula.
Note: you can also calculate the distance using the Manhattan and Minkowski
distance formulas.
BRIGHTNESS SATURATION CLASS
40 20 Red
50 50 Blue
60 90 Blue
10 25 Red
70 70 Blue
60 10 Red
25 80 Blue
The table above represents our data set. We have two columns
— Brightness and Saturation. Each row in the table has a class of
either Red or Blue.
Before we introduce a new data entry, let's assume the value of K is 5.
How to Calculate Euclidean Distance in the K-Nearest Neighbors Algorithm
Here's the new data entry:
BRIGHTNESS SATURATION CLASS
20 35 ?
We have a new entry but it doesn't have a class yet. To know its class, we have to
calculate the distance from the new entry to other entries in the data set using the
Euclidean distance formula.
Here's the formula: √((X₂ - X₁)² + (Y₂ - Y₁)²)
Where:
X₂ = New entry's brightness (20).
X₁= Existing entry's brightness.
Y₂ = New entry's saturation (35).
Y₁ = Existing entry's saturation.
d1 = √((20 - 40)² + (35 - 20)²)
= √(400 + 225)
= √625
= 25
d2 = √((20 - 50)² + (35 - 50)²)
= √(900 + 225)
= √1125
= 33.54
d3 = √((20 - 60)² + (35 - 90)²)
= √(1600 + 3025)
= √4625
= 68.01
Table after all the distances have been calculated:
BRIGHTNESS SATURATION CLASS DISTANCE
40 20 Red 25
50 50 Blue 33.54
60 90 Blue 68.01
10 25 Red 14.14
70 70 Blue 61.03
60 10 Red 47.17
25 80 Blue 45.28
Let's rearrange the distances in ascending order:
BRIGHTNESS SATURATION CLASS DISTANCE
10 25 Red 14.14
40 20 Red 25
50 50 Blue 33.54
25 80 Blue 45.28
60 10 Red 47.17
70 70 Blue 61.03
60 90 Blue 68.01
Since we chose 5 as the value of K, we'll only consider the first five rows. That is:
BRIGHTNESS SATURATION CLASS DISTANCE
10 25 Red 14.14
40 20 Red 25
50 50 Blue 33.54
25 80 Blue 45.28
60 10 Red 47.17
As you can see above, the majority class within the 5 nearest neighbors to the new
entry is Red. Therefore, we'll classify the new entry as Red.
Here's the updated table:
BRIGHTNESS SATURATION CLASS
40 20 Red
50 50 Blue
60 90 Blue
10 25 Red
70 70 Blue
60 10 Red
25 80 Blue
BRIGHTNESS SATURATION CLASS
20 35 Red
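The whole worked example above can be reproduced with a short pure-Python sketch; the variable names (points, new_entry, k) are just illustrative choices.

from collections import Counter
from math import sqrt

# (brightness, saturation, class) rows from the data set above
points = [
    (40, 20, "Red"), (50, 50, "Blue"), (60, 90, "Blue"), (10, 25, "Red"),
    (70, 70, "Blue"), (60, 10, "Red"), (25, 80, "Blue"),
]
new_entry = (20, 35)
k = 5

# Euclidean distance from the new entry to every existing entry, sorted ascending
distances = sorted(
    (sqrt((x - new_entry[0]) ** 2 + (y - new_entry[1]) ** 2), label)
    for x, y, label in points
)

# Majority vote among the k nearest neighbours
votes = Counter(label for _, label in distances[:k])
print(votes.most_common(1)[0][0])   # Red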
How to Choose the Value of K in the K-NN Algorithm
There is no particular way of choosing the value K, but here are some common
conventions to keep in mind:
Choosing a very low value will most likely lead to inaccurate predictions.
The commonly used value of K is 5.
Use an odd number as the value of K; with two classes, this avoids ties in the majority vote.
Advantages of K-NN Algorithm
It is simple to implement.
No training is required before classification.
Disadvantages of K-NN Algorithm
Can be cost-intensive when working with a large data set.
A lot of memory is required for processing large data sets.
Choosing the right value of K can be tricky.
3. Decision Tree
A Decision Tree is a popular machine learning algorithm used for both classification and regression tasks. It
represents a series of decisions and their possible outcomes. Each internal node of the tree corresponds to an attribute, each branch
represents a decision based on that attribute, and each leaf node represents the final outcome or class label. Decision Trees are intuitive
and easy to understand, making them useful for both analysis and prediction.
Decision Tree Terminologies
Root Node – It is the topmost node in the tree, which represents the complete dataset. It is also where the decision-making process starts.
Decision/Internal Node – Decision nodes are the result of splitting the data into multiple subsets; the goal is to obtain children nodes with maximum homogeneity or purity (meaning all records are of the same kind).
Leaf/Terminal Node – This node represents the data section having the highest homogeneity (all records of the same class) and is not split any further.
Entropy – Entropy is the measurement of impurity or randomness in the data points.
If all elements belong to a single class, the set is termed “Pure”; if not, the distribution is impure.
It is used for checking the impurity or uncertainty present in the data. Entropy is zero when the sample is completely homogeneous, meaning that each instance belongs to the same class, and it is maximum when the sample is equally divided between different classes.
Decision tree algorithms:
1 . ID3 Algorithm:
2. C4.5 algorithm
3. CART
ID3 Algorithm:
The ID3 (Iterative Dichotomiser 3) algorithm is one of the earliest and most widely used algorithms for building Decision Trees from a given
dataset. It uses the concepts of entropy and information gain to select the best attribute for splitting the data at each step. Entropy measures the
uncertainty or randomness in the data, and information gain quantifies the reduction in uncertainty obtained by splitting the data on a
particular attribute. The ID3 algorithm recursively splits the dataset on the attribute with the highest information gain until a stopping
criterion is met, resulting in a Decision Tree that can be used for classification tasks.
Steps to Create a Decision Tree using the ID3 Algorithm:
Step 1: Data Preprocessing:
Clean and preprocess the data. Handle missing values and convert categorical variables into numerical representations if required.
Step 2: Selecting the Root Node:
Calculate the entropy of the target variable (class labels) based on the dataset. The formula for entropy is:
Entropy(S) = -Σ (p_i * log2(p_i))
where p_i is the probability of instances belonging to class i.
Step 3: Calculating Information Gain:
For each attribute in the dataset, calculate the information gain when the dataset is split on that attribute:
Information Gain(S, A) = Entropy(S) - Σ ((|S_v| / |S|) * Entropy(S_v))
where S_v is the subset of instances for each possible value of attribute A, and |S_v| is the number of instances in that subset.
Step 4: Selecting the Best Attribute:
Choose the attribute with the highest information gain as the decision node for the tree.
Step 5: Splitting the Dataset:
Split the dataset based on the values of the selected attribute.
Step 6: Repeat the Process:
Recursively repeat steps 2 to 5 for each subset until a stopping criterion is met (e.g., the tree depth limit is reached or all instances in
a subset belong to the same class).
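The two formulas above translate directly into code. The following is a minimal sketch, assuming the dataset is a list of dictionaries with a "label" key holding the class; these names are illustrative, not from a specific library.

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy(S) = -sum(p_i * log2(p_i)) over the class proportions
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

def information_gain(dataset, attribute):
    # Gain(S, A) = Entropy(S) - sum(|S_v| / |S| * Entropy(S_v))
    labels = [row["label"] for row in dataset]
    gain = entropy(labels)
    for value in {row[attribute] for row in dataset}:
        subset = [row["label"] for row in dataset if row[attribute] == value]
        gain -= (len(subset) / len(dataset)) * entropy(subset)
    return gain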
Example:
Let’s illustrate the ID3 algorithm with a simple example of classifying whether to play tennis based on weather conditions, using the
following dataset:
Weather    Temperature   Humidity   Windy   Play Tennis?
Sunny      Hot           High       False   No
Sunny      Hot           High       True    No
Overcast   Hot           High       False   Yes
Rainy      Mild          High       False   Yes
Rainy      Cool          Normal     False   Yes
Rainy      Cool          Normal     True    No
Overcast   Cool          Normal     True    Yes
Sunny      Mild          High       False   No
Sunny      Cool          Normal     False   Yes
Rainy      Mild          Normal     False   Yes
Sunny      Mild          Normal     True    Yes
Overcast   Mild          High       True    Yes
Overcast   Hot           Normal     False   Yes
Rainy      Mild          High       True    No
Step 1: Data Preprocessing:
The dataset does not require any preprocessing, as it is already in a suitable format.
Step 2: Calculating Entropy:
To calculate entropy, we first determine the proportion of positive and negative instances in the dataset:
Positive instances (Play Tennis = Yes): 9
Negative instances (Play Tennis = No): 5
Entropy(S) = -(9/14) * log2(9/14) – (5/14) * log2(5/14) ≈ 0.940
Step 3: Calculating Information Gain:
We calculate the information gain for each attribute (Weather, Temperature, Humidity, Windy) and select the attribute with the highest
information gain as the root node.
Information Gain(S, Weather) = Entropy(S) – [(5/14) * Entropy(Sunny) + (4/14) * Entropy(Overcast) + (5/14) * Entropy(Rainy)] ≈ 0.247
Information Gain(S, Temperature) = Entropy(S) – [(4/14) * Entropy(Hot) + (6/14) * Entropy(Mild) + (4/14) * Entropy(Cool)] ≈ 0.029
Information Gain(S, Humidity) = Entropy(S) – [(7/14) * Entropy(High) + (7/14) * Entropy(Normal)] ≈ 0.152
Information Gain(S, Windy) = Entropy(S) – [(8/14) * Entropy(False) + (6/14) * Entropy(True)] ≈ 0.048
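As a check, the Step 3 values can be reproduced with the entropy() and information_gain() helpers sketched above; the tuple layout below is an assumption made for illustration.

rows = [
    ("Sunny", "Hot", "High", "False", "No"),       ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),   ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),   ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"), ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),   ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"), ("Rainy", "Mild", "High", "True", "No"),
]
columns = ["Weather", "Temperature", "Humidity", "Windy", "label"]
dataset = [dict(zip(columns, r)) for r in rows]

for attribute in ["Weather", "Temperature", "Humidity", "Windy"]:
    print(attribute, round(information_gain(dataset, attribute), 3))
# Weather 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048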
Step 4: Selecting the Best Attribute:
The “Weather” attribute has the highest information gain, so we select it as the root node of our Decision Tree.
Step 5: Splitting the Dataset:
We split the dataset based on the values of the “Weather” attribute into three subsets (Sunny, Overcast, Rainy).
Step 6: Repeat the Process:
The “Overcast” subset is already pure (all instances are Yes), so it becomes a leaf node. The “Sunny” and “Rainy” subsets are split further in
the same way (on Humidity and Windy respectively), giving the final Decision Tree with Weather at the root.
Advantages
Inexpensive to construct
Extremely fast at classifying unknown records.
Easy to interpret for small-sized trees.
Robust to noise (especially when methods to avoid over-fitting are employed).
Can easily handle redundant or irrelevant attributes (unless the attributes are interacting).
Disadvantages
The space of possible decision trees is exponentially large, and greedy approaches are often unable to find the best tree.
Does not take into account interactions between attributes.
Each decision boundary involves only a single attribute.
C4.5 algorithm
C4.5 is the successor of ID3 and an improved version of it. It makes use of the Gain Ratio as its splitting criterion.
Calculating Gain & Gain Ratios:
1. GainRatio(A) = Gain(A) / SplitInfo(A)
2. Information Gain(S, A) = Entropy(S) - Σ ((|S_v| / |S|) * Entropy(S_v))
where S_v is the subset of instances for each possible value of attribute A, and |S_v| is the number of instances in that subset.
3. Entropy(S) = -Σ (p_i * log2(p_i))
where p_i is the probability of instances belonging to class i.
4. SplitInfo(A) = -Σ (|Dj|/|D|) * log2(|Dj|/|D|)
where Dj is the number of cases with a particular value of the attribute and D is the total number of cases.
5. Select the attribute with the highest value of gain ratio and proceed.
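These formulas can be written on top of the entropy() and information_gain() helpers sketched in the ID3 section. The sketch below is illustrative only and assumes the same list-of-dictionaries dataset layout.

from math import log2

def split_info(dataset, attribute):
    # SplitInfo(A) = -sum(|Dj| / |D| * log2(|Dj| / |D|))
    total = len(dataset)
    info = 0.0
    for value in {row[attribute] for row in dataset}:
        p = sum(row[attribute] == value for row in dataset) / total
        info -= p * log2(p)
    return info

def gain_ratio(dataset, attribute):
    # GainRatio(A) = Gain(A) / SplitInfo(A)
    return information_gain(dataset, attribute) / split_info(dataset, attribute)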
Dataset:
The data contains information on weather, related to temperature, humidity, wind, etc. This is a binary classification problem: the Decision
column tells us whether to play or not. The column description is as follows:
Day Outlook Temp. Humidity Wind Decision
1 Sunny 85 85 Weak No
2 Sunny 80 90 Strong No
3 Overcast 83 78 Weak Yes
4 Rain 70 96 Weak Yes
5 Rain 68 80 Weak Yes
6 Rain 65 70 Strong No
7 Overcast 64 65 Strong Yes
8 Sunny 72 95 Weak No
9 Sunny 69 70 Weak Yes
10 Rain 75 80 Weak Yes
11 Sunny 75 70 Strong Yes
12 Overcast 72 90 Strong Yes
13 Overcast 81 75 Weak Yes
14 Rain 71 80 Strong No
Calculating Global Entropy
There are 14 rows in our data. 9 of them lead to “Yes” decision and 5 lead to “No” decision.
Entropy = – ∑ p(i) * log2p(i)
= – [p(Yes) * log2p(Yes)] – [p(No) * log2p(No)]
= – (9/14) * log2(9/14) – (5/14) * log2(5/14)
= 0.940
Calculating Gain & Gain Ratios:
GainRatio(A) = Gain(A) / SplitInfo(A)
SplitInfo(A) = -∑ |Dj|/|D| * log2|Dj|/|D|
Dj is the number of cases of a particular value of an attribute, and D is the total number of cases of the dataset.
I. Gain & Gain Ratio for Outlook Variable:
Outlook variable is nominal. It has 3 values: Sunny, Overcast, Rain.
Gain(Decision, Outlook) = Entropy(Decision) – ∑ [ p(Outlook = v) * Entropy(Decision | Outlook = v) ]
The above formula is nothing but the formula for calculating gain. Let’s call this Equation 1.
The first part, i.e., Entropy(Decision), has already been calculated by us as 0.940.
The second part is the negative summation of the products of (i) the probability of each Outlook value and (ii) the
entropy of the decision within that Outlook value.
Let’s calculate this 2nd part, i.e., the entropy for each Outlook value.
1. entropy for Outlook = Sunny
Day Outlook Temp. Humidity Wind Decision
1 Sunny 85 85 Weak No
2 Sunny 80 90 Strong No
8 Sunny 72 95 Weak No
9 Sunny 69 70 Weak Yes
11 Sunny 75 70 Strong Yes
We have 3 No decisions and 2 Yes decisions.
Entropy(Decision|Outlook=Sunny)
= – p(No) * log2p(No) – p(Yes) * log2p(Yes)
= -(3/5).log2(3/5) – (2/5).log2(2/5)
= 0.441 + 0.528
= 0.970
2. Entropy for Outlook = Overcast
Day Outlook Temp. Humidity Wind Decision
3 Overcast 83 78 Weak Yes
7 Overcast 64 65 Strong Yes
12 Overcast 72 90 Strong Yes
13 Overcast 81 75 Weak Yes
All decisions are Yes here.
Entropy(Decision|Outlook=Overcast)
= – p(No) * log2p(No) – p(Yes) * log2p(Yes)
= -(0/4)*log2(0/4) – (4/4)*log2(4/4)
[Here log2(0) is undefined, but we take it as 0, because if we consider x*log2(x), then as x tends to 0, x*log2(x) also tends to 0.]
=0
3. Entropy for Outlook = Rain
Day Outlook Temp. Humidity Wind Decision
4 Rain 70 96 Weak Yes
5 Rain 68 80 Weak Yes
6 Rain 65 70 Strong No
10 Rain 75 80 Weak Yes
14 Rain 71 80 Strong No
We have 3 Yes and 2 No decisions.
Entropy(Decision|Outlook=Rain)
= – p(No) * log2p(No) – p(Yes) * log2p(Yes)
= -(2/5)*log2(2/5) – (3/5)*log2(3/5)
= 0.528 + 0.441
= 0.970
4. Gain for Outlook variable:
We are done with calculating Entropies for Outlook variable.
Putting these in the Equation 1 above:
Gain(Decision, Outlook)
= 0.940 – (5/14)*(0.970) – (4/14)*(0) – (5/14)*(0.970)
= 0.247
5. SplitInfo for Outlook variable:
Sunny: 5 cases
Overcast: 4 cases
Rain: 5 cases
SplitInfo(Decision, Outlook)
= -(5/14)*log2(5/14) -(4/14)*log2(4/14) -(5/14)*log2(5/14)
= 1.577
6. Finally, Gain Ratio for Outlook variable:
GainRatio(Decision, Outlook)
= Gain(Decision, Outlook)/SplitInfo(Decision, Outlook)
= 0.247/1.577
= 0.156
More work needs to be done. This is the Gain Ratio for just 1 of the attributes. We have to calculate the Gains and Gain Ratios for all the other attributes as well, so
that we can compare them at the end.
II. Gain & Gain Ratio for Wind Variable:
This is also a nominal variable. It has 2 values: Weak & Strong.
Gain(Decision, Wind) = Entropy(Decision) – ∑ [ p(Wind = v) * Entropy(Decision | Wind = v) ]
Let’s call this Equation 2.
1. Entropy for Wind = Weak
Day Outlook Temp. Humidity Wind Decision
1 Sunny 85 85 Weak No
3 Overcast 83 78 Weak Yes
4 Rain 70 96 Weak Yes
5 Rain 68 80 Weak Yes
8 Sunny 72 95 Weak No
9 Sunny 69 70 Weak Yes
10 Rain 75 80 Weak Yes
13 Overcast 81 75 Weak Yes
We have 6 Yes and 2 No decisions.
Entropy(Decision|Wind=Weak)
= – p(No) * log2p(No) – p(Yes) * log2p(Yes)
= – (2/8) * log2(2/8) – (6/8) * log2(6/8)
= 0.811
2. Entropy for Wind = Strong
Day Outlook Temp. Humidity Wind Decision
2 Sunny 80 90 Strong No
6 Rain 65 70 Strong No
7 Overcast 64 65 Strong Yes
11 Sunny 75 70 Strong Yes
12 Overcast 72 90 Strong Yes
14 Rain 71 80 Strong No
We have 3 Yes and 3 No decisions.
Entropy(Decision|Wind=Strong)
= – (3/6) * log2(3/6) – (3/6) * log2(3/6)
=1
3. Gain for Wind variable:
Gain(Decision, Wind)
= 0.940 – (8/14)*(0.811) – (6/14)*(1)
= 0.940 – 0.463 – 0.428
= 0.049
4. SplitInfo for Wind variable:
Weak: 8 cases
Strong: 6 cases
SplitInfo(Decision, Wind)
= -(6/14)*log2(6/14) -(8/14)*log2(8/14)
= 0.524 + 0.461
= 0.985
5. Finally, Gain Ratio for Wind variable:
GainRatio(Decision, Wind)
= Gain(Decision, Wind)/SplitInfo(Decision, Wind)
= 0.049 / 0.985
= 0.049
III. Gain & Gain Ratio for Humidity Variable:
This is where things get interesting, because Humidity is a continuous variable. How do we deal with it?
Step 1. Arrange the values in ascending order.
Step 2. Convert them to nominal values by performing a binary split on a threshold value.
[Gain for this variable must be maximum at the threshold value.]
Step 3. The gain at this threshold value will be used for comparison with the gains and gain ratios of all the other attributes.
1. Let’s arrange it in ascending order of values of Humidity:
Day Humidity Decision
7 65 Yes
6 70 No
9 70 Yes
11 70 Yes
13 75 Yes
3 78 Yes
5 80 Yes
10 80 Yes
14 80 No
1 85 No
2 90 No
12 90 Yes
8 95 No
4 96 Yes
Now, we need to calculate the gains and gain ratios for every value of Humidity. The value which yields the maximum gain will be chosen as the threshold.
Here, we will separate our dataset into 2 parts: (i) values less than or equal to the current value, and (ii) values greater than the current value.
2. Calculating Gains and Gain Ratios for all values:
2.a. For Humidity = 65
We have 1 Yes & 0 No decisions at <= 65 and 8 Yes & 5 No decisions at > 65
Entropy(Decision|Humidity<=65)
= – p(No) . log2p(No) – p(Yes) . log2p(Yes)
= -(0/1).log2(0/1) – (1/1).log2(1/1)
=0
Entropy(Decision|Humidity>65)
= -(5/13).log2(5/13) – (8/13).log2(8/13)
=0.530 + 0.431
= 0.961
Gain(Decision, Humidity<> 65)
= 0.940 – (1/14).0 – (13/14).(0.961)
= 0.048
SplitInfo(Decision, Humidity<> 65) =
-(1/14).log2(1/14) -(13/14).log2(13/14)
= 0.371
GainRatio(Decision, Humidity<> 65)
= 0.048/0.371
= 0.129
2.b. For Humidity = 70
We have 3 Yes & 1 No decisions at <= 70 and 6 Yes & 4 No decisions at > 70
Entropy(Decision|Humidity<=70)
= – p(No) . log2p(No) – p(Yes) . log2p(Yes)
= -(1/4).log2(1/4) – (3/4).log2(3/4)
= 0.811
Entropy(Decision|Humidity>70)
= -(4/10).log2(4/10) – (6/10).log2(6/10)
= 0.971
Gain(Decision, Humidity<> 70)
= 0.940 – (4/14).(0.811) – (10/14).(0.971)
= 0.014
SplitInfo(Decision, Humidity<> 70)
= -(4/14).log2(4/14) -(10/14).log2(10/14)
= 0.863
GainRatio(Decision, Humidity<> 70)
= 0.014/0.863
= 0.016
Similarly, calculate the Gains and Gain Ratios for all other values of Humidity.
We found out that the Gain was maximum for Humidity = 80.
[Note: Here is something interesting. You can take either Gain or Gain Ratio as the criterion for choosing the threshold value; different implementations of
Decision Trees make different choices. We are taking Gain.]
Gain(Decision, Humidity <> 80) = 0.101
GainRatio(Decision, Humidity <> 80) = 0.107
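The threshold search for Humidity can be sketched as below. The split_gain helper and the list layout are assumptions made for illustration, and a small entropy helper is repeated so the snippet stands on its own.

from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

# Humidity values and decisions, in the sorted order of the table above
humidity = [65, 70, 70, 70, 75, 78, 80, 80, 80, 85, 90, 90, 95, 96]
decision = ["Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No",
            "No", "No", "Yes", "No", "Yes"]

def split_gain(threshold):
    # Binary split: <= threshold vs > threshold
    left = [d for h, d in zip(humidity, decision) if h <= threshold]
    right = [d for h, d in zip(humidity, decision) if h > threshold]
    return (entropy(decision)
            - len(left) / len(decision) * entropy(left)
            - len(right) / len(decision) * entropy(right))

gains = {t: round(split_gain(t), 3) for t in sorted(set(humidity))[:-1]}
print(max(gains, key=gains.get))   # 80 -> gain ≈ 0.10, matching the value above up to rounding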
IV. Gain & Gain Ratio for Temp. Variable:
This is also a continuous variable. We will repeat the steps we did for Humidity variable.
1. Let’s arrange it in ascending order of values of Temp:
Day Temp. Decision
7 64 Yes
6 65 No
5 68 Yes
9 69 Yes
4 70 Yes
14 71 No
8 72 No
12 72 Yes
10 75 Yes
11 75 Yes
2 80 No
13 81 Yes
3 83 Yes
1 85 No
2. Calculating Gains and Gain Ratios for all values:
2.a. For Temp = 64
We have 1 Yes & 0 No decisions at <= 64 and 8 Yes & 5 No decisions at > 64
Entropy(Decision|Temp<=64)
= – p(No) . log2p(No) – p(Yes) . log2p(Yes)
= -(0/1).log2(0/1) – (1/1).log2(1/1)
=0
Entropy(Decision|Temp>64)
= -(5/13).log2(5/13) – (8/13).log2(8/13)
=0.530 + 0.431
= 0.961
Gain(Decision, Temp <> 64)
= 0.940 – (1/14).0 – (13/14).(0.961)
= 0.048
SplitInfo(Decision, Temp <> 64) =
-(1/14).log2(1/14) -(13/14).log2(13/14)
= 0.371
GainRatio(Decision, Temp <> 64)
= 0.048/0.371
= 0.129
2.b. For Temp = 65
We have 1 Yes & 1 No decisions at <= 65 and 8 Yes & 4 No decisions at > 65
Entropy(Decision|Temp<=65)
= – p(No) . log2p(No) – p(Yes) . log2p(Yes)
= -(1/2).log2(1/2) – (1/2).log2(1/2)
=1
Entropy(Decision|Temp>65)
= -(4/12).log2(4/12) – (8/12).log2(8/12)
= 0.918
Gain(Decision, Temp<> 65)
= 0.940 – (2/14).1 – (12/14).(0.918)
= 0.010
SplitInfo(Decision, Temp<> 65)
= -(2/14).log2(2/14) -(12/14).log2(12/14)
= 0.591
GainRatio(Decision, Temp<> 65)
= 0.010/0.591
= 0.017
Similarly, calculate the Gains and Gain Ratios for all other values of Temp.
We found out that the Gain was maximum for Temp = 83
Gain(Decision, Temp <> 83) = 0.113
GainRatio(Decision, Temp <> 83) = 0.305
Comparison of Gains and Gain Ratios
Attribute Gain Gain Ratio
Wind 0.049 0.049
Outlook 0.247 0.156
Humidity <> 80 0.101 0.107
Temp <> 83 0.113 0.305
If we use Gain, Outlook will be the root node. (Because it has the highest Gain value)
Similarly, if we use Gain Ratio, Temp will be the root node.
We will proceed using the Gain.
Outlook = Sunny
Day Outlook Temp. Humidity Wind Decision
1 Sunny 85 85 Weak No
2 Sunny 80 90 Strong No
8 Sunny 72 95 Weak No
9 Sunny 69 70 Weak Yes
11 Sunny 75 70 Strong Yes
If humidity > 80, decision is ‘No’
If humidity <= 80, decision is ‘Yes’
Outlook = Overcast
Day Outlook Temp. Humidity Wind Decision
3 Overcast 83 78 Weak Yes
7 Overcast 64 65 Strong Yes
12 Overcast 72 90 Strong Yes
13 Overcast 81 75 Weak Yes
All decisions are ‘Yes’
Outlook = Rain
Day Outlook Temp. Humidity Wind Decision
4 Rain 70 96 Weak Yes
5 Rain 68 80 Weak Yes
6 Rain 65 70 Strong No
10 Rain 75 80 Weak Yes
14 Rain 71 80 Strong No
If Wind = Weak, decision is ‘Yes’
If Wind = Strong, decision is ‘No’
So, this is our final Decision Tree using the C4.5 algorithm.
Advantages of C4.5 over ID3
C4.5 is an evolution of ID3 by the same author (Quinlan), who made sure that the bottlenecks of ID3 are addressed.
Following are the improvements he made in C4.5:
1. It can handle both continuous and discrete variables.
2. It can handle missing values by marking them as ‘?’. They are not used in the Gain and Entropy calculations.
3. Prunes the tree and thereby avoids ‘overfitting’.
CART Algorithm
Classification and Regression Trees (CART) is a decision tree algorithm that is
used for both classification and regression tasks. It is a supervised learning
algorithm that learns from labelled data to predict unseen data.
Tree structure: CART builds a tree-like structure consisting of nodes and
branches. The nodes represent different decision points, and the branches
represent the possible outcomes of those decisions. The leaf nodes in the tree
contain a predicted class label or value for the target variable.
Splitting criteria: CART uses a greedy approach to split the data at each
node. It evaluates all possible splits and selects the one that best reduces the
impurity of the resulting subsets.
For classification tasks, CART uses the Gini impurity or Gini index as the
splitting criterion. The lower the Gini impurity, the purer the subset is.
The formula for the Gini Index is as follows:
Gini = 1 − Σ (p_i)²
where p_i is the probability of an object belonging to a particular class.
For regression tasks, CART uses residual reduction as the splitting
criterion. The lower the residual reduction, the better the fit of the model to the
data.
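A minimal sketch of the Gini impurity calculation (Gini = 1 − Σ p_i²); the gini helper name and the example labels are illustrative.

from collections import Counter

def gini(labels):
    # Gini impurity of a list of class labels
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

print(gini(["Yes"] * 9 + ["No"] * 5))   # ≈ 0.459 (impure node)
print(gini(["Yes"] * 4))                # 0.0 (pure node)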
Pruning: Pruning is a technique used to remove nodes that contribute little
to the model's accuracy, and it helps prevent overfitting. (Overfitting happens
for several reasons, for example when the training data set is too small and
does not contain enough samples to accurately represent all possible input values.)
Cost-complexity pruning and information-gain pruning are two popular
pruning techniques. Cost-complexity pruning involves calculating the cost of
each node and removing nodes that have a negative cost. Information-gain
pruning involves calculating the information gain of each node and removing
nodes that have a low information gain.
How does the CART algorithm work?
The CART algorithm works via the following process:
The best-split point of each input is obtained.
Based on the best-split points of each input in Step 1, the new “best” split
point is identified.
Split the chosen input according to the “best” split point.
Continue splitting until a stopping rule is satisfied or no further desirable
splitting is available.
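In practice, a CART-style tree can be built with scikit-learn, whose DecisionTreeClassifier is an optimised implementation of CART and uses the Gini criterion by default. The tiny AND-style dataset below is an assumption made purely for illustration, not the weather data above.

from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # two binary input features
y = [0, 0, 0, 1]                       # target: 1 only when both features are 1

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=["f0", "f1"]))   # text view of the learned splits
print(tree.predict([[1, 1]]))                          # -> [1]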