Data Mining Unit-2

The document discusses classification techniques in data mining, including Bayesian Classification, K-Nearest Neighbors (KNN), and Decision Trees. It explains how these methods work, provides examples of their application, and outlines the steps involved in each technique. Additionally, it covers important concepts such as prior and conditional probabilities, entropy, and information gain.


UNIT – 2 CLASSIFICATION

Classification in data mining is a common technique that separates data points


into different classes. It allows you to organize data sets of all sorts, including
complex and large datasets as well as small and simple ones.
Classification Techniques in Data Mining
 Regression
 Naive Bayes Classification
 K-Nearest Neighbour(KNN)
 Decision Trees
1. Bayesian Classification – It is a supervised learning algorithm based on Bayes' theorem. Bayesian classifiers exhibit high accuracy and speed when applied to large databases.
P(Y/X) = P(X/Y) * P(Y) / P(X)
For class Yes: P(Yes/X1, X2, …, Xn) = [P(X1/Yes) * P(X2/Yes) * … * P(Xn/Yes) * P(Yes)] / [P(X1) * P(X2) * … * P(Xn)]
For class No: P(No/X1, X2, …, Xn) = [P(X1/No) * P(X2/No) * … * P(Xn/No) * P(No)] / [P(X1) * P(X2) * … * P(Xn)]
 In Bayes classification, the output is obtained from prior knowledge (previously observed data).
 Bayes classification can predict class membership probability, such as the probability that a given tuple belongs to a particular class or not.
 Bayes classifiers are statistical classifiers, i.e., numerical or mathematical formulas are used to compute the classification.
 It predicts the probability that a given record belongs to a particular class or not.
Problem 1: Given the table below, determine whether a person with Flu = Yes and Covid = Yes belongs to the class Fever = Yes or Fever = No.

Person Covid(yes/no) Flu(yes/no) Fever(yes/no)


1 Yes No Yes
2 No Yes Yes
3 Yes Yes Yes
4 No No No
5 Yes No Yes
6 No No Yes
7 Yes No Yes
8 Yes No No
9 No Yes Yes
10 No Yes No

Step 1: Prior probability


P(fever = yes) = 7 / 10
P(fever = no) = 3 /10
Step 2: Conditional probability

              Fever = Yes   Fever = No
Covid = Yes   4/7           1/3
Flu = Yes     3/7           1/3

Note: 4/7 means that, among the 7 persons with Fever = Yes, 4 have Covid = Yes; the other entries are read the same way.
P(Yes/Flu, Covid) = P(Flu/Yes) * P(Covid/Yes) * P(Yes)
= 3/7 * 4/7 * 7/10 = 0.17
P(No/Flu, Covid) = P(Flu/No) * P(Covid/No) * P(No)
= 1/3 * 1/3 * 3/10 = 0.03
Therefore, the given instance (Flu = Yes, Covid = Yes) belongs to the Yes class because
P(Yes/Flu,Covid) > P(No/Flu,Covid).
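To make the arithmetic above easy to reproduce, here is a minimal Python sketch (not part of the original material) that recomputes the two posterior scores for the instance (Flu = Yes, Covid = Yes) directly from the counts in the table. The variable names are illustrative only.

```python
# Naive Bayes posterior scores for the instance (Flu = Yes, Covid = Yes),
# using the counts from the fever table above.

# Prior probabilities
p_yes = 7 / 10              # P(Fever = Yes)
p_no = 3 / 10               # P(Fever = No)

# Conditional probabilities estimated from the table
p_covid_given_yes = 4 / 7   # P(Covid = Yes | Fever = Yes)
p_flu_given_yes = 3 / 7     # P(Flu = Yes | Fever = Yes)
p_covid_given_no = 1 / 3    # P(Covid = Yes | Fever = No)
p_flu_given_no = 1 / 3      # P(Flu = Yes | Fever = No)

# Un-normalised posterior scores (the common denominator P(X) is ignored)
score_yes = p_flu_given_yes * p_covid_given_yes * p_yes
score_no = p_flu_given_no * p_covid_given_no * p_no

print(f"P(Yes | Flu, Covid) is proportional to {score_yes:.2f}")  # about 0.17
print(f"P(No  | Flu, Covid) is proportional to {score_no:.2f}")   # about 0.03
print("Predicted class:", "Yes" if score_yes > score_no else "No")
```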
PROBLEM 2: Given the table below:

CAR NO.   COLOUR   TYPE     ORIGIN     STOLEN (CLASS)
1         Red      Sports   Domestic   Yes
2         Red      Sports   Domestic   No
3         Red      Sports   Domestic   Yes
4         Yellow   Sports   Domestic   No
5         Yellow   Sports   Imported   Yes
6         Yellow   SUV      Imported   No
7         Yellow   SUV      Imported   Yes
8         Yellow   SUV      Domestic   No
9         Red      SUV      Imported   No
10        Red      Sports   Imported   Yes
Given instance: to which class does (Red, SUV, Domestic) belong?
Step 1: Prior probability: P(Yes) = 5/10
P(No) = 5/10
Step 2: Conditional Probability:
Color Yes No
Red 3/5 2/5
Yellow 2/5 3/5

Type Yes No
Sports 4/5 2/5
Suv 1/5 3/5

Origin Yes No
Domestic 2/5 3/5
Imported 3/5 2/5

P(Yes/Red, SUV, Domestic) = P(Red/Yes) * P(SUV/Yes) * P(Domestic/Yes) * P(Yes)
= 3/5 * 1/5 * 2/5 * 5/10 = 0.024

P(No/Red, SUV, Domestic) = P(Red/No) * P(SUV/No) * P(Domestic/No) * P(No)
= 2/5 * 3/5 * 3/5 * 5/10 = 0.072
Therefore (Red, SUV, Domestic) belongs to the "No" class because 0.072 > 0.024.
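The same calculation can be automated. Below is a small illustrative Python sketch (the helper name naive_bayes_predict is not from the original text) that estimates the priors and conditional probabilities from the stolen-car table and classifies the instance (Red, SUV, Domestic).

```python
from collections import Counter

# Stolen-car training data: (Colour, Type, Origin) -> Class
data = [
    (("Red", "Sports", "Domestic"), "Yes"),
    (("Red", "Sports", "Domestic"), "No"),
    (("Red", "Sports", "Domestic"), "Yes"),
    (("Yellow", "Sports", "Domestic"), "No"),
    (("Yellow", "Sports", "Imported"), "Yes"),
    (("Yellow", "SUV", "Imported"), "No"),
    (("Yellow", "SUV", "Imported"), "Yes"),
    (("Yellow", "SUV", "Domestic"), "No"),
    (("Red", "SUV", "Imported"), "No"),
    (("Red", "Sports", "Imported"), "Yes"),
]

def naive_bayes_predict(data, instance):
    """Return un-normalised posterior scores for every class."""
    class_counts = Counter(label for _, label in data)
    total = len(data)
    scores = {}
    for label, count in class_counts.items():
        score = count / total                      # prior P(class)
        for i, value in enumerate(instance):       # multiply the conditionals
            match = sum(1 for feats, lab in data
                        if lab == label and feats[i] == value)
            score *= match / count                 # P(feature value | class)
        scores[label] = score
    return scores

scores = naive_bayes_predict(data, ("Red", "SUV", "Domestic"))
print(scores)                                          # {'Yes': 0.024, 'No': 0.072}
print("Predicted class:", max(scores, key=scores.get))  # 'No'
```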
2. K-Nearest Neighbors (KNN) Algorithm:

Step #1 - Assign a value to K.

Step #2 - Calculate the distance between the new data entry and all existing data entries, then arrange the distances in ascending order.
Step #3 - Find the K nearest neighbors to the new entry based on the calculated distances.
Step #4 - Assign the new data entry to the majority class among those K nearest neighbors.

K-Nearest Neighbors Classifiers and Model Example With Diagrams


Consider a data set consisting of two classes, red and blue, plotted on a graph. A new data entry (shown as a green point in the original diagram) is introduced to the data set.

We then assign a value to K, which denotes the number of neighbors to consider before classifying the new data entry. Let's assume the value of K is 3.

Since K is 3, the algorithm considers only the 3 nearest neighbors to the green point (the new entry). Out of those 3 nearest neighbors, the majority class is red, so the new entry is assigned to the red class.


K-Nearest Neighbors Classifiers and Model Example With Data Set
We calculate the distance between a new entry and the existing entries using the Euclidean distance formula.

Note: you can also calculate the distance using the Manhattan or Minkowski distance formulas.

BRIGHTNESS SATURATION CLASS


40 20 Red
50 50 Blue
60 90 Blue
10 25 Red
70 70 Blue
60 10 Red
25 80 Blue
The table above represents our data set. We have two columns
— Brightness and Saturation. Each row in the table has a class of
either Red or Blue.
Before we introduce a new data entry, let's assume the value of K is 5.
How to Calculate Euclidean Distance in the K-Nearest Neighbors Algorithm
Here's the new data entry:

BRIGHTNESS SATURATION CLASS


20 35 ?
We have a new entry but it doesn't have a class yet. To know its class, we have to
calculate the distance from the new entry to other entries in the data set using the
Euclidean distance formula.

Here's the formula: d = √((X₂ - X₁)² + (Y₂ - Y₁)²)

Where:

 X₂ = New entry's brightness (20).


 X₁= Existing entry's brightness.
 Y₂ = New entry's saturation (35).
 Y₁ = Existing entry's saturation.

d1 = √((20 - 40)² + (35 - 20)²)
= √(400 + 225)
= √625
= 25

d2 = √((20 - 50)² + (35 - 50)²)
= √(900 + 225)
= √1125
= 33.54

d3 = √((20 - 60)² + (35 - 90)²)
= √(1600 + 3025)
= √4625
= 68.01

Table after all the distances have been calculated:

BRIGHTNESS SATURATION CLASS DISTANCE


40 20 Red 25
50 50 Blue 33.54
60 90 Blue 68.01
10 25 Red 14.14
70 70 Blue 61.03
60 10 Red 47.17
25 80 Blue 45
Let's rearrange the distances in ascending order:
BRIGHTNESS SATURATION CLASS DISTANCE
10 25 Red 14.14
40 20 Red 25
50 50 Blue 33.54
25 80 Blue 45
60 10 Red 47.17
70 70 Blue 61.03
60 90 Blue 68.01
Since we chose 5 as the value of K, we'll only consider the first five rows. That is:
BRIGHTNESS SATURATION CLASS DISTANCE
10 25 Red 14.14
40 20 Red 25
50 50 Blue 33.54
25 80 Blue 45
60 10 Red 47.17
As you can see above, the majority class within the 5 nearest neighbors to the new
entry is Red. Therefore, we'll classify the new entry as Red.
Here's the updated table:

BRIGHTNESS SATURATION CLASS


40 20 Red
50 50 Blue
60 90 Blue
10 25 Red
70 70 Blue
60 10 Red
25 80 Blue
BRIGHTNESS SATURATION CLASS
20 35 Red
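As a companion to the worked example, here is a short Python sketch (illustrative only; the function name knn_classify is not from the original text) that computes the Euclidean distances for the brightness/saturation data and takes a majority vote over the K = 5 nearest neighbours.

```python
import math
from collections import Counter

# (brightness, saturation) -> class, from the table above
dataset = [
    ((40, 20), "Red"),
    ((50, 50), "Blue"),
    ((60, 90), "Blue"),
    ((10, 25), "Red"),
    ((70, 70), "Blue"),
    ((60, 10), "Red"),
    ((25, 80), "Blue"),
]

def knn_classify(dataset, new_point, k=5):
    """Classify new_point by a majority vote of its k nearest neighbours."""
    distances = []
    for (x, y), label in dataset:
        d = math.sqrt((new_point[0] - x) ** 2 + (new_point[1] - y) ** 2)
        distances.append((d, label))
    distances.sort(key=lambda pair: pair[0])        # ascending by distance
    k_nearest = [label for _, label in distances[:k]]
    return Counter(k_nearest).most_common(1)[0][0]  # majority class

print(knn_classify(dataset, (20, 35), k=5))   # -> 'Red'
```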
How to Choose the Value of K in the K-NN Algorithm
There is no particular way of choosing the value K, but here are some common
conventions to keep in mind:
 Choosing a very low value will most likely lead to inaccurate predictions.
 The commonly used value of K is 5.
 Always use an odd number as the value of K.
Advantages of K-NN Algorithm
 It is simple to implement.
 No training is required before classification.
Disadvantages of K-NN Algorithm
 Can be cost-intensive when working with a large data set.
 A lot of memory is required for processing large data sets.
 Choosing the right value of K can be tricky.

3. Decision Tree
A Decision Tree is a popular machine learning algorithm used for both classification and regression tasks. It represents a series of decisions and their possible outcomes. Each internal node of the tree corresponds to a test on an attribute, each branch represents a decision based on that attribute, and each leaf node represents the final outcome or class label. Decision trees are intuitive and easy to understand, making them useful for both analysis and prediction.

Decision Tree Terminologies

Root Node - It is the topmost node in the tree, which represents the complete dataset. It is also the starting point of the decision-making process.

Decision/Internal Node - Decision nodes result from splitting the data into multiple subsets; the aim is to produce child nodes with maximum homogeneity or purity (meaning records all of the same kind).
Leaf/Terminal Node - This node represents the data subset with the highest homogeneity (all records belong to the same class); it is not split any further.
Entropy - Entropy is the measurement of impurity or randomness in the data points.
If all elements belong to a single class, the node is termed "pure"; otherwise the distribution is impure.

Entropy is used to check the impurity or uncertainty present in the data and to evaluate the quality of a split. An entropy of 0 means the sample is completely homogeneous, meaning every instance belongs to the same class, while an entropy of 1 means the sample is equally divided between the different classes.
Decision tree algorithms:

1 . ID3 Algorithm:
2. C4.5 algorithm
3. CART

ID3 Algorithm:

The ID3 (Iterative Dichotomiser 3) algorithm is one of the earliest and most widely used algorithms for building Decision Trees from a dataset. It uses the concepts of entropy and information gain to select the best attribute for splitting the data. Entropy measures the uncertainty or randomness in the data, and information gain quantifies the reduction in uncertainty achieved by splitting on a particular attribute. The ID3 algorithm recursively splits the dataset on the attributes with the highest information gain until a stopping criterion is met, resulting in a Decision Tree that can be used for classification tasks.

Steps to Create a Decision Tree using the ID3 Algorithm:

Step 1: Data Preprocessing:


Clean and preprocess the data. Handle missing values and convert categorical variables into a suitable format if required.

Step 2: Selecting the Root Node:


Calculate the entropy of the target variable (class labels) based on the dataset. The formula for entropy is:

Entropy(S) = -Σ (pi * log2(pi))

where pi is the probability of instances belonging to class i.
Step 3: Calculating Information Gain:
For each attribute in the dataset, calculate the information gain when the dataset is split on that attribute:
Information Gain(S, A) = Entropy(S) - Σ ((|Sv| / |S|) * Entropy(Sv))
where Sv is the subset of instances for each possible value v of attribute A, and |Sv| is the number of instances in that subset.

Step 4: Selecting the Best Attribute:


Choose the attribute with the highest information gain as the decision node for the tree.

Step 5: Splitting the Dataset:


Split the dataset based on the values of the selected attribute.

Step 6: Repeat the Process:


Recursively repeat steps 2 to 5 for each subset until a stopping criterion is met (e.g., the maximum tree depth is reached, or all instances in a subset belong to the same class).
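The two formulas above translate directly into code. The following Python sketch (the helper names entropy and information_gain are illustrative, not from the original text) computes entropy and information gain for a list of labelled records, which is the core of the ID3 attribute-selection step.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the class labels in S."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(records, labels, attribute_index):
    """Gain(S, A) = Entropy(S) - sum(|Sv|/|S| * Entropy(Sv)) over values v of A."""
    total = len(records)
    base = entropy(labels)
    # group the class labels by the value of the chosen attribute
    groups = {}
    for record, label in zip(records, labels):
        groups.setdefault(record[attribute_index], []).append(label)
    weighted = sum((len(subset) / total) * entropy(subset)
                   for subset in groups.values())
    return base - weighted

# Tiny usage example:
print(round(entropy(["Yes", "Yes", "No"]), 3))                     # 0.918
print(round(information_gain([("Sunny",), ("Sunny",), ("Overcast",)],
                             ["No", "No", "Yes"], 0), 3))          # 0.918
```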

Example:

Let's illustrate the ID3 algorithm with a simple example of classifying whether to play tennis based on weather conditions, using the following dataset:

Weather    Temperature   Humidity   Windy   Play Tennis?
Sunny      Hot           High       False   No
Sunny      Hot           High       True    No
Overcast   Hot           High       False   Yes
Rainy      Mild          High       False   Yes
Rainy      Cool          Normal     False   Yes
Rainy      Cool          Normal     True    No
Overcast   Cool          Normal     True    Yes
Sunny      Mild          High       False   No
Sunny      Cool          Normal     False   Yes
Rainy      Mild          Normal     False   Yes
Sunny      Mild          Normal     True    Yes
Overcast   Mild          High       True    Yes
Overcast   Hot           Normal     False   Yes
Rainy      Mild          High       True    No

Step 1: Data Preprocessing:


The dataset does not require any preprocessing, as it is already in a suitable format.

Step 2: Calculating Entropy:


To calculate entropy, we first determine the proportion of positive and negative instances in the dataset:

 Positive instances (Play Tennis = Yes): 9


 Negative instances (Play Tennis = No): 5
Entropy(S) = -(9/14) * log2(9/14) – (5/14) * log2(5/14) ≈ 0.940

Step 3: Calculating Information Gain:


We calculate the information gain for each attribute (Weather, Temperature, Humidity, Windy) and select the attribute with the highest information gain as the root node.

Information Gain(S, Weather) = Entropy(S) – [(5/14) * Entropy(Sunny) + (4/14) * Entropy(Overcast) + (5/14) * Entropy(Rainy)]

Information Gain(S, Temperature) = Entropy(S) – [(4/14) * Entropy(Hot) + (6/14) * Entropy(Mild) + (4/14) * Entropy(Cool)]

Information Gain(S, Humidity) = Entropy(S) – [(7/14) * Entropy(High) + (7/14) * Entropy(Normal)]

Information Gain(S, Windy) = Entropy(S) – [(8/14) * Entropy(False) + (6/14) * Entropy(True)]

Step 4: Selecting the Best Attribute:


The "Weather" attribute has the highest information gain, so we select it as the root node of our decision tree.

Step 5: Splitting the Dataset:


We split the dataset based on the values of the "Weather" attribute into three subsets (Sunny, Overcast, Rainy).

Step 6: Repeat the Process:


We recursively repeat the process for each subset. The "Overcast" subset is pure (all instances are Yes), so it becomes a leaf node labelled Yes; the "Sunny" and "Rainy" subsets are split further on the remaining attributes until every leaf is pure. A sketch of this recursive construction is given below.
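The following Python sketch is an illustrative, minimal ID3 implementation (not the full algorithm specification): it builds the tree for the play-tennis data recursively, choosing at each step the attribute with the highest information gain. Printing the result shows Weather at the root, with the Overcast branch as a pure Yes leaf.

```python
import math
from collections import Counter

ATTRS = ["Weather", "Temperature", "Humidity", "Windy"]
DATA = [  # each row: (attribute-value dict, class label)
    ({"Weather": "Sunny", "Temperature": "Hot", "Humidity": "High", "Windy": "False"}, "No"),
    ({"Weather": "Sunny", "Temperature": "Hot", "Humidity": "High", "Windy": "True"}, "No"),
    ({"Weather": "Overcast", "Temperature": "Hot", "Humidity": "High", "Windy": "False"}, "Yes"),
    ({"Weather": "Rainy", "Temperature": "Mild", "Humidity": "High", "Windy": "False"}, "Yes"),
    ({"Weather": "Rainy", "Temperature": "Cool", "Humidity": "Normal", "Windy": "False"}, "Yes"),
    ({"Weather": "Rainy", "Temperature": "Cool", "Humidity": "Normal", "Windy": "True"}, "No"),
    ({"Weather": "Overcast", "Temperature": "Cool", "Humidity": "Normal", "Windy": "True"}, "Yes"),
    ({"Weather": "Sunny", "Temperature": "Mild", "Humidity": "High", "Windy": "False"}, "No"),
    ({"Weather": "Sunny", "Temperature": "Cool", "Humidity": "Normal", "Windy": "False"}, "Yes"),
    ({"Weather": "Rainy", "Temperature": "Mild", "Humidity": "Normal", "Windy": "False"}, "Yes"),
    ({"Weather": "Sunny", "Temperature": "Mild", "Humidity": "Normal", "Windy": "True"}, "Yes"),
    ({"Weather": "Overcast", "Temperature": "Mild", "Humidity": "High", "Windy": "True"}, "Yes"),
    ({"Weather": "Overcast", "Temperature": "Hot", "Humidity": "Normal", "Windy": "False"}, "Yes"),
    ({"Weather": "Rainy", "Temperature": "Mild", "Humidity": "High", "Windy": "True"}, "No"),
]

def entropy(rows):
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(rows, attr):
    total = len(rows)
    groups = {}
    for features, label in rows:
        groups.setdefault(features[attr], []).append((features, label))
    remainder = sum((len(g) / total) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder

def id3(rows, attrs):
    labels = [label for _, label in rows]
    if len(set(labels)) == 1:          # pure subset -> leaf node
        return labels[0]
    if not attrs:                      # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a))
    tree = {best: {}}
    remaining = [a for a in attrs if a != best]
    for value in {features[best] for features, _ in rows}:
        subset = [(f, lab) for f, lab in rows if f[best] == value]
        tree[best][value] = id3(subset, remaining)
    return tree

print(id3(DATA, ATTRS))
# e.g. {'Weather': {'Overcast': 'Yes', 'Sunny': {'Humidity': ...}, 'Rainy': {'Windy': ...}}}
```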

Advantages

 Inexpensive to construct

 Extremely fast at classifying unknown records.

 Easy to interpret for small-sized trees.

 Robust to noise (especially when methods to avoid over-fitting are employed).

 Can easily handle redundant or irrelevant attributes (unless the attributes are interacting).

Disadvantages
 The space of possible decision trees is exponentially large. Greedy approaches are often unable to find the globally optimal tree.

 Does not take into account interactions between attributes.

 Each decision boundary involves only a single attribute.

C4.5 algorithm

C4.5 is the successor of ID3 and an improved version of it. It makes use of the Gain Ratio instead of plain Information Gain.

Calculating Gain & Gain Ratios:

1. GainRatio(A) = Gain(A) / SplitInfo(A)

2. Information Gain(S, A) = Entropy(S) - Σ ((|Sv| / |S|) * Entropy(Sv))
where Sv is the subset of instances for each possible value v of attribute A, and |Sv| is the number of instances in that subset.
3. Entropy(S) = -Σ (pi * log2(pi))
where pi is the probability of instances belonging to class i.
4. SplitInfo(A) = -Σ (|Dj| / |D|) * log2(|Dj| / |D|)

where |Dj| is the number of cases with a particular value of the attribute and |D| is the total number of cases.

5. Select the attribute with the highest gain ratio and proceed.
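As a quick illustration of how Gain Ratio differs from plain Gain, here is a small Python sketch (the helper names gain, split_info and gain_ratio are illustrative, not from the original text). The usage example applies it to the Outlook column of the weather table shown below and reproduces the 0.247 and 0.156 values derived later by hand.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain(attr_values, labels):
    """Information Gain of splitting `labels` by the parallel list `attr_values`."""
    total = len(labels)
    groups = {}
    for v, lab in zip(attr_values, labels):
        groups.setdefault(v, []).append(lab)
    remainder = sum((len(g) / total) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def split_info(attr_values):
    """SplitInfo(A) = -sum(|Dj|/|D| * log2(|Dj|/|D|)) over the attribute values."""
    total = len(attr_values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(attr_values).values())

def gain_ratio(attr_values, labels):
    return gain(attr_values, labels) / split_info(attr_values)

# Usage with the Outlook and Decision columns (days 1 to 14) of the table below:
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]
decision = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
            "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
print(round(gain(outlook, decision), 3))        # about 0.247
print(round(gain_ratio(outlook, decision), 3))  # about 0.156
```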

Dataset:

The data contains information on weather – temperature, humidity, wind, etc. This is a binary classification problem: the Decision column (Yes/No) is the target.

The column description is as follows:

Day Outlook Temp. Humidity Wind Decision

1 Sunny 85 85 Weak No
2 Sunny 80 90 Strong No

3 Overcast 83 78 Weak Yes

4 Rain 70 96 Weak Yes

5 Rain 68 80 Weak Yes

6 Rain 65 70 Strong No

7 Overcast 64 65 Strong Yes

8 Sunny 72 95 Weak No

9 Sunny 69 70 Weak Yes

10 Rain 75 80 Weak Yes

11 Sunny 75 70 Strong Yes

12 Overcast 72 90 Strong Yes

13 Overcast 81 75 Weak Yes

14 Rain 71 80 Strong No
Calculating Global Entropy

There are 14 rows in our data. 9 of them lead to “Yes” decision and 5 lead to “No” decision.

Entropy = – ∑ p(i) * log2p(i)

= – [p(Yes) * log2p(Yes)] – [p(No) * log2p(No)]

= – (9/14) * log2(9/14) – (5/14) * log2(5/14)

= 0.940

Calculating Gain & Gain Ratios:

GainRatio(A) = Gain(A) / SplitInfo(A)

SplitInfo(A) = -∑ |Dj|/|D| * log2|Dj|/|D|

Dj is the number of cases with a particular value of an attribute, and D is the total number of cases.

I. Gain & Gain Ratio for Outlook Variable:

Outlook variable is nominal. It has 3 values: Sunny, Overcast, Rain.

Gain(Decision, Outlook) = Entropy(Decision) – ∑ [ p(Outlook) * Entropy(Decision|Outlook) ]

The above big formula is nothing but the formula for calculating gain. Let’s call this Equation 1

The first part, i.e, Entropy(Decision) has already been calculated by us as 0.940

The second part is the weighted sum (which is subtracted) of, for each Outlook value, (i) the probability of that Outlook value multiplied by (ii) the entropy of the Decision within that Outlook value.

Let’s calculate this 2nd part, i.e, Entropy


1. entropy for Outlook = Sunny

Day Outlook Temp. Humidity Wind Decision

1 Sunny 85 85 Weak No

2 Sunny 80 90 Strong No

8 Sunny 72 95 Weak No

9 Sunny 69 70 Weak Yes

11 Sunny 75 70 Strong Yes

We have 3 No decisions and 2 Yes decisions.

Entropy(Decision|Outlook=Sunny)

= – p(No) * log2p(No) – p(Yes) * log2p(Yes)

= -(3/5).log2(3/5) – (2/5).log2(2/5)

= 0.441 + 0.528

= 0.970

2. Entropy for Outlook = Overcast

Day Outlook Temp. Humidity Wind Decision


3 Overcast 83 78 Weak Yes

7 Overcast 64 65 Strong Yes

12 Overcast 72 90 Strong Yes

13 Overcast 81 75 Weak Yes

All decisions are Yes here.

Entropy(Decision|Outlook=Overcast)

= – p(No) * log2p(No) – p(Yes) * log2p(Yes)

= -(0/4)*log2(0/4) – (4/4)*log2(4/4)

[Here log2(0) is undefined, but we treat 0 * log2(0) as 0, because x * log2(x) tends to 0 as x tends to 0.]

=0

3. Entropy for Outlook = Rain

Day Outlook Temp. Humidity Wind Decision

4 Rain 70 96 Weak Yes

5 Rain 68 80 Weak Yes

6 Rain 65 70 Strong No
10 Rain 75 80 Weak Yes

14 Rain 71 80 Strong No

We have 3 Yes and 2 No decisions.

Entropy(Decision|Outlook=Rain)

= – p(No) * log2p(No) – p(Yes) * log2p(Yes)

= -(2/5)*log2(2/5) – (3/5)*log2(3/5)

= 0.528 + 0.441

= 0.970

4. Gain for Outlook variable:

We are done with calculating Entropies for Outlook variable.

Putting these in the Equation 1 above:

Gain(Decision, Outlook)

= 0.940 – (5/14)*(0.970) – (4/14)*(0) – (5/14)*(0.970)

= 0.247

5. SplitInfo for Outlook variable:

Sunny: 5 cases

Overcast: 4 cases
Rain: 5 cases

SplitInfo(Decision, Outlook)

= -(5/14)*log2(5/14) -(4/14)*log2(4/14) -(5/14)*log2(5/14)

= 1.577

6. Finally, Gain Ratio for Outlook variable:

GainRatio(Decision, Outlook)

= Gain(Decision, Outlook)/SplitInfo(Decision, Outlook)

= 0.247/1.577

= 0.156

More work needs to be done. This is the Gain Ratio for just one of the attributes; we have to calculate the Gains and Gain Ratios for all the other attributes so that we can compare them at the end.

II. Gain & Gain Ratio for Wind Variable:

This is also a nominal variable. It has 2 values: Weak & Strong.

Gain(Decision, Wind) = Entropy(Decision) – ∑ [ p(Wind) * Entropy(Decision|Wind) ]

Let’s call this Equation 2.

1. Entropy for Wind = Weak

Day Outlook Temp. Humidity Wind Decision

1 Sunny 85 85 Weak No
3 Overcast 83 78 Weak Yes

4 Rain 70 96 Weak Yes

5 Rain 68 80 Weak Yes

8 Sunny 72 95 Weak No

9 Sunny 69 70 Weak Yes

10 Rain 75 80 Weak Yes

13 Overcast 81 75 Weak Yes

We have 6 Yes and 2 No decisions.

Entropy(Decision|Wind=Weak)

= – p(No) * log2p(No) – p(Yes) * log2p(Yes)

= – (2/8) * log2(2/8) – (6/8) * log2(6/8)

= 0.811

2. Entropy for Wind = Strong

Day Outlook Temp. Humidity Wind Decision


2 Sunny 80 90 Strong No

6 Rain 65 70 Strong No

7 Overcast 64 65 Strong Yes

11 Sunny 75 70 Strong Yes

12 Overcast 72 90 Strong Yes

14 Rain 71 80 Strong No

We have 3 Yes and 3 No decisions.

Entropy(Decision|Wind=Strong)

= – (3/6) * log2(3/6) – (3/6) * log2(3/6)

=1

3. Gain for Wind variable:

Gain(Decision, Wind)

= 0.940 – (8/14)*(0.811) – (6/14)*(1)

= 0.940 – 0.463 – 0.428

= 0.049

4. SplitInfo for Wind variable:


Weak: 8 cases

Strong: 6 cases

SplitInfo(Decision, Wind)

= -(6/14)*log2(6/14) -(8/14)*log2(8/14)

= 0.524 + 0.461

= 0.985

5. Finally, Gain Ratio for Wind variable:

GainRatio(Decision, Wind)

= Gain(Decision, Wind)/SplitInfo(Decision, Wind)

= 0.049 / 0.985

= 0.049

III. Gain & Gain Ratio for Humidity Variable:

This is where things get interesting, because Humidity is a continuous variable. How do we deal with it?

Step 1. Arrange the values in ascending order.

Step 2. Convert them to nominal values by performing a binary split on a threshold value.

[Gain for this variable must be maximum at the threshold value.]

Step 3. The gain at this threshold value will be used when comparing the gains and gain ratios of all the attributes.

1. Let’s arrange it in ascending order of values of Humidity:


Day Humidity Decision

7 65 Yes

6 70 No

9 70 Yes

11 70 Yes

13 75 Yes

3 78 Yes

5 80 Yes

10 80 Yes

14 80 No

1 85 No

2 90 No

12 90 Yes

8 95 No
4 96 Yes

Now, we need to calculate the gains and gain ratios for every value of Humidity. The value that gives the maximum gain will be chosen as the threshold.
Here, we separate the dataset into 2 parts at each candidate value: (i) rows with Humidity less than or equal to the current value, and (ii) rows with Humidity greater than the current value.

2. Calculating Gains and Gain Ratios for all values:

2.a. For Humidity = 65

We have 1 Yes & 0 No decisions at <= 65 and 8 Yes & 5 No decisions at > 65

Entropy(Decision|Humidity<=65)

= – p(No) . log2p(No) – p(Yes) . log2p(Yes)

= -(0/1).log2(0/1) – (1/1).log2(1/1)

=0

Entropy(Decision|Humidity>65)

= -(5/13).log2(5/13) – (8/13).log2(8/13)

=0.530 + 0.431

= 0.961

Gain(Decision, Humidity<> 65)

= 0.940 – (1/14).0 – (13/14).(0.961)

= 0.048
SplitInfo(Decision, Humidity<> 65) =

-(1/14).log2(1/14) -(13/14).log2(13/14)

= 0.371

GainRatio(Decision, Humidity<> 65)

= 0.048/0.371

= 0.129

2.b. For Humidity = 70

We have 3 Yes & 1 No decisions at <= 70 and 6 Yes & 4 No decisions at > 70

Entropy(Decision|Humidity<=70)

= – p(No) . log2p(No) – p(Yes) . log2p(Yes)

= -(1/4).log2(1/4) – (3/4).log2(3/4)

= 0.811

Entropy(Decision|Humidity>70)

= -(4/10).log2(4/10) – (6/10).log2(6/10)

= 0.971

Gain(Decision, Humidity<> 70)

= 0.940 – (4/14).(0.811) – (10/14).(0.971)

= 0.014
SplitInfo(Decision, Humidity<> 70)

= -(4/14).log2(4/14) -(10/14).log2(10/14)

= 0.863

GainRatio(Decision, Humidity<> 70)

= 0.014/0.863

= 0.016

Similarly, calculate the Gains and Gain Ratios for all other values of Humidity.

We found out that the Gain was maximum for Humidity = 80

[Note: Here is something interesting. You can take either the Gain or the Gain Ratio as the criterion for choosing the threshold value; different implementations of Decision Trees differ on this. We are taking Gain.]

Gain(Decision, Humidity <> 80) = 0.101

GainRatio(Decision, Humidity <> 80) = 0.107
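The threshold search for a continuous attribute can be automated. Here is an illustrative Python sketch (the helper name best_threshold is not from the original text) that tries every observed Humidity value as a binary split point and reports the one with the maximum Gain, matching the Humidity = 80 result above.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

# (Humidity, Decision) pairs for days 1 to 14 of the weather table
rows = [(85, "No"), (90, "No"), (78, "Yes"), (96, "Yes"), (80, "Yes"),
        (70, "No"), (65, "Yes"), (95, "No"), (70, "Yes"), (80, "Yes"),
        (70, "Yes"), (90, "Yes"), (75, "Yes"), (80, "No")]

def best_threshold(rows):
    """Try each observed value as a '<= value' split and keep the best Gain."""
    labels = [lab for _, lab in rows]
    base = entropy(labels)
    best_value, best_gain = None, -1.0
    for value, _ in sorted(rows):
        left = [lab for v, lab in rows if v <= value]
        right = [lab for v, lab in rows if v > value]
        if not right:                       # largest value gives no split, skip it
            continue
        g = (base
             - (len(left) / len(rows)) * entropy(left)
             - (len(right) / len(rows)) * entropy(right))
        if g > best_gain:
            best_value, best_gain = value, g
    return best_value, best_gain

print(best_threshold(rows))   # expected roughly (80, 0.10)
```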

IV. Gain & Gain Ratio for Temp. Variable:

This is also a continuous variable. We will repeat the steps we did for Humidity variable.

1. Let’s arrange it in ascending order of values of Temp:

Day Temp. Decision

7 64 Yes

6 65 No
5 68 Yes

9 69 Yes

4 70 Yes

14 71 No

8 72 No

12 72 Yes

10 75 Yes

11 75 Yes

2 80 No

13 81 Yes

3 83 Yes

1 85 No
2. Calculating Gains and Gain Ratios for all values:

2.a. For Temp = 64

We have 1 Yes & 0 No decisions at <= 64 and 8 Yes & 5 No decisions at > 64

Entropy(Decision|Temp<=64)

= – p(No) . log2p(No) – p(Yes) . log2p(Yes)

= -(0/1).log2(0/1) – (1/1).log2(1/1)

=0

Entropy(Decision|Temp>64)

= -(5/13).log2(5/13) – (8/13).log2(8/13)

=0.530 + 0.431

= 0.961

Gain(Decision, Temp <> 64)

= 0.940 – (1/14).0 – (13/14).(0.961)

= 0.048

SplitInfo(Decision, Temp <> 64) =

-(1/14).log2(1/14) -(13/14).log2(13/14)

= 0.371
GainRatio(Decision, Temp <> 64)

= 0.048/0.371

= 0.129

2.b. For Temp = 65

We have 1 Yes & 1 No decisions at <= 65 and 8 Yes & 4 No decisions at > 65

Entropy(Decision|Temp<=65)

= – p(No) . log2p(No) – p(Yes) . log2p(Yes)

= -(1/2).log2(1/2) – (1/2).log2(1/2)

=1

Entropy(Decision|Temp>65)

= -(4/12).log2(4/12) – (8/12).log2(8/12)

= 0.918

Gain(Decision, Temp<> 65)

= 0.940 – (2/14).1 – (12/14).(0.918)

= 0.010

SplitInfo(Decision, Temp<> 65)

= -(2/14).log2(2/14) -(12/14).log2(12/14)

= 0.591
GainRatio(Decision, Temp<> 65)

= 0.010/0.591

= 0.017

Similarly, calculate the Gains and Gain Ratios for all other values of Temp.

We found out that the Gain was maximum for Temp = 83

Gain(Decision, Temp <> 83) = 0.113

GainRatio(Decision, Temp <> 83) = 0.305

Comparison of Gains and Gain Ratios

Attribute Gain Gain Ratio

Wind 0.049 0.049

Outlook 0.247 0.156

Humidity <> 80 0.101 0.107

Temp <> 83 0.113 0.305

If we use Gain, Outlook will be the root node. (Because it has the highest Gain value)

Similarly, if we use Gain Ratio, Temp will be the root node.

We will proceed using the Gain.


Outlook = Sunny

Day Outlook Temp. Humidity Wind Decision

1 Sunny 85 85 Weak No

2 Sunny 80 90 Strong No

8 Sunny 72 95 Weak No

9 Sunny 69 70 Weak Yes

11 Sunny 75 70 Strong Yes

If humidity > 80, decision is ‘No’

If humidity <= 80, decision is ‘Yes’


Outlook = Overcast

Day Outlook Temp. Humidity Wind Decision

3 Overcast 83 78 Weak Yes

7 Overcast 64 65 Strong Yes

12 Overcast 72 90 Strong Yes

13 Overcast 81 75 Weak Yes

All decisions are ‘Yes’

Outlook = Rain
Day Outlook Temp. Humidity Wind Decision

4 Rain 70 96 Weak Yes

5 Rain 68 80 Weak Yes

6 Rain 65 70 Strong No

10 Rain 75 80 Weak Yes

14 Rain 71 80 Strong No

If Wind = Weak, the decision is 'Yes'

If Wind = Strong, the decision is 'No'

So, this is our final Decision Tree using the C4.5 algorithm.
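To summarise the tree we just derived, here is a tiny illustrative Python function (not from the original text) that encodes the final rules: Outlook at the root, all Yes under Overcast, a Humidity <= 80 test under Sunny, and a Wind test under Rain.

```python
def predict(outlook, humidity, wind):
    """Final decision tree derived above (Outlook is the root node)."""
    if outlook == "Overcast":
        return "Yes"                       # all Overcast days were Yes
    if outlook == "Sunny":
        return "Yes" if humidity <= 80 else "No"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"
    raise ValueError("unknown Outlook value")

# Day 1 (Sunny, Humidity 85, Weak) -> 'No'; Day 10 (Rain, Humidity 80, Weak) -> 'Yes'
print(predict("Sunny", 85, "Weak"), predict("Rain", 80, "Weak"))
```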

Advantages of C4.5 over ID3


C4.5 is an evolution of ID3 by the same author (Quinlan). He made sure that the bottlenecks of ID3 were addressed. Following are the improvements he made in C4.5:

1. It can handle both continuous and discrete variables.

2. It can handle missing values by marking them as '?'. These are not used in the Gain and Entropy calculations.

3. Prunes the tree and thereby avoids ‘overfitting’.

CART Algorithm
Classification and Regression Trees (CART) is a decision tree algorithm that is
used for both classification and regression tasks. It is a supervised learning
algorithm that learns from labelled data to predict unseen data.
 Tree structure: CART builds a tree-like structure consisting of nodes and
branches. The nodes represent different decision points, and the branches
represent the possible outcomes of those decisions. The leaf nodes in the tree
contain a predicted class label or value for the target variable.
 Splitting criteria: CART uses a greedy approach to split the data at each
node. It evaluates all possible splits and selects the one that best reduces the
impurity of the resulting subsets.
For classification tasks, CART uses Gini impurity (also called the Gini index) as the
splitting criterion. The lower the Gini impurity, the purer the subset is.
The formula for the Gini Index is:

Gini(S) = 1 - Σ (pi)²

where pi is the probability of an instance belonging to a particular class.


For regression tasks, CART uses residual reduction as the splitting
criterion. The lower the residual reduction, the better the fit of the model to the
data.
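For concreteness, here is a short Python sketch (illustrative only; the helper names gini and weighted_gini are not from the original text) of the Gini impurity formula and of how CART would compare candidate binary splits by their size-weighted Gini.

```python
from collections import Counter

def gini(labels):
    """Gini(S) = 1 - sum(p_i^2) over the class proportions in S."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a binary split: size-weighted average of the two subsets."""
    total = len(left) + len(right)
    return (len(left) / total) * gini(left) + (len(right) / total) * gini(right)

# A pure subset has Gini 0; an evenly mixed two-class subset has Gini 0.5
print(gini(["Yes", "Yes", "Yes"]))                          # 0.0
print(gini(["Yes", "No", "Yes", "No"]))                     # 0.5
print(weighted_gini(["Yes", "Yes"], ["No", "No", "Yes"]))   # lower is better
```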
 Pruning: pruning is a technique used to remove the nodes that contribute little
to the model accuracy.
To prevent overfitting of the data (overfitting happens for several reasons, for example when the training data set is too small and does not contain enough samples to accurately represent all possible input values), pruning is applied.
Cost complexity pruning and information gain pruning are two popular pruning techniques. Cost complexity pruning involves calculating a cost for each node and removing nodes whose complexity cost outweighs their contribution to accuracy. Information gain pruning involves calculating the information gain of each node and removing nodes that have a low information gain.
How does the CART algorithm work?
The CART algorithm works via the following process:
 The best-split point of each input is obtained.
 Based on the best-split points of each input in Step 1, the new “best” split
point is identified.
 Split the chosen input according to the “best” split point.
 Continue splitting until a stopping rule is satisfied or no further desirable
splitting is available.
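In practice, CART-style trees are usually built with a library rather than by hand. The sketch below assumes scikit-learn is installed and shows one typical way to fit a Gini-based classification tree, reusing the small brightness/saturation table from the KNN example purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: each row is [brightness, saturation]; labels are the colour classes.
X = [[40, 20], [50, 50], [60, 90], [10, 25], [70, 70], [60, 10], [25, 80]]
y = ["Red", "Blue", "Blue", "Red", "Blue", "Red", "Blue"]

# criterion="gini" selects the Gini impurity split criterion described above;
# max_depth limits tree growth as a simple form of pre-pruning.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

print(clf.predict([[20, 35]]))   # predicted class for a new point
```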
