
Information Gain With Calculations

The document presents entropy and information gain calculations for a weather dataset used to predict whether to play tennis. It works through the entropy calculations for the four attributes (Outlook, Temp, Humid, and Wind) and their corresponding information gains. The overall entropy of the dataset is 0.971, and the highest information gain is for the Outlook attribute, at 0.322.


Information Gain Calculations for Weather Dataset

Weather Dataset

Day  Outlook   Temp  Humid   Wind    PlayTennis
01   Sunny     Hot   High    Weak    No
02   Sunny     Hot   High    Strong  No
03   Overcast  Hot   High    Weak    Yes
04   Rain      Mild  High    Weak    Yes
05   Rain      Cool  Normal  Weak    Yes
06   Rain      Cool  Normal  Strong  No
07   Overcast  Cool  Normal  Strong  Yes
08   Sunny     Mild  High    Weak    No
09   Sunny     Cool  Normal  Weak    Yes
10   Rain      Mild  Normal  Weak    Yes
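
To make the worked example reproducible, here is the table as Python data. This is a minimal sketch used by the code snippets later in the document; the variable name dataset and the field names (taken from the column headers) are ours.

```python
# The weather dataset above as a list of records (field names mirror the table).
dataset = [
    {"Day": "01", "Outlook": "Sunny",    "Temp": "Hot",  "Humid": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Day": "02", "Outlook": "Sunny",    "Temp": "Hot",  "Humid": "High",   "Wind": "Strong", "PlayTennis": "No"},
    {"Day": "03", "Outlook": "Overcast", "Temp": "Hot",  "Humid": "High",   "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Day": "04", "Outlook": "Rain",     "Temp": "Mild", "Humid": "High",   "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Day": "05", "Outlook": "Rain",     "Temp": "Cool", "Humid": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Day": "06", "Outlook": "Rain",     "Temp": "Cool", "Humid": "Normal", "Wind": "Strong", "PlayTennis": "No"},
    {"Day": "07", "Outlook": "Overcast", "Temp": "Cool", "Humid": "Normal", "Wind": "Strong", "PlayTennis": "Yes"},
    {"Day": "08", "Outlook": "Sunny",    "Temp": "Mild", "Humid": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Day": "09", "Outlook": "Sunny",    "Temp": "Cool", "Humid": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Day": "10", "Outlook": "Rain",     "Temp": "Mild", "Humid": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
]
```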

Entropy & Information Gain Formulas

Entropy(S) = -p_pos * log2(p_pos) - p_neg * log2(p_neg)

Where:

p_pos = proportion of positive examples (PlayTennis = Yes)

p_neg = proportion of negative examples (PlayTennis = No)

Information Gain(S, A) = Entropy(S) - SUM (|Sv| / |S|) * Entropy(Sv)

Where Sv is the subset of S for which attribute A has value v.
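
As a minimal Python sketch of these two formulas (the function names entropy and info_gain are ours), working directly from (Yes, No) counts the same way the hand calculations below do:

```python
from math import log2

def entropy(p_pos, p_neg):
    # Entropy(S) = -p_pos*log2(p_pos) - p_neg*log2(p_neg),
    # with the usual convention that 0*log2(0) = 0.
    return -sum(p * log2(p) for p in (p_pos, p_neg) if p > 0)

def info_gain(total, subsets):
    # total   = (Yes, No) counts for the whole set S
    # subsets = one (Yes, No) pair per value v of attribute A
    n = sum(total)
    gain = entropy(total[0] / n, total[1] / n)
    for yes, no in subsets:
        m = yes + no
        gain -= (m / n) * entropy(yes / m, no / m)
    return gain
```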

Calculation Steps

Total examples: 10

Yes: 6, No: 4

Entropy(S) = -(6/10)*log2(6/10) - (4/10)*log2(4/10) = 0.971
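
With the entropy() sketch above, this checks out:

```python
print(entropy(6/10, 4/10))  # -> 0.9709..., i.e. 0.971 rounded
```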



Attribute: Outlook

Value: Sunny

Subset size = 4, Yes = 1, No = 3

Entropy = -(1/4)*log2(1/4) - (3/4)*log2(3/4) = 0.8113

Weighted Entropy = (4/10) * 0.8113 = 0.3245

Value: Overcast

Subset size = 2, Yes = 2, No = 0

Entropy = -(2/2)*log2(2/2) - (0/2)*log2(0/2) = 0.0 (taking 0*log2(0) = 0 by convention, since log2(0) alone is undefined)

Weighted Entropy = (2/10) * 0.0 = 0.0

Value: Rain

Subset size = 4, Yes = 3, No = 1

Entropy = -(3/4)*log2(3/4) - (1/4)*log2(1/4) = 0.8113

Weighted Entropy = (4/10) * 0.8113 = 0.3245

Weighted entropy sum = 0.3245 + 0.0 + 0.3245 = 0.649

Information Gain(Outlook) = 0.971 - 0.649 = 0.322
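
The same number falls out of the info_gain() sketch above when fed the (Yes, No) counts for the three Outlook values:

```python
# Total (6 Yes, 4 No); Sunny (1, 3), Overcast (2, 0), Rain (3, 1).
print(info_gain((6, 4), [(1, 3), (2, 0), (3, 1)]))  # -> 0.3219..., i.e. 0.322
```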

Attribute: Temp

Value: Mild

Subset size = 3, Yes = 2, No = 1

Entropy = -(2/3)*log2(2/3) - (1/3)*log2(1/3) = 0.9183

Weighted Entropy = (3/10) * 0.9183 = 0.2755

Value: Hot

Subset size = 3, Yes = 1, No = 2

Entropy = -(1/3)*log2(1/3) - (2/3)*log2(2/3) = 0.9183

Weighted Entropy = (3/10) * 0.9183 = 0.2755

Value: Cool

Subset size = 4, Yes = 3, No = 1

Entropy = -(3/4)*log2(3/4) - (1/4)*log2(1/4) = 0.8113

Weighted Entropy = (4/10) * 0.8113 = 0.3245

Weighted entropy sum = 0.2755 + 0.2755 + 0.3245 = 0.8755

Information Gain(Temp) = 0.971 - 0.8755 = 0.0955



Attribute: Humid

Value: Normal

Subset size = 5, Yes = 4, No = 1

Entropy = -(4/5)*log2(4/5) - (1/5)*log2(1/5) = 0.7219

Weighted Entropy = (5/10) * 0.7219 = 0.3609

Value: High

Subset size = 5, Yes = 2, No = 3

Entropy = -(2/5)*log2(2/5) - (3/5)*log2(3/5) = 0.971

Weighted Entropy = (5/10) * 0.971 = 0.4855

Weighted entropy sum = 0.3609 + 0.4855 = 0.8464

Information Gain(Humid) = 0.971 - 0.8464 = 0.1246

Attribute: Wind

Value: Strong

Subset size = 3, Yes = 1, No = 2

Entropy = -(1/3)*log2(1/3) - (2/3)*log2(2/3) = 0.9183

Weighted Entropy = (3/10) * 0.9183 = 0.2755

Value: Weak

Subset size = 7, Yes = 5, No = 2

Entropy = -(5/7)*log2(5/7) - (2/7)*log2(2/7) = 0.8631

Weighted Entropy = (7/10) * 0.8631 = 0.6042

Weighted entropy sum = 0.2755 + 0.6042 = 0.8797

Information Gain(Wind) = 0.971 - 0.8797 = 0.0913
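
Combining the dataset and info_gain() sketches above (the counts helper is ours), one loop reproduces all four gains and confirms that Outlook has the largest, so it would be chosen as the first split of a decision tree:

```python
def counts(rows):
    # (Yes, No) counts for a list of records.
    yes = sum(r["PlayTennis"] == "Yes" for r in rows)
    return (yes, len(rows) - yes)

total = counts(dataset)  # (6, 4)
for attr in ("Outlook", "Temp", "Humid", "Wind"):
    subsets = [counts([r for r in dataset if r[attr] == v])
               for v in {r[attr] for r in dataset}]
    print(attr, round(info_gain(total, subsets), 4))
# -> Outlook 0.3219, Temp 0.0955, Humid 0.1245, Wind 0.0913
#    (tiny differences from the hand figures above come from intermediate rounding)
```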
