
Decision Tree and Cart

The document discusses decision trees and the CART algorithm. It provides example datasets with different attributes and targets, and calculates the information gain and Gini index of each attribute for predicting the target. Specifically, it analyzes credit risk data using home owner status, income bracket, and credit score as attributes to predict default. It also examines tennis play decisions using weather attributes, and a dataset on company performance using age, type, and profit as attributes to predict competition. It then analyzes customer data using gender and marital status to predict profitability, and a final dataset using author, thread, and length to predict reads. The formulas for information gain and Gini index are also presented.

Uploaded by

Monisha Jose

Decision Tree and CART

1.

| Home Owner | Income Bracket | Credit Score | Default |
|---|---|---|---|
| Home Owner | Low | Below Average | Yes |
| Home Owner | Medium | Average | No |
| Not Home Owner | High | Average | No |
| Not Home Owner | Low | High | Yes |
| Home Owner | Medium | Average | No |
| Home Owner | High | Below Average | Yes |
| Not Home Owner | Low | Below Average | Yes |
| Not Home Owner | Medium | Average | Yes |
| Home Owner | High | High | No |
| Home Owner | Low | High | Yes |
| Not Home Owner | Medium | Average | No |
| Not Home Owner | High | High | No |
| Home Owner | Low | Below Average | Yes |
| Home Owner | Medium | Average | No |
| Not Home Owner | High | Average | Yes |
| Not Home Owner | Low | Average | Yes |
| Not Home Owner | Low | Average | Yes |
| Not Home Owner | Medium | Average | No |

The formula for information gain (Decision Tree) is:


$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

$$\mathrm{Gain}(\text{Default}, \text{Credit Score}) = 0.991 - 0.762 = 0.229$$

$$\mathrm{Gain}(\text{Default}, \text{Income Bracket}) = 0.991 - 0.486 = 0.505$$

$$\mathrm{Gain}(\text{Default}, \text{Home Owner}) = 0.991 - 0.984 = 0.007$$
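The gains for this table can be recomputed directly from the rows. The following is a minimal Python sketch, not part of the original worksheet; the names `ROWS`, `entropy`, and `information_gain` are illustrative assumptions, and the rows are transcribed from the credit table above.

```python
from collections import Counter
from math import log2

# Rows transcribed from the credit table: (home owner, income bracket, credit score, default)
ROWS = [
    ("Home Owner", "Low", "Below Average", "Yes"),
    ("Home Owner", "Medium", "Average", "No"),
    ("Not Home Owner", "High", "Average", "No"),
    ("Not Home Owner", "Low", "High", "Yes"),
    ("Home Owner", "Medium", "Average", "No"),
    ("Home Owner", "High", "Below Average", "Yes"),
    ("Not Home Owner", "Low", "Below Average", "Yes"),
    ("Not Home Owner", "Medium", "Average", "Yes"),
    ("Home Owner", "High", "High", "No"),
    ("Home Owner", "Low", "High", "Yes"),
    ("Not Home Owner", "Medium", "Average", "No"),
    ("Not Home Owner", "High", "High", "No"),
    ("Home Owner", "Low", "Below Average", "Yes"),
    ("Home Owner", "Medium", "Average", "No"),
    ("Not Home Owner", "High", "Average", "Yes"),
    ("Not Home Owner", "Low", "Average", "Yes"),
    ("Not Home Owner", "Low", "Average", "Yes"),
    ("Not Home Owner", "Medium", "Average", "No"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr_index, target_index=-1):
    """Entropy of the target minus the size-weighted entropy of each partition."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row[target_index])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy([row[target_index] for row in rows]) - remainder


for name, index in [("Home Owner", 0), ("Income Bracket", 1), ("Credit Score", 2)]:
    print(f"Gain(Default, {name}) = {information_gain(ROWS, index):.3f}")
```

The attribute with the largest gain is the one a gain-based tree would place at the root.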

The formula for Gini Index (CART) is:


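$$\mathrm{Gini}(D) = 1 - \sum_{j=1}^{n} p_j^2$$

where $p_j$ is the proportion of class $j$ in $D$. The Gini index of a split on attribute $A$ is the size-weighted average over its partitions, $\sum_{v} \frac{|D_v|}{|D|}\,\mathrm{Gini}(D_v)$:

$$\mathrm{Gini}(\text{Default}, \text{Credit Score}) = 0.3778$$

$$\mathrm{Gini}(\text{Default}, \text{Income Bracket}) = 0.2259$$

$$\mathrm{Gini}(\text{Default}, \text{Home Owner}) = 0.4889$$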
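The Gini values for a split can be checked from class counts alone. This is a hedged sketch, assuming the reported figures are the size-weighted Gini of each attribute's partitions; the helper names `gini` and `weighted_gini` and the count dictionaries are illustrative, with (Yes, No) counts of Default tallied from the credit table above.

```python
def gini(counts):
    """Gini index 1 - sum(p_j^2) for a tuple of class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def weighted_gini(partitions):
    """Size-weighted average Gini index over the partitions of one attribute."""
    n = sum(sum(counts) for counts in partitions.values())
    return sum(sum(counts) / n * gini(counts) for counts in partitions.values())


# (Yes, No) counts of Default for each attribute value
credit_score = {"Below Average": (4, 0), "Average": (4, 6), "High": (2, 2)}
income_bracket = {"Low": (7, 0), "Medium": (1, 5), "High": (2, 3)}
home_owner = {"Home Owner": (4, 4), "Not Home Owner": (6, 4)}

for name, partitions in [
    ("Credit Score", credit_score),
    ("Income Bracket", income_bracket),
    ("Home Owner", home_owner),
]:
    print(f"Gini(Default, {name}) = {weighted_gini(partitions):.4f}")
```

Under this criterion the attribute with the smallest weighted Gini index is preferred.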

2.
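| Day | Outlook | Temperature | Humidity | Wind | Play Tennis |
|---|---|---|---|---|---|
| D1 | Sunny | Hot | High | Weak | No |
| D2 | Sunny | Hot | High | Strong | No |
| D3 | Overcast | Hot | High | Weak | Yes |
| D4 | Rain | Mild | High | Weak | Yes |
| D5 | Rain | Cool | Normal | Weak | Yes |
| D6 | Rain | Cool | Normal | Strong | No |
| D7 | Overcast | Cool | Normal | Strong | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Weak | Yes |
| D11 | Sunny | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High | Strong | Yes |
| D13 | Overcast | Hot | Normal | Weak | Yes |
| D14 | Rain | Mild | High | Strong | No |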

The formula for information gain (Decision Tree) is:


$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

$$\mathrm{Gain}(\text{Play Tennis}, \text{Wind}) = 0.940 - 0.8921 = 0.048$$

$$\mathrm{Gain}(\text{Play Tennis}, \text{Humidity}) = 0.940 - 0.7884 = 0.152$$

$$\mathrm{Gain}(\text{Play Tennis}, \text{Temperature}) = 0.940 - 0.9110 = 0.029$$

$$\mathrm{Gain}(\text{Play Tennis}, \text{Outlook}) = 0.940 - 0.6935 = 0.247$$
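
As a worked instance of the formula, the Outlook gain can be expanded term by term. The bracketed class counts, written [Yes, No], are tallied from the table above, and intermediate values are rounded to four decimals.

$$\mathrm{Entropy}(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.9403$$

$$\mathrm{Entropy}(S_{\text{Sunny}}) = \mathrm{Entropy}([2, 3]) \approx 0.9710, \quad \mathrm{Entropy}(S_{\text{Overcast}}) = \mathrm{Entropy}([4, 0]) = 0, \quad \mathrm{Entropy}(S_{\text{Rain}}) = \mathrm{Entropy}([3, 2]) \approx 0.9710$$

$$\sum_{v \in \mathrm{Values}(\text{Outlook})} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v) = \tfrac{5}{14}(0.9710) + \tfrac{4}{14}(0) + \tfrac{5}{14}(0.9710) \approx 0.6935$$

$$\mathrm{Gain}(\text{Play Tennis}, \text{Outlook}) \approx 0.9403 - 0.6935 \approx 0.247$$
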
The formula for Gini Index (CART) is:
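$$\mathrm{Gini}(D) = 1 - \sum_{j=1}^{n} p_j^2$$

$$\mathrm{Gini}(\text{Play Tennis}, \text{Wind}) = 0.4286$$

$$\mathrm{Gini}(\text{Play Tennis}, \text{Humidity}) = 0.3673$$

$$\mathrm{Gini}(\text{Play Tennis}, \text{Temperature}) = 0.4405$$

$$\mathrm{Gini}(\text{Play Tennis}, \text{Outlook}) = 0.3429$$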
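
Both criteria can be used to rank the four weather attributes. The sketch below is an illustration rather than the worksheet's own method: `COUNTS`, `entropy`, `gini`, and `weighted` are assumed names, and the [Yes, No] counts of Play Tennis are tallied from the table above. Minimizing the weighted entropy is equivalent to maximizing the information gain, because Entropy(S) is the same for every attribute.

```python
from math import log2

# [Yes, No] counts of Play Tennis for each attribute value
COUNTS = {
    "Outlook": {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)},
    "Temperature": {"Hot": (2, 2), "Mild": (4, 2), "Cool": (3, 1)},
    "Humidity": {"High": (3, 4), "Normal": (6, 1)},
    "Wind": {"Weak": (6, 2), "Strong": (3, 3)},
}


def entropy(counts):
    """Shannon entropy (base 2); empty classes contribute nothing."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def gini(counts):
    """Gini index 1 - sum(p_j^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def weighted(measure, partitions):
    """Size-weighted average of an impurity measure over one attribute's partitions."""
    n = sum(sum(c) for c in partitions.values())
    return sum(sum(c) / n * measure(c) for c in partitions.values())


# Lowest impurity after the split comes first
ranking_by_entropy = sorted(COUNTS, key=lambda a: weighted(entropy, COUNTS[a]))
ranking_by_gini = sorted(COUNTS, key=lambda a: weighted(gini, COUNTS[a]))
print(ranking_by_entropy)
print(ranking_by_gini)
```

On this dataset the two rankings agree, with Outlook first, which matches the gain and Gini values listed above.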

3.
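| Age | Type | Profit | Competition |
|---|---|---|---|
| Old | Software | Down | Yes |
| Old | Software | Down | No |
| Old | Hardware | Down | No |
| Mid | Software | Down | Yes |
| Mid | Hardware | Down | Yes |
| Mid | Hardware | Up | No |
| Mid | Software | Up | No |
| New | Software | Up | Yes |
| New | Hardware | Up | No |
| New | Software | Up | No |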

The formula for information gain (Decision Tree) is:


$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

$$\mathrm{Gain}(\text{Competition}, \text{Profit}) = 0.971 - 0.8464 = 0.125$$

$$\mathrm{Gain}(\text{Competition}, \text{Type}) = 0.971 - 0.9245 = 0.046$$

$$\mathrm{Gain}(\text{Competition}, \text{Age}) = 0.971 - 0.9510 = 0.020$$

The formula for Gini Index (CART) is:


$$\mathrm{Gini}(D) = 1 - \sum_{j=1}^{n} p_j^2$$

$$\mathrm{Gini}(\text{Competition}, \text{Profit}) = 0.4000$$

$$\mathrm{Gini}(\text{Competition}, \text{Type}) = 0.4500$$

$$\mathrm{Gini}(\text{Competition}, \text{Age}) = 0.4667$$
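
The Age figure can be expanded as a worked example, assuming the reported value is the size-weighted Gini index of the three Age partitions. The [Yes, No] counts of Competition are tallied from the table above: Old [1, 2], Mid [2, 2], New [1, 2].

$$\mathrm{Gini}(D_{\text{Old}}) = 1 - \left(\tfrac{1}{3}\right)^2 - \left(\tfrac{2}{3}\right)^2 = \tfrac{4}{9}, \quad \mathrm{Gini}(D_{\text{Mid}}) = 1 - \left(\tfrac{1}{2}\right)^2 - \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{2}, \quad \mathrm{Gini}(D_{\text{New}}) = \tfrac{4}{9}$$

$$\mathrm{Gini}(\text{Competition}, \text{Age}) = \tfrac{3}{10}\cdot\tfrac{4}{9} + \tfrac{4}{10}\cdot\tfrac{1}{2} + \tfrac{3}{10}\cdot\tfrac{4}{9} = \tfrac{7}{15} \approx 0.4667$$
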
4.
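| Customer | Gender | Marital Status | Profitability |
|---|---|---|---|
| 1 | Male | Married | Yes |
| 2 | Male | Single | No |
| 3 | Male | Married | Yes |
| 4 | Male | Single | No |
| 5 | Male | Married | Yes |
| 6 | Female | Married | Yes |
| 7 | Female | Single | No |
| 8 | Female | Single | Yes |
| 9 | Female | Married | No |
| 10 | Female | Married | No |
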
The formula for information gain (Decision Tree) is:
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

$$\mathrm{Gain}(\text{Profitability}, \text{Marital Status}) = 1.000 - 0.8754 = 0.125$$

$$\mathrm{Gain}(\text{Profitability}, \text{Gender}) = 1.000 - 0.9709 = 0.029$$

The formula for Gini Index (CART) is:


$$\mathrm{Gini}(D) = 1 - \sum_{j=1}^{n} p_j^2$$

$$\mathrm{Gini}(\text{Profitability}, \text{Marital Status}) = 0.4167$$

$$\mathrm{Gini}(\text{Profitability}, \text{Gender}) = 0.4800$$
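
To illustrate what choosing a split attribute leads to, the sketch below builds a one-level tree (a decision stump) on the customer table: each branch predicts the majority class of its partition. This goes one step beyond the worksheet's calculations; `ROWS`, `fit_stump`, and `accuracy` are hypothetical names introduced for illustration.

```python
from collections import Counter, defaultdict

# (gender, marital status, profitability) rows from the customer table above
ROWS = [
    ("Male", "Married", "Yes"),
    ("Male", "Single", "No"),
    ("Male", "Married", "Yes"),
    ("Male", "Single", "No"),
    ("Male", "Married", "Yes"),
    ("Female", "Married", "Yes"),
    ("Female", "Single", "No"),
    ("Female", "Single", "Yes"),
    ("Female", "Married", "No"),
    ("Female", "Married", "No"),
]


def fit_stump(rows, attr_index):
    """Map each attribute value to the majority class of its partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row[-1])
    return {value: Counter(labels).most_common(1)[0][0] for value, labels in groups.items()}


def accuracy(stump, rows, attr_index):
    """Fraction of rows whose class matches the stump's prediction."""
    hits = sum(stump[row[attr_index]] == row[-1] for row in rows)
    return hits / len(rows)


marital_stump = fit_stump(ROWS, 1)
gender_stump = fit_stump(ROWS, 0)
print(marital_stump, accuracy(marital_stump, ROWS, 1))
print(gender_stump, accuracy(gender_stump, ROWS, 0))
```

On these ten rows the Marital Status stump classifies 7 customers correctly and the Gender stump 6, consistent with Marital Status having the higher gain and the lower Gini index. These are training-set figures only and say nothing about unseen data.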

5.
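| Author | Thread | Length | Reads |
|---|---|---|---|
| Known | New | Long | No |
| Unknown | New | Short | Yes |
| Unknown | Old | Long | No |
| Known | Old | Long | No |
| Known | New | Short | Yes |
| Known | Old | Long | No |
| Unknown | Old | Short | No |
| Unknown | Old | Short | Yes |
| Unknown | New | Long | No |
| Known | New | Short | No |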

The formula for information gain (Decision Tree) is:


$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

$$\mathrm{Gain}(\text{Reads}, \text{Length}) = 0.881 - 0.4854 = 0.396$$

$$\mathrm{Gain}(\text{Reads}, \text{Thread}) = 0.881 - 0.8464 = 0.035$$

$$\mathrm{Gain}(\text{Reads}, \text{Author}) = 0.881 - 0.8464 = 0.035$$
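
The tie between Thread and Author follows from their class counts. Writing the counts as [Yes, No] (tallied from the table above), Thread splits the data into New [2, 3] and Old [1, 4], while Author splits it into Known [1, 4] and Unknown [2, 3]. Both attributes therefore produce the same pair of partition sizes and class counts, and so the same weighted entropy:

$$\mathrm{Entropy}([2, 3]) \approx 0.9710, \quad \mathrm{Entropy}([1, 4]) \approx 0.7219$$

$$\sum_{v} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v) = \tfrac{5}{10}(0.9710) + \tfrac{5}{10}(0.7219) \approx 0.8464$$

$$\mathrm{Gain}(\text{Reads}, \text{Thread}) = \mathrm{Gain}(\text{Reads}, \text{Author}) \approx 0.8813 - 0.8464 \approx 0.035$$
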
The formula for Gini Index (CART) is:
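$$\mathrm{Gini}(D) = 1 - \sum_{j=1}^{n} p_j^2$$

$$\mathrm{Gini}(\text{Reads}, \text{Length}) = 0.2400$$

$$\mathrm{Gini}(\text{Reads}, \text{Thread}) = 0.4000$$

$$\mathrm{Gini}(\text{Reads}, \text{Author}) = 0.4000$$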
