CALCULATION:
NOTEPAD:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {true, false}
@attribute play {yes, no}
@data
sunny, 85, 85, false, no
sunny, 80, 90, true, no
overcast, 83, 86, false, yes
rainy, 70, 96, false, yes
rainy, 68, 80, false, yes
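The file above is in Weka's ARFF format. `@relation` names the dataset, each `@attribute` line declares either a nominal attribute (a `{...}` list of allowed values) or a numeric one (`real`), and each line after `@data` is one instance with comma-separated values, one per declared attribute. As a rough illustration, here is a minimal reader written for this file only. It is a sketch that assumes no quoted names, no missing values (`?`) and no sparse rows, all of which a real ARFF reader (Weka itself, or `scipy.io.arff.loadarff`) handles. The function name `parse_arff` is hypothetical.

```python
# Minimal ARFF reader for the weather file (sketch; parse_arff is a hypothetical name).
# Assumes no quoted names, no missing values ("?") and no sparse rows.
ARFF_TEXT = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {true, false}
@attribute play {yes, no}
@data
sunny, 85, 85, false, no
sunny, 80, 90, true, no
overcast, 83, 86, false, yes
rainy, 70, 96, false, yes
rainy, 68, 80, false, yes
"""


def parse_arff(text):
    """Return (relation name, [(attribute, type)], rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        keyword = line.split()[0].lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            # A missing comma shows up here as too few values.
            if len(values) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values: {line!r}")
            row = []
            for value, (name, kind) in zip(values, attributes):
                if isinstance(kind, list):  # nominal: must be a declared value
                    if value not in kind:
                        raise ValueError(f"{value!r} is not a value of {name}")
                    row.append(value)
                else:  # numeric attribute
                    row.append(float(value))
            rows.append(row)
        elif keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                allowed = [v.strip() for v in spec.strip("{}").split(",")]
                attributes.append((name, allowed))
            else:
                attributes.append((name, spec.lower()))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, [name for name, _ in attributes], len(rows))
```

The value-count check is deliberate: a data line such as `sunny 85, 85, false, no` splits into only four values and is rejected instead of being silently misread.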
CALCULATION:
Excel:
NUMERIC TO NOMINAL
CALCULATION:
No  1:Outlook  2:Temperature  3:Humidity  4:Windy  5:Play
    Nominal    Nominal        Nominal     Nominal  Nominal
1   Sunny      Hot            high        FALSE    no
2   Sunny      Hot            high        TRUE     no
3   Overcast   Hot            high        FALSE    yes
4   Rainy      Mid            high        FALSE    yes
5   Rainy      Cool           normal      FALSE    yes
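The Excel step above bins the numeric temperature and humidity values into labels. The cut points behind the table are not stated, so the thresholds below (75 and 69 for temperature, 80 for humidity) are hypothetical. They were chosen only because they reproduce the Temperature and Humidity columns for these five rows. The function names are also illustrative.

```python
# HYPOTHETICAL cut points: not from the source, picked only so that the five
# weather rows give the Temperature and Humidity labels shown in the table.
HOT_FROM = 75    # temperature >= 75 -> "Hot"
MID_FROM = 69    # 69 <= temperature < 75 -> "Mid", below 69 -> "Cool"
HIGH_ABOVE = 80  # humidity > 80 -> "high", otherwise "normal"


def temperature_label(t):
    if t >= HOT_FROM:
        return "Hot"
    if t >= MID_FROM:
        return "Mid"
    return "Cool"


def humidity_label(h):
    return "high" if h > HIGH_ABOVE else "normal"


def to_nominal(row):
    """Map one (outlook, temperature, humidity, windy, play) row to nominal labels."""
    outlook, temperature, humidity, windy, play = row
    return (outlook.capitalize(), temperature_label(temperature),
            humidity_label(humidity), windy.upper(), play)


WEATHER = [
    ("sunny", 85, 85, "false", "no"),
    ("sunny", 80, 90, "true", "no"),
    ("overcast", 83, 86, "false", "yes"),
    ("rainy", 70, 96, "false", "yes"),
    ("rainy", 68, 80, "false", "yes"),
]

nominal = [to_nominal(r) for r in WEATHER]
for r in nominal:
    print(*r)
```

Keeping the thresholds as named constants makes it easy to swap in whatever cut points the original spreadsheet actually used.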
= 68.1/294.9 × 100 ≈ 23%
CALCULATION:
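x_i   y_i   x_i - x̄   y_i - ȳ   (x_i - x̄)²   (y_i - ȳ)²   (x_i - x̄)(y_i - ȳ)
3     30    -6        -24       36           576          144
8     57    -1        3         1            9            -3
9     64    0         10        0            100          0
13    72    4         18        16           324          72
3     30    -6        -24       36           576          144
6     43    -3        -11       9            121          33
11    50    2         -4        4            16           -8
21    90    12        36        144          1296         432
1     20    -8        -34       64           1156         272
16    83    7         29        49           841          203
Mean/Sum: x̄ = 9.1 (Σx = 91), ȳ = 53.9 (Σy = 539), Σ(x_i - x̄)² = 359, Σ(y_i - ȳ)² = 5015, Σ(x_i - x̄)(y_i - ȳ) = 1289
(The deviations x_i - x̄ and y_i - ȳ are rounded to the nearest integer.)
$$b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{1289}{359} \approx 3.591$$
$$b_0 = \bar{y} - b_1 \bar{x} = 53.9 - 3.591 \times 9.1 \approx 21.22$$
Linear regression:
$$\hat{y} = b_0 + b_1 x$$
For x = 10:
$$\hat{y} = 21.22 + 3.591 \times 10 \approx 57.13$$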
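The same least-squares fit can be checked in a few lines of Python. This sketch uses the ten (x, y) pairs from the table but keeps the deviations unrounded, so its coefficients (about b1 ≈ 3.592 and b0 ≈ 21.21, with a prediction of about 57.13 at x = 10) can differ slightly from a hand calculation that rounds deviations to integers. The function name `least_squares` is illustrative.

```python
# Simple linear regression by ordinary least squares (sketch).
# Deviations from the means are NOT rounded here.


def least_squares(xs, ys):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


X = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
Y = [30, 57, 64, 72, 30, 43, 50, 90, 20, 83]

b0, b1 = least_squares(X, Y)
y_hat_10 = b0 + b1 * 10  # prediction for x = 10
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}, y_hat(10) = {y_hat_10:.4f}")
```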
Co-efficient:
CALCULATION:
First, check which attribute provides the highest information gain, so that the training set can be split on that attribute. We need to calculate the expected information needed to classify the whole set, and the entropy of each attribute. The information gain of an attribute is this expected information minus the entropy of the attribute. The expected information of the two classes (9 yes and 5 no, 14 records in total) is:
I(S_yes, S_no) = I(9,5) = -9/14 log2 (9/14) - 5/14 log2 (5/14) = 0.9403
For age we have three values: age <= 30 (2 yes and 3 no), age 31..40 (4 yes and 0 no) and age > 40 (3 yes and 2 no).
Entropy(age) = 5/14 (-2/5 log2 (2/5) - 3/5 log2 (3/5)) + 4/14 (0) + 5/14 (-3/5 log2 (3/5) - 2/5 log2 (2/5))
= 5/14(0.9709) + 0 + 5/14(0.9709)
= 0.6935
Gain(age) = 0.9403 - 0.6935 ≈ 0.247
For income we have three values: income high (2 yes and 2 no), income medium (4 yes and 2 no) and income low (3 yes and 1 no).
Entropy(income) = 4/14(1) + 6/14(0.9183) + 4/14(0.8112)
≈ 0.911
Gain(income) = 0.9403 - 0.911 ≈ 0.029
For student we have two values: student yes (6 yes and 1 no) and student no (3 yes and 4 no).
Entropy(student) = 7/14(0.5916) + 7/14(0.9852)
≈ 0.788
Gain(student) = 0.9403 - 0.788 ≈ 0.152
For credit rating we have two values: credit_rating fair (6 yes and 2 no) and credit_rating excellent (3 yes and 3 no).
Entropy(credit_rating) = 8/14(0.8112) + 6/14(1)
≈ 0.892
Gain(credit_rating) = 0.9403 - 0.892 ≈ 0.048
Since age has the highest information gain, we start by splitting the dataset on the age attribute.
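The root-level comparison can be reproduced from the (yes, no) counts alone, without the full training records. The sketch below is a plain ID3-style information-gain calculation over the counts quoted in the text; the helper names `info` and `split_entropy` are illustrative, not from any library. Rounding intermediate terms by hand can shift the fourth decimal, so compare at three decimals.

```python
from math import log2


def info(*counts):
    """Expected information (entropy, in bits) of a class distribution."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def split_entropy(partitions):
    """Weighted entropy after splitting into partitions of (yes, no) counts."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * info(*p) for p in partitions)


# (yes, no) counts for each value of each attribute, as listed in the text
SPLITS = {
    "age": [(2, 3), (4, 0), (3, 2)],
    "income": [(2, 2), (4, 2), (3, 1)],
    "student": [(6, 1), (3, 4)],
    "credit_rating": [(6, 2), (3, 3)],
}

root_info = info(9, 5)  # 9 yes and 5 no in the whole training set
gains = {name: root_info - split_entropy(p) for name, p in SPLITS.items()}
best = max(gains, key=gains.get)
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"Gain({name}) = {gain:.3f}")
```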
Since all records under the branch age 31..40 are of class yes, we can replace that branch with a leaf labeled Class = yes.
The same splitting process must be repeated for the two remaining branches.
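For the branch age <= 30 (2 yes and 3 no), student has two values: student yes (2 yes and 0 no) and student no (0 yes and 3 no). Both partitions are pure, so Entropy(student) = 0 and the information gain equals the whole I(2,3) = 0.971, the maximum possible for this branch. We can therefore split on student without checking the other attributes.
Since each of the two resulting branches contains records of a single class, we turn them into leaf nodes labeled with their respective class: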
The same process is needed for the other branch of age (age > 40, with 3 yes and 2 no).
The expected information is I(S_yes, S_no) = I(3,2) = -3/5 log2 (3/5) - 2/5 log2 (2/5) = 0.971
For income we have two values: income medium (2 yes and 1 no) and income low (1 yes and 1 no).
Entropy(income) = 3/5(0.9183) + 2/5(1) = 0.951, so Gain(income) = 0.971 - 0.951 = 0.020
For student we have two values: student yes (2 yes and 1 no) and student no (1 yes and 1 no).
Entropy(student) = 3/5(0.9183) + 2/5(1) = 0.951, so Gain(student) = 0.971 - 0.951 = 0.020
For credit rating we have two values: credit_rating fair (3 yes and 0 no) and credit_rating excellent (0 yes and 2 no).
Entropy(credit_rating) = 3/5(0) + 2/5(0) = 0, so Gain(credit_rating) = 0.971, the highest possible for this branch.
We then split on credit rating. Each resulting partition contains records from a single class, so we just make them into leaf nodes with their class label attached:
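The two remaining branches can be checked the same way. This self-contained sketch repeats the entropy helper and evaluates each branch from the counts given in the text. For age <= 30 only the student split is listed, so that is the only candidate it considers there. The name `best_split` is hypothetical.

```python
from math import log2


def info(*counts):
    """Entropy (bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def best_split(class_counts, splits):
    """Return (best attribute, gains) for one branch of the tree."""
    base = info(*class_counts)
    gains = {}
    for name, partitions in splits.items():
        n = sum(sum(p) for p in partitions)
        remainder = sum(sum(p) / n * info(*p) for p in partitions)
        gains[name] = base - remainder
    return max(gains, key=gains.get), gains


# Branch age <= 30: 2 yes, 3 no (only the student split is given in the text)
young_best, young_gains = best_split((2, 3), {"student": [(2, 0), (0, 3)]})

# Branch age > 40: 3 yes, 2 no
old_best, old_gains = best_split((3, 2), {
    "income": [(2, 1), (1, 1)],
    "student": [(2, 1), (1, 1)],
    "credit_rating": [(3, 0), (0, 2)],
})
print(young_best, old_best, {k: round(v, 3) for k, v in old_gains.items()})
```

Ties would be broken by `max` returning the first attribute in insertion order; here credit rating wins outright, so that choice does not matter.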
CALCULATION:
This data set is to be grouped into two clusters. As a first step in finding a sensible initial partition, let the A and B values of the two individuals furthest apart (using the Euclidean distance measure) define the initial cluster means, giving:
The remaining individuals are now examined in sequence and allocated to the cluster to which they
are closest, in terms of Euclidean distance to the cluster mean. The mean vector is recalculated
each time a new member is added. This leads to the following series of steps:
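$$M_1 = \left(\tfrac{1}{2}(1.0 + 1.5),\ \tfrac{1}{2}(1.0 + 2.0)\right) = (1.25,\ 1.5)$$
$$M_2 = \left(\tfrac{1}{5}(3.0 + 5.0 + 3.5 + 4.5 + 3.5),\ \tfrac{1}{5}(4.0 + 7.0 + 5.0 + 5.0 + 4.5)\right) = (3.9,\ 5.1)$$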
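The procedure can be sketched in Python. The table of individuals is not reproduced in this section, so the coordinates below are an ASSUMPTION: seven (A, B) points chosen to be consistent with the two mean calculations (individuals 1 and 2 average to (1.25, 1.5); the other five average to (3.9, 5.1)). The first pass follows the sequential allocation described in the text; the follow-up `refine` loop is standard k-means reassignment (swapped in here, not taken from the text) and, with these assumed points, is what moves individual 3 into the second cluster. Function names are illustrative.

```python
import math
from itertools import combinations

# ASSUMED coordinates (A, B) for individuals 1-7; the original data table is not
# shown in this section. They agree with the cluster means M1 and M2 in the text.
POINTS = {
    1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
    5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5),
}


def mean(ids):
    """Mean vector (A, B) of the listed individuals (ids must be non-empty)."""
    return tuple(sum(POINTS[i][d] for i in ids) / len(ids) for d in range(2))


def nearest(i, means):
    """Index of the cluster mean closest to individual i (Euclidean distance)."""
    return min(range(len(means)), key=lambda k: math.dist(POINTS[i], means[k]))


def sequential_partition():
    # Seed each cluster with one of the two individuals furthest apart.
    a, b = max(combinations(sorted(POINTS), 2),
               key=lambda pair: math.dist(POINTS[pair[0]], POINTS[pair[1]]))
    clusters = [[a], [b]]
    for i in sorted(POINTS):
        if i not in (a, b):
            # Means are recomputed from the current members before each allocation.
            clusters[nearest(i, [mean(c) for c in clusters])].append(i)
    return clusters


def refine(clusters):
    # Standard k-means reassignment: move every individual to its nearest mean
    # and repeat until nothing changes. Assumes no cluster becomes empty.
    while True:
        means = [mean(c) for c in clusters]
        new = [[] for _ in clusters]
        for i in sorted(POINTS):
            new[nearest(i, means)].append(i)
        if new == clusters:
            return clusters, means
        clusters = new


initial = sequential_partition()
final, final_means = refine(initial)
print(initial, final, final_means)
```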