
CALCULATION

The document contains information about student mark lists from a notepad and calculations in Excel and relation format. It includes student registration numbers, names, and marks in three subjects. It also contains information about converting data between different formats like numeric to nominal.

Uploaded by

E2-08 Bharath.M
Copyright
© All Rights Reserved

NOTEPAD:

@relation 'student mark list'
@attribute regno numeric
@attribute name {aa, bb, cc, dd, ee}
@attribute mark1 numeric
@attribute mark2 numeric
@attribute mark3 numeric
@data
1, aa, 35, 45, 65
2, bb, 46, 85, 52
3, cc, 65, 49, 63
4, dd, 75, 45, 41
5, ee, 41, 51, 74
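The ARFF file above can also be read without Weka; the following is a minimal parser sketch in plain Python (the helper name parse_arff is our own, and it handles only the subset of the format used here):

```python
# Minimal ARFF reader sketch: collects attribute names and raw data rows.
def parse_arff(text):
    attributes, rows = [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.lower().startswith("@attribute"):
            attributes.append(line.split()[1])   # attribute name
        elif line.lower() == "@data":
            in_data = True                        # everything after is data
        elif in_data:
            rows.append([v.strip() for v in line.split(",")])
    return attributes, rows

arff = """@relation 'student mark list'
@attribute regno numeric
@attribute name {aa, bb, cc, dd, ee}
@attribute mark1 numeric
@attribute mark2 numeric
@attribute mark3 numeric
@data
1, aa, 35, 45, 65
2, bb, 46, 85, 52
"""
attrs, data = parse_arff(arff)
print(attrs)    # ['regno', 'name', 'mark1', 'mark2', 'mark3']
print(data[0])  # ['1', 'aa', '35', '45', '65']
```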

CALCULATION:

Excel: Student Mark List

Reg NO Name Mark1 Mark2 Mark3


1 aa 35 45 65
2 bb 46 85 52
3 cc 65 49 63
4 dd 75 45 41
5 ee 41 51 74

Relation: Student Mark List

No   1: Reg No   2: Name   3: Mark1   4: Mark2   5: Mark3
     Numeric     Nominal   Numeric    Numeric    Numeric
1    1.0         aa        35.0       45.0       65.0
2    2.0         bb        46.0       85.0       52.0
3    3.0         cc        65.0       49.0       63.0
4    4.0         dd        75.0       45.0       41.0
5    5.0         ee        41.0       51.0       74.0

NOTEPAD:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {true, false}
@attribute play {yes, no}
@data
sunny, 85, 85, false, no
sunny, 80, 90, true, no
overcast, 83, 86, false, yes
rainy, 70, 96, false, yes
rainy, 68, 80, false, yes

CALCULATION:
Excel:

Outlook    Temperature   Humidity   Windy   Play
           real          real
sunny      85            85         FALSE   no
sunny      80            90         TRUE    no
rainy      70            96         FALSE   yes
overcast   83            86         FALSE   yes
rainy      68            80         FALSE   yes
CALCULATION:

NUMERIC TO NOMINAL

No 1:Outlook 2:Temperature 3:Humidity 4:Windy 5:Play


Nominal Numeric Numeric Nominal Nominal
1 Sunny 85.0 85.0 FALSE NO
2 Sunny 80.0 90.0 TRUE NO
3 Rainy 70.0 96.0 FALSE YES
4 Overcast 83.0 86.0 FALSE YES
5 Rainy 68.0 80.0 FALSE YES

No 1:Outlook 2:Temperature 3:Humidity 4:Windy 5:Play


Nominal Nominal Nominal Nominal Nominal
1 Sunny “ALL” “ALL” FALSE NO
2 Sunny “ALL” “ALL” TRUE NO
3 Rainy “ALL” “ALL” FALSE YES
4 Overcast “ALL” “ALL” FALSE YES
5 Rainy “ALL” “ALL” FALSE YES
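Weka's numeric-to-nominal step can be imitated with a simple equal-width discretization. In the sketch below the bin labels and cut points are illustrative assumptions (Weka's Discretize filter chooses its own), so the labels need not match the table above exactly:

```python
# Equal-width discretization sketch: split the value range into as many
# equal bins as there are labels, then map each value to its bin's label.
def discretize(values, labels):
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(labels)
    out = []
    for v in values:
        idx = min(int((v - lo) / width), len(labels) - 1) if width else 0
        out.append(labels[idx])
    return out

temps = [85, 80, 70, 83, 68]
bins = discretize(temps, ["cool", "mid", "hot"])
print(bins)  # ['hot', 'hot', 'cool', 'hot', 'cool']
```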

CALCULATION:
No 1:Outlook 2:Temperature 3:Humidity 4:Windy 5:Play
Nominal Nominal Nominal Nominal Nominal
1 Sunny Hot high FALSE no
2 Sunny Hot high TRUE no
3 Rainy Hot high FALSE yes
4 Overcast Mid high FALSE yes
5 Rainy Cool normal FALSE yes

No   1:Outlook   2:Outlook   3:Outlook   4:Temperature   5:Temperature   6:Temperature   7:Humidity   8:Windy   9:Play
     =sunny      =overcast   =rainy      =hot            =mid            =cool           =normal      =false
     Numeric     Numeric     Numeric     Numeric         Numeric         Numeric         Numeric      Numeric   Nominal
1    1.0         0.0         0.0         1.0             0.0             0.0             0.0          1.0       no
2    1.0         0.0         0.0         1.0             0.0             0.0             0.0          0.0       no
3    0.0         0.0         1.0         1.0             0.0             0.0             0.0          1.0       yes
4    0.0         1.0         0.0         0.0             1.0             0.0             0.0          1.0       yes
5    0.0         0.0         1.0         0.0             0.0             1.0             1.0          1.0       yes
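The nominal-to-binary expansion in this table is one-hot encoding: each nominal value gets its own 0.0/1.0 column. A sketch (the function name is our own, analogous in spirit to Weka's NominalToBinary filter):

```python
# One-hot encoding sketch: replace one nominal column by a 0.0/1.0
# indicator column per possible value.
def nominal_to_binary(rows, column, values):
    out = []
    for row in rows:
        encoded = [1.0 if row[column] == v else 0.0 for v in values]
        out.append(row[:column] + encoded + row[column + 1:])
    return out

rows = [["sunny", "hot", "no"], ["rainy", "cool", "yes"]]
binary = nominal_to_binary(rows, 0, ["sunny", "overcast", "rainy"])
print(binary)  # [[1.0, 0.0, 0.0, 'hot', 'no'], [0.0, 0.0, 1.0, 'cool', 'yes']]
```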

Applying Bias Theorem:

Calculation   Old data   Predicted data   Error   (Error)²
3,3           56         77.16            21.16   447.75
3,5           89         69.8             19.2    368.64
5,4           98         76.3             21.7    470.89
7,6           98         71.8             26.2    686.44
∑             341        294.9            88.26   1973.72

Error rate = ∑Error / ∑Predicted data * 100

= 88.26/294.9 * 100

≈ 30%

CALCULATION:

x_i   y_i   x̄ - x_i   ȳ - y_i   (x̄ - x_i)²   (ȳ - y_i)²   (x̄ - x_i)(ȳ - y_i)
3     30    6          24         36            576           144
8     57    1          -3         1             9             3
9     64    0          -10        0             100           0
13    72    -4         -18        16            324           72
3     30    6          24         36            576           144
6     43    3          11         9             121           33
11    50    -2         4          4             16            8
21    90    -12        -35        144           1225          420
1     20    8          34         64            1156          272
16    83    -7         -29        49            841           203

Means: x̄ = 9.1, ȳ = 53.9 (deviations rounded to the nearest integer); sums: ∑(x̄ - x_i)² = 359, ∑(ȳ - y_i)² = 4944, ∑(x̄ - x_i)(ȳ - y_i) = 1299

b1 = ∑[(x_i - x̄)(y_i - ȳ)] / ∑[(x_i - x̄)²]

= 1299/359

= 3.618

b0 = ȳ - b1 * x̄

= 53.9 - 3.618 * 9.1

= 21

Linear regression (prediction at x = 10):

ŷ = b0 + b1 * x

= 21 + 3.618 * 10

= 57.18

Co-efficient:

r = (1/N) * ∑(x_i - x̄)(y_i - ȳ) / (σ_x * σ_y)

= (1/10) * 1299 / (6.315 * 23.605)

= 0.8714 ≈ 0.9
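The regression and the coefficient can be checked mechanically. Note that the hand calculation rounds each deviation to an integer, so exact arithmetic gives slightly different values (b1 ≈ 3.59 rather than 3.618); likewise, computing r with one consistent convention throughout gives ≈ 0.96, while the 0.8714 above comes from combining a 1/N factor with sample (n-1) standard deviations:

```python
# Least-squares fit and Pearson correlation for the x/y table above,
# using the raw values rather than rounded deviations.
import math

x = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
y = [30, 57, 64, 72, 30, 43, 50, 90, 20, 83]
n = len(x)
mx, my = sum(x) / n, sum(y) / n                     # means 9.1 and 53.9
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
b1 = sxy / sxx                                      # slope
b0 = my - b1 * mx                                   # intercept
yhat = b0 + b1 * 10                                 # prediction at x = 10
r = sxy / math.sqrt(sxx * syy)                      # Pearson correlation
print(round(b1, 3), round(yhat, 2), round(r, 3))    # 3.592 57.13 0.961
```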

CALCULATION:

RID Class Distance to New


1 No (1+0+0+1)/4=0.5
2 No (1+0+0+0)/4=0.25
3 Yes (0+0+0+1)/4=0.25
4 Yes (0+2+0+1)/4=0.75
5 Yes (0+0+1+1)/4=0.5
6 No (0+0+1+0)/4=0.25
7 Yes (0+0+1+0)/4=0.25
8 No (1+2+0+1)/4=1
9 Yes (1+0+1+1)/4=0.75
10 Yes (0+2+1+1)/4=1
11 Yes (1+2+1+0)/4=1
12 Yes (0+2+0+0)/4=0.5
13 Yes (0+0+1+1)/4=0.5
14 No (0+2+0+0)/4=0.5
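The distances above appear to be weighted attribute mismatches against the new record, averaged over the four attributes (with a temperature mismatch counting 2, judging from the table). A sketch under that reading, with an illustrative record:

```python
# Normalized mismatch distance: an attribute contributes its weight when it
# differs from the new record; weights [1, 2, 1, 1] mirror the table, where
# a temperature mismatch contributes 2. (This interpretation is assumed.)
def distance(record, new, weights):
    return sum(w for a, b, w in zip(record, new, weights) if a != b) / len(record)

new = ["sunny", "hot", "high", "false"]
rec = ["rainy", "hot", "high", "true"]    # outlook and windy differ
d = distance(rec, new, [1, 2, 1, 1])
print(d)  # (1 + 0 + 0 + 1)/4 = 0.5
```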

CALCULATION:
First, check which attribute provides the highest Information Gain, in order to split the training set based on that attribute. We need to calculate the expected information required to classify a sample, and the entropy of each attribute. The information gain is this mutual information minus the entropy. The mutual information of the two classes is:

I(SYes, SNo) = I(9,5) = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.94

For Age we have three values: age <= 30 (2 yes and 3 no), age 31..40 (4 yes and 0 no) and age > 40 (3 yes and 2 no).

Entropy(age) = 5/14 (-2/5 log2(2/5) - 3/5 log2(3/5)) + 4/14 (0) + 5/14 (-3/5 log2(3/5) - 2/5 log2(2/5))

= 5/14 (0.9709) + 0 + 5/14 (0.9709)
= 0.6935

Gain(age) = 0.94 - 0.6935 = 0.2465

For Income we have three values: income high (2 yes and 2 no), income medium (4 yes and 2 no) and income low (3 yes and 1 no).

Entropy(income) = 4/14 (-2/4 log2(2/4) - 2/4 log2(2/4)) + 6/14 (-4/6 log2(4/6) - 2/6 log2(2/6))

+ 4/14 (-3/4 log2(3/4) - 1/4 log2(1/4))

= 4/14 (1) + 6/14 (0.918) + 4/14 (0.8113)

= 0.285714 + 0.393428 + 0.231714 = 0.9108

Gain(income) = 0.94 - 0.9108 = 0.0292

For Student we have two values: student yes (6 yes and 1 no) and student no (3 yes and 4 no).

Entropy(student) = 7/14 (-6/7 log2(6/7) - 1/7 log2(1/7)) + 7/14 (-3/7 log2(3/7) - 4/7 log2(4/7))

= 7/14 (0.5916) + 7/14 (0.9852)

= 0.2958 + 0.4926 = 0.7884

Gain(student) = 0.94 - 0.7884 = 0.1516

For Credit Rating we have two values: credit_rating fair (6 yes and 2 no) and credit_rating excellent (3 yes and 3 no).

Entropy(credit_rating) = 8/14 (-6/8 log2(6/8) - 2/8 log2(2/8)) + 6/14 (-3/6 log2(3/6) - 3/6 log2(3/6))

= 8/14 (0.8112) + 6/14 (1)

= 0.4635 + 0.4285 = 0.8920

Gain(credit_rating) = 0.94 - 0.8920 = 0.048
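These gains can be verified mechanically. The sketch below assumes the standard 14-record buys-computer table that the class counts above correspond to (9 yes / 5 no, with the per-attribute splits quoted in the text); the tiny differences from the hand values come from carrying 0.94 instead of 0.9403:

```python
# Information gain sketch: Gain(A) = I(class) - Entropy(A).
import math

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def gain(rows, col):
    total = entropy([r[-1] for r in rows])
    rem = 0.0
    for v in set(r[col] for r in rows):
        part = [r[-1] for r in rows if r[col] == v]
        rem += len(part) / len(rows) * entropy(part)
    return total - rem

# age, income, student, credit_rating, class (assumed standard table)
D = [("<=30", "high", "no", "fair", "no"),
     ("<=30", "high", "no", "excellent", "no"),
     ("31..40", "high", "no", "fair", "yes"),
     (">40", "medium", "no", "fair", "yes"),
     (">40", "low", "yes", "fair", "yes"),
     (">40", "low", "yes", "excellent", "no"),
     ("31..40", "low", "yes", "excellent", "yes"),
     ("<=30", "medium", "no", "fair", "no"),
     ("<=30", "low", "yes", "fair", "yes"),
     (">40", "medium", "yes", "fair", "yes"),
     ("<=30", "medium", "yes", "excellent", "yes"),
     ("31..40", "medium", "no", "excellent", "yes"),
     ("31..40", "high", "yes", "fair", "yes"),
     (">40", "medium", "no", "excellent", "no")]

for i, name in enumerate(["age", "income", "student", "credit_rating"]):
    print(name, round(gain(D, i), 3))
# age 0.247, income 0.029, student 0.152, credit_rating 0.048
```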

Since Age has the highest Information Gain, we start splitting the dataset using the age attribute.

Since all records under the branch age 31..40 are of class Yes, we can replace that branch with a leaf labelled Class = Yes.

The same process of splitting has to happen for the two remaining branches.

For the branch age <= 30 we still have the attributes income, student and credit_rating. Which one should be used

to split the partition?

The mutual information is I(SYes,SNo)=I(2,3)= -2/5 log2(2/5)–3/5 log2(3/5)=0.97.

For Income we have three values: income high (0 yes and 2 no), income medium (1 yes and 1 no) and income low (1 yes and 0 no).

Entropy(income) = 2/5 (0) + 2/5 (-1/2 log2(1/2) - 1/2 log2(1/2)) + 1/5 (0)

= 2/5 (1) = 0.4

Gain(income) = 0.97 - 0.4 = 0.57

For Student we have two values: student yes (2 yes and 0 no) and student no (0 yes and 3 no).

Entropy(student) = 2/5 (0) + 3/5 (0) = 0

Gain(student) = 0.97 - 0 = 0.97

We can then safely split on the attribute student without checking the other attributes, since the information gain is maximized.

Since both branches now contain records of a single class, we make them into leaf nodes with their respective class as label.

Again, the same process is needed for the other branch of age.

The mutual information is I(SYes, SNo) = I(3,2) = -3/5 log2(3/5) - 2/5 log2(2/5) = 0.97.

For Income we have two values: income medium (2 yes and 1 no) and income low (1 yes and 1 no).

Entropy(income) = 3/5 (-2/3 log2(2/3) - 1/3 log2(1/3)) + 2/5 (-1/2 log2(1/2) - 1/2 log2(1/2))

= 3/5 (0.9182) + 2/5 (1) = 0.55 + 0.4 = 0.95

Gain(income) = 0.97 – 0.95 = 0.02

For Student we have two values: student yes (2 yes and 1 no) and student no (1 yes and 1 no).

Entropy(student) = 3/5 (-2/3 log2(2/3) - 1/3 log2(1/3)) + 2/5 (-1/2 log2(1/2) - 1/2 log2(1/2))

= 0.95

Gain (student) = 0.97 – 0.95 = 0.02

For Credit Rating we have two values: credit_rating fair (3 yes and 0 no) and credit_rating excellent (0 yes and 2 no).

Entropy (credit rating) = 0

Gain (credit rating) = 0.97 – 0 = 0.97

We then split based on credit_rating. These splits give partitions in which every record belongs to the same class, so we just need to make them leaf nodes with their class label attached:
CALCULATION:
This data set is to be grouped into two clusters. As a first step in finding a sensible initial partition, let the A & B values of the two individuals furthest apart (using the Euclidean distance measure) define the initial cluster means, giving:

The remaining individuals are now examined in sequence and allocated to the cluster to which they
are closest, in terms of Euclidean distance to the cluster mean. The mean vector is recalculated
each time a new member is added. This leads to the following series of steps:
m1 = ((1.0 + 1.5)/2, (1.0 + 2.0)/2) = (1.25, 1.5)
m2 = ((3.0 + 3.5 + 4.5 + 3.5 + 5.0)/5, (4.0 + 7.0 + 5.0 + 5.0 + 4.5)/5) = (3.9, 5.1)
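The whole allocation can be sketched end-to-end. The seven (A, B) pairs are not all listed in the text, so the values below are assumed from the standard worked example whose final means (1.25, 1.5) and (3.9, 5.1) match the calculation above:

```python
# Sequential k-means sketch: seed with the two individuals furthest apart,
# allocate the rest one at a time (recomputing the receiving cluster's mean
# after each addition), then relocate until membership is stable.
def mean(pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]   # assumed data set
seeds = (points[0], points[3])                  # the two furthest apart
clusters = [[seeds[0]], [seeds[1]]]
means = [seeds[0], seeds[1]]

for p in points:
    if p in seeds:
        continue
    k = 0 if dist2(p, means[0]) < dist2(p, means[1]) else 1
    clusters[k].append(p)
    means[k] = mean(clusters[k])                # update mean immediately

moved = True
while moved:                                    # relocation passes
    moved = False
    for p in points:
        src = 0 if p in clusters[0] else 1
        dst = 0 if dist2(p, means[0]) < dist2(p, means[1]) else 1
        if dst != src:
            clusters[src].remove(p)
            clusters[dst].append(p)
            means = [mean(c) for c in clusters]
            moved = True

print(means)  # [(1.25, 1.5), (3.9, 5.1)]
```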
