
DSBDA - Solved Numeric PYQ

The document discusses various machine learning concepts, including Naive Bayes classification for spam detection, K-means clustering, and evaluation metrics such as accuracy, precision, recall, and error rate in the context of heart attack and diabetes risk prediction. It provides examples of calculations for probabilities, distances, and confusion matrices. Additionally, it includes support and confidence calculations for itemsets in a transactional dataset.

Uploaded by

practicalcodes04
Unit 4 - 3M

Suppose you are given a dataset containing information about whether emails are spam or not spam, along with two features:
i. the presence of the word "Offer" (1: present, 0: absent)
ii. the presence of the word "Free" (1: present, 0: absent)
You are tasked with classifying a new email with the following feature values: "Offer" = 1, "Free" = 1.

Given the training dataset:

Email   Offer   Free   Spam
1       ?       ?      No
2       ?       1      Yes
3       1       1      Yes
4       ?       ?      No
5       ?       1      Yes

(? = value not legible in the source; the per-class counts used in the solution are stated in step iii.)

Calculate the probability that the new email is spam using Naive Bayes.

i. Given:
Total emails: 5
Spam = Yes: 3
Spam = No: 2

ii. Prior probabilities:
P(Spam) = 3/5 = 0.6
P(Not Spam) = 2/5 = 0.4


Study material provided by: Vishwajeet Londhe

iii. Calculate likelihoods (conditional probabilities):

a. For Spam = Yes (3 emails):
- "Offer" = 1 occurs in 2 of the 3 emails.
- "Free" = 1 occurs in 3 of the 3 emails (2, 3, 5).
P(Offer = 1 | Spam = Yes) = 2/3
P(Free = 1 | Spam = Yes) = 3/3 = 1

b. For Spam = No (2 emails):
- "Offer" = 1 occurs in 1 out of 2 emails.
- "Free" = 1 occurs in 1 out of 2 emails.
P(Offer = 1 | Spam = No) = 1/2
P(Free = 1 | Spam = No) = 1/2

iv. Apply Naive Bayes:
a. P(Spam = Yes | Offer = 1, Free = 1) ∝ P(Spam) × P(Offer = 1 | Spam = Yes) × P(Free = 1 | Spam = Yes)
   = 3/5 × 2/3 × 1 = 2/5 = 0.4
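b. P(Spam = No | Offer = 1, Free = 1) ∝ P(Not Spam) × P(Offer = 1 | Spam = No) × P(Free = 1 | Spam = No)
   = 2/5 × 1/2 × 1/2 = 1/10 = 0.1

v. Normalize:
Normalizing constant = 0.4 + 0.1 = 0.5
Divide each score by the constant to get the final probability:
P(Spam) = 0.4 / 0.5 = 0.8
P(Not Spam) = 0.1 / 0.5 = 0.2

Ans:
The probability that the new email with Offer = 1 and Free = 1 is spam is 0.8 (80%).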
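
The steps above (priors, likelihoods, product, normalization) can be checked with a short script. This is a minimal sketch that starts from the per-class counts quoted in the solution rather than from the raw table; the names `class_counts`, `feature_counts` and `posterior` are illustrative, it only handles features whose value is 1, and it applies no smoothing.

```python
from fractions import Fraction

# Counts taken from the worked solution (assumed, not re-derived from the table).
n_total = 5
class_counts = {"yes": 3, "no": 2}
# feature_counts[label][feature] = emails of that class in which the feature is 1
feature_counts = {
    "yes": {"offer": 2, "free": 3},
    "no": {"offer": 1, "free": 1},
}

def posterior(present_features):
    """Return normalized class probabilities given features that are present (= 1)."""
    scores = {}
    for label, n_label in class_counts.items():
        score = Fraction(n_label, n_total)  # prior
        for name in present_features:
            # likelihood of the feature being 1 within this class
            score *= Fraction(feature_counts[label][name], n_label)
        scores[label] = score
    total = sum(scores.values())  # normalizing constant
    return {label: s / total for label, s in scores.items()}

result = posterior(["offer", "free"])
print(float(result["yes"]), float(result["no"]))
```

Using `Fraction` keeps the arithmetic exact, so the intermediate scores 2/5 and 1/10 match the hand calculation.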
9M

Suppose you have the following dataset containing the coordinates of points in a 2-dimensional space. Perform K-means clustering. Assume initial centroids C1 = (2, 3) and C2 = (8, 6).

Point   X-coordinate   Y-coordinate
A       2              3
B       4              7
C       3              5
D       6              9
E       8              6
F       7              8

i. Initial centroids: C1 = (2, 3), C2 = (8, 6)

Iteration 1: (Assign each point to the nearest centroid)

Point   Coordinates   Distance to C1 (2, 3)   Distance to C2 (8, 6)   Cluster
A       (2, 3)        0.00                    6.71                    C1
B       (4, 7)        4.47                    4.12                    C2
C       (3, 5)        2.24                    5.10                    C1
D       (6, 9)        7.21                    3.61                    C2
E       (8, 6)        6.71                    0.00                    C2
F       (7, 8)        7.07                    2.24                    C2

Find the distances using the Euclidean distance:
d = sqrt((x2 - x1)^2 + (y2 - y1)^2)

e.g. point A (2, 3) and C1 (2, 3):
distance = sqrt((2 - 2)^2 + (3 - 3)^2) = 0.00
point A (2, 3) and C2 (8, 6):
d = sqrt((8 - 2)^2 + (6 - 3)^2) = sqrt(45) = 6.71
Recompute centroids:
Cluster 1 (C1): Points A (2, 3), C (3, 5)
New centroid = ((2 + 3)/2, (3 + 5)/2) = (2.5, 4)

Cluster 2 (C2): Points B (4, 7), D (6, 9), E (8, 6), F (7, 8)
New centroid = ((4 + 6 + 8 + 7)/4, (7 + 9 + 6 + 8)/4) = (6.25, 7.5)

Iteration 2:

Point   Coordinates   Distance to C1 (2.5, 4)   Distance to C2 (6.25, 7.5)   New cluster
A       (2, 3)        1.12                      6.19                         1
B       (4, 7)        3.35                      2.30                         2
C       (3, 5)        1.12                      4.10                         1
D       (6, 9)        6.10                      1.52                         2
E       (8, 6)        5.85                      2.30                         2
F       (7, 8)        6.02                      0.90                         2

The assignments are unchanged from iteration 1, so the centroids remain the same: convergence is reached.

Final clusters and centroids:
Cluster 1 centroid (C1): (2.5, 4)
Cluster 2 centroid (C2): (6.25, 7.5)
with assignments:
Cluster 1: A, C
Cluster 2: B, D, E, F
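
The assign-then-recompute loop can be sketched in a few lines of Python to check the hand calculation. This is an illustrative sketch, not a library implementation: the function name `kmeans` is hypothetical, the initial centroids (2, 3) and (8, 6) are the ones implied by the distance table, and ties are broken in favour of the lower-numbered cluster.

```python
import math

points = {"A": (2, 3), "B": (4, 7), "C": (3, 5),
          "D": (6, 9), "E": (8, 6), "F": (7, 8)}

def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means: repeat assignment and centroid update until assignments stop changing."""
    centroids = list(initial_centroids)
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid by Euclidean distance.
        new_assignment = {
            name: min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [points[n] for n, c in assignment.items() if c == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return assignment, centroids

assignment, centroids = kmeans(points, [(2, 3), (8, 6)])
print(assignment)
print(centroids)
```

Cluster indices are 0-based here, so index 0 corresponds to Cluster 1 and index 1 to Cluster 2.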
Unit 4 - 5M

Calculate the Support and Confidence values for all the possible itemsets.

Transaction ID   Items Bought
1                Onion, Potato, Cold drink
2                Onion, Eggs, Cold drink
3                Onion, Burger, Cold drink
4                Potato, Milk, Eggs
5                Potato, Burger, Cold drink, Milk, Eggs

i. List all items and count (1-itemset support count)

Support = Count / Total transactions

Item         Count   Support
Onion        3       3/5 = 0.6 = 60%
Potato       3       3/5 = 0.6 = 60%
Cold drink   4       4/5 = 0.8 = 80%
Burger       2       2/5 = 0.4 = 40%
Milk         2       2/5 = 0.4 = 40%
Eggs         3       3/5 = 0.6 = 60%
2-itemset Support Count

Itemset                Count   Support
{Onion, Potato}        1       20%
{Onion, Cold drink}    3       60%
{Onion, Burger}        1       20%
{Onion, Eggs}          1       20%
{Potato, Cold drink}   2       40%
{Potato, Burger}       1       20%
{Potato, Eggs}         2       40%
{Potato, Milk}         2       40%
{Cold drink, Burger}   2       40%
{Cold drink, Eggs}     2       40%
{Cold drink, Milk}     1       20%
{Burger, Eggs}         1       20%
{Burger, Milk}         1       20%
{Milk, Eggs}           2       40%
ii. Confidence calculations:

Formula:
Confidence(A → B) = Support(A ∪ B) / Support(A)

Rule                    Support(A ∪ B)   Support(A)   Confidence
Onion → Cold drink      3                3            100%
Cold drink → Onion      3                4            75%
Potato → Cold drink     2                3            66.67%
Cold drink → Potato     2                4            50%
Potato → Eggs           2                3            66.67%
Eggs → Potato           2                3            66.67%
Potato → Milk           2                3            66.67%
Milk → Potato           2                2            100%
Cold drink → Burger     2                4            50%
Burger → Cold drink     2                2            100%
Cold drink → Eggs       2                4            50%
Eggs → Cold drink       2                3            66.67%
Milk → Eggs             2                2            100%
Eggs → Milk             2                3            66.67%
Onion → Potato          1                3            33.33%
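
The support and confidence formulas can be applied mechanically to the transactions. The sketch below assumes the five transactions are as listed in the code (the second item of transaction 2 is inferred from the item counts, since it is not legible in the source); the helper names `support_count`, `support` and `confidence` are illustrative.

```python
from itertools import combinations

# Assumed transaction data; transaction 2's "Eggs" is inferred from the counts.
transactions = [
    {"Onion", "Potato", "Cold drink"},
    {"Onion", "Eggs", "Cold drink"},
    {"Onion", "Burger", "Cold drink"},
    {"Potato", "Milk", "Eggs"},
    {"Potato", "Burger", "Cold drink", "Milk", "Eggs"},
]

def support_count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def support(itemset):
    """Support = count / total number of transactions."""
    return support_count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Confidence(A -> B) = Support(A U B) / Support(A)."""
    both = set(antecedent) | set(consequent)
    return support_count(both) / support_count(antecedent)

# Print every 2-itemset that occurs at least once.
items = sorted(set().union(*transactions))
for a, b in combinations(items, 2):
    n = support_count({a, b})
    if n:
        print(a, "+", b, "count =", n, "support =", support({a, b}))

onion_to_cold = confidence({"Onion"}, {"Cold drink"})
cold_to_onion = confidence({"Cold drink"}, {"Onion"})
print(onion_to_cold, cold_to_onion)
```

Note that confidence is not symmetric: Onion → Cold drink and Cold drink → Onion share the same joint count but divide by different antecedent counts.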
Unit 5 - 9M

Suppose that the data mining task is to cluster points (with (x, y) representing location) into three clusters, where the points are:
A1(2, 10), A2(2, 5), A3(8, 4),
B1(5, 8), B2(7, 5), B3(6, 4),
C1(1, 2), C2(4, 9)
The distance function is Euclidean distance. Suppose initially we assign A1, B1, and C1 as the center of each cluster, respectively. Use the K-means algorithm to show only the three cluster centers after the first round of execution, with steps.

Initial clusters (centroids):
i. Cluster 1 (C1): A1 (2, 10)
ii. Cluster 2 (C2): B1 (5, 8)
iii. Cluster 3 (C3): C1 (1, 2)

Iteration 1:

Point   Coordinates   Distance to C1 (2, 10)   Distance to C2 (5, 8)   Distance to C3 (1, 2)   Cluster
A1      (2, 10)       0.00                     3.61                    8.06                    C1
A2      (2, 5)        5.00                     4.24                    3.16                    C3
A3      (8, 4)        8.49                     5.00                    7.28                    C2
B1      (5, 8)        3.61                     0.00                    7.21                    C2
B2      (7, 5)        7.07                     3.61                    6.71                    C2
B3      (6, 4)        7.21                     4.12                    5.39                    C2
C1      (1, 2)        8.06                     7.21                    0.00                    C3
C2      (4, 9)        2.24                     1.41                    7.62                    C2
Recompute centroids:

Cluster 1 (C1): A1 (2, 10)
New centroid = (2, 10)

Cluster 2 (C2): A3 (8, 4), B1 (5, 8), B2 (7, 5), B3 (6, 4), C2 (4, 9)
New centroid = ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5) = (6, 6)

Cluster 3 (C3): A2 (2, 5), C1 (1, 2)
New centroid = ((2 + 1)/2, (5 + 2)/2) = (1.5, 3.5)

After the first round of execution, the 3 cluster centers are:
C1 (2, 10)
C2 (6, 6)
C3 (1.5, 3.5)
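
Since the question asks only for the centers after the first round, a single assignment-and-update pass is enough to verify the result. The following is a small sketch under that assumption; `one_round` is a hypothetical helper, and clusters are keyed 1 to 3 to avoid clashing with the point names C1 and C2.

```python
import math
from collections import defaultdict

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}
# Initial centers: A1, B1 and C1, keyed by cluster number.
centers = {1: points["A1"], 2: points["B1"], 3: points["C1"]}

def one_round(points, centers):
    """One K-means round: assign each point to its nearest center, then average each cluster."""
    clusters = defaultdict(list)
    for name, p in points.items():
        nearest = min(centers, key=lambda c: math.dist(p, centers[c]))
        clusters[nearest].append(name)
    new_centers = {}
    for c, old in centers.items():
        members = [points[n] for n in clusters[c]]
        if members:
            xs, ys = zip(*members)
            new_centers[c] = (sum(xs) / len(xs), sum(ys) / len(ys))
        else:
            new_centers[c] = old  # an empty cluster keeps its previous center
    return dict(clusters), new_centers

clusters, new_centers = one_round(points, centers)
print(clusters)
print(new_centers)
```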
Given the confusion matrix, calculate Accuracy, Precision, Recall, and Error Rate, with description, on heart attack risk.

                                     Predicted classes
                                     Heart Attack Risk - Yes   Heart Attack Risk - No
Actual    Heart Attack Risk - Yes    80                        220
classes   Heart Attack Risk - No     150                       9500

Confusion Matrix

Actual \ Predicted   Yes (Risk)   No (Risk)    Total
Yes (Risk)           80 (TP)      220 (FN)     300
No (Risk)            150 (FP)     9500 (TN)    9650
Total                230          9720         9950

- TP: 80 (Correctly predicted "Yes" for heart attack risk)
- FN: 220 (Incorrectly predicted "No" when actual was "Yes")
- FP: 150 (Incorrectly predicted "Yes" when actual was "No")
- TN: 9500 (Correctly predicted "No" for no risk)
i. Accuracy: Measures the overall correctness of the model.

Accuracy = (TP + TN) / Total = (80 + 9500) / 9950 = 9580 / 9950 = 0.9628 = 96.28%

The model is 96.28% correct across both classes.

ii. Precision: Measures how many predicted "Yes" cases were actually correct.

Precision = TP / (TP + FP) = 80 / (80 + 150) = 80 / 230 = 0.3478 = 34.78%

Only 34.78% of predicted "Yes" cases truly had heart attack risk.

iii. Recall: Measures how many actual "Yes" cases were correctly predicted.

Recall = TP / (TP + FN) = 80 / (80 + 220) = 80 / 300 = 0.2667 = 26.67%

The model captures only 26.67% of actual heart attack cases.

iv. Error Rate: Measures overall incorrect predictions.

Error Rate = 1 - Accuracy = 1 - 0.9628 = 0.0372 = 3.72%

3.72% of predictions are wrong.
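
The four metrics follow directly from the four cells of the confusion matrix, so they are easy to compute in one helper. This is a minimal sketch; `classification_metrics` is an illustrative name, and it assumes there is at least one predicted "Yes" and one actual "Yes" so that no division by zero occurs.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and error rate from the confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "precision": tp / (tp + fp),  # share of predicted "Yes" that are correct
        "recall": tp / (tp + fn),     # share of actual "Yes" that are found
        "error_rate": 1 - accuracy,
    }

# Heart attack risk matrix from the problem above.
heart = classification_metrics(tp=80, fn=220, fp=150, tn=9500)
for name, value in heart.items():
    print(f"{name}: {value:.2%}")
```

The gap between the high accuracy and the low recall comes from class imbalance: 9650 of the 9950 cases are actual "No", so correct "No" predictions dominate the accuracy figure.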


Unit 5 - 3M

Given the confusion matrix, calculate Accuracy, Precision, Recall, and Error Rate, with description, on Diabetic Risk.

                                 Predicted classes
                                 Diabetic Risk (Yes)   Diabetic Risk (No)
Actual    Diabetic Risk (Yes)    90                    210
classes   Diabetic Risk (No)     140                   9560

Confusion Matrix

Actual \ Predicted   Yes (Risk)   No (Risk)    Total
Yes (Risk)           90 (TP)      210 (FN)     300
No (Risk)            140 (FP)     9560 (TN)    9700
Total                230          9770         10,000

i. Accuracy = (TP + TN) / Total = (90 + 9560) / 10,000 = 0.965 = 96.5%

ii. Precision = TP / (TP + FP) = 90 / (90 + 140) = 90 / 230 = 0.3913 = 39.13%

iii. Recall = TP / (TP + FN) = 90 / (90 + 210) = 90 / 300 = 0.30 = 30%

iv. Error Rate = 1 - Accuracy = 1 - 0.965 = 0.035 = 3.5%
