Data Mining
(Elective II) (CT72502)
QUESTION COLLECTION
1. What are the fundamental differences between Data Mining and Data Warehousing?
Describe the steps of KDD for data mining. [3+7]
2. What do you mean by dimensional data? What are base & apex cuboid? Slicing & Dicing?
3. How do you measure the accuracy of classifiers? How do you select best root attribute in
decision tree? Explain. [4+6]
4. What are prior and posterior probabilities? Explain the algorithmic steps of Bayesian
classifier and write its strengths. [3+7]
5. For the transactions given below, consider confidence = 60% and minimum support = 30%.
Identify large itemsets (L-Itemset) at L=3 with possible associations using A-priori
algorithm and generate F-List using FP-Growth algorithm. [12]
6. How does the DBSCAN algorithm work? How do we avoid the issues of DBSCAN? [8+2]
7. Explain web mining taxonomy. [8]
8. Write short notes on: (Any three) [3+3+3]
a. Data smoothing techniques
b. Clustering and its application in anomaly detection
c. AprioriAll: Sequential pattern mining algorithm
26D TRIBHUVAN UNIVERSITY Exam.
INSTITUTE OF ENGINEERING Level BE Full Marks 80
Examination Control Division Programme BCT, BEX Pass Marks
2075 Ashwin Year / Part IV/I Time 3 hrs.
Subject: Data Mining (Elective I) (CT72502)
Candidates are required to give their answers in their own words as far as practicable.
Attempt All questions.
The figures in the margin indicate Full Marks.
Assume suitable data if necessary.
6. What are the advantages of the FP-growth method? Explain the FP-growth algorithm. [2+6]
7. Explain K-means clustering with its limitations. Generate two clusters from the following dataset
using K-means clustering. [4+6]
A B
I )
2.5 4.5
4 6
3.5 .ti
4 5.5
J 6
8. What are outliers? Explain an algorithm that can be used to generate density based
clusters. [8]
9. Why is anomaly detection important? Explain the distance based method for anomaly
detection. [2+6]
                        Predicted Class
                        Class 1   Class 2
Actual Class  Class 1     25         9
              Class 2      4        31
Calculate:
a) Accuracy b) Sensitivity
c) Specificity d) Precision
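A minimal sketch of how the four measures follow from the confusion matrix above. It assumes that rows are actual classes and that Class 1 is treated as the positive class; the function name `binary_metrics` is illustrative and not part of the question paper.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return accuracy, sensitivity, specificity and precision for a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # correct predictions over all predictions
        "sensitivity": tp / (tp + fn),      # true positive rate (recall)
        "specificity": tn / (tn + fp),      # true negative rate
        "precision": tp / (tp + fp),        # correct share of predicted positives
    }

# Assumption: Class 1 is the positive class.
# Actual Class 1 row -> (TP=25, FN=9); actual Class 2 row -> (FP=4, TN=31).
metrics = binary_metrics(tp=25, fn=9, fp=4, tn=31)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Treating Class 2 as the positive class instead swaps sensitivity with specificity and changes precision, so the choice of positive class should be stated in an answer.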
5. Identify the candidate, frequent item sets and association rules for the following
transaction data using Apriori algorithm. [8]
TID ITEMS
1 M1, M2, M5
2 M2, M4
3 M2, M3
4 M1, M2, M4
5 M1, M3
6 M2, M3
7 M1, M3
8 M1, M2, M3, M5
9 M1, M2, M3
Take minimum support = 20%, minimum confidence = 80%
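The level-wise search asked for here can be sketched as follows. This is an illustrative implementation, not a model answer: it assumes the minimum support reads as 20% (at least 2 of the 9 transactions) and the minimum confidence as 80%, and the function names `apriori` and `association_rules` are hypothetical.

```python
from itertools import combinations

# Transactions from the table (TID 1 to 9).
transactions = [
    {"M1", "M2", "M5"}, {"M2", "M4"}, {"M2", "M3"},
    {"M1", "M2", "M4"}, {"M1", "M3"}, {"M2", "M3"},
    {"M1", "M3"}, {"M1", "M2", "M3", "M5"}, {"M1", "M2", "M3"},
]

def apriori(transactions, min_support):
    """Return {frozenset: support count} for every frequent itemset."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those that meet the support threshold.
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count / n >= min_support:
                level[c] = count
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

frequent = apriori(transactions, min_support=0.20)   # assumed 20% threshold
rules = association_rules(frequent, min_confidence=0.80)
for a, c, conf in sorted(rules, key=lambda r: (len(r[0]), sorted(r[0]), sorted(r[1]))):
    print(sorted(a), "->", sorted(c), f"confidence={conf:.2f}")
```

Under these assumptions the largest frequent itemsets are {M1, M2, M3} and {M1, M2, M5}, each with a support count of 2.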
6. Explain FP-Growth algorithm with example. [8]
7. Write K-means algorithm and find clusters for following data set. [2+8]
Instance X Y
1 1.0 2.0
2 2.5 1.0
3 3.5 1.5
4 4.0 1.0
5 3.5 2.5
6 5.0 3.0
(Take K = 2)
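A short sketch of the K-means iteration on the six instances above. The question does not fix the starting centroids, so this sketch assumes Instances 1 and 2 as the initial centroids; the function name `kmeans` and its parameters are illustrative.

```python
import math

# (X, Y) values of Instances 1 to 6 from the table.
points = [(1.0, 2.0), (2.5, 1.0), (3.5, 1.5), (4.0, 1.0), (3.5, 2.5), (5.0, 3.0)]

def kmeans(points, centroids, max_iter=100):
    """Alternate the assignment and update steps until assignments stop changing."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid by Euclidean distance.
        new_labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable
        labels = new_labels
        # Update step: mean of the members; an empty cluster keeps its old centroid.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Assumption: Instances 1 and 2 serve as the initial centroids (K = 2).
labels, centroids = kmeans(points, centroids=[points[0], points[1]])
print(labels)
print(centroids)
```

With these starting centroids the run converges with Instance 1 alone in one cluster and Instances 2 to 6 in the other, with centroids near (1.0, 2.0) and (3.7, 1.8). A different initial choice can produce a different partition, which is one of the limitations usually discussed for K-means.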
8. What is web mining? Explain different categories of web mining. [6]
9. List the various types of partition based clustering methods. Explain Hierarchical
clustering method with an example. [10]
10. Write short notes on: (Any two) [2x4]
a) OLAP Operations
b) Density reachable and Density Connected
c) Data Mining for Anomaly Detection
***
TRIBHUVAN UNIVERSITY
INSTITUTE OF ENGINEERING
Examination Control Division
2074 Chaitra
Candidates are required to give their answers in their own words as far as practicable.
Attempt All questions.
The figures in the margin indicate Full Marks.
Assume suitable data if necessary.
1. What is data warehouse and data mart? Describe Snowflake schema with example. [2+4]
2. What are the approaches to handle missing data? Describe OLAP and operations on
OLAP with suitable example. Differentiate between OLAP and OLTP. [2+5+3]
3. Draw clear block diagram depicting different stages in classification. Explain the inverse
relation between precision and recall. Given the confusion matrix, determine accuracy,
sensitivity and precision of the classifier model. [2+3+5]
                   Predicted
                   Positive   Negative
Actual  Positive     142         40
        Negative      98        720
4. Explain decision tree with the concept of Naive Bayes classification with appropriate
example. [10]
5. Why is association analysis required in data mining? Explain the Apriori principle with
example. [2+6]
6. How does the FP-growth approach overcome the disadvantages of the Apriori algorithm? For the
transaction data given in table generate FP-Tree. [2+8]
7. Describe the difference between Hierarchical and partitioning clustering. How is K-means
clustering applied? Verify using example. [2+8]
8. What do you mean by anomaly detection and why is it important? Describe distance
based approaches for anomaly detection. [4+3]
9. Write short notes on: (any three) [3x3]
i) Issues in clustering
ii) Multimedia mining
iii) Time series data mining
iv) Web mining
378 TRIBHUVAN UNIVERSITY Exam. New Back (2066 & Later Batch)
INSTITUTE OF ENGINEERING Level BE Full Marks 80
Examination Control Division Programme BE, BCT Pass Marks 32
2073 Shrawan Year / Part IV/I Time 3 hrs.
Subject: Data Mining (Elective II) (CT72502)
Candidates are required to give their answers in their own words as far as practicable.
Attempt All questions.
The figures in the margin indicate Full Marks.
Assume suitable data if necessary.
1. "The world is data rich but information is poor". Justify with your own words tg]
2. What are the measuring elements of data Quality? Explain different data transformation
by normalization methods with an example p+6j
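Three normalization methods commonly given in answer to this question are min-max, z-score and decimal scaling. The sketch below applies each to a small list of values; the values and function names are hypothetical and chosen only for illustration.

```python
import statistics

def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]; needs at least two distinct values."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Centre on the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def decimal_scaling(values):
    """Divide by the smallest power of 10 that brings every |value| below 1."""
    j = 0
    while max(abs(v) for v in values) / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

incomes = [200, 300, 400, 600, 1000]  # hypothetical attribute values

print(min_max(incomes))
print([round(z, 4) for z in z_score(incomes)])
print(decimal_scaling(incomes))
```

Min-max keeps the relative spacing but is sensitive to outliers at either end, whereas z-score is preferable when the true minimum and maximum are unknown.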
3. What is a decision tree and how is information gain used for attribute selection? Explain
with example. [8]
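Attribute selection by information gain can be illustrated with a small calculation: the gain of an attribute is the entropy of the class labels minus the weighted entropy after splitting on that attribute. The six-row dataset and the attribute names below are hypothetical and not taken from the question paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical training data.
rows = [
    {"outlook": "sunny",    "windy": False, "play": "no"},
    {"outlook": "sunny",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "rain",     "windy": False, "play": "yes"},
    {"outlook": "rain",     "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": True,  "play": "yes"},
]

gains = {a: information_gain(rows, a, "play") for a in ("outlook", "windy")}
best = max(gains, key=gains.get)  # attribute with the highest gain becomes the root
print(gains, best)
```

In this toy data "outlook" separates the classes almost completely, so it has the higher gain and would be chosen as the root attribute.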
4. Explain ROC. Using the following data, calculate TPR, FPR, precision for given
confusion matrix. [1+3+6]
     A    B
A   20    5
B   10   40
Classify: A = Yes, B = No
5. What is FP Tree? How does the FP-growth algorithm eliminate the problem of the Apriori algorithm?
Construct the FP tree and find association rules for the following transaction database
using FP-Growth algorithm. Support = 30% and confidence = 75%. [10]
Transaction ID Items
1 P,R,S
2 R,S,T
3 P,Q,IT
4 P,R,S,T
5 P,S,T
6 P,Q,T
7 Q,S,T
8 Q,R,T
6. What are categorical data? What are the possible issues that arise when
using categorical
Candidates are required to give their answers in their own words as far as practicable.
Attempt All questions.
The figures in the margin indicate Full Marks.
Assume suitable data if necessary.
1. How is data warehouse different from RDBMS? Also list the similarities. [2+2]
2. What is data pre-processing? Explain data sampling and dimensionality reduction in data
pre-processing with suitable example. [2+4+4]
3. How does data in most real applications become asymmetric? Explain the difference between
symmetric and asymmetric data. [5]
4. What is ID3 algorithm? Calculate TPR, FPR and Accuracy for given confusion matrix. [2+6]
              Predicted +   Predicted -
Predicted +      100            40
Predicted -       60           300
5. Explain Apriori algorithm in market basket analysis. Derive association rules from the
following market basket transactions with 50% of minimum support and confidence. [3+7]
Transaction Itemsets
1 A,B,C
2 A,C
3 A, r,
4 B,E,F
6. What is the use of FP-Growth method in market basket analysis? Explain FP-Growth
method with a suitable example. [10]
TRIBHUVAN UNIVERSITY Exam. Back (2066 & Later Batch)
INSTITUTE OF ENGINEERING Level BE Full Marks 80
Examination Control Division Programme BEX, BCT Pass Marks 32
Candidates are required to give their answers in their own words as far as practicable.
Attempt All questions.
The figures in the margin indicate Full Marks.
Assume suitable data if necessary.
1. What is data mining? Explain all the steps of knowledge discovery. [2+6]
2. How do you perform analysis of multidimensional data? Explain with the concept of
OLAP. [10]
3. Predict class label using naive Bayesian classifier for X = (age = youth,
income = medium, student = Yes, credit-rating = fair) using the following data set. [10]
Calculate:
a. accuracy
b. sensitivity
c. specificity
d. precision
e. recall
5. What is the importance of SUPPORT and CONFIDENCE during association analysis?
Explain FP-Growth method with example. [10]
6. What are the types of clustering methods? Explain DBSCAN method of clustering with
an example. [10]
7. What is the use of Apriori Algorithm in market basket analysis? Explain with suitable
example. [10]
8. Write short notes on: [4x3]
i) Time series Data mining
ii) Issues in anomaly/Fraud detection
iii) Categorical data and related issues
***
27C TRIBHUVAN UNIVERSITY Exam. New Back (2066 & Later Batch)
INSTITUTE OF ENGINEERING Level BE Full Marks 80
Examination Control Division Pass Marks 32
2071 Shrawan Year / Part IV/I Time 3 hrs.
Candidates are required to give their answers in their own words as far as practicable.
Attempt All questions.
All questions carry equal marks.
Assume suitable data if necessary.
2. Explain the properties that a Distance Metric needs to support with respect to
Minkowski's distance.
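The metric properties usually listed here (non-negativity, identity, symmetry and the triangle inequality) can be checked numerically for the Minkowski distance. The sketch below tests them on three hypothetical points for orders p = 1, 2 and 3; passing on sample points illustrates the properties but does not prove them.

```python
from itertools import permutations

def minkowski(x, y, p):
    """Minkowski distance of order p >= 1 (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 1.0)]  # hypothetical sample points

for p in (1, 2, 3):
    for a, b, c in permutations(points, 3):
        d_ab = minkowski(a, b, p)
        assert d_ab >= 0                         # non-negativity
        assert minkowski(a, a, p) == 0           # identity: d(a, a) = 0
        assert d_ab == minkowski(b, a, p)        # symmetry
        # Triangle inequality, with a small tolerance for floating-point error.
        assert d_ab <= minkowski(a, c, p) + minkowski(c, b, p) + 1e-12

print(minkowski(points[0], points[1], 1), minkowski(points[0], points[1], 2))
```

For 0 < p < 1 the triangle inequality can fail, which is why the order is restricted to p >= 1 when the Minkowski distance is used as a metric.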
3. What is a decision tree? Explain Gini index with suitable example.
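As a small worked example of the Gini index, the sketch below computes the impurity of a node and the weighted impurity of a candidate split. The class counts are hypothetical and the function names are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(groups):
    """Weighted Gini impurity of the partitions produced by a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)

# Hypothetical node with 5 "yes" and 5 "no" records, split into two branches.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(gini(parent))               # impurity before the split
print(gini_split([left, right]))  # weighted impurity after the split
```

The attribute whose split gives the lowest weighted Gini index (the largest reduction from the parent impurity) is preferred when growing the tree.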
4. Explain a Bayes classifier. In what cases can Naive Bayes and Bayesian Belief Network
be used?
generation?
8. What is Anomaly Detection? Explain a few distance based approaches that can be used
for Anomaly Detection.
TRIBHUVAN UNIVERSITY Exam.
INSTITUTE OF ENGINEERING Level BE Full Marks 80
Examination Control Division Programme BEX, BCT Pass Marks
2070 Chaitra Year / Part Time 3 hrs.
(CT 725)
Candidates are required to give their answers in their own words as far as practicable.
Attempt All questions.
The figures in the margin indicate Full Marks.
Assume suitable data if necessary.
5. Why is pattern evaluation important in association rule mining? Explain with example the
statistical based measures used for measuring interestingness of association rules. [8]
6. What is a density based cluster? Explain an algorithm that can be used to generate density
based clusters. [8]
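DBSCAN is one density based algorithm that fits this question. The following is a minimal sketch of it, assuming Euclidean distance and a brute-force neighbourhood search; the sample points and the values of `eps` and `min_pts` are hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1           # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.2),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
          (9.0, 1.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```

The isolated point has too few neighbours within `eps`, so it is labelled as noise, which is how DBSCAN also serves as a simple outlier detector.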
7. What is Hierarchical Clustering? Differentiate between agglomerative and divisive
approach of hierarchical clustering. Augment your answer with appropriate illustrative
examples. [10]
8. Write short notes on:
Candidates are required to give their answers in their own words as far as practicable.
Attempt All questions.
The figures in the margin indicate Full Marks.
Assume suitable data if necessary.
1. What are the different data types? Explain with examples. [5]
2. How is decision tree classifier different than rule-based classifier? [5]
3. Explain Bayes' Theorem. How can it be used for classification? Explain how Naive
Bayes simplifies the computational complexity of Bayes' classification algorithm. [12]
4. What is frequent item set mining? How do Apriori and FP-growth algorithm optimize the
brute force approach for finding frequent item sets? [15]
5. Explain K-means clustering algorithm with examples. [10]
6. Explain the issues regarding cluster validation. [6]
7. What is Base Rate Fallacy? Explain with example. [7]
8. How can Apriori Algorithm be used for finding association rules out of a frequent item
set? [7]
9. Write short notes on: [5+5]
a) Page Rank
b) Data mart
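For the Page Rank note, the idea can be illustrated with a simple power iteration. This is a sketch under stated assumptions: a damping factor of 0.85, a fixed number of iterations, and a hypothetical four-page link graph; the function name `pagerank` is illustrative.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to; returns {page: rank}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its damped rank equally among its out-links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web graph: D has no incoming links.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(value, 4))
```

Pages that receive links from well-ranked pages accumulate rank, while a page with no incoming links keeps only the random-jump share.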