III Yr B.Tech. - Computer Science & Engineering/Information Technology Data Mining


Sub Code: BCST 604 (B) ROLL NO……………..……………..

VI SEMESTER EXAMINATION, 2022 – 23


IIIrd yr B.Tech. – Computer Science & Engineering/Information Technology
Data Mining

Duration: 3:00 hrs Max Marks: 100


Note: Attempt all questions. All questions carry equal marks. In case of any ambiguity or missing data,
make a suitable assumption and state it in the answer.

Q 1. Answer any four parts of the following. 5x4=20


a) Demonstrate the steps involved in data mining when viewed as a process of knowledge
discovery.
b) In real-world data, tuples with missing values for some attributes are a common occurrence.
Classify various methods for handling this problem.
c) Explain the Apriori algorithm with an example.
d) Explain why, in DBSCAN, density-connectedness is an equivalence relation.
e) Why is data preprocessing required? Explain.
f) Explain the 3-tier architecture of a data warehouse.
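
For part (c) of Q 1, the level-wise idea behind Apriori can be illustrated with a minimal Python sketch. The five transactions, the item names, and the minimum support count of 3 are hypothetical values chosen only for illustration; they are not part of the question paper. The sketch finds frequent itemsets only and does not generate association rules.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep those meeting min_support.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Scan the transactions to count each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market-basket data (not from the question paper).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(transactions, min_support=3)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this toy data the candidate {bread, milk, diapers} survives pruning but fails the support count, while the other 3-item candidates are pruned because one of their 2-item subsets is infrequent.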
Q 2. Answer any four parts of the following. 5x4=20
a) “Data mining as KDP (Knowledge Discovery Process).” Justify this statement with an
example.
b) What are the various requirements of clustering in data mining?
c) Describe the procedure for Mining Association Rules in Large Databases.
d) Find the mean, variance, and standard deviation of the following animal heights:
555 mm, 450 mm, 165 mm, 410 mm, and 300 mm.
e) Explain attribute relevance analysis with an example.
f) How is CURE different from CHAMELEON?
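
The arithmetic in part (d) of Q 2 can be checked with a short Python sketch. The question does not say whether the five heights are a whole population or a sample, so the sketch computes both; which one is expected is an assumption the answer should state.

```python
import math

# Heights in millimetres, as given in Q 2(d).
heights_mm = [555, 450, 165, 410, 300]
n = len(heights_mm)

mean = sum(heights_mm) / n
squared_devs = [(h - mean) ** 2 for h in heights_mm]

# Population formulas divide by n; sample formulas divide by n - 1.
population_variance = sum(squared_devs) / n
sample_variance = sum(squared_devs) / (n - 1)
population_std = math.sqrt(population_variance)
sample_std = math.sqrt(sample_variance)

print(f"mean                = {mean} mm")
print(f"population variance = {population_variance} mm^2")
print(f"population std dev  = {population_std:.2f} mm")
print(f"sample variance     = {sample_variance} mm^2")
print(f"sample std dev      = {sample_std:.2f} mm")
```

Treated as a population, the heights have mean 376 mm, variance 17794 mm^2, and standard deviation of about 133.39 mm.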
Q 3. Answer any two parts of the following. 10x2=20
a) Suppose your task as a software engineer at Big-University is to design a data mining
system to examine their university course database, which contains the following information:
the name, address, and status (e.g., undergraduate or graduate) of each student, the courses
taken, and their cumulative grade point average (GPA).
Model the architecture you would choose. What is the purpose of each component of this
architecture?
b) Why is tree pruning useful in decision tree induction? What is a drawback of using a
separate set of tuples to evaluate pruning?
c) Briefly compare the following concepts. You may use an example to explain your point(s).
(i) Snowflake schema and fact constellation.
(ii) Data cleaning and data transformation.
Q 4. Answer any two parts of the following. 10x2=20
a) Given the following data (in increasing order) for the attribute age: 13, 15, 16, 16, 19, 20,
20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(i) Use smoothing by bin means to smooth the above data, using a bin depth of 3. Illustrate
your steps. Comment on the effect of this technique for the given data.
(ii) How might you determine outliers in the data?
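
The steps in part (a) can be sketched in Python as follows, taking the age list as 27 values so that a bin depth of 3 yields nine equal-depth bins. The 1.5 x IQR rule used for part (ii) is only one possible outlier test, chosen here as an assumption; the question leaves the method open.

```python
import statistics

# Age values from Q 4(a), already in increasing order (27 values).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def smooth_by_bin_means(sorted_values, depth):
    """Partition into equal-depth bins and replace each value by its bin mean."""
    smoothed = []
    for start in range(0, len(sorted_values), depth):
        bin_values = sorted_values[start:start + depth]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([round(bin_mean, 2)] * len(bin_values))
    return smoothed


smoothed_ages = smooth_by_bin_means(sorted(ages), depth=3)
for start in range(0, len(ages), 3):
    print(ages[start:start + 3], "->", smoothed_ages[start:start + 3])

# One possible outlier check (an assumption, not prescribed by the question):
# flag values more than 1.5 * IQR outside the quartiles.
# statistics.quantiles uses its default "exclusive" method here.
q1, _median, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [a for a in ages if a < lower_fence or a > upper_fence]
print("outliers by the 1.5 * IQR rule:", outliers)
```

With these settings the last bin (46, 52, 70) is smoothed to 56, which shows how a single extreme value pulls a bin mean, and the IQR rule flags only the value 70.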

b) Explain web mining and its types with examples.


c) What are the essential differences between ROLAP and MOLAP?
Q 5. Answer any two parts of the following. 10x2=20
a) The following table consists of training data from an employee database. The data have
been generalized. For example, “31 . . . 35” for age represents the age range of 31 to 35.
For a given row entry, count represents the number of data tuples having the values for
department, status, age, and salary given in that row.

Let status be the class label attribute.


(i) How would you modify the basic decision tree algorithm to take into consideration the
count of each generalized data tuple (i.e., of each row entry)?
(ii) Given a data tuple having the values “systems”, “26 . . . 30”, and “46–50K” for the
attributes department, age, and salary, respectively, what would a naive Bayesian
classification of the status for the tuple be?
b) Briefly demonstrate, with examples, each of the following approaches to clustering:
partitioning methods and hierarchical methods.
c) Compare the advantages and disadvantages of eager classification versus lazy
classification.

**********
