Assignment 2

Uploaded by

Amit Yadav
Copyright
© All Rights Reserved

GOVERNMENT ENGINEERING COLLEGE DAHOD

COMPUTER ENGINEERING
SUBJECT: DATA MINING (3160714)
SEMESTER – VI

Assignment 2 (CO-1)

1) Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
a. What is the mean of the data? What is the median?
b. What is the mode of the data? Comment on the data’s modality (i.e., bimodal,
trimodal, etc.).
c. What is the midrange of the data?
d. Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
e. Give the five-number summary of the data.
f. Show a boxplot of the data.
g. How is a quantile–quantile plot different from a quantile plot?
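
A minimal sketch of how parts (a) to (e) of question 1 could be checked numerically, using only Python's standard `statistics` module. The variable names are illustrative and not part of the assignment. Quartile conventions differ between textbooks and tools; this sketch assumes the default "exclusive" method of `statistics.quantiles`, so other conventions may give slightly different Q1 and Q3 values.

```python
import statistics

# Age values from question 1, already in increasing order.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean_age = statistics.mean(ages)        # arithmetic mean
median_age = statistics.median(ages)    # middle value of the sorted data
modes = statistics.multimode(ages)      # every value with the highest frequency
midrange = (min(ages) + max(ages)) / 2  # average of the smallest and largest value

# Quartiles under the "exclusive" convention (an assumption of this sketch).
q1, q2, q3 = statistics.quantiles(ages, n=4)

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number_summary = (min(ages), q1, median_age, q3, max(ages))

print("mean:", round(mean_age, 2))
print("median:", median_age)
print("modes:", modes)
print("midrange:", midrange)
print("five-number summary:", five_number_summary)
```

Two values in `modes` would indicate bimodal data, three trimodal, and so on. The five-number summary is also what a boxplot for part (f) is drawn from.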

2) Explain the following data normalization techniques:
a. min-max normalization
b. decimal scaling
3) Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
a. Compute the Euclidean distance between the two objects.
b. Compute the Manhattan distance between the two objects.
c. Compute the Minkowski distance between the two objects, using q = 3.
d. Compute the supremum distance between the two objects.
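
The four distances in question 3 are all members of the Minkowski family, so one way to check hand calculations is the short sketch below. The function names `minkowski` and `supremum` are hypothetical helpers written for this illustration, not taken from any library.

```python
# The two objects from question 3.
x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

def minkowski(a, b, q):
    """Minkowski distance of order q: (sum of |a_i - b_i|^q)^(1/q)."""
    return sum(abs(ai - bi) ** q for ai, bi in zip(a, b)) ** (1 / q)

def supremum(a, b):
    """Supremum (Chebyshev) distance: the largest per-attribute difference."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))

euclidean = minkowski(x, y, 2)   # q = 2 gives the Euclidean distance
manhattan = minkowski(x, y, 1)   # q = 1 gives the Manhattan distance
minkowski_3 = minkowski(x, y, 3)
sup = supremum(x, y)

print("Euclidean:", round(euclidean, 4))
print("Manhattan:", manhattan)
print("Minkowski (q = 3):", round(minkowski_3, 4))
print("Supremum:", sup)
```

The supremum distance is the limit of the Minkowski distance as q grows without bound, which is why it is computed with `max` rather than with a power sum.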

4) In real-world data, tuples with missing values for some attributes are a common occurrence.
Describe various methods for handling this problem.
5) Discuss issues to consider during data integration.
6) Use these methods to normalize the following group of data: 200, 300, 400, 600, 1000
a. min-max normalization by setting min = 0 and max = 1
b. z-score normalization
c. z-score normalization using the mean absolute deviation instead of standard deviation
d. normalization by decimal scaling
7) Explain the need for data smoothing during pre-processing and discuss data smoothing by
binning.
8) What is meant by “clustering”? Explain why clustering is called unsupervised learning. Mention
any two applications of clustering.
9) Explain why we need to perform data pre-processing, with a suitable example.
10) List and explain the major pre-processing tasks in data mining.
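
For question 6, the four normalization methods can be sketched in a few lines of Python as a way to verify hand-worked answers. Two assumptions are made here and are not stated in the assignment: the z-score uses the population standard deviation (`statistics.pstdev`) rather than the sample one, and decimal scaling divides by the smallest power of ten that makes every absolute value strictly less than 1. All variable names are illustrative.

```python
import statistics

# Data from question 6.
values = [200, 300, 400, 600, 1000]

# (a) Min-max normalization onto the new range [0, 1].
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

# (b) z-score normalization (assumes the population standard deviation).
mean = statistics.mean(values)
std = statistics.pstdev(values)
z_std = [(v - mean) / std for v in values]

# (c) z-score variant using the mean absolute deviation instead.
mad = sum(abs(v - mean) for v in values) / len(values)
z_mad = [(v - mean) / mad for v in values]

# (d) Decimal scaling: find the smallest j with max(|v|) / 10**j < 1.
j = 0
while max(abs(v) for v in values) / 10 ** j >= 1:
    j += 1
decimal_scaled = [v / 10 ** j for v in values]

print("min-max:", min_max)
print("z-score:", [round(z, 4) for z in z_std])
print("z-score (MAD):", [round(z, 4) for z in z_mad])
print("decimal scaling (j = %d):" % j, decimal_scaled)
```

Switching `statistics.pstdev` to `statistics.stdev` gives the sample-based z-scores, if that is the convention expected for the assignment.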
