Data Mining Assignment 2

This document provides instructions for a data mining assignment involving analysis of age data, body fat data from hospital patients, computing distances between data objects, and distinguishing between quantile and quantile-quantile plots. Students are asked to calculate summary statistics like mean, median, standard deviation and draw plots for the given data sets. They also need to outline how to compute dissimilarity between objects based on nominal, binary, numeric attributes and term frequencies. Further, students must calculate different distance metrics like Euclidean, Manhattan, Minkowski and supremum distance between sample data tuples. The assignment is to be completed before a specified date.

Uploaded by

tempman tempman
Copyright
© All Rights Reserved

Data Mining and Data Warehouse

Assignment 2
NOTE: To be completed before 19/09/2021

1. Suppose that the data for analysis includes the attribute age. The age values for
the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25,
25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data’s modality (i.e., bimodal,
trimodal, etc.).
(c) What is the midrange of the data?
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the
data?
(e) Give the five-number summary of the data.
(f) Show a boxplot of the data.
(g) How is a quantile–quantile plot different from a quantile plot?
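As a way to check hand calculations for parts (a) to (e), the summary statistics can be sketched with Python's standard `statistics` module. This is only one possible approach, not the expected solution: the variable names are illustrative, and the quartiles depend on the convention used. The sketch assumes the default "exclusive" method of `statistics.quantiles`, so other conventions may give slightly different Q1 and Q3.

```python
import statistics

# Age values copied from question 1 (already in increasing order).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean = statistics.mean(ages)              # (a) arithmetic mean
median = statistics.median(ages)          # (a) middle value
modes = statistics.multimode(ages)        # (b) all most-frequent values
midrange = (min(ages) + max(ages)) / 2    # (c) average of min and max

# (d) Quartiles; assumption: the "exclusive" interpolation method,
# which is the default of statistics.quantiles.
q1, q2, q3 = statistics.quantiles(ages, n=4)

# (e) Five-number summary: minimum, Q1, median, Q3, maximum.
five_number = (min(ages), q1, median, q3, max(ages))

print(f"mean={mean:.2f} median={median} modes={modes} midrange={midrange}")
print("five-number summary:", five_number)
```

Two values tie for the highest frequency here, so `multimode` returns a list with two entries, which is what part (b) means by bimodal.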

2. Suppose that a hospital tested the age and body fat data for 18 randomly selected
adults with the following results:

(a) Calculate the mean, median, and standard deviation of age and %fat.
(b) Draw the box plots for age and %fat.
(c) Draw a scatter plot and a q-q plot based on these two variables.
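The results table for this question is not reproduced here, so the following is only a sketch of how parts (a) and (c) could be computed once the 18 pairs are entered. The lists `age_placeholder` and `fat_placeholder` are made-up stand-ins, not the hospital data, and the helper names `summarize` and `qq_pairs` are illustrative. The sketch uses the population standard deviation (`pstdev`); use `stdev` instead if the sample formula is required.

```python
import statistics

def summarize(values):
    """Return (mean, median, population standard deviation)."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.pstdev(values))

def qq_pairs(xs, ys):
    """Points of a q-q plot for two samples of equal size.

    With equal sample sizes, the i-th smallest x is paired with the
    i-th smallest y. A scatter plot keeps the original pairing instead.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    return list(zip(sorted(xs), sorted(ys)))

# Placeholder values only; replace with the 18 observations.
age_placeholder = [30, 25, 40, 35]
fat_placeholder = [20.0, 15.0, 30.0, 25.0]

print(summarize(age_placeholder))
print(qq_pairs(age_placeholder, fat_placeholder))
```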
3. Briefly outline how to compute the dissimilarity between objects described by
the following:
(a) Nominal attributes
(b) Asymmetric binary attributes
(c) Numeric attributes
(d) Term-frequency vectors
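The four cases can be illustrated with commonly used textbook measures. The assignment does not say which measures are expected, so the choices below are assumptions: the mismatch ratio for nominal attributes, a Jaccard-based measure for asymmetric binary attributes, Euclidean distance for numeric attributes, and one minus cosine similarity for term-frequency vectors. All function names are illustrative.

```python
import math

def nominal_dissimilarity(a, b):
    """(p - m) / p, with p attributes in total and m matching values."""
    p = len(a)
    m = sum(1 for x, y in zip(a, b) if x == y)
    return (p - m) / p

def asymmetric_binary_dissimilarity(a, b):
    """(r + s) / (q + r + s); 0/0 matches are ignored as uninformative."""
    q = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    r = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    s = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    if q + r + s == 0:
        return 0.0  # assumption: two all-zero objects count as identical
    return (r + s) / (q + r + s)

def numeric_dissimilarity(a, b):
    """Euclidean distance; attributes are assumed to be on comparable scales."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def term_frequency_dissimilarity(u, v):
    """1 - cosine similarity; both vectors are assumed to be non-zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1 - dot / (norm_u * norm_v)

print(nominal_dissimilarity(["red", "A", "x"], ["red", "B", "x"]))
print(asymmetric_binary_dissimilarity([1, 0, 1, 0, 0, 0], [1, 0, 0, 0, 1, 0]))
print(numeric_dissimilarity([0, 0], [3, 4]))
print(term_frequency_dissimilarity([1, 0], [0, 1]))
```

When numeric attributes have very different ranges, they are usually normalized before the distance is computed; that step is left out of this sketch.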
4. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
(a) Compute the Euclidean distance between the two objects.
(b) Compute the Manhattan distance between the two objects.
(c) Compute the Minkowski distance between the two objects, using q = 3.
(d) Compute the supremum distance between the two objects.
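All four distances in this question can be viewed as members of the Minkowski family, so a small sketch is enough to check hand calculations. The function names are illustrative; `q` is the order of the distance, following the notation in part (c).

```python
def minkowski(x, y, q):
    """Minkowski distance of order q between two equal-length tuples."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

def supremum(x, y):
    """Supremum (Chebyshev) distance: the largest per-attribute difference."""
    return max(abs(a - b) for a, b in zip(x, y))

# The two objects given in question 4.
x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

euclidean = minkowski(x, y, 2)   # (a) q = 2
manhattan = minkowski(x, y, 1)   # (b) q = 1
minkowski3 = minkowski(x, y, 3)  # (c) q = 3
sup = supremum(x, y)             # (d) limit as q grows without bound

print(f"Euclidean={euclidean:.3f} Manhattan={manhattan:.0f} "
      f"Minkowski(q=3)={minkowski3:.3f} supremum={sup}")
```

The per-attribute differences are 2, 1, 6 and 2, so the Euclidean distance is the square root of 45, the Manhattan distance is 11, the q = 3 distance is the cube root of 233, and the supremum distance is 6.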
