Assignment 2
Data Mining (CSE4052)
1. Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are (in increasing order):
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45,
46, 52, 70.
2. Find Q1, Q2, and Q3 for the following data set, and draw a box-and-whisker plot.
{2,6,7,8,8,11,12,13,14,15,22,23}
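A minimal sketch for question 2, assuming the median-of-halves convention for quartiles (interpolation-based conventions can give slightly different values). The helper name `quartiles` is hypothetical and chosen only for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # values above it (middle value skipped when n is odd)
    return median(lower), median(data), median(upper)

data = [2, 6, 7, 8, 8, 11, 12, 13, 14, 15, 22, 23]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)  # 7.5 11.5 14.5
```

Under this convention the box would span Q1 to Q3 with a line at Q2, and the whiskers would extend to the smallest and largest values that lie within 1.5 times the interquartile range of the box.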
3. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
Class Frequency
40 – 44 1
45 – 49 5
50 – 54 9
55 – 59 12
60 – 64 7
65 – 69 2
5. If the median of the distribution given below is 28.5, find the values of x and y.
Frequency 5 x 20 15 y 5 60
6. For a moderately skewed distribution, the mean and median are respectively 26.8 and 27.9.
What is the mode of the distribution?
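For question 6, one common approach is the empirical relation for moderately skewed distributions, mode ≈ 3 × median − 2 × mean. The sketch below assumes that relation is the intended one; the function name `empirical_mode` is hypothetical.

```python
def empirical_mode(mean, median):
    """Estimate the mode with the empirical relation mode = 3*median - 2*mean.

    This is an approximation that is usually stated for moderately skewed
    distributions, not an exact identity.
    """
    return 3 * median - 2 * mean

mode = empirical_mode(mean=26.8, median=27.9)
print(round(mode, 1))  # 30.1
```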
7. Suppose we have a table with five products, each assigned one of three priorities: Urgent
(assigned the ordinal value of 3), High Priority (assigned the ordinal value of 2), and Low
Priority (assigned the ordinal value of 1). The table also includes a numeric value for each product.
The table is as follows:
2 Product B Urgent 93
Find out the dissimilarity matrix for the mixed attribute data.
8. Compute a dissimilarity matrix for the data given below, treating gender and weight as
symmetric binary attributes and hair color as a nominal attribute.
Observation# Gender Weight Hair Color
1 Male Light Blond
Item A: [1,0,1,1,0]
Item B: [1,1,1,0,1]
Weights: w1=1,w2=2,w3=3
● Vector X: [3,5,0,2,1]
● Vector Y: [1,0,4,2,3]
Calculate the cosine similarity between Vector X and Vector Y, and interpret the result.
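The cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms. A minimal sketch using only the standard library follows; the function name `cosine_similarity` is chosen here for illustration.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

x = [3, 5, 0, 2, 1]
y = [1, 0, 4, 2, 3]
sim = cosine_similarity(x, y)
print(round(sim, 3))  # 0.292
```

Here the dot product is 10 and the norms are sqrt(39) and sqrt(30). A value near 0.29 on a scale from 0 to 1 (for non-negative vectors) suggests the two vectors point in fairly different directions, that is, they are only weakly similar.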
13. Suppose a group of 12 sales price records has been sorted as follows:
5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215.
Partition them into three bins by each of the following methods.
(a) Equal-frequency (equidepth) partitioning
(b) Equal-width partitioning
(c) Clustering
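Parts (a) and (b) of question 13 can be sketched as below. The function names are hypothetical, the equal-width version assumes half-open intervals with the maximum placed in the last bin, and part (c) is left out because the result depends on the clustering method chosen.

```python
def equal_frequency_bins(sorted_values, k):
    """Split sorted values into k bins holding the same number of values.

    Assumes len(sorted_values) is divisible by k, as it is here (12 / 3).
    """
    size = len(sorted_values) // k
    return [sorted_values[i * size:(i + 1) * size] for i in range(k)]

def equal_width_bins(sorted_values, k):
    """Split sorted values into k bins that each cover the same value range."""
    lo, hi = sorted_values[0], sorted_values[-1]
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in sorted_values:
        # Clamp the index so the maximum value falls in the last bin.
        idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return bins

prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
print(equal_frequency_bins(prices, 3))
# [[5, 10, 11, 13], [15, 35, 50, 55], [72, 92, 204, 215]]
print(equal_width_bins(prices, 3))
# [[5, 10, 11, 13, 15, 35, 50, 55, 72], [92], [204, 215]]
```

With a range of 5 to 215 and three bins, the bin width is 70, which is why the equal-width bins are so uneven for this skewed data.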
14. Use smoothing by bin means, bin medians, and bin boundaries to smooth the following data, using a
bin depth of 6.
Data: 11,13,13,15,15,16,19,20,20,20,21,21,22,23,24,30,40,45,45,45,71,72,73,75
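A minimal sketch of the three smoothing methods for question 14, assuming equal-frequency bins of depth 6 over the sorted data. The function name `smooth` is hypothetical, and sending ties to the lower boundary is an assumption, since conventions differ.

```python
from statistics import mean, median

def smooth(sorted_values, depth):
    """Return the data smoothed by bin means, bin medians, and bin boundaries."""
    bins = [sorted_values[i:i + depth] for i in range(0, len(sorted_values), depth)]
    by_mean = [[round(mean(b), 2)] * len(b) for b in bins]
    by_median = [[median(b)] * len(b) for b in bins]
    # Replace each value with the closer bin boundary; ties go to the lower
    # boundary (an assumption made for this sketch).
    by_boundary = [
        [b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b]
        for b in bins
    ]
    return by_mean, by_median, by_boundary

data = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 45, 71, 72, 73, 75]
by_mean, by_median, by_boundary = smooth(data, depth=6)
print([b[0] for b in by_mean])    # [13.83, 20.17, 30.67, 63.5]
print([b[0] for b in by_median])  # [14.0, 20.0, 27.0, 71.5]
print(by_boundary[0])             # [11, 11, 11, 16, 16, 16]
```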
15. Use the methods below to normalize the following group of data:
3.63 53.1
3.02 49.7
3.82 48.4
3.42 54.2
3.59 54.9
2.87 43.7
3.03 47.2
3.46 45.2
3.36 54.4
3.3 50.4
17. Perform the chi-square test for correlation on the following survey observations, in which
256 people reported their month of birth and the expected distribution of months is
uniform.
January 29
February 24
March 22
April 19
May 21
June 18
July 19
August 20
September 23
October 18
November 20
December 23
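With a uniform expected distribution, the computation for question 17 amounts to comparing each observed count with 256 / 12. The sketch below computes the chi-square statistic by hand; the 0.05 significance level is an assumption (the question does not state one), and the critical value is taken from a standard chi-square table for 11 degrees of freedom.

```python
observed = {
    "January": 29, "February": 24, "March": 22, "April": 19,
    "May": 21, "June": 18, "July": 19, "August": 20,
    "September": 23, "October": 18, "November": 20, "December": 23,
}

def chi_square_uniform(counts):
    """Chi-square statistic against an evenly distributed expectation."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)

stat = chi_square_uniform(list(observed.values()))
df = len(observed) - 1

# Table value for df = 11 at the assumed 0.05 significance level.
CRITICAL_0_05_DF_11 = 19.675

print(round(stat, 3), df)          # 5.094 11
print(stat > CRITICAL_0_05_DF_11)  # False
```

Since the statistic is well below the critical value, the data give no reason, at the assumed level, to reject the hypothesis that births are evenly spread across the months.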