Assignment2

Uploaded by

AVIJEET Swain
Problem Solving

Assignment 2
Data Mining (CSE4052)

1. Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are (in increasing order):
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45,
46, 52, 70.

(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data’s modality (e.g., bimodal, trimodal).
(c) What is the midrange of the data?
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
What is the interquartile range?
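
The quantities asked for in Question 1 can be checked with a short Python sketch using only the standard library. The variable names are illustrative, and the quartile convention (the `statistics.quantiles` default, "exclusive") is an assumption; textbooks use several slightly different quartile rules, so hand-computed values may differ.

```python
import statistics

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean = statistics.mean(ages)
median = statistics.median(ages)
modes = statistics.multimode(ages)        # every value tied for the highest count
midrange = (min(ages) + max(ages)) / 2    # average of the smallest and largest value

# Assumption: quartiles computed with the library's default "exclusive" method.
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

print("mean:", round(mean, 2), "median:", median)
print("modes:", modes, "midrange:", midrange)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Two values tied for the highest frequency indicate a bimodal data set.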

2. Find Q1, Q2, and Q3 for the following data set, and draw a box-and-whisker plot.

{2,6,7,8,8,11,12,13,14,15,22,23}

3. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):

(a) Compute the Euclidean distance between the two objects.
(b) Compute the Manhattan distance between the two objects.
(c) Compute the Minkowski distance between the two objects, using h = 3.
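
All three distances in Question 3 are special cases of the Minkowski distance, so one helper is enough to check them. This is a minimal sketch; the function name `minkowski` is illustrative, not from any library.

```python
def minkowski(x, y, h):
    """Minkowski distance of order h between two equal-length tuples."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)

x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

manhattan = minkowski(x, y, 1)   # h = 1
euclidean = minkowski(x, y, 2)   # h = 2
minkowski3 = minkowski(x, y, 3)  # h = 3

print(round(euclidean, 3), manhattan, round(minkowski3, 3))
```

The coordinate differences are 2, 1, 6, and 2, so the Euclidean distance is the square root of 45 and the order-3 distance is the cube root of 233.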

4. Calculate the median for the following distribution:

Class Frequency

40 – 44 1
45 – 49 5
50 – 54 9
55 – 59 12
60 – 64 7
65 – 69 2
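
For grouped data such as the table in Question 4, the usual interpolation formula is median = L + ((N/2 - cf) / f) * w, where L is the lower boundary of the median class, cf the cumulative frequency before it, f its frequency, and w its width. The sketch below assumes the stated class limits are inclusive integers, so boundaries are shifted by half the gap between adjacent classes (0.5 here). The function name and the `gap` parameter are illustrative.

```python
def grouped_median(classes, freqs, gap=1):
    """Median of grouped data by linear interpolation.

    classes: list of (lower_limit, upper_limit) as stated in the table.
    gap: distance between one class's upper limit and the next lower limit
         (assumption: 1 for limits such as 40-44, 45-49).
    """
    half = sum(freqs) / 2
    cumulative = 0
    for (low, high), f in zip(classes, freqs):
        if cumulative + f >= half:
            lower_boundary = low - gap / 2
            width = (high + gap / 2) - lower_boundary
            return lower_boundary + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency table")

classes = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69)]
freqs = [1, 5, 9, 12, 7, 2]
median = grouped_median(classes, freqs)
print(median)
```

For continuous classes such as 0-10, 10-20 in Question 5, the same helper applies with `gap=0`.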

5. If the median of a distribution given below is 28.5, then find the value of x and y.

Class 0-10 10-20 20-30 30-40 40-50 50-60 Total

Frequency 5 x 20 15 y 5 60

6. For a moderately skewed distribution, the mean and median are respectively 26.8 and 27.9.
What is the mode of the distribution?

7. Suppose we have a table with five products, each assigned one of three priorities: Urgent
(assigned the ordinal value of 3), High Priority (assigned the ordinal value of 2), and Low
Priority (assigned the ordinal value of 1). This table also includes their values in numeric forms.
The table is as follows:

Object Identifier   Test I (nominal)   Test II (ordinal)   Test III (numerical)

1                   Product A          Low Priority        45
2                   Product B          Urgent              93
3                   Product B          High Priority       65
4                   Product C          High Priority       74
5                   Product A          Low Priority        23

Find the dissimilarity matrix for the mixed attribute data.
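
One common way to combine attribute types, sketched below, is to compute a dissimilarity in [0, 1] per attribute and average them. The assumptions here are: nominal values contribute 0 if equal and 1 otherwise; ordinal ranks r are mapped to (r - 1) / (M - 1) with M = 3; numeric values are scaled by their range (max - min); and all three attributes get equal weight. The function and variable names are illustrative.

```python
def mixed_dissimilarity(rows, rank):
    """Dissimilarity matrix for rows of (nominal, ordinal, numeric) values."""
    max_rank = max(rank.values())
    numeric = [r[2] for r in rows]
    num_range = max(numeric) - min(numeric)
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            nom_i, ord_i, num_i = rows[i]
            nom_j, ord_j, num_j = rows[j]
            d_nom = 0.0 if nom_i == nom_j else 1.0
            # Map ordinal ranks onto [0, 1] before taking the difference.
            z_i = (rank[ord_i] - 1) / (max_rank - 1)
            z_j = (rank[ord_j] - 1) / (max_rank - 1)
            d_ord = abs(z_i - z_j)
            d_num = abs(num_i - num_j) / num_range
            d[i][j] = d[j][i] = (d_nom + d_ord + d_num) / 3
    return d

rank = {"Low Priority": 1, "High Priority": 2, "Urgent": 3}
rows = [
    ("Product A", "Low Priority", 45),
    ("Product B", "Urgent", 93),
    ("Product B", "High Priority", 65),
    ("Product C", "High Priority", 74),
    ("Product A", "Low Priority", 23),
]
matrix = mixed_dissimilarity(rows, rank)
for row in matrix:
    print([round(v, 3) for v in row])
```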

8. Compute a dissimilarity matrix for the data given below, treating gender and weight as
symmetric binary attributes and hair color as a nominal attribute.

Observation #   Gender   Weight   Hair Color

1               Male     Light    Blond
2               Male     Heavy    Blond
3               Male     Light    Brown
4               Female   Heavy    Black
5               Female   Heavy    Blond
6               Male     Light    Brown
7               Female   Light    Black
8               Female   Heavy    Brown
9               Male     Light    Brown
10              Female   Heavy    Black

9. Compute the Jaccard similarity matrix for the data given below:

Patient   Test 1     Test 2     Test 3

1         Positive   Negative   Negative
2         Negative   Negative   Positive
3         Negative   Positive   Positive
4         Positive   Positive   Positive
5         Negative   Negative   Positive

10. Consider the following two binary vectors representing the presence (1) or absence (0) of
attributes in two different items:

Item A: [1,0,1,1,0]

Item B: [1,1,1,0,1]

Calculate the Jaccard similarity between Item A and Item B .
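
For binary vectors, the Jaccard similarity counts positions where both items are 1 and divides by positions where at least one is 1; positions where both are 0 are ignored. A minimal sketch follows. The function name is illustrative, and returning 0.0 when neither vector has any 1 is an assumed convention.

```python
def jaccard(a, b):
    """Jaccard similarity of two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumption: define the similarity as 0.0 when no attribute is present at all.
    return both / either if either else 0.0

item_a = [1, 0, 1, 1, 0]
item_b = [1, 1, 1, 0, 1]
similarity = jaccard(item_a, item_b)
print(similarity)
```

Here two positions are shared and five positions have at least one 1, giving 2/5.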


11. Consider the following two points in a 3-dimensional space with weights assigned to each
dimension:

Point A: (2, 3, 5)

Point B: (1, 4, 2)

Weights: w1 = 1, w2 = 2, w3 = 3

Calculate the weighted Euclidean distance between Point A and Point B.
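
A small sketch of the weighted Euclidean distance, assuming the common definition in which each weight multiplies the squared difference in its dimension, d = sqrt(sum of w_i * (a_i - b_i)^2). Some texts place the weights differently, so this convention is an assumption; the function name is illustrative.

```python
import math

def weighted_euclidean(a, b, w):
    """sqrt(sum(w_i * (a_i - b_i)^2)) over all dimensions."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))

point_a = (2, 3, 5)
point_b = (1, 4, 2)
weights = (1, 2, 3)
distance = weighted_euclidean(point_a, point_b, weights)
print(round(distance, 3))
```

Under this convention the sum inside the root is 1*1 + 2*1 + 3*9 = 30.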

12. Consider two vectors in a high-dimensional space:

● Vector X: [3,5,0,2,1]
● Vector Y: [1,0,4,2,3]

Calculate the cosine similarity between Vector X and Vector Y, and interpret the result.
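
Cosine similarity is the dot product divided by the product of the vector lengths. The sketch below uses only the standard library; the function name is illustrative.

```python
import math

def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

vector_x = [3, 5, 0, 2, 1]
vector_y = [1, 0, 4, 2, 3]
cos_xy = cosine_similarity(vector_x, vector_y)
angle_degrees = math.degrees(math.acos(cos_xy))
print(round(cos_xy, 4), round(angle_degrees, 1))
```

The dot product is 10 and the norms are the square roots of 39 and 30. A value well below 1 (an angle of roughly 73 degrees) means the vectors point in fairly different directions, i.e. they are only weakly similar.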

13. Suppose a group of 12 sales price records has been sorted as follows:
5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215.
Partition them into three bins by each of the following methods.
(a) Equal-frequency (equidepth) partitioning
(b) Equal-width partitioning
(c) Clustering
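
Parts (a) and (b) of Question 13 can be sketched as below; part (c) depends on the clustering method chosen and is not covered. The assumptions are that the data are already sorted, that the number of values divides evenly into the bins for equal-frequency partitioning, and that equal-width bins are half-open with the maximum value placed in the last bin. Function names are illustrative.

```python
def equal_frequency(values, k):
    """Split sorted values into k bins holding the same number of items."""
    size = len(values) // k  # assumption: len(values) is divisible by k
    return [values[i * size:(i + 1) * size] for i in range(k)]

def equal_width(values, k):
    """Split values into k bins that each span (max - min) / k."""
    low, high = min(values), max(values)
    width = (high - low) / k
    bins = [[] for _ in range(k)]
    for v in values:
        index = min(int((v - low) / width), k - 1)  # the maximum goes in the last bin
        bins[index].append(v)
    return bins

prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
freq_bins = equal_frequency(prices, 3)
width_bins = equal_width(prices, 3)
print(freq_bins)
print(width_bins)
```

With a range of 210, each equal-width bin spans 70, so most values fall into the first bin.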

14. Use smoothing by bin means, bin medians, and bin boundaries to smooth the following data,
using a bin depth of 6.
Data: 11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21, 22, 23, 24, 30, 40, 45, 45, 45, 71, 72, 73, 75
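
The three smoothing methods can be sketched as follows. The data are split into consecutive bins of six values; each value is then replaced by the bin mean, the bin median, or the nearer of the bin's minimum and maximum. Sending ties to the lower boundary is an assumption, and the function names are illustrative.

```python
import statistics

def partition(values, depth):
    """Split sorted values into consecutive bins of the given depth."""
    return [values[i:i + depth] for i in range(0, len(values), depth)]

def smooth_by_means(bin_values):
    return [statistics.mean(bin_values)] * len(bin_values)

def smooth_by_medians(bin_values):
    return [statistics.median(bin_values)] * len(bin_values)

def smooth_by_boundaries(bin_values):
    low, high = min(bin_values), max(bin_values)
    # Assumption: a value equally close to both boundaries goes to the lower one.
    return [low if v - low <= high - v else high for v in bin_values]

data = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 45, 71, 72, 73, 75]
bins = partition(data, 6)
for b in bins:
    print(b, smooth_by_means(b)[0], smooth_by_medians(b)[0], smooth_by_boundaries(b))
```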

15. Use the methods below to normalize the following group of data:

200, 300, 400, 600, 1000

a) min-max normalization by setting min = 0 and max = 1
b) z-score normalization
c) z-score normalization using the mean absolute deviation instead of standard deviation
d) normalization by decimal scaling
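
The four normalizations can be sketched in a few lines. The z-score here uses the population standard deviation (`statistics.pstdev`), which is an assumption; using the sample standard deviation would give slightly different values. Function names are illustrative.

```python
import statistics

def min_max(values, new_min=0.0, new_max=1.0):
    low, high = min(values), max(values)
    return [(v - low) / (high - low) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]

def z_score_mad(values):
    mu = statistics.mean(values)
    mad = sum(abs(v - mu) for v in values) / len(values)  # mean absolute deviation
    return [(v - mu) / mad for v in values]

def decimal_scaling(values):
    # Smallest j such that every |v| / 10**j is strictly below 1.
    j = 0
    while max(abs(v) for v in values) / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

data = [200, 300, 400, 600, 1000]
print(min_max(data))
print([round(z, 3) for z in z_score(data)])
print([round(z, 3) for z in z_score_mad(data)])
print(decimal_scaling(data))
```

For this data the mean is 500, the mean absolute deviation is 240, and decimal scaling divides by 10,000 because 1000 / 1000 is not strictly below 1.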

16. Compute the Pearson Correlation of the following data:

Weight (kg) Length (cm)

3.63 53.1

3.02 49.7

3.82 48.4

3.42 54.2

3.59 54.9

2.87 43.7

3.03 47.2

3.46 45.2

3.36 54.4

3.3 50.4
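
A minimal sketch for checking the Pearson correlation by hand: the coefficient is the sum of the products of deviations from the means, divided by the product of the square roots of the summed squared deviations. The function name is illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

weight_kg = [3.63, 3.02, 3.82, 3.42, 3.59, 2.87, 3.03, 3.46, 3.36, 3.30]
length_cm = [53.1, 49.7, 48.4, 54.2, 54.9, 43.7, 47.2, 45.2, 54.4, 50.4]
r = pearson(weight_kg, length_cm)
print(round(r, 3))
```

A coefficient near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.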

17. Perform the chi-square test on the following survey data, in which 256 people reported
the month of their birth, assuming the expected distribution of births is uniform across the
months.

January     29
February    24
March       22
April       19
May         21
June        18
July        19
August      20
September   23
October     18
November    20
December    23
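
With a uniform expected distribution, each month's expected count is 256 / 12, and the statistic is the sum of (observed - expected)^2 / expected over the months. The sketch below assumes a 0.05 significance level and takes the critical value for 11 degrees of freedom (19.675) from a standard chi-square table; both are assumptions, since the question does not state a level.

```python
observed = {
    "January": 29, "February": 24, "March": 22, "April": 19,
    "May": 21, "June": 18, "July": 19, "August": 20,
    "September": 23, "October": 18, "November": 20, "December": 23,
}

total = sum(observed.values())
expected = total / len(observed)  # uniform expectation per month
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
degrees_of_freedom = len(observed) - 1

# Assumption: 0.05 significance level; critical value from a standard table.
CRITICAL_VALUE_DF11_ALPHA_005 = 19.675
reject_uniform = chi_square > CRITICAL_VALUE_DF11_ALPHA_005

print(round(chi_square, 3), degrees_of_freedom, reject_uniform)
```

A statistic below the critical value means the data give no reason, at that level, to reject the hypothesis that births are evenly spread across the months.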
