Assignment2

Uploaded by

AVIJEET Swain


Problem Solving

Assignment 2
Data Mining (CSE4052)

1. Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are (in increasing order):
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45,
46, 52, 70.

(a) What is the mean of the data? What is the median?

(b) What is the mode of the data? Comment on the data’s modality (i.e., bimodal, trimodal, etc.).
(c) What is the midrange of the data?
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
What is the interquartile range?
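
The hand calculations for question 1 can be cross-checked with a short Python sketch built on the standard `statistics` module. Quartile conventions differ between textbooks, so the `"exclusive"` method used here is an assumption and may not match the one taught in the course.

```python
import statistics

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean = statistics.mean(ages)            # arithmetic mean
median = statistics.median(ages)        # middle value of the sorted list
modes = statistics.multimode(ages)      # every value tied for the highest count
midrange = (min(ages) + max(ages)) / 2  # average of the smallest and largest value

# Quartile conventions vary; "exclusive" is one common choice (an assumption).
q1, q2, q3 = statistics.quantiles(ages, n=4, method="exclusive")
iqr = q3 - q1

print(round(mean, 2), median, modes, midrange, q1, q3, iqr)
```

Two values tie for the highest count, so `multimode` returns both and the data can be described as bimodal.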

2. Find Q1, Q2, and Q3 for the following data set, and draw a box-and-whisker plot.

{2,6,7,8,8,11,12,13,14,15,22,23}

3. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):

(a) Compute the Euclidean distance between the two objects.

(b) Compute the Manhattan distance between the two objects.
(c) Compute the Minkowski distance between the two objects, using h = 3.
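
All three distances in question 3 are special cases of the Minkowski distance, so one small function covers parts (a) to (c). The sketch below is only a way to check hand calculations; the function name `minkowski` is chosen for illustration.

```python
def minkowski(p, q, h):
    """Minkowski distance of order h between two equal-length tuples."""
    return sum(abs(a - b) ** h for a, b in zip(p, q)) ** (1 / h)

x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

manhattan = minkowski(x, y, 1)   # h = 1
euclidean = minkowski(x, y, 2)   # h = 2
minkowski3 = minkowski(x, y, 3)  # h = 3
print(manhattan, round(euclidean, 4), round(minkowski3, 4))
```

The per-attribute differences are 2, 1, 6 and 2, so the Manhattan distance is 11, the Euclidean distance is the square root of 45, and the order-3 distance is the cube root of 233.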

4. Calculate the median of the following distribution:

Class      Frequency

40 – 44    1
45 – 49    5
50 – 54    9
55 – 59    12
60 – 64    7
65 – 69    2
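
One way to check question 4 is the usual grouped-data formula, median = L + ((N/2 - cf) / f) * h, where L is the lower boundary of the median class, cf the cumulative frequency before it, f its frequency and h its width. The sketch below assumes the inclusive class limits are converted to boundaries by subtracting and adding 0.5; the function name `grouped_median` is hypothetical.

```python
def grouped_median(classes, freqs):
    """Median of grouped data: L + ((N/2 - cf) / f) * h.

    classes: list of (lower, upper) class boundaries, contiguous and ascending.
    freqs:   frequency of each class.
    """
    half = sum(freqs) / 2
    cum = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= half:
            return lower + (half - cum) / f * (upper - lower)
        cum += f
    raise ValueError("empty distribution")

# Inclusive limits 40-44, 45-49, ... become boundaries 39.5-44.5, 44.5-49.5, ...
limits = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69)]
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]
freqs = [1, 5, 9, 12, 7, 2]

median = grouped_median(boundaries, freqs)
print(median)
```
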
5. If the median of a distribution given below is 28.5, then find the value of x and y.

Class       0-10   10-20   20-30   30-40   40-50   50-60   Total

Frequency   5      x       20      15      y       5       60
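
Question 5 can be approached by rearranging the grouped-median formula, median = L + ((N/2 - cf) / f) * h, for the cumulative frequency cf before the median class. The sketch below assumes the median 28.5 lies in the class 20-30, which follows from the class limits; variable names are illustrative.

```python
total = 60      # given total frequency
median = 28.5   # given median, which falls in the class 20-30
lower, width, f_median = 20, 10, 20  # median class: lower boundary, width, frequency

# Rearranged formula: cf = N/2 - (median - L) * f / h
cf = total / 2 - (median - lower) * f_median / width

x = cf - 5                         # cf is the sum of the first two classes: 5 + x
y = total - (5 + x + 20 + 15 + 5)  # whatever frequency is left belongs to 40-50
print(x, y)
```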

6. For a moderately skewed distribution, the mean and median are respectively 26.8 and 27.9.
What is the mode of the distribution?
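
Question 6 is usually answered with the empirical relation for moderately skewed distributions, mean - mode = 3 * (mean - median). A minimal sketch, assuming that relation is the intended method:

```python
mean, median = 26.8, 27.9

# Empirical relation rearranged for the mode: mode = 3 * median - 2 * mean
mode = 3 * median - 2 * mean
print(round(mode, 1))
```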

7. Suppose we have a table with five products, each assigned one of three priorities: Urgent
(assigned the ordinal value of 3), High Priority (assigned the ordinal value of 2), and Low
Priority (assigned the ordinal value of 1). This table also includes their values in numeric form.
The table is as follows:

Object Identifier   Test I (Nominal)   Test II (Ordinal)   Test III (Numerical)

1                   Product A          Low Priority        45
2                   Product B          Urgent              93
3                   Product B          High Priority       65
4                   Product C          High Priority       74
5                   Product A          Low Priority        23

Find the dissimilarity matrix for the mixed-attribute data.
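
A common textbook treatment of mixed attributes averages one dissimilarity per attribute: 0 or 1 for a nominal match or mismatch, the difference of ranks scaled to [0, 1] for an ordinal attribute, and the absolute difference divided by the range for a numeric attribute. The sketch below assumes that scheme with equal attribute weights; the course may expect a different one.

```python
# Rows from the table: (Test I nominal, Test II ordinal, Test III numeric)
objects = [
    ("Product A", "Low Priority", 45),
    ("Product B", "Urgent", 93),
    ("Product B", "High Priority", 65),
    ("Product C", "High Priority", 74),
    ("Product A", "Low Priority", 23),
]
rank = {"Low Priority": 1, "High Priority": 2, "Urgent": 3}

values = [o[2] for o in objects]
value_range = max(values) - min(values)

def dissimilarity(a, b):
    """Average of the per-attribute dissimilarities for two objects."""
    d_nominal = 0.0 if a[0] == b[0] else 1.0
    # Ordinal ranks are mapped onto [0, 1] as (r - 1) / (M - 1) with M = 3 states.
    z_a = (rank[a[1]] - 1) / (len(rank) - 1)
    z_b = (rank[b[1]] - 1) / (len(rank) - 1)
    d_ordinal = abs(z_a - z_b)
    d_numeric = abs(a[2] - b[2]) / value_range
    return (d_nominal + d_ordinal + d_numeric) / 3

matrix = [[dissimilarity(a, b) for b in objects] for a in objects]
for i, row in enumerate(matrix):
    # Print only the lower triangle, since the matrix is symmetric.
    print(" ".join(f"{d:.2f}" for d in row[: i + 1]))
```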

8. Compute a dissimilarity matrix for the data given below, treating gender and weight as
symmetric binary attributes and hair color as a nominal attribute.
Observation #   Gender   Weight   Hair Color

1               Male     Light    Blond
2               Male     Heavy    Blond
3               Male     Light    Brown
4               Female   Heavy    Black
5               Female   Heavy    Blond
6               Male     Light    Brown
7               Female   Light    Black
8               Female   Heavy    Brown
9               Male     Light    Brown
10              Female   Heavy    Black
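
For symmetric binary and nominal attributes, the dissimilarity of two observations reduces to the fraction of attributes on which they disagree. The sketch below applies that mismatch ratio to all three attributes together, which is one reasonable reading of question 8 rather than the only one; the name `mismatch_ratio` is hypothetical.

```python
# (gender, weight, hair color) for observations 1..10, copied from the table
rows = [
    ("Male", "Light", "Blond"),
    ("Male", "Heavy", "Blond"),
    ("Male", "Light", "Brown"),
    ("Female", "Heavy", "Black"),
    ("Female", "Heavy", "Blond"),
    ("Male", "Light", "Brown"),
    ("Female", "Light", "Black"),
    ("Female", "Heavy", "Brown"),
    ("Male", "Light", "Brown"),
    ("Female", "Heavy", "Black"),
]

def mismatch_ratio(a, b):
    """Fraction of attributes on which two observations differ."""
    return sum(u != v for u, v in zip(a, b)) / len(a)

matrix = [[mismatch_ratio(a, b) for b in rows] for a in rows]
for i, row in enumerate(matrix):
    # Lower triangle only; the matrix is symmetric with zeros on the diagonal.
    print(" ".join(f"{d:.2f}" for d in row[: i + 1]))
```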

9. Compute Jaccard similarity matrix for the data given below:

Patient   Test 1     Test 2     Test 3

1         Positive   Negative   Negative
2         Negative   Negative   Positive
3         Negative   Positive   Positive
4         Positive   Positive   Positive
5         Negative   Negative   Positive

10. Consider the following two binary vectors representing the presence (1) or absence (0) of
attributes in two different items:

Item A: [1,0,1,1,0]

Item B: [1,1,1,0,1]

Calculate the Jaccard similarity between Item A and Item B.
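
For binary vectors, the Jaccard similarity counts positions where both items have a 1 and divides by the positions where at least one of them has a 1; positions where both are 0 are ignored. A minimal sketch follows. Returning 0.0 when neither vector contains a 1 is an assumption, since the ratio is undefined in that case.

```python
def jaccard(a, b):
    """Jaccard similarity of two 0/1 vectors: f11 / (f11 + f10 + f01)."""
    both = sum(1 for u, v in zip(a, b) if u == 1 and v == 1)
    either = sum(1 for u, v in zip(a, b) if u == 1 or v == 1)
    return both / either if either else 0.0  # 0.0 for all-zero input is an assumption

item_a = [1, 0, 1, 1, 0]
item_b = [1, 1, 1, 0, 1]
similarity = jaccard(item_a, item_b)
print(similarity)  # 2 shared attributes out of 5 present in either item
```

The same function could be applied pairwise to the patient rows of question 9 once Positive is coded as 1 and Negative as 0.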

11. Consider the following two points in a 3-dimensional space with weights assigned to each
dimension:

Point A: (2, 3, 5)

Point B: (1, 4, 2)

Weights: w1 = 1, w2 = 2, w3 = 3

Calculate the weighted Euclidean distance between Point A and Point B.
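
The sketch below assumes the usual definition in which each weight multiplies the squared difference of its dimension, d = sqrt(w1*(a1 - b1)^2 + w2*(a2 - b2)^2 + w3*(a3 - b3)^2). Some texts square the weights instead, so the convention should be checked against the course notes.

```python
import math

def weighted_euclidean(p, q, w):
    """sqrt(sum_i w_i * (p_i - q_i)^2) for equal-length sequences."""
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(p, q, w)))

point_a = (2, 3, 5)
point_b = (1, 4, 2)
weights = (1, 2, 3)

d = weighted_euclidean(point_a, point_b, weights)
print(round(d, 4))
```

Under this convention the weighted squared differences are 1, 2 and 27, so the distance is the square root of 30.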

12. Consider two vectors in a high-dimensional space:

● Vector X: [3,5,0,2,1]
● Vector Y: [1,0,4,2,3]

Calculate the cosine similarity between Vector X and Vector Y, and interpret the result.
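
Cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms. A minimal sketch for checking the arithmetic in question 12:

```python
import math

def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

vec_x = [3, 5, 0, 2, 1]
vec_y = [1, 0, 4, 2, 3]
sim = cosine_similarity(vec_x, vec_y)
print(round(sim, 4))
```

The dot product is 10 and the norms are the square roots of 39 and 30, which gives a value of roughly 0.29. For non-negative vectors the measure ranges from 0 to 1, so this indicates that the two vectors point in fairly different directions.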

13. Suppose a group of 12 sales price records has been sorted as follows:
5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215.
Partition them into three bins by each of the following methods.
(a) Equal-frequency (equidepth) partitioning
(b) Equal-width partitioning
(c) Clustering
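
Parts (a) and (b) of question 13 can be sketched as follows. The equal-width version assumes bins that are closed on the left and open on the right, with the maximum value placed in the last bin; part (c) depends on the clustering method chosen and is not covered here.

```python
prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
k = 3  # number of bins

# (a) Equal-frequency: each bin receives the same number of sorted values.
depth = len(prices) // k
equal_freq = [prices[i * depth:(i + 1) * depth] for i in range(k)]

# (b) Equal-width: split [min, max] into k intervals of the same width.
lo, hi = min(prices), max(prices)
width = (hi - lo) / k
equal_width = [[] for _ in range(k)]
for p in prices:
    # The maximum value would index bin k, so clamp it into the last bin.
    idx = min(int((p - lo) / width), k - 1)
    equal_width[idx].append(p)

print(equal_freq)
print(equal_width)
```
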
14. Use smoothing by bin means, bin medians, and bin boundaries to smooth the following data,
using a bin depth of 6.
Data: 11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21, 22, 23, 24, 30, 40, 45, 45, 45, 71, 72, 73, 75
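
The three smoothing methods in question 14 can be sketched as below. The data are already sorted, so each run of 6 consecutive values forms one bin. Sending ties to the lower boundary is an assumption, since the question does not state a tie rule.

```python
import statistics

data = [11, 13, 13, 15, 15, 16, 19, 20, 20, 20, 21, 21,
        22, 23, 24, 30, 40, 45, 45, 45, 71, 72, 73, 75]
depth = 6
bins = [data[i:i + depth] for i in range(0, len(data), depth)]

# Replace every value by its bin mean (rounded to 2 decimals) or bin median.
by_means = [[round(statistics.mean(b), 2)] * len(b) for b in bins]
by_medians = [[statistics.median(b)] * len(b) for b in bins]

def to_boundary(v, b):
    """Move v to the closer of the bin's min and max (ties go to the minimum)."""
    return b[0] if v - b[0] <= b[-1] - v else b[-1]

by_boundaries = [[to_boundary(v, b) for v in b] for b in bins]

for name, result in [("means", by_means), ("medians", by_medians),
                     ("boundaries", by_boundaries)]:
    print(name, result)
```
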
15. Use the methods below to normalize the following group of data:

200, 300, 400, 600, 1000

a) min-max normalization by setting min = 0 and max = 1

b) z-score normalization
c) z-score normalization using the mean absolute deviation instead of standard deviation
d) normalization by decimal scaling
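
The four normalizations in question 15 can be sketched as follows. The z-score in part (b) uses the population standard deviation, which is an assumption; a course that uses the sample standard deviation would call `statistics.stdev` instead.

```python
import statistics

data = [200, 300, 400, 600, 1000]
lo, hi = min(data), max(data)
mean = statistics.mean(data)

# (a) min-max normalization onto [0, 1]
min_max = [(v - lo) / (hi - lo) for v in data]

# (b) z-score with the population standard deviation (an assumption)
std = statistics.pstdev(data)
z_score = [(v - mean) / std for v in data]

# (c) z-score with the mean absolute deviation in place of the standard deviation
mad = sum(abs(v - mean) for v in data) / len(data)
z_mad = [(v - mean) / mad for v in data]

# (d) decimal scaling: divide by 10^j for the smallest j with max(|v|) / 10^j < 1
j = 0
while max(abs(v) for v in data) / 10 ** j >= 1:
    j += 1
decimal = [v / 10 ** j for v in data]

print(min_max, z_score, z_mad, decimal, sep="\n")
```
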
16. Compute the Pearson Correlation of the following data:

Weight (kg)   Length (cm)

3.63          53.1
3.02          49.7
3.82          48.4
3.42          54.2
3.59          54.9
2.87          43.7
3.03          47.2
3.46          45.2
3.36          54.4
3.3           50.4
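
The Pearson correlation coefficient is the sum of the products of the deviations from the two means, divided by the product of the square roots of the summed squared deviations. A minimal sketch for question 16, written without third-party libraries:

```python
import math

weight = [3.63, 3.02, 3.82, 3.42, 3.59, 2.87, 3.03, 3.46, 3.36, 3.3]
length = [53.1, 49.7, 48.4, 54.2, 54.9, 43.7, 47.2, 45.2, 54.4, 50.4]

def pearson(x, y):
    """Pearson correlation: co-deviation of x and y over the product of their spreads."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(weight, length)
print(round(r, 3))
```

The coefficient comes out at roughly 0.47, a moderate positive linear relationship between weight and length.
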
17. Perform the chi-square test for correlation on the following survey observations, in which
256 people reported their month of birth and the expected distribution of births is even
across all months.

January     29
February    24
March       22
April       19
May         21
June        18
July        19
August      20
September   23
October     18
November    20
December    23
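
With a uniform expectation, the test statistic for question 17 is the sum of (observed - expected)^2 / expected over the 12 months, compared against a chi-square distribution with 11 degrees of freedom. The sketch below assumes a 0.05 significance level, which the question does not state, and takes the critical value from a standard chi-square table.

```python
observed = {
    "January": 29, "February": 24, "March": 22, "April": 19,
    "May": 21, "June": 18, "July": 19, "August": 20,
    "September": 23, "October": 18, "November": 20, "December": 23,
}

n = sum(observed.values())     # 256 respondents
expected = n / len(observed)   # uniform expectation per month

chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
dof = len(observed) - 1        # 11 degrees of freedom

# Critical value for 11 degrees of freedom at the 0.05 level, from a standard
# chi-square table; the significance level itself is an assumption.
critical = 19.675
print(round(chi_square, 4), dof, chi_square > critical)
```

The statistic is well below the critical value, so under these assumptions the hypothesis of evenly distributed birth months is not rejected.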
