Assign 1

This document outlines a data mining assignment consisting of 5 parts: 1. Analyze and compare student exam results from 2020 and 2021 using statistical analysis and plots. 2. Download the dry bean dataset, report the attribute types, compute summaries, means, and standard deviations for continuous attributes, and generate plots. 3. Download and explore the Weka data mining tool using the Iris dataset, reporting basic statistics and a scatter plot matrix. 4. Compute dissimilarity matrices using Euclidean and Manhattan distances for 4 points in 3D space and plot the relationship between the two measures. 5. Compute a dissimilarity matrix for sample data with different attribute types, and suggest the most similar friend to "Ali".

Uploaded by

Suleman Butt

Data Mining Assignment 1: Data Understanding

Submission: Submit a hard copy of the assignment in the second Data Mining class of the week (23 or 24 Nov. 2023).

1. (20 points)
Apply your basic data mining knowledge to compare students' performance in the midterm exam of a
course across two years, 2020 and 2021 (result_20_21.xls). Support your comments and comparison
with a statistical description of the data (e.g., mean, median, mode, variance, five-number summary)
and with plots (e.g., boxplot, histogram). (2 to 3 pages report required)
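
The statistical description asked for in question 1 can be sketched with Python's standard library. This is only an illustration under stated assumptions: the layout of result_20_21.xls is not given here, so the two score lists below are made-up placeholders, and the helper name describe is hypothetical. Quartiles use the "inclusive" interpolation method; other conventions give slightly different Q1 and Q3.

```python
import statistics


def describe(scores):
    """Return basic descriptive statistics for a list of numeric scores."""
    ordered = sorted(scores)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "mode": statistics.multimode(ordered),      # list of all modes
        "variance": statistics.variance(ordered),   # sample variance (n - 1)
        "five_number": (ordered[0], q1, q2, q3, ordered[-1]),
    }


# Made-up placeholder scores, NOT taken from result_20_21.xls.
scores_2020 = [35, 42, 42, 50, 58, 61, 70]
scores_2021 = [40, 45, 52, 52, 60, 66, 75]

for year, scores in (("2020", scores_2020), ("2021", scores_2021)):
    print(year, describe(scores))
```

The same per-year dictionaries can feed a side-by-side boxplot or histogram in whichever plotting tool is used for the report.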

2. (20 points)
Download the DryBean dataset from the UCI Machine Learning Repository. Read the dataset's description and report
the following (use any language or tool of your choice to solve this problem):

a. The types of the attributes (continuous [interval, ratio], categorical [nominal, ordinal]). Also identify which
attribute(s) are input attribute(s) and which are class attribute(s) (if any).
b. Compute the five-number summary for any two continuous attributes. Compute the mode for categorical
attributes.
c. Compute the mean and standard deviation for the two continuous attributes.
d. Generate the quantile (percentile) plots for any two attributes in the dataset.
e. Generate the histogram or distribution plot for each of the two attributes selected in (b).
f. Generate the scatter plots for the two attributes selected in (d).
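
For parts (c) and (d), a quantile plot pairs each sorted value x_i with the fraction f_i = (i - 0.5) / n, which is one common textbook convention. The sketch below assumes that convention and uses placeholder numbers rather than actual DryBean values; quantile_plot_points is a hypothetical helper name.

```python
import statistics


def quantile_plot_points(values):
    """Pair each sorted value x_i with f_i = (i - 0.5) / n, for i = 1..n."""
    ordered = sorted(values)
    n = len(ordered)
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]


# Placeholder numbers for illustration only; they are NOT DryBean values.
sample = [3.0, 1.0, 4.0, 2.0]

points = quantile_plot_points(sample)
print(points)  # plot f on the x-axis and the attribute value on the y-axis

# Mean and sample standard deviation (n - 1 in the denominator) for part (c).
print(statistics.mean(sample), statistics.stdev(sample))
```

If the population standard deviation is wanted instead, statistics.pstdev divides by n rather than n - 1.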

3. (10 points)
Download and install Weka, a data mining tool, on your system. Explore the tool and the datasets provided
with the installation. Submit a report containing basic statistics and plots (e.g., scatter plot matrix) for the Iris
dataset using the Weka tool. (2 to 3 pages report required)

The following links can be useful.

https://sourceforge.net/projects/weka/

https://www.cs.waikato.ac.nz/ml/weka/

https://waikato.github.io/weka-wiki/downloading_weka/

4. (30 points) Handwritten solution is required.

a. Given the following four points in 3-D space, compute and show the dissimilarity matrix, using
Euclidean distance as the dissimilarity measure: A(4,5,5), B(5,3,3), C(1,1,0), D(4,4,1).
b. Repeat part (a) using Manhattan distance as the dissimilarity measure.
c. Draw a scatter plot for the distances obtained in parts (a) and (b) to identify the relationship
between the two dissimilarity measures.
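
A handwritten solution is required for question 4, but the two matrices can be checked with a short script. The sketch below uses the four points from part (a); the function and variable names are illustrative. Euclidean distance is the square root of the summed squared coordinate differences, and Manhattan distance is the sum of the absolute coordinate differences.

```python
import math

points = {"A": (4, 5, 5), "B": (5, 3, 3), "C": (1, 1, 0), "D": (4, 4, 1)}


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def dissimilarity_matrix(pts, dist):
    """Map every ordered pair of point names to its distance."""
    return {(i, j): dist(pts[i], pts[j]) for i in pts for j in pts}


names = list(points)
euclid = dissimilarity_matrix(points, euclidean)
manhat = dissimilarity_matrix(points, manhattan)

for label, matrix in (("Euclidean", euclid), ("Manhattan", manhat)):
    print(label)
    for i in names:
        print(i, " ".join(f"{matrix[(i, j)]:6.3f}" for j in names))

# One (Euclidean, Manhattan) pair per unordered pair of points, for part (c).
pairs = [(euclid[(i, j)], manhat[(i, j)])
         for k, i in enumerate(names) for j in names[k + 1:]]
print(pairs)
```

The pairs list holds the six coordinates to plot in part (c), with the Euclidean distance on one axis and the Manhattan distance on the other.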

5. (20 points) Handwritten solution is required.

Name    Fever   Cough   Height  Weight  Profession  City
Ali     N       Y       65      80      Student     Lahore
Bilal   Y       Y       55      65      Student     Karachi
Khan    N       N       70      75      Teacher     Lahore
Ahmed   Y       N       60      55      Doctor      Islamabad

Given the data above, compute the dissimilarity matrix. Fever and Cough are asymmetric binary, Height and
Weight are numeric, Profession and City are nominal attributes. Who should be suggested as a friend to Ali
based on your computed dissimilarity matrix?
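
One way to check a handwritten answer to question 5 is the usual mixed-type combination, in which each attribute contributes a dissimilarity in [0, 1] and the contributions are averaged over the attributes that count for that pair. The sketch below rests on assumptions the assignment does not state: Y is treated as the positive value for the asymmetric binary attributes (so a shared N is skipped), numeric differences are divided by the attribute's range in the table, and nominal attributes contribute 0 on a match and 1 otherwise. If the course uses a different convention, the numbers will differ.

```python
# Rows from the table: (fever, cough, height, weight, profession, city).
people = {
    "Ali":   ("N", "Y", 65, 80, "Student", "Lahore"),
    "Bilal": ("Y", "Y", 55, 65, "Student", "Karachi"),
    "Khan":  ("N", "N", 70, 75, "Teacher", "Lahore"),
    "Ahmed": ("Y", "N", 60, 55, "Doctor", "Islamabad"),
}
BINARY, NUMERIC, NOMINAL = (0, 1), (2, 3), (4, 5)

# Range (max - min) of each numeric attribute, used for normalisation.
ranges = {
    f: max(p[f] for p in people.values()) - min(p[f] for p in people.values())
    for f in NUMERIC
}


def dissimilarity(a, b):
    """Average per-attribute dissimilarity over the attributes that count."""
    total, count = 0.0, 0
    for f in BINARY:
        if a[f] == "N" and b[f] == "N":
            continue  # assumption: a shared negative carries no information
        total += 0.0 if a[f] == b[f] else 1.0
        count += 1
    for f in NUMERIC:
        total += abs(a[f] - b[f]) / ranges[f]
        count += 1
    for f in NOMINAL:
        total += 0.0 if a[f] == b[f] else 1.0
        count += 1
    return total / count


names = list(people)
for i in names:
    print(f"{i:6}", " ".join(f"{dissimilarity(people[i], people[j]):.3f}"
                             for j in names))
```

The row for Ali lists his dissimilarity to each person; the smallest non-zero entry identifies the closest match under these assumptions.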
