The Handwritten Solutions To The First Five Questions, and The Report of Last Question

This document provides instructions for Assignment 3 on data pre-processing. It is due on April 24th and is worth 100 marks. The assignment involves answering questions on data smoothing, normalization, binning, and outlier detection. It also involves using the Weka data mining tool to explore pre-processing techniques like discretization, normalization, resampling and attribute selection on sample datasets. A report summarizing the experiments in Weka is required.

Uploaded by

Qä Sïm

Assignment 3: Data Pre-processing Due Date: 24 April, 2020 Marks: 100

Submission Instructions: Submit a single pdf file (or zipped folder) on LMS containing
the handwritten solutions to the first five questions, and the report of last question.

1. Given the following data (in ascending order) for the attribute age: 12, 15, 16, 16,
19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52,
72. [10 Marks]
(a) Use smoothing by bin means to smooth these data, using a bin depth of 3.
Illustrate your steps. Comment on the effect of this technique for the given data.
(b) How might you determine outliers in the data?
(c) What other methods are there for data smoothing?
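As an illustration of parts (a) and (b), the following is a minimal Python sketch, not a model answer. It assumes that "bin depth of 3" means consecutive equal-depth bins of three sorted values, and it uses the common 1.5 × IQR rule as one possible outlier test; the function names and the rounding to two decimals are choices made here, not part of the assignment.

```python
from statistics import quantiles

def smooth_by_bin_means(sorted_values, depth):
    """Replace every value with the mean of its equal-depth bin."""
    smoothed = []
    for start in range(0, len(sorted_values), depth):
        bin_values = sorted_values[start:start + depth]
        bin_mean = sum(bin_values) / len(bin_values)
        # Every member of the bin takes the (rounded) bin mean.
        smoothed.extend([round(bin_mean, 2)] * len(bin_values))
    return smoothed

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, k = 1.5)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]

# Age data exactly as given in question 1.
age = [12, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
       33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 72]

smoothed_age = smooth_by_bin_means(age, depth=3)
print(smoothed_age)       # nine bins, each value replaced by its bin mean
print(iqr_outliers(age))  # values flagged by the 1.5 * IQR rule
```

Other quartile conventions or a different multiplier `k` can move the fences, so the flagged values depend on the rule chosen.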

2. What are the ranges of the following normalization methods? [10 Marks]
(a) min-max normalization
(b) z-score normalization
(c) normalization by decimal scaling
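For reference, the three methods are usually defined as below. These are the standard textbook forms, written in notation chosen here rather than taken from the assignment: $v$ is an original value of attribute $A$, $v'$ is the normalized value, and $[\mathit{new\_min}_A, \mathit{new\_max}_A]$ is the target range for min-max normalization.

```latex
\begin{align*}
\text{min-max:} \quad
  v' &= \frac{v - \min_A}{\max_A - \min_A}
        \left(\mathit{new\_max}_A - \mathit{new\_min}_A\right)
        + \mathit{new\_min}_A \\[4pt]
\text{z-score:} \quad
  v' &= \frac{v - \bar{A}}{\sigma_A}
  \qquad (\bar{A} = \text{mean of } A,\ \sigma_A = \text{standard deviation of } A) \\[4pt]
\text{decimal scaling:} \quad
  v' &= \frac{v}{10^{j}}
  \qquad (j = \text{smallest integer such that } \max |v'| < 1)
\end{align*}
```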

3. Use these methods to normalize the following group of data: 200, 300, 400,
600, 1000 [10 Marks]
(a) min-max normalization by setting min = 0 and max = 1
(b) z-score normalization
(c) normalization by decimal scaling
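The three normalizations can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the z-score uses the population standard deviation (`statistics.pstdev`), whereas some courses use the sample standard deviation, which gives slightly different values; the function names are hypothetical.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values onto [new_min, new_max]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) * (new_max - new_min) + new_min
            for v in values]

def z_score(values):
    """Centre on the mean and divide by the population standard deviation
    (assumption: population rather than sample deviation)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max |v'| < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

# Data exactly as given in question 3.
data = [200, 300, 400, 600, 1000]

print(min_max(data))
print([round(z, 4) for z in z_score(data)])
print(decimal_scaling(data))
```

Note that the largest value, 1000, forces `j = 4` under the strict condition max |v'| < 1; dividing by 1000 would map it to exactly 1.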

4. Using the data for age given in question 1, answer the following: [10 Marks]
(a) Use min-max normalization to transform the value 35 for age onto the range [0.0,
1.0].
(b) Use z-score normalization to transform the value 35 for age.
(c) Use normalization by decimal scaling to transform the value 35 for age.
(d) Comment on which method you would prefer to use for the given data, giving
reasons as to why.

5. Suppose a group of 12 sales price records has been sorted as follows: [10 Marks]
5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215.
Partition them into three bins by each of the following methods:
(a) equal-frequency (equal-depth) partitioning
(b) equal-width partitioning
(c) clustering
(d) do numerosity reduction of these data to obtain 50% data reduction.
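Parts (a) and (b) can be sketched in Python as below. This is a hedged illustration, not a model answer: the equal-frequency helper assumes the number of records divides evenly by the number of bins, and the equal-width helper assumes half-open intervals with the maximum value placed in the last bin. Both function names are hypothetical.

```python
def equal_frequency_bins(sorted_values, n_bins):
    """Split sorted values into n_bins bins holding the same number of items.
    Assumes len(sorted_values) is divisible by n_bins."""
    size = len(sorted_values) // n_bins
    return [sorted_values[i * size:(i + 1) * size] for i in range(n_bins)]

def equal_width_bins(values, n_bins):
    """Split the range [min, max] into n_bins intervals of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        bins[index].append(v)
    return bins

# Sales prices exactly as given in question 5.
prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]

print(equal_frequency_bins(prices, 3))
print(equal_width_bins(prices, 3))
```

With these data the interval width is (215 - 5) / 3 = 70, so most records fall into the first interval; comparing the two outputs shows how skewed data affects equal-width binning.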

6. Pre-processing using Weka [50 Marks]
Weka (https://www.cs.waikato.ac.nz/ml/weka/) is a collection of machine
learning algorithms for solving real-world data mining problems. It is written in Java
and runs on almost any platform. The algorithms can either be applied directly to a
dataset or called from your own Java code. You should install Weka (if not already
done) on your machine, experiment with the following parts, and submit a report.

a) Load Iris dataset (available in the data folder of Weka installation). Explore
different filters in the Preprocess tab and demonstrate the use of the following:
attribute: Discretize, Normalize, Standardize, MathExpression
instance: Resample, Randomize

b) Apply attribute selection (from the Select attributes tab) and report the resulting
attributes, using the following methods:
CfsSubsetEval
PrincipalComponents
c) Apply dimensionality reduction to the SGPA dataset you created in assignment 1
for prediction of your Spring 2020 SGPA. Report the original dimensions and the
reduced dimensions obtained.
