Assignment#2 RT WQ2021

This document outlines an assignment for a data mining course. It includes 4 problems related to data preprocessing tasks like data visualization, normalization, binning, and correlation analysis. Students are asked to perform these operations on datasets relating to hospital patient information, loan approvals, and Spotify music listening data. The assignment is worth 50 total points and is due by midnight on February 11th, 2021. Late submissions within 3 days are allowed but will incur a penalty.

Uploaded by

Manoj Vemuri

DSC 441: Winter 2020-2021 Assignment #2, Page 1 of 2

Assignment #2

Due Date: Thursday, February 11th, 2021, by midnight

Total number of points: 50 points

Problem 1 (10 points): This problem is an example of data preprocessing needed in a data mining process.
Suppose that a hospital tested the age and body fat data for 18 randomly selected adults with the following
results:

Age    26    26    29    29    40    45    50    55    60
%fat   10.5  30.5  8.8   20.8  32.4  26.9  30.4  30.2  33.2

Age    55    45    60    55    61    62    63    75    66
%fat   36.6  44.5  30.8  35.4  33.2  36.1  37.9  43.2  37.7

a. (2 points) Draw the box-plots for age and %fat. Interpret the distribution of the data.
b. (2 points) Normalize the two attributes based on z-score normalization.
c. (2 points) Regardless of the original ranges of the variables, normalization techniques transform
the data into new ranges that allow variables to be compared and used on the same scale. What are the
value ranges of the following normalization methods (for this data set and in general)? Explain
and back up your answer.
i. Min-max normalization
ii. Z-score normalization
iii. Normalization by decimal scaling.
d. (2 points) Draw a scatterplot based on the two variables and interpret the relationship between the
two variables.
e. (2 points) Calculate the correlation matrix. Are these two attributes positively or negatively
correlated? Calculate the covariance matrix. How is the correlation matrix different from the
covariance matrix?
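As a starting point for parts b and e, the sketch below computes z-scores and the covariance and correlation of the two attributes using only the Python standard library. It assumes the sample standard deviation (dividing by n - 1); the function names are illustrative and not part of the assignment. It also shows one way to see how the two matrices relate: the covariance of the z-scored variables equals the correlation of the original ones.

```python
from statistics import mean, stdev

# Data from the table in Problem 1.
age = [26, 26, 29, 29, 40, 45, 50, 55, 60, 55, 45, 60, 55, 61, 62, 63, 75, 66]
fat = [10.5, 30.5, 8.8, 20.8, 32.4, 26.9, 30.4, 30.2, 33.2,
       36.6, 44.5, 30.8, 35.4, 33.2, 36.1, 37.9, 43.2, 37.7]


def z_scores(values):
    """Z-score normalization: subtract the mean, divide by the sample std dev."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator, an assumption of this sketch)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))


age_z = z_scores(age)
fat_z = z_scores(fat)

cov_raw = covariance(age, fat)      # depends on the units of age and %fat
corr = correlation(age, fat)        # unit-free, always between -1 and 1
cov_of_z = covariance(age_z, fat_z) # matches corr up to rounding error

print(f"covariance={cov_raw:.3f} correlation={corr:.3f} cov of z-scores={cov_of_z:.3f}")
```

The off-diagonal entry of the 2x2 covariance matrix is `cov_raw` and the diagonal holds the two variances; the correlation matrix has ones on the diagonal and `corr` off the diagonal.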

Problem 2 (10 points): There are two parts to this discussion assignment.

Part 1: Given the following set of data, bin the values into equal-width bins (choose how many) and then smooth the
data by replacing each item with the median value of its bin. Show the new bins and show the new list of the data
after smoothing.

18, 8, 22, 10, 12, 5, 4, 32, 2, 9, 16, 25, 26, 28
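One possible way to carry out Part 1 is sketched below. The choice of three bins is only an assumption for illustration (the assignment leaves the number open), and the helper name `smooth_by_bin_median` is hypothetical. The sketch assumes the values are not all identical, since the bin width would then be zero.

```python
from statistics import median

data = [18, 8, 22, 10, 12, 5, 4, 32, 2, 9, 16, 25, 26, 28]


def smooth_by_bin_median(values, k):
    """Split [min, max] into k equal-width bins, then replace each value
    with the median of the bin it falls into."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    indices = [min(int((v - lo) / width), k - 1) for v in values]
    bins = {}
    for v, i in zip(values, indices):
        bins.setdefault(i, []).append(v)
    medians = {i: median(members) for i, members in bins.items()}
    return bins, [medians[i] for i in indices]


bins, smoothed = smooth_by_bin_median(data, k=3)
for i in sorted(bins):
    print(f"bin {i}: {sorted(bins[i])}")
print(smoothed)
```

With k = 3 the bin width is 10, so the bins cover [2, 12), [12, 22), and [22, 32]. A different k gives different bins and a different smoothed list.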

Part 2: Normalize the same data above with three techniques: min-max (to range 10 to 20), standardization, and
decimal scaling. What value gets mapped to 0 in each case? What are the min and max values after normalization
with each?
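The three techniques in Part 2 can be sketched as follows, using only the standard library. Standardization here uses the sample standard deviation, which is an assumption; the function names are illustrative. Decimal scaling divides by the smallest power of ten that brings every absolute value below 1.

```python
from statistics import mean, stdev

data = [18, 8, 22, 10, 12, 5, 4, 32, 2, 9, 16, 25, 26, 28]


def min_max(values, new_min, new_max):
    """Linearly map [min, max] of the data onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def standardize(values):
    """Z-score standardization with the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def decimal_scale(values):
    """Divide by 10**j, with j the smallest integer so that max |v| / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return j, [v / 10 ** j for v in values]


mm = min_max(data, 10, 20)
std = standardize(data)
j, dec = decimal_scale(data)

for name, result in [("min-max", mm), ("z-score", std), ("decimal", dec)]:
    print(f"{name}: min={min(result):.3f} max={max(result):.3f}")
```

Printing the minimum and maximum of each result gives the post-normalization range for this data set; setting each formula equal to 0 and solving for the input shows which original value, if any, maps to 0.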

Problem 3 (20 points): For this problem, you will load and perform some cleaning steps on a dataset in the
provided BankData.csv, which is data about loan approvals from a bank in Japan (it has been modified from the
original for our purposes in class, so use the provided version). Specifically, you will use visualization to examine
the variables and normalization, binning and smoothing to change them in particular ways.

a. Visualize the distributions of the variables in this data. You can choose bar graphs or histograms. Make
appropriate choices given each type of variables and be careful when selecting parameters like the number of bins
for the histograms. Note there are some numerical variables and some categorical ones. The ones labeled as a ‘bool’
are Boolean variables, meaning they are only true or false and are thus a special type of categorical. Checking all the
distributions with visualization and summary statistics is a typical step when beginning to work with new data.

b. Now apply normalization to some of these numerical distributions. Specifically, choose to apply z-score to one,
min-max to another, and decimal scaling to a third.

c. Visualize the new distributions for the variables that have been normalized. What has changed from the previous
visualization?

d. Choose one of the numerical variables to work with for this problem. Let’s call it v. Create a new variable called
v_bins that is a binned version of that variable. This v_bins will have a new set of values like low, medium, high.
Choose the actual new values (you don’t need to use low, medium, high) and the ranges of v that they represent
based on your understanding of v from your visualizations. You can use equal depth, equal width or custom ranges.
Explain your choices: why did you choose to create that number of values and those particular ranges? (Explore
SPSS visual binning)
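For part d, one option among those listed is equal-depth binning, sketched below in plain Python. The values in `v` are hypothetical placeholders, not taken from BankData.csv, and the labels low, medium, high are just the example names from the text. Note that with tied values this simple rank-based approach can place equal values in different bins.

```python
def equal_depth_labels(values, labels):
    """Assign each value a label so that every label covers roughly the
    same number of values (equal-depth / equal-frequency binning)."""
    k, n = len(labels), len(values)
    # Indices of the values in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    result = [None] * n
    for rank, i in enumerate(order):
        result[i] = labels[rank * k // n]
    return result


# Hypothetical values standing in for the chosen numerical variable v.
v = [1200, 300, 4500, 800, 2500, 150, 9800, 640, 3100]
v_bins = equal_depth_labels(v, ["low", "medium", "high"])

for value, label in zip(v, v_bins):
    print(value, label)
```

Equal-depth bins keep the group sizes balanced even when the distribution is skewed, whereas equal-width or custom ranges may be easier to explain; which is more appropriate depends on what the visualizations of v show.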

Problem 4 (10 points): Download the Spotify Dataset along with the description from D2L.
a) (5 points) Describe the data in terms of number of attributes, number of cases, class distribution. Is there
any correlation between features? Explain your answer.
b) (5 points) Report the ranges for each numerical variable. Would you recommend normalizing the data? If
yes, which approach would you apply? Justify your answer.
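A minimal sketch of how the ranges in part b could be collected is shown below. The column names and rows in the in-memory CSV are hypothetical stand-ins, since the actual layout of the Spotify dataset is described only in the D2L download; with the real file, the `io.StringIO` object would be replaced by an opened file handle.

```python
import csv
import io

# Hypothetical sample standing in for the real dataset file.
sample_csv = io.StringIO(
    "track,tempo,danceability,liked\n"
    "a,120.0,0.71,1\n"
    "b,98.5,0.43,0\n"
    "c,174.2,0.88,1\n"
)


def numeric_ranges(rows):
    """Return {column: (min, max)} for every column whose values all parse as numbers."""
    columns = {}
    for row in rows:
        for name, raw in row.items():
            columns.setdefault(name, []).append(raw)
    ranges = {}
    for name, raws in columns.items():
        try:
            numbers = [float(r) for r in raws]
        except ValueError:
            continue  # skip columns with non-numeric entries
        ranges[name] = (min(numbers), max(numbers))
    return ranges


ranges = numeric_ranges(csv.DictReader(sample_csv))
for name, (lo, hi) in ranges.items():
    print(f"{name}: {lo} to {hi}")
```

Comparing the printed ranges side by side is one way to judge whether the variables sit on very different scales, which bears on the normalization question.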

Submission Instructions

1. Answer the problems and write your answers in a Word document.
2. Submit your file online at http://d2l.depaul.edu and check your submission.
3. Keep a copy of all your submissions!
4. If you have questions about the homework, email me BEFORE the deadline.
5. Late submissions are allowed with a 5%, 10%, and 15% penalty for one, two, and three days late,
respectively.
6. No late work will be accepted more than three days after the due date.
