1.4 Data Analysis with Python- Data Wrangling 2

Uploaded by

namdudotran1

Data Wrangling

Objectives
• Describe data normalization
• Demonstrate the use of binning
• Demonstrate the use of categorical variables

Data Normalization in Python
• Normalization brings feature values with different ranges onto a uniform scale.
• In the used car data set:
• The feature length ranges from 150 to 250.
• The features width and height range from 50 to 100.
• We may want to normalize these variables so that the range of the values is consistent.
• This normalization can make some statistical analyses easier down the road.

Data Normalization in Python
• Why is data normalization important?
• Students give their answers.

Data Normalization in Python
• Methods of normalizing data
1. Simple feature scaling: divide each value by the maximum value for that feature. For non-negative data, the new values range between zero and one.
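
A minimal sketch of simple feature scaling with pandas. The DataFrame, the column name `length`, and the sample values are made up for illustration and are not taken from the actual used car data set.

```python
import pandas as pd

# Hypothetical sample values; the real data set is not reproduced here.
df = pd.DataFrame({"length": [150.0, 200.0, 250.0]})

# Simple feature scaling: x_new = x_old / x_max
df["length_scaled"] = df["length"] / df["length"].max()

print(df)
```

The largest value maps to exactly one, and every other value becomes a fraction of it.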

Data Normalization in Python
• Methods of normalizing data
2. Min-max: from each value X_old, subtract the minimum value of that feature, then divide by the range (maximum minus minimum) of that feature. The new values range between zero and one.

Data Normalization in Python
• Methods of normalizing data
3. Z-score: from each value, subtract the mean (mu) of the feature, then divide by its standard deviation (sigma). The resulting values hover around zero.
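
A short sketch of the z-score method in pandas, using a made-up `height` column. Note that `Series.std()` computes the sample standard deviation (`ddof=1`) by default; whether the sample or population form is intended is not stated in the slides, so that choice is an assumption here.

```python
import pandas as pd

# Hypothetical sample values for illustration.
df = pd.DataFrame({"height": [50.0, 60.0, 70.0, 80.0, 90.0]})

col = df["height"]
# Z-score: x_new = (x_old - mu) / sigma
# Series.std() uses the sample standard deviation (ddof=1) by default.
df["height_z"] = (col - col.mean()) / col.std()

print(df)
```

After the transformation the column has mean zero and standard deviation one, with values below the mean negative and values above it positive.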

Binning in Python
• What is binning?
• Binning groups values into “bins”. For example, you can bin “age” into [0 to 5], [6 to 10], [11 to 15].
• Grouping a set of numerical values into a smaller number of bins can give a better understanding of the data distribution. For example, we can categorize the price into three bins: low price, medium price, and high price.
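
One possible way to build the three price bins is with `numpy.linspace` and `pandas.cut`. This is a sketch under assumptions: the prices are invented, the bins are taken to be of equal width, and the labels `Low`, `Medium`, and `High` are placeholder names.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for illustration.
df = pd.DataFrame(
    {"price": [5000, 10000, 12000, 12000, 30000, 31000, 39000, 44000, 44500]}
)

# Four equally spaced edges between the minimum and maximum price
# define three equal-width bins.
bins = np.linspace(df["price"].min(), df["price"].max(), 4)
group_names = ["Low", "Medium", "High"]

# include_lowest=True keeps the minimum price inside the first bin.
df["price_binned"] = pd.cut(
    df["price"], bins, labels=group_names, include_lowest=True
)

print(df["price_binned"].value_counts())
```

Equal-width bins are only one choice; bin edges can also be set by hand when domain knowledge suggests more meaningful cut points.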

Turning categorical variables into quantitative variables
• Problem
• Most statistical models cannot take objects or strings as input.
• How can we turn categorical variables (strings) into quantitative variables (numeric)?
• For example, the fuel type feature is a categorical variable with two values, gas or diesel, which are in string format.

Turning categorical variables into quantitative variables
• Categorical → Numeric: the “one-hot encoding” technique
• Add a dummy variable for each unique category
• Assign 0 or 1 in each category
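
To make the idea concrete, the steps above can be sketched by hand, without any dedicated encoding function. The column name `fuel_type` and the four sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical fuel type column with the two values named in the slides.
df = pd.DataFrame({"fuel_type": ["gas", "diesel", "gas", "gas"]})

# One dummy column per unique category:
# 1 where the row belongs to that category, 0 otherwise.
for category in df["fuel_type"].unique():
    df[category] = (df["fuel_type"] == category).astype(int)

print(df)
```

Each row ends up with exactly one 1 across the dummy columns, which is where the name "one-hot" comes from.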

Turning categorical variables into quantitative variables
• pandas.get_dummies(): converts categorical variables to dummy variables (0 or 1)
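
A minimal usage sketch of `pandas.get_dummies()`. The `fuel_type` column and its rows are made up; `dtype=int` is passed so the dummy columns hold 0 and 1 rather than booleans, which recent pandas versions return by default.

```python
import pandas as pd

# Hypothetical fuel type column for illustration.
df = pd.DataFrame({"fuel_type": ["gas", "diesel", "gas", "gas"]})

# One column per category; dtype=int gives 0/1 instead of True/False.
dummies = pd.get_dummies(df["fuel_type"], dtype=int)

# Attach the dummy columns to the original frame.
df = pd.concat([df, dummies], axis=1)

print(df)
```

The resulting columns are named after the category values, here `diesel` and `gas`.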

Summary
• Describe data normalization
• Demonstrate the use of binning
• Demonstrate the use of categorical variables

Q&A
