1.3 Data Analysis with Python- Data Wrangling 1

Uploaded by namdudotran1
Copyright © All Rights Reserved

Data Wrangling

Objectives
• Pre-processing Data in Python
• Describe how to handle missing values
• Describe data formatting techniques
• Describe data normalization
• Demonstrate the use of binning
• Demonstrate the use of categorical variables

Pre-processing Data in Python
• Data preprocessing is a necessary step in data analysis.
• It is the process of converting or mapping data from one raw form into another format to make it ready for further analysis.
• Data preprocessing is often called data cleaning or data wrangling. It includes:
  • Identifying and handling missing values
  • Data formatting
  • Data normalization (centering/scaling)
  • Data binning
  • Turning categorical values into numeric variables

Dealing with Missing Values
• A missing value occurs whenever a data entry is left empty, that is, when no data value is stored for a variable in an observation.
• In a data set, a missing value may appear as a question mark ("?"), a zero (0), or simply a blank cell.
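A minimal pandas sketch of the point above, assuming a hypothetical data frame in which "?" and empty strings stand for missing entries (the column names are illustrative only). Both markers are first converted to NaN so that pandas recognizes them as missing, and isnull().sum() then counts the missing values per column. Zeros are deliberately left alone, since whether a 0 means "missing" depends on the variable.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: "?" and "" (a blank cell) both mean "no value stored".
df = pd.DataFrame({
    "normalized-losses": ["?", "164", "158", "?"],
    "num-of-doors": ["two", "four", "", "four"],
})

# Standardize every missing-value marker to NaN so pandas can detect it.
df = df.replace(["?", ""], np.nan)

# Count the missing values in each column.
missing_counts = df.isnull().sum()
print(missing_counts)
```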

Dealing with Missing Values
• How to deal with missing data?

--> Students share their opinions.

Dealing with Missing Values
• How to deal with missing data?
  • Go back and find what the actual value should be.
  • Remove the data where the missing value is found:
    • Drop the whole variable (column), or
    • Drop only the single data entry (row) with the missing value.
    • If only a few observations have missing data, dropping those particular entries is usually best.
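The choice between dropping a whole variable and dropping single entries can be sketched in pandas as follows. This is only an illustration under an assumed rule of thumb: the 50% threshold (MAX_MISSING_FRACTION) and the sample data are hypothetical, not part of the course material. A column that is mostly empty is dropped as a variable; for the remaining columns, only the rows that still contain missing values are removed.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "engine-notes" is mostly empty, the others miss one value each.
df = pd.DataFrame({
    "price": [13495.0, np.nan, 16500.0, 13950.0, 17450.0],
    "stroke": [2.68, 3.47, np.nan, 3.40, 3.40],
    "engine-notes": [np.nan, np.nan, np.nan, "turbo", np.nan],
})

# Assumed threshold: drop a variable if more than half of its values are missing.
MAX_MISSING_FRACTION = 0.5  # hypothetical value, tune per data set

# isnull().mean() gives the fraction of missing values in each column.
missing_fraction = df.isnull().mean()
sparse_columns = missing_fraction[missing_fraction > MAX_MISSING_FRACTION].index
df_clean = df.drop(columns=sparse_columns)

# For the remaining columns, drop only the entries (rows) with missing values.
df_clean = df_clean.dropna(axis=0)
print(df_clean)
```

Here "engine-notes" (80% missing) is dropped as a variable, while only two of five rows are lost for "price" and "stroke".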

Dealing with Missing Values
• How to deal with missing data?
  • Replace the missing values:
    • Replace with an average (for numeric variables)
    • Replace by frequency (with the most common value, e.g. for categorical variables)
    • Replace based on other functions
  • Leave it as missing data:
    • It may be useful to keep that observation even if some features are missing.
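A minimal sketch of the first two replacement strategies, assuming a small hypothetical data frame. The numeric column is filled with its mean; the categorical column is filled "by frequency", i.e. with its most common value, found here with value_counts().idxmax().

```python
import numpy as np
import pandas as pd

# Hypothetical data with one numeric and one categorical variable.
df = pd.DataFrame({
    "bore": [3.47, np.nan, 2.68, 3.19],
    "num-of-doors": ["four", "two", np.nan, "four"],
})

# Numeric variable: replace missing values with the column average.
bore_mean = df["bore"].mean()  # NaN values are skipped when averaging
df["bore"] = df["bore"].fillna(bore_mean)

# Categorical variable: replace missing values by frequency,
# i.e. with the most common value in the column.
most_common = df["num-of-doors"].value_counts().idxmax()
df["num-of-doors"] = df["num-of-doors"].fillna(most_common)
print(df)
```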

Dealing with Missing Values
• Use dataframe.dropna() to drop missing data.
  • inplace=True: writes the result back into the data frame instead of returning a new one.
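A short pandas sketch of dropna() with inplace=True, assuming a hypothetical data frame with a "price" column. axis=0 removes rows (axis=1 would remove columns), and subset restricts the check to the listed columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two cars have no price recorded.
df = pd.DataFrame({
    "make": ["audi", "bmw", "alfa-romero", "audi"],
    "price": [13950.0, np.nan, 16500.0, np.nan],
})

# Drop every row (axis=0) whose "price" is missing.
# inplace=True modifies df itself and the call returns None.
df.dropna(subset=["price"], axis=0, inplace=True)
print(df)
```

Without inplace=True, dropna() returns a new data frame and leaves df unchanged, so the result has to be assigned back, e.g. df = df.dropna(subset=["price"], axis=0).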

Dealing with Missing Values
• Use dataframe.replace(missingValue, newValue) to replace missing data with another value.
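A minimal sketch of replace(missingValue, newValue), assuming a hypothetical numeric column in which missing entries are already stored as NaN. The mean of the available values is computed first, then every NaN is replaced with it.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with two missing entries.
df = pd.DataFrame({"normalized-losses": [164.0, np.nan, 158.0, np.nan, 192.0]})

# Compute the mean of the available values (NaN is ignored).
avg = df["normalized-losses"].mean()

# replace(missingValue, newValue): swap each NaN for the mean.
df["normalized-losses"] = df["normalized-losses"].replace(np.nan, avg)
print(df)
```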

Dealing with Missing Values
• How to deal with missing data?
  • Go back and find what the actual value should be:
    • You can always check for a higher-quality data set or source.
  • Leave it as missing data:
    • You may want to leave the missing data as missing data.

Data Formatting in Python
• Data are usually collected from different places and stored in different formats.
• What is data formatting? Bringing data into a common standard of expression, which allows users to make meaningful comparisons.
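As an illustration of bringing data into a common standard, the sketch below assumes a hypothetical "city" column in which the same city was entered in several spellings. A mapping passed to replace() turns every variant into one standard form, so values that mean the same thing also compare as equal.

```python
import pandas as pd

# Hypothetical column: the same city entered in several formats.
df = pd.DataFrame({"city": ["N.Y.", "Ny", "NY", "New York", "Boston"]})

# Map each variant onto one standard spelling.
variants = {"N.Y.": "New York", "Ny": "New York", "NY": "New York"}
df["city"] = df["city"].replace(variants)

# After formatting, counting by city gives meaningful results.
print(df["city"].value_counts())
```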

Data Formatting in Python
• Data types in Python and pandas:
  • object: “B”, “HoaDNT”
  • int64: 0, 2, 4
  • float64: 1.345, 78.9
• To identify data types: dataframe.dtypes (an attribute, not a method).
• To convert data types: dataframe.astype().
• Example: convert the data type of the “price” column to integer.
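A minimal sketch of the example above, assuming a hypothetical data frame in which "price" was read in as text. dtypes shows the current type of each column, and astype("int64") converts "price" to integers. Note that astype() raises a ValueError if the column still contains markers such as "?" or NaN, so missing values should be handled first.

```python
import pandas as pd

# Hypothetical data: "price" was loaded as text, not as numbers.
df = pd.DataFrame({"make": ["audi", "bmw"], "price": ["13950", "16430"]})
print(df.dtypes)  # "price" is stored as text at this point

# Convert the "price" column to a 64-bit integer type.
df["price"] = df["price"].astype("int64")
print(df.dtypes)  # "price" is now int64
```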

Summary
• Pre-processing Data in Python
• Describe how to handle missing values
• Describe data formatting techniques

Q&A
