5 Data - Preparation PDF

Data preparation involves cleaning data by addressing issues like missing values, duplicates, and outliers. It also involves formatting data by selecting and transforming features to create clean data ready for analysis. The goal is to get data into the proper shape through activities like data cleaning, munging, wrangling, and preprocessing to ensure meaningful analysis can be conducted.

Uploaded by

Tempat Data
Data Preparation

Overview
After this video you will be able to:

• Articulate the importance of data preparation
• Define the objectives of data preparation
• List some activities in preparing data
Preparing Data
Goal: Create data for analysis

Clean
Format
- Select features to use
- Transform data
Data Cleaning

Data quality issues:
• Missing values
• Duplicate data
• Inconsistent data
• Noise
• Outliers
Addressing Data Quality Issues
Some techniques:
• Remove data with missing values
• Merge duplicate records
• Generate best estimate for invalid values
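
The three techniques above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a prescribed method: the record fields (id, age, city), the sample values, the valid age range of 0 to 120, and the use of the median as the "best estimate" are all invented for the example.

```python
from statistics import median

# Hypothetical sample records; field names and values are invented for illustration.
records = [
    {"id": 1, "age": 34, "city": "Austin"},
    {"id": 2, "age": None, "city": "Boston"},   # missing value
    {"id": 3, "age": 29, "city": "Chicago"},
    {"id": 3, "age": 29, "city": "Chicago"},    # duplicate record
    {"id": 4, "age": -5, "city": "Denver"},     # invalid value
    {"id": 5, "age": 41, "city": "El Paso"},
]

def drop_missing(rows):
    """Remove records in which any field is missing (None)."""
    return [r for r in rows if all(v is not None for v in r.values())]

def merge_duplicates(rows, key="id"):
    """Keep one record per key; the first occurrence wins (a simplification)."""
    merged = {}
    for r in rows:
        merged.setdefault(r[key], r)
    return list(merged.values())

def estimate_invalid(rows, field="age", low=0, high=120):
    """Replace out-of-range values with the median of the valid ones."""
    valid = [r[field] for r in rows if low <= r[field] <= high]
    estimate = median(valid)
    return [
        r if low <= r[field] <= high else {**r, field: estimate}
        for r in rows
    ]

cleaned = estimate_invalid(merge_duplicates(drop_missing(records)))
```

Here the record with the missing age is dropped, the two records with id 3 collapse into one, and the negative age is replaced by the median of the remaining valid ages. Whether to drop, merge, or estimate depends on the data; dropping records is simple but discards information, while estimating keeps the record at the cost of introducing an assumed value.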
Cleaning Data
• Data Cleaning
• Data Cleansing
Getting Data in Shape
• Data Munging
• Data Preprocessing
• Data Wrangling
Data Wrangling
• Feature selection
• Combining features
• Adding/Removing features
• Feature transformation
• Scaling
• Dimensionality reduction
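
Some of the wrangling activities listed above (combining features, removing features, scaling) might look like the following in plain Python. This is a rough sketch under stated assumptions: the feature names (height_m, weight_kg, income), the sample values, the derived bmi feature, and the choice of min-max scaling are hypothetical and chosen only for illustration. Dimensionality reduction is not shown.

```python
# Hypothetical feature table; names and values are invented for illustration.
rows = [
    {"height_m": 1.60, "weight_kg": 64.0, "income": 30000},
    {"height_m": 1.75, "weight_kg": 98.0, "income": 50000},
    {"height_m": 2.00, "weight_kg": 80.0, "income": 70000},
]

def add_bmi(rows):
    """Combine two features (height, weight) into one new feature."""
    return [{**r, "bmi": r["weight_kg"] / r["height_m"] ** 2} for r in rows]

def select(rows, keep):
    """Feature selection: keep only the named features, dropping the rest."""
    return [{k: r[k] for k in keep} for r in rows]

def min_max_scale(rows, field):
    """Scaling: map a numeric feature linearly onto the range [0, 1]."""
    values = [r[field] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    # If every value is identical, fall back to 0.0 to avoid dividing by zero.
    return [{**r, field: (r[field] - lo) / span if span else 0.0} for r in rows]

prepared = min_max_scale(select(add_bmi(rows), ["bmi", "income"]), "income")
```

After these steps each record holds only bmi and income, with income rescaled so that the smallest value maps to 0 and the largest to 1. Scaling of this kind keeps a feature with large raw values from dominating features measured on smaller scales.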
Always Remember!

Data preparation is
very important for
meaningful analysis.

Garbage in = Garbage out
