Data Cleaning

Uploaded by

asaf
Data Cleaning

World of Data
Agenda
● Definition
● Why & Who
● How - Data Cleaning guide
● Tools
Data Cleaning
● The results of any analysis depend first and foremost on the quality of the data behind it.
● Sometimes the data must be corrected before it is ready for analysis.
● Data cleaning refers to any correction of the data: fixing wrong values, removing damaged or
incorrect records, removing duplicates, filling in missing values, and more.
● Data transformation = converting data from one format or structure into another.
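
As an illustration of data transformation, here is a minimal sketch that converts raw text fields into typed, normalized values. The field names (`order_id`, `order_date`, `amount`) and the day/month/year input format are assumptions made for this example, not part of the original slides.

```python
from datetime import datetime

def transform_row(row):
    """Convert one raw record (all strings) into typed, normalized fields."""
    return {
        "order_id": int(row["order_id"]),
        # Assumed input format: day/month/year, e.g. "03/02/2024".
        "order_date": datetime.strptime(row["order_date"], "%d/%m/%Y").date().isoformat(),
        # Strip the thousands separator before converting to a number.
        "amount": float(row["amount"].replace(",", "")),
    }

# Hypothetical raw records, as they might arrive from a CSV export.
raw = [
    {"order_id": "1", "order_date": "03/02/2024", "amount": "1,250.50"},
    {"order_id": "2", "order_date": "15/11/2023", "amount": "80"},
]
clean = [transform_row(r) for r in raw]
print(clean)
```

The content of each record is unchanged; only its format and types differ, which is what separates transformation from correction.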
Data Cleaning - Why?
● There are many reasons why we want or need to perform data cleaning:
○ Consolidating two or more tables
○ Incomplete data entered into the server
○ Duplicate data entered into the server
○ Deleting unnecessary data
○ Changing the format to perform a particular analysis
○ More
● Data cleaning has many benefits. Above all, it helps ensure that the data is of high quality
and ready for analysis.
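
Two of the reasons above, consolidating tables and duplicate entry, often appear together. The sketch below merges two hypothetical tables (`crm` and `billing`, both invented for this example) and keeps one record per customer, assuming the email address identifies a customer once case and surrounding spaces are ignored.

```python
def consolidate(*tables, key):
    """Merge several lists of records, keeping the first record seen per key."""
    seen = {}
    for table in tables:
        for record in table:
            # Normalize the key so "Ana@Example.com " matches "ana@example.com".
            normalized = record[key].strip().lower()
            seen.setdefault(normalized, record)
    return list(seen.values())

# Hypothetical source tables with one overlapping customer.
crm = [
    {"email": "ana@example.com", "name": "Ana"},
    {"email": "bo@example.com", "name": "Bo"},
]
billing = [
    {"email": "Ana@Example.com ", "name": "Ana"},
    {"email": "cy@example.com", "name": "Cy"},
]
customers = consolidate(crm, billing, key="email")
print([c["name"] for c in customers])
```

Keeping the first record seen is one possible rule. In practice the choice of which duplicate to keep depends on which source is more trusted.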
Data Cleaning - Who?
● There is no single defined role responsible for cleaning the data.
● The responsibility varies from company to company.
● As a data analyst you need to know how to perform data cleaning and, above all, to recognize
and flag when it is needed.
The Guide for Data Cleaning
1. Investigate the data and identify any corrections it needs.
2. Decide which corrections to make.
3. Make the corrections.
4. Perform quality control on the cleaned data.
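
The four steps can be sketched on a single column. This is a minimal illustration, assuming a hypothetical list of ages in which `None` marks a missing value. Filling gaps with the median is just one possible decision in step 2, not a recommendation from the slides.

```python
from statistics import median

ages = [34, None, 29, 41, None, 38]  # hypothetical raw column

# 1. Investigate: how many values are missing?
missing = sum(1 for a in ages if a is None)

# 2. Decide: here we choose to fill gaps with the median of the known values.
known = [a for a in ages if a is not None]
fill_value = median(known)

# 3. Make the correction on a copy, leaving the raw data untouched.
cleaned = [fill_value if a is None else a for a in ages]

# 4. Quality control: same number of rows, and no gaps remain.
assert len(cleaned) == len(ages)
assert all(a is not None for a in cleaned)

print(missing, fill_value, cleaned)
```

Working on a copy means the correction can be reviewed, or reversed, if quality control fails.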
The Guide for Data Cleaning
● Issues to look for when examining the data:
○ Missing data
○ Structure / Format error
○ Duplicate values
○ Outliers
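
A rough sketch of how these four issues might be detected in Python, using only the standard library. The column names (`id`, `date`, `price`), the expected `YYYY-MM-DD` date format, and the 1.5 × IQR outlier rule are all assumptions chosen for the example.

```python
from datetime import datetime
from statistics import quantiles

def is_iso_date(text):
    """Return True if text parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def profile(rows):
    """Return the row indices affected by each kind of issue."""
    issues = {"missing": [], "format": [], "duplicates": [], "outliers": []}
    seen = set()
    prices = {}
    for i, row in enumerate(rows):
        if any(value == "" for value in row.values()):
            issues["missing"].append(i)
        if row["date"] and not is_iso_date(row["date"]):
            issues["format"].append(i)
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:  # identical to an earlier row
            issues["duplicates"].append(i)
        seen.add(fingerprint)
        if row["price"]:
            prices[i] = float(row["price"])
    # Outliers: values more than 1.5 * IQR outside the quartiles.
    q1, _, q3 = quantiles(list(prices.values()), n=4)
    spread = 1.5 * (q3 - q1)
    issues["outliers"] = [
        i for i, p in prices.items() if p < q1 - spread or p > q3 + spread
    ]
    return issues

# Hypothetical rows, each illustrating one kind of issue.
rows = [
    {"id": "1", "date": "2024-01-05", "price": "10.0"},
    {"id": "2", "date": "2024-01-06", "price": ""},        # missing value
    {"id": "3", "date": "06/01/2024", "price": "11.5"},    # wrong date format
    {"id": "3", "date": "06/01/2024", "price": "11.5"},    # duplicate row
    {"id": "4", "date": "2024-01-08", "price": "9.5"},
    {"id": "5", "date": "2024-01-09", "price": "950.0"},   # outlier
    {"id": "6", "date": "2024-01-10", "price": "10.5"},
]
issues = profile(rows)
print(issues)
```

The function only reports what it finds. Whether a flagged row is actually wrong remains a judgment call for the analyst.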
Tools
● Data cleaning can be done in the analysis environment you already work in (SQL, Python,
or a visualization tool).
● There are also dedicated data cleaning tools on the market:
○ Trifacta
○ Drake
○ OpenRefine
○ Melissa Clean Suite
○ WinPure Clean & Match
○ More
Summary
● What is Data Cleaning?
● Why?
● Who?
● How?
● Tools

Be careful when cleaning data: deleting or changing important data can bias the results of
the analysis.
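
A small, made-up example of how deletion can introduce bias. It assumes hypothetical survey responses in which the missing ages happen to be concentrated in one region, so dropping incomplete rows removes most of that region and shifts the average score.

```python
from statistics import mean

# Hypothetical survey data: None marks a missing age.
responses = [
    {"region": "north", "age": 31, "score": 8},
    {"region": "north", "age": 45, "score": 9},
    {"region": "north", "age": 52, "score": 8},
    {"region": "south", "age": None, "score": 4},
    {"region": "south", "age": None, "score": 5},
    {"region": "south", "age": 38, "score": 5},
]

# A tempting "cleaning" step: drop every row with a missing value.
complete_only = [r for r in responses if r["age"] is not None]

mean_all = mean([r["score"] for r in responses])
mean_after = mean([r["score"] for r in complete_only])

print(mean_all, mean_after)  # 6.5 7.5
```

In this example the average rises by a full point only because the south is now under-represented, not because anything about the respondents changed.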
Sources of Information
● https://www.tableau.com/learn/articles/what-is-data-cleaning
● https://towardsdatascience.com/the-ultimate-guide-to-data-cleaning-3969843991d4
● https://www.sisense.com/glossary/data-cleaning/
● https://www.trifacta.com/data-cleansing/
● https://www.xplenty.com/blog/top-10-data-cleansing-tools/
● https://analyticsindiamag.com/10-best-data-cleaning-tools-get-data/
Questions?
