
2. Data Management & Data Cleaning

Copyright © All Rights Reserved

Data Cleaning, Managing and Data Modification
Checklist For Managing Data

• Code the data appropriately (code book)

• Enter data into the database / download it from the system

• Knowing your Database

• Conduct range and visual checks

• Check for duplicate records in key fields

• Make all corrections that are needed

• Archive a copy of the database

• Data processing
Data Management

Data management is the practice of (1) collecting, (2) organizing, (3) protecting and (4) storing an organization's data so that it can be analyzed for business.
Data Cleaning
[Figure: data management diagram with the stages Data collectors, Data entry, Database, Documentation, Data editing/validation/cleaning, Information management, Data presentation and release, and Users and priority setting]

• The aim of data cleaning is to identify and correct errors in the data, or at least to minimize their impact on the study results.
• Cleaning data and preparing it for analysis is a thankless, dull, laborious and painstaking job, but the cost of a mistake is considerable.
Prevention is better than cure

• Accurate data is the responsibility of the custodian

• Encourage two-way feedback: feed errors back to data collectors and data entry staff, and find ways to minimize errors.
KEY POINTS
• In small studies, a single outlier or wrong value has a greater distorting effect on the results.

• Cost is lower if the data cleaning process is planned and starts early in data collection.

• Keep all procedures transparent and documented.


Knowing Your Database
Preparing Code Book
| Variable Name / Description | Database Variable Name | Variable Value |
|---|---|---|
| Kad Pengenalan (identity card) | KP | |
| Age | Age | In years |
| Ethnicity | Ethnicity | 1-Malay, 2-Chinese, 3-Indian |
| Marital Status | Marital | 1-Married, 2-Single, 3-Divorced |
| Education Level | Education | 1-Primary, 2-Secondary, 3-Tertiary |
| Income Per Month | Income | |
| Do you exercise? | Exercise | 0=No; 1=Yes |
| Weight in kg | Weight(kg) | |
| Height in metres | Height(M) | |
| Date of Birth | DOB | |
| Negeri (state) | Negeri | 01-Johor, 02-Kedah, 03-Kelantan |
| Lokaliti (locality): Urban/Rural | Lokaliti | 1-Urban, 2-Rural |
| Negeri_Lokaliti | Negeri_Lokaliti | |
| Date of Interview | Date of Interview | |
| Date of Follow Up | Date of Follow Up | |
Knowing Your Database
Data Cleaning Dataset
#1. Organizing

▪ Fill series (unique ID)
▪ Sort and Filter
▪ Freeze Panes
Data Cleaning Dataset
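▪ Fill series (creating a unique ID)

To create a series of serial numbers:
- If the data has no ID column yet, insert a new column first.
- Type the first two numbers in the first two cells (for example, 1 in A2 and 2 in A3) and select both cells.
- Move the cursor to the bottom-right corner of the selection (the cursor changes to a + sign) and drag it down.
- Excel continues the number series down the column. (Dragging a single selected cell would only copy the same number down.)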
Data Cleaning Dataset
#1. Organizing (Filter)

To select only a chosen category, first set where the filters will be (usually the header row).

The filter drop-down shows all available values in the column, so it can be used to identify wrongly keyed-in data!
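The filter drop-down is essentially a list of the distinct values in a column. A minimal Python sketch of the same check, assuming the column has already been read into a list of strings; the function name `distinct_values` and the sample values are hypothetical:

```python
from collections import Counter


def distinct_values(column):
    """Return (value, count) pairs for every distinct value, sorted alphabetically."""
    return sorted(Counter(column).items())


# Hypothetical 'Marital' column containing two keying errors ("Marired", "single").
marital = ["Married", "Single", "Marired", "Married", "single", "Divorced"]

for value, count in distinct_values(marital):
    print(value, count)
```

Python string comparison is case-sensitive, so "Single" and "single" show up as separate values, which makes inconsistent capitalization easy to spot.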
Data Cleaning Dataset
#1. Organizing (Filter): Clearing the Filter
Data Cleaning Dataset
#1. Organizing (Sort)

If we want the data listed by Name, we can sort the list accordingly after applying the filter first.
Data Cleaning Dataset
#1. Organizing (Freeze Panes)

Freeze Panes "locks" certain cells on the page. Usually the headers are locked so that we can always see the category names. The options are under View > Freeze Panes:
• Freeze Top Row: locks the category names (row 1).
• Freeze First Column: locks the serial numbers (column A).
• To lock both, click on cell B2 and choose Freeze Panes: this locks the category names and the serial numbers.

The lines on the datasheet indicate the lock position. Even if you scroll down or to the right, ROW 1 and COLUMN A will always be visible.
Formulas
• + Plus
• - Minus
• * Times
• / Divide
• ^ Power

Examples:
• =A2*B2
• =A1/(B2*B2) or =A1/B2^2
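The same operators exist in most programming languages. As an illustration in Python, with hypothetical cell values, both forms of the division give the same result because the power is evaluated before the division, just as in Excel:

```python
# Hypothetical cell values: A1 = weight in kg, B2 = height in metres.
a1 = 70.0
b2 = 1.75

# Excel: =A1/(B2*B2)
result_product = a1 / (b2 * b2)

# Excel: =A1/B2^2  (Python writes the power operator as **; ^ is bitwise XOR)
result_power = a1 / b2 ** 2

print(round(result_product, 2), round(result_power, 2))
```

Note the one trap when moving formulas out of Excel: in Python, `b2 ^ 2` is not a power and raises an error for floats.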
The most powerful function in Excel is "=": the strength of Excel is in its formulas.

• Enables dynamic results
• Preset calculations
• Immediate display of results
• Easier to learn
• "Everyone can Excel"
• Better charts and graphs
• More flexibility
Please open the Excel data file.
• #01. Capitalizing First Letter of Each Word
• #02. Standardize Your Text and Display (Trim Spaces)
• #03. Find & Replace
• #04. Count Number of Characters
• #05. Detecting Duplicate Values
• #06. Maximum and Minimum Value
• #07. Maximum and Minimum Value to Detect Error
• #08. Pivot Table to Detect Error
• #09. Fixing Error Using VLOOKUP
• #10. Format Date (Using Clean Data_KDN)
• #11. Recode Using VLOOKUP (Using Clean Data)
• #12. Compute (Using Clean Data)
#01. Capitalizing First Letter of Each Word

Capitalize each word: =PROPER(CELL)

Try these too and compare the results:
UPPERCASE: =UPPER(CELL)
lowercase: =LOWER(CELL)
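For checking cleaned names outside Excel, here is a rough Python equivalent of PROPER, UPPER and LOWER. It is a sketch, not Microsoft's implementation: it follows the documented PROPER rule (uppercase any letter that follows a non-letter, lowercase the rest), and the function name `proper` and the sample names are hypothetical:

```python
def proper(text):
    """Rough equivalent of Excel's =PROPER(): a letter is uppercased when the
    character before it is not a letter; all other letters are lowercased."""
    result = []
    previous_is_letter = False
    for ch in text:
        if ch.isalpha():
            result.append(ch.lower() if previous_is_letter else ch.upper())
            previous_is_letter = True
        else:
            result.append(ch)
            previous_is_letter = False
    return "".join(result)


names = ["ahmad bin ALI", "TAN AH KOW", "mary-ann o'neil"]  # hypothetical entries
print([proper(n) for n in names])   # like =PROPER(CELL)
print([n.upper() for n in names])   # like =UPPER(CELL)
print([n.lower() for n in names])   # like =LOWER(CELL)
```

Because the rule capitalizes any letter after a non-letter, "it's" becomes "It'S", so names with apostrophes are still worth checking by eye.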


#01. Capitalizing First Letter of Each Word

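Copy the cells.

Do NOT use plain Paste.

Use Paste Values, so that the cleaned text is stored as fixed values rather than formulas and does not change or break when the original column is edited or deleted.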


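#02. Standardize Your Text and Display (Trim Spaces)

Trim spaces: =TRIM(CELL)
TRIM removes leading and trailing spaces and reduces repeated spaces between words to a single space.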
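A minimal Python sketch of the same trimming, assuming the text is already in a Python string; the function name `excel_trim` is hypothetical. Like Excel's TRIM, it only touches the ordinary space character, so tabs and non-breaking spaces (common in data copied from web pages) are left in place:

```python
import re


def excel_trim(text):
    """Approximate Excel's TRIM: strip leading/trailing spaces and collapse
    runs of spaces between words into one. Only ' ' (ASCII 32) is affected."""
    return re.sub(" {2,}", " ", text.strip(" "))


print(repr(excel_trim("   Kuala   Lumpur  ")))
```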


#03. Find & Replace
#04. Count Number of Characters
Count Number of Characters:
=LEN(CELL)
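A LEN helper column is a quick way to spot identity card numbers with missing or extra characters. A minimal Python sketch of the same check, assuming (an assumption about this dataset) that the KP column holds 12-digit numbers typed without dashes; the function name `wrong_length` and the sample values are hypothetical:

```python
def wrong_length(values, expected_length=12, first_row=2):
    """Return (Excel row, value) pairs whose character count differs from
    expected_length, i.e. the rows a =LEN(CELL) helper column would flag."""
    return [
        (row, value)
        for row, value in enumerate(values, start=first_row)
        if len(value) != expected_length
    ]


# Hypothetical KP values; row 1 of the sheet is the header, so data start in row 2.
kp = ["900101015432", "90010101543", "900101-01-5432", "850505105678"]
print(wrong_length(kp))
```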

Find & Replace


#05. Detecting Duplicate Values

Delete the duplicates!
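Excel's Remove Duplicates keeps the first occurrence of each value in the chosen key column. A minimal Python sketch of the same idea, which also returns the dropped rows so they can be reviewed before anything is deleted; the function name `remove_duplicates` and the records are hypothetical:

```python
def remove_duplicates(records, key):
    """Keep the first record for each value of `key`; return (kept, dropped)."""
    seen = set()
    kept, dropped = [], []
    for record in records:
        value = record[key]
        if value in seen:
            dropped.append(record)
        else:
            seen.add(value)
            kept.append(record)
    return kept, dropped


records = [
    {"ID": 1, "KP": "900101015432", "Name": "Ahmad"},
    {"ID": 2, "KP": "850505105678", "Name": "Siti"},
    {"ID": 3, "KP": "900101015432", "Name": "Ahmad"},  # same person entered twice
]
kept, dropped = remove_duplicates(records, "KP")
print([r["ID"] for r in kept], [r["ID"] for r in dropped])
```

Checking duplicates on a key field such as KP, rather than on the serial ID, is what catches the same person entered twice.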


#06. Maximum and Minimum Value

Insert 3 new rows.

Find the minimum value: =MIN(CELL RANGE)
Find the maximum value: =MAX(CELL RANGE)
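#07. Maximum and Minimum Value To Detect Error

Check whether the minimum and maximum values are plausible for the variable (an age of 450, for example, is not). If they are not, find the wrong entries and make the correction.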
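MIN and MAX only show that something is wrong; the next step is finding which rows are out of range. A minimal Python sketch, assuming a numeric column and plausible limits chosen by the analyst (the limits 18 to 100 and the ages below are hypothetical):

```python
def range_check(values, low, high, first_row=2):
    """Return MIN, MAX and the (Excel row, value) pairs outside [low, high]."""
    out_of_range = [
        (row, value)
        for row, value in enumerate(values, start=first_row)
        if not low <= value <= high
    ]
    return min(values), max(values), out_of_range


ages = [34, 27, 450, 61, 19, -3]  # hypothetical Age column (data start in row 2)
print(range_check(ages, 18, 100))
```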


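#08. Pivot Table To Detect Error

Highlight all the data and insert a pivot table. It lists every value in a column with its count, so wrongly keyed-in codes stand out.

Fix this: correct the wrong values in the data.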
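The pivot-table check can be sketched as "count every code, then compare against the code book". A minimal Python version, using the ethnicity codes from the code book above; the function name `tabulate` and the keyed-in codes are hypothetical:

```python
from collections import Counter

# Valid codes taken from the code book: 1-Malay, 2-Chinese, 3-Indian.
VALID_ETHNICITY = {1: "Malay", 2: "Chinese", 3: "Indian"}


def tabulate(codes, valid):
    """Count each code (like a pivot table) and list codes not in the code book."""
    counts = Counter(codes)
    invalid = sorted(code for code in counts if code not in valid)
    return dict(sorted(counts.items())), invalid


codes = [1, 2, 1, 3, 4, 1, 2, 11]  # hypothetical keyed-in Ethnicity column
counts, invalid = tabulate(codes, VALID_ETHNICITY)
print(counts)
print(invalid)
```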
#09. VLOOKUP To Fix Error (Recode)

Fix this: create a reference table on a new sheet to recode the values.

VLOOKUP: =VLOOKUP(search_key, range, index, match_type)
match_type is FALSE for an exact match or TRUE for an approximate match.
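An exact-match VLOOKUP behaves like a dictionary lookup. A minimal Python sketch, assuming a small reference table of labels found in the raw data; the function name `vlookup_exact`, the table and the raw values are hypothetical. Matching ignores letter case, as VLOOKUP does, and a missing value returns None where Excel would show #N/A:

```python
def vlookup_exact(value, table):
    """Exact-match lookup, like =VLOOKUP(value, range, 2, FALSE).

    Matching ignores letter case. Returns None where Excel would show #N/A."""
    lookup = {str(key).lower(): code for key, code in table.items()}
    return lookup.get(str(value).lower())


# Hypothetical reference table: labels found in the raw data -> correct code.
reference = {"Malay": 1, "Melayu": 1, "Chinese": 2, "Cina": 2, "Indian": 3, "India": 3}
raw = ["Melayu", "chinese", "India", "Malay", "Others"]
print([vlookup_exact(v, reference) for v in raw])
```

Values that come back as None (like "Others") either need a manual fix or a new row in the reference table.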
#10. Format Date (Using Clean Data): Age as at Today

Age as at today in years: =DATEDIF(CELL,TODAY(),"y")
Age as at today in months: =DATEDIF(CELL,TODAY(),"m")
Age as at today in days: =DATEDIF(CELL,TODAY(),"d")
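DATEDIF counts complete units: a person is not a year older until the birthday itself. A minimal Python sketch of the "y", "m" and "d" units, useful for spot-checking ages computed in the sheet; the function name `datedif` and the dates are hypothetical, and edge cases such as 29 February birthdays have not been checked against Excel:

```python
from datetime import date


def datedif(start, end, unit):
    """Sketch of Excel's DATEDIF for the units "y", "m" and "d"."""
    if start > end:
        # Excel returns #NUM! when the start date is after the end date.
        raise ValueError("start date is after end date")
    if unit == "d":
        return (end - start).days
    # Count whole calendar months, then drop the last one if it is not complete.
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    if unit == "m":
        return months
    if unit == "y":
        return months // 12
    raise ValueError('unit must be "y", "m" or "d"')


dob = date(1990, 6, 15)     # hypothetical date of birth
as_at = date(2024, 6, 14)   # use date.today() to mirror TODAY()
print(datedif(dob, as_at, "y"), datedif(dob, as_at, "m"), datedif(dob, as_at, "d"))
```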
#10. Format Date (Using Clean Data)

Length Between 2 Dates

Length between 2 dates in years: =DATEDIF(CELL First Date,CELL Second Date,"y")
Length between 2 dates in months: =DATEDIF(CELL First Date,CELL Second Date,"m")
Length between 2 dates in days: =DATEDIF(CELL First Date,CELL Second Date,"d")

The first date must be the earlier one; if it is later than the second date, DATEDIF returns #NUM!.
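Before computing the length between two dates, it is worth checking that the dates are in the right order: a follow-up date keyed in before the interview date is a typical data-entry error. A minimal Python sketch, assuming each row holds an ID, the date of interview and the date of follow-up; the function name `follow_up_errors` and the rows are hypothetical:

```python
from datetime import date


def follow_up_errors(visits):
    """Return the IDs whose follow-up date is earlier than the interview date."""
    return [vid for vid, interview, follow_up in visits if follow_up < interview]


visits = [
    (1, date(2023, 3, 1), date(2023, 9, 1)),
    (2, date(2023, 3, 5), date(2022, 9, 5)),   # year mistyped at data entry
    (3, date(2023, 4, 2), date(2023, 10, 2)),
]
print(follow_up_errors(visits))

# Length between the two dates in days for the rows that pass the check.
for vid, interview, follow_up in visits:
    if follow_up >= interview:
        print(vid, (follow_up - interview).days, "days")
```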
#11. Recode Using VLOOKUP (Using Clean Data)
VLOOKUP([value], [range], [column number], [FALSE or TRUE])

From this to this: create a reference table on the next sheet, then recode with an exact match (FALSE):
=VLOOKUP(O2,'Look Up Table'!$C$18:$D$21,2,FALSE)
#11. Recode Using VLOOKUP: Range (Using Clean Data)
From this to this: create a reference table on the next sheet.
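#11. Recode Using VLOOKUP (Using Clean Data)

Your data in categories.

VLOOKUP([value], [range], [column number])

When the fourth argument is omitted, VLOOKUP uses an approximate match (TRUE): it returns the row with the largest value in the first column that is less than or equal to the lookup value, so the first column of the reference table must be sorted in ascending order.

=VLOOKUP(G2, [lookup table range, locked with $], 3)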
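An approximate-match VLOOKUP behaves like a search in a sorted list of lower bounds. A minimal Python sketch using the standard `bisect` module; the function name `vlookup_approx` and the age groups are a hypothetical reference table, not one from this course:

```python
from bisect import bisect_right


def vlookup_approx(value, lower_bounds, labels):
    """Approximate match (VLOOKUP with TRUE or omitted): pick the row with the
    largest lower bound <= value. lower_bounds must be sorted ascending."""
    i = bisect_right(lower_bounds, value) - 1
    if i < 0:
        return None  # smaller than the first bound: Excel would show #N/A
    return labels[i]


# Hypothetical reference table: first column = lower bound of each age group.
bounds = [0, 18, 40, 60]
groups = ["<18", "18-39", "40-59", "60+"]
print([vlookup_approx(age, bounds, groups) for age in [5, 18, 39, 40, 75]])
```

Using `bisect_right` means a value exactly equal to a bound (18, 40, 60) falls into the group that starts at that bound, which matches how the approximate-match lookup treats exact hits.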
#12. Compute Using Calculation Functions (Using Clean Data)
Calculate BMI from the given weight and height:
$$\text{BMI} = \frac{\text{Weight (kg)}}{\text{Height (m)}^2}$$
In Excel: =Weight/Height^2 (^ = power)
Then try to recode BMI into BMI groups using VLOOKUP.
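A minimal Python sketch of the BMI calculation and its recoding into groups, useful for cross-checking the sheet. It assumes the WHO adult cut-offs (18.5, 25 and 30); some Asian guidelines use lower cut-offs, so swap in whatever the exercise specifies. The function names, the 3 m height guard and the sample rows are hypothetical:

```python
from bisect import bisect_right


def bmi(weight_kg, height_m):
    """BMI = weight (kg) / height (m) squared, like =Weight/Height^2 in Excel."""
    if height_m > 3:
        # Probably keyed in centimetres instead of metres: send back for correction.
        raise ValueError(f"height {height_m} looks like centimetres, not metres")
    return weight_kg / height_m ** 2


# Assumed reference table (WHO adult cut-offs): lower bound of each group.
BMI_BOUNDS = [0, 18.5, 25.0, 30.0]
BMI_GROUPS = ["Underweight", "Normal", "Overweight", "Obese"]


def bmi_group(value):
    """Approximate-match recode, like VLOOKUP with TRUE on the table above."""
    return BMI_GROUPS[bisect_right(BMI_BOUNDS, value) - 1]


for weight, height in [(70, 1.75), (50, 1.70), (95, 1.68)]:  # hypothetical rows
    value = bmi(weight, height)
    print(weight, height, round(value, 1), bmi_group(value))
```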
• TQ
