0% found this document useful (0 votes)
27 views34 pages

Data Prepration Presentation

Data preparation

Uploaded by

Harsh Raiyani
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
27 views34 pages

Data Prepration Presentation

Data preparation

Uploaded by

Harsh Raiyani
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 34

Data Preparation:

An Introduction to Getting Data


Ready for Analysis

TIMOTH Y SERVIN SKY


PROJECT MANAGER
C EN TER FOR SURVEY R ESEAR C H
Data Preparation Overview

Getting Started

Organizing Data

Cleaning Data

 Data Entry Verification


 Checking for Errors and Consistency
 Formatting for Analysis: Data Transformations
Getting Started
Bridging the Gap

Analysi
Research s
Methodolo
gy
What Is Data Preparation?

Creating and preparing a dataset to be


analyzed
Data can come from one or more data
sources
 Survey data (phone, web, mail)
 Data tracked by your organization (internal

reports)
 Customer/client databases

 Program data

 Quality control data

 Coded interviews
What Is Data Preparation?

Before you start analysis, your data should


be:
 Organized

 Consistently recorded
 Error-free

 Formatted for analysis


Allow for Ample Time

Data Preparation can take up 50% or more


of the time you dedicate to analysis

Rushing/skipping data preparation


Data errors 
Low confidence 
Starting analysis over
Document, Document, Document!

Data collection and compilation process


should be replicable
 Where did you get the data?
 How did you obtain it?

Document problems
 Collecting
 Recording
 Extracting
Document, Document, Document!

Methodology report
 Survey/Instrument/Experiment/Process Design
 Sampling procedures (if applicable)
 Response rates (if applicable)
 Limitations
Organizing Data
Compile Your Data

Start by figuring out all of the data


components you will need for analysis.
 What are the sources?

Do you have access to all data that you


need?
 How do you get access to data that you need?
 Track contact/retrieval information and date
expectations
 How much time do you need to build into your
timeline?
Create Codes

Code = A number or set of letters that


stands for something else
Question codes provide a way to
reference a question
Response codes provide a way to easily
record results or answers
Create Codes
Create Codes
Create a Codebook
Enter Data Into a Database

Two Methods:
 Export from existing database
 SurveyMonkey  Excel

 Data entry
 Paper survey data  Excel or SPSS
 Create a codebook
 Verification
Data Entry

Enter information into your database one


record at time
Use your codebook to determine what you
should enter into your database
 Statistics program (i.e., SPSS)
 Enter answer codes for data and define the codes
in the program (i.e., Male = 1; Female = 2)
 Excel or other spreadsheet program

 Enter answer labels into spreadsheet (i.e., Male


or Female)
Data Entry Considerations

Create a unique ID for each record

Illegible handwriting or unclear


markings
Missing data
 How much can you tolerate?

 Key questions?
Response Codes or Response Labels?
Data Cleaning
Data Entry Validation

GIGO: Garbage In, Garbage Out


Double user verification
Double entry verification
 Enter exact same data into two Excel tabs

 Use a formula on a third tab to check the first two

tabs
 http://

www.stattutorials.com/EXCEL/EXCEL_DOUBLE_DA
TA_ENTRY.html

Data entry software / SPSS (using Compare


Datasets)
Checking the Data: Ranges

Are all answers within the accepted


minimum and maximum values?
 Age values of 150 or 16
 Value of “7” on a 5-point scale
Checking the Data: Data Types

Is data formatted in the correct way?

 Age entered as thirty instead of 30


 Date entered 29-10-2015 instead of 10-29-
2015
Checking the Data: Data length

Do all of your data entries have the


correct number of digits or letters?
Examples

 Zip code with four digits instead of five or


nine
 Phone number with 11 digits instead of 10
Checking the Data: Fixing Errors

Can the data be clarified with

assistance?
 Trace data back to point of origin

 Review original
data/database/instrument/source
 Ifsomeone answered a survey, can you
contact that person for clarification?
Checking the Data: Fixing Errors

Is it reasonably correctable on your own?


 If valid values are 1 – 5 and you have “11”,
entering a “1” might be considered reasonable
 If the valid values are between 0 and 100 and
you have “232”, you can not make a reasonable
determination between 23 and 32.

Do not guess or choose a value at


random; make it a missing value
Checking the Data: Missing Data

Data can be missing for a variety of


reasons:
 Unanswered questions (forgot/declined to
answer, illegible handwriting)
 Data point was not applicable for a portion

of records
 Errors in recording data
 Manually removed because you lacked

confidence in the data


Checking the Data: Missing Data

Create a unique code for missing values

 Missing responses: -9998


 Not applicable fields: -9999
 Don’t know responses: 8888, 888, 88
 Declined to answer responses: 9999, 999, 99
Checking the Data: Missing Data

Statistical programs can remove your


missing data from analysis
Statistical Analysis in Excel

Can do statistical analysis in Excel


Excel 2010 and 2013 (Windows):
 File  Options  Add-Ins

 New Data Analysis option appears under the Data tab


Statistical Analysis in Excel
Data Transformations in
SPSS
Questions?
Thank you!
TIMOTH Y SERVIN SKY
PROJECT MANAGER
C EN TER FOR SURVEY R ESEAR C H
T J S 3 2 @ P S U. E D U
717-948-4312
H T T P S : / /C S R . H B G . P S U. E D U

You might also like