Data Preparation and Processing
DATA PREPARATION
STEPS IN DATA PREPARATION
• Validate data
• Questionnaire checking
• Edit acceptable questionnaires
• Code the questionnaires
• Keypunch the data
• Clean the data set
• Statistically adjust the data
• Store the data set for analysis
• Analyse data
VALIDATION
• Validity exists when the data actually measure
what they are supposed to measure. If they fail
to, they are misleading and should not be
accepted.
• One of the most serious concerns is errors in
survey data.
• When secondary data are involved, they may
be outdated or irrelevant.
• Primary data also need this validation
review.
QUESTIONNAIRE CHECKING
• A questionnaire returned from the field may be
unacceptable for several reasons.
– Parts of the questionnaire may be
incomplete: answers may be inadequate, or
specific questions may have no response.
– The pattern of responses may indicate that
the respondent did not understand or follow
the instructions.
– Fictitious interviews
– Inconsistencies
– Illegible responses
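Some of the checks above can be automated once responses are keyed in. The following is only a minimal sketch: the item names (`q1` to `q4`), the 25% incompleteness threshold, and the "same answer to every item" rule are illustrative assumptions, not part of the original material. Fictitious interviews and illegible responses still need manual review.

```python
# Hypothetical screening of returned questionnaires. The item names and
# the threshold are illustrative assumptions, not taken from the source.
REQUIRED_ITEMS = ["q1", "q2", "q3", "q4"]


def check_questionnaire(responses, max_missing_share=0.25):
    """Return a list of problems found in one returned questionnaire."""
    problems = []
    # Items left blank (None or an empty string) count as no response.
    missing = [q for q in REQUIRED_ITEMS if responses.get(q) in (None, "")]
    if missing:
        problems.append("no response: " + ", ".join(missing))
    if len(missing) / len(REQUIRED_ITEMS) > max_missing_share:
        problems.append("questionnaire largely incomplete")
    answered = [responses[q] for q in REQUIRED_ITEMS if q not in missing]
    # The same answer to every item can signal that the respondent
    # did not understand or follow the instructions.
    if len(answered) == len(REQUIRED_ITEMS) and len(set(answered)) == 1:
        problems.append("same answer to every item")
    return problems


returned = {
    "R001": {"q1": 4, "q2": 2, "q3": 5, "q4": 3},
    "R002": {"q1": 3, "q2": None, "q3": "", "q4": 1},
    "R003": {"q1": 5, "q2": 5, "q3": 5, "q4": 5},
}
flagged = {rid: check_questionnaire(r) for rid, r in returned.items()}
for rid, problems in flagged.items():
    print(rid, problems or "acceptable")
```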
TREATMENT OF UNSATISFACTORY RESPONSES
RESTAURANT PREFERENCE
KEYPUNCH THE DATA / DATA TRANSCRIPTION
[Diagram: Raw Data -> Computer Memory, Disks, Magnetic Tapes -> Transcribed Data]
DATA CLEANING
• Consistency Checks
- Consistency checks identify data that are out of
range, logically inconsistent, or have extreme
values.
- Computer packages such as SPSS, SAS, Excel, and
MINITAB can be programmed to identify out-of-range
values for each variable and print out the
respondent code, variable code, variable name,
record number, column number, and out-of-range
value.
- Extreme values should be closely examined.
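An out-of-range check of the kind described above might look like the sketch below. The variables, their allowed ranges, and the respondent codes are hypothetical. A statistical package would normally produce this report through its own facilities.

```python
# Hypothetical codebook: allowed range per variable (assumed values).
VALID_RANGES = {"age": (18, 99), "satisfaction": (1, 7)}

records = [
    {"respondent": "R001", "age": 34, "satisfaction": 6},
    {"respondent": "R002", "age": 340, "satisfaction": 5},
    {"respondent": "R003", "age": 27, "satisfaction": 9},
]


def out_of_range(records, valid_ranges):
    """List (record number, respondent code, variable name, value)
    for every value that falls outside its allowed range."""
    report = []
    for record_number, record in enumerate(records, start=1):
        for variable, (low, high) in valid_ranges.items():
            value = record[variable]
            if not low <= value <= high:
                report.append(
                    (record_number, record["respondent"], variable, value)
                )
    return report


report = out_of_range(records, VALID_RANGES)
for row in report:
    print(row)
```

Each flagged row should be traced back to the original questionnaire before it is corrected or discarded.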
High School
  1 to 3 years        6.39    8.65    1.35
  4 years            25.39   29.24    1.15
College
  1 to 3 years       22.33   29.42    1.32
  4 years            15.02   12.01    0.80
  5 to 6 years       14.94    7.36    0.49
  7 years or more    12.18    6.90    0.57
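The column headers of this table did not survive. In every row, though, the third figure equals the second divided by the first, rounded to two decimals. That pattern fits a weighting table in which the columns are sample percentage, population percentage, and weight. This reading is an assumption, and the sketch below only checks the arithmetic under it.

```python
# Rows as they appear in the table. The column meanings are an ASSUMPTION:
# (label, sample %, population %, weight).
rows = [
    ("High School, 1 to 3 years", 6.39, 8.65, 1.35),
    ("High School, 4 years", 25.39, 29.24, 1.15),
    ("College, 1 to 3 years", 22.33, 29.42, 1.32),
    ("College, 4 years", 15.02, 12.01, 0.80),
    ("College, 5 to 6 years", 14.94, 7.36, 0.49),
    ("College, 7 years or more", 12.18, 6.90, 0.57),
]


def weight(sample_pct, population_pct):
    """Weight = population share / sample share, to two decimals."""
    return round(population_pct / sample_pct, 2)


computed = [weight(sample, population) for _, sample, population, _ in rows]
for (label, _, _, listed), value in zip(rows, computed):
    print(f"{label}: listed {listed:.2f}, computed {value:.2f}")
```

Under this reading, a weight above 1 marks a group that is underrepresented in the sample, and a weight below 1 marks a group that is overrepresented.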
• Variable Respecification
- Variable respecification involves the transformation of
data to create new variables or modify existing
variables.
- For example, the researcher may create new variables
that are composites of several other variables.
- Dummy variables are used for respecifying categorical
variables. The general rule is that to respecify a
categorical variable with K categories, K-1 dummy
variables are needed.
STATISTICALLY ADJUSTING THE DATA
Product Usage    Original         Dummy Variable Code
Category         Variable Code    X1    X2    X3
Nonusers         1                1     0     0
Light users      2                0     1     0
Medium users     3                0     0     1
Heavy users      4                0     0     0
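The coding in this table can be reproduced with a few lines of code. This is a minimal sketch: the function name `dummy_code` is hypothetical, and the last category (Heavy users) is treated as the reference category, coded as all zeros, to match the table.

```python
CATEGORIES = ["Nonusers", "Light users", "Medium users", "Heavy users"]


def dummy_code(category, categories=CATEGORIES):
    """Code one category as K-1 indicator variables.

    The last category in `categories` is the reference category and is
    coded as all zeros.
    """
    return [1 if category == c else 0 for c in categories[:-1]]


table = {c: dummy_code(c) for c in CATEGORIES}
for original_code, c in enumerate(CATEGORIES, start=1):
    print(c, original_code, *table[c])
```

With K = 4 categories, three dummy variables are enough: a respondent with X1 = X2 = X3 = 0 can only be a heavy user.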
$z_i = (X_i - \bar{X}) / s_X$
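The formula standardizes each observation by subtracting the mean of the variable and dividing by its standard deviation. The sketch below assumes the sample standard deviation (n - 1 in the denominator), which the slide does not specify. The ratings are made-up values.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    x_bar = mean(values)
    s_x = stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(x - x_bar) / s_x for x in values]


scores = [2, 4, 4, 6, 9]  # hypothetical ratings
z = standardize(scores)
for x, z_i in zip(scores, z):
    print(x, round(z_i, 3))
```

After standardization the values have mean 0 and standard deviation 1, so variables measured on different scales can be compared.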
A CLASSIFICATION OF UNIVARIATE TECHNIQUES
Univariate Techniques
• Dependence Technique
• Interdependence Technique