Data Preparation - 2
Check Questionnaire
Edit
Code
Transcribe
Clean Data
Questionnaire Checking
A questionnaire returned from the field may be
unacceptable for several reasons.
Parts of the questionnaire may be incomplete.
Editing
Treatment of Unsatisfactory Results
Returning to the Field – The questionnaires
with unsatisfactory responses may be returned to
the field, where the interviewers recontact the
respondents.
Assigning Missing Values – If returning the
questionnaires to the field is not feasible, the
editor may assign missing values to unsatisfactory
responses.
Discarding Unsatisfactory Respondents –
In this approach, the respondents with
unsatisfactory responses are simply discarded.
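The last two treatments can be sketched in a few lines of Python. This is only an illustration: the missing-value code 9, the valid range 1 to 5, and the 50% discard threshold are assumptions made for the example, not rules from the slides.

```python
MISSING = 9  # hypothetical missing-value code, chosen for this example


def treat_unsatisfactory(records, valid_range=(1, 5), max_missing_share=0.5):
    """Assign MISSING to unsatisfactory answers, then discard respondents
    whose share of missing answers exceeds max_missing_share."""
    low, high = valid_range
    kept, discarded = [], []
    for rec in records:
        # An answer is unsatisfactory here if it is absent or out of range.
        answers = [
            a if isinstance(a, int) and low <= a <= high else MISSING
            for a in rec["answers"]
        ]
        share = answers.count(MISSING) / len(answers) if answers else 1.0
        if share > max_missing_share:
            discarded.append(rec["id"])
        else:
            kept.append({"id": rec["id"], "answers": answers})
    return kept, discarded


records = [
    {"id": 1, "answers": [3, 4, 5, 2]},
    {"id": 2, "answers": [3, None, 7, 2]},        # two unsatisfactory answers
    {"id": 3, "answers": [None, None, None, 1]},  # mostly unsatisfactory
]
kept, discarded = treat_unsatisfactory(records)
```

Respondent 2 is kept with missing values assigned; respondent 3 is discarded. Returning to the field has no code equivalent, since it means recontacting the respondent.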
Coding
Coding means assigning a code, usually a number, to each
possible response to each question. The code includes an
indication of the column position (field) and data record it will
occupy.
Coding Questions
Fixed field codes, which mean that the number of records for
each respondent is the same and the same data appear in the
same column(s) for all respondents, are highly desirable.
If possible, standard codes should be used for missing data.
Coding of structured questions is relatively simple, since the
response options are predetermined.
In questions that permit a large number of responses, each
possible response option should be assigned a separate column.
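A minimal sketch of fixed-field coding, assuming a hypothetical layout: columns 1-3 hold the respondent ID, column 4 a gender code, and columns 5-8 one column per store in a multiple-response question. The store list, the codes, and the missing-data code 9 are illustrative assumptions.

```python
STORES = ["A", "B", "C", "D"]  # hypothetical response options
MISSING = "9"                  # hypothetical standard code for missing data


def encode_record(respondent_id, gender, stores_shopped):
    """Build one fixed-width record.

    Columns 1-3: respondent ID, column 4: gender code,
    columns 5-8: one 0/1 column per store (multiple responses allowed).
    """
    gender_code = {"male": "1", "female": "2"}.get(gender, MISSING)
    store_cols = "".join("1" if s in stores_shopped else "0" for s in STORES)
    return f"{respondent_id:03d}{gender_code}{store_cols}"


def decode_record(line):
    """Read the same columns back; every respondent uses the same positions."""
    return {
        "id": int(line[0:3]),
        "gender": line[3],
        "stores": [s for s, flag in zip(STORES, line[4:8]) if flag == "1"],
    }


line = encode_record(7, "female", {"A", "C"})  # "00721010"
```

Because every record has the same length and layout, any variable can be read by column position alone.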
Guidelines for coding unstructured questions:
Only a few (10% or less) of the responses should fall
into the "other" category.
Data should be coded to retain as much detail as
possible.
Parking (2)
Display (3)
Codebook
A codebook contains coding instructions and the
necessary information about variables in the data
set. A codebook generally contains the following
information:
column number
record number
variable number
variable name
question number
instructions for coding
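One way to picture a codebook is as a list of entries, one per variable, carrying the items listed above. The sketch below assumes a hypothetical two-variable layout; the variable names, question numbers, and codes are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodebookEntry:
    columns: tuple            # (first, last) column, 1-based and inclusive
    record_number: int
    variable_number: int
    variable_name: str
    question_number: str
    coding_instructions: str


# Hypothetical codebook for a record such as "00721010".
CODEBOOK = [
    CodebookEntry((1, 3), 1, 1, "respondent_id", "-", "ID as written, zero-padded"),
    CodebookEntry((4, 4), 1, 2, "gender", "Q1", "1 = male, 2 = female, 9 = missing"),
]


def read_record(line, codebook, record_number=1):
    """Slice a fixed-width line into named variables using the codebook."""
    return {
        e.variable_name: line[e.columns[0] - 1 : e.columns[1]]
        for e in codebook
        if e.record_number == record_number
    }


values = read_record("00721010", CODEBOOK)
```

Keeping the column positions in the codebook, rather than in the parsing code, means the same reader works for any questionnaire once its codebook is written.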
Data Transcription
Fig. 14.4: Raw data are transcribed to computer memory, disks, or magnetic tapes, giving the transcribed data.
Data Cleaning
Consistency Checks
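The slide gives only the heading. As a hedged illustration, consistency checks are commonly understood to flag values that are out of range or logically inconsistent. The variable names, ranges, and the car-ownership rule below are hypothetical.

```python
def consistency_problems(record, valid_ranges):
    """Return a list of problems found in one respondent's coded record."""
    problems = []
    # Out-of-range check: every listed variable must lie in its valid range.
    for var, (low, high) in valid_ranges.items():
        value = record.get(var)
        if value is None or not (low <= value <= high):
            problems.append(f"{var}: out of range ({value!r})")
    # Logical check (hypothetical rule): a respondent coded as not owning
    # a car (owns_car == 2) should not report kilometres driven.
    if record.get("owns_car") == 2 and (record.get("km_driven") or 0) > 0:
        problems.append("km_driven reported although owns_car == 2")
    return problems


VALID_RANGES = {"satisfaction": (1, 5), "owns_car": (1, 2)}

bad = {"satisfaction": 7, "owns_car": 2, "km_driven": 12000}
good = {"satisfaction": 4, "owns_car": 1, "km_driven": 12000}

bad_problems = consistency_problems(bad, VALID_RANGES)
good_problems = consistency_problems(good, VALID_RANGES)
```

Flagged records would then go back through the editing treatments described earlier.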
Weighting
In weighting, each case or respondent in
the database is assigned a weight to reflect
its importance relative to other cases or
respondents.
Weighting is most widely used to make the
sample data more representative of a target
population on specific characteristics.
Yet another use of weighting is to adjust the
sample so that greater importance is attached
to respondents with certain characteristics.
Years of Education      Sample %   Population %   Weight
Elementary School
  0 to 7 years            2.49        4.23         1.70
  8 years                 1.26        2.19         1.74
High School
  1 to 3 years            6.39        8.65         1.35
  4 years                25.39       29.24         1.15
College
  1 to 3 years           22.33       29.42         1.32
  4 years                15.02       12.01         0.80
  5 to 6 years           14.94        7.36         0.49
  7 years or more        12.18        6.90         0.57
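The figures in the education table are consistent with reading the three numeric columns as sample percentage, population percentage, and weight, where weight = population % / sample %. That reading is an inference from the numbers, not something the slide states. A short sketch that recomputes the weights:

```python
# (education group, sample %, population %) taken from the table above.
ROWS = [
    ("Elementary, 0 to 7 years",  2.49,  4.23),
    ("Elementary, 8 years",       1.26,  2.19),
    ("High school, 1 to 3 years", 6.39,  8.65),
    ("High school, 4 years",     25.39, 29.24),
    ("College, 1 to 3 years",    22.33, 29.42),
    ("College, 4 years",         15.02, 12.01),
    ("College, 5 to 6 years",    14.94,  7.36),
    ("College, 7 years or more", 12.18,  6.90),
]

# Weight for each group: how much one sample respondent must count
# so that the group's weighted share matches its population share.
weights = {label: pop / samp for label, samp, pop in ROWS}
rounded = [round(w, 2) for w in weights.values()]

# After weighting, each group's share equals its population percentage,
# so the weighted shares again sum to 100.
weighted_total = sum(samp * weights[label] for label, samp, _ in ROWS)

for label, w in weights.items():
    print(f"{label:<28}{w:5.2f}")
```

Groups under-represented in the sample get weights above 1; over-represented groups, such as respondents with five or more years of college, get weights below 1.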
Dependence Technique
Interdependence Technique