Data Processing and Coding Tabulation and Data Presentation
9.1 INTRODUCTION
The survey data collected from the field should be processed and analyzed as
indicated in the research plan. Data processing primarily involves editing, coding,
classification and tabulation of data, so that it becomes amenable for data analysis.
This unit concentrates on various aspects of data processing. The processing of data
can either be in the form of tables or in the form of graphs. These aspects have been
widely covered in unit 6, block 2 of the course on Quantitative Analysis for
Managerial Applications (MS-8). You are therefore advised to go through it before
reading this unit.
The inaccuracy of the survey data may be due to interviewer bias or cheating. One
way of spotting it is to look for a common pattern of responses in the instruments
returned by a particular interviewer.
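The check for a common pattern of responses can be sketched in a few lines of Python. This is only an illustrative sketch: the record layout, the function name `repeated_patterns` and the 50 per cent threshold are assumptions made for the example, not part of any standard procedure.

```python
from collections import Counter, defaultdict

def repeated_patterns(records, min_share=0.5):
    """Flag interviewers for whom one response pattern accounts for at least
    min_share of their completed instruments (the threshold is an assumption)."""
    by_interviewer = defaultdict(list)
    for interviewer, responses in records:
        by_interviewer[interviewer].append(tuple(responses))
    flagged = {}
    for interviewer, patterns in by_interviewer.items():
        pattern, count = Counter(patterns).most_common(1)[0]
        if len(patterns) > 1 and count / len(patterns) >= min_share:
            flagged[interviewer] = (pattern, count)
    return flagged

# Hypothetical records: (interviewer id, coded answers to three questions).
records = [
    ("A", [1, 3, 2]), ("A", [2, 5, 1]), ("A", [4, 1, 3]),
    ("B", [3, 3, 3]), ("B", [3, 3, 3]), ("B", [3, 3, 3]), ("B", [1, 2, 3]),
]
print(repeated_patterns(records))
```

Here interviewer B is flagged because three of the four instruments carry the same answers. A flag of this kind is only a prompt for closer scrutiny, not proof of cheating.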
Apart from ensuring quality data, editing also facilitates the coding and tabulation of
data. In fact, editing involves a careful scrutiny of the completed questionnaires.
The editing can be done at two stages:
1. Field Editing, and
2. Central Editing.
Field Editing: Field editing consists of a review of the reporting forms by the
investigator, who completes or translates what he has written in abbreviated
form at the time of interviewing the respondent. This form of editing is necessary
because handwriting varies from individual to individual and is
sometimes difficult for the tabulator to understand. This sort of editing should be
done as soon as possible after the interview, as it may sometimes be necessary to
rely on memory. While doing so, care should be taken that the investigator does
not correct errors of omission by simply guessing what the respondent would
have answered if the question had been put to him.
Central Editing: Central editing should be carried out when all the forms or
schedules have been completed and returned to the headquarters. This type of editing
requires that all the forms are thoroughly edited by a single person (editor) in a small
field study, or by a small group of persons in the case of a large field study. The editor may
correct obvious errors, such as an entry in the wrong place or an entry recorded in daily
terms when it should have been recorded in weeks or months. Sometimes,
inappropriate or missing replies can also be recorded by the editor by reviewing the
other information recorded in the schedule. If necessary, the respondent may be
contacted for clarification. All replies that are obviously incorrect must be
deleted from the schedules.
The editor should be familiar with the instructions and the codes given to the
interviewers. Each new (corrected) entry made by the editor should be in
some distinctive form and should be initialed by the editor. The date of editing may also
be recorded on the schedule for future reference.
Activity 1
Define the following.
a) Field Editing.
............................................................................................................................
............................................................................................................................
............................................................................................................................
..........................................................................................................................
b) Central Editing.
............................................................................................................................
............................................................................................................................
............................................................................................................................
...........................................................................................................................
Activity 2
A marketing research organization is conducting a survey to determine the
consumption pattern of food items by households in Delhi. You are the head of
computer division responsible for editing the raw data from the questionnaires and
analyzing the same. A filled up set of questionnaires has been sent to you. List out
the points on which you would like to concentrate while editing the raw data.
Data Processing .........................................................................................................................................
and Analysis .........................................................................................................................................
.........................................................................................................................................
.........................................................................................................................................
The coding is necessary for the efficient analysis of data. The coding decisions
should usually be taken at the designing stage of the questionnaire itself so that the
likely responses to questions are pre-coded. This simplifies computer tabulation of
the data for further analysis. It may be noted that any errors in coding should be
eliminated altogether or at least be reduced to the minimum possible level.
Coding an open-ended question is more tedious than coding a closed-ended question.
For a closed-ended or structured question, the coding scheme is very simple and is
designed prior to the field work. For example, consider the following question.
• Sex of the respondent:   Male   Female
We may assign a code of `0' to male and `1' to female respondent. These codes may
be specified prior to the field work and if the codes are written on all questions of a
questionnaire, it is said to be wholly precoded.
The same approach could also be used for coding numeric data that either are not to be
coded into categories or have had their relevant categories specified. For example,
• What is your monthly income?
Here the respondent would indicate his monthly income which may be entered in the
relevant column. The same question may also be asked like this:
• What is your monthly income?
− < Rs. 5000
− Rs. 5000 - 8999
− Rs. 9000 - 12999
− Rs. 13000 or above.
We may code the class `less than Rs. 5000' as `1', `Rs. 5000 - 8999' as `2', `Rs. 9000 -
12999' as `3' and `Rs. 13000 or above' as `4'.
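The precoded income classes above can be applied mechanically. The following Python sketch is one possible way of doing so; the function name `income_code` and the use of the standard `bisect` module are choices made for illustration only.

```python
from bisect import bisect_right

# Lower limits of classes 2, 3 and 4, taken from the income question above.
LOWER_LIMITS = [5000, 9000, 13000]

def income_code(income_rs):
    """Return the class code (1 to 4) for a monthly income in rupees."""
    if income_rs < 0:
        raise ValueError("income cannot be negative")
    # Number of lower limits that the income has reached, plus one.
    return bisect_right(LOWER_LIMITS, income_rs) + 1

for income in (4999, 5000, 8999, 9000, 12999, 13000):
    print(income, "->", income_code(income))
```

Because every income falls into exactly one class, the scheme is mutually exclusive and collectively exhaustive.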
Coding of open-ended questions is a more complex task, as the verbatim responses
of the respondents are recorded by the interviewer. Into what categories should these
responses be put? The researcher may select 60-70 of the responses to a
question at random and list them. After examining the list, a decision is taken on what categories
are appropriate to summarize the data, and the coding scheme for categorized data
discussed above is used. A word of caution: while classifying the data into
various categories, we should keep a provision for "any other" to include responses
which may not fall into our designated categories.
It may be kept in mind that the response categories must be mutually exclusive and
collectively exhaustive.
A study was carried out among the readers of newspapers with the following
objectives.
• To identify and understand the factors that determine the preference for Times
of India amongst the readers.
• To ascertain the expectations vs. perceptual reality and locate gaps if any
amongst the readers of Times of India.
• To analyze the factors responsible for the most preferred subjects of information
attracting the readers to prefer Times of India.
To achieve these objectives a questionnaire was designed. We give below a part of the
questionnaire, and discuss the coding scheme for the same. Please note that the
objective here is not to evaluate the questionnaire but to design the coding scheme for
any given questionnaire of a study. The said questionnaire is given below in Exhibit 1.
• Let us design the coding scheme for the questionnaire given in Exhibit 1. We
note that question number 1 may have multiple responses because a respondent
could read one or more newspapers. There are 5 alternatives assigned
for question number 1 and therefore we will use five columns in the data matrix
to record the responses to this question. If the respondent reads Times of India
we code it as 1, otherwise 0. The remaining newspapers are coded
similarly. For example, if a respondent reads Times of India and
Indian Express, we will code questions 1a and 1c with a value of 1, and for the
remaining parts, namely b, d and e, the coded value will be 0.
• For question number 2, the respondent can choose only one of the four
alternatives. Therefore one single column is required to record the responses of
the respondents. The response categories are mutually exclusive and collectively
exhaustive. Whichever category is chosen by the respondent is coded 1 and
the remaining are coded 0.
• Question number 3 has seven parts and the respondent is to rate each one of
them on a 5-point scale ranging from 1 to 5. Therefore a total of seven columns is
required to record the responses of the respondent. Suppose the respondent
rates International News as 4; the value of 4 should then be assigned to question
number 3b, and so on.
• There are five attributes of Times of India mentioned in question number 4 and
the respondent is asked to rate each of them on a scale of 1 to 5.
Therefore five columns are required to record the responses to this question.
Suppose for question 4c (Weekend Supplements) the rating of the respondent is
2; the same will be shown in the coding book corresponding to this question.
• There are six features of Times of India mentioned in question number 5 and
labeled as 5a to 5f. The respondent is to rank them from 1 to 6 with regard to
the importance he gives to each of these features. Therefore we need six columns
for this. Suppose the ranking is 2, 3, 6, 1, 4 and 5 for questions 5a to 5f
respectively. The same numbers would appear on the coding sheet
corresponding to this question.
• Question number 6 is divided into five parts. For each part one separate
column is required. 6a indicates the age of the respondent, which will be
recorded as revealed by the respondent. Question 6b concerns
the sex of the respondent. Here male respondents are coded as 1 whereas female
respondents are coded as 0. Question 6c indicates the total number of members
in the household. Question 6d is concerned with the occupation of the
respondent. Question 6e mentions the monthly income of the household in
categorized form. Here the responses are mutually exclusive and collectively
exhaustive. If the respondent has a monthly income of less than 5000 rupees, the
response is coded as 1; if the monthly income is between 5001 and 10000 rupees, it is
coded as 2; between 10001 and 15000, the code is 3; from 15001 to
20000, the code is 4; from 20001 to 25000, the code is 5; and above 25000, the code is 6.
The above discussion can be shown below in the form of a code book.
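The coding rules described above can also be sketched in Python, turning one respondent's answers into one row of the data matrix. This is an illustrative sketch only: the column labels, the function names, the restriction to questions 1 and 6, and the income figure of Rs. 12000 are assumptions made for the example. In particular, the text does not say which class an income of exactly Rs. 5000 belongs to; the sketch places such boundary values in the lower class.

```python
from bisect import bisect_left

Q1_PARTS = ["1a", "1b", "1c", "1d", "1e"]        # one column per newspaper
INCOME_UPPER_LIMITS = [5000, 10000, 15000, 20000, 25000]
SEX_CODES = {"male": 1, "female": 0}             # as described for question 6b

def code_q1(parts_read):
    """Multiple-response question: one 0/1 column per alternative."""
    return {part: int(part in parts_read) for part in Q1_PARTS}

def code_income(income_rs):
    """Question 6e: class code 1 to 6. Assumption: a boundary value such as
    exactly 5000 is placed in the lower class, which the text leaves open."""
    return bisect_left(INCOME_UPPER_LIMITS, income_rs) + 1

def code_respondent(parts_read, age, sex, household_size, income_rs):
    """Build the question 1 and question 6 columns of one data-matrix row."""
    row = code_q1(parts_read)
    row.update({"6a": age, "6b": SEX_CODES[sex],
                "6c": household_size, "6e": code_income(income_rs)})
    return row

# A respondent who reads the newspapers listed at 1a and 1c.
row = code_respondent({"1a", "1c"}, age=32, sex="male",
                      household_size=4, income_rs=12000)
print(row)
```

Each key of the resulting dictionary corresponds to one column of the data matrix, so a list of such rows gives the full matrix.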
The data matrix corresponding to the above coding scheme is shown in the table given
below:
The above table indicates that respondent number 1 reads both Times of India and
Indian Express and no other newspaper. This is indicated by code 1 corresponding to
questions 1a and 1c; for the remaining parts of question 1, a `0' is indicated. Question
number 2 indicates that the respondent has been reading Times of India for 6 to 12 months.
The rating of various features of a newspaper in terms of the interest he has in them is
indicated by the responses to questions 3a to 3g. The respondent is not very
uninterested in critical news, interested in international news, not particular about city
news, very interested in corporate and business news, very uninterested in sports news
and interested in people and lifestyle news and in leisure, art and entertainment news. The
respondent rates Times of India on five attributes on a scale of 1 to 5, where 1 is the
extremely unfavourable side and 5 the extremely favourable side. He has rated Times of
India on news content as 4, editorial as 3, weekend supplements as 5, weekdays
supplements as 3 and layout as 5. His ranking of how important various features are to
him, on a 1 to 7 scale where 1 represents the most important and 7 the least important,
is indicated in question 5. As per the respondent, classified advertisements are ranked
the least, weekdays supplements get a rank of 4, number of pages a rank of 6,
advertisements a rank of 3, news content a rank of 1, weekend supplements a rank of 2,
and layout a rank of 5. The respondent is 32 years of age and is a male, as indicated by
a code of 1 for question 6b. There are four members in his household. His occupation is
business and he has a monthly income between Rs. 10001 and 15000, as indicated by
code 3 for question 6e.
Respondent 2 does not read Times of India. In fact, the respondent is a reader of Hindustan
Times and no other newspaper, and therefore the questions under question number
6 are asked of the respondent. The respondent is 30 years of age and a female, as indicated by
code 0 for question 6b. The respondent has 3 family members, is a professional and has a
monthly income between 15001 and 20000 rupees, as indicated by code 4 corresponding to 6e.
Activity 3
Describe the characteristics of nth respondent as given in data matrix.
.........................................................................................................................................
.........................................................................................................................................
........................................................................................................................................
two categories such as below 25 years of age and 25 and more years
of age. We can further classify them as professionals and non-professionals.
This way one can keep on adding more attributes. This
is shown in Figure 1. However, the addition of a particular attribute
(the process of sub-classification) depends upon the basic purpose for
which the classification is required. The objectives of such a
classification have to be clearly spelt out.
Example: The following data refer to the sales of a company for 40 quarters.
Tabulate the data using the inclusive method.
We will be using the data given above. We form five class intervals, each of width
370. These are inclusive class intervals in the sense that the variable X could take any
value between the lower and upper limits, with both ends of the interval
included. The class intervals along with the number of items in
each class interval are shown in the table below:
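The inclusive method can be sketched in Python as follows. The sales figures and the starting value of 1000 are hypothetical and serve only to illustrate five inclusive classes of width 370 for whole-number data; they are not the data of the example.

```python
def inclusive_classes(start, width, count):
    """Build inclusive integer classes such as 1000-1369, 1370-1739, ..."""
    return [(start + i * width, start + (i + 1) * width - 1)
            for i in range(count)]

def tabulate(values, classes):
    """Count the values in each class; both limits belong to the class."""
    table = {c: 0 for c in classes}
    for v in values:
        for lower, upper in classes:
            if lower <= v <= upper:
                table[(lower, upper)] += 1
                break
        else:
            raise ValueError(f"{v} falls outside every class")
    return table

# Hypothetical quarterly sales figures (not the data from the example).
sales = [1000, 1369, 1370, 1805, 2110, 2479, 2480, 2849, 1500, 2200]
classes = inclusive_classes(start=1000, width=370, count=5)
for (lower, upper), freq in tabulate(sales, classes).items():
    print(f"{lower}-{upper}: {freq}")
```

Note that a value equal to an upper limit, such as 1369, is counted in that class, whereas under the exclusive method it would fall in the next one.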
Activity 4
Rsp Inc Rsp Inc Rsp Inc Rsp Inc Rsp Inc Rsp Inc
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
9.5 STATISTICAL SERIES
A series is defined as a logical or systematic arrangement of observations or items.
When the attributes or things are counted, measured or weighed and arranged in an
orderly manner, say in either descending or ascending order, they constitute a series.
When the statistical data pertain to time, the series is said to be a historical or time
series. The important factor in such a series is the chronology. In time series data, the
time difference between any two consecutive observations must be the same. It could be
an hour, minute, week, month, quarter, year, etc. The data presented in the following
table on sales form a time series.
Year 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995
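The requirement of equal time differences can be checked mechanically. The short Python sketch below does this for the years 1986 to 1995; the function name `is_equally_spaced` is an assumption made for the example.

```python
def is_equally_spaced(periods):
    """True when every consecutive pair of periods is the same distance apart."""
    gaps = {later - earlier for earlier, later in zip(periods, periods[1:])}
    return len(gaps) <= 1

years = list(range(1986, 1996))                 # 1986 ... 1995
print(is_equally_spaced(years))                 # yearly observations
print(is_equally_spaced([1986, 1987, 1989]))    # a missing year breaks the spacing
```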
When the data pertain to space, the series is referred to as spatial, and is also known
as a geographical series. When the data refer to physical conditions such as height,
weight, age, etc., the series is referred to as a condition series. The following series, for
example, is a spatial series of region-wise sales of a firm during 1989-90.
Southern 85.00
Central 76.00
Western 163.00
Eastern 68.00
The series can also be classified as individual observations, discrete series and
continuous series. In the case of a series of individual observations, the items are listed
singly as distinguished from listing them in groups. In the case of a discrete series, items
are arranged in groups (frequency distribution) showing definite breaks from one
point to another and are exactly measurable. The grades obtained by 70 students in
Marketing Research are given in the following frequency distribution table.
In the case of a continuous series, the items are arranged in classes, in either
ascending or descending order of magnitude, and their continuity is not
broken. At the point at which one class ends, the next begins, and thus the continuity is
maintained. The distribution of the lifetime of 350 radio tubes is given below in the
form of a continuous series.
Life-time      No. of tubes      Life-time      No. of tubes
(in hours)                       (in hours)
300-400              6           700-800             62
400-500             18           800-900             22
500-600             73           900-1000             4
600-700            165
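The two defining properties of this continuous series, namely that each class begins where the previous one ends and that the frequencies add up to the 350 tubes, can be verified with a short Python sketch. The variable and function names are chosen for illustration only.

```python
# Class limits (hours) and frequencies copied from the table above.
series = [
    ((300, 400), 6), ((400, 500), 18), ((500, 600), 73), ((600, 700), 165),
    ((700, 800), 62), ((800, 900), 22), ((900, 1000), 4),
]

def is_continuous(series):
    """Each class must begin exactly where the previous class ends."""
    limits = [cls for cls, _ in series]
    return all(prev[1] == nxt[0] for prev, nxt in zip(limits, limits[1:]))

total = sum(freq for _, freq in series)
modal_class, modal_freq = max(series, key=lambda item: item[1])
print(is_continuous(series), total, modal_class)
```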
Activity 5
Collect the data on the salaries of the employees in your organisation and develop a
continuous series for the data you have collected.
.........................................................................................................................................
.........................................................................................................................................
.........................................................................................................................................
Activity 6
State what labels you will attach to the following series of data.
i) Density of population (per sq. km in different cities of India)
............................................................................................................................
ii) Number of defective items produced by machine per week
………………………………………………………………………………..
iii) Population of workers classified as male, female and further classification is
based on their being vegetarian or non-vegetarian.
……………………………………………………………………………….
.........................................................................................................................................
.........................................................................................................................................
.........................................................................................................................................
........................................................................................................................................
The above data is presented below in a two-dimensional diagram. In this diagram the widths
of the rectangles for family A and family B are taken in the proportion 2:3.
When the difference between two quantities is very large, one bar would become too
big and the other too small in a rectangular diagram. To overcome this difficulty,
squares are used to present the data. The side of each square is taken in proportion to
the square root of the given value, so that the areas of the squares are proportional
to the data.
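The calculation of the sides of the squares can be sketched in Python as follows. The quantities are hypothetical and the `scale` parameter is an assumption introduced so that the squares can be drawn to any convenient size.

```python
import math

def square_sides(values, scale=1.0):
    """Side of each square = scale * sqrt(value), so area is proportional to value."""
    return [scale * math.sqrt(v) for v in values]

# Hypothetical quantities that differ too much for a bar diagram.
quantities = [25, 400, 10000]
sides = square_sides(quantities)
print(sides)
```

The quantities stand in the ratio 1 : 16 : 400, whereas the sides stand in the far more manageable ratio 1 : 4 : 20.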
Monthly Expenditure on Various Commodities by Two Families (Figures in Rupees)
Type of Commodity Family A Family B
The various items are converted into corresponding degrees using the fact that the
sum total of degrees in a circle equals 360. The degrees for the various items
corresponding to families A and B are given below:
The pie chart corresponding to the figures given in the above table is shown below
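The conversion of expenditure items into degrees (item value divided by the total, multiplied by 360) can be sketched in Python. The expenditure figures below are hypothetical and are not the figures of the two families.

```python
def to_degrees(expenditure):
    """Convert each item to its share of the 360 degrees of a circle."""
    total = sum(expenditure.values())
    return {item: amount / total * 360 for item, amount in expenditure.items()}

# Hypothetical monthly expenditure in rupees; not the figures from the table.
family = {"Food": 2000, "Clothing": 1000, "Rent": 500, "Others": 500}
for item, angle in to_degrees(family).items():
    print(f"{item}: {angle:.1f} degrees")
```

The angles always add up to 360, which is a useful check before drawing the chart.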
Years              87-88  88-89  89-90  90-91  91-92  92-93  93-94  94-95  95-96  96-97
Sales (Rs. lacs)   10.4   12.3   11.6   11.9   13.2   14.1   14.6   14.9   15.4   16.2
Although we can see changes in the data, the presentation of the same on a line chart
gives a better picture of the information. The other forms of presentation of the data
have already been discussed in Unit 6, Block 2 of Quantitative Analysis for
Managerial Applications (MS-8).
9.8 SUMMARY
In this unit various aspects of data processing, namely editing, coding, classification and
presentation of data through tables and graphs, have been discussed. Editing is of two types,
namely field editing and central editing. Coding involves assigning symbols or
numerals or both to the answers to the questions in a questionnaire so that the responses
can be recorded in a limited number of classes or categories. This helps in the analysis of data.
The designing of a coding scheme has been discussed with the help of a sample questionnaire.
Classification is the process of arranging data in groups or classes on the basis of certain
characteristics. It involves condensation of data, which facilitates comparison and helps in
establishing relationships between variables. Classification can be according to attributes
or numerical characteristics. The former may be divided into simple and manifold
classification. The latter is achieved using either the inclusive or the exclusive method of
forming a frequency distribution. The data may be presented in the form of tables or graphs. The unit
discusses various characteristics which should be taken into consideration while forming a
table. The graphical presentation of data can be done by using pie charts, line charts,
histograms, etc. Some of these have been covered in this unit.