Unit - 8 Data Analysis
Once the field data begin to flow in, attention turns to data processing and
analysis. Data processing implies editing, coding, classifying and tabulating the
collected data so that they are amenable to analysis.
EDITING:
The first step in data processing is to edit the raw data. Editing is the process of
examining the collected raw data to detect errors and omissions and to correct these when
possible. It involves a careful scrutiny of the completed questionnaires to ensure
that the data are:
Accurate
Consistent with other information / facts gathered
Uniformly entered
As complete as possible
Arranged to facilitate coding and tabulation.
Editing can be done at two levels: in the field where the data are collected, or in the
office.
CODING:
Coding refers to the process of assigning numerals or other symbols to answers so that
responses can be put into a limited number of categories or classes. In this way, many
different answers are reduced to a few categories.
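As a minimal sketch of coding, the Python snippet below maps raw answers to numeric codes. The codebook, the catch-all code 9 and the sample answers are all hypothetical and chosen only for illustration.

```python
# Hypothetical codebook: each expected answer gets one numeric code.
CODEBOOK = {"yes": 1, "no": 2, "don't know": 3}
OTHER = 9  # assumed catch-all code for answers outside the codebook


def code_answer(raw):
    """Return the numeric code for one raw answer, ignoring case and outer spaces."""
    return CODEBOOK.get(raw.strip().lower(), OTHER)


# Hypothetical raw answers as they might appear on questionnaires.
raw_answers = ["Yes", "no", " YES ", "Don't know", "maybe"]
coded = [code_answer(a) for a in raw_answers]
print(coded)  # [1, 2, 1, 3, 9]
```

Five differently written answers are reduced to four categories, which is what makes the later tabulation possible.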
CLASSIFICATION:
TABULATION:
Tabulation is the process of summarizing raw data and displaying them in compact form (in
the form of statistical tables) for further analysis. It is an orderly arrangement of data in
columns and rows.
Descriptive Vs Inferential statistics:
Descriptive statistics: Statistics used to describe or summarize information
about a population or a sample.
Inferential statistics: Statistics used to make inferences or judgment about a population
on the basis of sample information.
The Mean: it is the arithmetic average; it involves all observations and is affected by
extreme cases.
The Median: It is the midpoint / middle value of the ordered observations. It does not involve all observations, so it is not affected by extreme cases.
The Mode: The value that occurs most often is referred to as the modal value.
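The three measures can be compared on a small data set. The sketch below uses Python's standard statistics module. The five observations are hypothetical, and the value 100 is included as an extreme case to show its effect on the mean.

```python
import statistics

# Hypothetical observations; 100 is an extreme case.
values = [2, 3, 3, 4, 100]

mean = statistics.mean(values)      # arithmetic average of all observations
median = statistics.median(values)  # middle value of the sorted observations
mode = statistics.mode(values)      # most frequently occurring value

print(mean, median, mode)  # 22.4 3 3
```

The single extreme value pulls the mean up to 22.4, while the median and the mode both stay at 3.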
MEASURES OF DISPERSION:
Range: The difference between the smallest and the largest value of a frequency
distribution is known as the range of the distribution.
Deviation scores: One way of calculating how far away any observation is from the
mean is to calculate the individual deviation
$d = x - \bar{x}$
where $x$ is the observation and $\bar{x}$ is the mean.
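The range and the deviation of each observation from the mean can be computed in a few lines of Python. This is a minimal sketch, and the five observations are hypothetical.

```python
# Hypothetical observations.
values = [4, 8, 6, 10, 12]

# Range: largest value minus smallest value.
value_range = max(values) - min(values)

# Deviation score of each observation: the observation minus the mean.
mean = sum(values) / len(values)
deviations = [x - mean for x in values]

print(value_range)  # 8
print(mean)         # 8.0
print(deviations)   # [-4.0, 0.0, -2.0, 2.0, 4.0]
```

The deviations always sum to zero, which is why they are usually squared or taken in absolute value before being averaged.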
Data analysis:
Tabulation: It refers to the orderly arrangement of data in a table or other summary
format. At the very beginning, we have to develop a master sheet and transfer all
information from each questionnaire onto this master sheet. Counting the number of
responses to a question and putting them in a frequency distribution is tabulation.
RESPONSES   FREQUENCY
YES         27
NO          93
TOTAL       120
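The counting step can be sketched in Python with collections.Counter. The list of responses below is hypothetical and much shorter than a real master sheet, so its counts are not the figures in the table.

```python
from collections import Counter

# Hypothetical coded responses to one yes/no question, one entry per questionnaire.
responses = ["yes", "no", "no", "yes", "no", "no", "no", "yes", "no", "no"]

# Count how often each response occurs.
freq = Counter(responses)
total = sum(freq.values())

# Print a simple frequency table.
print("RESPONSES   FREQUENCY")
for answer, count in freq.most_common():
    print(f"{answer.upper():<12}{count}")
print(f"{'TOTAL':<12}{total}")
```

Computing the total from the counts, rather than entering it by hand, guarantees that it equals the sum of the category frequencies.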
Year   Land (in hectares)   Index   Population size (in millions)   Index
1990   20000                100     3                               100
1991   21000                105     3.1                             103
1992   21500                107.5   3.2                             107
1993   22000                110     3.3                             110
1994   23000                115     3.3                             110
1995   22000                110     3.4                             113
1996   22500                112.5   3.5                             117
1997   21500                107.5   3.6                             120
1998   23000                115     3.7                             123
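The index columns above appear to express each year's value as a percentage of the 1990 base-year value (1990 = 100). A minimal Python sketch of that calculation follows, using the land figures for the first four years of the table. The variable names are illustrative.

```python
# Land under cultivation (hectares) for the first four years of the table.
land = {1990: 20000, 1991: 21000, 1992: 21500, 1993: 22000}

base_year = 1990
base_value = land[base_year]

# Index = value in the given year as a percentage of the base-year value.
index = {year: value * 100 / base_value for year, value in land.items()}

print(index)  # {1990: 100.0, 1991: 105.0, 1992: 107.5, 1993: 110.0}
```

Because both series are expressed relative to the same base year, land and population can be compared directly even though their units differ.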
                Preference Rank
Place of Work   1st   2nd   3rd   4th   5th
Mekalle         1     -     1     -     3
Bahirdar        2     1     -     2     -
Nazareth        1     2     -     1     1
Awassa          1     1     2     -     1
Jimma           -     1     2     2     -
In this case we multiply the number of respondents by the rank score and then
sum up the scores. The lowest total score shows the first preference ranking.
Accordingly:
Mekalle: (1x1) + (1x3) + (3x5) = 19 Ranked 5th
Bahirdar: (2x1) + (1x2) + (2x4) = 12 Ranked 1st
Nazareth: (1x1) + (2x2) + (1x4) + (1x5) = 14 Ranked 2nd
Awassa: (1x1) + (1x2) + (2x3) + (1x5) = 14 Ranked 2nd
Jimma: (1x2) + (2x3) + (2x4) = 16 Ranked 4th
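The same weighted scores can be reproduced in Python. In this sketch each town's list holds the number of respondents who gave it 1st, 2nd, 3rd, 4th and 5th preference, with a dash in the table entered as 0. The variable names are illustrative.

```python
# Number of respondents giving each town rank 1..5 (a dash in the table is 0).
counts = {
    "Mekalle":  [1, 0, 1, 0, 3],
    "Bahirdar": [2, 1, 0, 2, 0],
    "Nazareth": [1, 2, 0, 1, 1],
    "Awassa":   [1, 1, 2, 0, 1],
    "Jimma":    [0, 1, 2, 2, 0],
}

# Total score = sum of (number of respondents x rank score) over the five ranks.
scores = {
    town: sum(n * rank for rank, n in enumerate(row, start=1))
    for town, row in counts.items()
}

# The lowest total score is the most preferred town.
ordered = sorted(scores.items(), key=lambda item: item[1])

print(scores)   # {'Mekalle': 19, 'Bahirdar': 12, 'Nazareth': 14, 'Awassa': 14, 'Jimma': 16}
print(ordered)  # [('Bahirdar', 12), ('Nazareth', 14), ('Awassa', 14), ('Jimma', 16), ('Mekalle', 19)]
```

Nazareth and Awassa have the same total, so they share second place and no town is ranked third.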
STATISTICAL ANALYSIS:
This requires choosing the appropriate statistical techniques. The choice of the method
of statistical analysis depends on:
1. The type of question to be answered: i.e., whether the aim is to measure central
tendency, the relationship between variables, or differences between categories.
2. The number of variables:
Univariate data analysis: When a researcher generalizes from a sample
about one variable
Bivariate data analysis: When the desire is to explain the relationship
between two variables at a time
Multivariate data analysis: Is the simultaneous investigation of more
than two variables.
3. Scale measurement of data: There are four scale measurements of data as
follows:
a) Nominal Data (scale): Data that fall into different categories, where one cannot
arrange the categories in any order of magnitude. No arithmetic operations other than
counting can be conducted on these data. Ex: sex, religion, the chest numbers worn by
athletes or the numbers on the shirts of football players, which are meant only for
their identification.
b) Ordinal Data (scale): Data that permit a ranking by order of magnitude, but it is
not possible to determine how much greater one item is than another. Few
mathematical operations can be conducted on data of this type either. Ex:
respondents might rank five towns from 1 to 5 according to where they would prefer
to work; the position of an athlete at the end of a race.
c) Interval Data (scale): Provides detailed information with equal distances between
values but no true zero point, so addition and subtraction are meaningful while
multiplication and division are not. Ex: temperature in degrees Celsius, calendar
years.
d) Ratio Data (scale): Provides detailed information and has a true zero point, so all
the mathematical operations can be conducted on this type of data. Ex: income, age,
weight, height, price, output, the time taken by athletes to finish a 100 metres race, etc.
PARAMETRIC Vs NON-PARAMETRIC:
Note:
Parametric analysis is based on interval or ratio data, and non-parametric analysis is
based on nominal or ordinal data.
Parametric statistics: Statistical procedures that use interval scales or ratio scales and
assume that the population or sampling distribution is normal.
Non-parametric statistics: Statistical procedures that use nominal or ordinal data and
make no assumptions about the distribution of the population or sampling distribution.
Examples of selecting the appropriate statistical methods:
Scale of           Problem                 Statistical question to      Possible test of
measurement                                be answered                  statistical significance

Interval or ratio  Compare actual and      Is the sample mean           Z-test if sample size is
scale              hypothetical values     significantly different      large (n > 30);
                   of the average          from the hypothesized        t-test if sample size is
                                           population mean?             small (n < 30)

Ordinal scale      Compare actual and      Does the distribution        The Chi-square test (χ²)
                   expected values         differ from the expected
                                           distribution?

Nominal scale      Identify sex of key     Is the number of female      The Chi-square test (χ²)
                   executives              executives equal to the
                                           number of male executives?

Nominal scale      Compare more than two   Is there a significant       One-way and two-way
(groups), with     groups                  difference between more      ANOVA (analysis of
interval or ratio                          than two groups in terms     variance test)
outcome                                    of mean?
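Two of the tests in the table can be sketched with Python's standard library alone. This is an illustrative sketch, not a full testing procedure. The function names and all the figures are hypothetical, and the Z-test shown assumes that the population standard deviation is known.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, hypothesized_mean, population_sd, n):
    """Return the z statistic and two-sided p-value for a one-sample Z-test."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def chi_square_equal_split(observed):
    """Return the chi-square statistic when every category is expected to be equally frequent."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Hypothetical large sample: n = 100, sample mean 52, hypothesized mean 50, sd 10.
z, p_value = one_sample_z(52.0, 50.0, 10.0, 100)
print(z)                  # 2.0
print(round(p_value, 4))  # 0.0455

# Hypothetical counts of female and male executives.
chi2 = chi_square_equal_split([30, 70])
print(chi2)               # 16.0
```

With two categories the chi-square statistic has one degree of freedom, for which the 5% critical value is 3.841. A statistic of 16.0 would therefore lead to rejecting the hypothesis that the two numbers are equal in this made-up example.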