
Chapter 7

Data Processing and Analysis

Uploaded by

Haile Girma
Copyright © All Rights Reserved
Chapter 7: Data Processing and Analysis

Introduction
 The goal of any research is to extract information from raw data.
 After collection, the raw data must be processed and analyzed in
line with the outline (plan) laid down for that purpose when the
research plan was developed.
 Responses on measurement instruments (words, check marks, etc.)
convey little information as such.
 The compiled data must be classified, processed, analyzed, and
interpreted carefully before their full meaning and
implications can be understood.
Cont’d
 In general, the stages in data processing and analysis can be
summarized in the following chart:
Data processing: Editing → Coding → Classification and tabulation (data entry)
Data analysis: Descriptive and Inferential statistics (Univariate, Bivariate, Multivariate)


Cont’d
 Working with collected data involves two stages: data processing and
data analysis.
 Some authors do not like to make a distinction between processing
and analysis.
 However, we look at these two terms separately and briefly.
7.1. Data processing
 It implies editing, coding, classification, and tabulation of
collected data so that they are amenable to analysis.
Editing: The process of examining collected raw data to detect
errors, omissions, and extreme values, and to correct them when
possible.
 It involves careful scrutiny of completed questionnaires or
schedules.
Cont’d
 Editing is done to ensure that the data are:
i. Accurate
ii. Consistent with other data gathered
iii. Uniformly entered
iv. As complete as possible
v. Well organized to facilitate coding and tabulation
 Editing can be either field editing or central editing.
Cont’d
Field editing: Consists of the investigator reviewing the reporting forms
to complete what was written in abbreviated and/or illegible form
at the time of recording the respondents’ responses.
 This sort of editing should be done as soon as possible after the
interview or observation.
Central editing: Takes place at the research office.
 Its objective is to correct errors such as entry in wrong place,
entry recorded in month
Coding: Refers to the process of assigning numerals or other symbols
to answers so that responses can be put into a limited number of
categories or classes.
 Such classes should be appropriate to the research problem under
consideration.
Cont’d
 There must be a class for every data item (the classes must be exhaustive).
 The classes must be mutually exclusive (a specific answer can be placed in one
and only one cell in a given category set).
 Coding is necessary for efficient analysis; through it, several replies
may be reduced to a small number of classes that contain the critical
information required for analysis.
E.g., closed-ended question:
1 [ ] Yes
2 [ ] No
Or:
Less than 200 [ ] 001
201-699 [ ] 002
1500 and more [ ] 006
 Coding is needed when the researcher uses a computer to analyze the data;
otherwise it can be avoided.
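The coding step can be illustrated with a minimal sketch, assuming the Yes/No codes from the example above. The names `YES_NO_CODES` and `code_responses` are hypothetical and chosen only for illustration.

```python
# Hypothetical codebook based on the slide's Yes/No example (1 = Yes, 2 = No).
YES_NO_CODES = {"Yes": 1, "No": 2}


def code_responses(responses, codebook):
    """Map each raw answer to its numeric code.

    Raises ValueError for an answer with no code, because every data
    item must belong to exactly one class.
    """
    coded = []
    for answer in responses:
        if answer not in codebook:
            raise ValueError(f"No code defined for answer: {answer!r}")
        coded.append(codebook[answer])
    return coded


raw_answers = ["Yes", "No", "Yes", "Yes"]  # illustrative data
coded_answers = code_responses(raw_answers, YES_NO_CODES)
print(coded_answers)  # [1, 2, 1, 1]
```

Rejecting unknown answers rather than silently skipping them is one way to enforce the rule that the classes must cover every data item.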
Cont’d
 Classification: Most research studies result in a large volume of raw
data, which must be reduced into homogeneous groups.
 Data classification is the process of arranging data in
groups or classes on the basis of common characteristics.
 Data having common characteristics are placed in one class, and in this
way the entire data set is divided into a number of groups or classes.
 Classification according to attributes: Data are classified on
the basis of common characteristics, which can be either descriptive
(such as literacy, sex, honesty, etc.) or numerical (such as
weight, age, height, income, expenditure, etc.).
Cont’d
 Descriptive characteristics refer to qualitative phenomena,
which cannot be measured quantitatively; only their presence or
absence in an individual item can be noticed.
 Data obtained this way on the basis of certain attributes are known as
statistics of attributes, and their classification is said to be
classification according to attributes.
 Classification according to class interval: Unlike descriptive
characteristics, numerical characteristics refer to quantitative
phenomena, which can be measured in some statistical
unit.
 Data relating to income, production, age, weight, etc. come under
this category.
 Such data are known as statistics of variables and are classified on
the basis of class intervals.
Cont’d
 For example, individuals whose incomes are within 500-1000 Birr
can form one group, those whose incomes are within 1001-1500 Birr
another group, and so on.
 In this way the entire data set may be divided into a number of groups or
classes, usually called class intervals.
 Each class interval thus has an upper and a lower limit, which are
known as class limits.
 The difference between the two class limits is the class magnitude.
 The number of items that fall in a given class is known as the frequency
of that class.
 All the classes with their respective frequencies, taken together and
put in the form of a table, are described as a grouped frequency
distribution or simply a frequency distribution.
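A grouped frequency distribution can be sketched as follows. The income figures are made up for illustration, the class limits follow the Birr example above, and the function name `frequency_distribution` is an assumption, not part of the original material.

```python
def frequency_distribution(values, class_limits):
    """Count the values falling in each (lower, upper) class.

    Both class limits are treated as inclusive. A value that fits no
    class raises ValueError, since the classes should cover all items.
    """
    counts = {limits: 0 for limits in class_limits}
    for value in values:
        for lower, upper in class_limits:
            if lower <= value <= upper:
                counts[(lower, upper)] += 1
                break
        else:
            raise ValueError(f"No class interval contains {value}")
    return counts


incomes = [520, 750, 1000, 1001, 1200, 1500, 980]  # hypothetical incomes in Birr
classes = [(500, 1000), (1001, 1500)]
table = frequency_distribution(incomes, classes)
for (lower, upper), frequency in table.items():
    print(f"{lower}-{upper} Birr: {frequency}")
```

With this data, four incomes fall in the 500-1000 class and three in the 1001-1500 class.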
Cont’d
 Classification according to class intervals usually involves the
following problems:
1. How many classes should there be?
2. What should their class size (magnitude) be?
 The answer is left to the skill and experience of the researcher.
 However, the objective should be to display the data in such a way as to
make them meaningful to the analyst.
 Concerning class size, the classes should, as far as possible, be of equal size.
 Multiples of 2, 5 and 10 are generally preferred when determining
class size.
Cont’d
 Some statisticians adopt the following formula:

$I = \dfrac{R}{1 + 3.3 \log N}$

Where, I = class size
 R = Range (i.e., the difference between the values of the largest and
smallest items among the items to be grouped)
 N = Number of items to be grouped
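As a rough sketch, the class size can be computed from the range and the number of items. This assumes the formula meant here is the commonly cited I = R / (1 + 3.3 log N) with a base-10 logarithm. The function name `class_size` and the data are illustrative.

```python
import math


def class_size(values):
    """Suggested class size I = R / (1 + 3.3 * log10(N)).

    R is the range (largest minus smallest value) and N the number of
    items. The base-10 logarithm is an assumption of this sketch.
    """
    value_range = max(values) - min(values)
    n_items = len(values)
    return value_range / (1 + 3.3 * math.log10(n_items))


observations = list(range(1, 101))  # 100 hypothetical observations, 1..100
suggested = class_size(observations)
print(round(suggested, 2))  # R = 99, N = 100, so 99 / 7.6
```

In practice the result would then be rounded to a convenient multiple of 2, 5 or 10, in line with the preference stated earlier.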
Some problems in processing
 Don’t know (DK) responses: During data processing, the researcher
often comes across responses that are difficult to handle.
 "Don’t know" (DK) is one example of such responses.
 When the DK response group is small, it is of little significance.
 But when it is relatively large, it becomes a matter of major concern.
 How should the researcher deal with DK responses?
 Prevention is best!
 The best way is to design better questions.
 Good rapport (understanding) between interviewers and respondents will
minimize DK responses.
Cont’d
 But what about DK responses that have already occurred?
 One way to tackle this issue is to estimate the allocation of DK
answers from other data in the questionnaire.
 The other way is to keep DK responses as a separate reply
category if the DK response happens to be legitimate; otherwise we
should let the reader make his or her own decision.
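The two treatments of DK responses can be contrasted in a small sketch. The proportional allocation shown here is a simplifying assumption made for illustration. Estimating the allocation "from other data in the questionnaire" would normally draw on additional variables. All names and data are hypothetical.

```python
from collections import Counter


def tabulate_keeping_dk(responses):
    """Option 1: report DK as its own reply category."""
    return dict(Counter(responses))


def allocate_dk_proportionally(responses, dk_label="DK"):
    """Option 2 (simplified assumption): spread DK answers over the
    substantive categories in proportion to their observed counts."""
    counts = Counter(responses)
    dk_count = counts.pop(dk_label, 0)
    known_total = sum(counts.values())
    if known_total == 0:
        raise ValueError("No substantive answers to allocate DK responses to")
    return {
        category: count + dk_count * count / known_total
        for category, count in counts.items()
    }


answers = ["Yes"] * 6 + ["No"] * 2 + ["DK"] * 2  # illustrative data
kept = tabulate_keeping_dk(answers)
allocated = allocate_dk_proportionally(answers)
print(kept)
print(allocated)
```

Keeping DK separate preserves what respondents actually said, while allocation changes the reported counts and therefore should be disclosed to the reader.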
7.2. Data Analysis
 Data analysis is the further transformation of processed data to
look for patterns and relations among data groups.
 By analysis we mean the computation of certain indices or
measures along with searching for patterns of relationship that
exist among data groups.
 Analysis, particularly in the case of survey or experimental data,
involves estimating the values of unknown population
parameters and testing hypotheses in order to draw inferences.
 Analysis can be categorized as:
i. Descriptive Analysis
ii. Inferential (Statistical) Analysis
7.2.1. Descriptive analysis:
 Descriptive analysis is largely the study of the distribution of one
variable.
 For most projects, analysis begins with some form of descriptive
analysis to reduce the data into a summary format.
 Descriptive analysis refers to the transformation of raw data into
a form that makes them easy to understand and interpret.
 Describing responses or observations is typically the first form of
analysis.
 The calculation of averages, frequency distributions, and
percentage distributions is the most common form of summarizing
data.
Cont’d
 The most common forms of describing processed data are:
i. Tabulation
ii. Percentage
iii. Measures of central tendency
iv. Measures of dispersion
v. Measures of asymmetry
vi. Data transformation and index numbers
Cont’d
Tabulation: Refers to the orderly arrangement of data in a table or other
summary format.
 It presents responses or observations on a question-by-question or
item-by-item basis and provides the most basic form of information.
 It tells the researcher how frequently each response occurs.
 This starting point of analysis requires counting the responses or
observations in each category, e.g., in frequency tables.
Need for tabulation:
 It conserves space and reduces explanatory and descriptive
statements to a minimum.
 It facilitates the process of comparison.
 It facilitates the summation of items and the detection of errors and omissions.
 It provides a basis for various statistical computations.
Cont’d
Percentage: Whether data are tabulated by computer or by hand, it is
useful to have percentages and cumulative percentages.
 A table containing both percentage and frequency distributions is easier to
interpret.
 Percentages are useful for comparing trends over time or among
categories.
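A frequency table with percentages and cumulative percentages might be built as in the sketch below. The response data and the function name `frequency_table` are illustrative assumptions.

```python
from collections import Counter


def frequency_table(responses):
    """Return rows of (category, frequency, percent, cumulative percent),
    ordered by category."""
    counts = Counter(responses)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for category in sorted(counts):
        percent = 100 * counts[category] / total
        cumulative += percent
        rows.append((category, counts[category], round(percent, 1), round(cumulative, 1)))
    return rows


ratings = [1, 2, 2, 3, 3, 3, 3, 2, 1, 3]  # hypothetical coded responses
rows = frequency_table(ratings)
print("Category  Freq  Percent  Cum. percent")
for category, frequency, percent, cumulative in rows:
    print(f"{category:>8}  {frequency:>4}  {percent:>7}  {cumulative:>12}")
```

The cumulative column only has a natural reading when the categories are ordered, as they are for these coded ratings.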
Cont’d
Measure of central tendency: Describing the central tendency of a
distribution with the mean, median or mode is another basic form of
descriptive analysis.
 These measures are most useful when the purpose is to identify typical
values of a variable or the most common characteristics of a group.
 A measure of central tendency is also known as a statistical average.
The mean, median and mode are the most popular averages.
 The mean (arithmetic mean) is the most common measure of central tendency.
 The mode is less commonly used, but it is useful in studies such as
estimating the most popular shoe size.
 The median is commonly used in estimating the average of qualitative
phenomena, such as intelligence.
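The three averages can be compared on a small data set using Python's standard `statistics` module. The income figures are invented for illustration.

```python
import statistics

# Hypothetical monthly incomes in Birr; one large value pulls the mean upward.
incomes = [500, 600, 600, 700, 3000]

mean_income = statistics.mean(incomes)      # arithmetic mean
median_income = statistics.median(incomes)  # middle value of the sorted data
mode_income = statistics.mode(incomes)      # most frequent value

print(mean_income, median_income, mode_income)  # 1080 600 600
```

Here the mean is well above the median and mode, which shows why the choice of average matters when extreme values are present.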
Cont’d
Measurement of dispersion: Measures how the values of items are
scattered around the average.
 An average by itself gives no idea of how the values of an
item or variable are dispersed around it.
 After identifying the typical value of a variable, the researcher can measure
how the values are scattered around the mean.
 Dispersion measures how far the values of a variable lie from the average
value.
 It measures the variation in the values of an item.
 Important measures of dispersion are:
1. Range: Measures the difference between the maximum and minimum
values of the observed variable.
Cont’d
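2. Mean deviation: The average of the absolute deviations of the
observations from the mean value: $MD = \dfrac{\sum |X_i - \bar{X}|}{n}$
3. Variance: The mean of the squared deviations from the mean: $\sigma^2 = \dfrac{\sum (X_i - \bar{X})^2}{n}$
 It measures sample variability (dividing by n - 1 instead of n gives the usual sample estimate).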
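The three measures of dispersion can be sketched as below. The sketch assumes the variance is the mean of squared deviations with divisor n, and the function names and data are illustrative.

```python
def value_range(values):
    """Range: difference between the maximum and minimum values."""
    return max(values) - min(values)


def mean_deviation(values):
    """Average absolute deviation of the observations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def variance(values):
    """Mean of squared deviations from the mean (divisor n).

    Dividing by n - 1 instead would give the usual sample variance.
    """
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5
print(value_range(data), mean_deviation(data), variance(data))  # 7 1.5 4.0
```

The square root of the variance (2.0 for this data) is the standard deviation, which is expressed in the same units as the observations.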
Cont’d
 Measurement of asymmetry (skewness):
 When the distribution of items happens to be perfectly symmetrical and
bell shaped, we have a normal curve and the corresponding distribution is
a normal distribution.
 Such a curve is a perfectly bell-shaped curve, in which case
Mean = Median = Mode.
 Under this condition skewness is altogether absent.
 If the curve is distorted (whether on the right or the left side), we have
an asymmetric distribution, which indicates that there is skewness.
Cont’d
 If the curve is skewed to the right, we call it positive skewness.
[Figure: positively skewed data]
 Z is the mean, M is the median and X is the mode.
 In such a case, Z > M > X.
Cont’d
 But when the curve is skewed toward the left, we call it negative
skewness.
[Figure: negatively skewed data]
 In such a case, Z < M < X, where Z is the mean, M is the median and X is the mode.
Cont’d
 Skewness is thus a measure of asymmetry and shows the
manner in which items are clustered around the average.
 In a symmetric (normal) distribution, items show perfect balance
on either side of the mode, but in a skewed distribution the balance is
shifted to one side.
 The amount by which the balance exceeds on one side measures
the skewness.
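One simple way to quantify the direction of skewness is Pearson's second (median-based) coefficient, 3 × (mean − median) / standard deviation. This particular coefficient is chosen here for illustration and is not named in the slides. The data sets are hypothetical.

```python
import statistics


def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd.

    Positive for right-skewed data (mean above median), negative for
    left-skewed data, and zero when mean and median coincide.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return 3 * (mean - median) / sd


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 10]  # long tail on the right
left_skewed = [1, 8, 9, 9, 10]   # long tail on the left

for name, sample in [("symmetric", symmetric),
                     ("right-skewed", right_skewed),
                     ("left-skewed", left_skewed)]:
    print(name, round(pearson_median_skewness(sample), 3))
```

The sign of the coefficient matches the ordering of mean and median described for positively and negatively skewed curves.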
Cont’d
 Knowledge about the shape of the distribution is crucial for using statistical
measures in research analysis, since most methods make specific
assumptions about the nature of the distribution.
 Data transformation: The process of changing the original form of
the data to a form that is more suitable for the analysis that
will achieve the research objective.
 The researcher often modifies the values of scalar data or even creates
new variables.
 Index numbers: Most of the time, financial information (price,
value of output, interest rate, and exchange rate) is
adjusted for possible price changes by using index numbers (such as
the Consumer Price Index, CPI, and the Producer Price Index, PPI).
Cont’d
 An index number is a number used to measure the level of a
given phenomenon relative to its level at some standard (base) date.
i. Index numbers measure only relative changes.
ii. Different indices serve different purposes.
iii. A commodity index serves as a measure of changes in the phenomenon
for that commodity only.
iv. Some index numbers are used to measure the cost of living (CPI).
v. In the economic sphere they are often termed economic barometers.
Cont’d
 Scores or observations are recalibrated so that they can be
related to a certain base period or base number.
 The index number most commonly used to reduce the influence of price
changes on our observations is the CPI.
 Researchers also use index numbers to make comparisons
between observations.
 When series (data) are expressed in the same units, we can use
averages for the purpose of comparison.
 But when two or more series are expressed in different units,
statistical averages cannot be used to compare them.
 By converting the numbers into index numbers, we can make
comparisons between two or more series.
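Both uses of index numbers can be sketched briefly: expressing a series relative to a base period, and deflating nominal values by a price index. All figures and function names below are hypothetical.

```python
def to_index(series, base_position=0):
    """Express each value relative to the base period (base = 100)."""
    base = series[base_position]
    return [100 * value / base for value in series]


def deflate(nominal_values, price_index):
    """Convert nominal values to base-period prices using a price index
    (such as a CPI) whose base period equals 100."""
    return [100 * value / index for value, index in zip(nominal_values, price_index)]


output_tons = [50, 60, 75]   # hypothetical output series
revenue_birr = [200, 330]    # hypothetical nominal revenue
cpi = [100, 110]             # hypothetical CPI, first period is the base

print(to_index(output_tons))       # [100.0, 120.0, 150.0]
print(deflate(revenue_birr, cpi))  # [200.0, 300.0]
```

Because index values are unit-free, a series in tons and a series in Birr can be compared once both are expressed relative to the same base period.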
7.2.2. Inferential analysis:
 Most researchers wish to go beyond simple tabulation of
frequency distributions and calculation of averages and/or
dispersion.
 They frequently seek to determine relationships
between variables and to test statistical significance.
 When a population consists of more than one variable, it is
possible to measure the relationship between them.
 If we have data on two variables, we are said to have a bivariate
population; if the data cover more than two variables, the population is
known as a multivariate population.
 If for every measure of a variable X we have a corresponding value
of a variable Y, the resulting pairs of values are called a bivariate
population.
Cont’d
 In the case of a bivariate or multivariate population, we often wish to know
the relationship between two or more variables from the data obtained.
 E.g., we may like to know whether the number of hours students devote
to study is somehow related to their family income, age, sex, or
similar other factors.
 There are several methods of determining the relationship between
variables.
 Two questions should be answered to determine the relationship
between variables.
1. Does there exist an association or correlation between the two or more
variables? If yes, to what degree?
 This is answered by the use of correlation techniques, of which
there are several.
Cont’d
 In the case of a bivariate population, correlation can be found using:
i. Cross tabulation
ii. Karl Pearson’s coefficient of correlation: the simple correlation
coefficient, and the most commonly used
iii. Charles Spearman’s coefficient of (rank) correlation
 In the case of a multivariate population, correlation can be studied
through:
i. Coefficient of multiple correlation
ii. Coefficient of partial correlation
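For the bivariate case, Pearson's and Spearman's coefficients can be sketched in plain Python. The data (study hours and scores) are invented, and Spearman's coefficient is computed here as Pearson's coefficient applied to ranks, with tied values given their average rank.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation for paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


hours = [1, 2, 3, 4, 5]       # hypothetical study hours
scores = [1, 4, 9, 16, 25]    # hypothetical scores, rising but not linearly
print(round(pearson_r(hours, scores), 3))
print(round(spearman_rho(hours, scores), 3))
```

For this data the rank correlation is 1 because the scores always rise with hours, while Pearson's coefficient is slightly below 1 because the relationship is not a straight line.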
Cont’d
2. Is there any cause-and-effect (causal) relationship between two
variables, or between one variable on one side and two or more variables
on the other side?
 This question can be answered by the use of regression analysis.
 In regression analysis the researcher tries to estimate or predict the
average value of one variable on the basis of the value of another variable.
 For instance, a researcher may estimate the average statistics score
knowing a student’s score on a mathematics examination.
 There are different techniques of regression.
 In the case of a bivariate population, the cause-and-effect relationship can
be studied through simple regression.
 In the case of a multivariate population, the causal relationship can be
studied through multiple regression analysis.
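Simple regression for the mathematics and statistics example can be sketched with ordinary least squares. The scores below are invented, and the function name `fit_simple_regression` is an assumption for illustration.

```python
def fit_simple_regression(x, y):
    """Least-squares line y = intercept + slope * x for paired data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


math_scores = [50, 60, 70, 80]   # hypothetical mathematics scores
stats_scores = [55, 65, 75, 85]  # hypothetical statistics scores

intercept, slope = fit_simple_regression(math_scores, stats_scores)
predicted = intercept + slope * 90  # estimated statistics score for a math score of 90
print(intercept, slope, predicted)  # 5.0 1.0 95.0
```

The fitted line gives the estimated average statistics score for a given mathematics score; whether the relationship is causal depends on the study design, not on the fit alone.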
Cont’d
 Time series analysis: Successive observations of a given
phenomenon over a period of time are analyzed through time series
analysis.
 It measures the relationship between variables and time (trend).
 Time series analysis measures seasonal, cyclical, and
irregular fluctuations, as well as trend.
 The analysis of time series is done to understand the dynamic
conditions for achieving the short-term and long-term goals of a
business firm, for forecasting purposes.
 The past trend can be used to evaluate the success or failure of
management or of any other policy.
 Based on the past trend, future patterns can be predicted and policy
may be formulated accordingly.
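One common way to bring out the trend in a series is a moving average, which smooths short-term fluctuations. The slides do not name a specific method, so this is only an illustrative choice, and the sales figures are hypothetical.

```python
def moving_average(series, window=3):
    """Average of each run of `window` consecutive observations."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and the series length")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


sales = [10, 12, 14, 13, 15, 17]  # hypothetical observations over six periods
trend = moving_average(sales, window=3)
print(trend)  # [12.0, 13.0, 14.0, 15.0]
```

The smoothed values rise steadily even though the raw series dips in the fourth period, which is the kind of past trend that can then be extended for forecasting.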
End of Chapter 7
