
Econ 656 - Research Methods V - 2023

The document outlines the process of data processing and analysis in research methodology, emphasizing the importance of data preparation, including editing, coding, and handling missing or inconsistent data. It details various statistical analyses, including univariate, bivariate, and multivariate analyses, explaining how to summarize and interpret data in relation to research questions. Additionally, it discusses the use of software for data analysis and the significance of accurately presenting findings.

Uploaded by

Eyuel Ayele
Copyright
© © All Rights Reserved

Econ 656 - Research Methodology and Seminar in Economics

Part V
Data Processing and Analysis

Content of the Lecture
1. Introduction
2. Data Preparation and Processing
3. Analysis of Data
   3.1 Univariate analysis
   3.2 Bivariate analysis
   3.3 Multivariate analysis

Introduction
 Once data is acquired you will need to use it to help you
address your research questions.
 For the data to be meaningfully used you need to:
 Ensure that the data is complete.
 Know your data - becoming familiar with what you have got.
 Organize your data.
 Analysis is the most rewarding part of your research project.
 There is a sense of relief, excitement and satisfaction that
your work is meaningful.

Introduction
 Analysis is the process of working with the data to describe, discuss,
interpret, evaluate and explain it in terms of the research
questions or hypotheses.
 That is, it involves computing certain indices or measures and
searching for patterns of relationships.
 It ranges from very simple summary statistics to extremely
complex multivariate analyses.

Introduction
 Much quantitative data analysis is conducted using
software programs.
 So, the collected data must be converted into a machine-
readable, numeric format.
 Numerical data can be analyzed quantitatively using statistical
tools in two different ways:
 Descriptive analysis: statistically describing, aggregating, and
presenting the constructs of interest.
 Inferential analysis: the statistical testing of hypotheses
(theory testing).
Data Preparation and Processing

Data Preparation and Processing
 Data processing starts with editing, coding, classifying and
tabulating the collected data.

 Editing is the process of examining the collected raw data to
detect errors and omissions.
 It involves careful scrutiny of the completed
questionnaires to ensure that the data are:
 Accurate
 Consistent with other facts gathered
 Uniformly entered, etc.

Data Preparation and Processing
 Two levels of editing: field and central levels.

 Field-level editing: after an interview, field workers should
review their reporting forms, complete what was
abbreviated, translate personal shorthand, rewrite illegible
entries, and make callbacks if necessary.

 Central editing: takes place when all forms have been completed
and returned to the office.
 Data editors correct obvious errors, such as an entry in the
wrong place or a value recorded in the wrong units.

Data Preparation and Processing
 Checking questionnaires: Identifiers
 Each questionnaire or case needs a unique identifier.

 Sometimes this will be assigned prior to collecting the
data by numbering the questionnaires.
 If this has not been done, then an identifier should be
given to each data source.
 Example: all questionnaires from people in Addis
may start with ‘1’ (101, 102, 103, etc.) and those from
Bahir Dar with ‘2’ (201, 202, 203, etc.).
Data Preparation and Processing
 This will make it easier to sort out where questionnaires
have come from.
 It also allows analysis to be carried out on the two
sets of questionnaires separately.
 This information also enables the researcher to refer
back to the participants (for example, to involve them in
further research).
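
The identifier scheme can be sketched in a few lines of Python. This is only an illustration: the ID values and the `site_of` helper are hypothetical, and it assumes three-digit identifiers whose first digit encodes the site, as in the Addis and Bahir Dar example.

```python
from collections import defaultdict

# Assumed scheme from the example: first digit 1 = Addis, 2 = Bahir Dar.
SITE_BY_PREFIX = {"1": "Addis", "2": "Bahir Dar"}


def site_of(questionnaire_id: int) -> str:
    """Return the site encoded in the first digit of the identifier."""
    return SITE_BY_PREFIX[str(questionnaire_id)[0]]


# Hypothetical identifiers written on the returned questionnaires.
ids = [101, 201, 102, 202, 103, 203]

# Identifiers must be unique, otherwise a case cannot be traced back.
assert len(ids) == len(set(ids)), "duplicate questionnaire identifier"

# Group the questionnaires by site so each set can be analysed separately.
by_site = defaultdict(list)
for qid in ids:
    by_site[site_of(qid)].append(qid)

print(dict(by_site))
```

Encoding the site in the identifier keeps the two sets separable without adding a second column, although a separate site variable would work equally well.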

Data Preparation and Processing
 What to do with partial responses – missing responses
 If a questionnaire is only partially completed there may
be a number of reasons for this.

 Maybe the length of the questionnaire deterred your
participants from completing it, or
 they did not wish to answer a particular question or
section.
 For example, a participant may find the topic sensitive and
decide not to complete the questionnaire.
Data Preparation and Processing
 What we do with the data depends on the number of missing
cases and the possible reasons for incompletion.
 You will need to decide whether to reject the incomplete
questionnaires or whether to include the partial
information.

 You must state clearly which choice you made when writing up and
discussing your findings.

Data Preparation and Processing
 Inconsistent data
 Sometimes you will find that the information given by a
respondent within a questionnaire is inconsistent.

 This can be the case with both factual and value data.

 Example: a participant may give his/her year of birth
as 1990 but also record that he/she has children born
in 2000.

Data Preparation and Processing
 This type of inconsistency could have occurred for a number of
reasons.
 Either (or both) of the dates given may be incorrect or
may have been misread or misheard, or
 perhaps the participant has misunderstood the question
and recorded the brothers and sisters who live with him
as his children, or is referring to his stepchildren, etc.
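
A consistency check of this kind can be automated once the data are entered. The sketch below is a hedged illustration: the records, the field names and the `MIN_PARENT_AGE` threshold are all assumptions made for the example, not part of the lecture. It only flags cases for review; deciding which value is wrong still requires checking other answers or contacting the participant.

```python
# Hypothetical records: respondent's year of birth and children's years of birth.
records = [
    {"id": 101, "birth_year": 1990, "children_birth_years": [2000]},
    {"id": 102, "birth_year": 1975, "children_birth_years": [1999, 2003]},
    {"id": 201, "birth_year": 1988, "children_birth_years": []},
]

MIN_PARENT_AGE = 15  # assumed plausibility threshold, chosen for illustration


def inconsistent_ids(rows, min_age=MIN_PARENT_AGE):
    """Return IDs where the implied age at a child's birth is below min_age."""
    flagged = []
    for row in rows:
        ages = [year - row["birth_year"] for year in row["children_birth_years"]]
        if ages and min(ages) < min_age:
            flagged.append(row["id"])
    return flagged


print(inconsistent_ids(records))
```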

Data Preparation and Processing
 As with missing information, you will need to consider:
 whether the data can be checked in some way (by referring to other
questions or by contacting the participant); and
 whether the data is usable in your analysis.
 The default way of handling missing values in some
software programs is to drop the entire observation
if it contains even a single missing value.
 But such deletion can significantly shrink the sample size
and make it extremely difficult to detect small effects.
Data Preparation and Processing
 Some software programs allow the option of replacing
missing values with an estimated value via a process called
imputation.
 For instance, the imputed value could be the average of
the respondent’s responses to remaining items.

 But such imputation can be biased if the values are missing
systematically rather than at random.
 Other procedures, such as multiple imputation, are also available.
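
The two simplest options, dropping incomplete cases and replacing a missing item with the respondent's own average, can be compared in a short Python sketch. The response data and the function names are hypothetical, and the sketch assumes every respondent answered at least one item.

```python
from statistics import mean

# Hypothetical responses to four items; None marks a missing answer.
responses = {
    101: [4, 5, None, 4],
    102: [2, 3, 3, 2],
    201: [None, 1, 2, None],
}


def listwise_delete(data):
    """Keep only respondents with no missing item."""
    return {rid: items for rid, items in data.items() if None not in items}


def impute_person_mean(items):
    """Replace each missing item with the mean of the respondent's answered items."""
    answered = [v for v in items if v is not None]
    fill = mean(answered)  # assumes at least one answered item
    return [fill if v is None else v for v in items]


complete_cases = listwise_delete(responses)
imputed = {rid: impute_person_mean(items) for rid, items in responses.items()}

print(len(responses), "cases before deletion,", len(complete_cases), "after")
print(imputed[201])
```

With these made-up data, deletion keeps one case out of three, which shows how quickly the sample can shrink.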

Data Preparation and Processing
 Coding: Many data collection instruments include open
questions.
 i.e., questions that do not have a preset range of
answers.
 In order to be able to work with this data using statistical
analysis the data from open questions need to be coded.
 Coding refers to the process of assigning numerals to
answers so that responses can be put into a limited number
of categories or classes, which are recorded on a coding sheet.

Data Preparation and Processing
 It is the process of converting data into numeric format.
 This enables you to enter the data quickly using the
numeric keypad on your keyboard and with fewer
errors.
 Coding is especially important for large complex studies
involving many variables and measurement items.

 A codebook, a comprehensive document containing a detailed
description of each variable, should be created.

Data Preparation and Processing
 The coding must be:
 Exhaustive: there must be a class for every data item.
 Mutually exclusive: a specific answer can be placed
in one and only one cell in a given category set.
 Multiple codes
 Some questions can ask for more than one answer.
 In this case there is more than one variable attached to
the question.

Example
You can consider each of the listed foods as a variable
and code each variable as 1 if it is ticked, 2 if it is not
ticked.
You could then count how many people eat, for example,
cereal more than twice a week.
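
The multiple-code scheme in this example can be sketched as follows. The list of foods and the ticked answers are hypothetical; only the coding rule (1 = ticked, 2 = not ticked) comes from the slide.

```python
FOODS = ["cereal", "bread", "fruit"]  # hypothetical foods listed in the question

# Hypothetical raw answers: the set of foods each respondent ticked.
ticked = {
    101: {"cereal", "fruit"},
    102: {"bread"},
    201: {"cereal", "bread", "fruit"},
}


def code_row(ticked_foods):
    """Code every food as its own variable: 1 = ticked, 2 = not ticked."""
    return {food: 1 if food in ticked_foods else 2 for food in FOODS}


coded = {rid: code_row(foods) for rid, foods in ticked.items()}

# Count how many respondents ticked cereal.
cereal_count = sum(1 for row in coded.values() if row["cereal"] == 1)

print(coded[102])
print("cereal ticked by", cereal_count, "respondents")
```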

Data Preparation and Processing
 Data entry: Coded data can be entered into a spreadsheet,
database, text file, or directly into a statistical program like
Stata or SPSS.

 Each observation can be entered as one row in the
spreadsheet and each measurement item can be represented
as one column.
 The entered data should be checked for accuracy, via
occasional spot checks on a set of items or observations,
during and after entry.

Analysis of quantitative data

Analysis of quantitative data
 Analysis is a process of summarizing, describing and
explaining the data in terms of the research questions or
hypothesis.

 So, analysis of data is more than simply summarizing and
tabulating the data that has been collected.
 As a researcher, you must act as an intermediary between
the data you have gathered and the people who will be
interested in what you have found out.

Analysis of quantitative data: Univariate analysis
 With respect to the number of variables three types of statistical
analysis could be considered:
 Univariate analysis: only one variable
 Bivariate analysis: two variables
 Multivariate analysis: more than two variables
 Univariate analysis refers to a set of statistical techniques that
can describe the general properties of one variable.

 Univariate statistics include: (1) frequency distribution, (2)
central tendency, and (3) dispersion.
 Examples: means, medians and modes, variances, and
percentiles.
Analysis of quantitative data: Univariate analysis
 Whatever statistical analysis you have in mind, you are likely
to begin by producing some frequency tables.
 These show, for each variable, how many times each answer or
code has been given.
 You can then take a look at the way in which the answers to
your questions are distributed, and
 identify potentially interesting distributions which you may
wish to explore further.
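
A frequency table for one coded variable can be produced with the standard library alone. The answers below are hypothetical codes (for example, 1 = never up to 4 = weekly); the coding is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical coded answers to a single question.
answers = [1, 3, 2, 3, 4, 3, 1, 2, 3, 4]

counts = Counter(answers)  # how many times each code was given
n = len(answers)

print("code  freq  percent")
for code in sorted(counts):
    print(f"{code:>4}  {counts[code]:>4}  {100 * counts[code] / n:>6.1f}")
```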

Analysis of quantitative data: Univariate analysis
 The distribution or the ‘shape’ of your data, can also be
depicted in the form of graphs and charts.

 Bar charts and histograms can help you to visualize the shape
or distribution of the values for each of your variables.
 Graphs are effective ways of summarizing your data and
helping you to identify interesting or anomalous features
within the data.
 They help you to begin to explore relationships between
variables.
Example: Frequency distribution of religiosity
How many times a sample of respondents attend religious services

Analysis of quantitative data: Univariate analysis
 Measures of central tendency: a value typical for the data
 The mean, median and mode are methods of
summarizing the data relating to one variable.

 Measures of dispersion: measure the amount of variation in
the data
 similar measures of central tendency may come from
very different distributions
 Range, variance and standard deviation
 The variance is the average squared deviation from the
mean; the standard deviation is its square root
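
The point that similar central tendency can hide very different dispersion is easy to see numerically. The two samples below are hypothetical and were chosen to share the same mean and median; `describe` is an illustrative helper built on Python's `statistics` module, which computes the sample variance with an n - 1 denominator.

```python
import statistics as st

# Two hypothetical samples with the same centre but different spread.
a = [48, 49, 50, 51, 52]
b = [10, 30, 50, 70, 90]


def describe(values):
    """Central tendency and dispersion for one variable."""
    return {
        "mean": st.mean(values),
        "median": st.median(values),
        "range": max(values) - min(values),
        "variance": st.variance(values),  # sample variance (n - 1 denominator)
        "sd": st.stdev(values),
    }


for name, sample in (("a", a), ("b", b)):
    print(name, describe(sample))

# The mode is most useful for coded (categorical) answers.
print(st.mode([1, 3, 2, 3, 4, 3]))
```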
Analysis of quantitative data: Bivariate analysis
 Bivariate Analysis is the analysis of two variables to
examine if they are correlated
 Bivariate analysis examines how two variables are
related to each other.

 Correlation can be shown by:
 Scatter plot/diagram: the values of the two variables are
plotted on the X and Y axes
 strong relationships can be identified from scatter
diagrams

Scatter plot of a positive association
Income and livestock ownership
[Figure: scatter plot with income on the x-axis (0 to 1200) and livestock on the y-axis (0 to 60)]

Scatter plot of a negative association
Income and illiteracy rates (%)
[Figure: scatter plot with income on the x-axis (0 to 1200) and rate of illiteracy in % on the y-axis (0 to 100)]

Scatter plot of no association
Income and household size
[Figure: scatter plot with income on the x-axis (0 to 1200) and household size on the y-axis (0 to 12)]

Analysis of quantitative data: Bivariate analysis
 Correlation analysis: the most common bivariate statistic is the
bivariate correlation, which is a number between -1 and +1.
 The correlation coefficient is a numerical value reflecting the strength
and direction of the relationship.
[Figure: scatter plane divided into quadrants I to IV by the lines at mean x and mean y]
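
The correlation coefficient can be computed directly from deviations about the two means, which is what the quadrant picture illustrates. The sketch below assumes the Pearson coefficient is meant; the `pearson_r` helper and all data values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from deviations about the two means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Points in quadrants I and III contribute positive products,
    # points in quadrants II and IV contribute negative products.
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


income = [200, 400, 600, 800, 1000]   # hypothetical data
livestock = [10, 18, 33, 41, 48]      # hypothetical, rises with income
illiteracy = [80, 65, 50, 35, 20]     # hypothetical, falls linearly with income

print(round(pearson_r(income, livestock), 3))
print(round(pearson_r(income, illiteracy), 3))
```

A coefficient close to +1 or -1 indicates a strong linear relationship; a value near 0 indicates little or no linear relationship.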
Analysis of quantitative data: Multivariate analysis
 Multivariate analysis: examines the relationship between three or
more variables
 Some of the relationships identified in bivariate analysis can
be spurious: the variables appear related although there is no real
relationship, for example because both depend on a third variable
 Analysis should control for the effects of additional variables
 Multiple regression analysis (econometrics) controls for
all important variables on which data are available
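
One simple way to see what controlling for a third variable does is the first-order partial correlation, used here as a stand-in for a full multiple regression. The data are hypothetical and constructed so that x and y are related only through z; the helper names are assumptions made for this sketch.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def partial_r(x, y, z):
    """Correlation between x and y after controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical data: x and y each equal z plus unrelated noise.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 2, 5, 6, 5, 6, 9]
y = [2, 3, 2, 3, 4, 5, 8, 9]

print("bivariate r(x, y):", round(pearson_r(x, y), 2))
print("partial r(x, y | z):", round(abs(partial_r(x, y, z)), 2))
```

In this constructed case the strong bivariate correlation disappears once z is held constant, which is the pattern a spurious relationship produces.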

Analysis of quantitative data: Multivariate analysis
 General Linear Model: Most statistical procedures are derived
from a general family of statistical models called the general
linear model (GLM).
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$$
$$Y = \beta_0 + \sum_{i=1}^{n} \beta_i X_i + \varepsilon$$
where Y is the dependent variable and X1, ..., Xn are the independent variables
βi = coefficient parameters to be estimated; the increase/decrease in Y
when Xi changes by one unit (controlling for other factors)
ε = random error term; the difference between the actual value of Y
and the value given by the systematic part of the model; estimation
relies on assumptions about ε
Two-variable linear model

Analysis of quantitative data: Multivariate analysis
 How are the parameters (βi) estimated?
 The most widely used method is ordinary least squares (OLS)
 In the least squares method, the sum of squared differences
between the actual values of Y and the fitted values from the
regression (the residuals) is minimised
 Other estimation methods are also available (MLE, GMM,
etc.)
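
For the two-variable model the OLS estimates have a simple closed form, sketched below in plain Python. The data are hypothetical and the function names are illustrative; in practice a statistical package such as Stata or SPSS would be used.

```python
def ols_simple(x, y):
    """Closed-form OLS estimates (b0, b1) for Y = b0 + b1*X + error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_sq_resid(x, y, b0, b1):
    """Sum of squared differences between actual and fitted Y."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_simple(x, y)
print("intercept:", round(b0, 3), "slope:", round(b1, 3))

# Any other line fits worse in the least squares sense.
print(sum_sq_resid(x, y, b0, b1) < sum_sq_resid(x, y, b0, b1 + 0.1))
```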

Analysis of quantitative data: Multivariate analysis
 Various tests can be conducted.
 Overall test (F-test): the null hypothesis for the overall test
is that all the slope coefficients of the regression are zero (no
explanatory power)
Ho: β1 = β2 = β3 = … = βn = 0
 Test for a single variable (t-test): does a particular
independent variable add significantly to the explanation?
Ho: βi = 0
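
The two test statistics can be computed by hand for a two-variable regression, as in the hedged sketch below. The data are hypothetical, and only the statistics are computed: judging significance requires critical values or p-values from statistical tables or a statistics package, which are not reproduced here. With a single regressor the F statistic equals the square of the t statistic.

```python
from math import sqrt

# Hypothetical data for Y = b0 + b1*X + error.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, k = len(x), 1  # k = number of slope coefficients

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
b0 = mean_y - b1 * mean_x

rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
tss = sum((yi - mean_y) ** 2 for yi in y)                      # total SS
ess = tss - rss                                                # explained SS

resid_var = rss / (n - k - 1)    # estimated error variance
se_b1 = sqrt(resid_var / sxx)    # standard error of the slope

t_stat = b1 / se_b1              # tests Ho: b1 = 0
f_stat = (ess / k) / resid_var   # tests Ho: all slope coefficients = 0

print("t statistic:", round(t_stat, 2))
print("F statistic:", round(f_stat, 2))
```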

Analysis of quantitative data: Multivariate analysis
 Several econometric problems can also be expected.
 Sample Selectivity
 Misspecification
 Omitted Variables
 Fixed Effects
 Endogenous Variables
 Appropriate tests and remedial measures need to be
considered for these problems.
