Chapter 4: Data Analysis and Summarization



What is statistics?
It is the science of making inferences based on sample data.
It includes the collection, analysis, and interpretation of data.
A population is the aggregate of all arbitrarily defined sample units.
It is usually very large.

A sample is any subset (collection) of units from a population of units.

Example:
Choosing 10 rich households from AA.
Selecting Econometrics over others.

Why sample?
Measuring all units is impractical, if not impossible.

Sampling saves money.

Sampling saves time.


A parameter is a characteristic of the population: a constant that describes the population as a whole.
A statistic is a characteristic of the sample: the sample counterpart of a parameter, computed from the sample data.
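
To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population of household incomes is made up (the values, the population size of 1,000, and all variable names are assumptions for illustration, not data from this chapter). It compares the population mean, a parameter, with the mean of a random sample of 10 units, a statistic.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: incomes of 1,000 households (made-up values).
population = [random.randint(1_000, 20_000) for _ in range(1000)]

# Parameter: a constant that describes the whole population.
population_mean = statistics.mean(population)

# Statistic: the same quantity computed from a sample of 10 units.
sample = random.sample(population, 10)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.1f}")
print(f"sample mean (statistic):     {sample_mean:.1f}")
```

Drawing a different sample would generally give a different sample mean, while the population mean stays fixed.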

Types of data/variables

Qualitative data
  Nominal categories, e.g. gender, race, religion
  Ordinal categories, e.g. strongly disagree, disagree, agree, strongly agree

Quantitative data
  Discrete
  Continuous

A variable is a characteristic that may vary from one sample unit to the next.

Examples:

Age
Farm size
Household income
Student's mark

Types of variables
Dependent variable:
not controlled, but simply measured or registered.

Independent variables:
more or less controlled by the researcher.

Independent                  vs.   Dependent
Intentionally manipulated          Intentionally left alone
Vary at a known rate               Vary at an unknown rate
Cause                              Effect

Example: What affects the demand for a bond?

Independent variables:
Income
Family size
Farm size
Livestock owned
Access to education

Dependent variable: demand for a bond

ESTIMATION
We estimate parameters such as
Means
Variances
Correlations

However, if the primary goal is to draw a conclusion about a state of nature or the result of an experiment, we are interested in statistical testing.

Descriptive statistics
Primarily aimed at describing data.
But we are also interested in estimating parameters
such as
Percentage
Frequency
Measures of central tendency
Mean (average)
Median (middle number)
Mode (most frequently appearing value)

Mean: $\bar{x} = \frac{\sum x}{n}$
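
As a quick illustration of the three measures of central tendency, the sketch below applies Python's standard `statistics` module to the six ages used in this chapter's worked example. The variable names are chosen here for illustration only.

```python
import statistics

ages = [19, 21, 21, 24, 30, 20]  # ages from this chapter's worked example

mean_age = statistics.mean(ages)      # sum of the values divided by n
median_age = statistics.median(ages)  # middle value of the sorted data
mode_age = statistics.mode(ages)      # most frequently appearing value

print(mean_age, median_age, mode_age)  # prints: 22.5 21.0 21
```

With an even number of observations, the median is the average of the two middle values of the sorted data (here 21 and 21).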

Measure of dispersion/variability
It is a numeric representation of how much data
points differ from one another.
Variance: A measure of dispersion among
individual observations about their average
value
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Standard deviation
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
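
The variance and standard deviation formulas can be sketched step by step in Python. This is a minimal illustration using the six ages from this chapter's worked example; the variable names are assumptions made here, and the result is cross-checked against the standard library's `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

ages = [19, 21, 21, 24, 30, 20]  # ages from this chapter's worked example

n = len(ages)
mean_age = sum(ages) / n

# Sum of squared deviations of each observation from the mean.
sum_sq = sum((x - mean_age) ** 2 for x in ages)

variance = sum_sq / (n - 1)  # sample variance, n - 1 degrees of freedom
sd = math.sqrt(variance)     # standard deviation is the square root of the variance

print(sum_sq, round(variance, 2), round(sd, 2))  # prints: 81.5 16.3 4.04

# Cross-check against the standard library.
assert math.isclose(variance, statistics.variance(ages))
```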

Range: max - min

Coefficient of variation: it permits a comparison of relative variability about means of different sizes
$CV = \frac{s}{\bar{x}} \times 100\%$
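
A short Python sketch of the range and the coefficient of variation, again using the six ages from this chapter's worked example (variable names are illustrative assumptions):

```python
import statistics

ages = [19, 21, 21, 24, 30, 20]  # ages from this chapter's worked example

data_range = max(ages) - min(ages)  # range: max - min

s = statistics.stdev(ages)          # sample standard deviation (n - 1 divisor)
mean_age = statistics.mean(ages)
cv = s / mean_age * 100             # coefficient of variation, in percent

print(data_range, round(cv, 1))  # prints: 11 17.9
```

Because the CV is expressed relative to the mean, it is unit-free, which is what allows variability to be compared across data sets whose means differ in size.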

Standard error of the mean

It is a measure of variation among sample means calculated from the same population.

It is used to determine required sample sizes for a sampling effort.

$SE = \frac{s}{\sqrt{n}}$
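
The standard error of the mean can be sketched in Python as the sample standard deviation divided by the square root of the sample size. The helper name `standard_error` is hypothetical, introduced here for illustration, and the data are the six ages from this chapter's worked example.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: s / sqrt(n), with s using the n - 1 divisor."""
    return statistics.stdev(values) / math.sqrt(len(values))


ages = [19, 21, 21, 24, 30, 20]  # ages from this chapter's worked example

se = standard_error(ages)
print(round(se, 2))  # prints: 1.65
```

Because n enters through a square root, quadrupling the sample size only halves the standard error if s stays the same. This is why the SE is useful when deciding how large a sample needs to be.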

Example:

1. Mean age:
   mean = (19 + 21 + 21 + 24 + 30 + 20) / 6
   mean = 22.5
2. Variance:
   s^2 = 81.5 / 5
   s^2 = 16.3
3. Standard deviation:
   SD = sqrt(16.3)
   SD = 4.04
4. Coefficient of variation:
   CV = 4.04 / 22.5
   CV = 0.18 or 18%

df = n - 1 = 5
sqrt(n) = 2.45

Standard error of the mean:
SE = 4.04 / 2.45
SE = 1.65

x     mean    (x - mean)    (x - mean)^2
19    22.5    -3.5          12.25
21    22.5    -1.5           2.25
21    22.5    -1.5           2.25
24    22.5     1.5           2.25
30    22.5     7.5          56.25
20    22.5    -2.5           6.25
Sum                         81.5

N = 6
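
The deviation table can be rebuilt programmatically as a check on the hand calculation. The following is a minimal Python sketch; the column layout and variable names are choices made here for illustration.

```python
ages = [19, 21, 21, 24, 30, 20]  # the six observations (N = 6)

mean_age = sum(ages) / len(ages)

# One row per observation: x, mean, deviation, squared deviation.
rows = [(x, mean_age, x - mean_age, (x - mean_age) ** 2) for x in ages]

print(f"{'x':>4} {'mean':>6} {'x - mean':>10} {'(x - mean)^2':>14}")
for x, m, dev, dev_sq in rows:
    print(f"{x:>4} {m:>6} {dev:>10} {dev_sq:>14}")

# The last column sums to the numerator of the variance formula.
total = sum(dev_sq for _, _, _, dev_sq in rows)
print(f"{'Sum':>4} {total:>32}")
```

The deviations themselves sum to zero, which is a handy check that the mean was computed correctly before squaring.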
