Chapter 4: Data Analysis and Summarization
What is statistics?
It is the science of making inferences based on
sample data.
It includes the collection, analysis, and interpretation
of data.
A population is the aggregate of all arbitrarily
defined sampling units.
It is usually very large.
Example:
Choosing 10 rich households from AA.
Selecting Econometrics over others.
Why sample?
Measuring all units is impractical, if not impossible.
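As a small illustration of why sampling is used, the sketch below draws a simple random sample of 10 units from a made-up population and compares the sample mean with the population mean. The income values, the population size of 1,000, and the seed are all hypothetical and chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: 1,000 household incomes (made-up values).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.uniform(1_000, 10_000) for _ in range(1_000)]

# Simple random sample of 10 units, drawn without replacement.
sample = rng.sample(population, k=10)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean:    {population_mean:.1f}")
print(f"Sample mean (n=10): {sample_mean:.1f}")
```

The sample mean will generally differ somewhat from the population mean; that gap is what inference from sample data has to account for.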
Types of data/variables
Qualitative data
    Nominal categories
        e.g. gender, race, religion
    Ordinal categories
        e.g. strongly disagree, disagree, agree, strongly agree
Quantitative data
    Discrete
    Continuous
Examples:
    Age
    Farm size
    Household income
    Students' marks
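The four types can be sketched in Python as follows. The variable names and values are hypothetical. The point is only that ordinal categories carry an order while nominal ones do not, and that discrete values are counts while continuous values are measurements.

```python
# Nominal: categories with no natural order; only equality is meaningful.
gender = ["female", "male", "female", "female"]

# Ordinal: categories with a natural order, encoded here by list position.
LIKERT_ORDER = ["strongly disagree", "disagree", "agree", "strongly agree"]
responses = ["agree", "strongly disagree", "strongly agree", "disagree"]
ranked = sorted(responses, key=LIKERT_ORDER.index)

# Discrete: countable values (hypothetical number of household members).
household_size = [3, 5, 2, 4]

# Continuous: measured values that can fall anywhere in a range
# (hypothetical farm sizes in hectares).
farm_size_ha = [0.75, 1.2, 2.05, 0.5]

print(ranked)
```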
Types of variables
Dependent variable:
not controlled, but simply measured or registered.
Independent variable:
more or less controlled.
Independent vs. dependent
Independent variable: intentionally manipulated; the cause.
Dependent variable: the effect.
ESTIMATION
We estimate parameters such as
Means
Variances
Correlations
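A minimal sketch of estimating these three quantities from a sample, using Python's standard `statistics` module for the mean and variance and a hand-written Pearson correlation. The paired farm-size and income values are made up for illustration, and `pearson_r` is a hypothetical helper name.

```python
import math
import statistics

# Hypothetical sample: farm size (ha) and household income for 5 households.
farm_size = [0.5, 1.0, 1.5, 2.0, 2.5]
income = [1200, 1900, 2400, 3300, 3700]


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between xs and ys."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print("Mean farm size:", statistics.mean(farm_size))
# statistics.variance divides by n - 1 (sample variance).
print("Variance of farm size:", statistics.variance(farm_size))
print("Correlation:", round(pearson_r(farm_size, income), 3))
```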
Descriptive statistics
Primarily aimed at describing data.
But we are also interested in estimating parameters
such as
Percentage
Frequency
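Frequencies and percentages can be tabulated with a few lines of Python. The sketch below uses `collections.Counter` from the standard library; the survey responses are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses on a four-point agreement scale.
responses = ["agree", "disagree", "agree", "strongly agree",
             "agree", "strongly disagree", "disagree", "agree"]

frequency = Counter(responses)  # count of each category
total = sum(frequency.values())
percentage = {category: 100 * count / total
              for category, count in frequency.items()}

# Print categories from most to least frequent.
for category, count in frequency.most_common():
    print(f"{category:18s} {count:3d} {percentage[category]:6.1f}%")
```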
Measure of central tendency
Mean (average)
Median (middle number)
Mode (most frequently appearing value)
Mean: $\bar{x} = \frac{\sum x}{n}$
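The three measures can be computed with Python's standard `statistics` module. This sketch reuses the six ages from the chapter's worked example; the variable names are illustrative.

```python
import statistics

# Ages from the chapter's worked example.
ages = [19, 21, 21, 24, 30, 20]

mean_age = statistics.mean(ages)      # sum of the values divided by n
median_age = statistics.median(ages)  # middle of the sorted data
mode_age = statistics.mode(ages)      # most frequently occurring value

print(mean_age, median_age, mode_age)
```

With an even number of observations, `statistics.median` averages the two middle values of the sorted data.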
Measure of dispersion/variability
It is a numeric representation of how much data
points differ from one another.
Variance: A measure of dispersion among
individual observations about their average
value
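$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Standard deviation
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Coefficient of variation
$CV = \frac{s}{\bar{x}} \times 100\%$
Standard error of the mean
$SE = \frac{s}{\sqrt{n}}$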
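The dispersion measures above can be written out from first principles. This is a sketch rather than a reference implementation: the function names are hypothetical, and the standard error is taken as s / sqrt(n), the usual definition for the standard error of the mean, which is an assumption here. The ages are those of the chapter's worked example.

```python
import math


def variance(xs):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def std_dev(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(variance(xs))


def coeff_variation(xs):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev(xs) / (sum(xs) / len(xs)) * 100


def std_error(xs):
    """Standard error of the mean, assumed here to be s / sqrt(n)."""
    return std_dev(xs) / math.sqrt(len(xs))


ages = [19, 21, 21, 24, 30, 20]
print("variance:", variance(ages))
print("SD:", round(std_dev(ages), 2))
print("CV (%):", round(coeff_variation(ages), 1))
print("SE:", round(std_error(ages), 2))
```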
Example:
Mean age:
mean = (19 + 21 + 21 + 24 + 30 + 20) / 6
mean = 22.5
Variance:
s^2 = 81.5 / 5
s^2 = 16.3
Standard deviation:
SD = √16.3
SD = 4.04
Coefficient of variation:
CV = 4.04 / 22.5
CV = 0.18 or 18%
df = n - 1 = 5
√n = √6 = 2.45
Standard error of the mean:
SE = SD / √n = 4.04 / 2.45
SE ≈ 1.65
Age (x)    mean    (x - mean)    (x - mean)^2
19         22.5    -3.5          12.25
21         22.5    -1.5           2.25
21         22.5    -1.5           2.25
24         22.5     1.5           2.25
30         22.5     7.5          56.25
20         22.5    -2.5           6.25
Sum                              81.5
N = 6
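The deviation table can be reproduced programmatically as a check on the arithmetic. This is a minimal sketch using only the six ages given in the example; the variable names are illustrative.

```python
ages = [19, 21, 21, 24, 30, 20]
n = len(ages)
mean = sum(ages) / n

# One row per observation: x, mean, deviation, squared deviation.
rows = [(x, mean, x - mean, (x - mean) ** 2) for x in ages]
sum_sq = sum(row[3] for row in rows)

print(f"{'x':>4} {'mean':>6} {'x-mean':>7} {'(x-mean)^2':>11}")
for x, m, d, d2 in rows:
    print(f"{x:>4} {m:>6.1f} {d:>7.1f} {d2:>11.2f}")
print(f"Sum of squared deviations: {sum_sq} (N = {n})")
```

Dividing the sum of squared deviations by n - 1 = 5 gives the sample variance used in the example.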