Descriptive Statistics: Frequency Distributions and Related Statistics
Continuous: (0-100], (100-200], (200-300], (300-500], (500-1000]
Continuous frequency distribution
• Class interval length
$\Delta_i = x_i'' - x_i'$, the difference between the upper limit $x_i''$ and the lower limit $x_i'$ of class $i$
• Usually same-length classes are used
• Constructed based on socio-economic context
Continuous frequency distribution
• Classes of different lengths are used
– if a classification system is used,
– if there are few observations in the outlying intervals,
– if empty intervals appear.
Continuous frequency distribution
• Interval density: the ratio of the interval frequency to the interval length ($f_i / \Delta_i$)
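Interval density matters most when classes have different lengths, because raw frequencies are then not directly comparable. A minimal Python sketch, assuming the class limits from the example intervals and purely hypothetical frequencies:

```python
# Class limits follow the example intervals in the text;
# the frequencies are hypothetical, made up for illustration.
lower = [0, 100, 200, 300, 500]
upper = [100, 200, 300, 500, 1000]
freq = [10, 25, 30, 20, 15]

# Interval length: upper limit minus lower limit.
lengths = [u - l for l, u in zip(lower, upper)]

# Interval density: frequency divided by interval length.
density = [f / d for f, d in zip(freq, lengths)]

for l, u, f, d in zip(lower, upper, freq, density):
    print(f"({l}-{u}]: frequency {f}, density {d:.3f}")
```

In this made-up data the last class holds 15 observations but is five times longer than the first, so its density is much lower than that of the first class with 10 observations.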
Determining the length of intervals
• Sturges’ formula (n ≤ 100)
$\Delta = \frac{x_{\max} - x_{\min}}{1 + 3.2 \lg n}$
• Brex formula (n > 100)
$\Delta = \frac{x_{\max} - x_{\min}}{5 \lg n}$
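The two rules can be sketched in Python as follows. The function name `interval_length` is hypothetical, and the constant 3.2 is kept exactly as written in the text (Sturges' rule is more commonly stated with 3.322, since the number of classes is 1 + log2 n):

```python
import math

def interval_length(x_min, x_max, n):
    """Class interval length from the two formulas in the text."""
    if n <= 100:
        # Sturges' formula, with the constant 3.2 as given in the text
        classes = 1 + 3.2 * math.log10(n)
    else:
        # The formula the text calls "Brex"
        classes = 5 * math.log10(n)
    return (x_max - x_min) / classes

# Hypothetical data range of 0 to 1000
print(round(interval_length(0, 1000, 100), 2))
print(round(interval_length(0, 1000, 1000), 2))
```

In practice the result would usually be rounded to a convenient length before the class limits are set.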
Cumulative frequency distribution
• Used to analyse the total frequency up to and including a given class.
• The cumulative frequency of a class is the number of observations with a value in that class or lower.
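As an illustration, cumulative frequencies are running totals of the class frequencies. The sketch below uses a small discrete distribution (values 3, 5, 8 with frequencies 1, 2, 1); the relative cumulative frequencies are an extra added here for convenience:

```python
from itertools import accumulate

values = [3, 5, 8]   # classes in ascending order
freq = [1, 2, 1]     # frequency of each class

# Running total: observations with this value or a lower one.
cumulative = list(accumulate(freq))

# Share of all observations up to and including each class.
total = sum(freq)
relative = [c / total for c in cumulative]

for x, c, r in zip(values, cumulative, relative):
    print(f"x <= {x}: {c} observations ({r:.0%})")
```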
Charts
• Polygon
– Used for discrete frequency distributions
– Horizontal axis: classes; vertical axis: frequencies
• Histogram
– Used for continuous frequency distributions
– Horizontal axis: segments corresponding to the class lengths; vertical axis: frequencies
• Cumulative curve (cumulate)
– Used for cumulative frequency distributions
– Horizontal axis: values of discrete or continuous classes; vertical axis: accumulated frequencies
[Figure: polygon (scatter with lines)]
[Figure: histogram]
Analysis of a discrete frequency distribution
Mean statistics
• Objective
• Abstract
• Describe the phenomenon as a whole
Mean statistics
Measures of central tendency: summary means and structural (location) means
• Simple arithmetic mean
$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$
• Weighted arithmetic mean
$\bar{x} = \frac{\sum_i x_i f_i}{\sum_i f_i}$
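Arithmetic mean
Ungrouped data, $x_i$: 3, 5, 5, 8
$\bar{x} = \frac{3 + 5 + 5 + 8}{4} = 5.25$
Grouped data:
$x_i$   $f_i$
3       1
5       2
8       1
$\bar{x} = \frac{3 + 5 \cdot 2 + 8}{4} = 5.25$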
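The example with the observations 3, 5, 5, 8 can be checked with a short Python sketch. The function names are hypothetical; the point is that the simple mean of the raw observations and the weighted mean of the grouped data give the same result:

```python
def simple_mean(values):
    """Sum of the observations divided by their count."""
    return sum(values) / len(values)

def weighted_mean(values, freq):
    """Sum of value * frequency divided by the sum of frequencies."""
    return sum(x * f for x, f in zip(values, freq)) / sum(freq)

raw = [3, 5, 5, 8]       # ungrouped observations
grouped_x = [3, 5, 8]    # distinct values
grouped_f = [1, 2, 1]    # their frequencies

print(simple_mean(raw))                     # 5.25
print(weighted_mean(grouped_x, grouped_f))  # 5.25
```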
Structural means
• Median
• Quantiles
• Mode
Median
• The value (or class) that splits an ordered frequency distribution (ascending or descending) into two equal parts by frequency.
$x_i$: 1, 2, 2, 3, 7, 8, 9, so Me = 3
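A minimal Python sketch of the median for ungrouped data. Averaging the two middle values when the number of observations is even is a common convention assumed here, not something the text specifies:

```python
def median(values):
    """Middle value of the ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle observation
        return ordered[mid]
    # Even count (assumed convention): mean of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 2, 3, 7, 8, 9]))  # 3
```

The standard library function `statistics.median` follows the same convention.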
Mode
• The most frequent observation or class.
• x with the largest f
• For continuous frequency distributions:
$Mo = x_0 + \Delta_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$
Mode
• One mode — monomodal
• Two modes — bimodal
• Three or more modes — multimodal
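For discrete data, the modes and the resulting classification can be found by counting frequencies. A sketch with hypothetical function names:

```python
from collections import Counter

def modes(values):
    """All values that share the largest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(x for x, f in counts.items() if f == top)

def modality(values):
    """Classify a distribution by its number of modes."""
    k = len(modes(values))
    if k == 1:
        return "monomodal"
    if k == 2:
        return "bimodal"
    return "multimodal"

print(modes([3, 5, 5, 8]), modality([3, 5, 5, 8]))
print(modes([1, 2, 2, 3, 3]), modality([1, 2, 2, 3, 3]))
```

Note that under this simple rule a data set in which every value occurs once is reported as multimodal, which may not be the intended reading in practice.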
Variance statistics
How much the observed values differ from the average, and how significant these differences are.
Variance statistics
• Variation range
• Variance and standard deviation
• Variance coefficient
Variation range
• Difference between the largest and smallest observation
$R_v = x_{\max} - x_{\min}$
• Takes rare, extreme values into account!
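The sensitivity to extreme values is easy to see in a small Python sketch. The observation 100 is a hypothetical outlier added for illustration:

```python
def variation_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

data = [3, 5, 5, 8]
with_outlier = data + [100]  # one hypothetical extreme value

print(variation_range(data))          # 5
print(variation_range(with_outlier))  # 97
```

A single extreme observation changes the range from 5 to 97, even though the other four values are unchanged.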
Variance and standard deviation
• Variance: the mean squared deviation from the arithmetic mean, in squared units
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}$
• Weighted variance
$\sigma^2 = \frac{\sum_i (x_i - \bar{x})^2 f_i}{\sum_i f_i}$
Variance and standard deviation
Ungrouped data:
$x_i$   $(x_i - \bar{x})^2$
3       $(3 - 5.25)^2$
5       $(5 - 5.25)^2$
5       $(5 - 5.25)^2$
8       $(8 - 5.25)^2$
Grouped data:
$x_i$   $f_i$   $(x_i - \bar{x})^2 f_i$
3       1       $(3 - 5.25)^2$
5       2       $(5 - 5.25)^2 \cdot 2$
8       1       $(8 - 5.25)^2$
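The squared deviations in the table can be summed and averaged with a short Python sketch. The function names are hypothetical; both versions divide by the number of observations, as in the formulas of this section:

```python
def variance(values):
    """Mean squared deviation from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def weighted_variance(values, freq):
    """Same quantity computed from a frequency table."""
    total = sum(freq)
    mean = sum(x * f for x, f in zip(values, freq)) / total
    return sum((x - mean) ** 2 * f for x, f in zip(values, freq)) / total

print(variance([3, 5, 5, 8]))                   # 3.1875
print(weighted_variance([3, 5, 8], [1, 2, 1]))  # 3.1875
```

Both calls give 12.75 / 4 = 3.1875, since the grouped table describes the same four observations.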
Variance and standard deviation
• Standard deviation: the square root of the variance, that is, the root mean squared deviation from the arithmetic mean, in the same units as $x_i$
$\sigma = \sqrt{\sigma^2}$
Variance application
• Dispersion comparison
• Inequality analysis
• Convergence analysis
• What is “normal”?
• etc.
Variance coefficient
• Relative level of variance (in percent)
$V = \frac{\sigma}{\bar{x}} \cdot 100$
• Allows comparison of different objects measured in different units
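As a sketch, the coefficient can be computed as the standard deviation divided by the arithmetic mean, times 100. The example assumes the population standard deviation (division by N), which `statistics.pstdev` provides; the function name is hypothetical:

```python
import statistics

def variation_coefficient(values):
    """Standard deviation as a percentage of the arithmetic mean."""
    sigma = statistics.pstdev(values)  # population standard deviation
    mean = statistics.fmean(values)
    return sigma / mean * 100

data = [3, 5, 5, 8]
v = variation_coefficient(data)
print(f"V = {v:.1f} %")
```

Because the result is a percentage, it can be compared across variables recorded in different units, for example income in euros and age in years.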
Skewness and kurtosis statistics
Skewness
• Skewness statistics characterise how far the distribution departs from symmetry about the arithmetic mean
Structural skewness
• An approximate statistic of skewness
$A = \frac{\bar{x} - Me}{\bar{x} - Mo}$
• A = 0 → symmetric
• A < 3 → asymmetric
Skewness coefficient
• A more precise skewness statistic
$K_3 = \frac{m_3}{\sigma^3}$
• K3 > 0 → positive skew
• Kurtosis (excess)
$E = \frac{m_4}{\sigma^4} - 3$
• E > 0 → more sharply peaked than the normal distribution
• For a normal distribution, E = 0
Central moments
$m_k = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^k f_i}{\sum_{i=1}^{n} f_i}$
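The central moments feed directly into the skewness and kurtosis coefficients. A minimal Python sketch, using the small frequency table (values 3, 5, 8 with frequencies 1, 2, 1) and a hypothetical helper `central_moment`:

```python
def central_moment(values, freq, k):
    """k-th central moment of a frequency distribution."""
    total = sum(freq)
    mean = sum(x * f for x, f in zip(values, freq)) / total
    return sum((x - mean) ** k * f for x, f in zip(values, freq)) / total

x = [3, 5, 8]
f = [1, 2, 1]

m2 = central_moment(x, f, 2)  # the variance
m3 = central_moment(x, f, 3)
m4 = central_moment(x, f, 4)
sigma = m2 ** 0.5

skewness = m3 / sigma ** 3      # K3
kurtosis = m4 / sigma ** 4 - 3  # E

print(f"K3 = {skewness:.3f}, E = {kurtosis:.3f}")
```

For this data K3 is positive (the long tail is on the right, toward 8) and E is negative (flatter than a normal distribution). With only four observations the values are illustrative only.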
Analysis of a continuous frequency distribution
Arithmetic mean
• Simple arithmetic mean
$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$
• Weighted arithmetic mean
$\bar{x} = \frac{\sum_i x_i f_i}{\sum_i f_i}$
Median
For continuous frequency distributions:
$Me = x_0 + \Delta_{Me} \cdot \frac{\frac{1}{2}\sum_{i=1}^{n} f_i - \sum_{i=1}^{Me-1} f_i}{f_{Me}}$
$x_0$ - start of the median interval
$\Delta_{Me}$ - length of the median interval
$\sum_{i=1}^{n} f_i$ - total number of observations
$\sum_{i=1}^{Me-1} f_i$ - number of observations before the median interval
$f_{Me}$ - number of observations in the median interval
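The interpolation can be sketched in Python as follows. The median interval is taken to be the first class whose cumulative frequency reaches half of the total; the function name and the frequencies are hypothetical:

```python
def grouped_median(lower, upper, freq):
    """Median of a continuous frequency distribution by interpolation."""
    total = sum(freq)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    half = total / 2
    before = 0  # observations before the current interval
    for lo, up, f in zip(lower, upper, freq):
        if before + f >= half:
            # lo: start of the median interval, (up - lo): its length
            return lo + (up - lo) * (half - before) / f
        before += f

# Class limits follow the example intervals; frequencies are hypothetical.
lower = [0, 100, 200, 300, 500]
upper = [100, 200, 300, 500, 1000]
freq = [10, 25, 30, 20, 15]

print(grouped_median(lower, upper, freq))  # 250.0
```

Here half of the 100 observations is 50, 35 observations lie before the interval (200-300], and that interval holds 30, so Me = 200 + 100 · (50 − 35) / 30 = 250.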
Mode
For continuous frequency distributions:
$Mo = x_0 + \Delta_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$
$x_0$ - start of the mode interval
$\Delta_{Mo}$ - length of the mode interval
$f_{Mo}$ - number of observations in the mode interval
$f_{Mo-1}$ - number of observations in the interval before the mode interval
$f_{Mo+1}$ - number of observations in the interval after the mode interval
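A Python sketch of the same interpolation, under several assumptions that the text does not state: classes of equal length, a neighbouring frequency of zero when the mode interval is the first or last class, and the interval midpoint as a fallback when both neighbours have the same frequency as the mode interval. The function name and data are hypothetical:

```python
def grouped_mode(lower, upper, freq):
    """Mode of a continuous frequency distribution (equal-length classes)."""
    i = freq.index(max(freq))  # first interval with the largest frequency
    f_mo = freq[i]
    # Assumption: a missing neighbour counts as frequency 0
    f_prev = freq[i - 1] if i > 0 else 0
    f_next = freq[i + 1] if i < len(freq) - 1 else 0
    denominator = (f_mo - f_prev) + (f_mo - f_next)
    if denominator == 0:
        # Assumption: flat neighbourhood, fall back to the midpoint
        return (lower[i] + upper[i]) / 2
    return lower[i] + (upper[i] - lower[i]) * (f_mo - f_prev) / denominator

# Hypothetical equal-length classes and frequencies
lower = [0, 100, 200, 300, 400]
upper = [100, 200, 300, 400, 500]
freq = [10, 25, 30, 20, 15]

print(round(grouped_mode(lower, upper, freq), 2))
```

The mode lands in the lower half of (200-300] because the preceding class (25) is closer in frequency to the modal class than the following one (20).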
Variance and standard deviation
• Variance: the mean squared deviation from the arithmetic mean, in squared units
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}$
• Weighted variance
$\sigma^2 = \frac{\sum_i (x_i - \bar{x})^2 f_i}{\sum_i f_i}$
Analysis of a growth rate
Geometric mean
• Simple geometric mean
$\bar{x}_0 = \sqrt[N]{\prod_{i=1}^{N} x_i}$
• Weighted geometric mean
$\bar{x}_0 = \sqrt[\sum_{i=1}^{n} f_i]{\prod_{i=1}^{n} x_i^{f_i}}$
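A short Python sketch of both geometric means, applied to growth factors. The growth factors are hypothetical (1.10 means +10 percent, 0.90 means −10 percent), and the function names are illustrative:

```python
import math

def geometric_mean(values):
    """N-th root of the product of N values."""
    return math.prod(values) ** (1 / len(values))

def weighted_geometric_mean(values, freq):
    """Root of degree sum(freq) of the product of values ** freq."""
    product = math.prod(x ** f for x, f in zip(values, freq))
    return product ** (1 / sum(freq))

# Hypothetical yearly growth factors: +10 %, +20 %, -10 %
growth = [1.10, 1.20, 0.90]
avg_growth = geometric_mean(growth)

print(f"average growth factor: {avg_growth:.4f}")
print(f"average growth rate: {(avg_growth - 1) * 100:.2f} %")
```

Applying the average factor three times reproduces the total growth of 1.10 · 1.20 · 0.90, which the arithmetic mean of the factors would not do.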
Example – growth rate vs. percent