Measures of Dispersion
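Sample standard deviation:
o $s = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
$X_i$ = measure taken from the $i$th sampled element of the population
X̄ = sample mean
n = sample size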
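A minimal sketch of the sample standard deviation formula above, in Python. The function name `sample_std` and the data values are hypothetical, chosen only for illustration.

```python
import math

def sample_std(values):
    """Sample standard deviation using the definitional formula (divisor n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n                               # sample mean
    squared_devs = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return math.sqrt(squared_devs / (n - 1))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
s = sample_std(scores)             # mean is 5, squared deviations sum to 32
```

Here s equals the square root of 32/7, since the divisor is n - 1 = 7 and not n = 8.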
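The sample standard deviation is a statistic
(ADDITION / SUBTRACTION) If a constant c is added to (or subtracted from) each observation of a set of data, the standard deviation of the new set of data is equal to the standard deviation of the original data set.
o New standard deviation (where c is added or subtracted) = original standard deviation
o When c is added, the new mean will increase if c is positive
o When c is added, the new mean will decrease if c is negative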
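The addition rule above, and the multiplication rule stated next, can be checked numerically. This is only a sketch: the data values and the constants are hypothetical, and `statistics.stdev` from the Python standard library is used for the sample standard deviation.

```python
import statistics

data = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]  # hypothetical observations
c = 10.0                                   # constant added to each observation
k = -2.0                                   # constant multiplied into each observation

shifted = [x + c for x in data]
scaled = [x * k for x in data]

sd = statistics.stdev(data)
sd_shifted = statistics.stdev(shifted)  # expected to equal sd
sd_scaled = statistics.stdev(scaled)    # expected to equal |k| * sd
mean_change = statistics.mean(shifted) - statistics.mean(data)  # expected to equal c
```

Adding c moves every observation by the same amount, so the deviations from the mean do not change. Multiplying by k stretches every deviation by |k|, which is why the absolute value appears in the rule.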
(MULTIPLICATION / DIVISION) If each observation of a set of data is multiplied (or divided) by a constant c, the standard deviation of the new set of data is equal to the standard deviation of the original data set multiplied (or divided) by |c|.
o New standard deviation (where each observation is multiplied / divided by c) = original standard deviation multiplied / divided by |c|
Computational Formula of the Variance
If the mean is a rounded figure, rounding errors propagate very quickly when the definitional formula is used to compute the variance. This can be avoided by using the computational formula.
Population Variance:
o $\sigma^2 = \dfrac{N \sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2}{N^2}$
o Do not cancel
Sample Variance:
o $s^2 = \dfrac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$
o Do not cancel
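The computational formulas above can be sketched in Python and compared with the definitional formula evaluated at a rounded mean. The function names and the data are hypothetical; the point is only to show how rounding the mean changes the result.

```python
def variance_definitional(values, mean):
    """Sample variance from squared deviations about a supplied mean."""
    n = len(values)
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_variance_computational(values):
    """Sample variance: (n * sum of squares - square of sum) / (n * (n - 1))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

def population_variance_computational(values):
    """Population variance: (N * sum of squares - square of sum) / N^2."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n ** 2)

data = [3, 4, 4, 5, 7, 8]             # hypothetical observations
exact_mean = sum(data) / len(data)    # 31/6
rounded_mean = round(exact_mean, 1)   # mean rounded to one decimal place

s2_computational = sample_variance_computational(data)   # 113/30
s2_exact = variance_definitional(data, exact_mean)       # agrees with 113/30
s2_rounded = variance_definitional(data, rounded_mean)   # drifts away from 113/30
sigma2 = population_variance_computational(data)         # 113/36
```

The computational version never uses the mean, so a rounded mean cannot affect it. With integer data the numerator is computed exactly in Python before the single division.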
Cannot use measures of absolute dispersion for comparison if the units are different and the means are very different from each other
MEASURES OF RELATIVE DISPERSION
Measure of relative dispersion
Measures of dispersion that have no unit of measurement and are used to compare the scatter of one distribution with the scatter of another distribution
Coefficient of variation (CV)
Measure of relative dispersion
o Population CV
$CV = \dfrac{\sigma}{\mu} \times 100\%$
σ (sigma) – population standard deviation
μ (mu) – population mean
o Sample CV
$CV = \dfrac{s}{\bar{X}} \times 100\%$
s – sample standard deviation
X̄ – sample mean
Example (conclusion only): ... more stable during 2000-2001 since its CV is smaller, meaning it is less varied
Z-SCORE
Z-score or standard score
Indicates the relative position of an observation in the collection where the observed value came from
Used to compare two values from different collections that
o Differ with respect to X̄ or s or both, or
o Are expressed in different units
Used to identify possible outliers. As a rule, if |standard score| > 3, then the observation is marked as a possible outlier
Not considered a measure of relative dispersion
When all the observations in a collection are standardized, the mean and standard deviation of this collection of standard scores are 0 and 1, respectively
Measures how many standard deviations an observed value is above or below the mean
Population Z-score
$Z = \dfrac{X - \mu}{\sigma}$
o μ – population mean
o σ – population standard deviation
Sample Z-score
$Z = \dfrac{X - \bar{X}}{s}$
o X̄ – sample mean
o s – sample standard deviation
Can be positive, negative or zero
Positive z-score = number of standard deviations an observation is above the mean
Negative z-score = number of standard deviations an observation is below the mean
Zero z-score = the observation is equal to the mean
Has no unit, which makes it possible to compare z-scores computed using different collections
MEASURES OF SKEWNESS
Measure of Skewness
Indicates whether the density of the data set looks just the same to the left as to the right of the center point
Single value that indicates the degree and direction of asymmetry
[Figures: positively skewed (skewed to the right); negatively skewed (skewed to the left)]
Opposite of symmetry
Symmetric Distribution – if it is possible to divide the histogram at the center into two identical halves, wherein each half is a mirror image of the other
o Examples of Symmetric Distributions: [figure]
Types of Skewness:
o Positively skewed (skewed to the right) distribution – if the concentration of the values is at the left end of the distribution and the upper tail of the distribution stretches out more than the lower tail
o Negatively skewed (skewed to the left) distribution – if the concentration of the values is at the right end of the distribution and the lower tail of the distribution stretches out more than the upper tail
Measure of Kurtosis
Indicates the concentration of data around the peak, whether it is flat or peaked
Normal Distribution
Bell-shaped curve that is symmetric about its mean
Its tails approach the x-axis on both sides but will never touch it
The area below any normal curve is equal to 1
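As a numerical companion to the coefficient of variation and z-score definitions given earlier in these notes, here is a minimal Python sketch of the sample versions. The function names and both data sets are hypothetical, and `statistics.stdev` from the standard library supplies the sample standard deviation.

```python
import statistics

def coefficient_of_variation(values):
    """Sample CV in percent: (s / sample mean) * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100.0

def z_scores(values):
    """Sample z-score of every observation: (X - sample mean) / s."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

heights_cm = [150.0, 160.0, 165.0, 170.0, 180.0]  # hypothetical, in centimeters
weights_kg = [50.0, 55.0, 62.0, 70.0, 88.0]       # hypothetical, in kilograms

# CV has no unit, so the two collections can be compared directly.
cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

# Standardized scores have mean 0 and standard deviation 1.
z_heights = z_scores(heights_cm)

# Rule of thumb from the notes: |z| > 3 marks a possible outlier.
possible_outliers = [x for x, z in zip(heights_cm, z_heights) if abs(z) > 3]
```

In this made-up data the weights have the larger CV, so they are relatively more varied than the heights even though the two are measured in different units.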
Types of Kurtosis – Karl Pearson introduced the following terms to classify a unimodal distribution according to the shape of its hump as compared to a normal distribution with the same variance
Mesokurtic
o Hump is the same as the normal curve
o Neither too flat nor too peaked
Leptokurtic
o Curve is more peaked about the mean and the hump is narrower than the normal curve with the same variance
o Prefix “lepto” from the Greek word “leptos”, meaning small or thin
o The sharper peak implies a higher concentration of values around the mode compared to a normal distribution of the same variance
o In order to achieve equal variability, a leptokurtic curve must have thicker tails, or more observations on the tails, to compensate for the sharper peak
o Attributed to the presence of more observations that highly deviate from the mode as compared to the normal
Platykurtic
o Curve is less peaked about the mean and the hump is flatter than the normal curve with the same variance
o Prefix “platy” from the Greek word “platus”, meaning flat or wide
o The flatter peak implies a lower concentration of values around the mode compared to a normal distribution of the same variance
o In order to achieve equal variability, a platykurtic curve must have thinner tails, or fewer observations on the tails, to compensate for the flatter peak
o Attributed to the presence of more observations that moderately deviate from the mode as compared to the normal
Importance of describing kurtosis
Used to explain the type of variability of a distribution (few observations that highly deviate from the mode as opposed to many observations that moderately deviate from the mode)
Used to detect nonnormality since many classical statistical procedures assume normality
Population Coefficient of Kurtosis Based on the Fourth Moment
BOXPLOTS
Boxplot
Convenient way to visualize the
basic summary measures on one
graph
“box-and-whisker plot”
Features of the boxplot
Location – measures of location
such as median and quartiles
Spread – variation of dataset
through lengths of boxes and
whiskers
Symmetry – whether the upper half of the boxplot has the same length as the lower half
Extremes – lowest and highest observations are the endpoints of the boxplot
Outliers – data points that are extremely low or high relative to the others
Constructing a Boxplot
Compute median and quartiles
Construct a rectangle with one end at the first quartile and the other end at the third quartile, by connecting the lines drawn at these two quartiles. Put a line across the interior at the median
Compute the IQR (Q3 – Q1)
Compute lower and upper fences
o Lower fence = Q1 – 1.5 IQR
o Upper fence = Q3 + 1.5 IQR
o Observations below the lower fence or above the upper fence are considered outliers
Excluding outliers, identify the two values that are closest to the lower and upper fences, respectively. Draw a line (whisker) from each of these values to the nearer side of the rectangle
Plot each outlier at its corresponding value using an asterisk or an X mark
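The construction steps above can be sketched numerically in Python. The function name `boxplot_summary` and the data are hypothetical. The notes do not say which quartile rule to use, so the sketch assumes the "inclusive" method of `statistics.quantiles`; other quartile conventions can give slightly different fences.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, IQR, fences, whisker ends and outliers for one data set."""
    data = sorted(values)
    # Assumed quartile rule: the "inclusive" method of the standard library.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Observations beyond the fences are outliers.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    # Whiskers end at the most extreme observations that are not outliers.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

summary = boxplot_summary([2, 4, 5, 6, 7, 8, 9, 30])  # hypothetical observations
```

For this data the value 30 lies above the upper fence, so it would be plotted as an outlier, and the whiskers would end at 2 and 9.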