Measures of Dispersion

This document discusses measures of dispersion, which characterize how varied observations in a data set are from each other. There are two general classifications of measures: measures of absolute dispersion that have the same units as the observations (e.g. range, interquartile range, standard deviation); and measures of relative dispersion that are unit-less (e.g. coefficient of variation). Specific measures discussed include range, interquartile range, variance, standard deviation, and how they are calculated for populations and samples. Transformations like adding or multiplying a constant to observations will impact the mean and standard deviation in predictable ways.
MEASURES OF DISPERSION

Measure of dispersion
• Descriptive summary measure that helps us characterize the data set in terms of how varied the observations are from each other
• A small value indicates that the observations are not too different from each other; there is a concentration of observations about the center of the distribution
• A large value indicates that the observations are very different from each other or that they are widely spread out from the center
• The higher the value, the more varied the observations are
• The smallest possible value of a measure of dispersion should be 0
  o A zero measure should indicate the absence of variation
  o It is impossible to have a negative dispersion
  o [Figure comparing Set A and Set B not recovered]
  o Set B has a higher measure of dispersion
  o The mean of Set A is more reliable (meaning that if an observation is repeatedly selected at random from the collection, its value is not too different from the mean)

General Classifications of Measures of Dispersion
• Measures of Absolute Dispersion – have the same unit as the observations
  o Examples: range, interquartile range, standard deviation
  o Range – distance between the maximum value and the minimum value
    ▪ Range = maximum – minimum
    ▪ Sometimes presented by stating the smallest and largest values (e.g. 45–66 kg)
    ▪ Simple, easy to compute, and easy to understand
    ▪ Fails to communicate any information about the clustering of the values in the middle of the distribution since it uses only the extreme values (min and max)
    ▪ An outlier can greatly affect its value
    ▪ Tends to be smaller for smaller collections than for larger collections
    ▪ Cannot be approximated from frequency distributions with an open-ended class (a class with no lower or no upper limit)
    ▪ Not tractable mathematically
  o Interquartile Range – difference between the third and first quartiles of the data set
    ▪ IQR = Q3 – Q1
    ▪ Reflects the variability of the middle 50% of the observations in the array
    ▪ May be viewed as the range of a trimmed data set wherein the smallest 25% and largest 25% of observations have been removed. This modified range addresses the range's sensitivity to outliers
    ▪ Could be 0 even if there is still some variation among the smallest 25% and largest 25% of all observations
    ▪ Depends on the method used to compute the quartiles
    ▪ Example: [figure not recovered]
  o Population variance – mean of the squared deviations between each observed value and the mean
    ▪ Denoted by $\sigma^2$ (σ is the Greek letter sigma): $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
    ▪ $X_i$ = measure taken from the i-th element of the population
    ▪ μ = population mean
    ▪ N = population size
    ▪ Compute the mean
    ▪ Subtract the population mean from each measure taken
    ▪ Square each difference, then add the squares and divide by N
    ▪ Is a parameter
  o Sample variance
    ▪ $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
    ▪ $X_i$ = measure taken from the i-th element of the sample
    ▪ X̄ = sample mean
    ▪ n = sample size
    ▪ Example: [figure not recovered]
    ▪ Is a statistic
    ▪ The squared difference of an observation from the mean gives us an idea of how close this observation is to the mean
    ▪ The unit of the variance is the square of the unit of measure of the data set
      ➢ Variance is not a measure of absolute dispersion
      ➢ Often it is desirable to return to the original unit of measure, and so it is the standard deviation that is presented
• Measures of Relative Dispersion – have no unit and are therefore useful in comparing the variability of one distribution with that of another distribution
  o Example: coefficient of variation

Standard Deviation – positive square root of the variance
• Population standard deviation – for a finite population with N elements, denoted by σ
  o $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
• Sample standard deviation – for a sample with n elements, denoted by s
  o $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
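The range, interquartile range, variance and standard deviation defined above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names and the `scores` data are hypothetical, and the quartiles use the "inclusive" convention of the standard library's `statistics.quantiles`, which is only one of several methods (the IQR depends on that choice, as noted above).

```python
import math
import statistics


def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)


def interquartile_range(values):
    """IQR = Q3 - Q1, using the 'inclusive' quartile convention (an assumption)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def population_variance(values):
    """Mean of the squared deviations from the population mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def population_sd(values):
    """Positive square root of the population variance."""
    return math.sqrt(population_variance(values))


def sample_sd(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

print("range:", value_range(scores))                        # 7
print("IQR:", interquartile_range(scores))                  # 1.5
print("population variance:", population_variance(scores))  # 4.0
print("population SD:", population_sd(scores))              # 2.0
print("sample variance:", round(sample_variance(scores), 3))
print("sample SD:", round(sample_sd(scores), 3))
```

The sample versions come out slightly larger than the population versions because they divide by n − 1 instead of N.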
Computational Formula of the Variance
• If the mean is a rounded figure, rounding errors propagate very quickly when the definitional formula is used to compute the variance. This can be avoided by using the computational formula
• Population Variance:
  o $\sigma^2 = \frac{N \sum_{i=1}^{N} X_i^2 - \left( \sum_{i=1}^{N} X_i \right)^2}{N^2}$
  o Do not cancel
• Sample Variance:
  o $s^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n (n - 1)}$
  o Do not cancel
• Example: [figure not recovered]

Mathematical Properties of the Standard Deviation
• (ADDITION / SUBTRACTION) If each observation of a set of data is transformed by the addition or subtraction of a constant c, the standard deviation of the new data set is the same as the standard deviation of the original data set
  o New standard deviation (where c is added to / subtracted from each observation) = original standard deviation
  o When c is added, the new mean will increase if c is positive
  o When c is added, the new mean will decrease if c is negative
• (MULTIPLICATION / DIVISION) If each observation of a set of data is transformed by multiplication (or division) by a constant c, the standard deviation of the new data set is equal to the standard deviation of the original data set multiplied (or divided) by |c|
  o New standard deviation (where each observation is multiplied / divided by c) = original standard deviation multiplied / divided by |c|

Characteristics of the Standard Deviation
• Uses every observation in its computation
• May be distorted by outliers because squaring large deviations from the mean will give more weight to these outliers
• Amenable to algebraic treatment
• Always nonnegative – a value of 0 implies the absence of variation
• Level of measurement must be at least interval for the standard deviation to be interpretable

Comparing variation of two or more distributions
• Cannot use measures of absolute dispersion for comparison if the units are different or the means are very different from each other

MEASURES OF RELATIVE DISPERSION
Measure of relative dispersion
• Measures of dispersion that have no unit of measurement and are used to compare the scatter of one distribution with the scatter of another distribution

Coefficient of variation (CV)
• Measure of relative dispersion
  o Population CV
    ▪ $CV = \frac{\sigma}{\mu} \times 100\%$
    ▪ σ (sigma) – population standard deviation
    ▪ μ (mu) – population mean
  o Sample CV
    ▪ $CV = \frac{s}{\bar{X}} \times 100\%$
    ▪ s – sample standard deviation
    ▪ X̄ – sample mean
  o Example: [figures not recovered]
  o Price is more varied since its CV is greater
  o NO more stable during 2000-2001 since its CV is smaller, meaning less varied

Z-SCORE
Z-score or standard score
• Indicates the relative position of an observation in the collection where the observed value came from
• Used to compare two values from different collections that
  o Differ with respect to X̄ or s or both, or
  o Are expressed in different units
• Used to identify possible outliers. As a rule, if |standard score| > 3, then the observation is marked as a possible outlier
• Not considered a measure of relative dispersion
• When all the observations in a collection are standardized, the mean and standard deviation of this collection of standard scores are 0 and 1, respectively
• Measures how many standard deviations an observed value is above or below the mean

Population Z-score
• $Z = \frac{X - \mu}{\sigma}$
  o μ – population mean
  o σ – population standard deviation

Sample Z-score
• $Z = \frac{X - \bar{X}}{s}$
  o X̄ – sample mean
  o s – sample standard deviation
• Can be positive, negative or zero
• Positive z-score = number of standard deviations an observation is above the mean
• Negative z-score = number of standard deviations an observation is below the mean
• Zero z-score = the observation is equal to the mean
• Has no unit, which makes it possible to compare z-scores computed using different collections
• Example: [figures not recovered]

MEASURES OF SKEWNESS
Measure of Skewness
• Indicates whether the density of the data set looks just the same to the left as to the right of the center point
• Single value that indicates the degree and direction of asymmetry
• Opposite of Symmetric Distribution – a distribution is symmetric if it is possible to divide the histogram at the center into two identical halves, wherein each half is a mirror image of the other
  o Examples of Symmetric Distributions: [figures not recovered]
• Types of Skewness:
  o ("POSITIVELY SKEWED") Skewed to the right distribution – if the concentration of the values is at the left end of the distribution and the upper tail of the distribution stretches out more than the lower tail
  o ("NEGATIVELY SKEWED") Skewed to the left distribution – if the concentration of the values is at the right end of the distribution and the lower tail of the distribution stretches out more than the upper tail
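Looking back at the coefficient of variation and the standard score defined earlier in these notes, the sketch below shows how both might be computed for a sample. The function names and the two small data sets are hypothetical; the sample standard deviation (divisor n − 1) is assumed throughout.

```python
import statistics


def coefficient_of_variation(values):
    """Sample CV in percent: (s / x_bar) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def z_scores(values):
    """Standard score (x - x_bar) / s for every observation in the sample."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - x_bar) / s for x in values]


def possible_outliers(values, threshold=3):
    """Observations whose |z-score| exceeds the threshold (3 by the rule of thumb)."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


heights_cm = [150, 160, 170, 180, 190]  # hypothetical data
weights_kg = [50, 55, 60, 65, 70]       # hypothetical data

# The units differ, so the unit-free CVs are compared instead of the SDs.
print("CV of heights (%):", round(coefficient_of_variation(heights_cm), 2))
print("CV of weights (%):", round(coefficient_of_variation(weights_kg), 2))

# Standardized values have mean 0 and standard deviation 1 (up to rounding).
z = z_scores(heights_cm)
print("mean of z-scores:", round(statistics.mean(z), 10))
print("SD of z-scores:", round(statistics.stdev(z), 10))
```

In this made-up example the weights have the larger CV, so they are relatively more varied even though their standard deviation is smaller in absolute terms.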

Importance of Detecting Skewness
• Presents a problem in the analysis of data because it can adversely affect the behavior of certain summary measures
• It would be inappropriate to use some statistical procedures in the presence of severe skewness
• Skewness should be detected at the outset to prevent contamination of subsequent analysis; otherwise we will only end up with spurious conclusions

Interpreting Measure of Skewness
• Direction of Skewness
  o Sk = 0: symmetric
  o Sk > 0: positively skewed
  o Sk < 0: negatively skewed
• Degree of Skewness
  o The larger |Sk| is (the farther Sk is from 0), the more skewed the distribution

Relationship of the three measures of central tendency for unimodal distributions
• Symmetric – mean = median = mode
• POSITIVELY / Skewed to the right [figure not recovered]
• NEGATIVELY / Skewed to the left [figure not recovered]

Pearson's First and Second Coefficient of Skewness
• [Formulas not recovered]
• The first coefficient is a function of the mode
• The first coefficient becomes a problem if the mode does not exist or the collection is too small so that the mode is not a stable measure of central tendency
• The second coefficient is based on Karl Pearson's empirical derivation on the distance between the median and the mean as compared to the distance between the mode and the mean

Measure of Kurtosis
• Indicates the concentration of data around the peak, whether it is flat or peaked

Normal Distribution
• Bell-shaped curve that is symmetric about its mean
• Its tails approach the x-axis on both sides but never touch it
• The area below any normal curve is equal to 1
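Returning to Pearson's two coefficients of skewness, the sketch below assumes the usual textbook forms, Sk1 = (mean − mode) / s and Sk2 = 3 (mean − median) / s, with s the sample standard deviation. These forms are an assumption here rather than something stated in these notes, and the function names and data are hypothetical.

```python
import statistics


def pearson_first_skewness(values):
    """Sk1 = (mean - mode) / s. Only usable when there is a single mode."""
    modes = statistics.multimode(values)
    if len(modes) != 1:
        raise ValueError("no unique mode, so the first coefficient is not usable")
    return (statistics.mean(values) - modes[0]) / statistics.stdev(values)


def pearson_second_skewness(values):
    """Sk2 = 3 * (mean - median) / s. Does not need the mode."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)


def direction(sk):
    """Interpret the sign of a skewness coefficient."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetric"


right_tailed = [1, 2, 2, 2, 3, 3, 4, 15]  # hypothetical: one large value stretches the upper tail
balanced = [1, 2, 3, 3, 3, 4, 5]          # hypothetical: mean = median = mode = 3

for name, data in [("right_tailed", right_tailed), ("balanced", balanced)]:
    sk1 = pearson_first_skewness(data)
    sk2 = pearson_second_skewness(data)
    print(name, round(sk1, 3), round(sk2, 3), direction(sk2))
```

The first function raises an error when the mode is not unique, which mirrors the caveat that the first coefficient is a problem when the mode does not exist or is unstable.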
Types of Kurtosis – Karl Pearson introduced the following terms to classify a unimodal distribution according to the shape of its hump as compared to a normal distribution with the same variance
• Mesokurtic
  o Hump is the same as that of the normal curve
  o Neither too flat nor too peaked
• Leptokurtic
  o Curve is more peaked about the mean and the hump is narrower than that of the normal curve with the same variance
  o The prefix "lepto" comes from the Greek word "leptos", meaning small or thin
  o The sharper peak implies a higher concentration of values around the mode compared to a normal distribution of the same variance
  o In order to achieve equal variability, the leptokurtic curve must have thicker tails, or more observations in the tails, to compensate for the sharper peak
  o Attributed to the presence of more observations that highly deviate from the mode as compared to the normal
• Platykurtic
  o Curve is less peaked about the mean and the hump is flatter than that of the normal curve with the same variance
  o From the Greek word "platus", meaning flat or wide
  o The flatter peak implies a lower concentration of values around the mode compared to a normal distribution of the same variance
  o In order to achieve equal variability, the platykurtic curve must have thinner tails, or fewer observations in the tails, to compensate for the flatter peak
  o Attributed to the presence of more observations that moderately deviate from the mode as compared to the normal

Importance of describing kurtosis
• Used to explain the type of variability of a distribution (few observations that highly deviate from the mode as opposed to many observations that moderately deviate from the mode)
• Used to detect nonnormality since many classical statistical procedures assume normality

Population Coefficient of Kurtosis Based on the Fourth Moment
• [Formula and figures not recovered]

BOXPLOTS
Boxplot
• Convenient way to visualize the basic summary measures on one graph
• Also called a "box-and-whisker plot"

Features of the boxplot
• Location – measures of location such as the median and quartiles
• Spread – variation of the data set, shown through the lengths of the box and whiskers
• Symmetry – whether the upper half of the boxplot has the same length as the lower half
• Extremes – the lowest and highest observations are the endpoints of the boxplot
• Outliers – data points that are extremely low or high relative to the others

Constructing a Boxplot
• Compute the median and quartiles
• Construct a rectangle with one end at the first quartile and the other end at the third quartile. Put a line across the interior at the median
• Construct a rectangle by connecting the lines for the first and third quartiles
• Compute the IQR (Q3 – Q1)
• Compute the lower and upper fences
  o Lower fence = Q1 – 1.5 IQR
  o Upper fence = Q3 + 1.5 IQR
  o Observations below the lower fence or above the upper fence are considered outliers
• Excluding outliers, identify the two values that are closest to the lower and upper fences, respectively. Draw a line from each of these values to the nearest side of the rectangle
• Plot each outlier at its corresponding value using an asterisk or an X mark
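The construction steps above can be sketched numerically as follows. This is only an illustration under stated assumptions: `boxplot_summary` and the data are hypothetical, and the quartiles use the "inclusive" method of `statistics.quantiles`, so other quartile conventions may give slightly different fences.

```python
import statistics


def boxplot_summary(values):
    """Compute the numbers needed to draw a boxplot by hand."""
    data = sorted(values)
    # Quartile convention is an assumption; the notes do not fix one.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Observations beyond the fences are flagged as outliers.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    # Whiskers end at the non-outlying values closest to each fence.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


observations = [2, 4, 4, 4, 5, 5, 7, 9, 30]  # hypothetical data with one extreme value

for key, value in boxplot_summary(observations).items():
    print(f"{key}: {value}")
```

For this made-up data set the box runs from 4 to 7, the whiskers reach 2 and 9, and the value 30 lies above the upper fence of 11.5, so it would be plotted separately as an outlier.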
