Measures of Variation
(Dispersion)
Introduction
• Measures of central tendency provide only a partial
description of a quantitative data set. Knowledge of central
tendency alone is not sufficient for a complete
understanding of a distribution.
For example, consider the distribution of yield (in kg) of two
rice varieties from 5 plots each:
Variety 1: 45 42 42 41 40
Variety 2: 54 48 42 33 30
The mean yield is 42 kg for Variety 1 and 41.4 kg for Variety 2,
so the two averages are nearly identical. However, the mean
alone does not throw light on the composition of the
two data sets; to supplement it we need a measure
that describes the spread of the data.
Therefore, knowledge of the data's variability
along with its center can help us visualize the
shape of a data set as well as its extreme values.
Definition: The scatter or spread of items of a
distribution is known as dispersion or variation.
Objectives of measuring variation:
To describe dispersion (variability) in a data set
To compare the spread in two or more distributions
To determine the reliability of an average
Absolute and relative measures
Absolute measures of dispersion:
• They are expressed in the same unit of measurement as the
original data.
• They are not suitable for comparing the variability of two
distributions that are expressed in different units of
measurement or that have very different average sizes.
Relative measures of dispersion:
• They are useful when
two sets of data are expressed in different units
the average sizes are very different
It is the ratio of a measure of absolute dispersion to an
appropriate measure of central tendency
It is a unitless measure.
Types of measures of variation
1. The Range and Relative Range
Range = Max – Min
The range is the crudest absolute measure of variation
It is widely used in the construction of quality control
charts and in the description of daily temperature
Relative range: RR = Range/(Max + Min) = (Max – Min)/(Max + Min)
Properties of range
• It is affected by extreme values
• It does not take into account all observations
• It is easy to calculate and simple to understand
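As a small illustration, the range and relative range can be computed for the two rice varieties from the introduction. This is only a sketch; the function names `value_range` and `relative_range` are chosen here for illustration and do not come from the slides.

```python
# Yields (kg) of the two rice varieties from the introduction.
variety_1 = [45, 42, 42, 41, 40]
variety_2 = [54, 48, 42, 33, 30]


def value_range(data):
    """Absolute measure: Range = Max - Min (same unit as the data)."""
    return max(data) - min(data)


def relative_range(data):
    """Relative measure: RR = Range / (Max + Min), a unitless ratio."""
    return value_range(data) / (max(data) + min(data))


for name, data in [("Variety 1", variety_1), ("Variety 2", variety_2)]:
    print(name, "range:", value_range(data),
          "relative range:", round(relative_range(data), 4))
```

Although the two means are close, the range of Variety 2 (24 kg) is far larger than that of Variety 1 (5 kg), which is exactly the information the mean alone hides.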
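The mean deviation
MD is the average of the absolute deviations
taken from a central value, generally the mean or
median
About the mean (ungrouped data, then grouped data with frequencies $f_i$, where $n = \sum_i f_i$):
$$MD(\bar{x}) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} \qquad MD(\bar{x}) = \frac{\sum_{i} f_i\,|x_i - \bar{x}|}{n}$$
About the median:
$$MD(\tilde{x}) = \frac{\sum_{i=1}^{n} |x_i - \tilde{x}|}{n} \qquad MD(\tilde{x}) = \frac{\sum_{i} f_i\,|x_i - \tilde{x}|}{n}$$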
Example: Calculate the mean deviation about
the mean and about the median for the students' scores
in a test: 6, 7, 7, 10, 10.
• The mean is 8 and the median is 7, so the mean deviation
about the mean is 8/5 = 1.6 and about the median is 7/5 = 1.4.
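The example can be checked with a few lines of Python. This is a minimal sketch for ungrouped data; the helper name `mean_deviation` is an assumption made here for illustration.

```python
from statistics import mean, median

scores = [6, 7, 7, 10, 10]


def mean_deviation(data, center):
    """Average of the absolute deviations of the data from `center`."""
    return sum(abs(x - center) for x in data) / len(data)


md_about_mean = mean_deviation(scores, mean(scores))      # center = 8
md_about_median = mean_deviation(scores, median(scores))  # center = 7

print("MD about the mean:", md_about_mean)
print("MD about the median:", md_about_median)
```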
Properties of mean deviation
• simpler to understand and compute than the
standard deviation
• less affected by extreme values than the
standard deviation
Variance and Standard Deviation
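Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$$
Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$
Standard deviation:
$$\sigma = \sqrt{\sigma^2} \qquad s = \sqrt{s^2}$$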
Variance is the average of the squared deviations from
the mean
Standard deviation is the square root of the variance
Example: The heights of nine students were measured
in inches and the data are presented below.
Height (x): 69 66 67 69 64 63 65 68 72
Calculate the population variance and standard
deviation.
Mean = 603/9 = 67; Variance = 64/9 = 7.11 inch²; S.D. = 2.67 inch
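A short check of this example, using Python's standard `statistics` module. The sketch computes the population variance both with the library function and with the computational (shortcut) form of the formula; the sample variance is included only for contrast with the population value.

```python
from math import sqrt
from statistics import pstdev, pvariance, variance

heights = [69, 66, 67, 69, 64, 63, 65, 68, 72]  # inches
N = len(heights)

# Population variance and standard deviation (divide by N).
pop_var = pvariance(heights)
pop_sd = pstdev(heights)

# Shortcut form: (sum of x^2 - (sum of x)^2 / N) / N.
sum_x = sum(heights)
sum_x2 = sum(x * x for x in heights)
pop_var_shortcut = (sum_x2 - sum_x ** 2 / N) / N

# Sample variance divides by n - 1 instead of N.
sample_var = variance(heights)

print(round(pop_var, 2), round(pop_sd, 2), round(pop_var_shortcut, 2))
print(sample_var, round(sqrt(sample_var), 2))
```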
• The variance and Standard deviation of a grouped
data is calculated by using th
Properties of Variance and Standard deviation
The unit of measurement of the variance is the
square of the unit of measurement of the
observed values.
It is based on all observations
Standard deviation is considered to be the
best measure of dispersion and is used widely
If the variance/standard deviation is large, the
data is more dispersed
The variance and standard deviation are used
quite often in inferential statistics
Coefficient of variation (CV)
It is the relative measure corresponding to the
standard deviation
It is used to compare the variability of two or
more different groups
$$CV = \frac{s}{\bar{x}} \times 100\%$$
• A group with a smaller coefficient of variation is said to be less
variable, or more consistent, more uniform,
or more homogeneous.
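The slides give no numerical example for the CV, so the sketch below reuses the rice-yield data from the introduction as an illustration. Treating the five plots as a sample (and therefore using the sample standard deviation `stdev`) is an assumption made here, as is the helper name `coefficient_of_variation`.

```python
from statistics import mean, stdev

variety_1 = [45, 42, 42, 41, 40]
variety_2 = [54, 48, 42, 33, 30]


def coefficient_of_variation(data):
    """CV = (s / x-bar) * 100, using the sample standard deviation s."""
    return stdev(data) / mean(data) * 100


cv_1 = coefficient_of_variation(variety_1)
cv_2 = coefficient_of_variation(variety_2)

print("CV of Variety 1 (%):", round(cv_1, 2))
print("CV of Variety 2 (%):", round(cv_2, 2))
```

Variety 1 has the smaller CV, so under these assumptions its yield is the more consistent of the two.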
The Standard Score (Z – score)
It is a measure that describes the relative position of a
single value in the entire distribution of values
It also gives the number of standard deviations a
particular observation lies above or below the mean
$$Z = \frac{x - \bar{x}}{s}$$
If Z is +ve – the observation lies above the mean
If Z is -ve – the observation lies below the mean
If Z is 0 – the observation is equal to the mean
Example: Two sections were given an exam in
a course. The average score was 72 with
standard deviation of 6 for section 1 and 85
with standard deviation of 5 for section 2.
Student A from section 1 scored 84 and
student B from section 2 scored 90. Who
performed better relative to his/her group?
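Z-score for student A = (84 – 72)/6 = 2.00
Z-score for student B = (90 – 85)/5 = 1.00
Student A performed better relative to his/her group: A's score lies
two standard deviations above the section mean, while B's lies only one.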
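The comparison can be written as a tiny function. This is a sketch; the name `z_score` is chosen here for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_a = z_score(84, mean=72, sd=6)  # student A, section 1
z_b = z_score(90, mean=85, sd=5)  # student B, section 2

better = "A" if z_a > z_b else "B"
print(z_a, z_b, "-> student", better, "performed better relative to the group")
```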
Skewness and Kurtosis
Skewness
– Measure of asymmetry of a frequency
distribution
• Symmetric or unskewed
• Skewed to right
• Skewed to left
[Figure: three distributions labeled Symmetric, Skewed to right, and Skewed to left]
Kurtosis
–Measure of flatness or peakedness of
a frequency distribution
• Platykurtic (relatively flat)
• Mesokurtic (normal)
• Leptokurtic (relatively peaked)
Platykurtic - flat distribution
Mesokurtic - not too flat and not too peaked
Leptokurtic - peaked distribution
Elementary Probability
Introduction
Probability is:
A quantitative measure of uncertainty
A measure of the strength of belief in the
occurrence of an uncertain event
A measure of the degree of chance or likelihood of
occurrence of an uncertain event
Measured by a number between 0 and 1 (or
between 0% and 100%)
Definition of some basic concepts
Set - a collection of elements or objects of
interest
Empty set (denoted by ∅)
a set containing no elements
Universal set (denoted by S)
a set containing all possible elements
Complement (Not). The complement of A is (Ā)
a set containing all elements of S not in A
Intersection (And) (A ∩ B)
– a set containing all elements in both A and B
Union (Or) (A ∪ B)
– a set containing all elements in A or B or
both
Mutually exclusive or disjoint sets
– sets having no elements in common, that is, sets
whose intersection is the empty
set
[Venn diagram: sets A and B intersecting within S; the overlap is A ∩ B]
[Venn diagram: sets A and B within S; the combined region is A ∪ B]
[Venn diagram: mutually exclusive or disjoint sets A and B, which have nothing in common]
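The set operations above map directly onto Python's built-in `set` type. In the sketch below the universal set is the faces of a die, and the sets `A`, `B`, and `C` are hypothetical examples chosen only to illustrate the definitions.

```python
S = {1, 2, 3, 4, 5, 6}   # universal set (faces of a die)
A = {2, 4, 6}            # hypothetical set: even faces
B = {1, 2, 3}            # hypothetical set: faces up to 3
C = {1, 3, 5}            # hypothetical set: odd faces

complement_A = S - A     # elements of S not in A
intersection = A & B     # elements in both A and B
union = A | B            # elements in A or B or both
empty = set()            # the empty set

print(complement_A, intersection, union)
print("A and C mutually exclusive:", A.isdisjoint(C))
```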
Random Experiment
• Process that leads to one of several possible
outcomes *, e.g.:
Coin toss
• Heads, Tails
Throw die
• 1, 2, 3, 4, 5, 6
Pick a card
• AH, KH, QH, ...
Introduce a new product
• Each trial of an experiment has a single observed
outcome.
• The precise outcome of a random experiment is
unknown before a trial.
* Also called a basic outcome, elementary event, or simple event
Sample Space or Event Set
Set of all possible outcomes (universal set) for a given
experiment
E.g.: Roll a regular six-sided die
S = {1,2,3,4,5,6}
Event
Collection of outcomes having a common characteristic
E.g.: Even number
A = {2,4,6}
Event A occurs if an outcome in the set A occurs
Probability of an event
Sum of the probabilities of the outcomes of which it
consists
P(A) = P(2) + P(4) + P(6)
Equally-likely outcomes
• For example: Throw a die
• Six possible outcomes {1,2,3,4,5,6}
• If each is equally-likely, the probability of each is 1/6 = 0.1667 = 16.67%
• Probability of each equally-likely outcome is 1 divided by the number of
possible outcomes
Event A (even number)
• P(A) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 1/2
• With equally likely outcomes, P(e) = 1/n(S) for each outcome e in A, so
$$P(A) = \sum_{e \in A} P(e) = \frac{n(A)}{n(S)} = \frac{3}{6} = \frac{1}{2}$$
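The die example can be reproduced with exact fractions. This is a minimal sketch assuming equally likely outcomes; the helper name `prob` is an illustration, not part of the slides.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}               # sample space of a fair die
A = {x for x in S if x % 2 == 0}     # event "even number"


def prob(event, sample_space):
    """Classical probability n(A)/n(S) for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


p_outcome = Fraction(1, len(S))              # probability of each single face
p_A_by_sum = sum(p_outcome for _ in A)       # P(2) + P(4) + P(6)
p_A = prob(A, S)

print(p_outcome, p_A_by_sum, p_A)
```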
Pick a Card: Sample Space
[Figure: the 52-card sample space, with ranks A, K, Q, J, 10, 9, 8, 7, 6, 5, 4, 3, 2 in each of the suits Hearts, Diamonds, Clubs, Spades]
Event 'Ace'
P(Ace) = n(Ace)/n(S) = 4/52 = 1/13
Event 'Heart'
P(Heart) = n(Heart)/n(S) = 13/52 = 1/4
Union of events 'Heart' and 'Ace'
P(Heart ∪ Ace) = n(Heart ∪ Ace)/n(S) = 16/52 = 4/13
The intersection of the events 'Heart' and 'Ace' comprises the single point
circled twice in the figure: the ace of hearts
P(Heart ∩ Ace) = n(Heart ∩ Ace)/n(S) = 1/52
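The card probabilities can be verified by building the deck explicitly. The sketch below assumes a standard 52-card deck with every card equally likely; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = set(product(ranks, suits))    # sample space S, 52 (rank, suit) pairs

ace = {card for card in deck if card[0] == "A"}
heart = {card for card in deck if card[1] == "Hearts"}


def prob(event):
    """n(event) / n(S) for equally likely cards."""
    return Fraction(len(event), len(deck))


print("P(Ace)         =", prob(ace))
print("P(Heart)       =", prob(heart))
print("P(Heart ∩ Ace) =", prob(heart & ace))
print("P(Heart ∪ Ace) =", prob(heart | ace))
```

The union has 16 cards rather than 17 because the ace of hearts belongs to both events and is counted once.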
Counting rules
To assign probabilities to an event, the possible outcomes of a
random experiment should be counted. The following
principles help to determine the number of possible
outcomes favoring a given event.
1. Addition Rule
If a task can be accomplished by k distinct procedures where
the i-th procedure has n_i alternatives, then the total number of
ways of accomplishing the task equals
n_1 + n_2 + … + n_k
Example: Suppose that a man wants to make a journey from
Addis Ababa to Djibouti. The following are the means of
transportation. Air transport: 2 flights; Vehicles: 4 alternatives;
Train: 2 alternatives. (The total alternatives are 2+4+2=8)
2. Multiplication Principle
If a choice consists of k steps of which the 1st can be
made in n_1 ways, the 2nd can be made in n_2 ways, …,
and the k-th can be made in n_k ways, then the whole
choice can be made in
n_1 × n_2 × … × n_k ways
Example: If a test consists of 10 multiple choice
questions, with each permitting 4 possible answers,
how many ways are there in which a student gives
his/her answers?
4 × 4 × 4 × … × 4 = 4^10 ways
= 1,048,576 ways of completing the exam.
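Both counting rules are easy to check in Python. The sketch below redoes the journey and exam examples; the answer labels "ABCD" and the three-question enumeration are hypothetical additions used only to confirm the multiplication principle on a case small enough to list.

```python
from itertools import product

# Addition rule: mutually exclusive ways to travel from Addis Ababa to Djibouti.
journey_options = {"air": 2, "vehicle": 4, "train": 2}
total_journeys = sum(journey_options.values())

# Multiplication principle: 10 questions, 4 possible answers each.
answer_sheets = 4 ** 10

# Sanity check by brute-force enumeration of a 3-question test.
small_case = list(product("ABCD", repeat=3))

print(total_journeys, answer_sheets, len(small_case))
```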
3. Permutation
It is an ordered selection of r objects out of
a total of n objects. The number of permutations of n
objects taken r at a time is denoted by nPr, where
$$_nP_r = \frac{n!}{(n - r)!}$$
The number of permutations of n objects taken all at
a time is given by:
nPn = n!
In permutations, order is important
• Example 5.8: Suppose that we have four letters a, b, c,
d.
What is the number of possible arrangements of these
letters taken all at a time?
4! = 4*3*2*1 = 24
What is the number of possible arrangements of these
letters if we use only three of the letters at a time?
4P3 = 4!/(4 – 3)! = 24
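The letter example can be confirmed both by formula and by listing the arrangements. This sketch uses `math.perm` and `itertools.permutations` from the Python standard library (Python 3.8 or later).

```python
from itertools import permutations
from math import factorial, perm

letters = ["a", "b", "c", "d"]

# All four letters at a time: 4! arrangements.
all_at_a_time = list(permutations(letters))

# Three letters at a time: 4P3 = 4!/(4 - 3)! arrangements.
three_at_a_time = list(permutations(letters, 3))

print(len(all_at_a_time), factorial(4))
print(len(three_at_a_time), perm(4, 3))
print(three_at_a_time[:3])  # order matters: each ordering is listed separately
```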
4. Combination
• It is a selection of r items from a group of n items
regardless of the order of selection. The number of
combinations is denoted by nCr and is read as "n choose r".
• Order is not important
• The number of combinations of r out of n elements is:
$$_nC_r = \binom{n}{r} = \frac{n!}{r!\,(n - r)!}$$
Example: How many different committees of 3 can be
formed from Tolosa, Bethelhiem, Kebede and Lensa?
4C3 = 4 possible committees
Example: From a group of 5 men and 7 women, how
many different committees consisting of 2 men and 3
women can be formed?
5C2 × 7C3 = 10 × 35 = 350 possible committees.
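A quick check of both committee examples, as a sketch using `math.comb` and `itertools.combinations` from the standard library.

```python
from itertools import combinations
from math import comb

people = ["Tolosa", "Bethelhiem", "Kebede", "Lensa"]

# Committees of 3 out of 4 people; order of selection does not matter.
committees = list(combinations(people, 3))
for committee in committees:
    print(committee)

# 2 men out of 5 and 3 women out of 7 (multiplication principle).
mixed_committees = comb(5, 2) * comb(7, 3)
print(len(committees), mixed_committees)
```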
Probability of an event
The Axioms of Probability
Range of values for P(A): 0 ≤ P(A) ≤ 1
Sample space (S): P(S) = 1
Complements - Probability of not A
P(Ā) = 1 – P(A)
Intersection - Probability of both A and B
P(A ∩ B) = n(A ∩ B)/n(S)
Mutually exclusive events (A and C): P(A ∩ C) = 0
• Union - Probability of A or B or both (rule of unions)
P(A ∪ B) = n(A ∪ B)/n(S) = P(A) + P(B) – P(A ∩ B)
Mutually exclusive events: If A and B are mutually exclusive, then
P(A ∩ B) = 0, so P(A ∪ B) = P(A) + P(B)
Types of Probability
Classical (Objective) Probability
based on equally-likely events
not based on personal beliefs
is the same for all observers (objective)
examples: toss a coin, throw a die, pick a card
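The probability of an event A is:
P(A) = n/N
where n is the number of outcomes favorable to A and N is the
total number of equally likely outcomes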
Subjective Probability
based on personal beliefs, experiences, prejudices,
intuition, judgment
different for all observers (subjective)
examples: elections, new product introduction, snowfall
Example: From a group of 5 men and 7 women, it is required
to form a committee of 5 persons. If the selection is made
randomly (12C5 = 792 equally likely committees),
what is the probability that 2 men and 3 women will be in the
committee?
(5C2 × 7C3)/792 = 350/792
what is the probability that all members of the committee will be
men? 5C5/792 = 1/792
what is the probability that at least three members will be women?
(350 + 175 + 21)/792 = 546/792
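The three answers can be reproduced with exact fractions. This is a sketch; the helper `p_women`, which gives the probability of exactly k women on the committee, is a name introduced here for illustration.

```python
from fractions import Fraction
from math import comb

MEN, WOMEN, SIZE = 5, 7, 5
total = comb(MEN + WOMEN, SIZE)  # all equally likely committees of 5 from 12


def p_women(k):
    """Probability that the committee has exactly k women (and SIZE - k men)."""
    return Fraction(comb(WOMEN, k) * comb(MEN, SIZE - k), total)


p_2_men_3_women = p_women(3)
p_all_men = p_women(0)
p_at_least_3_women = sum(p_women(k) for k in range(3, SIZE + 1))

print(total)
print(p_2_men_3_women, p_all_men, p_at_least_3_women)  # fractions in lowest terms
```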
Example: Suppose that an office has 100 calculating machines.
Some of them use electric power (E) while others are manual
(M); and some machines are old brand (O) while others are new
brands (N). The table below gives numbers of machines in each
category.
                Power
Brand       E      M    Total
O          40     30      70
N          20     10      30
Total      60     40     100
A person picks one of the machines at random. Calculate the
following probabilities:
a) The selected machine is a new brand?
b) The selected machine is manual?
c) The selected machine is old and uses electric power?
d) The selected machine operates manually and is a new brand?
e) The selected machine is old or uses electric power?
f) The selected machine uses electric power or is a new brand?
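One way to work through these questions is to store the table as cell counts and sum the cells that satisfy each condition. This is a sketch under the assumption that every machine is equally likely to be picked; the helper `prob` and the variable names are illustrative.

```python
from fractions import Fraction

# Cell counts from the table: (brand, power) -> number of machines.
counts = {("O", "E"): 40, ("O", "M"): 30, ("N", "E"): 20, ("N", "M"): 10}
total = sum(counts.values())


def prob(condition):
    """P(event) = machines satisfying condition(brand, power) / all machines."""
    favorable = sum(n for (brand, power), n in counts.items()
                    if condition(brand, power))
    return Fraction(favorable, total)


p_new = prob(lambda b, p: b == "N")
p_manual = prob(lambda b, p: p == "M")
p_old_and_electric = prob(lambda b, p: b == "O" and p == "E")
p_manual_and_new = prob(lambda b, p: p == "M" and b == "N")
p_old_or_electric = prob(lambda b, p: b == "O" or p == "E")
p_electric_or_new = prob(lambda b, p: p == "E" or b == "N")

print(p_new, p_manual, p_old_and_electric)
print(p_manual_and_new, p_old_or_electric, p_electric_or_new)
```

The "or" results agree with the rule of unions, for example P(O ∪ E) = 70/100 + 60/100 – 40/100 = 90/100.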
Sampling and Sampling
Distributions
Introduction
• Sampling: the technique of selecting a representative
sample from the population
• Population: the totality of elements or units under
study
• Sample: a part of the population
• Sampling frame: A complete list of all the units of the
population
• Statistical inference: on the basis of sample statistics derived from
limited and incomplete sample information,
Predict and forecast values of population parameters...
Test hypotheses about values of population parameters...
Make decisions...
• Make generalizations about the characteristics of a population
on the basis of observations of a sample, a part of a population
Reasons for sampling
• Using a sample saves time and cost
• It prevents destruction of the units under study when measurement is destructive
• It can provide a higher level of accuracy
• It may be the only way of undertaking the study
Types of sampling
Probability (random) and non-probability (non-random)
sampling
• Probability sampling: the selection of the sample is purely
based on chance
• Every unit of the population has a known nonzero probability
of being included in the sample
• Includes: Simple random sampling, Stratified sampling, Cluster
sampling, and Systematic random sampling
• Simple random sampling: every unit of the population is given
an equal chance of being selected
• The sample can be drawn using lottery method or table of
random numbers
• Stratified sampling: in stratified sampling, the
population is partitioned into two or more
subpopulations called strata, and from each stratum a
sample of the desired size is selected at random.
• Cluster sampling: in cluster sampling, the population is divided
into groups called clusters, a random sample of the clusters is
selected, and then samples from these selected clusters are obtained.
• Systematic sampling: in systematic sampling, we start at a
random point in the sampling frame, and from this point
select every kth value in the frame to form
the sample.
• Non probability sampling: the sample is
not based on chance. It is based on
personal judgment
• It includes:
• quota
• judgment or purposive, and
• convenience sample
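The four probability sampling schemes described above can be sketched with Python's `random` module. Everything concrete here is an assumption for illustration: the sampling frame of 100 numbered units, the sample size of 10, the two strata, the five clusters of 20 units, and the fixed seed that makes the run reproducible.

```python
import random

rng = random.Random(42)          # fixed seed, for a reproducible sketch
frame = list(range(1, 101))      # hypothetical sampling frame: units 1..100
sample_size = 10

# Simple random sampling: every unit has an equal chance of selection.
simple_random = rng.sample(frame, sample_size)

# Systematic sampling: random start, then every k-th unit in the frame.
k = len(frame) // sample_size
start = rng.randrange(k)
systematic = frame[start::k]

# Stratified sampling: partition into strata, sample at random within each.
strata = {"units 1-50": frame[:50], "units 51-100": frame[50:]}
stratified = [u for units in strata.values() for u in rng.sample(units, 5)]

# Cluster sampling: randomly choose whole clusters, then sample within them.
clusters = [frame[i:i + 20] for i in range(0, len(frame), 20)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [u for c in chosen_clusters for u in rng.sample(c, 5)]

print(simple_random)
print(systematic)
print(stratified)
print(cluster_sample)
```

Note the difference in design: stratified sampling draws from every stratum, while cluster sampling draws only from the clusters that happened to be selected.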
Hypothesis Testing
about the Mean
Introduction
• A statistical hypothesis is a statement/assumption/claim
about the true value of an unknown population parameter
• Every hypothesis implies its contradiction or alternative
• A hypothesis is either true or false, and on the basis of sample
information you may either reject it or fail to reject it
Example:
• In sub-Saharan Africa 40% of individuals are living below
the poverty line
• The industrial sector of our country is growing by 10%
• There is association between agriculture and industrial
sector development
Procedures for testing hypothesis
1. State the hypothesis
The Null Hypothesis (H0)
• is an assertion about one or more population parameters
• is the assertion we hold to be true until we have sufficient
statistical evidence to conclude otherwise
The Alternative Hypothesis (H1)
• is the assertion of all situations not covered by the null
hypothesis
• H0 and H1 are mutually exclusive
2. State the level of significance, α
• is the probability of wrongly rejecting the null
hypothesis when it is actually true
• It is specified by the statistician or the
researcher before the sample is drawn
• The most commonly used values of α are
0.10, 0.05 or 0.01
3. Calculate the appropriate test statistic
• is a value computed from a sample that is used to
determine whether the null hypothesis has to be
rejected or not
Cases in which the test statistic is Z
σ is known and the population is normal.
σ is known and the sample size is large (n ≥ 30). (The
population need not be normal)
The formula for calculating Z is:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Cases in which the test statistic is t
σ is unknown but the sample standard deviation s is known
and the population is normal
The formula for calculating t is:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
4. Decision rule
• The cut-off point to reject or not reject H0 depends
on:
• the level of significance α
• test statistic
• the form of the alternative hypothesis
• The value of the sample statistic that separates the
regions of acceptance and rejection is called the critical
value
• Based on the form of the alternative hypothesis and
the test statistic we can make the following
decisions:
Rejection regions
5. Interpret the result
Errors in hypothesis testing
• A decision may be correct in two ways:
Fail to reject a true H0
Reject a false H0
• A decision may be incorrect in two ways:
Type I Error: Reject a true H0
• The probability of a Type I error is denoted by α.
• α is called the level of significance
Type II Error: Fail to reject a false H0
• The probability of a Type II error is denoted by β.
Type I and Type II Errors
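A contingency table illustrates the possible outcomes
of a statistical hypothesis test.
                       H0 is true           H0 is false
Reject H0              Type I error (α)     Correct decision
Fail to reject H0      Correct decision     Type II error (β)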
Examples (1)
The average time it takes a computer to perform a
certain task is believed to be 3.24 seconds. It was
decided to test the statistical hypothesis that the average
performance time of the task using the new algorithm is
the same, against the alternative that the average
performance time is no longer the same, at the 0.05
level of significance. A sample of size 200 was taken, and
the sample mean and standard deviation were found to
be 3.48 and 2.8 seconds, respectively.
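One possible way to work through this example follows the five steps above with a two-tailed Z test (H0: μ = 3.24 against H1: μ ≠ 3.24). The sketch assumes that, because n = 200 is large, the sample standard deviation can stand in for σ in the Z formula; that substitution is an assumption made here, not something the slides state. The critical value comes from `statistics.NormalDist` in the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 3.24      # mean under H0 (seconds)
x_bar = 3.48     # sample mean
s = 2.8          # sample standard deviation, used in place of sigma (assumption)
n = 200
alpha = 0.05

# Test statistic: z = (x_bar - mu_0) / (s / sqrt(n)).
z = (x_bar - mu_0) / (s / sqrt(n))

# Two-tailed test: reject H0 if |z| exceeds the critical value z_(alpha/2).
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) > z_critical

print("z =", round(z, 3), " critical value = ±", round(z_critical, 3))
print("p-value =", round(p_value, 3))
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Under these assumptions z is about 1.21, which lies inside ±1.96, so the sample does not give sufficient evidence that the average performance time has changed.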
Examples (2)
A certain kind of packaged food bears the following statement on
the package: “Average net weight 12 oz.” Suppose that a consumer
group has been receiving complaints from users of the product who
believe that they are getting smaller quantities than the
manufacturer states on the package. The consumer group wants,
therefore, to test the hypothesis that the average net weight of the
product in question is 12 oz. A random sample of 144 packages of
the food product is collected, and it is found that the average net
weight in the sample is 11.8 oz. and the sample standard deviation
is 6 oz. Given these findings, is there evidence the manufacturer is
underfilling the packages?
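A sketch of one way to answer this question uses a left-tailed Z test (H0: μ = 12 against H1: μ < 12), since the complaint is about underfilling. Two assumptions are made here that the example does not state: a significance level of α = 0.05, and the use of the sample standard deviation in place of σ because n = 144 is large.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 12.0      # stated average net weight (oz)
x_bar = 11.8     # sample mean
s = 6.0          # sample standard deviation, used in place of sigma (assumption)
n = 144
alpha = 0.05     # assumed level of significance; not given in the example

# Test statistic: z = (x_bar - mu_0) / (s / sqrt(n)).
z = (x_bar - mu_0) / (s / sqrt(n))

# Left-tailed test: reject H0 if z falls below the critical value -z_alpha.
z_critical = NormalDist().inv_cdf(alpha)
p_value = NormalDist().cdf(z)
reject_h0 = z < z_critical

print("z =", round(z, 3), " critical value =", round(z_critical, 3))
print("p-value =", round(p_value, 3))
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

With z about –0.4, well above the critical value of about –1.645, the sample gives no significant evidence of underfilling at the assumed 0.05 level.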