
4 Variability

PPT of Statistical Analysis for MPhil in Psychology

Variability

Instructor: Dr. Irum Naqvi


Central Tendency vs. Variability
• Central tendency describes the central point of the
distribution, and variability describes how the scores are
scattered around that central point. Together, central
tendency and variability are the two primary values that
are used to describe a distribution of scores.
• Variability provides a quantitative measure of the
differences between scores in a distribution and
describes the degree to which the scores are spread
out or clustered together.
• Variability serves both as a descriptive measure and as an
important component of most inferential statistics.
• As a descriptive statistic, variability measures the
degree to which the scores are spread out or clustered
together in a distribution.
• In the context of inferential statistics, variability
provides a measure of how well an individual score
or sample represents the entire population.
• In statistics, our goal is to measure the amount of
variability for a particular set of scores, a distribution.
In simple terms, if the scores in a distribution are all
the same, then there is no variability. If there are
small differences between scores, then the variability
is small, and if there are large differences between
scores, then the variability is large.

• When the population variability is small, all of the
scores are clustered close together and any
individual score or sample will necessarily provide a
good representation of the entire set.
• On the other hand, when variability is large and
scores are widely spread, it is easy for one or two
extreme scores to give a distorted picture of the
general population.
The purpose for measuring variability is to
obtain an objective measure of how the scores
are spread out in a distribution. In general, a
good measure of variability serves two
purposes:
1. Variability describes whether the scores are
clustered close together or are spread out
over a large distance. Usually, variability is
defined in terms of distance. It tells how
much distance to expect between one score
and another, or how much distance to expect
between an individual score and the mean.
2. Variability measures how well an individual
score (or group of scores) represents the
entire distribution. This aspect of variability is
Measuring Variability
Variability can be measured in the following
three ways:
– the range
– the interquartile range
– the standard deviation/variance.

In each case, variability is determined by
measuring distance.
Range
The range is the total distance covered by
the distribution, from the highest score to the
lowest score (using the upper and lower real
limits of the range).
Range = Xmax - Xmin
• When the scores are measurements of a
continuous variable, the range can be defined as
the difference between the upper real limit (URL)
for the largest score (Xmax) and the lower real
limit (LRL) for the smallest score (Xmin).
Range = URL for Xmax - LRL for Xmin
• Two Steps:
Step 1: Sort the numbers in order, from smallest
to largest.
Step 2: Subtract the smallest number from the largest.
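The two definitions of the range can be sketched in a few lines of Python. This is only an illustration: the function name `score_range` and its `unit` parameter are assumptions made here, and the real limits are taken to lie half a measurement unit beyond the extreme scores (for example, 0.5 for whole-number scores).

```python
def score_range(scores, unit=None):
    """Return the range of a list of scores.

    With unit=None the range is simply Xmax - Xmin.
    If `unit` is given (assumed measurement precision, e.g. 1 for whole
    numbers), real limits are used: URL = Xmax + unit/2, LRL = Xmin - unit/2.
    """
    ordered = sorted(scores)              # Step 1: sort from smallest to largest
    smallest, largest = ordered[0], ordered[-1]
    if unit is None:
        return largest - smallest         # Step 2: subtract smallest from largest
    return (largest + unit / 2) - (smallest - unit / 2)


scores = [3, 7, 9, 10, 12]
print(score_range(scores))            # 9
print(score_range(scores, unit=1))    # 10.0
```

For whole-number scores the two definitions differ by exactly one point, which is why a range can be reported as "9 or 10 points".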
Exercise
Example question 1:
What is the range for the following set of
numbers? 10, 99, 87, 45, 67, 43, 45, 33, 21, 7, 65,
98?
Example question 2: What is the range of these
integers?
14, -12, 7, 0, -5, -8, 17, -11, 19
Example question 3: What is the range of the
following times? 2.7 hrs, 8.3 hrs, 3.5 hrs, 5.1 hrs,
4.9 hrs
Example question 4: You take 7 statistics tests
over the course of a semester. You score 94, 88,
73, 84, 91, 87, and 79. What is the range of your
scores?
The problem with using the range as a
measure of variability is that it is completely
determined by the two extreme values and
ignores the other scores in the distribution.
Thus, a distribution with one unusually large
(or small) score will have a large range even if
the other scores are all clustered close
together.
Because the range does not consider all the
scores in the distribution, it often does not
give an accurate description of the variability
for the entire distribution. For this reason, the
range is considered to be a crude and
unreliable measure of variability. Therefore,
in most situations, it does not matter which
definition is used to determine the range.
1. Which of the following sets of scores has the greatest variability?
a. 2, 3, 7, 12
b. 13, 15, 16, 17
c. 24, 25, 26, 27
d. 42, 44, 45, 46

2. What is the range for the following set of scores? 3, 7, 9, 10, 12
a. 3 points
b. 4 or 5 points
c. 9 or 10 points
d. 12 points

3. How many scores in the distribution are used to compute the
range?
a. only 1
b. 2
c. 50% of them
d. all of the scores

Answers: 1 = a, 2 = c, 3 = b
Interquartile Range
Quartiles divide any distribution that’s
ordered from low to high into four
equal parts.
The interquartile range (IQR) contains
the second and third quartiles, or the
middle half of your data set.
Whereas the range gives you the spread of
the whole data set, the interquartile range
gives you the range of the middle half of a
data set.
Calculate the interquartile range
The interquartile range is found by
subtracting the Q1 value from the Q3
value:

IQR = Q3 – Q1
IQR = interquartile range
Q3 = 3rd quartile or 75th percentile
Q1 = 1st quartile or 25th percentile

Q1 is the value below which 25 percent of
the distribution lies, while Q3 is the value
below which 75 percent of the distribution
lies.
You can think of Q1 as the median of the
lower half of the data and Q3 as the median
of the upper half.
Exclusive method vs inclusive method
The exclusive method excludes the median
(Q2) when identifying Q1 and Q3, while
the inclusive method includes the median in
identifying the quartiles.
The procedure for finding the median is
different depending on whether your data set
is odd- or even-numbered.
When you have an odd number of data
points, the median is the value in the middle
of your data set. You can choose between the
inclusive and exclusive method.
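With an even number of data points, there
are two values in the middle, so the median is
the mean of those two values.
While there is little consensus on the best
method for finding the interquartile range,
the exclusive interquartile range is never
smaller than the inclusive interquartile
range. The two are identical for even-numbered
data sets, where the median is not itself a data
point, and the exclusive range is usually larger
for odd-numbered data sets.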
The exclusive interquartile range may be
more appropriate for large samples, while
for small samples, the inclusive
interquartile range may be more
representative because it’s a narrower
range.
Steps for the exclusive method
To see how the exclusive method works by
hand, we’ll use two examples: one with an
even number of data points, and one with an
odd number.
Even-numbered data set
We’ll walk through four steps using a sample data set with 10
values.
Step 1: Order your values from low to high.
48, 52, 57, 64, 72, 76, 77, 81, 85, 88
Step 2: Locate the median, and then
separate the values below it from the
values above it.
Median = (72 + 76) / 2 = 74
Step 3: Find Q1 and Q3.
Lower half: 48, 52, 57 (Q1), 64, 72
Upper half: 76, 77, 81 (Q3), 85, 88
Step 4: Calculate the interquartile range.
IQR = Q3 – Q1 = 81 – 57 = 24

Odd-numbered data set
This time we’ll use a data set with 11 values.
Step 1: 48, 52, 57, 61, 64, 72, 76, 77, 81, 85, 88
Step 2: Median = 72 (left out of both halves)
Step 3: Q1 = 57, Q3 = 81
Step 4: IQR = 81 – 57 = 24
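The four steps of the exclusive method can be sketched in Python as follows. The function name `iqr_exclusive` is a hypothetical helper written for this illustration, and it assumes at least two data points. It uses the "median of each half" procedure described here, which is not the same rule that some software packages apply for quartiles.

```python
from statistics import median


def iqr_exclusive(values):
    """IQR by the exclusive method (median left out of both halves when n is odd)."""
    ordered = sorted(values)                # Step 1: order from low to high
    half = len(ordered) // 2                # Step 2: size of each half
    lower = ordered[:half]                  # values below the median
    upper = ordered[-half:]                 # values above the median
    q1, q3 = median(lower), median(upper)   # Step 3: Q1 and Q3
    return q3 - q1                          # Step 4: IQR = Q3 - Q1


even_set = [48, 52, 57, 64, 72, 76, 77, 81, 85, 88]
odd_set = [48, 52, 57, 61, 64, 72, 76, 77, 81, 85, 88]
print(iqr_exclusive(even_set))   # 24
print(iqr_exclusive(odd_set))    # 24
```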
Steps for the inclusive method
Almost all of the steps for the inclusive and
exclusive method are identical. The
difference is in how the data set is
separated into two halves.
The inclusive method is sometimes
preferred for odd-numbered data sets
because it doesn’t ignore the median, a
real value in this type of data set.
Step 1: Order your values from low to high.
48, 52, 57, 61, 64, 72, 76, 77, 81, 85, 88
Step 2: Separate the list into two
halves, and include the median in both
halves. The median is included as the
highest value in the first half and the lowest
value in the second half.
48, 52, 57, 61, 64, 72 | 72, 76, 77, 81, 85, 88
Step 3: Find Q1 and Q3.
Q1 = (57 + 61) / 2 = 59
Q3 = (77 + 81) / 2 = 79
Step 4: Calculate the interquartile range.
IQR = Q3 – Q1 = 79 – 59 = 20
Rule of Thumb
We can see from these examples that using
the inclusive method gives us a smaller
IQR. With the same data set, the exclusive
IQR is 24, and the inclusive IQR is 20.
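The comparison between the two methods can be reproduced with a short Python sketch. The function `iqr` and its `inclusive` flag are assumptions made for this illustration: when the flag is set and the data set is odd-numbered, the median is counted in both halves.

```python
from statistics import median


def iqr(values, inclusive=False):
    """IQR as the difference between the medians of the upper and lower halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if inclusive and n % 2 == 1:
        half += 1                      # odd n: the median joins both halves
    lower, upper = ordered[:half], ordered[-half:]
    return median(upper) - median(lower)


data = [48, 52, 57, 61, 64, 72, 76, 77, 81, 85, 88]
print(iqr(data))                   # 24   (exclusive)
print(iqr(data, inclusive=True))   # 20.0 (inclusive)
```

With an even number of values there is no single middle value to include or exclude, so the flag makes no difference in that case.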
EXERCISE
For the following scores, find the interquartile
range. Scores:
3, 4, 4, 1, 7, 3, 2, 6, 4, 2, 1, 6, 3, 4, 5, 2, 5, 4,
3, 4

Answer: IQR = 2 (Q1 = 2.5, Q3 = 4.5)
When is the interquartile range useful?

The interquartile range is an especially
useful measure of variability for skewed
distributions.
For these distributions, the median is the
best measure of central tendency because
it’s the value exactly in the middle when all
values are ordered from low to high.
Along with the median, the IQR can give
you an overview of where most of your
values lie and how clustered they are.
The IQR is also useful for datasets with
outliers. Because it’s based on the middle
half of the distribution, it’s less influenced
by extreme values.
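A small Python sketch can make the effect of an outlier concrete. The helper names and the extreme value of 300 are hypothetical choices for this illustration, and the IQR is computed with the exclusive method.

```python
from statistics import median


def value_range(values):
    """Range as highest score minus lowest score."""
    return max(values) - min(values)


def iqr_exclusive(values):
    """IQR as the median of the upper half minus the median of the lower half."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])


scores = [48, 52, 57, 64, 72, 76, 77, 81, 85, 88]
with_outlier = scores[:-1] + [300]   # hypothetical: 88 replaced by an extreme score

print(value_range(scores), iqr_exclusive(scores))               # 40 24
print(value_range(with_outlier), iqr_exclusive(with_outlier))   # 252 24
```

The single extreme score multiplies the range several times over, while the IQR does not move.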
Standard Deviation and Variance
The standard deviation is the most commonly
used and the most important measure of
variability.

Standard deviation uses the mean of the
distribution as a reference point and measures
variability by considering the distance
between each score and the mean.

In simple terms, the standard deviation
provides a measure of the standard, or
average, distance from the mean.
How to Calculate Standard Deviation
Step 1: Deviation from Mean
The first step in finding the standard distance
from the mean is to determine the deviation, or
distance from the mean, for each individual
score. By definition, the deviation for each
score is the difference between the score and
the mean.
Deviation = X - µ

For a distribution of scores with µ = 50, if your
score is X = 53, then your deviation score is
X - µ = 53 - 50 = 3.
Notice that there are two parts to a
deviation score: the sign ( + or -) and the
number.
The sign (+ or -) tells the direction from the
mean—that is, whether the score is located
above (+) or below (-) the mean, and the
number gives the actual distance from the
mean.
For example, a deviation score of -5
corresponds to a score that is below the
mean by a distance of 5 points.
Step 2: Deviation Score
Because our goal is to compute a measure of
the standard distance from the mean, you
might be tempted to calculate the mean of
the deviation scores. To compute this mean,
you first add up the deviation scores and
then divide by N.
We start with the following set of N = 4
scores, which have a mean of µ = 3.

X      X - µ
8      +5
1      -2
3       0
0      -3
        0 = ∑(X - µ)
Note that the deviation scores add up to
zero. This should not be surprising if you
remember that the mean serves as a
balance point for the distribution.

The total of the distances above the mean is
exactly equal to the total of the distances
below the mean. Thus, the total for the
positive deviations is exactly equal to the
total for the negative deviations, and the
complete set of deviations always adds up
to zero.
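The balance-point property can be checked directly. The following Python lines are a minimal sketch using the N = 4 scores from the table (8, 1, 3, 0); the variable names are chosen here for illustration.

```python
scores = [8, 1, 3, 0]
mu = sum(scores) / len(scores)            # population mean: 12 / 4 = 3.0
deviations = [x - mu for x in scores]     # X - mu for each score

print(deviations)        # [5.0, -2.0, 0.0, -3.0]
print(sum(deviations))   # 0.0
```

With means that are not exact in binary floating point, the sum may come out as a tiny nonzero number rather than exactly 0.0, which is a rounding effect and not a failure of the property.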
Step 3: Mean Squared Deviation
The average of the deviation scores will not
work as a measure of variability because it is
always zero. Clearly, this problem results from
the positive and negative values canceling
each other out.
The solution is to get rid of the signs (+ and -).
The standard procedure for accomplishing this
is to square each deviation score. Using the
squared values, you then compute the
average of the squared deviations, or the
mean squared deviation, which is called
variance.
Variance equals the mean of the squared
deviations.

Score X    Deviation X - µ    Squared deviation (X - µ)²
8          +5                 25
1          -2                  4
3           0                  0
0          -3                  9
           ∑(X - µ) = 0       ∑(X - µ)² = 38

Mean of squared deviations = 38/4 = 9.5
Step 4: Standard Deviation
Standard deviation is the square root of
the variance and provides a measure of the
standard, or average distance from the
mean.

Standard deviation = √variance = √9.5 = 3.08

Technically, the standard deviation is the
square root of the average squared
deviation. Conceptually, however, the
standard deviation provides a measure of
the average distance from the mean.
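Steps 1 through 4 can be put together in a short Python sketch. The function name `population_variance` is an assumption made for this illustration; it divides by N, as in the steps above.

```python
import math


def population_variance(scores):
    """Mean squared deviation: sum of squared deviations divided by N."""
    mu = sum(scores) / len(scores)                        # mean
    squared_devs = [(x - mu) ** 2 for x in scores]        # Steps 1 and 3: deviate, then square
    return sum(squared_devs) / len(scores)                # Step 3: mean of the squares


scores = [8, 1, 3, 0]
variance = population_variance(scores)
sd = math.sqrt(variance)                                  # Step 4: square root of variance

print(variance)        # 9.5
print(round(sd, 2))    # 3.08
```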
EXERCISE
Find the variance and standard deviation of
the following scores on an exam: 92, 95,
85, 80, 75, 50
Solution
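STEP 1: Find the mean of the data = 79.5
STEP 2: Find the deviation of each score from
the mean
STEP 3: Square each deviation and add them up:
sum of squared deviations = 1317.5. Dividing by
N = 6 gives the mean squared deviation
(variance) = 219.58
STEP 4: Square root of variance = 14.82
Note: Dividing the sum of squared deviations by
n - 1 = 5 instead, as is done when the scores are
treated as a sample, gives a variance of 263.5 and
a standard deviation of 16.2.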
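The exam-score exercise can be checked with Python's standard `statistics` module. In that module `pvariance` and `pstdev` divide by N, while `variance` and `stdev` divide by n - 1. The values 263.5 and 16.2 correspond to the n - 1 version.

```python
import statistics

exam = [92, 95, 85, 80, 75, 50]

print(statistics.mean(exam))                   # 79.5
print(round(statistics.pvariance(exam), 2))    # 219.58  (SS / N)
print(round(statistics.pstdev(exam), 2))       # 14.82
print(statistics.variance(exam))               # 263.5   (SS / (n - 1))
print(round(statistics.stdev(exam), 2))        # 16.23
```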
Measuring Variance and Standard Deviation for a Population
The Sum of Squared Deviations (SS)

Variance = Mean squared deviation
= Sum of squared deviations / Number of scores

The value in the numerator of this
equation, the sum of the squared
deviations, is a basic component of
variability, and we will focus on it. To
simplify things, it is identified by the
notation SS (for sum of squared
deviations), and it generally is referred to
as the sum of squares.
Two formulas to compute SS
The first formula is called the definitional formula.

Definitional formula: SS = ∑(X - µ)²

To find the sum of the squared deviations, the
formula instructs you to perform the following
sequence of calculations:
1. Find each deviation score (X - µ)
2. Square each deviation score (X - µ)²
3. Add the squared deviations.
The result is SS, the sum of the squared
deviations. The following example demonstrates
using this formula.
The second formula is called the computational formula.
When the mean is not a whole number, the
deviations all
contain decimals or fractions, and the
calculations become difficult.
In addition, calculations with decimal values
introduce the opportunity for rounding error,
which can make the result less accurate.
For these reasons, an alternative formula has
been developed for computing SS. The
alternative, known as the computational formula,
performs calculations with the scores (not the
deviations) and therefore minimizes the
complications of decimals and fractions.

Computational formula: SS = ∑X² - (∑X)²/N
Note that the two formulas produce exactly
the same value for SS.
Although the formulas look different, they are
in fact equivalent.
The definitional formula provides the most
direct representation of the concept of SS;
however, this formula can be awkward to use,
especially if the mean includes a fraction or
decimal value. If you have a small group of
scores and the mean is a whole number, then
the definitional formula is fine; otherwise the
computational formula is usually easier to use.
Final Formulas and Notation
With the definition and calculation of SS
behind you, the equations for variance and
standard deviation become relatively
simple. Remember that variance is defined
as the mean squared deviation.
This mean is the sum of the squared
deviations divided by N, so the equation for
the population variance is
Variance = SS/N
Standard deviation is the square root of
variance, so the equation for the population
standard deviation is
Standard deviation = √(SS/N)
• Population variance = σ² = SS/N
• Population standard deviation = σ = √σ² = √(SS/N)
Measuring Variance and Standard Deviation for a Sample
The goal of inferential statistics is to use
the limited information from samples to
draw general conclusions about
populations.
The basic assumption of this process is that
samples should be representative of the
populations from which they come.
This assumption poses a special problem
for variability because samples consistently
tend to be less variable than their
populations.
This general tendency is shown in Figure 4.6:
a few extreme scores in the
population tend to make the population
variability relatively large.
However, these extreme values are unlikely
to be obtained when you are selecting a
sample, which means that the sample
variability is relatively small. The fact that a
sample tends to be less variable than its
population means that sample variability
gives a biased estimate of population
variability.
This bias is in the direction of
underestimating the population value rather
than being right on the mark, so it requires an
adjustment.
The calculations of variance and standard
deviation for a sample follow the same
steps that were used to find population
variance and standard deviation, except for
changes in notation.

The changes in notation involve using M for
the sample mean instead of µ, and using n
(instead of N) for the number of scores.
Thus, the definitional formula for SS for a
sample is
Definitional formula: SS = ∑(X - M)²
Find the sum of the squared deviations
using the following three steps:

1. Find the deviation from the mean for each
score: deviation = X - M
2. Square each deviation: squared deviation
= (X - M)²
3. Add the squared deviations: ∑(X - M)²


The value of SS also can be obtained using
a computational formula. Except for one
minor difference in notation (using n in
place of N), the computational formula for
SS is the same for a sample as it was for a
population

Computational formula: SS = ∑X² - (∑X)²/n


To correct for the bias in sample variability, the
adjustment in variance and standard
deviation is as follows:
Sample variance = S² = SS/(n - 1)
Sample standard deviation = S = √S² = √(SS/(n - 1))
This is the adjustment that is necessary to
correct for the bias in sample variability.
The effect of the adjustment is to increase
the value you will obtain.
Dividing by a smaller number (n - 1 instead
of n) produces a larger result and makes
sample variance an accurate and unbiased
estimator of population variance.
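The effect of dividing by n - 1 can be seen by listing every possible sample from a very small population. The sketch below rests on assumptions made for this illustration: a hypothetical population of three scores (0, 3, 9), samples of size n = 2, and sampling with replacement so that all nine ordered samples are equally likely.

```python
from itertools import product


def ss(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


population = [0, 3, 9]                       # hypothetical population
sigma2 = ss(population) / len(population)    # population variance: 42 / 3

n = 2
samples = list(product(population, repeat=n))   # all 9 samples, with replacement

# Average each estimator over every possible sample.
biased = sum(ss(s) / n for s in samples) / len(samples)
unbiased = sum(ss(s) / (n - 1) for s in samples) / len(samples)

print(sigma2)     # 14.0
print(biased)     # 7.0
print(unbiased)   # 14.0
```

Averaged over all samples, dividing by n underestimates the population variance (7 instead of 14), while dividing by n - 1 recovers it exactly. Individual samples can still fall above or below the population value.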
Exercise 1: Calculate the sample variance
and standard deviation for the following scores:
4, 6, 5, 11, 7, 9, 7, 3.

Exercise 2: Find the standard deviation of
the average temperatures recorded over a
five-day period last winter: 18, 22, 19, 25,
12

Exercise 3: Find the variance and standard
deviation of the scores on the most recent
reading test: 7.7, 7.4, 7.3, and 7.9

Answers:
Exercise 1: S² = 6.86 and S = 2.62
Exercise 2: S² = 23.7 and S = 4.9
Exercise 3: S² = 0.076 and S = 0.275
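The three exercise answers can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor. The dictionary layout below is just a convenience chosen for this sketch.

```python
import statistics

exercises = {
    "Exercise 1": [4, 6, 5, 11, 7, 9, 7, 3],
    "Exercise 2": [18, 22, 19, 25, 12],
    "Exercise 3": [7.7, 7.4, 7.3, 7.9],
}

results = {}
for name, scores in exercises.items():
    s2 = statistics.variance(scores)   # sample variance: SS / (n - 1)
    s = statistics.stdev(scores)       # sample standard deviation
    results[name] = (round(s2, 3), round(s, 3))
    print(name, results[name])
```

Rounded to the precision used in the answers, the output agrees with the values listed for each exercise.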
