Descriptive Part 3

Measures of variation describe how spread out or dispersed a data set is. Common measures include range, interquartile range, variance, and standard deviation. The range is the difference between maximum and minimum values. The interquartile range describes the middle 50% of data and is less influenced by outliers. Variance and standard deviation measure how far data points deviate from the mean, with standard deviation being the square root of variance. Standard deviation is a widely used measure of dispersion.

Uploaded by

Noor Fazana
Copyright
© All Rights Reserved

Descriptive Statistics

Part 3: Measures of Variation

Outline

• 2.1 Frequency Distributions and Their Graphs

• 2.2 Measures of Central Tendency

• 2.3 Measures of Variation/Variability

Section 2.3

Measures of Variation/Variability

Measures of Variation (“Spread”)

Another important characteristic of quantitative data is how much the data varies, or is spread out.

The most common measures of spread:

1. Range
2. Inter-quartile range
3. Standard deviation and variance
4. Skewness (a measure of asymmetry rather than spread; it will be discussed in detail under Normal Distribution)
Range

Range
• The difference between the maximum and minimum data entries in the set.
• The data must be quantitative.
• Range = (Max. data entry) – (Min. data entry)

Example: Finding the Range
The wait time to see a bank teller is studied at 2 banks.

Bank A has multiple lines, one for each teller.
Bank B has a single wait line for the 1st available teller.

5 wait times (in minutes) are sampled from each bank:

Bank A: 5.2 6.2 7.5 8.4 9.2
Bank B: 6.6 6.8 7.5 7.7 7.9

Find the mean, median, and range for each bank.


Solution: Finding the Range

• Bank A: Range = ?
• Bank B: Range = ?

• Note: The range is easy to compute, but only uses 2 values. Do the following 2 sets vary the same?

  Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
  Set B: 1, 10, 10, 10, 10, 10, 10, 10, 10, 10
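
As a rough illustration of the note above, the following Python sketch computes the mean, median, and range for both banks, and then the range of Set A and Set B. The helper name `data_range` is an assumption introduced here for illustration; it does not come from the slides. The sample standard deviation from the standard `statistics` module is printed only to show that two sets with the same range can still differ in spread.

```python
import statistics

def data_range(values):
    """Range = (max data entry) - (min data entry)."""
    return max(values) - min(values)

bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]
bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]

for name, times in (("Bank A", bank_a), ("Bank B", bank_b)):
    print(f"{name}: mean = {statistics.mean(times):.2f}, "
          f"median = {statistics.median(times):.2f}, "
          f"range = {data_range(times):.1f}")

set_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
set_b = [1, 10, 10, 10, 10, 10, 10, 10, 10, 10]

# Both sets share the same minimum and maximum, so the range cannot tell them apart.
for name, values in (("Set A", set_a), ("Set B", set_b)):
    print(f"{name}: range = {data_range(values)}, "
          f"sample SD = {statistics.stdev(values):.2f}")
```

Both banks have the same mean and median, so only a measure of spread separates them.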

Inter-quartile range

• The inter-quartile range is a measure of spread or dispersion. It is the difference between the 75th percentile (often called Q3) and the 25th percentile (Q1). The formula for inter-quartile range is therefore: Q3 – Q1. It is sometimes called the H-spread. Although not used extensively, the inter-quartile range is a stable measure of spread and perhaps should be in more common usage.
Quartiles
• Split ordered data into 4 quarters

  25%  |  25%  |  25%  |  25%
       Q1      Q2      Q3

• Position of the i-th quartile: position of $Q_i = \dfrac{i(n + 1)}{4}$

Data in ordered array: 11 12 13 16 16 17 18 21 22

Position of $Q_1 = \dfrac{1(9 + 1)}{4} = 2.5$, so $Q_1 = \dfrac{12 + 13}{2} = 12.5$

• Q1 and Q3 are measures of noncentral location
• Q2 = median, a measure of central tendency
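
The position rule above can be sketched in Python as follows. The function name `quartile` is hypothetical, and resolving a fractional position by linear interpolation between the two neighbouring entries is an assumption (for a position ending in .5 it reduces to the average used on the slide). The sketch also assumes the data are already sorted and contain at least 3 values.

```python
def quartile(sorted_data, i):
    """Value of the i-th quartile (i = 1, 2, 3) at 1-based position i(n + 1)/4."""
    n = len(sorted_data)
    position = i * (n + 1) / 4
    lower = int(position)            # 1-based index of the entry at or below the position
    fraction = position - lower      # how far the position lies past that entry
    if fraction == 0:
        return sorted_data[lower - 1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + fraction * (above - below)

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3)
```

For this array the positions are 2.5, 5 and 7.5, so Q1 is halfway between the 2nd and 3rd entries, Q2 is the 5th entry, and Q3 is halfway between the 7th and 8th entries.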
Example

 i    x[i]
 1    102
 2    104
 3    105   ---- the first quartile, Q1 = 105
 4    106
 5    108
 6    109   ---- the second quartile, Q2 or median = 109
 7    110
 8    112
 9    115   ---- the third quartile, Q3 = 115
10    115
11    118

From this table, the interquartile range is 115 – 105 = 10.


Inter-quartile Range

• The inter-quartile range is the range for the middle 50% of observations. That is, the distance from the third quartile (75th percentile) to the first quartile (25th percentile) on a frequency distribution.
• Because the inter-quartile range is the distance between the 25th and 75th percentiles, it is not sensitive to changes in the extreme scores at either end of the distribution.
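
To make the point about extreme scores concrete, here is a minimal Python sketch using the 11 values from the earlier table. The helper name `iqr` and the extreme value 500 are assumptions chosen for illustration. `statistics.quantiles` with `method="exclusive"` places the quartiles at position i(n + 1)/4, which matches the rule used in these slides.

```python
import statistics

def iqr(values):
    """IQR = Q3 - Q1, with quartiles at position i(n + 1)/4."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1

scores = [102, 104, 105, 106, 108, 109, 110, 112, 115, 115, 118]
with_outlier = scores[:-1] + [500]   # swap the largest score for a hypothetical extreme value

for label, values in (("original", scores), ("with outlier", with_outlier)):
    print(f"{label}: range = {max(values) - min(values)}, IQR = {iqr(values)}")
```

The range jumps when the largest score changes, while the IQR stays the same because Q1 and Q3 are unaffected.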
Example

4 7 6 31 10 29 4 6 9 11 7 23
5 8 10 7 11 6 5 8 10 9 12 9 8

Find the 1) Range
         2) Inter-quartile range

Solution:
Arrange in order (n = 25):

4 4 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 12 23 29 31

1) Range = High Score – Low Score = 31 – 4 = 27

2) Find the median (n is odd): median = 13th score = 8, so Q2 = 8

Q1 = 6 (position 1(25 + 1)/4 = 6.5, the average of the 6th and 7th scores)
Q3 = 10.5 (position 3(25 + 1)/4 = 19.5, the average of the 19th and 20th scores)

IQR = Q3 – Q1 = 10.5 – 6 = 4.5
The inter-quartile range (IQR) in particular is used to describe the dispersion of the data.

The inter-quartile range (IQR) is defined as the range between the first and the third quartile. Please note that the IQR contains about 50% of the data within the distribution.
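
The worked example with 25 scores can be checked with a short Python sketch. It assumes the "exclusive" quartile method of the standard `statistics` module, which uses position i(n + 1)/4; other quartile conventions can give slightly different values for the same data.

```python
import statistics

scores = [4, 7, 6, 31, 10, 29, 4, 6, 9, 11, 7, 23,
          5, 8, 10, 7, 11, 6, 5, 8, 10, 9, 12, 9, 8]

score_range = max(scores) - min(scores)

# quantiles() sorts the data itself and returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")
iqr = q3 - q1

print("Range =", score_range)
print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3)
print("IQR =", iqr)
```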
Median, Quartiles, Deciles & Percentiles
• The median is a value that subdivides the ordered data into two halves.
• The quartiles subdivide the data into quarters, the deciles provide a subdivision into tenths, and the percentiles a subdivision into hundredths.
• There are three quartiles: the lower quartile, Q1, the median (Q2), and the upper quartile, Q3.
• The percentiles are simply called the 1st percentile, the 2nd percentile and so on.
• The median is the 5th decile and the 50th percentile.
• A study of the values of the deciles or quartiles gives us an idea of the spread of the data, but an 'idea' is all we get and there is no need for great precision.
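
As a small sketch of how these subdivisions relate, the standard `statistics.quantiles` function can produce quartiles, deciles and percentiles by changing `n`. The data below are the 25 scores from the earlier example; using the function's default cut-point method is an assumption, since the slides do not specify one for deciles or percentiles.

```python
import statistics

scores = [4, 7, 6, 31, 10, 29, 4, 6, 9, 11, 7, 23,
          5, 8, 10, 7, 11, 6, 5, 8, 10, 9, 12, 9, 8]

median = statistics.median(scores)
quartiles = statistics.quantiles(scores, n=4)      # 3 cut points -> 4 parts
deciles = statistics.quantiles(scores, n=10)       # 9 cut points -> 10 parts
percentiles = statistics.quantiles(scores, n=100)  # 99 cut points -> 100 parts

print(len(quartiles), len(deciles), len(percentiles))

# The median, Q2, the 5th decile and the 50th percentile all mark the same point.
print(median, quartiles[1], deciles[4], percentiles[49])
```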
IN SUMMARY:

• MEDIAN
  (Data is divided into 2 parts)
• QUARTILES
  (Data is divided into 4 parts)
• DECILES
  (Data is divided into 10 parts)
• PERCENTILES
  (Data is divided into 100 parts)
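Standard Deviation and Variance
These measure the typical amount the data deviates from the mean.

Sample Variance, $s^2$ (sample size < 30):

• $s^2 = \dfrac{\sum (x - \bar{x})^2}{n}$ OR $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

Sample Standard Deviation, $s$:

• $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$ OR $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

The divisor $n - 1$ is the one used for sample data in the calculations that follow.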
Finding Sample Variance & Standard Deviation
1. Find the mean of the sample data set: $\bar{x} = \dfrac{\sum x}{n}$
2. Find the deviation of each entry: $x - \bar{x}$
3. Square each deviation: $(x - \bar{x})^2$
4. Add to get the sum of the deviations squared: $\sum (x - \bar{x})^2$
5. Divide by n – 1 to get the sample variance (if sample size less than 30): $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root to get the sample standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
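
The six steps above can be sketched in Python as follows. The function name `sample_variance_and_sd` is an assumption made for illustration, and the Bank A wait times from the earlier range example are used as input.

```python
import math

def sample_variance_and_sd(data):
    """Follow the six steps for the sample variance and standard deviation."""
    n = len(data)
    mean = sum(data) / n                        # step 1: mean of the sample
    deviations = [x - mean for x in data]       # step 2: deviation of each entry
    squares = [d ** 2 for d in deviations]      # step 3: square each deviation
    sum_of_squares = sum(squares)               # step 4: sum of squared deviations
    variance = sum_of_squares / (n - 1)         # step 5: divide by n - 1
    sd = math.sqrt(variance)                    # step 6: square root
    return mean, variance, sd

bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]
mean, variance, sd = sample_variance_and_sd(bank_a)
print(f"mean = {mean:.2f} min, s^2 = {variance:.2f} min^2, s = {sd:.2f} min")
```

Rounding happens only in the final `print`, in line with the advice not to round until the end.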
Find the Standard Deviation and Variance for Bank A (multi-line)

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{36.5}{5} = 7.3$ min

Wait time, x (in min)    Deviation: $x - \bar{x}$    Squares: $(x - \bar{x})^2$
5.2                      5.2 – 7.3 = –2.1            (–2.1)² = 4.41
6.2                      6.2 – 7.3 =                 (   )² =
7.5                      7.5 – 7.3 =                 (   )² =
8.4                      8.4 – 7.3 =                 (   )² =
9.2                      9.2 – 7.3 =                 (   )² =
$\sum x = 36.5$                                      $\sum (x - \bar{x})^2 =$

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} =$

$s = \sqrt{s^2} =$

• Round to one more decimal than the data.
• Don’t round until the end.
• Include the appropriate units.
Find the Standard Deviation and Variance for Bank B (1 wait line)

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{36.5}{5} = 7.3$ min

Wait time, x (in min)    Deviation: $x - \bar{x}$    Squares: $(x - \bar{x})^2$
6.6
6.8
7.5
7.7
7.9
$\sum x = 36.5$                                      $\sum (x - \bar{x})^2 =$

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} =$

$s = \sqrt{s^2} =$

• Round to one more decimal than the data.
• Don’t round until the end.
• Include the appropriate units.
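
One way to check the completed tables for both banks is with the standard `statistics` module, whose `variance` and `stdev` functions divide by n – 1. This is only a sketch for comparing the two banks; the dictionary name `summary` is an assumption introduced here.

```python
import statistics

wait_times = {
    "Bank A": [5.2, 6.2, 7.5, 8.4, 9.2],   # multiple lines
    "Bank B": [6.6, 6.8, 7.5, 7.7, 7.9],   # single wait line
}

summary = {}
for name, times in wait_times.items():
    mean = statistics.mean(times)
    variance = statistics.variance(times)   # sample variance, divides by n - 1
    sd = statistics.stdev(times)            # sample standard deviation
    summary[name] = (mean, variance, sd)
    print(f"{name}: mean = {mean:.2f} min, "
          f"s^2 = {variance:.3f} min^2, s = {sd:.2f} min")
```

Both banks have the same mean wait time, but the single-line bank shows a much smaller standard deviation, so its wait times are more consistent.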


Sample versus Population
Standard Deviation and Variance

                      Sample         Population
                      Statistics:    Parameters:
Mean                  $\bar{x}$      $\mu$
Standard Deviation    $s$            $\sigma$
Variance              $s^2$          $\sigma^2$
Sample versus Population
Standard Deviation
Note: Unlike $\bar{x}$ and $\mu$, the formulas for $s$ and $\sigma$ are not mathematically the same:

Sample Standard Deviation

• $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Population Standard Deviation

• $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
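
The difference between the two formulas can be seen numerically with a short Python sketch. Treating the same five Bank A wait times first as a sample and then as a complete population is an assumption made purely for illustration; `statistics.stdev` divides by n – 1 and `statistics.pstdev` divides by N.

```python
import statistics

times = [5.2, 6.2, 7.5, 8.4, 9.2]
n = len(times)

s = statistics.stdev(times)        # sample standard deviation (divisor n - 1)
sigma = statistics.pstdev(times)   # population standard deviation (divisor N)

print(f"s = {s:.3f} min, sigma = {sigma:.3f} min")

# Both are built from the same sum of squared deviations:
# s^2 * (n - 1) and sigma^2 * N should agree up to rounding error.
print(abs(s ** 2 * (n - 1) - sigma ** 2 * n) < 1e-9)
```

For the same data the sample value is always at least as large as the population value, and the gap shrinks as the number of values grows.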
Standard Deviation: Key Points

• $s \ge 0$ (When would s = 0?)
• The standard deviation is a measure of variation of all values from the mean. The larger s is, the more the data varies.
• The units of the standard deviation s are the same as the units of the original data values. (The variance has units².)
• The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).
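
Two of the key points can be illustrated with a brief Python sketch: s equals 0 when every value is identical, and a single outlier can inflate s considerably. The list of identical values and the 30-minute wait are hypothetical values chosen for illustration.

```python
import statistics

# Every entry equals the mean, so every deviation is 0 and s = 0.
identical = [7.5, 7.5, 7.5, 7.5]
s_identical = statistics.stdev(identical)
print("s for identical values:", s_identical)

bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]
with_outlier = bank_b + [30.0]     # one hypothetical extreme wait time

s_before = statistics.stdev(bank_b)
s_after = statistics.stdev(with_outlier)
print(f"s without outlier = {s_before:.2f} min, with outlier = {s_after:.2f} min")
```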
Interpreting Standard Deviation

• Standard deviation is a measure of the typical amount an entry deviates from the mean.
• The more the entries are spread out, the greater the standard deviation.

The Empirical Rule

Empirical (68-95-99.7) Rule

For data sets having a symmetric, bell-shaped (approximately normal) distribution:
• About 68% of all values fall within 1 standard deviation of the mean
• About 95% of all values fall within 2 standard deviations of the mean
• About 99.7% of all values fall within 3 standard deviations of the mean
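
The three percentages can be reproduced from the normal distribution with the standard `statistics.NormalDist` class. This sketch assumes an exactly normal distribution; for real data that are only roughly bell-shaped, the rule is an approximation. The dictionary name `shares` is an assumption introduced here.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Probability of falling within k standard deviations of the mean.
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD of the mean: {shares[k]:.1%}")
```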
Example: Using the Empirical Rule

A sample of IQs has a symmetric distribution with a mean of 100 and a standard deviation of 15.

1. Sketch the distribution.
2. 68% of people have an IQ between what 2 values?
3. What percent of people have an IQ between 70 and 130?
4. What percent of people have an IQ between 100 and 115?
5. What percent of people have an IQ above 145?
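
Questions 2 to 5 can be worked through with simple arithmetic on the 68-95-99.7 percentages. The sketch below is one possible way to do so; the variable names and the `EMPIRICAL` lookup table are assumptions made for illustration, and the answers are approximate because the rule itself is approximate.

```python
mean, sd = 100, 15

# Percent of values within k standard deviations of the mean (empirical rule).
EMPIRICAL = {1: 68.0, 2: 95.0, 3: 99.7}

# Question 2: 68% of values lie within 1 SD of the mean.
low, high = mean - sd, mean + sd

# Question 3: 70 and 130 are each 2 SD from the mean.
sd_units = (130 - mean) / sd
between_70_and_130 = EMPIRICAL[2]

# Question 4: by symmetry, half of the 1-SD band lies above the mean.
between_100_and_115 = EMPIRICAL[1] / 2

# Question 5: 145 is 3 SD above the mean; half of what lies outside 3 SD is above it.
above_145 = (100 - EMPIRICAL[3]) / 2

print(f"2. between {low} and {high}")
print(f"3. about {between_70_and_130}%")
print(f"4. about {between_100_and_115}%")
print(f"5. about {above_145:.2f}%")
```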

Summary

• The inter-quartile range is used in conjunction with the median to describe skewed distributions. It is calculated as the distance between the scores at the 25th and 75th percentiles (Q3 – Q1).
• The variance is used in conjunction with the mean to describe symmetrical or normal distributions of interval or ratio scores. It is the average of the squared deviations of scores around the mean (for a sample, the sum of squared deviations is divided by n – 1).
• The standard deviation is also used in conjunction with the mean to describe symmetrical or normal distributions of interval/ratio scores. It is the square root of the variance. It can be thought of as the “average” amount that scores deviate from the mean.
• Measures of variability describe how much the scores differ from each other, or how much the distribution is spread out.
• The range is a measure of variability based on the difference between the highest score and the lowest score.
