Statistics

The document outlines the syllabus for MATH-361 Probability and Statistics, including the course textbook and grading criteria. It introduces key concepts in statistics, such as descriptive and inferential statistics, populations and samples, and types of variables. Additionally, it covers data representation methods including frequency distributions, pie charts, bar charts, and histograms.


MATH-361 Probability and Statistics

Dr. Umer Saeed

() Introduction 1 / 178
Probability and Statistics
Spring Semester 2025

TEXT BOOK: Introduction to Probability and Statistics for
Engineers and Scientists by Sheldon M. Ross.
Reference Material:
Calculus with Analytic Geometry by Thomas
and Finney
Probability and Statistics by Murray R. Spiegel,
John J. Schiller, A. V. Srinivasan, Mike LeVan.
GRADING :
Assignments 10%
Quizzes 10%
OHTs 30%
Final Exam 50%

Statistics is the science of collecting, organizing, analyzing and
interpreting data to assist in making decisions.

Types of Statistics

The study of statistics is usually divided into two categories:


Descriptive Statistics
Inferential Statistics

Population and Sample

A population is the collection of all objects or measurements
that are of interest to the experiment.

Example

Suppose we wish to study the heights of all students at a university.
The population is the collection of measured heights of all
students in the university.

Sample

A sample is a subset of data selected from the population.

We introduce a few more definitions and new terms in statistical
language.

Variable

A variable is a characteristic that changes or varies over time
and/or for different individuals or objects under consideration.
Variables can be classified into one of two categories:
Qualitative Variable
Quantitative Variable

There are two types of quantitative variables:
Discrete Variable
Continuous Variable
Data: Values of a variable for one or more people or things yield
data.
Qualitative data : Values of a qualitative variable
Quantitative data: Values of a quantitative variable
Discrete data: Values of a discrete variable
Continuous data: Values of a continuous variable
Each individual piece of data is called an observation, and the
collection of all observations for a particular variable is called a
data set.

Data Representation

Organizing Qualitative Data

Frequency Distribution
There are several methods of organization and presentation of
observed data which facilitate its interpretation and evaluation.
Example: The political party affiliations reported by the 40 students
in a class are recorded as follows:

P N O N N N N N
P O N P O O N P
P N O P N N O N
P O P P P N O P
O N P N N N N P

A class frequency table is given as follows:
Class x   Frequency f   Cumulative frequency   Relative frequency frel(x)
P         13            13                     13/40 = 0.325
N         18            31                     18/40 = 0.450
O         9             40                     9/40 = 0.225
Total     40                                   1
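
The frequency table can be reproduced with a short script. This is a minimal sketch: the function name `frequency_table` and the row layout are illustrative choices, not part of the course material.

```python
from collections import Counter

# Responses copied from the example (40 students).
responses = (
    "P N O N N N N N "
    "P O N P O O N P "
    "P N O P N N O N "
    "P O P P P N O P "
    "O N P N N N N P"
).split()


def frequency_table(values, order):
    """Return rows of (class, frequency, cumulative frequency, relative frequency)."""
    counts = Counter(values)
    n = len(values)
    rows, running = [], 0
    for label in order:
        running += counts[label]
        rows.append((label, counts[label], running, counts[label] / n))
    return rows


table = frequency_table(responses, order=["P", "N", "O"])
for label, f, cum, rel in table:
    print(f"{label}  {f:2d}  {cum:2d}  {rel:.3f}")
```

The relative frequencies sum to 1, which is a quick check that no response was missed.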

Pie Chart
A pie chart is a disk divided into wedge-shaped pieces
proportional to the relative frequencies of the qualitative data.
The main steps for the construction of a pie chart are as follows:
1. Obtain a relative-frequency distribution of the data.
2. Divide a disk into wedge-shaped pieces proportional to the
   relative frequencies.
3. Label the slices with the distinct values and their relative
   frequencies.

Bar Chart
A bar chart displays the distinct values of the qualitative data on
a horizontal axis and the relative frequencies (or frequencies or
percents) of those values on a vertical axis. The relative frequency
of each distinct value is represented by a vertical bar whose height
is equal to the relative frequency of that value. The bars should
be positioned so that they do not touch each other.
The main steps for the construction of a bar chart are as follows:
1. Obtain a relative-frequency distribution of the data.
2. Draw a horizontal axis on which to place the bars and a
   vertical axis on which to display the relative frequencies.
3. For each distinct value, construct a vertical bar whose height
   equals the relative frequency of that value.
4. Label the bars with the distinct values, the horizontal axis
   with the name of the variable, and the vertical axis with
   "Relative frequency".

Exercise

The following table gives the number of deaths on British roads in
1987 for individuals in various classifications.
Classification        Number of deaths
Pedestrians           1699
Bicyclists            280
Motorcyclists         650
Automobile drivers    1327

Find the relative frequencies, and display this data set in a pie chart
and a bar chart.

Data Representation

Organizing Quantitative Data

There are three types of class grouping:
Single value grouping
Limit grouping
Cutpoint grouping

Single value grouping

Number of TV sets in each of 50 randomly selected households


1 1 1 2 6 3 3 4 2 4
3 2 1 5 2 1 3 6 2 2
3 1 1 4 3 2 2 2 2 3
0 3 1 2 1 2 3 1 1 3
3 2 1 2 1 1 3 1 5 1

Limit grouping

Example: The following are the marks obtained by 40 math students:

70 64 99 55 64 89 87 65
62 38 67 70 60 69 78 39
75 56 71 51 99 68 95 86
57 53 47 50 55 81 80 98
51 36 63 66 85 79 83 70

Use limit grouping, with grouping by 10s, to organize these data
into frequency and relative-frequency distributions.
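
One way to carry out the limit grouping is sketched below. It assumes whole-number marks and classes 30-39, 40-49, and so on; the name `limit_grouping` and the constant `WIDTH` are illustrative.

```python
from collections import Counter

WIDTH = 10  # grouping by 10s, as the example asks

marks = [
    70, 64, 99, 55, 64, 89, 87, 65,
    62, 38, 67, 70, 60, 69, 78, 39,
    75, 56, 71, 51, 99, 68, 95, 86,
    57, 53, 47, 50, 55, 81, 80, 98,
    51, 36, 63, 66, 85, 79, 83, 70,
]


def limit_grouping(values, width):
    """Count values per class; each class is keyed by its lower class limit."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


freq = limit_grouping(marks, WIDTH)
n = len(marks)
for lower, f in freq.items():
    upper = lower + WIDTH - 1  # upper class limit for integer data
    print(f"{lower}-{upper}: f = {f}, relative frequency = {f / n:.3f}")
```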

Cutpoint grouping

Example: The following are the weights, in pounds, of 37 males aged
18-24 years:

129.2 185.3 218.1 182.5 142.8 155.2 170.0
151.3 187.5 145.6 167.3 161.0 178.7 165.0
172.5 191.1 150.7 187.0 173.7 178.2 161.7
170.1 165.8 214.6 136.7 278.8 175.6 188.7
132.1 158.5 146.4 209.1 175.4 182.0 173.6
149.9 158.6

Use a class width of 20 and a first cutpoint of 120.
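
A sketch of cutpoint grouping for these weights follows. Each class is taken to run from its lower cutpoint up to, but not including, the next cutpoint; the function name `cutpoint_grouping` is an illustrative choice.

```python
import math
from collections import Counter

weights = [
    129.2, 185.3, 218.1, 182.5, 142.8, 155.2, 170.0,
    151.3, 187.5, 145.6, 167.3, 161.0, 178.7, 165.0,
    172.5, 191.1, 150.7, 187.0, 173.7, 178.2, 161.7,
    170.1, 165.8, 214.6, 136.7, 278.8, 175.6, 188.7,
    132.1, 158.5, 146.4, 209.1, 175.4, 182.0, 173.6,
    149.9, 158.6,
]


def cutpoint_grouping(values, first_cutpoint, width):
    """Count values in classes [cutpoint, cutpoint + width), keyed by lower cutpoint."""
    counts = Counter()
    for v in values:
        k = math.floor((v - first_cutpoint) / width)  # index of the class containing v
        counts[first_cutpoint + k * width] += 1
    return dict(sorted(counts.items()))


classes = cutpoint_grouping(weights, first_cutpoint=120, width=20)
for lower, f in classes.items():
    print(f"{lower}-under {lower + 20}: {f}")
```

Classes that contain no observations (here 220-under 240 and 240-under 260) do not appear in the result and would be listed with frequency 0 in a full table.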

Examples

Example: The following data represent the number of days of
sick leave taken by each of 50 workers of a given company over the
last 6 weeks:
2, 2, 0, 0, 5, 8, 3, 4, 1, 0,
0, 7, 1, 7, 1, 5, 4, 0, 4, 0,
1, 8, 9, 7, 0, 1, 7, 2, 5, 5,
4, 3, 3, 0, 0, 2, 5, 1, 3, 0,
1, 0, 2, 4, 5, 0, 5, 7, 5, 1
How many workers had at least 1 day of sick leave?
How many workers had between 3 and 5 days of sick leave?
How many workers had more than 5 days of sick leave?
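
The three questions reduce to counting observations that satisfy a condition. The sketch below assumes that "between 3 and 5 days" includes both 3 and 5; the variable names are illustrative.

```python
sick_days = [
    2, 2, 0, 0, 5, 8, 3, 4, 1, 0,
    0, 7, 1, 7, 1, 5, 4, 0, 4, 0,
    1, 8, 9, 7, 0, 1, 7, 2, 5, 5,
    4, 3, 3, 0, 0, 2, 5, 1, 3, 0,
    1, 0, 2, 4, 5, 0, 5, 7, 5, 1,
]

# Count the workers meeting each condition.
at_least_one = sum(1 for d in sick_days if d >= 1)
between_3_and_5 = sum(1 for d in sick_days if 3 <= d <= 5)  # inclusive on both ends (assumption)
more_than_5 = sum(1 for d in sick_days if d > 5)

print(at_least_one, between_3_and_5, more_than_5)
```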

Examples

Example: The 14 measurements of the tensile strength of sheet
steel are recorded as follows:
89 84 87 81 89 86 91
90 78 89 87 99 83 89

Example: The following table gives the speeds, in miles per
hour, over 1/4 mile for 35 cheetahs. Use cutpoint grouping with 52
as the first cutpoint and classes of equal width 2.

57.3 57.5 59.0 56.5 61.3 57.6 59.2
65.0 60.1 59.7 62.6 52.6 60.7 62.3
65.2 54.8 55.4 55.5 57.8 58.7 57.8
60.9 75.3 60.6 58.1 55.9 61.6 59.6
59.8 63.4 54.7 60.2 52.4 58.3 66.0

Simple Bar Chart
A simple bar chart consists of horizontal or vertical bars of equal
width whose lengths equal the values they represent.
Example: Draw a simple bar diagram to represent the turnover of
a company for 5 years.

Multiple Bar Chart

A multiple bar chart shows two or more characteristics
corresponding to the values of a common variable in the form of
grouped bars whose lengths are proportional to the values of the
characteristics, with each bar colored differently.
Example: Draw a multiple bar diagram to show the area and
production of cotton from the following data.

Component Bar Chart

A component bar chart is an effective technique in which each bar
is divided into two or more sections, proportional in size to the
component parts of the total displayed by each bar.
Example: Draw a component bar chart of population by city.

Class Boundary

Class boundaries are the numbers used to separate the classes so
that there are no gaps in the frequency distribution.

Graphical Display for Quantitative Data

Histogram
A histogram displays the classes of the quantitative data on a
horizontal axis and the frequencies (relative frequencies, percents)
of those classes on a vertical axis. The frequency (relative
frequency, percent) of each class is represented by a vertical bar
whose height is equal to the frequency (relative frequency,
percent) of that class. The bars should be positioned so that they
touch each other.
For single-value grouping, we use the distinct values of the
observations to label the bars, with each such value centered
under its bar.
For limit grouping or cutpoint grouping, we use the lower
class limits (or, equivalently, lower class cutpoints) to label
the bars.
Note: Some statisticians and technologies use class marks or
class midpoints centered under the bars.

A histogram that uses frequencies on the vertical axis is called a
frequency histogram. Similarly, a histogram that uses relative
frequencies or percents on the vertical axis is called a
relative-frequency histogram or percent histogram, respectively.

Dotplots
Dotplots are particularly useful for showing the relative positions
of the data in a data set.
Prices, in dollars, of 16 DVD players:

210 219 214 197
224 219 199 199
208 209 215 199
212 212 219 210

Frequency Polygon

The frequency polygon is a graph that displays the data by using
lines that connect points plotted for the frequencies at the
midpoints of the classes. The frequencies are represented by the
heights of the points.
Example: These data represent the record high temperatures in
degrees Fahrenheit for each of the 50 states:

112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114

Class         Class          Frequency   Cumulative   Class
intervals x   boundaries     f           frequency    marks
100-104       99.5-104.5     2           2            102
105-109       104.5-109.5    8           10           107
110-114       109.5-114.5    18          28           112
115-119       114.5-119.5    13          41           117
120-124       119.5-124.5    7           48           122
125-129       124.5-129.5    1           49           127
130-134       129.5-134.5    1           50           132

Cumulative Frequency Graph (Ogive)

                   Cumulative frequency
Less than 99.5     0
Less than 104.5    2
Less than 109.5    10
Less than 114.5    28
Less than 119.5    41
Less than 124.5    48
Less than 129.5    49
Less than 134.5    50

Histogram for unequal class width

\[ \text{rectangle height} = \frac{\text{relative frequency of the class}}{\text{class width}} \]

The resulting rectangle heights are usually called densities, and
the vertical scale is the density scale.
This prescription also works when the class widths are equal.

Consider the following 48 observations:

Class     Frequency   Relative frequency   Density
2-<4      9           0.1875               0.0938
4-<6      15          0.3125               0.1563
6-<8      5           0.1042               0.0521
8-<12     9           0.1875               0.0469
12-<20    8           0.1667               0.0208
20-<30    2           0.0417               0.0042
Total     48          1
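
The density column can be recomputed from the class cutpoints and frequencies. This is a minimal sketch; the tuple layout (lower cutpoint, upper cutpoint, frequency) is an illustrative choice.

```python
# (lower cutpoint, upper cutpoint, frequency) for each class in the table
classes = [(2, 4, 9), (4, 6, 15), (6, 8, 5), (8, 12, 9), (12, 20, 8), (20, 30, 2)]

n = sum(f for _, _, f in classes)

rows = []
for lower, upper, f in classes:
    width = upper - lower
    rel = f / n            # relative frequency of the class
    density = rel / width  # rectangle height on the density scale
    rows.append((lower, upper, rel, density))
    print(f"{lower}-<{upper}: relative frequency = {rel:.4f}, density = {density:.4f}")

# Each rectangle's area (width times height) is its relative frequency,
# so the areas add up to 1.
total_area = sum((upper - lower) * density for lower, upper, _, density in rows)
```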

Example

The frequency table gives information on the speeds (mph) of a
sample of drivers using a motorway. Construct a histogram for
these data.
Speed (mph)   Frequency   Frequency density
0-30          240         8
30-40         320         32
40-50         500         50
50-60         780         78
60-70         960         96
70-80         820         82
80-120        640         16
Here the frequency density is the frequency divided by the class
width; for example, 240/30 = 8 for the first class.

\[
\begin{aligned}
\text{relative frequency} &= (\text{class width})(\text{density}) \\
&= (\text{rectangle width})(\text{rectangle height}) \\
&= \text{rectangle area}
\end{aligned}
\]

Measurement

Levels of measurement

There are four levels of measurement:
Nominal
Ordinal
Interval
Ratio

Stem-and-Leaf Diagrams

In a stem-and-leaf display of quantitative data, each value is
divided into two portions: a stem and a leaf.
This diagram is often easier to construct than either a
frequency distribution or a histogram and generally displays
more information.
An advantage of a stem-and-leaf display over a frequency
distribution is that by preparing a stem-and-leaf display we
do not lose information on individual observations.
A very efficient way of displaying a small-to-moderate size
data set is to utilize a stem-and-leaf plot.

Example 1

The following are the scores of 30 college students on a statistics
test.
75 52 80 96 65 79 71 87 93 95
69 72 81 61 76 86 79 68 50 92
83 84 77 64 71 87 72 92 57 98
Construct a stem-and-leaf display.
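
A stem-and-leaf display for two-digit scores can be sketched as follows, taking the tens digit as the stem and the units digit as the leaf. The function name `stem_and_leaf` is illustrative, and the leaves are sorted within each stem.

```python
from collections import defaultdict

scores = [
    75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
    69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
    83, 84, 77, 64, 71, 87, 72, 92, 57, 98,
]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


display = stem_and_leaf(scores)
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} | {leaves}")
```

Because every leaf is kept, the original 30 scores can be read back from the display.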

Example 2

The following data give the monthly rents paid by a sample of 30
households selected from a small town.
880 1081 721 1075 1023 775 1235 750 965 960
1210 985 1231 932 850 825 1000 915 1191 1035
1151 630 1175 952 1100 1140 750 1140 1370 1280

Construct a stem-and-leaf display for these data.

Sometimes a data set may contain too many stems, with each
stem containing only a few leaves. In such cases, we may want to
condense the stem and leaf display by grouping the stems.

Example 3

The following stem-and-leaf display is prepared for the number of
hours that 25 students spent working on computers during the
past month.
0 | 6
1 | 179
2 | 26
3 | 2478
4 | 15699
5 | 368
6 | 24457
7 |
8 | 56

Prepare a new stem-and-leaf display by grouping the stems.

Some data sets produce stem-and-leaf displays that have a
small number of stems relative to the number of observations
in the data set and have too many leaves for each stem.
In such a situation, we can create a stem-and-leaf display
with split stems.
To do this, each stem is split into two or five parts.
Whenever the stems are split into two parts, any observation
having a leaf with a value of 0, 1, 2, 3, or 4 is placed in the
first split stem, while the leaves 5, 6, 7, 8, and 9 are placed in
the second split stem.
Sometimes we can split a stem into five parts if there are too
many leaves for one stem. Whenever a stem is split into five
parts, leaves with values of 0 and 1 are placed next to the
first part of the split stem, leaves with values of 2 and 3 are
placed next to the second part of the split stem, and so on.

Example 4

Consider the following stem-and-leaf display, which has only two
stems. Using the split stem procedure, rewrite this stem-and-leaf
display.

3 | 1123334478999
4 | 0001111112222233667

Measures of Center

A measure of center is a value at the center or middle
of a data set.
A statistic is a characteristic or measure obtained by using
the data values from a sample.
A parameter is a characteristic or measure obtained by using
all the data values from a specific population.

The Mean

The mean, also known as the arithmetic average, is the sum of the
values divided by the total number of values. The symbol $\bar{X}$
represents the sample mean.
\[ \bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum X}{n} \]
where $n$ represents the total number of values in the sample. For
a population, the Greek letter $\mu$ (mu) is used for the mean:
\[ \mu = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum X}{N} \]
where $N$ represents the total number of values in the population.

Example: Days Off per Year

The data represent the number of days off per year for a sample of
individuals selected from nine different countries. Find the mean.

20, 26, 40, 36, 23, 42, 35, 24, 30

Solution:
\[ \bar{X} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} = 30.7 \]
Hence, the mean of the number of days off is 30.7 days.
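
The same calculation as a short script, rounded to one more decimal place than the original data; the variable names are illustrative.

```python
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]

total = sum(days_off)         # numerator: sum of the values
mean = total / len(days_off)  # divide by the number of values

print(total, round(mean, 1))
```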

Round-Off Rule for the Mean, Median, and Midrange

Carry one more decimal place than is present in the original set of
values.

The procedure for finding the mean for grouped data assumes
that the mean of all the raw data values in each class is equal
to the midpoint of the class.
In reality, this is not true, since the average of the raw data
values in each class usually will not be exactly equal to the
midpoint.
However, using this procedure will give an acceptable
approximation of the mean, since some values fall above the
midpoint and other values fall below the midpoint for each
class, and the midpoint represents an estimate of all values in
the class.

Grouped Data
Example: The data represent the number of miles run during
one week for a sample of 20 runners.

Class        Frequency f   Midpoint X   f·X
5.5-10.5     1             8            8
10.5-15.5    2             13           26
15.5-20.5    3             18           54
20.5-25.5    5             23           115
25.5-30.5    4             28           112
30.5-35.5    3             33           99
35.5-40.5    2             38           76
Total        n = 20                     490

\[ \bar{X} = \frac{\sum f \cdot X}{\sum f} = \frac{490}{20} = 24.5 \text{ miles} \]
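
The grouped-data mean can be sketched as below, using the class midpoint as the stand-in for every value in the class. The function name `grouped_mean` and the tuple layout are illustrative.

```python
# (lower boundary, upper boundary, frequency) for the miles-run example
classes = [
    (5.5, 10.5, 1), (10.5, 15.5, 2), (15.5, 20.5, 3), (20.5, 25.5, 5),
    (25.5, 30.5, 4), (30.5, 35.5, 3), (35.5, 40.5, 2),
]


def grouped_mean(classes):
    """Approximate the mean as sum(f * midpoint) / sum(f)."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


mean_miles = grouped_mean(classes)
print(mean_miles)
```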

Combined Arithmetic Mean

If $\bar{x}_1, \bar{x}_2, \bar{x}_3, \dots, \bar{x}_k$ are the arithmetic means of $k$ distributions
with respective frequencies $n_1, n_2, n_3, \dots, n_k$, then the combined
arithmetic mean $\bar{X}_c$ is defined by
\[ \bar{X}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + n_3 + \cdots + n_k} \]

Example

The number of students in three sections and the average marks
obtained by them in a statistics paper in the ESE are as follows:

Section   Average marks in statistics   No. of students
A         75                            50
B         60                            60
C         50                            50

Find the average marks obtained by the students of the three
sections taken together.

Solution:
Here $n_1 = 50$, $n_2 = 60$, $n_3 = 50$ and
$\bar{x}_1 = 75$, $\bar{x}_2 = 60$, $\bar{x}_3 = 50$.
\[ \bar{X}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3}{n_1 + n_2 + n_3} = \frac{50(75) + 60(60) + 50(50)}{50 + 60 + 50} = \frac{9850}{160} = 61.56 \]
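
The combined mean is a weighted average of the section means, with the section sizes as weights. A minimal sketch, with `combined_mean` as an illustrative name:

```python
def combined_mean(means, sizes):
    """Weighted average of group means, weighted by group sizes."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


section_means = [75, 60, 50]  # sections A, B, C
section_sizes = [50, 60, 50]

overall = combined_mean(section_means, section_sizes)
print(round(overall, 2))
```

Note that the result differs from the plain average of 75, 60, and 50, because the sections are not all the same size.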

Change of Origin (Short Method)

If a certain constant, say A, is added to all the observations,
then we get a new set of observations.
The observations can be decreased in size when a constant is
subtracted from all the observations.
This addition or subtraction of a constant is called a change of
origin.
The constant may be called the arbitrary origin.

Example

The wages of 5 workers are Rs. 1950, 2000, 2050, 2060 and 2080.
Calculate the arithmetic mean by using the idea of change of
origin (short method).
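
A sketch of the short method for these wages follows. The arbitrary origin A = 2000 is an assumption made here for illustration; any constant gives the same mean.

```python
wages = [1950, 2000, 2050, 2060, 2080]
A = 2000  # arbitrary origin (assumed; any constant works)

# Subtract the origin to get small deviations, average them,
# then add the origin back.
deviations = [x - A for x in wages]
mean_wage = A + sum(deviations) / len(deviations)

print(deviations, mean_wage)
```

The deviations are much smaller than the original wages, which is what makes the hand calculation shorter.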

Example

Given below are the grouped data of wages of 500 workers of a
factory. Calculate the arithmetic mean by the short method.
Groups       Number of workers
1900-1950    60
1950-2000    180
2000-2050    185
2050-2100    65
2100-2150    10

Geometric Mean

The geometric mean $G$ of $n$ positive values $x_1, x_2, x_3, \dots, x_n$ is
defined as the $n$th root of their product. Thus, it is obtained by
multiplying together all the $n$ values and then taking the $n$th root
of the product:
\[ G = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} \]

The geometric mean can be written in logarithmic form:
\[ G = \operatorname{Antilog}\left( \frac{\sum_{i=1}^{n} \log x_i}{n} \right) \]
The logarithm of the geometric mean is equal to the arithmetic
mean of the logarithms of the observations.

Example
Calculate the geometric mean of the following observations:
9.7, 0.0009, 178.7, 0.874, and 1238.
Solution:
x_i       log x_i
9.7       0.9868
0.0009    -3.0458
178.7     2.2521
0.874     -0.0585
1238      3.0927
Total     3.2273

\[ G = \operatorname{Antilog}\left( \frac{\sum_{i=1}^{5} \log x_i}{n} \right) = \operatorname{Antilog}\left( \frac{3.2273}{5} \right) \approx 4.420 \]
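
The logarithmic form can be checked numerically. This sketch uses base-10 logarithms, so the antilog is a power of 10; the function name `geometric_mean` is illustrative.

```python
import math

values = [9.7, 0.0009, 178.7, 0.874, 1238]


def geometric_mean(xs):
    """Antilog (base 10) of the arithmetic mean of the base-10 logarithms."""
    mean_log = sum(math.log10(x) for x in xs) / len(xs)
    return 10 ** mean_log


g = geometric_mean(values)
# Direct definition for comparison: nth root of the product.
g_direct = math.prod(values) ** (1 / len(values))
print(g, g_direct)
```

Both routes give approximately 4.42. The logarithmic route is the one suited to hand calculation with log tables.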

If the positive values $x_1, x_2, x_3, \dots, x_n$ occur $f_1, f_2, f_3, \dots, f_n$
times respectively, then the geometric mean $G$ is defined by
\[ G = \left( x_1^{f_1} \cdot x_2^{f_2} \cdot x_3^{f_3} \cdots x_n^{f_n} \right)^{1/N} \]
where $N = f_1 + f_2 + f_3 + \cdots + f_n$, or
\[ G = \operatorname{Antilog}\left( \frac{\sum_{i=1}^{n} f_i \log x_i}{N} \right) \]

Example

Calculate the geometric mean for the following distribution:
Weights    Frequency
100-104    24
105-109    30
110-114    45
115-119    65
120-124    72
Total      236

Combined Geometric Mean

If $G_1$ and $G_2$ are the geometric means of two components having $n_1$
and $n_2$ observations, then the combined geometric mean $G_c$ of the $n$
($n = n_1 + n_2$) observations is
\[ G_c = \left( G_1^{n_1} \cdot G_2^{n_2} \right)^{1/n} \]

Harmonic Mean
The harmonic mean is the quotient of the "number of the given values" and
the "sum of the reciprocals of the given values".
For Ungrouped Data
\[ \text{Harmonic mean} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \]
For Grouped Data
\[ \text{Harmonic mean} = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \frac{f_i}{x_i}} \]

Example
Calculate the harmonic mean of the numbers: 13.2, 14.2, 14.8,
15.2 and 16.1.
Solution:
The harmonic mean is calculated as below:
x       1/x
13.2    0.0758
14.2    0.0704
14.8    0.0676
15.2    0.0658
16.1    0.0621
Total   0.3417

\[ \text{Harmonic mean} = \frac{5}{0.3417} = 14.63 \]

Example: Grouped Data

Calculate the harmonic mean for the given data:

Marks    f
30-39    2
40-49    3
50-59    11
60-69    20
70-79    32
80-89    25
90-99    7

Solution
Marks    x       f      f/x
30-39    34.5    2      0.0580
40-49    44.5    3      0.0674
50-59    54.5    11     0.2018
60-69    64.5    20     0.3101
70-79    74.5    32     0.4295
80-89    84.5    25     0.2959
90-99    94.5    7      0.0741
Total            100    1.4368

\[ \text{Harmonic mean} = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \frac{f_i}{x_i}} = \frac{100}{1.4368} = 69.60 \]

Median
The median of a data set is the measure of center that is the
middle value when the original data values are arranged in order
of increasing (or decreasing) magnitude.
If the number of values is odd, the median is the number
located in the exact middle of the list:
\[ \text{Median} = \left( \frac{n + 1}{2} \right)\text{th ordered value} \]
If the number of values is even, the median is found by
computing the mean of the two middle numbers:
\[ \text{Median} = \text{average of the } \left( \frac{n}{2} \right)\text{th and } \left( \frac{n}{2} + 1 \right)\text{th ordered values} \]

Example

Monitoring Lead in Air: Listed below are measured amounts of
lead in the air. Find the median for this sample.

5.40 1.10 0.42 0.73 0.48 1.10

Solution:
First sort the values by arranging them in order:

0.42 0.48 0.73 1.10 1.10 5.40

\[ \text{Median} = \frac{0.73 + 1.10}{2} = 0.915 \]

Example

Repeat the preceding example after including the measurement of


0.66 recorded on another day. That is, find the median of these
lead measurements:

5.40 1.10 0.42 0.73 0.48 1.10 0.66

Solution:
First sort the values by arranging them in order:

0.42 0.48 0.66 0.73 1.10 1.10 5.40

Median $= 0.73$
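
The two lead examples can be checked with a short Python sketch. The function name `median` and the variable names are illustrative choices, not part of the slides; the logic follows the odd/even rule stated above.

```python
def median(values):
    """Median of ungrouped data: middle value of the sorted list,
    or the mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # ((n + 1) / 2)th ordered value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of (n/2)th and (n/2 + 1)th

lead_six = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
lead_seven = lead_six + [0.66]

print(round(median(lead_six), 3))   # even n: average of 0.73 and 1.10
print(median(lead_seven))           # odd n: the 4th ordered value
```

Python's standard library offers `statistics.median`, which applies the same rule.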


Median for Frequency Distribution

Suppose $f_1, f_2, f_3, \ldots, f_k$ are the respective frequencies of the items $x_1, x_2, x_3, \ldots, x_k$, and let $n = \sum f$ be the total number of observations.
First, we calculate the cumulative frequencies.
Second, we locate the median number $\frac{n+1}{2}$ in the cumulative frequency column.
The item which corresponds to the median number is the median.

Example

Given below is the frequency distribution of the number of persons in 50 families in a village. Find the median as a measure of the average family size.
Number of persons Number of families
2 5
3 8
4 12
5 20
6 5


Solution

The necessary calculations are given below:

Number of persons(x) Number of families(f) Cumulative frequency


2 5 5
3 8 13
4 12 25
5 20 45
6 5 50

Median = value of the $\left(\frac{n+1}{2}\right)$th item
= value of the $\left(\frac{50+1}{2}\right)$th item
= value of the 25.5th item
= 5
The cumulative frequency first reaches 25.5 in the row $x = 5$ (cumulative frequency 45), so the median is 5.
Hence, the average size of the family is 5 persons on the basis of the median.
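
The cumulative-frequency lookup can be sketched in Python as follows. The function name `median_from_frequencies` is an illustrative assumption; the rule implemented is the one on the slide: pick the first item whose cumulative frequency reaches the median number $\frac{n+1}{2}$.

```python
from itertools import accumulate

def median_from_frequencies(items, freqs):
    """Return the first item whose cumulative frequency reaches (n + 1) / 2."""
    n = sum(freqs)
    median_number = (n + 1) / 2
    for item, cumulative in zip(items, accumulate(freqs)):
        if cumulative >= median_number:
            return item
    raise ValueError("frequencies must sum to a positive number")

persons = [2, 3, 4, 5, 6]
families = [5, 8, 12, 20, 5]
print(median_from_frequencies(persons, families))
```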


Median for Grouped Data

For grouped data, we find the cumulative frequencies and then calculate the median number $\frac{n}{2}$.
The group which corresponds to the median number is called the median group.
The median lies in this group.

$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$
where
l = Lower class boundary of the median class
h = Size of the class interval of the median class
f = Frequency of the median class
n = Sum of the frequencies
c = Cumulative frequency before the median class.


Example

Find the median from the following grouped data regarding heights of students in a college.
Heights(in inches) Number of students
56-58 25
58-60 40
60-62 250
62-64 130
64-66 60
66-68 20


Solution

The necessary calculations are given below:

Heights(in inches) Number of students(f) Cumulative frequencies


56-58 25 25
58-60 40 65
60-62 250 315
62-64 130 445
64-66 60 505
66-68 20 525

Median = value of the $\left(\frac{n}{2}\right)$th item
= value of the $\left(\frac{525}{2}\right)$th item
= value of the 262.5th item
The median lies in the class 60-62.

$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$
where $l = 60$, $h = 2$, $f = 250$, $\frac{n}{2} = 262.5$, and $c = 65$.
Thus, Median $= 60 + \frac{2}{250}(262.5 - 65) = 61.58$.


Example

Calculate the median from the following data:

Groups Frequency
10-14 5
15-19 12
20-24 30
25-29 25
30-34 6


Solution

Groups Class Boundaries Frequency C. F


10-14 9.5-14.5 5 5
15-19 14.5-19.5 12 17
20-24 19.5-24.5 30 47
25-29 24.5-29.5 25 72
30-34 29.5-34.5 6 78

Median = value of the $\left(\frac{n}{2}\right)$th item
= value of the $\left(\frac{78}{2}\right)$th item
= value of the 39th item
The median lies in the class 19.5-24.5.

$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$
where $l = 19.5$, $h = 5$, $f = 30$, $\frac{n}{2} = 39$, and $c = 17$.
Thus, Median $= 19.5 + \frac{5}{30}(39 - 17) = 23.17$.

Measures of Center Quartiles, Deciles, Percentiles

Introduction

We have learned that the median divides a set of data into two equal parts. In the same way, there are certain other values which divide a set of data into four, ten or one hundred equal parts. Such values are referred to as quartiles, deciles and percentiles respectively.

Quartiles

There are three quartiles: the (lower) first quartile $Q_1$, the second quartile $Q_2$ and the (upper) third quartile $Q_3$.
These quartiles divide the set of observations into four equal parts.
The second quartile is equal to the median.
The lower quartile $Q_1$ is a point with 25% of the observations below it and 75% above it.
The upper quartile $Q_3$ is a point with 75% of the observations below it and 25% above it.

Quartiles for Ungrouped Data

Quartiles for ungrouped data are calculated by the following formulae.
$$Q_1 = \text{value of the } \frac{(n+1)}{4}\text{th item}$$
$$Q_2 = \text{value of the } \frac{2(n+1)}{4}\text{th item} = \text{value of the } \frac{(n+1)}{2}\text{th item}$$
$$Q_3 = \text{value of the } \frac{3(n+1)}{4}\text{th item}$$

Example

The following data are the marks obtained by 20 students in a test. Find the quartiles.

20 28 29 30 36 37 39 42 53 54
55 58 61 67 68 70 74 81 82 93

Solution:

$Q_1$ = value of the $\frac{(n+1)}{4}$th item
= value of the $\frac{(20+1)}{4}$th item
= 5.25th item
The value of the 5th item is 36 and that of the 6th item is 37. Thus $Q_1 = 36 + 0.25(1) = 36.25$.

$Q_2$ = value of the $\frac{2(n+1)}{4}$th item
= value of the $\frac{2(20+1)}{4}$th item
= 10.5th item
The value of the 10th item is 54 and that of the 11th item is 55. Thus $Q_2 = 54 + 0.5(1) = 54.5$.

$Q_3$ = value of the $\frac{3(n+1)}{4}$th item
= value of the $\frac{3(20+1)}{4}$th item
= 15.75th item
The value of the 15th item is 68 and that of the 16th item is 70. Thus $Q_3 = 68 + 0.75(2) = 69.5$.
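
The positional rule with linear interpolation between neighbouring items can be sketched as below. The function name `quartile` is an illustrative assumption, and the sketch assumes the computed position lies between 1 and $n$; it does not handle positions that fall outside the data.

```python
import math

def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the k(n + 1)/4 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4           # 1-based, possibly fractional
    whole = math.floor(position)
    fraction = position - whole
    if whole < 1 or whole > n or (whole == n and fraction > 0):
        raise ValueError("position falls outside the data")
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    upper = ordered[whole]
    return lower + fraction * (upper - lower)

marks = [20, 28, 29, 30, 36, 37, 39, 42, 53, 54,
         55, 58, 61, 67, 68, 70, 74, 81, 82, 93]
for k in (1, 2, 3):
    print(k, quartile(marks, k))
```

Library routines such as `statistics.quantiles` use related but not always identical position rules, so their output may differ slightly from the slide values for other data sets.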


Quartiles for Grouped Data

The quartiles may be determined from grouped data in the same way as the median, except that in place of $\frac{n}{2}$ we use $\frac{n}{4}$ for $Q_1$ and $\frac{3n}{4}$ for $Q_3$.
For calculating quartiles from grouped data we form the cumulative frequency column.

$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right)$$
$$Q_2 = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$
$$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right)$$
where
l = Lower class boundary of the quartile class
h = Size of the class interval
f = Frequency of the class
n = Sum of the frequencies
c = Cumulative frequency before the class.

Example
We will calculate the quartiles from the frequency distribution for
the weight of 120 students
Weight (lb) Frequency (f) Class Boundaries C. F
110 - 119 1 109.5 - 119.5 1
120 - 129 4 119.5 - 129.5 5
130 - 139 17 129.5 - 139.5 22
140 - 149 28 139.5 - 149.5 50
150 - 159 25 149.5 - 159.5 75
160 - 169 18 159.5 - 169.5 93
170 - 179 13 169.5 - 179.5 106
180 - 189 6 179.5 - 189.5 112
190 - 199 5 189.5 - 199.5 117
200 - 209 2 199.5 - 209.5 119
210 - 219 1 209.5 - 219.5 120
$\sum f = n = 120$

Solution

The first quartile $Q_1$ is the value of the $\frac{n}{4}$th = 30th item from the lower end. From the table, we see that the cumulative frequency of the third class is 22 and that of the fourth class is 50. Thus $Q_1$ lies in the fourth class, i.e. 140 - 149.
$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right) = 139.5 + \frac{10}{28}(30 - 22) = 142.36 \text{ pounds}$$

The third quartile $Q_3$ is the value of the $\frac{3n}{4}$th = 90th item from the lower end. The cumulative frequency of the fifth class is 75 and that of the sixth class is 93. Thus $Q_3$ lies in the sixth class, i.e. 160 - 169.
$$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right) = 159.5 + \frac{10}{18}(90 - 75) = 167.83 \text{ pounds}$$
From $Q_1$ and $Q_3$ we conclude that 25% of the students weigh 142.36 pounds or less and 75% of the students weigh 167.83 pounds or less.

Deciles

The values which divide an array into ten equal parts are called deciles. Denote the first, second, ..., ninth deciles by $D_1, D_2, \ldots, D_9$ respectively. $D_1$ is a point which has $\frac{1}{10}$ of the observations below it.
The fifth decile $D_5$ corresponds to the median.

Deciles for Ungrouped Data

Deciles for ungrouped data are calculated by the following formulae.
$$D_1 = \text{value of the } \frac{(n+1)}{10}\text{th item}$$
$$D_2 = \text{value of the } \frac{2(n+1)}{10}\text{th item}$$
$$\vdots$$
$$D_9 = \text{value of the } \frac{9(n+1)}{10}\text{th item}$$

Example

We will calculate the second and seventh deciles from the following array of data.

20 28 29 30 36 37 39 42 53 54
55 58 61 67 68 70 74 81 82 93
Solution:
$D_2$ = value of the $\frac{2(n+1)}{10}$th item
= $\frac{2(20+1)}{10}$th item
= 4.2th item
The value of the 4th item is 30 and that of the 5th item is 36. Thus $D_2 = 30 + 0.2(6) = 31.2$.

$D_7$ = value of the $\frac{7(n+1)}{10}$th item
= $\frac{7(20+1)}{10}$th item
= 14.7th item
The value of the 14th item is 67 and that of the 15th item is 68. Thus $D_7 = 67 + 0.7(1) = 67.7$.

Decile for Grouped Data

Deciles for grouped data can be calculated from the following formulae:
$$D_1 = l + \frac{h}{f}\left(\frac{n}{10} - c\right)$$
$$D_2 = l + \frac{h}{f}\left(\frac{2n}{10} - c\right)$$
$$\vdots$$
$$D_9 = l + \frac{h}{f}\left(\frac{9n}{10} - c\right)$$

Example
We will calculate fourth and ninth deciles from the frequency
distribution of weights of 120 students
Weight (lb) Frequency (f) Class Boundaries C. F
110 - 119 1 109.5 - 119.5 1
120 - 129 4 119.5 - 129.5 5
130 - 139 17 129.5 - 139.5 22
140 - 149 28 139.5 - 149.5 50
150 - 159 25 149.5 - 159.5 75
160 - 169 18 159.5 - 169.5 93
170 - 179 13 169.5 - 179.5 106
180 - 189 6 179.5 - 189.5 112
190 - 199 5 189.5 - 199.5 117
200 - 209 2 199.5 - 209.5 119
210 - 219 1 209.5 - 219.5 120
$\sum f = n = 120$

Solution

The fourth decile $D_4$ is the value of the $\frac{4n}{10}$th = 48th item from the lower end. From the table, we see that the cumulative frequency of the third class is 22 and that of the fourth class is 50. Thus $D_4$ lies in the fourth class, i.e. 140 - 149.
$$D_4 = l + \frac{h}{f}\left(\frac{4n}{10} - c\right) = 139.5 + \frac{10}{28}(48 - 22) = 148.79 \text{ pounds}$$

The ninth decile $D_9$ is the value of the $\frac{9n}{10}$th = 108th item from the lower end. The cumulative frequency of the seventh class is 106 and that of the eighth class is 112. Thus $D_9$ lies in the eighth class, i.e. 180 - 189.
$$D_9 = l + \frac{h}{f}\left(\frac{9n}{10} - c\right) = 179.5 + \frac{10}{6}(108 - 106) = 182.83 \text{ pounds}$$

From $D_4$ and $D_9$, we conclude that 40% of the students weigh 148.79 pounds or less, and 90% of the students weigh 182.83 pounds or less.

Percentiles

The values which divide an array into one hundred equal parts are called percentiles.
The first, second, ..., ninety-ninth percentiles are denoted by $P_1, P_2, \ldots, P_{99}$.
The 50th percentile $P_{50}$ corresponds to the median.
The 25th percentile $P_{25}$ corresponds to the first quartile and the 75th percentile $P_{75}$ corresponds to the third quartile.

Percentiles for Ungrouped Data

$$P_1 = \text{value of the } \frac{(n+1)}{100}\text{th item}$$
$$P_2 = \text{value of the } \frac{2(n+1)}{100}\text{th item}$$
$$\vdots$$
$$P_{99} = \text{value of the } \frac{99(n+1)}{100}\text{th item}$$

Example

We will calculate the fifteenth and sixty-fourth percentiles from the following array.

20 28 29 30 36 37 39 42 53 54
55 58 61 67 68 70 74 81 82 93

Solution:
$P_{15}$ = value of the $\frac{15(n+1)}{100}$th item
= $\frac{15(20+1)}{100}$th item
= 3.15th item
The value of the 3rd item is 29 and that of the 4th item is 30. Thus $P_{15} = 29 + 0.15(1) = 29.15$.

$P_{64}$ = value of the $\frac{64(n+1)}{100}$th item
= $\frac{64(20+1)}{100}$th item
= 13.44th item
The value of the 13th item is 61 and that of the 14th item is 67. Thus $P_{64} = 61 + 0.44(6) = 63.64$.
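
Quartiles, deciles and percentiles for ungrouped data all use the same position rule, $\frac{k(n+1)}{m}$ with $m = 4$, $10$ or $100$. A minimal sketch follows; the function name `positional_quantile` and its parameters `k` and `parts` are hypothetical names chosen for this illustration, and positions falling outside the data are rejected rather than extrapolated.

```python
import math

def positional_quantile(values, k, parts):
    """Value of the k(n + 1)/parts-th item, interpolating linearly
    between neighbouring ordered items (parts = 4, 10 or 100)."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / parts
    whole = math.floor(position)
    fraction = position - whole
    if whole < 1 or whole > n or (whole == n and fraction > 0):
        raise ValueError("position falls outside the data")
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    return lower + fraction * (ordered[whole] - lower)

array = [20, 28, 29, 30, 36, 37, 39, 42, 53, 54,
         55, 58, 61, 67, 68, 70, 74, 81, 82, 93]
print(round(positional_quantile(array, 2, 10), 2))    # D2
print(round(positional_quantile(array, 7, 10), 2))    # D7
print(round(positional_quantile(array, 15, 100), 2))  # P15
print(round(positional_quantile(array, 64, 100), 2))  # P64
```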


Percentiles for Grouped Data

$$P_1 = l + \frac{h}{f}\left(\frac{n}{100} - c\right)$$
$$P_2 = l + \frac{h}{f}\left(\frac{2n}{100} - c\right)$$
$$\vdots$$
$$P_{99} = l + \frac{h}{f}\left(\frac{99n}{100} - c\right)$$

Example
We will calculate thirty-seventh, forty-fifth and ninetieth
percentile from the frequency distribution of weights of 120
students
Weight (lb) Frequency (f) Class Boundaries C. F
110 - 119 1 109.5 - 119.5 1
120 - 129 4 119.5 - 129.5 5
130 - 139 17 129.5 - 139.5 22
140 - 149 28 139.5 - 149.5 50
150 - 159 25 149.5 - 159.5 75
160 - 169 18 159.5 - 169.5 93
170 - 179 13 169.5 - 179.5 106
180 - 189 6 179.5 - 189.5 112
190 - 199 5 189.5 - 199.5 117
200 - 209 2 199.5 - 209.5 119
210 - 219 1 209.5 - 219.5 120
$\sum f = n = 120$

Solution

The thirty-seventh percentile $P_{37}$ is the value of the $\frac{37n}{100}$th = 44.4th item from the lower end. Thus $P_{37}$ lies in the fourth class, i.e. 140 - 149.
$$P_{37} = l + \frac{h}{f}\left(\frac{37n}{100} - c\right) = 139.5 + \frac{10}{28}(44.4 - 22) = 147.5 \text{ pounds}$$

$$P_{45} = l + \frac{h}{f}\left(\frac{45n}{100} - c\right) = 149.5 + \frac{10}{25}(54 - 50) = 151.1 \text{ pounds}$$
$$P_{90} = l + \frac{h}{f}\left(\frac{90n}{100} - c\right) = 179.5 + \frac{10}{6}(108 - 106) = 182.83 \text{ pounds}$$
From $P_{37}$, $P_{45}$, and $P_{90}$, we conclude that 37% of the students weigh 147.5 pounds or less. Similarly, 45% of the students weigh 151.1 pounds or less and 90% of the students weigh 182.83 pounds or less.
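
The grouped-data formulas for quartiles, deciles and percentiles differ only in the position $\frac{kn}{m}$, so one function covers all of them. The sketch below is an illustration under that observation; the name `grouped_quantile`, the parameters `k` and `parts`, and the generated list of class boundaries are assumptions, not part of the slides.

```python
def grouped_quantile(boundaries, freqs, k, parts):
    """l + (h / f) * (k * n / parts - c) for the class containing the
    (k * n / parts)-th item; parts = 4, 10 or 100."""
    n = sum(freqs)
    position = k * n / parts
    c = 0  # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and c + f >= position:
            return lower + (upper - lower) / f * (position - c)
        c += f
    raise ValueError("position lies beyond the last class")

# Weights of 120 students: boundaries 109.5-119.5, 119.5-129.5, ..., 209.5-219.5
weight_classes = [(109.5 + 10 * i, 119.5 + 10 * i) for i in range(11)]
weight_freqs = [1, 4, 17, 28, 25, 18, 13, 6, 5, 2, 1]

for label, k, parts in [("Q1", 1, 4), ("Q3", 3, 4), ("D4", 4, 10),
                        ("D9", 9, 10), ("P37", 37, 100), ("P45", 45, 100),
                        ("P90", 90, 100)]:
    print(label, round(grouped_quantile(weight_classes, weight_freqs, k, parts), 2))
```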


Mode

The mode of a data set is the value that occurs most frequently.
When two values occur with the same greatest frequency,
each one is a mode and the data set is bimodal.
When more than two values occur with the same greatest
frequency, each is a mode and the data set is said to be
multimodal.
When no value is repeated, we say that there is no mode.

Example

Find the modes of the following data sets.


5.40 1.10 0.42 0.73 0.48 1.10
27 27 27 55 55 55 88 88 99
1 2 3 6 7 8 9 10
Solutions:
The number 1.10 is the mode because it is the value that
occurs most often.
The numbers 27 and 55 are both modes because they occur
with the same greatest frequency. This data set is bimodal
because it has two modes.
There is no mode because no value is repeated.
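
The three cases (one mode, two modes, no mode) can be checked with a small sketch. The function name `modes` is an illustrative choice; it returns a list so that bimodal and multimodal data are handled, and an empty list when no value is repeated, following the convention stated above. It assumes a non-empty data set.

```python
from collections import Counter

def modes(values):
    """All values sharing the greatest frequency; [] if nothing repeats."""
    counts = Counter(values)
    greatest = max(counts.values())
    if greatest == 1:
        return []          # no value is repeated, so there is no mode
    return [value for value, count in counts.items() if count == greatest]

print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))     # one mode
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))     # bimodal
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))                # no mode
```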

The mode isn't used much with numerical data.
Among the different measures of center we are considering, the mode is the only one that can be used with data at the nominal level of measurement.
Recall that the nominal level of measurement applies to data that consist of names, labels, or categories only.
For example:
A survey of college students showed that 84% have TVs, 76% have VCRs, 60% have portable CD players, 39% have video game systems, and 35% have DVD players (based on data from the National Center for Education Statistics). Because TVs are most frequent, we can say that the mode is TV.
We cannot find a mean or median for such data at the nominal level.

Grouped Data

Example: Find the modal class for the frequency distribution of miles that 20 runners ran in one week.
Class Frequency
5.5 − 10.5 1
10.5 − 15.5 2
15.5 − 20.5 3
20.5 − 25.5 5
25.5 − 30.5 4
30.5 − 35.5 3
35.5 − 40.5 2


Solution:
The modal class is 20.5 − 25.5, since it has the largest
frequency.
Sometimes the midpoint of the class is used rather than the
boundaries.
Hence, the mode could also be given as 23 miles per week.

A distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half.
A distribution of data is skewed if it is not symmetric and extends more to one side than the other.

Figure: Skewed to the Left (Negatively Skewed): The mean and median are to the left of the mode (but their order is not always predictable).

Figure: Symmetric (Zero Skewness): The mean, median, and mode are the same.

Figure: Skewed to the Right (Positively Skewed): The mean and median are to the right of the mode (but their order is not always predictable).


Measures of Center Merits and Demerits of Various Averages

Arithmetic Mean

It is based on all the observations in the data.
It is easy to calculate.
It is determined for almost every kind of data.
Disadvantages
It is greatly affected by extreme values (outliers) in the data. An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.
In a highly skewed distribution, the mean is not an appropriate measure of average.

Geometric Mean

It is based on all the observations in the data.


It gives equal weightage to all the observations.
Disadvantages
It is not easy to calculate.
It vanishes if any observation is zero.
In case of negative values, it cannot be computed at all.


Harmonic Mean

It is based on all the observations in the data.


Disadvantages
It cannot be calculated, if any one of the observations is zero.
It gives too much weightage to the smaller observations.

Median

It is easy to calculate.
It is not affected by extreme values.
In a highly skewed distribution, median is an appropriate
average to use.
Disadvantages
It is not rigorously defined.
It necessitates the arrangement of data into an array which
can be tedious and time consuming for a large body of data.


Mode

It is simply defined and easily calculated.


It is not affected by large or small observations.
It can be determined for both the quantitative and
qualitative data.
Disadvantages
It is not rigorously defined.
It is often indeterminate.
It is not based on all the observations.



Measures of Variation

RANGE
The range of a data set is the difference between the maximum
and minimum data entries in the set. To find the range, the data
must be quantitative.

Range = (Maximum data entry) − (Minimum data entry)

Example: One corporation hired 10 graduates. The starting salaries for each graduate are shown. Find the range of the starting salaries for Corporation A.

Salary: 41 38 39 45 47 41 44 41 37 42
Solution:

Range $= 47 - 37 = 10$

Grouped Data

The range is the difference between the upper boundary of the highest class and the lower boundary of the lowest class.
Because the range is based on the two extreme observations, it gives no weight to the central values of the data.
It is a poor measure of dispersion and does not give a good picture of the overall spread of the observations with respect to their center.
It gives only a general idea about the total spread of the observations.

Quartile Deviation

It is based on the lower quartile $Q_1$ and the upper quartile $Q_3$.

Interquartile range: $Q_3 - Q_1$

Semi-interquartile range or quartile deviation: $\frac{Q_3 - Q_1}{2}$

Coefficient of Quartile Deviation

A relative measure of dispersion based on the quartile deviation is called the coefficient of quartile deviation.
$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

It is a pure number, free of any units of measurement.
It can be used for comparing the dispersion in two or more sets of data.

Example

Calculate the quartile deviation and the coefficient of quartile deviation from the data given below:

Maximum Load(tons) Number of Cables


9.3-9.7 2
9.8-10.2 5
10.3-10.7 12
10.8-11.2 17
11.3-11.7 14
11.8-12.2 6
12.3-12.7 3
12.8-13.2 1

Solution

Maximum Load (tons)   Number of Cables (f)   Class Boundaries   Cumulative Frequency
9.3-9.7 2 9.25-9.75 2
9.8-10.2 5 9.75-10.25 7
10.3-10.7 12 10.25-10.75 19
10.8-11.2 17 10.75-11.25 36
11.3-11.7 14 11.25-11.75 50
11.8-12.2 6 11.75-12.25 56
12.3-12.7 3 12.25-12.75 59
12.8-13.2 1 12.75-13.25 60

Lower Quartile
$Q_1$ = value of the $\left(\frac{n}{4}\right)$th item = value of the $\left(\frac{60}{4}\right)$th item = 15th item
$Q_1$ lies in the class 10.25 - 10.75
$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right)$$
where $l = 10.25$, $h = 0.5$, $f = 12$, $n/4 = 15$ and $c = 7$
$Q_1 = 10.25 + \frac{0.5}{12}(15 - 7) = 10.58$
Upper Quartile
$Q_3$ = value of the $\left(\frac{3n}{4}\right)$th item = value of the $\left(\frac{3 \times 60}{4}\right)$th item = 45th item
$Q_3$ lies in the class 11.25 - 11.75
$$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right)$$
where $l = 11.25$, $h = 0.5$, $f = 14$, $3n/4 = 45$ and $c = 36$
$Q_3 = 11.25 + \frac{0.5}{14}(45 - 36) = 11.57$

$$Q.D = \frac{11.57 - 10.58}{2} = 0.495$$

$$\text{Coefficient of Quartile Deviation} = \frac{11.57 - 10.58}{11.57 + 10.58} = 0.045$$
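
The last two steps can be sketched in Python as a quick check. The function names are illustrative, and the inputs are the rounded quartiles $Q_1 = 10.58$ and $Q_3 = 11.57$ from the solution rather than values recomputed from the table.

```python
def quartile_deviation(q1, q3):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    return (q3 - q1) / 2

def coefficient_of_quartile_deviation(q1, q3):
    """Unit-free relative measure: (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)

q1, q3 = 10.58, 11.57  # rounded values taken from the worked example
print(round(quartile_deviation(q1, q3), 3))
print(round(coefficient_of_quartile_deviation(q1, q3), 3))
```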

As a measure of variation, the range has the advantage of being easy to compute.
Its disadvantage, however, is that it uses only two entries from the data set.
The quartile deviation is a slightly better measure of absolute dispersion than the range, but it ignores the observations on the tails.
Two measures of variation that use all the entries in a data set are the variance and the standard deviation.
However, before you learn about these measures of variation, you need to know what is meant by the deviation of an entry in a data set.

DEVIATION

The deviation of an entry $x$ in a population data set is the difference between the entry and the mean $\mu$ of the data set.

Deviation of $x$ $= x - \mu$

Example: Find the deviation of each starting salary for Corporation A.
Solution: The mean starting salary is $\mu = \frac{415}{10} = 41.5$. To find out how much each salary deviates from the mean, subtract 41.5 from the salary.
For instance, the deviation of 41 is

$41 - 41.5 = -0.5$



Measures of Variation

Salary $x$    Deviation $x - \mu$
41            −0.5
38            −3.5
39            −2.5
45            3.5
47            5.5
41            −0.5
44            2.5
41            −0.5
37            −4.5
42            0.5
$\sum x = 415$    $\sum (x - \mu) = 0$
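The deviation table can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative and the salary units are whatever the slides intend.

```python
# Starting salaries from the table above.
salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]

# Population mean: sum of the entries divided by their count.
mu = sum(salaries) / len(salaries)

# Deviation of each entry from the mean, x - mu.
deviations = [x - mu for x in salaries]

for x, d in zip(salaries, deviations):
    print(f"{x:>3}  {d:+.1f}")

print("sum of x          =", sum(salaries))
print("sum of deviations =", sum(deviations))
```

The deviations always sum to zero, which is why the measures that follow use absolute or squared deviations.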



Measures of Variation

Mean Deviation
The mean deviation is defined as the mean of the absolute
deviations of observations from some suitable average, which may
be the arithmetic mean, the median or the mode.
\[ \text{M.D} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} \]
For a frequency distribution, the mean deviation is given by
\[ \text{M.D} = \frac{\sum_{i=1}^{n} f_i |X_i - \bar{X}|}{\sum_{i=1}^{n} f_i} \]


Measures of Variation

\[ \text{M.D} = \frac{\sum_{i=1}^{n} f_i |X_i - \text{Median}|}{\sum_{i=1}^{n} f_i} \]
\[ \text{M.D} = \frac{\sum_{i=1}^{n} f_i |X_i - \text{Mode}|}{\sum_{i=1}^{n} f_i} \]



Measures of Variation

Coefficient of the Mean Deviation

A relative measure of dispersion based on the mean deviation is
called the coefficient of the mean deviation:
\[ \text{Coefficient of M.D} = \frac{\text{Mean Deviation from Mean}}{\text{Mean}} \]


Measures of Variation

The mean deviation is a better measure of absolute
dispersion than the range and the quartile deviation.
A drawback of the mean deviation is that it uses the absolute
deviations $|X - \text{average}|$, and discarding the signs in this way
does not seem logical. The signs have to be discarded because
$\sum_{i=1}^{n} (X_i - \bar{X})$ is always equal to zero.



Measures of Variation

Example

Calculate the mean deviation from the mean and its coefficient for
the following data.

Size of items    Frequency
3-4              3
4-5              7
5-6              22
6-7              60
7-8              85
8-9              32
9-10             8
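The slides leave this example unsolved. A possible way to work it, assuming each class is represented by its midpoint (an assumption, since the slides do not say so for this table), is sketched below in Python.

```python
# Class limits and frequencies from the table above.
classes = [(3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9), (9, 10)]
freqs = [3, 7, 22, 60, 85, 32, 8]

# Assumption: each class is represented by its midpoint.
mids = [(lo + hi) / 2 for lo, hi in classes]

n = sum(freqs)

# Mean of the frequency distribution: sum(f * X) / sum(f).
mean = sum(f * x for f, x in zip(freqs, mids)) / n

# Mean deviation from the mean: sum(f * |X - mean|) / sum(f).
mean_dev = sum(f * abs(x - mean) for f, x in zip(freqs, mids)) / n

# Coefficient of M.D: mean deviation from the mean divided by the mean.
coeff_md = mean_dev / mean

print(f"mean             = {mean:.4f}")
print(f"mean deviation   = {mean_dev:.4f}")
print(f"coefficient of MD = {coeff_md:.4f}")
```

Under the midpoint assumption the mean is about 7.09, the mean deviation about 0.915, and the coefficient about 0.129.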



Measures of Variation

Example: Comparison of Outdoor Paint


A testing lab wishes to test two experimental brands of outdoor
paint to see how long each will last before fading. The testing lab
makes 6 gallons of each paint to test. Since different chemical
agents are added to each group and only six cans are involved,
these two groups constitute two small populations. The results (in
months) are shown.

Brand A Brand B
10 35
60 45
50 30
30 35
40 40
20 25



Measures of Variation

The means are equal for both brands (35 months each).
We might conclude that both brands of paint last equally
well.
When the data sets are examined graphically, a somewhat
different conclusion might be drawn.



Measures of Variation

Examining Data Sets Graphically

Brand B performs more consistently; it is less variable.


Measures of Variation

Variance

The variance is the average of the squares of the distance each
value is from the mean. The symbol for the population variance is
$\sigma^2$ ($\sigma$ is the Greek letter sigma). The formula for the population
variance is
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}, \]
where
$X_i$ = individual value
$\mu$ = population mean
$N$ = population size



Measures of Variation

Standard Deviation

The standard deviation is the square root of the variance. The
symbol for the population standard deviation is $\sigma$.
The corresponding formula for the population standard deviation
is
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]



Measures of Variation

Example: Comparison of Outdoor Paint


Find the variance and standard deviation for the data set for
brand A paint
10, 60, 50, 30, 40, 20
Solution:
\[ \mu = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = 35 \]

Brand A $X$    $X - \mu$    $(X - \mu)^2$
10             −25          625
60             25           625
50             15           225
30             −5           25
40             5            25
20             −15          225
Total                       1750

\[ \sigma^2 = \frac{1750}{6} = 291.7 \]
\[ \sigma = \sqrt{291.7} = 17.1 \]
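The same steps (mean, squared deviations, average, square root) can be traced in Python. This is a minimal sketch that mirrors the columns of the table; the variable names are illustrative.

```python
import math

# Brand A paint data (months), treated as a small population.
brand_a = [10, 60, 50, 30, 40, 20]
N = len(brand_a)

# Step 1: population mean.
mu = sum(brand_a) / N

# Step 2: squared deviation of each value from the mean.
squared_devs = [(x - mu) ** 2 for x in brand_a]

# Step 3: population variance is the average squared deviation.
variance = sum(squared_devs) / N

# Step 4: population standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(f"mu = {mu}, sum of squares = {sum(squared_devs)}")
print(f"variance = {variance:.1f}, standard deviation = {std_dev:.1f}")
```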


Measures of Variation

Example
Find the variance and standard deviation for brand B paint data

35, 45, 30, 35, 40, 25

Solution:
\[ \sigma^2 = 41.7 \]
\[ \sigma = 6.5 \]
Since the standard deviation of brand A is 17.1 and the standard
deviation of brand B is 6.5, the data are more variable for brand
A.
In summary, when the means are equal, the larger the variance or
standard deviation is, the more variable the data are.
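The comparison of the two brands can also be checked with Python's standard `statistics` module, whose `pvariance` and `pstdev` functions divide by N as in the population formulas. The dictionary layout below is just an illustrative choice.

```python
import statistics

brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

# Collect (mean, population variance, population standard deviation) per brand.
summary = {}
for name, data in (("A", brand_a), ("B", brand_b)):
    summary[name] = (
        statistics.mean(data),
        statistics.pvariance(data),
        statistics.pstdev(data),
    )

for name, (mean, var, sd) in summary.items():
    print(f"Brand {name}: mean = {mean}, variance = {var:.1f}, sd = {sd:.1f}")
```

Both means are 35, while the standard deviations differ (about 17.1 versus 6.5), which matches the conclusion that brand A is more variable.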


Measures of Variation

Sample Variance and Standard Deviation

When computing the variance for a sample, one might expect the
following expression to be used:
\[ \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}, \]
where $\bar{X}$ is the sample mean and $n$ is the sample size.
This formula is not usually used, however, since in most cases the
purpose of calculating the statistic is to estimate the
corresponding parameter.


Measures of Variation

For example, the sample mean $\bar{X}$ is used to estimate the
population mean $\mu$. The expression
\[ \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} \]
does not give the best estimate of the population variance because
when the population is large and the sample is small, the variance
computed by this formula usually underestimates the population
variance.
Therefore, instead of dividing by $n$, find the variance of the
sample by dividing by $n - 1$, giving a slightly larger value and
a better (unbiased) estimate of the population variance.


Measures of Variation

Sample Variance and Sample Standard Deviation


The formula for the sample variance, denoted by $s^2$, is
\[ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}. \]

The standard deviation of a sample (denoted by $s$) is
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}, \]
where
$X_i$ = individual value
$\bar{X}$ = sample mean
$n$ = sample size
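To see the effect of dividing by n − 1 rather than n, the sketch below reuses the six brand A values, treating them as a sample purely for illustration (the slides treat them as a population). `statistics.variance` is the standard-library function that uses the n − 1 divisor.

```python
import statistics

# Illustration only: the brand A values treated as a sample.
sample = [10, 60, 50, 30, 40, 20]
n = len(sample)

x_bar = sum(sample) / n
sum_sq = sum((x - x_bar) ** 2 for x in sample)

divide_by_n = sum_sq / n    # the expression that tends to underestimate
s2 = sum_sq / (n - 1)       # sample variance with the n - 1 divisor
s = s2 ** 0.5               # sample standard deviation

print(f"dividing by n     : {divide_by_n:.1f}")
print(f"dividing by n - 1 : {s2:.1f}  (s = {s:.1f})")
print("statistics.variance:", statistics.variance(sample))
```

Dividing by n − 1 always gives the larger of the two values, and the gap shrinks as n grows.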

Measures of Variation

Population and Sample Variances for Grouped Data

\[ \sigma^2 = \frac{\sum_{i=1}^{N} f_i (X_i - \mu)^2}{\sum_{i=1}^{N} f_i}, \]
and
\[ s^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{\left(\sum_{i=1}^{n} f_i\right) - 1}, \]
where $\sigma^2$ is the population variance, $s^2$ is the sample variance,
and $X_i$ is the midpoint of a class.


Measures of Variation

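Shortcut Formulas for $\sigma^2$ and $s^2$

The shortcut formulas for computing the variance of a population
and of a sample are as follows; the standard deviation is the
square root of the variance in each case.
\[ \sigma^2 = \frac{N \left(\sum_{i=1}^{N} X_i^2\right) - \left(\sum_{i=1}^{N} X_i\right)^2}{N^2} \]
\[ s^2 = \frac{n \left(\sum_{i=1}^{n} X_i^2\right) - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)} \]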
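A quick numerical check that the shortcut formulas agree with the definitions, sketched in Python with the brand A paint values as test data (any data set would do):

```python
import math
import statistics

data = [10, 60, 50, 30, 40, 20]
n = len(data)

sum_x = sum(data)                   # sum of the values
sum_x2 = sum(x * x for x in data)   # sum of the squared values

# Shortcut formulas: only the two running sums are needed.
pop_var_shortcut = (n * sum_x2 - sum_x ** 2) / n ** 2
samp_var_shortcut = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

# Compare against the definition-based library functions.
print(pop_var_shortcut, statistics.pvariance(data))
print(samp_var_shortcut, statistics.variance(data))
print(math.isclose(pop_var_shortcut, statistics.pvariance(data)))
```

The shortcut form avoids computing the mean first, which is convenient for hand calculation.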


Measures of Variation

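Shortcut Formulas of $\sigma^2$ and $s^2$ for Grouped Data

\[ \sigma^2 = \frac{\left(\sum_{i=1}^{N} f_i\right) \left(\sum_{i=1}^{N} f_i \cdot X_i^2\right) - \left(\sum_{i=1}^{N} f_i \cdot X_i\right)^2}{\left(\sum_{i=1}^{N} f_i\right)^2} \]
and
\[ s^2 = \frac{\left(\sum_{i=1}^{n} f_i\right) \left(\sum_{i=1}^{n} f_i \cdot X_i^2\right) - \left(\sum_{i=1}^{n} f_i \cdot X_i\right)^2}{\left(\sum_{i=1}^{n} f_i\right) \left(\sum_{i=1}^{n} f_i - 1\right)} \]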


Measures of Variation

Example
Find the variance and the standard deviation for the frequency
distribution of the data. The data represent the number of miles
that 20 runners ran during one week.

Class        Frequency $f$    Midpoint $X$
5.5-10.5     1                8
10.5-15.5    2                13
15.5-20.5    3                18
20.5-25.5    5                23
25.5-30.5    4                28
30.5-35.5    3                33
35.5-40.5    2                38
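The slides do not show a solution for this example. One way to work it is with the grouped shortcut formula, assuming the 20 runners are a sample (so the n − 1 form applies); that choice is an assumption, and the population form is printed alongside for comparison.

```python
import math

# Frequencies and class midpoints from the table above.
freqs = [1, 2, 3, 5, 4, 3, 2]
mids = [8, 13, 18, 23, 28, 33, 38]

n = sum(freqs)                                        # total frequency
sum_fx = sum(f * x for f, x in zip(freqs, mids))      # sum of f * X
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids)) # sum of f * X^2

# Assumption: the runners are a sample, so divide by n(n - 1).
s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
s = math.sqrt(s2)

# Population form, for comparison only.
sigma2 = (n * sum_fx2 - sum_fx ** 2) / n ** 2

print(f"n = {n}, sum f*X = {sum_fx}, sum f*X^2 = {sum_fx2}")
print(f"sample variance = {s2:.1f}, sample sd = {s:.1f}")
print(f"population variance = {sigma2:.2f}")
```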


Measures of Variation

Comparing Variation in Different Populations

The coefficient of variation is used to compare variation for
values taken from different populations.
Coefficient of Variation (CV):
The coefficient of variation, denoted by CV, is the standard
deviation divided by the mean. The result is expressed as a
percentage.
For a sample,
\[ \text{CV} = \frac{s}{\bar{X}} \cdot 100\% \]
For a population,
\[ \text{CV} = \frac{\sigma}{\mu} \cdot 100\% \]


Measures of Variation

Example: Heights and Weights of Men

Using the sample height and weight data for the 40 males, we find
the statistics given in the table below. Find the coefficient of
variation for heights, then find the coefficient of variation for
weights, then compare the two results.
          Mean ($\bar{x}$)    Standard Deviation ($s$)
Height    68.34 in.           3.02 in.
Weight    172.55 lb           26.33 lb


Measures of Variation

Solution

\[ \text{Heights: } \text{CV} = \frac{s}{\bar{X}} \cdot 100\% = \frac{3.02\ \text{in.}}{68.34\ \text{in.}} \cdot 100\% = 4.42\% \]
\[ \text{Weights: } \text{CV} = \frac{s}{\bar{X}} \cdot 100\% = \frac{26.33\ \text{lb}}{172.55\ \text{lb}} \cdot 100\% = 15.26\% \]

Although the difference in the units makes it impossible to
compare the standard deviation of 3.02 in. to the standard
deviation of 26.33 lb, we can compare the coefficients of variation,
which have no units. We can see that heights (with CV = 4.42%)
have considerably less variation than weights (with
CV = 15.26%).
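The two coefficients can be verified with a small helper. The function name `coefficient_of_variation` is illustrative and not taken from the slides.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage."""
    return std_dev / mean * 100


# Summary statistics from the table above.
cv_height = coefficient_of_variation(3.02, 68.34)     # inches
cv_weight = coefficient_of_variation(26.33, 172.55)   # pounds

print(f"CV of heights: {cv_height:.2f}%")
print(f"CV of weights: {cv_weight:.2f}%")
```

Because the units cancel in the ratio, the two percentages are directly comparable.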


Measures of Variation

EMPIRICAL RULE

For data with a (symmetric) bell-shaped distribution, the
standard deviation has the following characteristics.
About 68% of the data lie within one standard deviation of
the mean.
About 95% of the data lie within two standard deviations of
the mean.
About 99.7% of the data lie within three standard deviations
of the mean.
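The three percentages come from the normal distribution. Assuming the data are exactly normal (a stronger condition than merely bell-shaped), the proportion within k standard deviations of the mean is erf(k/√2), which Python's `math.erf` evaluates directly. The helper name `normal_within` is illustrative.

```python
import math


def normal_within(k):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_within(k):.2%}")
```

The exact values are about 68.27%, 95.45% and 99.73%, which the Empirical Rule rounds to 68%, 95% and 99.7%.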


Measures of Variation

Chebyshev’s Theorem

The Empirical Rule applies only to (symmetric) bell-shaped
distributions.
What if the distribution is not bell-shaped, or what if the
shape of the distribution is not known?
Chebyshev’s theorem gives an inequality statement that
applies to all distributions.


Measures of Variation

Chebyshev’s Theorem

The portion of any data set lying within $k$ standard deviations
($k > 1$) of the mean is at least
\[ 1 - \frac{1}{k^2}. \]

$k = 2$: In any data set, at least $1 - \frac{1}{2^2} = \frac{3}{4}$, or 75%, of the
data lie within 2 standard deviations of the mean.
$k = 3$: In any data set, at least $1 - \frac{1}{3^2} = \frac{8}{9}$, or 88.89%, of the
data lie within 3 standard deviations of the mean.
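The bound is easy to tabulate for other values of k. The sketch below uses an illustrative helper, `chebyshev_lower_bound`, which rejects k ≤ 1 because the bound is not informative there.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound requires k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 2.5, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2%}")
```

For k = 2 and k = 3 this reproduces the 75% and 88.89% figures; both are lower than the 95% and 99.7% of the Empirical Rule because Chebyshev's theorem makes no assumption about shape.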


Measures of Variation

Example: Prices of Homes

The mean price of houses in a certain neighborhood is $50,000,
and the standard deviation is $10,000. Find the price range for
which at least 75% of the houses will sell.
Solution: Chebyshev’s theorem states that at least three-fourths, or 75%,
of the data values will fall within 2 standard deviations of the
mean. Thus,

$50,000 + 2($10,000) = $70,000

and

$50,000 − 2($10,000) = $30,000

Hence, at least 75% of all homes sold in the area will have a price
between $30,000 and $70,000.



Measures of Variation

Chebyshev’s theorem can be used to find the minimum percentage
of data values that will fall between two given values placed
symmetrically about the mean.


Measures of Variation

Example: Travel Allowances

A survey of local companies found that the mean amount of travel
allowance for executives was $0.25 per mile. The standard
deviation was $0.02. Using Chebyshev’s theorem, find the
minimum percentage of the data values that will fall between
$0.20 and $0.30.



Measures of Variation

Solution:
Subtract the mean from the larger value.

$0.30 − $0.25 = $0.05

Divide the difference by the standard deviation to get k.
\[ k = \frac{0.05}{0.02} = 2.5 \]
Use Chebyshev’s theorem to find the percentage.
\[ 1 - \frac{1}{k^2} = 1 - \frac{1}{2.5^2} = 0.84 \text{ or } 84\% \]
Hence, at least 84% of the data values will fall between $0.20 and
$0.30.
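Both Chebyshev examples follow the same two patterns: going from k to an interval, and going from an interval to k. A minimal Python sketch, with illustrative names throughout:

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


# Prices of homes: k = 2 is given, find the interval mean ± k * sd.
home_mean, home_sd = 50_000, 10_000
home_interval = (home_mean - 2 * home_sd, home_mean + 2 * home_sd)

# Travel allowances: the interval is given, find k and then the bound.
mean, sd = 0.25, 0.02
upper = 0.30
k = (upper - mean) / sd              # number of standard deviations to the endpoint
min_fraction = chebyshev_lower_bound(k)

print("home price interval:", home_interval)
print(f"k = {k:.1f}, at least {min_fraction:.0%} of allowances fall in the interval")
```

The second calculation relies on $0.20 and $0.30 being the same distance from the mean, as they are here.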

Measures of Variation

Z score

A standard score or z score tells how many standard deviations a
data value is above or below the mean for a specific distribution
of values. If a standard score is zero, then the data value is the
same as the mean.
A z score or standard score for a value is obtained by subtracting
the mean from the value and dividing the result by the standard
deviation.
For samples, $z = \frac{X - \bar{X}}{s}$
For populations, $z = \frac{X - \mu}{\sigma}$
The z score represents the number of standard deviations that a
data value falls above or below the mean.


Measures of Variation

Example: Test Scores


A student scored 65 on a calculus test that had a mean of 50 and
a standard deviation of 10; she scored 30 on a history test with a
mean of 25 and a standard deviation of 5. Compare her relative
positions on the two tests.
Solution:
For calculus the z score is
\[ z = \frac{65 - 50}{10} = 1.5 \]
For history the z score is
\[ z = \frac{30 - 25}{5} = 1.0 \]


Measures of Variation

The calculus score of 65 was 1.5 standard deviations
above the mean of 50.
The history score of 30 was 1.0 standard deviations
above the mean of 25.
Since the z score for calculus is larger, her relative position in
the calculus class is higher than her relative position in the
history class.
Note that if the z score is positive, the score is above the
mean. If the z score is 0, the score is the same as the mean.
And if the z score is negative, the score is below the mean.
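The comparison can be written as a one-line function. The name `z_score` is illustrative; the numbers are those of the test score example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


z_calculus = z_score(65, 50, 10)
z_history = z_score(30, 25, 5)

print("calculus z score:", z_calculus)
print("history z score: ", z_history)
print("higher relative position:",
      "calculus" if z_calculus > z_history else "history")
```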


Measures of Variation

Example
Find the z score for each test, and state which is higher.
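                      Test A            Test B
Score                 $X = 38$          $X = 94$
Mean                  $\bar{X} = 40$    $\bar{X} = 100$
Standard deviation    $s = 5$           $s = 10$

Solution:
For test A,
\[ z = \frac{38 - 40}{5} = -0.4 \]
For test B,
\[ z = \frac{94 - 100}{10} = -0.6 \]
Since −0.4 > −0.6, the score for test A is relatively higher than
the score for test B.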

Measures of Variation

When all data for a variable are transformed into z scores,
the resulting distribution will have a mean of 0 and a
standard deviation of 1.
A z score, then, is actually the number of standard deviations
each value is from the mean for a specific distribution.
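This property can be checked numerically. The sketch below standardizes the brand A paint values with the population formulas; any data set with a nonzero standard deviation would behave the same way.

```python
import statistics

data = [10, 60, 50, 30, 40, 20]

mu = statistics.mean(data)
sigma = statistics.pstdev(data)   # population standard deviation

# Transform every value into its z score.
z = [(x - mu) / sigma for x in data]

z_mean = statistics.mean(z)
z_sd = statistics.pstdev(z)

print("z scores:", [round(v, 2) for v in z])
print(f"mean of z scores = {z_mean:.6f}")
print(f"standard deviation of z scores = {z_sd:.6f}")
```

Up to floating-point rounding, the mean of the z scores is 0 and their standard deviation is 1.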
