Statistics
() Introduction 1 / 178
Probability and Statistics
Spring Semester 2025
Statistics is the science of collecting, organizing, analyzing and
interpreting data to assist in making decisions.
Types of Statistics
Population and Sample
Example
sample
We introduce a few more definitions and terms used in statistical
language.
Variable
There are two types of quantitative variables:
Discrete Variable
Continuous Variable
Data: The values of a variable recorded for one or more people or
things.
Qualitative data: Values of a qualitative variable
Quantitative data: Values of a quantitative variable
Discrete data: Values of a discrete variable
Continuous data: Values of a continuous variable
Each individual piece of data is called an observation, and the
collection of all observations for a particular variable is called a
data set.
Data Representation
Frequency Distribution
There are several methods of organization and presentation of
observed data which facilitate its interpretation and evaluation.
Example: The political party affiliations reported by the 40
students in a class are recorded as follows:
P N O N N N N N
P O N P O O N P
P N O P N N O N
P O P P P N O P
O N P N N N N P
A frequency table for these responses is given as follows:
Class x    Frequency f    Cumulative frequency    Relative frequency frel(x)
P          13             13                      13/40 = 0.325
N          18             31                      18/40 = 0.450
O          9              40                      9/40 = 0.225
Total      40                                     1
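The table above can be reproduced with a short tally. The following is a minimal sketch; the variable names are illustrative and not part of the original slides.

```python
from collections import Counter

# Responses copied from the example (P, N, O are the recorded party codes).
responses = (
    "P N O N N N N N "
    "P O N P O O N P "
    "P N O P N N O N "
    "P O P P P N O P "
    "O N P N N N N P"
).split()

counts = Counter(responses)
n = len(responses)

# Each row: (class, frequency, cumulative frequency, relative frequency).
table = []
cumulative = 0
for party in ["P", "N", "O"]:
    cumulative += counts[party]
    table.append((party, counts[party], cumulative, counts[party] / n))

for row in table:
    print(row)
```

The relative frequencies sum to 1, which is a quick check that no response was missed.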
Pie Chart
A pie chart is a disk divided into wedge-shaped pieces
proportional to the relative frequencies of the qualitative data.
The main steps for constructing a pie chart are as follows:
1. Obtain a relative-frequency distribution of the data.
2. Divide a disk into wedge-shaped pieces proportional to the
   relative frequencies.
3. Label the slices with the distinct values and their relative
   frequencies.
Bar Chart
A bar chart displays the distinct values of the qualitative data on
a horizontal axis and the relative frequencies (or frequencies or
percents) of those values on a vertical axis. The relative frequency
of each distinct value is represented by a vertical bar whose height
is equal to the relative frequency of that value. The bars should
be positioned so that they do not touch each other.
The main steps for constructing a bar chart are as follows:
1. Obtain a relative-frequency distribution of the data.
2. Draw a horizontal axis on which to place the bars and a
   vertical axis on which to display the relative frequencies.
3. For each distinct value, construct a vertical bar whose height
   equals the relative frequency of that value.
4. Label the bars with the distinct values, the horizontal axis
   with the name of the variable, and the vertical axis with
   "Relative frequency".
Exercise
Find the relative frequencies, and display this data set in a pie
chart and a bar chart.
Single value grouping
Limit grouping
70 64 99 55 64 89 87 65
62 38 67 70 60 69 78 39
75 56 71 51 99 68 95 86
57 53 47 50 55 81 80 98
51 36 63 66 85 79 83 70
Cutpoint grouping
Example: The following table gives the speeds, in miles per
hour, over 1/4 mile for 35 cheetahs. Use cutpoint grouping with 52
as the first cutpoint and classes of equal width 2.
Simple Bar Chart
A simple bar chart consists of horizontal or vertical bars of equal
width whose lengths are proportional to the values they represent.
Example: Draw a simple bar diagram to represent the turnover of
a company for 5 years.
Multiple Bar Chart
Component Bar Chart
Class Boundary
These numbers are used to separate the classes so that there are
no gaps in the frequency distribution.
Graphical Display for Quantitative Data
Histogram
A histogram displays the classes of the quantitative data on a
horizontal axis and the frequencies (relative frequencies, percents)
of those classes on a vertical axis. The frequency (relative
frequency, percent) of each class is represented by a vertical bar
whose height is equal to the frequency (relative frequency,
percent) of that class. The bars should be positioned so that they
touch each other.
For single-value grouping, we use the distinct values of the
observations to label the bars, with each such value centered
under its bar.
For limit grouping or cutpoint grouping, we use the lower
class limits (or, equivalently, lower class cutpoints) to label
the bars.
Note: Some statisticians and technologies use class marks or
class midpoints centered under the bars.
Dotplots
Dotplots are particularly useful for showing the relative positions
of the data in a data set.
Prices, in dollars, of 16 DVD players
Frequency Polygon
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Cumulative frequency
Less than 99.5 0
Less than 104.5 2
Less than 109.5 10
Less than 114.5 28
Less than 119.5 41
Less than 124.5 48
Less than 129.5 49
Less than 134.5 50
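The "less than" cumulative frequencies above can be checked against the 50 observations listed under Frequency Polygon. The following is a minimal sketch; it assumes the class boundaries 99.5, 104.5, ..., 134.5 from the table, and the variable names are illustrative.

```python
from bisect import bisect_left

data = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
sorted_data = sorted(data)

# bisect_left counts how many observations are strictly below each boundary.
cumulative = [bisect_left(sorted_data, b) for b in boundaries]

# Class frequencies are the differences of consecutive cumulative counts.
class_freqs = [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]

print(cumulative)
print(class_freqs)
```

Plotting the cumulative counts against the boundaries would give the cumulative frequency polygon (ogive).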
Measurement
Levels of measurement
Stem-and-Leaf Diagrams
Example1
Example2
Sometimes a data set may contain too many stems, with each
stem containing only a few leaves. In such cases, we may want to
condense the stem and leaf display by grouping the stems.
Example3
The following stem and leaf display is prepared for the number of
hours that 25 students spent working on computers during the
past month.
0 | 6
1 | 179
2 | 26
3 | 2478
4 | 15699
5 | 368
6 | 24457
7 |
8 | 56
Example4
3 | 1123334478999
4 | 0001111112222233667
Measures of Center
The Mean
The mean, also known as the arithmetic average, is the sum of the
values divided by the total number of values. The symbol \( \bar{X} \)
represents the sample mean:
\[ \bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum X}{n} \]
where n represents the total number of values in the sample. For
a population, the Greek letter \( \mu \) (mu) is used for the mean:
\[ \mu = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum X}{N} \]
where N represents the total number of values in the population.
Example: The data represent the number of days off per year for a
sample of individuals selected from nine different countries. Find
the mean.
Solution:
\[ \bar{X} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} \approx 30.7 \]
Hence, the mean number of days off is 30.7 days.
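The arithmetic in this solution can be verified directly. This is a minimal sketch using the nine values from the solution; the variable names are illustrative.

```python
# Days off per year for the nine sampled individuals (from the solution).
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]

total = sum(days_off)          # numerator of the mean formula
n = len(days_off)              # sample size
x_bar = total / n              # sample mean

print(total, n, round(x_bar, 1))
```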
Rounding rule for the mean: carry one more decimal place than is
present in the original set of values.
The procedure for finding the mean for grouped data assumes
that the mean of all the raw data values in each class is equal
to the midpoint of the class.
In reality, this is not true, since the average of the raw data
values in each class usually will not be exactly equal to the
midpoint.
However, using this procedure will give an acceptable
approximation of the mean, since some values fall above the
midpoint and other values fall below the midpoint for each
class, and the midpoint represents an estimate of all values in
the class.
Grouped Data
Example: The data represent the number of miles run during
one week for a sample of 20 runners
Example
Solution:
Here n1 = 50, n2 = 60, n3 = 50
x1 = 75, x2 = 60, x3 = 50
Example
The wages of 5 workers are Rs. 1950, 2000, 2050, 2060 and 2080.
Calculate the arithmetic mean by using the idea of change of
origin (short method)
Example
Geometric Mean
\[ G = \operatorname{Antilog}\left( \frac{\sum_{i=1}^{n} \log x_i}{n} \right) \]
The logarithm of the geometric mean is equal to the arithmetic
mean of the logarithms of the observations.
Example
Calculate the geometric mean from the following observations:
9.7, 0.0009, 178.7, 0.874, and 1238.
Solution:
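x_i       log x_i
9.7       0.9868
0.0009    -3.0458
178.7     2.2521
0.874     -0.0585
1238      3.0927
\( \sum_{i=1}^{5} \log x_i = 3.2273 \)
\[ G = \operatorname{Antilog}\left( \frac{\sum_{i=1}^{5} \log x_i}{n} \right) = \operatorname{Antilog}\left( \frac{3.2273}{5} \right) \approx 4.42 \]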
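The log-based calculation can be checked numerically. This is a minimal sketch; it assumes base-10 logarithms, as the tabulated values suggest, and the variable names are illustrative.

```python
import math

values = [9.7, 0.0009, 178.7, 0.874, 1238]

# Arithmetic mean of the base-10 logarithms of the observations.
log_sum = sum(math.log10(x) for x in values)
mean_log = log_sum / len(values)

# The antilog (10 raised to the mean log) gives the geometric mean.
g = 10 ** mean_log

print(round(log_sum, 4), round(g, 2))
```

Working with logarithms avoids forming the product of the raw values, which can overflow or underflow for long data sets.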
Harmonic Mean
The harmonic mean is the quotient of the number of given values and
the sum of the reciprocals of the given values.
For Ungrouped Data
\[ \text{Harmonic mean} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \]
Example
Calculate the harmonic mean of the numbers: 13.2, 14.2, 14.8,
15.2 and 16.1
Solution:
The harmonic mean is calculated as below:
x       1/x
13.2    0.0758
14.2    0.0704
14.8    0.0676
15.2    0.0658
16.1    0.0621
\( \sum \frac{1}{x} = 0.3417 \)
\[ \text{Harmonic mean} = \frac{5}{0.3417} = 14.63 \]
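The reciprocal sum and the resulting harmonic mean can be reproduced as follows. This is a minimal sketch; the cross-check uses `statistics.harmonic_mean` from the Python standard library.

```python
from statistics import harmonic_mean

values = [13.2, 14.2, 14.8, 15.2, 16.1]

# Sum of reciprocals, then n divided by that sum.
reciprocal_sum = sum(1 / x for x in values)
h = len(values) / reciprocal_sum

print(round(reciprocal_sum, 4), round(h, 2))

# Cross-check against the standard-library implementation.
h_check = harmonic_mean(values)
```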
Marks f
30-39 2
40-49 3
50-59 11
60-69 20
70-79 32
80-89 25
90-99 7
Solution
Marks    x       f     f/x
30-39    34.5    2     0.0580
40-49    44.5    3     0.0674
50-59    54.5    11    0.2018
60-69    64.5    20    0.3101
70-79    74.5    32    0.4295
80-89    84.5    25    0.2959
90-99    94.5    7     0.0741
\( \sum f = 100 \)      \( \sum \frac{f}{x} = 1.4368 \)
\[ \text{Harmonic mean} = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \frac{f_i}{x_i}} = \frac{100}{1.4368} = 69.60 \]
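For grouped data the same idea applies with class midpoints weighted by frequency. The following is a minimal sketch using the midpoints and frequencies from the solution table; the variable names are illustrative.

```python
# Class midpoints x and frequencies f from the marks table.
midpoints = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [2, 3, 11, 20, 32, 25, 7]

total_f = sum(freqs)                                      # sum of f
weighted_reciprocals = sum(f / x for f, x in zip(freqs, midpoints))  # sum of f/x

h_grouped = total_f / weighted_reciprocals

print(total_f, round(weighted_reciprocals, 4), round(h_grouped, 2))
```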
Median
The median of a data set is the measure of center that is the
middle value when the original data values are arranged in order
of increasing (or decreasing) magnitude.
If the number of values is odd, the median is the number
located in the exact middle of the list:
\[ \left( \frac{n+1}{2} \right)\text{th ordered value} \]
If the number of values is even, the median is found by
computing the mean of the two middle numbers:
\[ \text{average of the } \left( \frac{n}{2} \right)\text{th and } \left( \frac{n}{2} + 1 \right)\text{th ordered values} \]
Example
Solution:
First sort the values by arranging them in order:
\[ \text{Median} = \frac{0.73 + 1.10}{2} = 0.915 \]
Example
Solution:
First sort the values by arranging them in order:
Median = 0.73
\[ \text{Median} = l + \frac{h}{f}\left( \frac{n}{2} - c \right) \]
where
l = Lower limit of the median class
h = Size of the class interval of the median class
f = Frequency of the median class
n = Sum of the frequencies
c = Cumulative frequency before the median class.
Example
Solution
Here l = 60, h = 2, f = 250, n/2 = 262.5, and c = 65, so
\[ \text{Median} = l + \frac{h}{f}\left( \frac{n}{2} - c \right) = 60 + \frac{2}{250}(262.5 - 65) = 61.58 \]
Example
Groups Frequency
10-14 5
15-19 12
20-24 30
25-29 25
30-34 6
Solution
Here l = 19.5, h = 5, f = 30, n/2 = 39, and c = 17, so
\[ \text{Median} = l + \frac{h}{f}\left( \frac{n}{2} - c \right) = 19.5 + \frac{5}{30}(39 - 17) = 23.17 \]
Measures of Center Quartiles, Deciles, Percentiles
Introduction
We have learned that the median divides a set of data into two
equal parts. In the same way, there are other values which divide
a set of data into four, ten or one hundred equal parts.
Such values are referred to as quartiles, deciles and percentiles
respectively.
Quartiles
Example
20 28 29 30 36 37 39 42 53 54
55 58 61 67 68 70 74 81 82 93
Solution:
\[ Q_1 = \text{value of the } \frac{(n+1)}{4}\text{th item} = \text{value of the } \frac{(20+1)}{4}\text{th item} = 5.25\text{th item} \]
The value of the 5th item is 36 and that of the 6th item is 37. Thus
Q1 = 36 + 0.25(1) = 36.25.
\[ Q_2 = \text{value of the } \frac{2(n+1)}{4}\text{th item} = \text{value of the } \frac{2(20+1)}{4}\text{th item} = 10.5\text{th item} \]
The value of the 10th item is 54 and that of the 11th item is 55.
Thus Q2 = 54 + 0.5(1) = 54.5
\[ Q_3 = \text{value of the } \frac{3(n+1)}{4}\text{th item} = \text{value of the } \frac{3(20+1)}{4}\text{th item} = 15.75\text{th item} \]
The value of the 15th item is 68 and that of the 16th item is 70.
Thus Q3 = 68 + 0.75(2) = 69.5
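The position rule used for the quartiles can be written once and reused for deciles and percentiles. The following is a minimal sketch; the helper name `quantile_by_position` is hypothetical, and it interpolates linearly between neighbouring ordered values as the worked solution does.

```python
def quantile_by_position(sorted_values, k, parts):
    """Value of the k(n+1)/parts-th ordered item, with linear interpolation."""
    n = len(sorted_values)
    position = k * (n + 1) / parts      # 1-based position, possibly fractional
    lower = int(position)               # whole part of the position
    fraction = position - lower         # fractional part of the position
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    below = sorted_values[lower - 1]    # the lower-th ordered value
    above = sorted_values[lower]        # the next ordered value
    return below + fraction * (above - below)


data = [20, 28, 29, 30, 36, 37, 39, 42, 53, 54,
        55, 58, 61, 67, 68, 70, 74, 81, 82, 93]

q1 = quantile_by_position(data, 1, 4)
q2 = quantile_by_position(data, 2, 4)
q3 = quantile_by_position(data, 3, 4)
print(q1, q2, q3)
```

Calling the helper with `parts=10` or `parts=100` applies the same rule to deciles and percentiles. Other textbooks and software use slightly different position rules, so results may differ by a small amount.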
\[ Q_1 = l + \frac{h}{f}\left( \frac{n}{4} - c \right) \]
\[ Q_2 = l + \frac{h}{f}\left( \frac{n}{2} - c \right) \]
\[ Q_3 = l + \frac{h}{f}\left( \frac{3n}{4} - c \right) \]
where
l = Lower limit of the class
h = Size of the class interval
f = Frequency of the class
n = Sum of the frequencies
c = Cumulative frequency before the class.
Example
We will calculate the quartiles from the frequency distribution for
the weight of 120 students
Weight (lb) Frequency (f) Class Boundaries C. F
110 - 119 1 109.5 - 119.5 1
120 - 129 4 119.5 - 129.5 5
130 - 139 17 129.5 - 139.5 22
140 - 149 28 139.5 - 149.5 50
150 - 159 25 149.5 - 159.5 75
160 - 169 18 159.5 - 169.5 93
170 - 179 13 169.5 - 179.5 106
180 - 189 6 179.5 - 189.5 112
190 - 199 5 189.5 - 199.5 117
200 - 209 2 199.5 - 209.5 119
210 - 219 1 209.5 - 219.5 120
\( \sum f = n = 120 \)
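The grouped-data formula can be applied to this table mechanically. The following is a minimal sketch; the helper name `grouped_quantile` is hypothetical, the class boundaries are taken as 109.5, 119.5, ..., 209.5 with width 10, and the printed quartiles are what this sketch computes rather than values quoted from the slides.

```python
def grouped_quantile(lower_boundaries, width, freqs, fraction):
    """Apply l + (h / f) * (fraction * n - c) to the class containing the target."""
    n = sum(freqs)
    target = fraction * n               # e.g. n/4 for Q1, n/2 for Q2
    cumulative = 0                      # c: cumulative frequency before the class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (width / f) * (target - cumulative)
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


lower_boundaries = [109.5 + 10 * i for i in range(11)]
freqs = [1, 4, 17, 28, 25, 18, 13, 6, 5, 2, 1]

q1 = grouped_quantile(lower_boundaries, 10, freqs, 1 / 4)
q2 = grouped_quantile(lower_boundaries, 10, freqs, 2 / 4)
q3 = grouped_quantile(lower_boundaries, 10, freqs, 3 / 4)
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

Passing other fractions, such as 4/10 or 37/100, applies the same formula to deciles and percentiles of the same table.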
Deciles
The values which divide an array into ten equal parts are called
deciles. The first, second, ..., ninth deciles are denoted by
\( D_1, D_2, \ldots, D_9 \) respectively. \( D_1 \) is the point which has
\( \frac{1}{10} \) of the observations below it.
The fifth decile \( D_5 \) corresponds to the median.
Example
20 28 29 30 36 37 39 42 53 54
55 58 61 67 68 70 74 81 82 93
Solution:
\[ D_2 = \text{value of the } \frac{2(n+1)}{10}\text{th item} = \frac{2(20+1)}{10}\text{th item} = 4.2\text{th item} \]
The value of the 4th item is 30 and that of the 5th item is 36.
Thus D2 = 30 + 0.2(6) = 31.2
\[ D_7 = \text{value of the } \frac{7(n+1)}{10}\text{th item} = \frac{7(20+1)}{10}\text{th item} = 14.7\text{th item} \]
The value of the 14th item is 67 and that of the 15th item is 68.
Thus D7 = 67 + 0.7(1) = 67.7
Example
We will calculate fourth and ninth deciles from the frequency
distribution of weights of 120 students
Weight (lb) Frequency (f) Class Boundaries C. F
110 - 119 1 109.5 - 119.5 1
120 - 129 4 119.5 - 129.5 5
130 - 139 17 129.5 - 139.5 22
140 - 149 28 139.5 - 149.5 50
150 - 159 25 149.5 - 159.5 75
160 - 169 18 159.5 - 169.5 93
170 - 179 13 169.5 - 179.5 106
180 - 189 6 179.5 - 189.5 112
190 - 199 5 189.5 - 199.5 117
200 - 209 2 199.5 - 209.5 119
210 - 219 1 209.5 - 219.5 120
\( \sum f = n = 120 \)
Percentiles
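\[ P_1 = \text{value of the } \frac{(n+1)}{100}\text{th item} \]
\[ P_2 = \text{value of the } \frac{2(n+1)}{100}\text{th item} \]
\[ \vdots \]
\[ P_{99} = \text{value of the } \frac{99(n+1)}{100}\text{th item} \]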
Example
20 28 29 30 36 37 39 42 53 54
55 58 61 67 68 70 74 81 82 93
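Solution:
\[ P_{15} = \text{value of the } \frac{15(n+1)}{100}\text{th item} = \frac{15(20+1)}{100}\text{th item} = 3.15\text{th item} \]
The value of the 3rd item is 29 and that of the 4th item is 30.
Thus P15 = 29 + 0.15(1) = 29.15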
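\[ P_1 = l + \frac{h}{f}\left( \frac{n}{100} - c \right) \]
\[ P_2 = l + \frac{h}{f}\left( \frac{2n}{100} - c \right) \]
\[ \vdots \]
\[ P_{99} = l + \frac{h}{f}\left( \frac{99n}{100} - c \right) \]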
Example
We will calculate thirty-seventh, forty-fifth and ninetieth
percentile from the frequency distribution of weights of 120
students
Weight (lb) Frequency (f) Class Boundaries C. F
110 - 119 1 109.5 - 119.5 1
120 - 129 4 119.5 - 129.5 5
130 - 139 17 129.5 - 139.5 22
140 - 149 28 139.5 - 149.5 50
150 - 159 25 149.5 - 159.5 75
160 - 169 18 159.5 - 169.5 93
170 - 179 13 169.5 - 179.5 106
180 - 189 6 179.5 - 189.5 112
190 - 199 5 189.5 - 199.5 117
200 - 209 2 199.5 - 209.5 119
210 - 219 1 209.5 - 219.5 120
\( \sum f = n = 120 \)
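Solution
\[ P_{37} = l + \frac{h}{f}\left( \frac{37n}{100} - c \right) = 139.5 + \frac{10}{28}(44.4 - 22) = 147.5 \text{ pounds} \]
\[ P_{45} = l + \frac{h}{f}\left( \frac{45n}{100} - c \right) = 149.5 + \frac{10}{25}(54 - 50) = 151.1 \text{ pounds} \]
\[ P_{90} = l + \frac{h}{f}\left( \frac{90n}{100} - c \right) = 179.5 + \frac{10}{6}(108 - 106) = 182.83 \text{ pounds} \]
From P37, P45, and P90, we conclude that 37% of the students weigh
147.5 pounds or less. Similarly, 45% of the students weigh 151.1
pounds or less and 90% of the students weigh 182.83 pounds or less.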
Mode
The mode of a data set is the value that occurs most frequently.
When two values occur with the same greatest frequency,
each one is a mode and the data set is bimodal.
When more than two values occur with the same greatest
frequency, each is a mode and the data set is said to be
multimodal.
When no value is repeated, we say that there is no mode.
Example
Example
Grouped Data
Solution:
The modal class is 20.5 − 25.5, since it has the largest
frequency.
Sometimes the midpoint of the class is used rather than the
boundaries.
Hence, the mode could also be given as 23 miles per week.
Figure: Symmetric (Zero Skewness): The mean, median, and mode are
the same.
Arithmetic Mean
Geometric Mean
Harmonic Mean
Median
It is easy to calculate.
It is not affected by extreme values.
In a highly skewed distribution, median is an appropriate
average to use.
Disadvantages
It is not rigorously defined.
It necessitates the arrangement of data into an array which
can be tedious and time consuming for a large body of data.
Mode
RANGE
The range of a data set is the difference between the maximum
and minimum data entries in the set. To find the range, the data
must be quantitative.
Salary |41 38 39 45 47 41 44 41 37 42
Solution:
Range = 47 − 37 = 10
Grouped Data
Quartile Deviation
Example
Solution
Lower Quartile
\[ Q_1 = \text{value of the } \left( \frac{n}{4} \right)\text{th item} = \text{value of the } \left( \frac{60}{4} \right)\text{th item} = 15\text{th item} \]
Q1 lies in the class 10.25 − 10.75
\[ Q_1 = l + \frac{h}{f}\left( \frac{n}{4} - c \right) \]
where l = 10.25, h = 0.5, f = 12, n/4 = 15 and c = 7
Q1 = 10.58
Upper Quartile
\[ Q_3 = \text{value of the } \left( \frac{3n}{4} \right)\text{th item} = \text{value of the } \left( \frac{3 \times 60}{4} \right)\text{th item} = 45\text{th item} \]
Q3 lies in the class 11.25 − 11.75
\[ Q_3 = l + \frac{h}{f}\left( \frac{3n}{4} - c \right) \]
where l = 11.25, h = 0.5, f = 14, 3n/4 = 45 and c = 36
Q3 = 11.57
Measures of Variation
\[ Q.D = \frac{Q_3 - Q_1}{2} = \frac{11.57 - 10.58}{2} = 0.495 \]
\[ \text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{11.57 - 10.58}{11.57 + 10.58} = 0.045 \]
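The quartile deviation and its coefficient follow from the two quartiles computed above. This is a minimal sketch that plugs in the stated values of l, h, f and c; the variable names are illustrative, and small differences from the slide figures come from rounding Q1 and Q3 to two decimals before subtracting.

```python
n = 60

# Lower quartile: class 10.25 - 10.75, with l, h, f, c as stated in the solution.
q1 = 10.25 + (0.5 / 12) * (n / 4 - 7)

# Upper quartile: class 11.25 - 11.75.
q3 = 11.25 + (0.5 / 14) * (3 * n / 4 - 36)

quartile_deviation = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)

print(round(q1, 2), round(q3, 2))
print(round(quartile_deviation, 3), round(coefficient, 3))
```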
DEVIATION
Deviation of x = x − µ
For example, with µ = 41.5, the deviation of 41 is 41 − 41.5 = −0.5.
Salary x    Deviation x − µ
41          -0.5
38          -3.5
39          -2.5
45          3.5
47          5.5
41          -0.5
44          2.5
41          -0.5
37          -4.5
42          0.5
\( \sum x = 415 \)      \( \sum (x - \mu) = 0 \)
Mean Deviation
The mean deviation is defined as the mean of the absolute
deviations of observations from some suitable average which may
be the arithmetic mean, the median or the mode.
\[ M.D = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} \]
For a frequency distribution, the mean deviation is given by
\[ M.D = \frac{\sum_{i=1}^{n} f_i |X_i - \bar{X}|}{\sum_{i=1}^{n} f_i} \]
When the deviations are taken from the median:
\[ M.D = \frac{\sum_{i=1}^{n} f_i |X_i - \text{Median}|}{\sum_{i=1}^{n} f_i} \]
When the deviations are taken from the mode:
\[ M.D = \frac{\sum_{i=1}^{n} f_i |X_i - \text{Mode}|}{\sum_{i=1}^{n} f_i} \]
Example
Calculate the mean deviation from the mean and its coefficient for
the following data.
Brand A Brand B
10 35
60 45
50 30
30 35
40 40
20 25
Variance
Standard Deviation
For brand A:
\[ \sigma^2 = \frac{1750}{6} = 291.7 \]
\[ \sigma = \sqrt{291.7} = 17.1 \]
Example
Find the variance and standard deviation for the brand B paint data.
Solution:
\( \sigma^2 = 41.7 \)
\( \sigma = 6.5 \)
Since the standard deviation of brand A is 17.1 and the standard
deviation of brand B is 6.5, the data are more variable for brand
A.
In summary, when the means are equal, the larger the variance or
standard deviation is, the more variable the data are.
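The figures for both brands can be reproduced from the Brand A / Brand B table. This is a minimal sketch that treats each column as a full population (dividing by N), as the slide values do; it uses `pvariance` and `pstdev` from the Python standard library.

```python
from statistics import mean, pstdev, pvariance

brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

# Both brands have the same mean, so their spreads are directly comparable.
mean_a, mean_b = mean(brand_a), mean(brand_b)

# Population variance and standard deviation (divide by N, not N - 1).
var_a, sd_a = pvariance(brand_a), pstdev(brand_a)
var_b, sd_b = pvariance(brand_b), pstdev(brand_b)

print(mean_a, round(var_a, 1), round(sd_a, 1))
print(mean_b, round(var_b, 1), round(sd_b, 1))
```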
When computing the variance for a sample, one might expect the
following expression to be used:
\[ \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}, \]
where \( \bar{X} \) is the sample mean and n is the sample size.
This formula is not usually used, however, since in most cases the
purpose of calculating the statistic is to estimate the
corresponding parameter.
\[ \sigma^2 = \frac{\sum_{i=1}^{N} f_i (X_i - \mu)^2}{\sum_{i=1}^{N} f_i}, \]
and
\[ s^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{n} f_i - 1}, \]
\[ \sigma^2 = \frac{\sum_{i=1}^{N} f_i \left( \sum_{i=1}^{N} f_i \cdot X_i^2 \right) - \left( \sum_{i=1}^{N} f_i \cdot X_i \right)^2}{\left( \sum_{i=1}^{N} f_i \right)^2} \]
and
\[ s^2 = \frac{\sum_{i=1}^{n} f_i \left( \sum_{i=1}^{n} f_i \cdot X_i^2 \right) - \left( \sum_{i=1}^{n} f_i \cdot X_i \right)^2}{\sum_{i=1}^{n} f_i \left( \sum_{i=1}^{n} f_i - 1 \right)} \]
Example
Find the variance and the standard deviation for the frequency
distribution of the data. The data represent the number of miles
that 20 runners ran during one week.
Using the sample height and weight data for the 40 males, we find
the statistics given in the table below. Find the coefficient of
variation for heights, then find the coefficient of variation for
weights, then compare the two results.
          Mean (\( \bar{x} \))    Standard Deviation (s)
Height    68.34 in.               3.02 in.
Weight    172.55 lb               26.33 lb
Solution
\[ \text{Heights: } CV = \frac{s}{\bar{X}} \cdot 100\% = \frac{3.02 \text{ in.}}{68.34 \text{ in.}} \cdot 100\% = 4.42\% \]
\[ \text{Weights: } CV = \frac{s}{\bar{X}} \cdot 100\% = \frac{26.33 \text{ lb}}{172.55 \text{ lb}} \cdot 100\% = 15.26\% \]
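Because the units cancel, the two coefficients of variation can be compared directly. This is a minimal sketch; the helper name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Return the standard deviation as a percentage of the mean."""
    return std_dev / mean_value * 100


cv_height = coefficient_of_variation(3.02, 68.34)     # inches cancel
cv_weight = coefficient_of_variation(26.33, 172.55)   # pounds cancel

print(round(cv_height, 2), round(cv_weight, 2))
```

On this data the weights vary considerably more, relative to their mean, than the heights do.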
EMPIRICAL RULE
Chebyshev’s Theorem
$50,000 + 2($10,000) = $70,000
and
$50,000 − 2($10,000) = $30,000
Hence, at least 75% of all homes sold in the area will have a price
in the range from $30,000 to $70,000.
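The 75% figure comes from Chebyshev's bound 1 − 1/k² with k = 2. The following is a minimal sketch; the helper name is hypothetical, and the mean of $50,000 and standard deviation of $10,000 are read off the calculation above.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2


mean_price = 50_000
std_dev = 10_000
k = 2

proportion = chebyshev_lower_bound(k)
interval = (mean_price - k * std_dev, mean_price + k * std_dev)

print(proportion, interval)
```

The bound holds for any distribution, which is why it is weaker than the empirical rule for bell-shaped data.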
Solution:
Subtract the mean from the larger value.
Z score
Example
Find the z score for each test, and state which is higher.
Test A                  Test B
X = 38                  X = 94
\( \bar{X} \) = 40      \( \bar{X} \) = 100
s = 5                   s = 10
Solution:
For test A,
\[ z = \frac{38 - 40}{5} = -0.4 \]
For test B,
\[ z = \frac{94 - 100}{10} = -0.6 \]
The score for test A is relatively higher than the score for test B.
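The comparison can be checked with a short calculation. This is a minimal sketch; the helper name `z_score` is hypothetical.

```python
def z_score(value, mean_value, std_dev):
    """Number of standard deviations by which value lies above the mean."""
    return (value - mean_value) / std_dev


z_a = z_score(38, 40, 5)      # test A
z_b = z_score(94, 100, 10)    # test B

print(z_a, z_b)
```

Both scores are below their respective means, but the test A score is closer to its mean in standard-deviation units.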