1 Review of Statistics
Statistics
• the science and art of dealing with figures and
facts
• defined as the collection, presentation, analysis, and interpretation of numerical data collected from different sources.
Role of Statistics in Engineering
• The field of statistics deals with the collection, presentation, analysis, and use of data to make decisions, solve problems, and design products and processes.
• Because many aspects of engineering practice involve working with data, some knowledge of statistics is important to any engineer.
• Specifically, statistical techniques can be a powerful aid in designing new products and systems, improving existing designs, and designing, developing, and improving production processes.
[Figure: the steps in the engineering method. Source: Applied Statistics and Probability for Engineers by D. Montgomery & G. Runger]
Descriptive Statistics
Descriptive statistics is used to denote any of the many techniques used to summarize a set of data. In a sense,
we are using the data on members of a set to describe the set. The techniques are commonly classified as:
12. Cumulative Frequency – for a frequency distribution, the cumulative frequencies are calculated as running sums of the individual frequencies.
Hence the cumulative frequency of a value is its frequency plus frequencies of all smaller values.
The above table is called a Cumulative Frequency table. The graph of cumulative frequency versus
the upper class boundary is called an ogive.
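The running-sum idea can be sketched in a few lines of Python. The frequencies and upper class boundaries below are taken from the plant-height table of Example 2 in these notes; the variable names are illustrative assumptions only.

```python
from itertools import accumulate

# Class frequencies for the classes 20-29, 30-39, ..., 80-89 (Example 2 table).
frequencies = [3, 6, 14, 6, 5, 4, 2]
# Upper class boundaries of the same classes.
upper_boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]

# Each cumulative frequency is the class frequency plus all earlier frequencies.
cumulative = list(accumulate(frequencies))

# An ogive plots these (upper boundary, cumulative frequency) points.
ogive_points = list(zip(upper_boundaries, cumulative))
for boundary, cf in ogive_points:
    print(f"{boundary:5.1f}  {cf:3d}")
```

The last cumulative frequency always equals the total frequency, which is a quick check on the table.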
Statistical Terms
13. Relative Frequency Distribution
The relative frequency of a class is the frequency of the class divided by the total frequency of all classes.
Example: The relative frequency of the class 25-29 = f / ∑f = 10/40 = 0.25
Note: The sum of relative frequencies is 1.
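As a small illustration, the sketch below computes relative frequencies for the plant-height classes of Example 2 (not the 25-29 table above, whose full frequencies are not shown here). The dictionary layout is an assumption made for the example.

```python
# Class frequencies from the Example 2 plant-height table.
frequencies = {
    "20-29": 3, "30-39": 6, "40-49": 14, "50-59": 6,
    "60-69": 5, "70-79": 4, "80-89": 2,
}

total = sum(frequencies.values())

# Relative frequency = class frequency / total frequency.
relative = {cls: f / total for cls, f in frequencies.items()}

for cls, rf in relative.items():
    print(f"{cls}: {rf:.3f}")
print("sum of relative frequencies:", round(sum(relative.values()), 10))
```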
Shapes of Frequency Curves
Data is a collection of facts, such as numbers, words, measurements, observations, etc.
Types of Data:
1. Qualitative data – descriptive information that approximates and characterizes
2. Quantitative data – numerical information
Representation of Data:
1. Bar Graph – represents grouped data with rectangular bars whose lengths are proportional to the values that they represent. The bars can be plotted vertically or horizontally.
5. Dot Diagram – a convenient way to see any unusual data features for a small number of observations.
7. Grouped Frequency Distribution
8. Ogive – a graph of the upper class boundary versus the cumulative frequency
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
The sample variance is
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$
The sample standard deviation, s, is the positive square root of the sample variance.
The standard error of the mean is $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
The sample range is $r = \max(x_i) - \min(x_i)$
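The two forms of the sample variance (the definition and the computational shortcut) should give the same number. The sketch below checks this on a small made-up data set; the data values and the function name `sample_stats` are hypothetical and not from the source.

```python
import math

def sample_stats(xs):
    """Return mean, both variance forms, std dev, standard error, and range."""
    n = len(xs)
    mean = sum(xs) / n
    # Definition: sum of squared deviations from the mean, divided by n - 1.
    var_definition = sum((x - mean) ** 2 for x in xs) / (n - 1)
    # Shortcut: (sum of squares - (sum)^2 / n) / (n - 1).
    var_shortcut = (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)
    s = math.sqrt(var_definition)
    return {
        "mean": mean,
        "var_definition": var_definition,
        "var_shortcut": var_shortcut,
        "s": s,
        "std_error": s / math.sqrt(n),
        "range": max(xs) - min(xs),
    }

# Hypothetical illustration data (not from the notes).
stats = sample_stats([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

The shortcut form avoids computing the mean first, but it can lose precision when the values are large and close together, so the definition is usually the safer choice in code.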
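Quantile Formulas
Ungrouped Data:
Lower Quartile Location: $Q_1\ \mathrm{Loc} = \frac{N+1}{4}$
Upper Quartile Location: $Q_3\ \mathrm{Loc} = \frac{3(N+1)}{4}$
Median Location: $\mathrm{Median\ Loc} = \frac{N+1}{2}$
Percentile Location: $P_i\ \mathrm{Loc} = \frac{Ni}{100}$
Decile Location: $D_i\ \mathrm{Loc} = \frac{(N+1)i}{10}$
Grouped Data:
Lower Quartile: $Q_1 = l + \left(\frac{\frac{N}{4} - f_c}{f_Q}\right) w$
Upper Quartile: $Q_3 = l + \left(\frac{\frac{3N}{4} - f_c}{f_Q}\right) w$
Median: $\mathrm{Median} = l + \left(\frac{\frac{N}{2} - f_c}{f_Q}\right) w$
where:
$l$ − lower class boundary of the quartile class
$N$ − total frequency of the distribution
$f_c$ − cumulative frequency of the class before the quartile class
$f_Q$ − frequency of the quartile class
$w$ − class width
Grouped Data (deciles):
$D_i = l + \left(\frac{\frac{Ni}{10} - f_c}{f_D}\right) w$
where $f_D$ is the frequency of the decile class and $l$, $f_c$, and $w$ are defined as above for the decile class.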
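The grouped-data formulas all share one pattern: find the class in which the cumulative frequency first reaches the target count, then interpolate within that class. A minimal Python sketch of that pattern follows; the function name `grouped_quantile` and its parameters are assumptions for illustration, and the table is the plant-height distribution of Example 2.

```python
def grouped_quantile(lower_boundaries, frequencies, width, fraction):
    """Quantile of grouped data: l + ((fraction*N - f_c) / f) * w.

    fraction is 0.25 for Q1, 0.5 for the median, 0.75 for Q3,
    i/10 for the i-th decile, and so on.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = sum(frequencies)
    target = fraction * n
    cum_before = 0  # f_c: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if cum_before + f >= target:
            return lower + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("target not reached; check the frequency table")

# Plant-height table from Example 2 (class width 10).
lower_boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
frequencies = [3, 6, 14, 6, 5, 4, 2]

median = grouped_quantile(lower_boundaries, frequencies, 10, 0.5)
q1 = grouped_quantile(lower_boundaries, frequencies, 10, 0.25)
q3 = grouped_quantile(lower_boundaries, frequencies, 10, 0.75)
print(round(median, 1), round(q1, 1), round(q3, 1))
```

Passing the fraction as a parameter lets the same function serve quartiles, deciles, and the median.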
Raw data:
46 75 20 59
21 42 75 30
89 75 48 40
42 47 48 31
35 85 32 25
36 40 52 43
67 73 61 52
53 48 49 62
42 32 50 50
41 69 62 44
Sorted data (rank #, value in cm):
 #  cm    #  cm    #  cm    #  cm
 1  20   11  40   21  48   31  62
 2  21   12  41   22  48   32  62
 3  25   13  42   23  49   33  67
 4  30   14  42   24  50   34  69
 5  31   15  42   25  50   35  73
 6  32   16  43   26  52   36  75
 7  32   17  44   27  52   37  75
 8  35   18  46   28  53   38  75
 9  36   19  47   29  59   39  85
10  40   20  48   30  61   40  89
$\bar{x} = \frac{\sum x}{n} = 49.8$ cm
$\mathrm{Median\ Loc} = \frac{N+1}{2} = \frac{40+1}{2} = 20.5$ → the 20th and 21st term, Median = 48 cm
$Q_1\ \mathrm{Loc} = \frac{N+1}{4} = \frac{40+1}{4} = 10.25$ → the 10th and 11th term, $Q_1 = 40$ cm
$Q_3\ \mathrm{Loc} = \frac{3(N+1)}{4} = \frac{3(40+1)}{4} = 30.75$ → the 30th and 31st term, $Q_3 = \frac{61+62}{2} = 61.5$ cm
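The location rule used in this example can be sketched in Python as follows. It assumes the convention shown in the worked solution: when the location is fractional, the quantile is taken as the average of the two neighbouring terms (other textbooks and libraries interpolate differently). The function name `quantile_by_location` is a hypothetical helper.

```python
import math

def quantile_by_location(sorted_values, fraction):
    """Locate a quantile at fraction*(N+1) and average the neighbouring terms."""
    n = len(sorted_values)
    loc = fraction * (n + 1)          # 1-based location in the sorted list
    lower, upper = math.floor(loc), math.ceil(loc)
    if lower < 1 or upper > n:
        raise ValueError("location falls outside the data")
    # If loc is a whole number, lower == upper and the term itself is returned.
    return (sorted_values[lower - 1] + sorted_values[upper - 1]) / 2

# The 40 plant heights (cm) from this example, sorted.
heights = sorted([
    46, 75, 20, 59, 21, 42, 75, 30, 89, 75, 48, 40, 42, 47, 48, 31,
    35, 85, 32, 25, 36, 40, 52, 43, 67, 73, 61, 52, 53, 48, 49, 62,
    42, 32, 50, 50, 41, 69, 62, 44,
])

mean = sum(heights) / len(heights)
median = quantile_by_location(heights, 0.5)
q1 = quantile_by_location(heights, 0.25)
q3 = quantile_by_location(heights, 0.75)
print(round(mean, 1), median, q1, q3)
```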
Example 2. Prepare a frequency distribution starting with the shortest plant where class width = 10. Calculate the sample mean height, median, lower and upper quartiles of the distribution.
Heights of plants in cm (sorted):
 #  cm    #  cm    #  cm    #  cm
 1  20   11  40   21  48   31  62
 2  21   12  41   22  48   32  62
 3  25   13  42   23  49   33  67
 4  30   14  42   24  50   34  69
 5  31   15  42   25  50   35  73
 6  32   16  43   26  52   36  75
 7  32   17  44   27  52   37  75
 8  35   18  46   28  53   38  75
 9  36   19  47   29  59   39  85
10  40   20  48   30  61   40  89
Frequency distribution:
Class Interval   Class Boundary   Class mark (x)   Class frequency (f)   Cumulative frequency (cf)
20-29            19.5-29.5        24.5             3                     3
30-39            29.5-39.5        34.5             6                     9
40-49            39.5-49.5        44.5             14                    23
50-59            49.5-59.5        54.5             6                     29
60-69            59.5-69.5        64.5             5                     34
70-79            69.5-79.5        74.5             4                     38
80-89            79.5-89.5        84.5             2                     40
$\bar{x} = \frac{\sum fx}{n} = 50.5$ cm
$\mathrm{Med} = l + \left(\frac{\frac{N}{2} - f_c}{f_Q}\right) w = 39.5 + \left(\frac{20 - 9}{14}\right)(10) = 47.4$ cm
$Q_1 = l + \left(\frac{\frac{N}{4} - f_c}{f_Q}\right) w = 39.5 + \left(\frac{10 - 9}{14}\right)(10) = 40.2$ cm
$Q_3 = l + \left(\frac{\frac{3N}{4} - f_c}{f_Q}\right) w = 59.5 + \left(\frac{30 - 29}{5}\right)(10) = 61.5$ cm
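The tabulation step and the grouped mean can be reproduced with a short script. This is only a sketch: the class limits are built from the shortest plant (20 cm) with width 10 as the example asks, and the variable names are assumptions.

```python
# The 40 plant heights in cm.
heights = [
    46, 75, 20, 59, 21, 42, 75, 30, 89, 75, 48, 40, 42, 47, 48, 31,
    35, 85, 32, 25, 36, 40, 52, 43, 67, 73, 61, 52, 53, 48, 49, 62,
    42, 32, 50, 50, 41, 69, 62, 44,
]

width = 10
start = min(heights)                      # lower limit of the first class
n_classes = (max(heights) - start) // width + 1

# Count how many heights fall in each class: 20-29, 30-39, ...
frequencies = [0] * n_classes
for h in heights:
    frequencies[(h - start) // width] += 1

# Class mark = midpoint of the class limits, e.g. (20 + 29) / 2 = 24.5.
class_marks = [start + k * width + (width - 1) / 2 for k in range(n_classes)]

# Grouped mean: sum of f * x divided by n.
grouped_mean = sum(f * x for f, x in zip(frequencies, class_marks)) / len(heights)

for k, (f, x) in enumerate(zip(frequencies, class_marks)):
    lo = start + k * width
    print(f"{lo}-{lo + width - 1}  mark={x}  f={f}")
print("grouped mean:", grouped_mean)
```

The grouped mean (50.5 cm) differs slightly from the raw-data mean (49.8 cm) because every value in a class is replaced by the class mark.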
Example 3. A student collects a series of twelve groundwater samples from a well. To start, she measures
the dissolved oxygen concentration in six of these. Her observations in mg/L are: 8.8, 3.1, 4.2, 6.2, 7.6, 3.6.
$\bar{x} = 5.6$ mg/L, $s = 2.3$ mg/L, $s_{\bar{x}} = \frac{s}{\sqrt{n}} = 0.95$ mg/L
The additional observations in mg/L are: 5.2, 8.6, 6.3, 1.8, 6.8, 3.9.
Using all twelve observations: $\bar{x} = 5.5$ mg/L, $s = 2.2$ mg/L, $s_{\bar{x}} = 0.65$ mg/L
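The figures in Example 3 can be checked with Python's standard `statistics` module. The helper name `summarize` is an assumption; `statistics.mean` and `statistics.stdev` (which uses the n − 1 divisor) are standard-library functions.

```python
import math
import statistics

def summarize(xs):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    s = statistics.stdev(xs)          # divides by n - 1
    return statistics.mean(xs), s, s / math.sqrt(len(xs))

first_six = [8.8, 3.1, 4.2, 6.2, 7.6, 3.6]
additional = [5.2, 8.6, 6.3, 1.8, 6.8, 3.9]

mean6, s6, se6 = summarize(first_six)
mean12, s12, se12 = summarize(first_six + additional)

print(f"n=6 : mean={mean6:.1f}  s={s6:.1f}  se={se6:.2f}")
print(f"n=12: mean={mean12:.1f}  s={s12:.1f}  se={se12:.2f}")
```

The standard deviation barely changes when the sample doubles, but the standard error drops because it is divided by the square root of n.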
Example The weights in kg of milk deliveries to a processing plant are shown below:
a) Using class intervals of 5, tabulate this data in a frequency table with the minimum value as the
lower class limit of the first class interval.
b) Calculate the sample mean weight of the milk delivered based on the grouped data.
c) Find the median of the grouped data.
d) Find the modal class.
e) Find the sample standard deviation and standard error of the mean.
Solution:
Frequency / Tally table:
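$\bar{x} = \frac{\sum fx}{n} = \frac{1{,}775}{40} = 44.375$ kg
$\mathrm{Median} = l + \left(\frac{\frac{N}{2} - f_c}{f_Q}\right) w = 42.5 + \left(\frac{20 - 12}{19}\right)(5) = 44.605$ kg
$s^2 = \frac{\sum f(x - \bar{x})^2}{n-1} = 24.599$
$s = \sqrt{24.599} = 4.960$ kg
$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4.960}{\sqrt{40}} = 0.78$ kg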
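Since the frequency table for this example is not reproduced here, the sketch below only re-checks the arithmetic from the summary values quoted in the solution (Σfx = 1,775, n = 40, s² = 24.599, and the median-class values l = 42.5, f_c = 12, f_Q = 19, w = 5). The variable names are illustrative assumptions.

```python
import math

n = 40
sum_fx = 1775            # sum of f * x from the solution
variance = 24.599        # s^2 from the solution

mean = sum_fx / n
s = math.sqrt(variance)
std_error = s / math.sqrt(n)

# Median of grouped data: l + ((N/2 - f_c) / f_Q) * w
l, f_c, f_q, w = 42.5, 12, 19, 5
median = l + (n / 2 - f_c) / f_q * w

print(f"mean={mean:.3f} kg  median={median:.3f} kg")
print(f"s={s:.3f} kg  standard error={std_error:.2f} kg")
```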
61 70 81 76 79 75 76 58 31
All 36 observations (sorted):
 #  ℉    #  ℉
 1  31   19  68
 2  40   20  69
 3  45   21  70
 4  49   22  70
 5  52   23  70
 6  53   24  70
 7  57   25  72
 8  58   26  73
 9  58   27  75
10  60   28  75
11  61   29  76
12  61   30  76
13  63   31  78
14  66   32  79
15  67   33  80
16  67   34  81
17  67   35  83
18  67   36  84
$\mathrm{Median} = \frac{67 + 68}{2} = 67.5$
$Q_1 = \frac{58 + 60}{2} = 59$
$Q_3 = 75$
$IQR = Q_3 - Q_1 = 75 - 59 = 16$
$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{12.2}{\sqrt{36}} = 2.03$℉
Is 31℉ an outlier? YES! $Q_1 - 1.5\,IQR = 59 - 1.5(16) = 35$
Is 84℉ an outlier? NO! $Q_3 + 1.5\,IQR = 75 + 1.5(16) = 99$
With 31℉ removed (35 observations, sorted):
 #  ℉    #  ℉
 1  40   19  69
 2  45   20  70
 3  49   21  70
 4  52   22  70
 5  53   23  70
 6  57   24  72
 7  58   25  73
 8  58   26  75
 9  60   27  75
10  61   28  76
11  61   29  76
12  63   30  78
13  66   31  79
14  67   32  80
15  67   33  81
16  67   34  83
17  67   35  84
18  68
$\mathrm{Median} = 68$
$Q_1 = 60$
$Q_3 = 75$
$IQR = 75 - 60 = 15$
$s = 10.7$℉
$r = 84 - 40 = 44$℉
$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{10.7}{\sqrt{35}} = 1.81$℉
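The outlier check above amounts to comparing each value with the two limits Q1 − 1.5·IQR and Q3 + 1.5·IQR. A minimal sketch, taking Q1 = 59 and Q3 = 75 from the worked solution for all 36 observations rather than recomputing them (the variable names are assumptions):

```python
# The 36 temperatures (℉) from this example, sorted.
temps = [
    31, 40, 45, 49, 52, 53, 57, 58, 58, 60, 61, 61, 63, 66, 67, 67, 67, 67,
    68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81, 83, 84,
]

q1, q3 = 59, 75                 # quartiles from the worked solution
iqr = q3 - q1

lower_limit = q1 - 1.5 * iqr    # values below this are flagged as outliers
upper_limit = q3 + 1.5 * iqr    # values above this are flagged as outliers

outliers = [t for t in temps if t < lower_limit or t > upper_limit]
print("limits:", lower_limit, upper_limit)
print("outliers:", outliers)
```

Only 31℉ falls outside the limits; the maximum, 84℉, lies well inside the upper limit of 99.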
Box Plot – a graphical display that simultaneously describes several important features of the data set
such as center, spread, skewness, and outliers.
The following data are needed to construct a box plot:
1. Median or Second Quartile, 𝑄2
2. First Quartile, 𝑄1
3. Third Quartile, 𝑄3
4. Interquartile Range, IQR
Below is the box plot of Example 5.