1 Review of Statistics

1. Statistics is the science of dealing with numerical data through collection, presentation, analysis, and interpretation. It is important in engineering for analyzing data and making decisions.
2. Descriptive statistics summarizes data through graphical, tabular, and parametric methods, including measures of central tendency, variability, and location.
3. Key statistical terms are defined, including population/sample, frequency distribution, relative frequency, measures of central tendency/variability, and quantiles. Formulas are given for calculating these from grouped and ungrouped data.

Uploaded by

Jay Rael
Copyright: © All Rights Reserved

Lesson 1 – Review of Statistics

Statistics
• the science and art of dealing with figures and facts
• defined as the collection, presentation, analysis, and interpretation of numerical data collected from different sources
Role of Statistics in Engineering
• The field of statistics deals with the collection, presentation, analysis, and use of data to make decisions, solve problems, and design products and processes.
• Because many aspects of engineering practice involve working with data, some knowledge of statistics is important to any engineer.
• Specifically, statistical techniques can be a powerful aid in designing new products and systems, improving existing designs, and designing, developing, and improving production processes.
[Figure: the steps in the engineering method. Source: Applied Statistics and Probability for Engineers by D. Montgomery & G. Runger]
Descriptive Statistics
Descriptive statistics is used to denote any of the many techniques used to summarize a set of data. In a sense, we are using the data on members of a set to describe the set. The techniques are commonly classified as:

1. Graphical description in which we use graphs to summarize data.
2. Tabular description in which we use tables to summarize data.
3. Parametric description in which we estimate the values of certain parameters which we assume to complete the description of the set of data.

A. Measures of Central Tendency:
1. Mean
2. Median
3. Mode

B. Measures of Location (Quantiles):
1. Quartiles
2. Deciles
3. Percentiles

C. Measures of Variability:
1. Range
2. Quartile Deviation (Semi-Interquartile Range)
3. Mean Deviation
4. Variance
5. Standard Deviation
Statistical Terms
1. Raw data – data that have not been organized numerically.
2. Array – an arrangement of numerical data in ascending or descending order of magnitude.
3. Range – the difference between the largest and the smallest numbers in a data set.
4. Class interval – a range of values used to group data; e.g., if the data are grouped as 21-30, 31-40, etc., then 21-30 is a class interval.
5. Class limits – in the class interval 21-30, the values 21 and 30 are the class limits.
6. Lower class limit (l.c.l.) – in the class interval 21-30, the lower class limit is 21.
7. Upper class limit (u.c.l.) – in the class interval 21-30, the upper class limit is 30.
8. Lower and upper class boundaries – in the class interval 21-30, the lower class boundary is 20.5 and the upper class boundary is 30.5. These boundaries assume that, theoretically, measurements for the class interval 21-30 include all the numbers from 20.5 to 30.5. Class boundaries are also known as exact limits.
9. Class width – in the class interval 21-30, the class width is the difference between the upper class boundary and the lower class boundary, i.e. 30.5 − 20.5 = 10. The class width is also known as class size or interval width/size.
10. Class mark or midpoint – in the class interval 21-30, the class mark is the average of the two class limits, i.e. (21 + 30)/2 = 25.5.
11. Frequency Distribution – large masses of raw data may be arranged in classes in tabular form with their corresponding frequencies.

12. Cumulative Frequency – for a frequency distribution, the cumulative frequencies are calculated by successively adding the individual frequencies.

Hence the cumulative frequency of a value is its frequency plus the frequencies of all smaller values. A table of these values is called a cumulative frequency table. The graph of cumulative frequency versus the upper class boundary is called an ogive.
13. Relative Frequency Distribution

The relative frequency of a class is the frequency of the class divided by the total frequency of all classes.
Example: The relative frequency of the class 25-29 = f /∑ f = 10/ 40 = 0.25
Note: The sum of relative frequencies is 1.
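The class terms above can be sketched in a few lines of Python. This is only an illustration: the function name describe_classes and the dictionary keys are made up for this sketch, and the class limits and frequencies are the ones tabulated in Example 2 of this lesson. It assumes whole-number class limits, so the boundaries sit half a unit outside the limits.

```python
# Class limits and frequencies as tabulated in Example 2 of this lesson.
class_limits = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
frequencies = [3, 6, 14, 6, 5, 4, 2]


def describe_classes(limits, freqs):
    """Return one dict per class: boundaries, width, mark, f, cf, relative f."""
    total = sum(freqs)
    # Gap between one class's upper limit and the next class's lower limit
    # (1 for whole-number limits such as 20-29, 30-39).
    gap = limits[1][0] - limits[0][1] if len(limits) > 1 else 1
    rows = []
    running = 0  # cumulative frequency so far
    for (lower, upper), f in zip(limits, freqs):
        running += f
        lcb = lower - gap / 2  # lower class boundary
        ucb = upper + gap / 2  # upper class boundary
        rows.append({
            "interval": f"{lower}-{upper}",
            "boundaries": (lcb, ucb),
            "width": ucb - lcb,
            "mark": (lower + upper) / 2,
            "f": f,
            "cf": running,
            "rel_f": f / total,
        })
    return rows


table = describe_classes(class_limits, frequencies)
for row in table:
    print(row)
```

Plotting each upper class boundary against its "cf" value would give the points of the ogive; the relative frequencies add up to 1, as the note above states.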
Shapes of Frequency Curves
Data is a collection of facts, such as numbers, words, measurements, observations, etc.

Types of Data:
1. Qualitative data – descriptive information; approximates and characterizes
2. Quantitative data – numerical information

Types of Quantitative Data:
1. Discrete data – countable
2. Continuous data – measurable, variable

Representation of Data:
1. Bar Graph – represents grouped data with rectangular bars with lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.
2. Histogram – a diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval.
3. Line Graph – a graph that is represented by a series of data points connected by straight line segments.
4. Pie Chart – a graph in which a circle is divided into sectors that each represent a proportion of the whole.
5. Dot Diagram – a convenient way to see any unusual data features for a small number of observations.
6. Frequency Distribution – a frequency table in which the collected data are arranged in ascending or descending order of magnitude with their corresponding frequencies.
7. Grouped Frequency Distribution
8. Ogive – a graph of the upper class boundary versus the cumulative frequency

Note: From the cumulative frequency data, the first plotting point is (24.5, 4). If we started our graph at this point, it would remain hanging above the horizontal axis. We create another point (19.5, 0) as a starting point. 19.5 is the projected upper class boundary of the preceding class.
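Formulas for ungrouped data
If the n observations in a sample are denoted by $x_1, x_2, \ldots, x_n$, then the sample mean is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Population mean for N number of observations:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

The sample variance is

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$$

Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

The sample standard deviation, s, is the positive square root of the sample variance.
The standard error of the mean is

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

The sample range is $r = \max(x_i) - \min(x_i)$.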
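As a quick check on the formulas above, here is a minimal Python sketch. The function names are chosen for this illustration only, and the data are the eight pH readings from Example 4 of this lesson. Both forms of the sample variance are written out so they can be compared.

```python
import math


def sample_mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Definitional form: sum of squared deviations divided by n - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def sample_variance_shortcut(xs):
    """Computational form: (sum of x^2 - (sum of x)^2 / n) divided by n - 1."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def standard_error(xs):
    """Sample standard deviation divided by the square root of n."""
    return math.sqrt(sample_variance(xs)) / math.sqrt(len(xs))


def sample_range(xs):
    return max(xs) - min(xs)


# pH readings from Example 4 of this lesson.
ph = [7.15, 7.20, 7.18, 7.19, 7.21, 7.20, 7.16, 7.18]

print("mean     :", round(sample_mean(ph), 2))
print("variance :", sample_variance(ph))
print("shortcut :", sample_variance_shortcut(ph))
print("std dev  :", round(math.sqrt(sample_variance(ph)), 2))
print("range    :", round(sample_range(ph), 2))
```

The two variance forms agree up to floating-point rounding. The shortcut form subtracts two large, nearly equal numbers, so the definitional form is usually the safer choice in code.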
Quantile Formulas
Ungrouped Data:

Lower Quartile Location: $Q_1\ \mathrm{Loc} = \frac{N+1}{4}$
Upper Quartile Location: $Q_3\ \mathrm{Loc} = \frac{3(N+1)}{4}$
Median Location: $\mathrm{Median\ Loc} = \frac{N+1}{2}$
Percentile Location: $P_i\ \mathrm{Loc} = \frac{Ni}{100}$
Decile Location: $D_i\ \mathrm{Loc} = \frac{(N+1)\,i}{10}$

NOTE: The quantile is the value corresponding to the quantile location of the data arranged in ascending order.

Grouped Data:

Lower Quartile: $Q_1 = l + \left(\frac{\frac{N}{4} - f_c}{f_Q}\right) w$
Upper Quartile: $Q_3 = l + \left(\frac{\frac{3N}{4} - f_c}{f_Q}\right) w$
Median: $\mathrm{Median} = l + \left(\frac{\frac{N}{2} - f_c}{f_Q}\right) w$

where:
$l$ − lower class boundary of the quartile class
$N$ − total frequency of the distribution
$f_c$ − cumulative frequency before the quartile class
$f_Q$ − frequency of the quartile class
$w$ − width of the class interval
Decile/Percentile Formulas

Grouped Data:

$D_i = l + \left(\frac{\frac{Ni}{10} - f_c}{f_D}\right) w$

$P_i = l + \left(\frac{\frac{Ni}{100} - f_c}{f_P}\right) w$

where:
$l$ − lower class boundary of the decile/percentile class
$N$ − total frequency of the distribution
$f_c$ − cumulative frequency before the decile/percentile class
$f_D$ − frequency of the decile class
$f_P$ − frequency of the percentile class
$w$ − width of the class interval
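The grouped-data quartile, decile, and percentile formulas all share one shape: locate the class in which the cumulative frequency first reaches the target count, then interpolate inside it. The sketch below is one way to write that in Python. The function name grouped_quantile and its arguments are illustrative, and the class boundaries and frequencies are those tabulated in Example 2 of this lesson.

```python
def grouped_quantile(fraction, boundaries, freqs):
    """Interpolated quantile from a grouped frequency distribution.

    fraction   -- 1/4 for Q1, 1/2 for the median, 3/4 for Q3,
                  i/10 for the i-th decile, i/100 for the i-th percentile
    boundaries -- (lower, upper) class boundaries in ascending order
    freqs      -- class frequencies in the same order
    """
    n = sum(freqs)
    target = fraction * n  # N/4, N/2, 3N/4, Ni/10 or Ni/100
    fc = 0  # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and fc + f >= target:
            # l + ((target - fc) / f) * w
            return lower + (target - fc) / f * (upper - lower)
        fc += f
    raise ValueError("fraction must lie between 0 and 1")


# Class boundaries and frequencies as tabulated in Example 2 of this lesson.
boundaries = [(19.5, 29.5), (29.5, 39.5), (39.5, 49.5), (49.5, 59.5),
              (59.5, 69.5), (69.5, 79.5), (79.5, 89.5)]
freqs = [3, 6, 14, 6, 5, 4, 2]

print("Q1     =", round(grouped_quantile(1 / 4, boundaries, freqs), 1))
print("Median =", round(grouped_quantile(1 / 2, boundaries, freqs), 1))
print("Q3     =", round(grouped_quantile(3 / 4, boundaries, freqs), 1))
print("P90    =", round(grouped_quantile(90 / 100, boundaries, freqs), 1))
```

Passing the fraction rather than a quantile name keeps one function for all three families of formulas.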


Example 1. Find the mean, median, lower and upper quartile of the heights of plants in the array of data below.
Heights of plants in cm:

46 75 20 59
21 42 75 30
89 75 48 40
42 47 48 31
35 85 32 25
36 40 52 43
67 73 61 52
53 48 49 62
42 32 50 50
41 69 62 44

Heights of plants in cm, arranged in ascending order (rank, height):

 1  20    11  40    21  48    31  62
 2  21    12  41    22  48    32  62
 3  25    13  42    23  49    33  67
 4  30    14  42    24  50    34  69
 5  31    15  42    25  50    35  73
 6  32    16  43    26  52    36  75
 7  32    17  44    27  52    37  75
 8  35    18  46    28  53    38  75
 9  36    19  47    29  59    39  85
10  40    20  48    30  61    40  89

$\bar{x} = \frac{\sum x}{n} = 49.8$ cm

$\mathrm{Median\ Loc} = \frac{N+1}{2} = \frac{40+1}{2} = 20.5$ → the 20th and 21st terms; Median = 48 cm

$Q_1\ \mathrm{Loc} = \frac{N+1}{4} = \frac{40+1}{4} = 10.25$ → the 10th and 11th terms; $Q_1$ = 40 cm

$Q_3\ \mathrm{Loc} = \frac{3(N+1)}{4} = \frac{3(40+1)}{4} = 30.75$ → the 30th and 31st terms; $Q_3 = \frac{61 + 62}{2} = 61.5$ cm
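Example 1 can be reproduced with a short Python sketch. The helper value_at_location is a made-up name, and its rule for fractional locations (take the midpoint of the two neighbouring terms) is an assumption read off the worked answers above; other texts interpolate in proportion to the fractional part instead.

```python
def value_at_location(sorted_xs, loc):
    """Value at a 1-based quantile location in an ascending list.

    A whole-number location picks that term. A fractional location is
    resolved as the midpoint of the two neighbouring terms (assumed
    convention, matching the worked answers in Example 1).
    """
    lower = int(loc)  # rank of the term at or just below the location
    if loc == lower:
        return sorted_xs[lower - 1]
    return (sorted_xs[lower - 1] + sorted_xs[lower]) / 2


# Heights of plants in cm, in the order given in Example 1.
heights = [46, 75, 20, 59, 21, 42, 75, 30, 89, 75, 48, 40, 42, 47, 48, 31,
           35, 85, 32, 25, 36, 40, 52, 43, 67, 73, 61, 52, 53, 48, 49, 62,
           42, 32, 50, 50, 41, 69, 62, 44]

ordered = sorted(heights)
n = len(ordered)

mean = sum(heights) / n
median = value_at_location(ordered, (n + 1) / 2)
q1 = value_at_location(ordered, (n + 1) / 4)
q3 = value_at_location(ordered, 3 * (n + 1) / 4)

print(f"mean = {mean:.1f}, median = {median}, Q1 = {q1}, Q3 = {q3}")
```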
Example 2. Prepare a frequency distribution starting with the shortest plant where class width = 10. Calculate the sample mean height, median, lower and upper quartiles of the distribution.
Heights of plants in cm: the same 40 heights as in Example 1, arranged in ascending order from 20 to 89.

Class      Class        Class mark   Class frequency   Cumulative frequency
Interval   Boundary     x            f                 cf
20-29      19.5-29.5    24.5          3                 3
30-39      29.5-39.5    34.5          6                 9
40-49      39.5-49.5    44.5         14                23
50-59      49.5-59.5    54.5          6                29
60-69      59.5-69.5    64.5          5                34
70-79      69.5-79.5    74.5          4                38
80-89      79.5-89.5    84.5          2                40

$\bar{x} = \frac{\sum fx}{n} = 50.5$ cm

$\mathrm{Med} = l + \left(\frac{\frac{N}{2} - f_c}{f_Q}\right) w = 39.5 + \left(\frac{20 - 9}{14}\right)(10) = 47.4$ cm

$Q_1 = l + \left(\frac{\frac{N}{4} - f_c}{f_Q}\right) w = 39.5 + \left(\frac{10 - 9}{14}\right)(10) = 40.2$ cm

$Q_3 = l + \left(\frac{\frac{3N}{4} - f_c}{f_Q}\right) w = 59.5 + \left(\frac{30 - 29}{5}\right)(10) = 61.5$ cm
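The tally step and the grouped mean of Example 2 can be sketched in Python as follows. The function name tally is illustrative, and the sketch assumes whole-number heights, so a class such as 20-29 has class mark 20 + 9/2 = 24.5.

```python
# Heights of plants in cm (the 40 observations of Example 1).
heights = [46, 75, 20, 59, 21, 42, 75, 30, 89, 75, 48, 40, 42, 47, 48, 31,
           35, 85, 32, 25, 36, 40, 52, 43, 67, 73, 61, 52, 53, 48, 49, 62,
           42, 32, 50, 50, 41, 69, 62, 44]

width = 10
start = min(heights)  # the first class starts at the shortest plant


def tally(xs, start, width):
    """Count how many observations fall in each class of the given width."""
    n_classes = (max(xs) - start) // width + 1
    freqs = [0] * n_classes
    for x in xs:
        freqs[(x - start) // width] += 1
    return freqs


freqs = tally(heights, start, width)

# Class marks for whole-number limits: midpoint of start..start+width-1, etc.
marks = [start + k * width + (width - 1) / 2 for k in range(len(freqs))]

# Grouped mean: sum of f * x over the total frequency.
grouped_mean = sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)

for k, (f, x) in enumerate(zip(freqs, marks)):
    lower = start + k * width
    print(f"{lower}-{lower + width - 1}  mark {x}  f = {f}")
print("grouped mean =", grouped_mean)
```

The grouped mean differs slightly from the ungrouped mean of Example 1 because every observation in a class is treated as if it sat at the class mark.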
Example 3. A student collects a series of twelve groundwater samples from a well. To start, she measures the dissolved oxygen concentration in six of these. Her observations in mg/L are: 8.8, 3.1, 4.2, 6.2, 7.6, 3.6.

$\bar{x} = 5.6$ mg/L, $s = 2.3$ mg/L, $s_{\bar{x}} = \frac{s}{\sqrt{n}} = 0.95$ mg/L

The additional observations in mg/L are: 5.2, 8.6, 6.3, 1.8, 6.8, 3.9.
With all twelve observations:

$\bar{x} = 5.5$ mg/L, $s = 2.2$ mg/L, $s_{\bar{x}} = 0.65$ mg/L
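The point of Example 3 is that the standard error shrinks as n grows even though s barely changes. A minimal sketch using Python's standard statistics module shows this; the helper name summarize is made up for this illustration.

```python
import math
import statistics

# Dissolved oxygen observations in mg/L from Example 3.
first_six = [8.8, 3.1, 4.2, 6.2, 7.6, 3.6]
additional = [5.2, 8.6, 6.3, 1.8, 6.8, 3.9]


def summarize(xs):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    s = statistics.stdev(xs)  # sample standard deviation (divides by n - 1)
    return statistics.mean(xs), s, s / math.sqrt(len(xs))


for label, xs in (("n = 6 ", first_six), ("n = 12", first_six + additional)):
    mean, s, se = summarize(xs)
    print(f"{label}: mean = {mean:.2f}, s = {s:.2f}, standard error = {se:.2f}")
```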
Example The weights in kg of milk deliveries to a processing plant are shown below:

a) Using class intervals of 5, tabulate this data in a frequency table with the minimum value as the
lower class limit of the first class interval.
b) Calculate the sample mean weight of the milk delivered based on the grouped data.
c) Find the median of the grouped data.
d) Find the modal class.
e) Find the sample standard deviation and standard error of the mean.
Solution:
Frequency / Tally table:

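$\bar{x} = \frac{\sum fx}{n} = \frac{1{,}775}{40} = 44.375$ kg

$\mathrm{Median} = l + \left(\frac{\frac{N}{2} - f_c}{f_Q}\right) w = 42.5 + \left(\frac{20 - 12}{19}\right)(5) = 44.605$ kg

Modal class is 42.5 − 47.5

$s^2 = \frac{\sum f (x - \bar{x})^2}{n-1} = 24.599$

$s = \sqrt{24.599} = 4.960$ kg

$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4.960}{\sqrt{40}} = 0.78$ kg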


Example 4. The pH of a solution is measured eight times by one operator using the same instrument. She obtains the following data:
pH: 7.15 7.20 7.18 7.19 7.21 7.20 7.16 7.18

a) Calculate the sample mean.
b) Calculate the sample variance, sample standard deviation, and range. Construct a dot diagram of the pH data.
c) What are the sources of variability in this experiment?

Solution:
a) $\bar{x} = \frac{7.15 + 7.20 + 7.18 + 7.19 + 7.21 + 7.20 + 7.16 + 7.18}{8} = 7.18$
b) $s^2 = 4.27 \times 10^{-4}$
$s = 0.02$
$r = 7.21 - 7.15 = 0.06$
c) Some possible sources of variability: an uncalibrated pH meter, an inexperienced operator, impurities in the solution that contribute to the imprecision of the measurements, and the operator's line of sight, which may also contribute to error in the readings.
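Part b asks for a dot diagram. For a handful of readings, a text version is easy to produce; the sketch below is one possible way, with dot_diagram and its parameters invented for this illustration. It prints one row per 0.01 pH step, with one dot per observation at that value.

```python
from collections import Counter

# pH readings from Example 4.
ph = [7.15, 7.20, 7.18, 7.19, 7.21, 7.20, 7.16, 7.18]


def dot_diagram(xs, step=0.01, decimals=2):
    """Return text rows from min to max, one dot per observation at each value."""
    counts = Counter(round(x, decimals) for x in xs)
    lo, hi = min(counts), max(counts)
    n_steps = round((hi - lo) / step)
    rows = []
    for k in range(n_steps + 1):
        value = round(lo + k * step, decimals)
        rows.append(f"{value:.{decimals}f} | " + "." * counts.get(value, 0))
    return rows


print("\n".join(dot_diagram(ph)))
```

Values with no observations (here 7.17) still get a row, so gaps and clusters in the data stay visible.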
Example 5. The data below are the joint temperatures of the O-rings (degrees F) for each test firing or actual launch of the space shuttle rocket motor (from Presidential Commission on the Space Shuttle Challenger Accident, Vol. 1, pp. 129 – 131):

84 49 61 40 83 67 45 66 70
69 80 58 68 60 67 72 73 70
57 63 70 78 52 67 53 67 75
61 70 81 76 79 75 76 58 31

a) Calculate the sample mean.
b) Calculate the sample variance, sample standard deviation, and sample range.
c) Construct a dot diagram of the temperature data.
d) Set aside the smallest observation (31°F) and recompute the quantities in parts a and b. Comment on your findings. How “different” are the other temperatures from this last value?

Solution:
a) $\bar{x} = 65.9$℉
b) $s^2 = 147.8$
$s = 12.2$℉
$r = 84 - 31 = 53$℉
d) $\bar{x} = 66.9$℉
$s^2 = 115.42$
$s = 10.7$℉
$r = 84 - 40 = 44$℉
All 36 observations in ascending order:

 #   ℉      #   ℉
 1   31     19   68
 2   40     20   69
 3   45     21   70
 4   49     22   70
 5   52     23   70
 6   53     24   70
 7   57     25   72
 8   58     26   73
 9   58     27   75
10   60     28   75
11   61     29   76
12   61     30   76
13   63     31   78
14   66     32   79
15   67     33   80
16   67     34   81
17   67     35   83
18   67     36   84

$\mathrm{Median} = \frac{67 + 68}{2} = 67.5$
$Q_1 = \frac{58 + 60}{2} = 59$
$Q_3 = 75$
$IQR = Q_3 - Q_1 = 75 - 59 = 16$
$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{12.2}{\sqrt{36}} = 2.03$℉

Is 31℉ an outlier? YES! $Q_1 - 1.5\,IQR = 59 - 1.5(16) = 35$
Is 84℉ an outlier? NO! $Q_3 + 1.5\,IQR = 75 + 1.5(16) = 99$

With the smallest observation (31℉) set aside, the remaining 35 observations in ascending order:

 #   ℉      #   ℉
 1   40     19   69
 2   45     20   70
 3   49     21   70
 4   52     22   70
 5   53     23   70
 6   57     24   72
 7   58     25   73
 8   58     26   75
 9   60     27   75
10   61     28   76
11   61     29   76
12   63     30   78
13   66     31   79
14   67     32   80
15   67     33   81
16   67     34   83
17   67     35   84
18   68

$\mathrm{Median} = 68$
$Q_1 = 60$
$Q_3 = 75$
$IQR = 75 - 60 = 15$
$s = 10.7$℉
$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{10.7}{\sqrt{35}} = 1.81$℉
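The outlier check above uses the fences Q1 − 1.5·IQR and Q3 + 1.5·IQR. A minimal Python sketch of that rule follows. The names value_at_location and iqr_fences are illustrative, and the quartiles are located with the (N+1)/4 and 3(N+1)/4 rule, taking the midpoint of the two neighbouring terms for a fractional location (an assumption matching the worked answers).

```python
# O-ring joint temperatures in degrees F from Example 5.
temps = [84, 49, 61, 40, 83, 67, 45, 66, 70,
         69, 80, 58, 68, 60, 67, 72, 73, 70,
         57, 63, 70, 78, 52, 67, 53, 67, 75,
         61, 70, 81, 76, 79, 75, 76, 58, 31]


def value_at_location(sorted_xs, loc):
    """Term at a 1-based location; midpoint of neighbours if fractional."""
    lower = int(loc)
    if loc == lower:
        return sorted_xs[lower - 1]
    return (sorted_xs[lower - 1] + sorted_xs[lower]) / 2


def iqr_fences(xs):
    """Return (lower fence, upper fence) = (Q1 - 1.5 IQR, Q3 + 1.5 IQR)."""
    ordered = sorted(xs)
    n = len(ordered)
    q1 = value_at_location(ordered, (n + 1) / 4)
    q3 = value_at_location(ordered, 3 * (n + 1) / 4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


low, high = iqr_fences(temps)
outliers = [t for t in temps if t < low or t > high]
print("fences:", low, high)
print("outliers:", outliers)
```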
Box Plot – a graphical display that simultaneously describes several important features of the data set
such as center, spread, skewness, and outliers.
The following data are needed to construct a box plot:
1. Median or Second Quartile, 𝑄2
2. First Quartile, 𝑄1
3. Third Quartile, 𝑄3
4. Interquartile Range, IQR
Below is the box plot of Example 5.
