Introduction To Statistics and Application in Engineering Analysis
Chapter 10
Introduction to Statistics
and Application in Engineering Analysis
Chapter Outline
What is Statistics?
Collecting:
Data relating to certain events or physical phenomena
Most datasets involve numbers
Organizing:
All collected data are arranged in logical and chronological order for viewing and analysis
Datasets are normally organized in either ascending order or descending order.
Summarizing:
Summarizing the data to offer an overview of the situation
Presenting:
Develop a comprehensive way to present the dataset
Analyzing:
To analyze the dataset for the intended applications
The Mode of Statistical Dataset:
The mode of a dataset is the number that appears in the dataset
most frequently.
For instance:
The set:
1.75, 1.83, 1.85, 1.95, 1.97, 2.03, 2.03, 2.06, 2.13, 2.15, 2.15, 2.25, 2.35, 2.70, 2.70
has three modes: 2.03, 2.15 and 2.70, as each of these numbers appears
twice in the set.
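The mode of the set above can be checked with a few lines of Python. This is a minimal sketch; the helper name `modes` is an illustrative choice, not something defined in the chapter.

```python
from collections import Counter

def modes(data):
    """Return every value that occurs with the highest frequency, in sorted order."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

# Dataset taken from the example in the text.
data = [1.75, 1.83, 1.85, 1.95, 1.97, 2.03, 2.03, 2.06,
        2.13, 2.15, 2.15, 2.25, 2.35, 2.70, 2.70]

print(modes(data))  # [2.03, 2.15, 2.7]
```

Returning a list rather than a single number keeps the multi-mode case visible, which matters for sets like this one.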
2) Determine an appropriate range (the difference between the largest and smallest numbers)
to present the data
3) Divide the range into a convenient number of intervals of the same size (value).
If this is not feasible, use intervals of different sizes or open intervals.
Test scores   45-49  50-54  55-59  60-64  65-69  70-74  75-79  80-84  85-89  90-94
Frequency         1      2      2      5     11      9     10      5      4      2
The corresponding histogram, or frequency distribution, of the students' marks is:
[Figure: histogram of Number of Students (vertical axis) versus Marks (horizontal axis, intervals 45-49 through 90-94), with bar heights 1, 2, 2, 5, 11, 9, 10, 5, 4, 2.]
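The frequency distribution of the test scores can be rendered as a simple text histogram. This is only a sketch: the intervals and frequencies are copied from the table of test scores, while the function name `text_histogram` and the bar style are illustrative assumptions.

```python
# Class intervals and frequencies copied from the table of test scores.
intervals = ["45-49", "50-54", "55-59", "60-64", "65-69",
             "70-74", "75-79", "80-84", "85-89", "90-94"]
frequencies = [1, 2, 2, 5, 11, 9, 10, 5, 4, 2]

def text_histogram(labels, counts):
    """Return one line per interval: label, a bar of '#' marks, and the count."""
    return [f"{label} | {'#' * count} ({count})"
            for label, count in zip(labels, counts)]

lines = text_histogram(intervals, frequencies)
print("\n".join(lines))

# Total number of students and the interval with the tallest bar.
total_students = sum(frequencies)
modal_interval = intervals[frequencies.index(max(frequencies))]
print(total_students, modal_interval)  # 51 65-69
```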
Terminologies in Statistics for Engineering Analysis
The Mean
The Mean of a dataset is the arithmetic average of the data in the set.
It is a good way to represent the central tendency of the set.
Mathematically, we may express the Mean of a dataset of n data $x_1, x_2, \ldots, x_n$ in the following way:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (10.1)$$
A disadvantage of using the Mean is that it no longer represents the
central tendency when a few out-of-range data (outliers) are present in the set.
For example, the Mean of the set 2, 3, 5, 7, 9, 11, 13 is 7.14, which is close to
the central value of the set.
This value becomes 15.71 if the last of the 7 data becomes 73, i.e., 2, 3, 5, 7, 9, 11, 73,
which is not a good representation of the central tendency of the dataset.
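The effect of a single out-of-range datum on the Mean can be reproduced numerically. The sketch below uses the two sets from the example; the function name `mean` is an illustrative choice.

```python
def mean(data):
    """Arithmetic average: the sum of the data divided by the number of data."""
    return sum(data) / len(data)

original = [2, 3, 5, 7, 9, 11, 13]
with_outlier = [2, 3, 5, 7, 9, 11, 73]   # last datum replaced by 73

print(round(mean(original), 2))      # 7.14
print(round(mean(with_outlier), 2))  # 15.71
```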
The Median
When a dataset contains a significant number of out-of-range data (outliers),
the Median, meaning the central datum of the ordered set, is used to show the central tendency of the set.
For example, take the dataset in the previous example with 7 data: 2, 3, 5, 7, 9, 11, 73.
We may take the datum at the center of the set, i.e., 7, to be the Median representing the
central tendency of the dataset.
The central datum is readily identified in a set with an odd number of data.
For a set with an even number of data, the Median is the average of the two central data.
For example, the Median of the dataset 5, 9, 11, 14, 16, 19 is (11+14)/2 = 12.5.
Like the Mean, the Median of a dataset always exists. It is often a better way to express
the central tendency of datasets with outliers, such as real estate prices in the
Santa Clara Valley in California, where a significant number of substantially out-of-range
house prices exist.
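Both Median rules (odd and even number of data) fit in one short function. This is a minimal sketch using the two datasets from the text; the function name `median` is an illustrative choice.

```python
def median(data):
    """Central datum of the sorted set, or the average of the two central data."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                 # odd count: a single central datum exists
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two

print(median([2, 3, 5, 7, 9, 11, 73]))  # 7
print(median([5, 9, 11, 14, 16, 19]))   # 12.5
```

Note that the outlier 73 moves the Mean of the first set to 15.71 but leaves its Median at 7.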
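Deviation and Standard Deviation (σ)
Because the Mean of a dataset indicates its central tendency, it is often required to
measure how the data in the set deviate from the mean value $\bar{x}$. The sum of the squared deviations is:
$$(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \cdots + (x_n-\bar{x})^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2$$
No term in the above expression can take a negative value, so positive and negative deviations
cannot cancel to give a zero total deviation of the dataset.
The standard deviation of the dataset is:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}} \qquad (10.2)$$
and the variance is:
$$\sigma^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1} \qquad (10.3)$$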
Example 10.4
Determine the standard deviation (σ) and the variance (σ²) of the dataset:
{5, 9, 11, 14, 19} with n = 5
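The worked solution of Example 10.4 did not survive in this copy, so the sketch below simply evaluates the standard deviation and the variance with the n - 1 divisor for the given dataset. The function names are illustrative assumptions.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

def sample_std(data):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(data))

data = [5, 9, 11, 14, 19]          # dataset of Example 10.4, n = 5
variance = sample_variance(data)
sigma = sample_std(data)
print(round(variance, 4), round(sigma, 4))
```

With a mean of 11.6, the squared deviations sum to 111.2, which gives a variance of about 27.8 and a standard deviation of about 5.27.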
[Figure: histogram of Number of Students versus Marks (intervals 45-49 through 90-94), overlaid with a solid-line curve; the Mean is marked on the Marks axis.]
A histogram that follows a solid-line, bell-shaped curve with its population concentrated at the MEAN is called the
NORMAL distribution of a statistical dataset.
The Normal Distribution Function
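$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\bar{x})^2}{2\sigma^2}}$$
[Figure: normal distribution curve plotted over a horizontal axis with ticks from -3 to 3.]
where σ = the standard deviation of the dataset given in Equation (10.2), and $\bar{x}$
is the mean of the set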
Properties of Normal Distribution Curve
With the help of the mathematical expression of the normal distribution function, we can
derive the following important properties from mathematical
analysis:
Example 10.5:
From the properties of the normal distribution curve, which we assume fits the measured
tire lives, we come to the following observations (μ = mean tire life, σ = standard deviation):
68.26% of cars had a tire life within μ ± σ = 42000 ± 3000 miles,
95.44% of cars had a tire life within μ ± 2σ = 42000 ± 6000 miles, and
99.73% of cars had a tire life within μ ± 3σ = 42000 ± 9000 miles.
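The percentages quoted for one, two and three standard deviations can be checked against the normal distribution using the error function. In this sketch the mean and standard deviation of tire life are the values implied by Example 10.5 (42000 and 3000 miles), and the function name `coverage_within` is an illustrative assumption.

```python
import math

def coverage_within(k):
    """Fraction of a normal population lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

mean_life = 42000.0    # miles, as implied by Example 10.5
sigma_life = 3000.0    # miles, as implied by Example 10.5

for k in (1, 2, 3):
    low = mean_life - k * sigma_life
    high = mean_life + k * sigma_life
    print(f"{k} sigma: {low:.0f} to {high:.0f} miles, "
          f"{100 * coverage_within(k):.2f}% of cars")
```

The computed values (about 68.27%, 95.45% and 99.73%) differ from table-based figures only in the last digit, because printed tables of the normal curve are rounded.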
Statistical Quality Control
Poor quality not only causes products to fail in the marketplace, but also leads to
costly liability and recalls. For this reason, companies normally include the
following costs in setting the price of their products:
The question is: how thoroughly should we conduct these inspections and tests
to ensure good quality in a typical MASS PRODUCTION environment?
If we focus our attention on the inspection and testing of the FINISHED product,
the question becomes: how many PIECES of the product should we pick up for
inspection, and how many TESTING points should we select on each piece,
in order to achieve credible quality control of a batch of products from a
mass production process?
Control charts provide objective, statistically based criteria for distinguishing background variation
from events of significance.
Variations in the process that may affect the quality of the end product or service
can be detected and corrected, thus reducing waste as well as the likelihood
that problems will be passed on to the customer.
With its emphasis on early detection and prevention of problems,
statistical process control (SPC) can also reduce the time required to produce the product or service from
end to end.
Process cycle time reductions coupled with improvements in yield have made SPC
a valuable tool from both a cost reduction and a customer satisfaction standpoint.
Statistical process control was pioneered by Walter A. Shewhart in the early 1920s.
W. Edwards Deming later applied SPC methods in the United States during World War II,
thereby successfully improving quality in the manufacture of munitions and other strategically
important products.
Source: Wikipedia
W. Edwards Deming
(Source: Wikipedia)
Genichi Taguchi
Major Contributions:
Alma mater: Kyushu University
Known for: Taguchi methods
Influences: Matosaburo Masuyama
The Control Charts
Control charts involve a range (BOUND) of acceptance for a parameter that relates
to the quality of a product.
This range is defined by the upper control limit (UCL) and the lower control limit (LCL).
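[Figure: control chart of the measured parameter. The PASS band lies between the LCL and the UCL on either side of the reference value of the parameter (e.g., the MEAN of the measured values); the regions above the UCL and below the LCL are FAIL (rejection).]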
Once the chart is established, all further measured data that fall within the bounded region
(i.e., the green region) are accepted.
Whenever a further measured datum falls outside the bounded region (the red region),
the process is stopped to investigate the causes of the failure.
Construction of Control Charts
Control charts are derived and constructed from the MEASURED parameters
(dataset) of randomly selected samples:
[Figure: random sample selection from a storage bin. Samples K = 1, K = 2, ..., K = k are picked, and diameter measurements are taken at stations n = 1 to n = 6 on each sample.]
So, the size of the measured dataset = k × n (n = 6 in the example).
One needs to compute the MEAN (μ) and the STANDARD DEVIATION (σ) of the dataset.
The Three-Sigma Control Charts
This is the simplest control chart of all.
It is constructed on the basis of the MEAN of the measured dataset:
k samples are randomly picked up from a storage bin of finished products.
Take n measurements on each sample, with a mean value $\bar{x}$ for the sample.
Compute the MEAN (μ) and the STANDARD DEVIATION (σ) of the k × n measured values.
The upper and lower control limits for quality control can be determined by:
$$\mathrm{LCL}_{\bar{x}} = \mu - \frac{3\sigma}{\sqrt{n}} \quad\text{and}\quad \mathrm{UCL}_{\bar{x}} = \mu + \frac{3\sigma}{\sqrt{n}} \qquad (10.6)$$
Graphical expression of 3-σ control charts:
[Figure: 3-σ control chart of the measured sample mean $\bar{x}$. The PASS band lies between $\mathrm{LCL}_{\bar{x}}$ and $\mathrm{UCL}_{\bar{x}}$; the regions outside the limits are Fail.]
The use of the 3-σ control chart in quality control:
Once the chart is established with n measurements from each of the k samples, the quality
control engineer or technician will pick up samples from future production and conduct the same
n measurements, with a calculated sample mean $\bar{x}$. The sample is accepted if this value falls
within the bounds.
If this value falls outside the bounds, the manufacturing process should be stopped, and the causes of failure
need to be investigated with remedial actions.
Example 10.6
Quality control of IC chips by measuring the output voltage, using a 3-σ control chart.
5 samples are randomly picked up from a storage bin of IC chips mass-produced by a process (k = 5).
3 measurements of voltage output (mV) are taken from each sample (n = 3), recorded as follows:
Sample 1: 2.25 3.16 1.80
Sample 2: 2.60 1.95 3.22
Sample 3: 1.75 3.06 2.45
Sample 4: 2.15 2.80 1.85
Sample 5: 3.15 2.76 2.20
The MEAN of the 15 measurements is μ = 2.477 mV by Equation (10.1), and
the STANDARD DEVIATION is σ = 0.5281 mV by Equation (10.2).
The upper and lower control limits of the dataset are computed from Equation (10.6):
$$\mathrm{LCL}_{\bar{x}} = \mu - \frac{3\sigma}{\sqrt{n}} = 2.477 - \frac{3 \times 0.5281}{\sqrt{3}} = 1.5623$$
$$\mathrm{UCL}_{\bar{x}} = \mu + \frac{3\sigma}{\sqrt{n}} = 2.477 + \frac{3 \times 0.5281}{\sqrt{3}} = 3.3917$$
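The numbers of Example 10.6 can be reproduced with a short script. This is a sketch under stated assumptions: the function names `three_sigma_limits` and `accept` are illustrative, the standard deviation uses the n - 1 divisor, and the two "future" samples at the end are hypothetical values invented only to show the accept/reject decision.

```python
import math

# Voltage measurements (mV) from Example 10.6: k = 5 samples, n = 3 readings each.
samples = [
    [2.25, 3.16, 1.80],
    [2.60, 1.95, 3.22],
    [1.75, 3.06, 2.45],
    [2.15, 2.80, 1.85],
    [3.15, 2.76, 2.20],
]

def three_sigma_limits(samples):
    """Return (mean, std, LCL, UCL) for a 3-sigma chart of sample means."""
    n = len(samples[0])                          # measurements per sample
    values = [v for sample in samples for v in sample]
    mu = sum(values) / len(values)               # mean of all k*n values
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / (len(values) - 1))
    half_width = 3.0 * sigma / math.sqrt(n)
    return mu, sigma, mu - half_width, mu + half_width

def accept(sample, lcl, ucl):
    """A sample passes if its mean lies within the control limits."""
    x_bar = sum(sample) / len(sample)
    return lcl <= x_bar <= ucl

mu, sigma, lcl, ucl = three_sigma_limits(samples)
print(round(mu, 3), round(sigma, 4), round(lcl, 3), round(ucl, 3))

# Hypothetical future samples (not from the source), for illustration only.
print(accept([2.40, 2.55, 2.30], lcl, ucl))   # sample mean inside the bounds
print(accept([3.60, 3.50, 3.40], lcl, ucl))   # sample mean above the UCL
```

Because the script carries the unrounded mean through the calculation, its limits can differ from the hand-computed 1.5623 and 3.3917 in the fourth decimal place.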
MEAN VALUE: $\bar{R} = d_2\,\sigma$,
or as computed from the sample ranges
NOTE: The average sample range $\bar{R}$ computed from the samples
equals $d_2\,\sigma$ only if the measured dataset fits the NORMAL distribution perfectly
Table 10.2 Factors for Estimating R and Lower and Upper Control Limits
(Ref: Rosenkrantz, W. A. Introduction to Probability and Statistics for Scientists and Engineers, McGraw-Hill, New York)
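[Figure: R-chart. The PASS band lies between $\mathrm{LCL}_R$ and $\mathrm{UCL}_R$ around the mean range $\bar{R}$; the regions outside the limits are FAIL.]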
The application of the R-chart for quality control is similar to that of the
3-σ chart:
once the R-chart is established, any sample whose range of the measured parameter
falls outside the bounds is rejected. The manufacturing
process is then stopped to investigate the causes of the inferior quality.
Example 10.7
Use the R-chart for quality control in the IC-chip manufacturing process described
in Example 10.6. The measurements of the IC chips' output voltage at 3 leads on
each of the 5 samples (k = 5, n = 3) are tabulated below:
Sample 1: 2.25 3.16 1.80
Sample 2: 2.60 1.95 3.22
Sample 3: 1.75 3.06 2.45
Sample 4: 2.15 2.80 1.85
Sample 5: 3.15 2.76 2.20
Factors: d2 = 1.693, D1 = 0, D2 = 4.36
$\bar{R}$ = 0.8941 mV
$\mathrm{LCL}_R$ = 0
[Figure: R-chart for Example 10.7, with the PASS band above $\mathrm{LCL}_R$ = 0 around $\bar{R}$ = 0.8941 mV and FAIL outside the limits.]
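The R-chart quantities of Example 10.7 can be sketched in Python. The limit equations did not survive in this copy, so the script assumes the commonly used relations R̄ = d2·σ, LCL_R = D1·σ and UCL_R = D2·σ; the variable names are illustrative, and the UCL_R value it produces is a computed result, not a figure quoted from the source.

```python
import math

# Voltage measurements (mV) from Examples 10.6 and 10.7: k = 5 samples, n = 3 leads.
samples = [
    [2.25, 3.16, 1.80],
    [2.60, 1.95, 3.22],
    [1.75, 3.06, 2.45],
    [2.15, 2.80, 1.85],
    [3.15, 2.76, 2.20],
]

# Factors listed in Example 10.7.
d2_factor, D1_factor, D2_factor = 1.693, 0.0, 4.36

# Standard deviation of all k*n measurements (n - 1 divisor).
values = [v for sample in samples for v in sample]
mu = sum(values) / len(values)
sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / (len(values) - 1))

# ASSUMED relations for the R-chart (not recoverable from the source text).
r_bar = d2_factor * sigma     # mean range expected for a normal dataset
lcl_r = D1_factor * sigma     # lower control limit on the range
ucl_r = D2_factor * sigma     # upper control limit on the range

# Range (largest minus smallest reading) of each sample.
ranges = [max(sample) - min(sample) for sample in samples]
observed_r_bar = sum(ranges) / len(ranges)

print(round(r_bar, 4), round(lcl_r, 4), round(ucl_r, 4))
print([round(r, 2) for r in ranges], round(observed_r_bar, 3))
print(all(lcl_r <= r <= ucl_r for r in ranges))
```

The average of the observed ranges (about 1.168 mV) differs from d2·σ (about 0.894 mV), which illustrates the note that the two agree only when the dataset fits the normal distribution perfectly.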