
San Jose State University

Department of Mechanical and Aerospace Engineering

ME 130 Applied Engineering Analysis

Instructor: Tai-Ran Hsu, Ph.D.

Chapter 10

Introduction to Statistics
and Application in Engineering Analysis
Chapter Outline
What is Statistics?

The scope of statistics

Histogram and statistical data sets

Common terminologies in statistics

Normal distribution curve and normal distribution function

Statistical quality control

Statistical process control

The control charts


The 3-σ control charts
The sample range R-charts
Statistics - What is it?

Statistics is the science of decision making


in a world full of uncertainties

Examples of uncertainty in the world:

A person's daily routine
The fate of human beings
A person's career life
The rise and fall of the stock market
The weather
Global economy
World politics

Uncertainties in engineering and technology:

Academic performance of this class


In design engineering: uncertainties in design methodologies,
material properties, fabrication techniques
Quality of the products
Market and sales of new and existing products
The Scope of Statistics

Statistics is concerned with using scientific methods in:

Collecting:
Data relating to certain events or physical phenomena
Most datasets involve numbers

Organizing:
All collected data are arranged in logical and chronological order for viewing and analysis
Datasets are normally organized in either ascending or descending order.
Summarizing:
Summarizing the data to offer an overview of the situation
Presenting:
Develop a comprehensive way to present the dataset

Analyzing:
To analyze the dataset for the intended applications
The Mode of a Statistical Dataset:

A statistical dataset can often be characterized by the mode of the set.

The mode of a dataset is the number that appears in the dataset most frequently:

For instance:

The set 2, 2, 5, 7, 9, 9, 9, 10, 10, 11, 12, 18 has mode 9

The set:
1.75, 1.83, 1.85, 1.95, 1.97, 2.03, 2.03, 2.06, 2.13, 2.15, 2.15, 2.25, 2.35, 2.70, 2.70

has a triple mode of 2.03, 2.15 and 2.70, as each of these numbers appears
twice in the set

The set 3, 5, 8, 10, 12, 15 has no mode
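
A minimal sketch (not part of the original notes) of how the mode(s) of a dataset could be found with Python's standard library; the helper name modes is our own:

```python
from collections import Counter

def modes(data):
    """Return the value(s) occurring most often, or [] when no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # every value appears only once -> the set has no mode
    return [value for value, count in counts.items() if count == top]

print(modes([2, 2, 5, 7, 9, 9, 9, 10, 10, 11, 12, 18]))        # [9]
print(modes([1.75, 1.83, 1.85, 1.95, 1.97, 2.03, 2.03, 2.06,
             2.13, 2.15, 2.15, 2.25, 2.35, 2.70, 2.70]))       # [2.03, 2.15, 2.7]
print(modes([3, 5, 8, 10, 12, 15]))                            # []
```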


The Histogram
The histogram is the most frequently used form for representing datasets.

It is often referred to as the frequency distribution of a dataset.

Procedure for establishing the histogram of a dataset:

1) Determine the largest and smallest numbers in the raw data.

2) Determine an appropriate range (the difference between the largest and smallest numbers)
to present the data.

3) Divide the range into a convenient number of intervals of the same size.
If this is not feasible, use intervals of different sizes or open intervals.

4) Determine the number of observations falling into each interval, i.e., find the data frequency.
Example on histogram:
Given a dataset on the length of lead wires in mm that a testing engineer measured from
3 different leads of a batch of 5 IC chips:
Sample 1: 2.15 2.35 1.95
Sample 2: 2.70 1.83 2.25
Sample 3: 1.97 2.03 2.13
Sample 4: 2.06 2.70 2.15
Sample 5: 2.03 1.75 1.85
Establish a histogram, or frequency distribution, of the measured lengths.
We observe that the minimum measured value in the dataset is 1.75 mm and the maximum
measured length is 2.7 mm, so the range of the dataset is 1.75 to 2.7 mm.
Next, we conveniently divide this range into 10 intervals, as follows:
Range (mm)          1.7-1.8  1.8-1.9  1.9-2.0  2.0-2.1  2.1-2.2  2.2-2.3  2.3-2.4  2.4-2.5  2.5-2.6  2.6-2.7
Number of samples      1        2        2        3        3        1        1        0        0        2

The corresponding histogram is:

[Histogram: number of samples (vertical axis) vs. measured length in mm (horizontal axis,
intervals from 1.7 to 2.7 mm in steps of 0.1 mm), with bar heights 1, 2, 2, 3, 3, 1, 1, 0, 0, 2]
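
A minimal sketch (not part of the original notes) showing how the frequency table above could be reproduced with numpy.histogram; the 10 equal intervals match those of the example:

```python
import numpy as np

lengths = [2.15, 2.35, 1.95, 2.70, 1.83, 2.25, 1.97, 2.03, 2.13,
           2.06, 2.70, 2.15, 2.03, 1.75, 1.85]

edges = np.linspace(1.7, 2.7, 11)          # 10 equal intervals: 1.7-1.8, ..., 2.6-2.7
freq, _ = np.histogram(lengths, bins=edges)

for lo, hi, f in zip(edges[:-1], edges[1:], freq):
    print(f"{lo:.1f}-{hi:.1f} mm: {f}")    # 1, 2, 2, 3, 3, 1, 1, 0, 0, 2
```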
Example on histogram:
The marks of the students in the ME 130 class in Spring 2003 are tabulated in 10 intervals as:

Test scores    45-49  50-54  55-59  60-64  65-69  70-74  75-79  80-84  85-89  90-94
Frequency         1      2      2      5     11      9     10      5      4      2

The corresponding histogram, or frequency distribution of the students' marks, is:

[Histogram: number of students (vertical axis) vs. mark intervals 45-49 through 90-94
(horizontal axis), with bar heights 1, 2, 2, 5, 11, 9, 10, 5, 4, 2]
Terminologies in Statistics for Engineering Analysis
The Mean
The Mean of a dataset is the arithmetic average of the data in the set
It is a good way to represent the Central tendency of the set
Mathematically, we may express the Mean of a dataset in the following way:

Given the dataset x_1, x_2, x_3, \ldots, x_n, with n = total number of data in the set,

the arithmetic mean of the set may be computed from the expression:

\bar{x} = \frac{\text{summation of all data}}{\text{total number of data}} = \frac{\sum_{i=1}^{n} x_i}{n}        (10.1)
Advantage of using Mean in statistical analysis
It includes ALL data in the set
It always exists
It is usually reliable in representing the central tendency of the data set

The disadvantage of using the Mean is that it loses its sense of representing the
central tendency when a few out-ranged data are present in the set.
For example, the Mean of the set 2, 3, 5, 7, 9, 11, 13 is 7.14, which closely
represents the central value of the set.

This value becomes 15.71 if the last datum of the 7-data set becomes 73, i.e. the set 2, 3, 5, 7, 9, 11, 73
- not a good representation of the central tendency of the data set.
The Median

In cases in which the set contains a significant amount of out-ranged data,
the Median, meaning the central datum, is used to show the central tendency of the set.

For example, the same data set in the previous example with 7 data: 2,3,5,7,9,11, 73

We may take the datum at the center of the set, i.e. 7, to be the Median representing the
central tendency of the dataset.

The central datum is readily identified in a set with an odd number of data.

For a set with an even number of data, the Median is the average of the two central data.

For example, the Median of the dataset: 5, 9, 11, 14, 16, 19 is (11+14)/2 = 12.5

Like the Mean, the Median of a dataset always exists. It is often a better way to express
the central tendency of datasets with out-ranged data, such as real estate prices in the
Santa Clara Valley in California, where a significant number of substantially out-ranged
house prices exists.
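
A minimal sketch (not part of the original notes) contrasting the Mean and Median on the out-ranged datasets discussed above, using Python's statistics module:

```python
import statistics

regular = [2, 3, 5, 7, 9, 11, 13]
outlier = [2, 3, 5, 7, 9, 11, 73]          # last datum replaced by the out-ranged 73

print(statistics.mean(regular))            # 7.142857... close to the central value
print(statistics.mean(outlier))            # 15.714...   pulled away by the out-ranged datum
print(statistics.median(outlier))          # 7           still reflects the central tendency
print(statistics.median([5, 9, 11, 14, 16, 19]))   # 12.5, the average of the two central data
```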
Deviation and Standard Deviation (σ)

Because the Mean of a dataset indicates its central tendency, it is often necessary to
measure how the individual data in the set deviate from the mean value \bar{x}.

We may express the deviation of each datum in the set as:

x_1 - \bar{x}, \; x_2 - \bar{x}, \; \ldots, \; x_n - \bar{x}

The total variation of the individual data from the mean is:

(x_1 - \bar{x}) + (x_2 - \bar{x}) + \cdots + (x_n - \bar{x}) = 0

The reason this summation of individual deviations is always zero is that the deviations
carrying -ve signs cancel those carrying +ve signs. The actual spread of the data about
the mean is, of course, not zero, so we need another mathematical expression that does
not sum to zero.

Since the physical magnitude of a deviation should not depend on its +ve or -ve sign, we
can avoid the zero sum by squaring each term:

(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2

No term in this expression can be -ve, so the total deviation of the dataset can no longer
vanish.

From this we define the Standard Deviation (σ) of a dataset to be:

\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}        (10.2)

The Variance, which also indicates the spread of the dataset, is:

\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}        (10.3)
Example 10.4

To determine the standard deviation (σ) and the variance (σ²) of the dataset:

{5  9  11  14  19} with n = 5

We compute the mean to be \bar{x} = 11.6.

We then compute the deviations (x_i - \bar{x}):

x_1 - \bar{x} = 5 - 11.6 = -6.6
x_2 - \bar{x} = 9 - 11.6 = -2.6
x_3 - \bar{x} = 11 - 11.6 = -0.6
x_4 - \bar{x} = 14 - 11.6 = 2.4
x_5 - \bar{x} = 19 - 11.6 = 7.4

Leading to:

\sigma = \sqrt{\frac{(5-11.6)^2 + (9-11.6)^2 + (11-11.6)^2 + (14-11.6)^2 + (19-11.6)^2}{5-1}} = 5.27

The variance of the dataset is \sigma^2 = (5.27)^2 = 27.8.
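
A minimal sketch (not part of the original notes) reproducing Example 10.4 with Python's statistics module, whose stdev and variance use the same (n - 1) divisor as Equations (10.2) and (10.3):

```python
import statistics

data = [5, 9, 11, 14, 19]
mean = statistics.mean(data)       # 11.6
sigma = statistics.stdev(data)     # 5.27...  Equation (10.2)
var = statistics.variance(data)    # 27.8     Equation (10.3)
print(mean, sigma, var)
```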


The Normal Distribution Curve

We have shown the mark distribution of 52 students in a class of Engineering Analysis
in a histogram as:

[Histogram: number of students vs. mark intervals 45-49 through 90-94, with bar heights
1, 2, 2, 5, 11, 9, 10, 5, 4, 2]

If we redraw the histogram with the vertical axis shifted from the left edge to the mean
value of the marks on the horizontal axis, we obtain the same histogram, but with its
distribution shown about the mean. The solid curve linking the peak values of the mark
intervals of the histogram has a bell shape.

[Figure: the same histogram centered at the mean, with a solid bell-shaped curve linking
the peaks of the mark intervals]

A histogram represented by such a solid curve, with its population centered at the MEAN,
is called a NORMAL distribution of a statistical dataset.
The Normal Distribution Function

Normal distribution function is a mathematical expression for the distribution of


statistical datasets that are commonly happen in real world a BELL shape
histogram with vertical axis normalized and located at the mean of the data set

So, it has a great value in Statistical ANALYSES of many real-world cases


[Figure: the bell-shaped normal distribution curve y = f(x), symmetric about the mean and
extending from -3σ to +3σ on the horizontal axis]

The NORMAL DISTRIBUTION FUNCTION has the mathematical expression:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \bar{x}}{\sigma}\right)^2}        (10.5)

where \sigma = the standard deviation of the dataset given in Equation (10.2), and \bar{x}
is the mean of the set.
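
A minimal sketch (not part of the original notes) of Equation (10.5) as a plain Python function; the name normal_pdf is our own:

```python
import math

def normal_pdf(x, x_bar, sigma):
    """Equation (10.5): f(x) = 1/(sigma*sqrt(2*pi)) * exp(-0.5*((x - x_bar)/sigma)**2)."""
    return math.exp(-0.5 * ((x - x_bar) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

print(normal_pdf(0.0, 0.0, 1.0))   # 0.3989..., the peak value of the standard bell curve
print(normal_pdf(1.0, 0.0, 1.0))   # 0.2420..., the curve one standard deviation from the mean
```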
Properties of Normal Distribution Curve
With the help of the mathematical expression of the normal distribution function, mathematical
analysis yields the following important and interesting properties:

1) The data distribution is SYMMETRICAL about the mean.

2) The percentages (%) of ALL data included are:

68.26% within the mean ± one standard deviation (σ)
95.44% within the mean ± 2σ
99.73% within the mean ± 3σ

Example 10.5:

A company produced 150,000 tires.

The average life measured from 15,000 (i.e. 10%) of the tires is 42,000 miles, with
a standard deviation σ = 3000 miles.

From the properties of the normal distribution curve, which we assume fits the measured
tire lives, we come to the following observations:

68.26% of the cars had a tire life of \bar{x} \pm \sigma = 42,000 ± 3000 miles,
95.44% of the cars had a tire life of \bar{x} \pm 2\sigma = 42,000 ± 6000 miles, and
99.73% of the cars had a tire life of \bar{x} \pm 3\sigma = 42,000 ± 9000 miles.
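
A minimal sketch (not part of the original notes) showing where the quoted percentages come from, applied to the tire data of Example 10.5; apart from rounding they match the figures above:

```python
import math

x_bar, sigma = 42_000, 3_000     # tire life statistics from Example 10.5, in miles

def fraction_within(k):
    """Fraction of a normal population lying within the mean +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    lo, hi = x_bar - k * sigma, x_bar + k * sigma
    print(f"{100 * fraction_within(k):.2f}% of tires last between {lo} and {hi} miles")
# 68.27% of tires last between 39000 and 45000 miles
# 95.45% of tires last between 36000 and 48000 miles
# 99.73% of tires last between 33000 and 51000 miles
```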
Statistical Quality Control

Quality of products involves:

Appearance of the product - customer acceptance
Fitting of components - too tight or too loose fitting of parts results in premature failure
Delivery of expected performance - suffers from poor quality
Delivery of expected performance for the expected life - the reliability issue

A SIMPLE FACT: Cost and Reliability are two fundamental requirements


for market success of any product
- Quality assurance is the key to Reliability of any product

Principal causes of poor quality of products:

Poor design in setting dimensions, tolerances, surface finishing,


improper selection of materials, etc.
Manufacturing and fabrication processes relating to improper machining,
assembly, testing and inspection.
Improper conditions of machine tools and fabrication process control.
Poor workmanship in all above production processes.
Costs Associated with Quality of Products
Quality is the decisive factor for the success of any product in marketplace

Poor quality will not only result in failure of the products in the marketplace, but will also
cause costly liability and recalls. For this reason, companies normally include the
following costs when setting the prices of their products:

Warranty costs (a major cost factor)


Scrap and rework costs
Inspection
Prevention cost

An obvious way to ensure the quality of any product is to conduct thorough INSPECTION of
the manufacturing and TESTING of the finished products - both cost MONEY!!
Cost of Quality Improvement

[Figure: product value and cost of quality plotted against quality improvement, indicating
a desired level of quality improvement]

It makes sense to ensure that:

Increase of value > Increase of cost

A quote from an experienced salesman at a local Sears store on dishwashers:
"The manufacturers of dishwashers DESIGN their dishwashers for a life of TWO years - the
product life under the warranty offered by the company."
Statistical Process Control
We used the statement:
conduct thorough INSPECTION of the manufacturing, and TESTING on
the finished products for perfect quality assurance of products

The question is: How thoroughly should we conduct those inspections and tests
to ensure good quality in a typical MASS PRODUCTION environment?

If we focus our attention on the inspection and testing of the FINISHED product,
a question is: How many PIECES of the product should we pick for
inspection, and how many TESTING points should we select on each piece,
in order to achieve credible quality control of a batch of products from a
mass production process?

Here lies the value of statistical methods for determining:

the number of samples to inspect, and

the number of inspections on each sample

for quality assurance of products in a MASS PRODUCTION environment.

This is called Statistical Process Control.
Statistical Process Control

In general, Statistical process control (SPC) is an effective method of monitoring


a process through the use of control charts.

Control charts enable the use of objective criteria for distinguishing background variation
from events of significance based on statistical techniques

Variations in the process that may affect the quality of the end product or service
can be detected and corrected, thus reducing waste as well as the likelihood
that problems will be passed on to the customer.

With its emphasis on early detection and prevention of problems, SPC can lead to a
reduction in the time required to produce the product or service from end to end.

Process cycle time reductions coupled with improvements in yield have made SPC
a valuable tool from both a cost reduction and a customer satisfaction standpoint

Statistical process control was pioneered by Walter A. Shewhart in the early 1920s.
W. Edwards Deming later applied SPC methods in the United States during World War II,
thereby successfully improving quality in the manufacture of munitions and other strategically
important products.

Source: Wikipedia
W. Edwards Deming

Dr. Deming was an American statistician, professor, author, lecturer, and consultant.

Born: October 14, 1900, Sioux City, Iowa, USA
Died: December 20, 1993 (aged 93), Washington DC, USA
Fields: Statistician
Alma mater: University of Wyoming - BSEE (1921); University of Colorado - MS Math (1925);
Yale University - PhD Physics (1928)

He taught top management how to improve design (and thus service), product quality,
testing and sales (the last through global markets) through various methods, including
the application of statistical methods.

He is regarded as having had more impact upon Japanese manufacturing and business than
any other individual not of Japanese heritage. Despite being considered something of a
hero in Japan, he was only just beginning to win widespread recognition in the U.S. at
the time of his death.

While working under Gen. Douglas MacArthur as a census consultant to the Japanese
government, he famously taught statistical process control methods to Japanese business
leaders, returning to Japan for many years to consult and to witness the economic growth
that he had predicted as a result of applying techniques learned from Walter Shewhart at
Bell Laboratories. Later, he became a professor at New York University while engaged as
an independent consultant in Washington, D.C.

(Source: Wikipedia)
Genichi Taguchi

Born: January 1, 1924, Tokamachi, Japan
Citizenship: Japan
Fields: Engineering, statistics
Institutions: Aoyama Gakuin University
Alma mater: Kyushu University
Known for: Taguchi methods
Influences: Matosaburo Masuyama

Major contributions:

The philosophy of off-line quality control - designing products and processes so that
they are insensitive ("robust") to parameters outside the design engineer's control.

Innovations in the statistical design of experiments, notably the use of an outer array
for factors that are uncontrollable in real life but are systematically varied in the
experiment.

Lean manufacturing, or lean production, at Toyota is the best success of using the
Taguchi method.
The Control Charts
A control chart involves a range (BOUND) of acceptance of a parameter that relates
to the quality of a product.

This range is defined by: Upper control limit (UCL) and Lower control limit (LCL)

[Control chart: measured parameters plotted about a reference value of the parameter,
e.g. the MEAN of the measurements; values between the LCL and UCL PASS, values above
the UCL or below the LCL FAIL and are rejected]

Once the chart is established, all further measured data that fall within the bounded region
(the green region) are accepted.
Whenever a further measured data point falls outside the bounded region (the red regions),
the process is stopped to investigate the causes of the failure.
Construction of Control Charts

The fundamental assumption is:


All measured parameters (data) fall into a Normal (Bell shape) Curve
So, the normal distribution function in Equation (10.5) can be used as the basis
for mathematical derivations

Control charts will be derived and constructed based on the MEASURED parameters
(dataset) of:

Sample size = the number of samples selected from a batch = k


Number of measurements on EACH sample = n

[Figure: k samples (K = 1, 2, ..., k) are randomly selected from a storage bin of products
and sent to measurement stations, where the diameter is measured at n points on each
sample (n = 1 to 6 in the illustration)]
So, the size of the measured dataset = k x n (n=6 in the example)

One needs to compute the MEAN (\bar{\bar{x}}) and the STANDARD DEVIATION (\sigma) of this dataset.
The Three-Sigma Control Charts
This is the simplest control chart of all
It is constructed on the basis of the MEAN of the measured dataset
k samples are randomly picked from a storage bin of produced products.
Take n measurements on each sample, giving a mean value \bar{x} for that sample.
Compute the MEAN of all k x n measured values, \bar{\bar{x}}, and their STANDARD DEVIATION, \sigma.
The upper and lower control limits for quality control can then be determined by:

LCL_{\bar{x}} = \bar{\bar{x}} - \frac{3\sigma}{\sqrt{n}} \qquad \text{and} \qquad UCL_{\bar{x}} = \bar{\bar{x}} + \frac{3\sigma}{\sqrt{n}}        (10.6)
Graphical expression of the 3-σ control chart:

[3-σ control chart: sample means plotted between LCL_{\bar{x}} and UCL_{\bar{x}};
values inside the bounds PASS, values outside the bounds FAIL]

The use of the 3-σ control chart in quality control:
Once the chart is established with n measurements from each of the k samples, the quality
control engineer or technician picks samples from future production and performs the same
n measurements, computing the sample mean \bar{x}. The sample is accepted if this value
falls within the bounds. If it falls outside the bounds, the manufacturing process should
be stopped and the causes of the failure investigated, with remedial actions taken.
Example 10.6
Quality control of IC chips by measuring the output voltage, using a 3-σ control chart.
5 samples are randomly picked from a storage bin of IC chips mass-produced by a process.
3 measurements of the voltage output (mV) are taken from each sample, recorded as follows:

Sample 1: 2.25 3.16 1.80
Sample 2: 2.60 1.95 3.22        k = 5
Sample 3: 1.75 3.06 2.45        n = 3
Sample 4: 2.15 2.80 1.85
Sample 5: 3.15 2.76 2.20
The MEAN of the 15 measurements is \bar{\bar{x}} = 2.477 mV by Equation (10.1), and
the STANDARD DEVIATION is \sigma = 0.5281 mV by Equation (10.2).
The upper and lower control limits of the dataset are computed from Equation (10.6):

LCL_{\bar{x}} = \bar{\bar{x}} - \frac{3\sigma}{\sqrt{n}} = 2.477 - \frac{3 \times 0.5281}{\sqrt{3}} = 1.5623
\qquad \text{and} \qquad
UCL_{\bar{x}} = \bar{\bar{x}} + \frac{3\sigma}{\sqrt{n}} = 2.477 + \frac{3 \times 0.5281}{\sqrt{3}} = 3.3917

Application of the control chart:

[3-σ control chart for Example 10.6: center line \bar{\bar{x}} = 2.477 mV, with
UCL_{\bar{x}} = 3.3917 mV and LCL_{\bar{x}} = 1.5623 mV; sample means inside the bounds
PASS, those outside FAIL]

The average measured voltage output of any future randomly selected sample should fall
within the bounds in the chart.

Any sample whose average measured output falls outside the bounds will be rejected,
and the process will be halted for further inspection.
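
A minimal sketch (not part of the original notes) of the 3-σ limits of Example 10.6 in Python; the small differences from the values above come only from rounding the mean to 2.477, and the new_sample values are hypothetical:

```python
import math
import statistics

samples = [
    [2.25, 3.16, 1.80],
    [2.60, 1.95, 3.22],
    [1.75, 3.06, 2.45],
    [2.15, 2.80, 1.85],
    [3.15, 2.76, 2.20],
]
n = len(samples[0])                                    # measurements per sample, n = 3
data = [x for sample in samples for x in sample]       # the k x n = 15 measurements

grand_mean = statistics.mean(data)                     # 2.477 mV
sigma = statistics.stdev(data)                         # 0.5281 mV, Equation (10.2)

lcl = grand_mean - 3.0 * sigma / math.sqrt(n)          # about 1.562 mV
ucl = grand_mean + 3.0 * sigma / math.sqrt(n)          # about 3.391 mV
print(f"LCL = {lcl:.4f} mV, UCL = {ucl:.4f} mV")

# A future sample is accepted when its mean falls within the bounds:
new_sample = [2.40, 2.90, 2.10]                        # hypothetical future measurements
print(lcl <= statistics.mean(new_sample) <= ucl)       # True -> accept
```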
Control Charts for Sample Range - The R-chart

Difference between the 3-σ control chart and the R control chart:

The 3-σ chart is based on the MEAN of the k x n measured dataset.
The R-chart is based on the sample range, i.e. the difference between the MAX and MIN
of the measured values.
The R-chart is also established on a NORMAL distribution of the measured parameters.
Working Sheet for the R-Chart

Sample    Measured Parameters           Sample Mean, \bar{x}     Sample Range, R
k = 1     x1  x2  x3  x4  x5 ... xn     \bar{x}_{k=1}            R_{k=1} = x_max - x_min
k = 2     ...                           \bar{x}_{k=2}            R_{k=2}
...
k = k     ...                           \bar{x}_{k=k}            R_{k=k}

MEAN VALUES:                            \bar{\bar{x}}            \bar{R} = d_2 \sigma, or as computed

NOTE: The average sample range \bar{R} computed from the individual sample ranges equals
d_2 \sigma only if the measured dataset fits the NORMAL distribution perfectly.
Table 10.2  Factors for Estimating \bar{R} and the Lower and Upper Control Limits
(Ref: Rosenkrantz, W. A., Introduction to Probability and Statistics for Scientists and Engineers, McGraw-Hill, New York)

No. of Measurements on Each Sample, n    Factor, d2    Coefficient, D1    Coefficient, D2
2 1.128 0 3.69
3 1.693 0 4.36
4 2.059 0 4.70
5 2.326 0 4.92
6 2.534 0 5.08
7 2.704 0.20 5.20
8 2.847 0.39 5.31
9 2.970 0.55 5.39
10 3.075 0.69 5.47
11 3.173 0.81 5.53
12 3.258 0.92 5.59
13 3.336 1.03 5.65
14 3.407 1.12 5.69
15 3.472 1.21 5.74

The lower control limit: LCL_R = D_1 \sigma, and the upper control limit: UCL_R = D_2 \sigma,
where \sigma = the standard deviation of the k x n dataset of the measured parameter.
The R-Chart on Sample Range:

[R-chart: the sample range R plotted between LCL_R and UCL_R about the average range \bar{R};
values inside the bounds PASS, values outside the bounds FAIL]

The application of the R-chart to quality control is similar to that of the 3-σ chart:

Once the R-chart is established, any sample whose measured parameter range falls outside
the bounds is rejected, and the manufacturing process is stopped for an investigation of
the causes of the inferior quality.
Example 10.7

Use the R-chart for quality control in the IC-chip manufacturing process described
in Example 10.6. The measurements of the IC chips' output voltage at 3 leads on
each of the 5 samples are tabulated below:
Sample 1: 2.25 3.16 1.80
Sample 2: 2.60 1.95 3.22        k = 5
Sample 3: 1.75 3.06 2.45        n = 3
Sample 4: 2.15 2.80 1.85
Sample 5: 3.15 2.76 2.20

The working chart for the R-range is:

Sample    Measured Voltages (mV)    Mean value, \bar{x}    Sample Range, R
k = 1     2.25  3.16  1.80          2.4033                 1.36
k = 2     2.60  1.95  3.22          2.5900                 1.27
k = 3     1.75  3.06  2.45          2.4200                 1.31
k = 4     2.15  2.80  1.85          2.2667                 0.95
k = 5     3.15  2.76  2.20          2.7033                 0.95

Total k = 5, total n = 3            Mean, \bar{\bar{x}} = 2.477    \bar{R} = 1.168 (from the dataset)
The STANDARD DEVIATION of the dataset of 15 measurements is calculated using Equation (10.2)
to be \sigma = 0.5281.

With n = 3, we find the coefficients d2, D1 and D2 from Table 10.2 to be:

d2 = 1.693    D1 = 0    D2 = 4.36

From these we calculate:

The average sample range \bar{R} = d_2 \sigma = 1.693 x 0.5281 = 0.8941,
which is not the same value as computed from the dataset (= 1.168). This is because
the dataset is not a good fit to the NORMAL distribution curve. A larger sample size
may improve this situation.

The lower control limit: LCL_R = D_1 \sigma = 0 x 0.5281 = 0, and

The upper control limit: UCL_R = D_2 \sigma = 4.36 x 0.5281 = 2.3025

The R-chart is:

[R-chart for Example 10.7: UCL_R = 2.3025 mV, \bar{R} = 0.8941 mV, LCL_R = 0;
sample ranges inside the bounds PASS, those outside FAIL]
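
A minimal sketch (not part of the original notes) reproducing the R-chart quantities of Example 10.7, using the n = 3 factors taken from Table 10.2:

```python
import statistics

samples = [
    [2.25, 3.16, 1.80],
    [2.60, 1.95, 3.22],
    [1.75, 3.06, 2.45],
    [2.15, 2.80, 1.85],
    [3.15, 2.76, 2.20],
]
d2, D1, D2 = 1.693, 0.0, 4.36                               # Table 10.2, n = 3

ranges = [max(s) - min(s) for s in samples]                 # 1.36, 1.27, 1.31, 0.95, 0.95
R_bar_data = statistics.mean(ranges)                        # 1.168, computed from the dataset

sigma = statistics.stdev([x for s in samples for x in s])   # 0.5281, Equation (10.2)
R_bar_normal = d2 * sigma                                   # 0.8941, assuming a perfect normal fit

lcl_R = D1 * sigma                                          # 0
ucl_R = D2 * sigma                                          # 2.3025
print(R_bar_data, R_bar_normal, lcl_R, ucl_R)
```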
