Lecture 2 Statistics For QA and QC
Reference:
Review of Basic Statistics (Quality Improvement, Chapter 5 & 8, Besterfield, 9th)
1. Introduction of Statistics
What is statistics?
The science that deals with the collection, tabulation, analysis, interpretation, and
presentation of quantitative data.
1.1 Data collection: Accuracy vs Precision
Accuracy
How well a measurement agrees with an accepted value. It is the
deviation of measured or observed average value from the true
mean value.
Precision
How well a series of measurements agree with each other. It is a
measure of consistency of the system.
1.2 Measures of Central Tendency
A numerical value which describes the central position of the data.
• Average: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
• Mode: the value that occurs with the highest frequency in a set of
numbers
e.g. 2, 4, 4, 5, 6, 6, 8, 8, 8,11
• Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
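The measures above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module on the data set from the mode example; the variable names are illustrative only.

```python
import statistics

# Data set from the mode example above.
data = [2, 4, 4, 5, 6, 6, 8, 8, 8, 11]

average = statistics.mean(data)   # sum of x_i divided by n
mode = statistics.mode(data)      # value with the highest frequency
s = statistics.stdev(data)        # sample standard deviation (divides by n - 1)

print(f"average = {average}, mode = {mode}, s = {s:.3f}")
```

Note that `statistics.stdev` uses the n − 1 divisor, whereas `statistics.pstdev` would divide by n.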
Content
1 Introduction of Statistics
2 Fundamentals of Probability
3 Probability Models for QA/QC
4 Central Limit Theorem
5 Hypothesis Testing
Reference:
Review of Basic Statistics (Chapter 4 &7, Besterfield, 7th)
2. Fundamentals of Probability
What is probability?
It describes the chance that a specific event will occur; synonyms include
likelihood, tendency, and trend.
Probability is defined by this equation: $P(A) = \dfrac{N_A}{N}$
where P(A) = probability of an event A occurring, to 3 decimal places
$N_A$ = number of successful outcomes of event A
N = total number of possible outcomes
• Theorem 2
If P(A) is the probability that an event will occur, then the probability
that A will not occur is 1.000 - P(A).
• Theorem 5
The sum of the probability of the events of a situation is equal to 1.000.
2.1 Theorems of Probability (2)
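• A and B mutually exclusive: P(A or B) = P(A) + P(B)
• A and B not mutually exclusive: P(A or B) = P(A) + P(B) – P(both)
• A and B independent: P(A and B) = P(A) × P(B)
• A and B dependent: P(A and B) = P(A) × P(B|A)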
Note:
o Mutually exclusive means that the occurrence of one event makes the other event impossible.
o P(B|A) is the probability of B calculated knowing that A has occurred.
Example 4
Data: Food inspection results by supplier
Supplier Conforming Nonconforming Total
X 50 3 53
Y 125 6 131
Z 75 2 77
Total 250 11 261
If a box contains these 261 food products and one or two items are drawn from it, what
is the probability of selecting
• An item produced by supplier X or Z?
• A conforming item produced by X or Z?
• An item produced by X or being nonconforming?
• 2 items with 1st from X and 2nd from Y (with replacement)?
• 2 items with 1st from X and 2nd from Y (without replacement)?
If 261 food products are contained in a box, what is the probability of selecting
• An item produced by supplier X or Z? P(X) + P(Z) = 53/261 + 77/261 = 0.498
• A conforming item produced by X or Z? P(co. X) + P(co. Z) = 50/261 + 75/261 = 0.479
If 261 food products are contained in a box, what is the probability of selecting
• An item produced by X or being nonconforming?
P(X) + P(nc.) – P(X and nc.) = 53/261 + 11/261 – 3/261 = 0.234
If 261 food products are contained in a box, what is the probability of selecting
• 2 items with 1st from X and 2nd from Y (with replacement)?
P (X and Y) = P(X) × P(Y) = (53/261) ×(131/261) = 0.102
• 2 items with 1st from X and 2nd from Y (without replacement)?
P (X and Y) = P(X) ×P(Y|X) = (53/261) × (131/260) = 0.102
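The five answers to Example 4 can be reproduced from the inspection table. This is a minimal sketch, assuming the table is stored as a plain dictionary; the variable names are illustrative and not part of the lecture.

```python
from fractions import Fraction

# Inspection counts from the Example 4 table: (conforming, nonconforming).
counts = {"X": (50, 3), "Y": (125, 6), "Z": (75, 2)}

total = sum(c + nc for c, nc in counts.values())            # 261 items
by_supplier = {s: c + nc for s, (c, nc) in counts.items()}  # items per supplier
nonconforming = sum(nc for _, nc in counts.values())        # 11 items

# Mutually exclusive events: add the probabilities.
p_x_or_z = Fraction(by_supplier["X"] + by_supplier["Z"], total)
p_conf_x_or_z = Fraction(counts["X"][0] + counts["Z"][0], total)

# Not mutually exclusive: subtract the overlap (nonconforming items from X).
p_x_or_nc = Fraction(by_supplier["X"] + nonconforming - counts["X"][1], total)

# Independent draws (with replacement) vs dependent draws (without replacement).
p_xy_with = Fraction(by_supplier["X"], total) * Fraction(by_supplier["Y"], total)
p_xy_without = Fraction(by_supplier["X"], total) * Fraction(by_supplier["Y"], total - 1)

for label, p in [
    ("X or Z", p_x_or_z),
    ("conforming from X or Z", p_conf_x_or_z),
    ("X or nonconforming", p_x_or_nc),
    ("X then Y, with replacement", p_xy_with),
    ("X then Y, without replacement", p_xy_without),
]:
    print(f"{label}: {float(p):.3f}")
```

The with-replacement and without-replacement answers agree to 3 decimal places only because the lot is large relative to the sample; the exact fractions differ.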
2.2 Counting of Events
The calculation of probability, $P(A) = \dfrac{N_A}{N}$, is based on the number of possible outcomes,
which are sometimes difficult to find.
1) Simple Multiplication
If an event A can happen in any of a ways, and after it has
occurred, another event B can happen in b ways, the number of
ways that both events can happen is ab.
A witness to a hit-and-run accident remembered the first 3 digits of the
license plate out of 5 and noted the fact that the last 2 were numerals.
How many owners of automobiles would the police have to investigate?
2) Permutation
e.g., arranging the three letters A, B, C two at a time gives 6 permutations:
Ans: AB AC BC BA CB CA
It is the number of outcomes of arranging n objects taken r
at a time, denoted as
$P_r^n = {}_nP_r = \dfrac{n!}{(n-r)!} = \dfrac{n(n-1)(n-2)(n-3)\cdots}{(n-r)(n-r-1)(n-r-2)(n-r-3)\cdots}$
where n is the total number of objects and r is the number taken at a time.
In the license plate example, suppose the witness further remembers that the
numerals were not the same, how many owners of automobiles would the
police have to investigate?
Ans: $P_2^{10} = {}_{10}P_2 = \dfrac{10!}{(10-2)!} = \dfrac{10 \cdot 9 \cdot 8 \cdots 1}{8 \cdot 7 \cdots 1} = 90$
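The two license plate counts can be compared directly. This is a minimal sketch, assuming each of the two unknown positions can be any numeral from 0 to 9; it uses `math.perm` from the Python standard library (Python 3.8+).

```python
import math
from itertools import permutations

# Simple multiplication: each of the 2 unknown positions can be any of 10 numerals.
plates_any = 10 * 10

# Permutation: the 2 numerals must differ, so order matters and repeats are excluded.
plates_distinct = math.perm(10, 2)   # 10! / (10 - 2)!

# Small enumeration for three letters taken two at a time.
letter_pairs = ["".join(p) for p in permutations("ABC", 2)]

print(plates_any, plates_distinct, letter_pairs)
```

Remembering that the numerals differ removes the 10 plates with a repeated numeral, so the police would investigate 90 owners instead of 100.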
3) Combination
It is the number of outcomes of taking n objects r at a time,
without regard to the order of selection,
denoted as
$C_r^n = {}_nC_r = \dfrac{n!}{r!(n-r)!}$
An interior designer has five different colored chairs and will use
three in a living room arrangement. How many different combinations
are possible?
Ans: $C_3^5 = {}_5C_3 = \dfrac{5!}{3!(5-3)!} = \dfrac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1 \cdot 2 \cdot 1} = 10$
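The chair example can be verified by enumeration. This is a minimal sketch; the colour names are hypothetical, since the example only states that the five chairs differ in colour.

```python
import math
from itertools import combinations

# Hypothetical colour names (an assumption for illustration).
chairs = ["red", "blue", "green", "yellow", "black"]

n_combinations = math.comb(len(chairs), 3)   # 5! / (3! * (5 - 3)!)
selections = list(combinations(chairs, 3))   # order within a selection is ignored

print(n_combinations)
for selection in selections:
    print(selection)
```

If order mattered, `math.perm(5, 3)` would give 60 arrangements, i.e. each of the 10 combinations counted 3! = 6 times.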
3. Probability Models for QA/QC
• Continuous distribution:
➢ Normal distribution (Gaussian distribution)
• Discrete distribution:
➢ Binomial distribution
➢ Poisson distribution
➢ Hypergeometric distribution
3.1 Normal Distribution
• Determined by µ and σ, independently.
• The larger the σ, the flatter the curve.
[Figure: normal curves with different means but identical standard deviation; normal curves with different standard deviations but identical means]
• All normal distributions of continuous variables can be converted to the
standardized normal distribution by using the standardized normal value Z:
$Z = \dfrac{x_i - \mu}{\sigma}$, with density $f(Z) = \dfrac{1}{\sqrt{2\pi}} e^{-Z^2/2}$
e.g. Consider the value 35 with µ = 29 and σ = 6: Z = (35 – 29)/6 = +1
[Figure: standardized normal distribution]
[Figure: area under the normal curve; 99.73% of the data fall within µ ± 3σ]
Calculation of probability
• The area under the normal curve is the probability of occurrence.
• The area can be calculated from the normal curve formula using calculus, but this is not necessary:
$f(Z) = \dfrac{1}{\sqrt{2\pi}} e^{-Z^2/2}$
• The % of data under the curve for various Z values is given in the table of Normal
Distribution (Appendix Table A in Besterfield, “Quality Improvement”).
• Essential first step: determine the Z value using $Z = \dfrac{x_i - \mu}{\sigma}$
• The probability of X ≤ x_i can be found as
$P(X \le x_i) = P\left(Z \le \dfrac{x_i - \mu}{\sigma}\right) = \Phi\left(\dfrac{x_i - \mu}{\sigma}\right)$
• The table of “Areas Under the Normal
Curve” is a left-reading table, i.e., the given
area is from -∞ to a particular z value.
e.g., P (Z ≤ -0.72) = 0.2358, which means
23.58% of the data are less than -0.72.
Notes:
$P(X \le x_i) = P\left(Z \le \dfrac{x_i - \mu}{\sigma}\right) = \Phi\left(\dfrac{x_i - \mu}{\sigma}\right)$
$P(X > x_i) = 1 - P(X \le x_i)$
$P(x_i \le X \le x_j) = P(X \le x_j) - P(X \le x_i)$
3.1 Normal Distribution
Calculation of probability
The mean value of the weight of a particular brand of cereal for the past year is 0.297 kg with a
standard deviation of 0.024 kg. Assuming normal distribution, find the % of the data that falls
1) below the lower specification limit of 0.274 kg
2) above the higher limit of 0.350 kg, or
3) between the lower and higher limit.
Ans: 1) $Z_1 = \dfrac{X_1 - \mu}{\sigma} = \dfrac{0.274 - 0.297}{0.024} = -0.96$
P(X ≤ 0.274) = P(Z ≤ -0.96) = 0.1685 = 16.85%
Ans: 2) $Z_2 = \dfrac{X_2 - \mu}{\sigma} = \dfrac{0.350 - 0.297}{0.024} = +2.21$
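All three parts of the cereal example follow from the two Z values and the left-reading table rules. The sketch below substitutes `statistics.NormalDist` from the Python standard library for the printed table (an assumption; the lecture uses Appendix Table A) and rounds Z to two decimals to mimic a table lookup.

```python
from statistics import NormalDist

mu, sigma = 0.297, 0.024       # kg, mean and standard deviation of the cereal weight
lower, upper = 0.274, 0.350    # kg, lower and higher specification limits

std_normal = NormalDist()      # standardized normal: mean 0, standard deviation 1

# Round Z to 2 decimals, as one would when reading a printed normal table.
z1 = round((lower - mu) / sigma, 2)
z2 = round((upper - mu) / sigma, 2)

p_below = std_normal.cdf(z1)                            # P(X <= lower)
p_above = 1 - std_normal.cdf(z2)                        # P(X > upper)
p_between = std_normal.cdf(z2) - std_normal.cdf(z1)     # P(lower <= X <= upper)

print(f"Z1 = {z1}, Z2 = {z2}")
print(f"below: {p_below:.4f}, above: {p_above:.4f}, between: {p_between:.4f}")
```

Without the rounding step the results shift slightly in the third or fourth decimal place, because the table only resolves Z to two decimals.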
3.2 Binomial Distribution
• There are an infinite number of items or a steady stream of items coming from a work center.
• The process consists of a sequence of n independent trials, where the outcome of each trial is
either a success or a failure.
• The probability of failure is denoted by p and that of success by 1-p; the probabilities p and
1-p remain constant for each trial.
Then the probability function of x failures in n trials is referred to as the Binomial distribution:
$P(x) = C_x^n p^x (1-p)^{n-x}$, x = 0, 1, 2, …, n
e.g. n = 5, x = 2: for trials 1 to 5 with outcome probabilities p, p, 1-p, 1-p, 1-p,
$P(2) = C_2^5 p^2 (1-p)^3$
Ans: p = 0.10, n = 5, x = 1
$P(x) = C_x^n p^x (1-p)^{n-x}$, so $P(1) = C_1^5 (0.10)^1 (1 - 0.10)^{4} = 0.328$
Use the same example of the milk product manufacturing factory, with a steady stream
of products and a nonconforming probability of 0.10. What is the probability of no
more than 2 nonconforming milk products in a sample of 5?
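"No more than 2" means summing the binomial probabilities for x = 0, 1 and 2. A minimal sketch, with `binomial_pmf` as a hypothetical helper name:

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, 0.10   # sample size and nonconforming probability from the example

terms = [binomial_pmf(x, n, p) for x in range(3)]   # P(0), P(1), P(2)
p_at_most_2 = sum(terms)

print([round(t, 4) for t in terms], round(p_at_most_2, 4))
```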
3.3 Poisson Distribution
$P(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$, x = 0, 1, 2, ...
where λ is the average count and x is the count of any single event.
Example:
λ is the average number of non-conforming beers in all boxes of beer product
x is 1 non-conforming beer in one sample box of beer
Note: The opportunities for the occurrence of an event should be equal and independent.
The Poisson distribution can sometimes be used as an approximation to the binomial distribution.
For example: In a sample of 100 bottles of beer (n = 100), the nonconforming probability of a beer is 0.02
(p = 0.02) based on previous records. The average number of nonconforming beers in a sample = np =
2. What is the probability of 1 nonconforming beer in a sample? (Hint: put λ = 2, x = 1 into the formula)
$P(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$, x = 0, 1, 2, ...
Note: The Poisson distribution can be used as an approximation to the binomial distribution when n is
very large and p approaches 0, such that np = λ is a constant and (1 - p) approaches 1. In QA & QC,
p is usually very small.
Binomial and Poisson probability distributions are the basis for attribute control charts and
acceptance sampling.
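For the beer example, the quality of the approximation can be seen by computing both values side by side. A minimal sketch; `binomial_pmf` and `poisson_pmf` are hypothetical helper names, not functions from the lecture.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability of x nonconforming units in n trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability of count x when the average count is lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)

n, p, x = 100, 0.02, 1
lam = n * p                      # average count, np = 2

exact = binomial_pmf(x, n, p)
approx = poisson_pmf(x, lam)

print(f"binomial: {exact:.4f}, Poisson: {approx:.4f}")
```

With n = 100 and p = 0.02 the two values agree to about four decimal places, which illustrates why the Poisson formula is a convenient substitute when p is small.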
The probability for x ≤ a in the Poisson distribution is:
$P(x \le a) = \sum_{x=0}^{a} \dfrac{\lambda^x e^{-\lambda}}{x!}$
Ans:
$P(x \le 2) = \sum_{x=0}^{2} \dfrac{3^x e^{-3}}{x!} = P(0) + P(1) + P(2) = 0.050 + 0.149 + 0.224 = 0.423$
Binomial & Poisson Distribution
The proportion of fish cans that are nonconforming is 0.04, calculate the probability of 3 or
fewer nonconforming fish cans in a sample of 100.
Ans: Binomial Distribution $P(x \le 3) = \sum_{x=0}^{3} C_x^{100} (0.04)^x (1 - 0.04)^{100-x}$
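The binomial sum for the fish can example is tedious by hand, so a short computation helps. The sketch below also evaluates the Poisson approximation with λ = np = 4 for comparison; the function names are hypothetical.

```python
import math

def binomial_cdf(a: int, n: int, p: float) -> float:
    """P(x <= a) for a binomial distribution with n trials and probability p."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(a + 1))

def poisson_cdf(a: int, lam: float) -> float:
    """P(x <= a) for a Poisson distribution with average count lam."""
    return sum(lam**x * math.exp(-lam) / math.factorial(x) for x in range(a + 1))

n, p, a = 100, 0.04, 3

p_binomial = binomial_cdf(a, n, p)
p_poisson = poisson_cdf(a, n * p)    # lambda = np = 4

print(f"binomial: {p_binomial:.4f}, Poisson approximation: {p_poisson:.4f}")
```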
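The probability for x ≤ a in a hypergeometric distribution is:
$P(x \le a) = \sum_{x=0}^{a} \dfrac{C_x^D C_{n-x}^{N-D}}{C_n^N}$
where $C_n^N$ is the total number of combinations of n units drawn from a lot of N units, of which D are nonconforming and N-D are conforming.
N = 9, D = 3
n = 4, x = 1
Ans: $P(1) = \dfrac{C_1^3 C_{4-1}^{9-3}}{C_4^9} = 0.476$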
Example
A company produces egg cartons with 24 eggs in each carton. One small bakery shop purchased
one carton and would like to test 4 eggs in it, and if all of them are fresh, the bakery shop will
purchase the egg brand in the future. If the carton purchased has 3 nonconforming eggs, what is
the probability of rejection by the bakery shop?
Ans: N = 24, D = 3, n = 4, x = 0
Hypergeometric: Acceptance: $P(0) = \dfrac{C_0^3 C_{4-0}^{24-3}}{C_4^{24}} = 0.563$
Rejection: 1 − P(0) = 1 − 0.563 = 0.437
Binomial: $P(0) = C_x^n p^x (1-p)^{n-x} = C_0^4 \left(\tfrac{1}{8}\right)^0 \left(1 - \tfrac{1}{8}\right)^{4-0} = 0.586$
Poisson: x = 0, λ = 4 × 1/8 = 0.5, so $P(0) = \dfrac{(0.5)^0 e^{-0.5}}{0!} = 0.607$
If the acceptance criterion allows no more than one defective egg in a carton, what is
the probability of acceptance? Ans: P(x ≤ 1) = P(0) + P(1) = 0.563 + 0.375 = 0.938
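The egg carton example compares the exact hypergeometric result with the binomial and Poisson approximations. A minimal sketch with a hypothetical helper `hypergeometric_pmf`:

```python
import math

def hypergeometric_pmf(x: int, N: int, D: int, n: int) -> float:
    """P(x nonconforming in a sample of n) from a lot of N containing D nonconforming."""
    return math.comb(D, x) * math.comb(N - D, n - x) / math.comb(N, n)

N, D, n = 24, 3, 4          # carton size, nonconforming eggs, eggs tested
p = D / N                   # fraction nonconforming, 1/8

hyper_p0 = hypergeometric_pmf(0, N, D, n)        # exact: finite lot, no replacement
binom_p0 = (1 - p) ** n                          # binomial P(0) with p = 1/8
poisson_p0 = math.exp(-n * p)                    # Poisson P(0) with lambda = np = 0.5

rejection = 1 - hyper_p0
accept_at_most_1 = hyper_p0 + hypergeometric_pmf(1, N, D, n)

print(f"P(0): hypergeometric {hyper_p0:.3f}, binomial {binom_p0:.3f}, Poisson {poisson_p0:.3f}")
print(f"rejection {rejection:.3f}, acceptance with at most 1 defective {accept_at_most_1:.3f}")
```

The unrounded sum P(0) + P(1) is about 0.9387, so it prints as 0.939; the 0.938 above comes from adding the two rounded terms. The approximations drift from the exact value here because the sample (4 eggs) is a sizeable fraction of the lot (24 eggs).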
Distribution Inter-relationship
Hypergeometric → Binomial → Poisson (p ≤ 0.1, n large)
4. Central Limit Theorem
Data on the length of chocolate bars
Individual measurements are coded from 6.00 mm,
so 6.35 mm is recorded as 35.
Consider the frequency distribution of a non-normal population and the distributions of
sample averages for sample sizes 2, 3, 4, 5, 6.
If the population from which samples are taken is not normal, the
distribution of sample averages will tend towards normality
provided that the sample size n is at least 4 (n ≥ 4).
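The tendency towards normality can be illustrated by exact enumeration. The sketch below uses a small hypothetical right-skewed population (an assumption, not the chocolate bar data) and uses skewness as one rough indicator of non-normality; a normal distribution has skewness 0.

```python
from itertools import product

# Hypothetical right-skewed population; each value is equally likely to be drawn.
population = [1, 1, 1, 2, 3, 6]

def skewness(values):
    """Third standardized moment; 0 for a symmetric distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def sample_average_skewness(sample_size):
    # Enumerate every equally likely ordered sample drawn with replacement.
    averages = [sum(s) / sample_size for s in product(population, repeat=sample_size)]
    return skewness(averages)

skew_by_n = {n: sample_average_skewness(n) for n in range(1, 7)}
for n, g in skew_by_n.items():
    print(f"n = {n}: skewness of sample averages = {g:.3f}")
```

For independent draws the skewness of the sample average shrinks in proportion to 1/√n, so the distribution of averages becomes steadily more symmetric as n grows.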
5. Hypothesis Testing
To get results from statistical analysis, we first make a statement or claim, i.e.,
Hypothesis, about the variable of interest, which we believe to be true.
Example A: we believe that the diameters of sausages have a true mean of 6 mm.
The following hypothesis could be formed before the samples were taken:
H0: µ = 6 (null hypothesis: the currently accepted value for a parameter)
Note: These two states of nature (H0 and H1) cannot be true simultaneously.
Example: Please find the H0 and H1
Statistical Hypothesis - Errors
With a ‘YES/NO’ type of question, there are two ways of being wrong:
• Deciding ‘NO’ when ‘YES’ is correct → Type I error → We reject a null hypothesis when
the null hypothesis is in fact true.
e.g., The probability that a lot of conforming units is rejected or a process producing
acceptable units is stopped (α). Producer's risk
• Deciding ‘YES’ when ‘NO’ is correct → Type II error → We accept a null hypothesis
when the null hypothesis is in fact false.
e.g., The probability that a lot of non-conforming units is accepted or a process producing
inferior units is allowed to continue (β). Consumer's risk
            H0 true            H0 false
Accept H0   Right decision     Type II error (β)
Reject H0   Type I error (α)   Right decision
α type error and β type error are important terms in acceptance sampling.
Summary
• Distinguish the important terms such as variable & attribute, sample &
population, permutation & combination;
• Master theorems of probability;
• Understand normal distribution for continuous data, mean and standard
deviation, standardization method, probability calculation;
• Understand probability models for discrete data (Binomial, Poisson and
Hypergeometric distribution), the prerequisite for use, probability calculation;
• Have basic understanding of central limit theorem;
• Distinguish null & alternative hypothesis, type I (α type) & type II (β type)
errors in hypothesis testing.