Basic Statistics
Concepts in
Lean Six Sigma
ANTHONY JAMES H. VIZMANOS, PFT, RChT, SSGB
Derived from Six Sigma Green Belt course of the University System of Georgia – Kennesaw State University
Part 1
BASIC CONCEPTS OF
STATISTICS
Statistics
u Variation
u percentile
Collecting and Summarizing Data
Types of Data
u Nominal – categories with no numeric meaning and no inherent order
u Ordinal – the numbers represent a rank or an order, but the intervals
between them are not meaningful
u Interval – values on a scale whose points are an equal distance from
each other, with no true zero
u Ratio – interval data with a true zero point
u Locational – useful in a production line to capture the location where
defects occur
Other types of data
u Qualitative – non-numerical data
u Quantitative – numerical data, either continuous or discrete
u Continuous data – measured on a continuous scale
u Discrete data – takes a limited set of values and typically comes from counting
Data Collection Techniques
u Surveys
u Face-to-face interviews
u Focus group discussions (FGDs)
u Emails and websites
u Customer feedback
u Test marketing, mystery shopper
u Suggestion boxes
u Automatic data capture (barcodes, videos, etc.)
Types of Statistical Errors
u Statistical error – errors that occur when collecting data
u Sampling error – errors related to the nature or size of the selected
sample
u Non-sampling error – errors related to the collection of data
u Measurement error – differences in the scale of measurement or in the
rounding-off procedure
u Non-response error – errors that arise when respondents do not answer
one or more questions in the survey
u Misinterpretation error – errors that arise when respondents misinterpret the
questions in the survey
u Sampling bias error – errors due to conflict of interest, or to some
respondents being left out of the survey
u Instrumental error – errors due to the limitations and/or sensitivity of the
instrument used to measure the observations from which data are derived
u Human error – errors due to human mistakes committed during data
gathering and interpretation
Types of Sampling
Complement rule
u The probability that Event A will not occur = N
u The probability that Event A will occur = Y
u N = 1 – Y, that is, P(not A) = 1 – P(A)
Mutually Exclusive and
Independent Events
Mutually Exclusive
u If the occurrence of any one of these events excludes the occurrence
of the others, these events are called mutually exclusive or disjoint
events.
u Events A and B are mutually exclusive or disjoint if they cannot occur
simultaneously.
Independent event
u If the occurrence of one event does not change the probability of
another event occurring, the two events are said to be
independent of each other.
u Events A and B are independent if the probability of B is not affected by
Event A occurring.
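The two definitions above, together with the complement rule, can be checked on a small sample space. The sketch below assumes a single roll of a fair six-sided die. The events `even`, `low`, and `odd_low` are hypothetical and are not taken from the course material.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
OUTCOMES = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a set of outcomes), all outcomes equally likely."""
    return Fraction(len(set(event) & OUTCOMES), len(OUTCOMES))

def mutually_exclusive(a, b):
    """True when A and B share no outcomes, so they cannot occur together."""
    return len(set(a) & set(b)) == 0

def independent(a, b):
    """True when P(A and B) equals P(A) x P(B)."""
    return prob(set(a) & set(b)) == prob(a) * prob(b)

even = {2, 4, 6}      # hypothetical Event A
low = {1, 2}          # hypothetical Event B
odd_low = {1, 3}      # hypothetical Event C

print(mutually_exclusive(even, odd_low))        # A and C share no outcomes
print(independent(even, low))                   # P(A and B) vs P(A) x P(B)
print(1 - prob(even) == prob(OUTCOMES - even))  # complement rule
```

Note that `even` and `odd_low` are mutually exclusive but not independent: knowing that one occurred tells you the other did not.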
Additional Rules for Probability
u Addition Rule
u Used to calculate the probability of A or B
u Written as P(A ∪ B) and can be generalized to more than two events such as
P(A ∪ B ∪ C ∪ D)
u Addition Rule
u General addition rule
u Works in all circumstances
u P(A ∪ B) = P(A) + P(B) – P(A&B)
u Example: The assembly of a product requires an electronic board which is
supplied by two suppliers. The probability of the board from supplier A
working is 0.8, the probability of the board from supplier B working is 0.7, and
the probability of both working is 0.6. What is the probability that either
supplier A or B’s board is working?
Answer: P(A ∪ B) = 0.8 + 0.7 – 0.6 = 0.9
[Figure: Venn diagram of events A and B, with the overlap labeled A&B]
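The supplier example can be reproduced with a short sketch of the general addition rule. The function name `prob_a_or_b` is only illustrative.

```python
def prob_a_or_b(p_a, p_b, p_a_and_b):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

# Values from the supplier example: P(A) = 0.8, P(B) = 0.7, P(A&B) = 0.6
p = prob_a_or_b(0.8, 0.7, 0.6)
print(round(p, 2))
```

Subtracting P(A&B) removes the overlap that would otherwise be counted twice.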
Additional Rules for Probability
u Conditional Probability
u Is the probability of an event happening given that another event has
happened
u Written as P(B|A)
u Probability of Event B happening given that Event A has happened
u P(B|A) = P(A&B) / P(A)
u NOTE: write down what Events A and B are before solving the problem
u Example: In your manufacturing plant, 70% of the employees are able to
work overtime this Saturday, 40% are able to work on Sunday, and
20% could work both days. What is the probability that an employee
can work Saturday given that they can work on Sunday?
u Answer: this problem is looking for the probability of Saturday given that
Sunday is true.
u Let: Saturday = B, Sunday = A
u Need: Probability of Saturday and Sunday divided by that of Sunday
u Solution: P(B|A) = P(A&B) / P(A)
u P(SAT|SUN) = P(SAT & SUN) / P(SUN)
u P(B|A) = 20% / 40% = 0.2 / 0.4 = 0.5 or 50%
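The same calculation as a minimal sketch, assuming the 20% figure is P(SAT & SUN) as the solution uses it. The function name `conditional` is illustrative.

```python
def conditional(p_a_and_b, p_a):
    """Conditional probability: P(B|A) = P(A and B) / P(A)."""
    if p_a <= 0:
        raise ValueError("P(A) must be positive for P(B|A) to be defined")
    return p_a_and_b / p_a

# A = can work Sunday (40%), A&B = can work Saturday and Sunday (20%)
p_sat_given_sun = conditional(p_a_and_b=0.20, p_a=0.40)
print(p_sat_given_sun)
```

The 70% figure for Saturday is not needed here, because the condition restricts attention to employees who can work on Sunday.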
Additional Rules for Probability
u Multiplicative Rule
u Used to calculate the probability of both A and B
u Written as P(A ∩ B) and can be generalized to more than two events such as
P(A ∩ B ∩ C ∩ D)
u Multiplicative Rule
u General Multiplication Rule
u P(A ∩ B) = P(A) x P(B|A)
u Example: Consider a bag of marbles containing 5 red, 2 blue, and 3 white
marbles. Suppose that two marbles are drawn without replacement (no
returning of marbles once drawn). What is the probability that both marbles
drawn are blue?
u Note: the two draws are NOT independent
u In this case, since the two draws are not independent, the probability of B
happening depends on A having happened first.
u Answer: P(blue on first) x P(blue on second | blue on first) = $\frac{2}{10} \times \frac{1}{9} = \frac{2}{90}$ or 0.022
u Note: the denominator of the multiplier becomes 9 because the first draw left
9 marbles in the bag, not 10. The numerator is 1 since the first blue marble has
already been drawn.
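The marble example can be checked with exact fractions. This is a minimal sketch of the general multiplication rule for two draws without replacement. The function name `prob_both_blue` is illustrative.

```python
from fractions import Fraction

def prob_both_blue(blue, total):
    """P(blue first) x P(blue second | blue first), drawing without replacement."""
    first = Fraction(blue, total)                        # 2 blue out of 10 marbles
    second_given_first = Fraction(blue - 1, total - 1)   # 1 blue left out of 9
    return first * second_given_first

# Bag from the example: 5 red + 2 blue + 3 white = 10 marbles
p = prob_both_blue(blue=2, total=10)
print(p, round(float(p), 3))
```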
Combinations and Permutations
u Permutation
u All ordered arrangements of distinct objects
u nPr – the number of ways of ordering r objects taken from a set of
n distinct objects
u $nPr = \frac{n!}{(n-r)!}$, or as a formula in MS Excel as =PERMUT(n, r)
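As a rough Python counterpart to the Excel formula, the sketch below computes nPr from the factorial definition and compares it with `math.perm` from the standard library (Python 3.8+). The values n = 5 and r = 3 are hypothetical.

```python
import math

def permutations_count(n, r):
    """nPr = n! / (n - r)!, the number of ordered arrangements of r objects from n."""
    if not 0 <= r <= n:
        raise ValueError("r must satisfy 0 <= r <= n")
    return math.factorial(n) // math.factorial(n - r)

# Hypothetical example: ordered arrangements of 3 objects chosen from 5
n, r = 5, 3
print(permutations_count(n, r), math.perm(n, r))
```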
u Discrete
u Two states for the random variable x
u Set number of trials, determined in advance (n)
u Constant probability of success (p)
u Rule of thumb in sampling
u Population size: N > 50
u Number of trials is less than 10% of the population size (n < 0.10N)
u Binomial formula:
$P(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} q^{n-x}$
u Solution: $P(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} q^{n-x}$
$P(1) = \frac{4!}{1!\,(4-1)!}\,(0.05)^{1}(1-0.05)^{4-1} = \frac{4!}{1!\,3!}\,(0.05)(0.95)^{3}$
$= 4(0.05)(0.8574) = 0.1715$ or 17.15%
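The binomial calculation can be sketched as follows, assuming q = 1 – p and taking n = 4, x = 1, and p = 0.05 from the worked solution. The function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = n! / (x! (n - x)!) * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Values from the worked solution: n = 4 trials, x = 1 success, p = 0.05
print(round(binomial_pmf(x=1, n=4, p=0.05), 4))

# Sanity check: the probabilities over all possible x add up to 1
print(round(sum(binomial_pmf(x, 4, 0.05) for x in range(5)), 10))
```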
Poisson Distribution
Note:
Correlation does not guarantee causation
Procedures for Calculating the
Correlation Coefficient
1. Calculate the mean of all x values (x̄) and the mean of all y values
(ȳ).
2. Calculate the standard deviation of all x values (Sx) and of all y
values (Sy)
3. Calculate x - x̄ and y - ȳ for each pair (x,y) and then multiply the two
differences together
4. Get the sum by adding all of these products of difference together
5. Divide the sum by Sx times Sy
6. Divide the results of step 5 by n-1, where n=number of (x,y) pairs
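The six steps above can be sketched in Python as shown below. The data set is hypothetical, since the original exercise data is not reproduced here. `statistics.stdev` is the sample standard deviation, which uses n – 1 and so matches step 6.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient r, following the six-step procedure."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)        # step 1: means
    s_x, s_y = stdev(xs), stdev(ys)          # step 2: sample standard deviations
    products = [(x - x_bar) * (y - y_bar)    # step 3: product of differences per pair
                for x, y in zip(xs, ys)]
    total = sum(products)                    # step 4: sum of the products
    return total / (s_x * s_y) / (n - 1)     # steps 5 and 6

# Hypothetical (x, y) pairs, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 4))
```

A perfectly linear data set such as x = 1, 2, 3 and y = 2, 4, 6 should return r = 1 (up to rounding), which is a quick way to check the implementation.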
Same answer as the manually calculated value of r in the previous section
Testing for Significance (t-test)
*n=6
Sample Problem in t-test
u From r = 0.972
u H0: t ≤ tc or “exercise does not help weight loss or may have a negative
effect” (t is the test statistic and tc is the critical value)
u H1: t > tc or “maybe exercise hours contribute to weight loss”
u Solution:
$t = \frac{r}{\sqrt{\frac{1-r^{2}}{n-2}}} = \frac{0.972}{\sqrt{\frac{1-(0.972)^{2}}{6-2}}} = 8.273$
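The test statistic for the sample problem can be recomputed with a short sketch, using r = 0.972 and n = 6 from the slides. The function name `t_from_r` is illustrative.

```python
import math

def t_from_r(r, n):
    """t statistic for a correlation coefficient: r / sqrt((1 - r^2) / (n - 2))."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    return r / math.sqrt((1 - r**2) / (n - 2))

# Values from the sample problem: r = 0.972, n = 6 pairs
t = t_from_r(0.972, 6)
print(round(t, 3))
```

The result is then compared with the critical value tc from the t-table at n – 2 degrees of freedom.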
u Used to describe a straight line that best fits a series of ordered pairs
(x,y).
u The equation for linear regression is ŷ = a + bx
where: ŷ = the predicted value of y given a value of x
x = the independent variable
a = the y intercept of the straight line
b = the slope of the straight line
Least Squares Method
u Is a mathematical procedure to identify the linear equation that best
fits a set of ordered pairs by finding values for a, the y-intercept, and b,
the slope.
u The goal of the least squares method is to minimize the total squared
error between the values of y and ŷ
u Procedures for the least squares method:
1. Create a table with your x and y values in the columns
2. Calculate xy, x², and y² for each pair
3. Calculate the sums of x, y, xy, x², and y², and the means x̄ and ȳ
4. Find the linear equation that best fits the data by determining the value of
a, the y intercept, and b, the slope, using the following equations:
$b = \frac{n\Sigma xy - (\Sigma x)(\Sigma y)}{n\Sigma x^{2} - (\Sigma x)^{2}}$ ; $a = \bar{y} - b\bar{x}$
Sample Problem in Least Squares
Method
Month (xi)   Complaints (yi)   xi²         xiyi          yi²
1            8                 1           8             64
2            6                 4           12            36
3            10                9           30            100
4            6                 16          24            36
5            10                25          50            100
6            13                36          78            169
7            9                 49          63            81
8            11                64          88            121
Σxi = 36     Σyi = 73          Σxi² = 204  Σxiyi = 353   Σyi² = 707
x̄ = 4.5      ȳ = 9.125
*n = 8
u Solution:
$b = \frac{n\Sigma xy - (\Sigma x)(\Sigma y)}{n\Sigma x^{2} - (\Sigma x)^{2}} = \frac{8(353) - (36)(73)}{8(204) - (36)^{2}} = 0.5833$
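The slope can be recomputed from the month and complaint data in the sample problem. This sketch also applies a = ȳ − b x̄ to get the intercept. The function name `least_squares` is illustrative.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x using the sums-based formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)  # slope
    a = sum_y / n - b * (sum_x / n)                             # intercept: y-bar - b * x-bar
    return a, b

# Data from the sample problem: month number and complaints per month
months = [1, 2, 3, 4, 5, 6, 7, 8]
complaints = [8, 6, 10, 6, 10, 13, 9, 11]

a, b = least_squares(months, complaints)
print(round(b, 4), round(a, 4))
```

With these values the slope rounds to 0.5833 and the intercept works out to 6.5, giving the fitted line ŷ = 6.5 + 0.5833x.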
u Critical value: for a left-tailed test with α = 0.10, go to df = n – 1 = 19
and α = 0.10 in the t-table. The value is 1.328, but since the test is
left-tailed, we use –1.328
u Solving the equation: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \dots = -1.1$