3 - Measure
OBJECTIVE: COLLECT RELIABLE DATA TO BASELINE ANALYSIS
Topics covered:
1. What to collect?
Fishbone Diagram / Ishikawa Diagram / Cause & Effect Diagram
CEDAC Model
5 Why Analysis (Why-Why Analysis) – Potential Root Cause Identification
2. Introduction to Basic Statistics
Types of data
Measures for Data Analysis
3. How much to collect?
Sample vs Population
Confidence Interval (Margin of Error) & Confidence Level
Sample size calculations
4. How to collect?
Sampling Methodology
5. Can I trust the measurement system?
What is measurement system?
Equipment Variation – Repeatability
Precision & Accuracy
Operator Variation – Reproducibility
MSA – Gage R & R ANOVA & AAA
6. Data collection plan & data collection
1. What to collect?
FISHBONE DIAGRAM + 5 WHY
Tools used:
1. For Ishikawa Diagram – Brainstorming – Round Robin Technique
Use Mean as the measure of central tendency only when the data is
normally distributed.
Median: Represents the position of the central value in a data set. It is robust to outlier
values in a data set and is not affected by them.
Procedure: Arrange the data in ascending or descending order. Use (n+1)/2 to
find the position of the median in the data set.
Excel: =median(data set)
This is generally not used for statistical purposes but for voting procedures.
Normality Testing
Student A Student B
72 67
73 72
76 76
76 84
78 76
Mean 75 75
Median 76 76
Mode 76 76
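The three measures in the table above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the function and variable names are illustrative and not part of the original material.

```python
import statistics

# Scores taken from the Student A / Student B table
student_a = [72, 73, 76, 76, 78]
student_b = [67, 72, 76, 84, 76]

def central_tendency(scores):
    """Return (mean, median, mode) for a list of scores."""
    return (
        statistics.mean(scores),    # arithmetic average
        statistics.median(scores),  # middle value after sorting, position (n+1)/2
        statistics.mode(scores),    # most frequently occurring value
    )

summary_a = central_tendency(student_a)
summary_b = central_tendency(student_b)
print("Student A:", summary_a)
print("Student B:", summary_b)
```

Both students come out with the same mean, median and mode (75, 76, 76), even though their scores are spread very differently. Measures of central tendency alone do not describe a data set.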
Measures of Variation / Spread
Range: The difference between the maximum and minimum values in the data set.
Sum of Squared Distance from Mean: 58
Sample Variance: 14.5 (58 / (n-1), i.e. 58 / 4, is used to get the average; the count is used as n-1 in statistics)
Standard Deviation: 3.807887 (the square root of the Variance is the Standard Deviation)
Quartiles: Graphically represented by a Box plot (Box & Whiskers Plot). Quartiles divide
the data into four parts of 25% each. Each tail and each half of the box represents 25% of
the data set. The size of the tail and box shows the degree of variation that exists in that area.
[Figure: box plot of Test Data, vertical axis from 40 to 60, with Quartile 1 and Quartile 3 marked]
Excel: Quartile 1 =quartile(data range, 1)
Excel: Quartile 3 =quartile(data range, 3)
Student A Student B
72 67
73 72
76 76
76 84
78 76
Range 6 17
Variance 6 39
Standard Deviation 2.44949 6.244998
Quartile 1 73 72
Quartile 3 76 76
IQR 3 4
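The spread measures in the table above can be reproduced as follows. This is a sketch under one assumption: the `method="inclusive"` option of `statistics.quantiles` is used because it interpolates the same way as Excel's `QUARTILE` function. The helper name `spread` is illustrative only.

```python
import statistics

# Scores taken from the Student A / Student B table
student_a = [72, 73, 76, 76, 78]
student_b = [67, 72, 76, 84, 76]

def spread(scores):
    """Return range, sample variance, sample SD, Q1, Q3 and IQR."""
    # Assumption: "inclusive" quartiles match Excel's QUARTILE(data, k)
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "range": max(scores) - min(scores),
        "variance": statistics.variance(scores),  # sum of squares / (n-1)
        "std_dev": statistics.stdev(scores),      # square root of the variance
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,                           # interquartile range
    }

spread_a = spread(student_a)
spread_b = spread(student_b)
for name, result in (("Student A", spread_a), ("Student B", spread_b)):
    print(name, result)
```

Student B's variance (39) is more than six times Student A's (6), which is the difference the mean, median and mode could not show.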
How much data to collect?
Confidence Level & Confidence Interval
Example: We want to estimate the mean systolic blood pressure of Malaysian females. The standard
deviation is around 20 mmHg and we wish to estimate the true mean to within 5mmHg with 95%
confidence. What is the required sample size?
Answer
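The worked answer is not reproduced in this text. A minimal sketch, assuming the standard sample size formula for estimating a mean, n = (Z x sigma / E)^2, where sigma is the standard deviation (the symbol sigma and the function name are assumptions, not taken from the original slide):

```python
import math

def sample_size_for_mean(sigma, margin_of_error, z=1.96):
    """Sample size needed to estimate a mean to within +/- margin_of_error.

    Assumed formula: n = (Z * sigma / E)^2, rounded up to a whole subject.
    z=1.96 corresponds to a 95% confidence level.
    """
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Blood pressure example: sigma = 20 mmHg, E = 5 mmHg, 95% confidence
n_blood_pressure = sample_size_for_mean(sigma=20, margin_of_error=5)
print(n_blood_pressure)
```

Under these assumptions the calculation gives (1.96 x 20 / 5)^2 = 61.47, which is rounded up to 62 subjects.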
Z is the value from the standard normal distribution reflecting the confidence
level that will be used (e.g., Z = 1.96 for 95%)
E is the desired margin of error.
p is the proportion of successes in the population
Example 1: An investigator wants to estimate the proportion of freshmen at his University who
currently smoke cigarettes (i.e., the prevalence of smoking). How many freshmen should be involved in
the study to ensure that a 95% confidence interval estimate of the proportion of freshmen who smoke
is within 5% of the true proportion?
Example 2: To estimate the proportion of Malaysian males who smoke, what sample size is required to
achieve a 95% confidence interval of width ±5% (that is to be within 5% of the true value)? A study
some years ago found that approximately 30% were smokers.
Answer
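The worked answers are not reproduced in this text. A minimal sketch, assuming the standard sample size formula for a proportion, n = p(1-p)(Z/E)^2. For Example 1 no prior estimate of p is given, so p = 0.5 is assumed here as the most conservative choice; for Example 2 the earlier study's 30% is used. The function name is illustrative only.

```python
import math

def sample_size_for_proportion(p, margin_of_error, z=1.96):
    """Sample size needed to estimate a proportion to within +/- margin_of_error.

    Assumed formula: n = p * (1 - p) * (Z / E)^2, rounded up.
    z=1.96 corresponds to a 95% confidence level.
    """
    return math.ceil(p * (1 - p) * (z / margin_of_error) ** 2)

# Example 1: no prior estimate, so p = 0.5 is assumed (largest possible n)
n_example_1 = sample_size_for_proportion(p=0.5, margin_of_error=0.05)

# Example 2: earlier study suggests about 30% smokers
n_example_2 = sample_size_for_proportion(p=0.3, margin_of_error=0.05)

print(n_example_1, n_example_2)
```

Under these assumptions Example 1 needs 385 freshmen and Example 2 needs 323 males. A prior estimate away from 50% reduces the required sample size.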
Sampling Method
Population Based Approach
    Random Sampling
    Stratified Random Sampling
Process Based Approach
    Systematic Sampling
    Rational Subgrouping
Sub Group = 6, Rational = Hour
Hour     Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Sample 6  Variance  Standard Deviation  Range
Hour 1   12        11        11        12        13        12        0.567     0.753               2
Hour 2   10        10        11        10        11        12        0.667     0.816               2
Hour 3   11        11        13        13        12        12        0.800     0.894               2
Hour 4   11        10        10        14        14        12        3.367     1.835               4
Hour 5   10        13        13        14        15        16        4.300     2.074               6
Hour 6   11        15        17        16        11        15        6.567     2.563               6
Hour 7   13        19        18        13        14        18        7.767     2.787               6
Variation within each hour is referred to as Within Subgroup Variation or Short Term Variation.
Comparing Hour 1 with Hour 7 shows Between Subgroup Variation or Long Term Variation.
Statistical Process Control Charts also work on the basis of Within Subgroup Variation and Between
Subgroup Variation. This helps us understand the behavior of our process rather than just collecting data.
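The within subgroup figures in the hourly table can be recomputed with a short script. This is a minimal sketch; the dictionary layout and variable names are illustrative choices, not part of the original material.

```python
import statistics

# Six samples per hour, copied from the rational subgrouping table
hourly_samples = {
    "Hour 1": [12, 11, 11, 12, 13, 12],
    "Hour 2": [10, 10, 11, 10, 11, 12],
    "Hour 3": [11, 11, 13, 13, 12, 12],
    "Hour 4": [11, 10, 10, 14, 14, 12],
    "Hour 5": [10, 13, 13, 14, 15, 16],
    "Hour 6": [11, 15, 17, 16, 11, 15],
    "Hour 7": [13, 19, 18, 13, 14, 18],
}

# Within subgroup (short term) variation: spread inside each hour
within = {
    hour: {
        "variance": statistics.variance(values),  # sample variance, n-1
        "std_dev": statistics.stdev(values),
        "range": max(values) - min(values),
    }
    for hour, values in hourly_samples.items()
}

# Between subgroup (long term) variation: compare the hourly averages
subgroup_means = {hour: statistics.mean(v) for hour, v in hourly_samples.items()}

for hour in hourly_samples:
    print(hour, round(subgroup_means[hour], 3), round(within[hour]["variance"], 3))
```

The hourly averages drift upward and the within hour variance grows from 0.567 in Hour 1 to 7.767 in Hour 7, so both the level and the consistency of the process change over the shift.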
Can I trust the Measurement System?
Total Variation = True Variation (Actual Process Variation / Natural Variation / Part to Part Variation) + Equipment Variation + Operator Variation
What is Measurement System?
Equipment and Operator together are referred to as the Measurement System.
Observed measurements (ml): 500, 503, 503, 497, 504, 505, 504, 499, 505, 504, 504, 503, 503, 499, 500
The acceptable bias declared by the producer of the milk packets, after approval from the applicable standard regulatory body, is +/- 3 ml.
Hence a milk packet measuring anywhere from 497 to 503 ml is acceptable and treated as accurate.
Accuracy = Average of Observed Measurements – Standard Value
Average 502.2
Standard Value 500
Bias 2.2
Accuracy = 2.2 bias, which is within the acceptable limit. Hence the equipment is accurate.
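The bias check on the milk packet measurements can be written out as a short calculation. This is a minimal sketch; the constant and variable names are illustrative and not part of the original material.

```python
import statistics

# Fifteen repeated measurements of the milk packet, in ml
measurements_ml = [500, 503, 503, 497, 504, 505, 504, 499,
                   505, 504, 504, 503, 503, 499, 500]

STANDARD_VALUE_ML = 500   # declared packet volume
ACCEPTABLE_BIAS_ML = 3    # +/- 3 ml allowed by the regulatory body

# Bias (accuracy) = average of observed measurements - standard value
average_ml = statistics.mean(measurements_ml)
bias_ml = average_ml - STANDARD_VALUE_ML

# The equipment is treated as accurate when the bias is within the limit
is_accurate = abs(bias_ml) <= ACCEPTABLE_BIAS_ML

print(f"Average: {average_ml:.1f} ml, Bias: {bias_ml:.1f} ml, Accurate: {is_accurate}")
```

The average is 502.2 ml and the bias is 2.2 ml, inside the +/- 3 ml limit. Note that the check is on the average: individual readings of 504 and 505 ml exceed 503 ml, which is a question of precision rather than accuracy.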
Operator Variation
Operator Variation is checked by Reproducibility, wherein the same part is
measured multiple times using the same equipment but by different operators.
The variation in the measurements is attributed to Operator Variation.
[Figure: two charts of Percent (axis from 50 to 90) by Appraiser (1, 2, 3)]
Rule:
If the Agreement % (Kappa Value) for any of the above criteria is less than 90%,
then we must fix the disagreement through the required training before the
operators are allowed to collect data.
Data Collection Plan
Measure Name             Measure Type (X or Y)   Data Type (Continuous / Discrete)
Waiting Time             Y                       Continuous
Chef                     X                       Discrete - Nominal
Waiter                   X                       Discrete - Nominal
Type of Food             X                       Discrete - Nominal
No. of Items in Order    X                       Discrete - Count
Time of Order            X                       Discrete - Binary
Day Type                 X                       Discrete - Binary
Cooking Time             X                       Continuous
Humidity                 X                       Discrete - Percentage
If you don’t specify this, the data will arrive in different formats and a lot of time will be
consumed cleaning it.