
3 - Measure

The document discusses collecting reliable data to baseline an analysis. It covers topics such as tools for identifying potential causes like fishbone diagrams and 5 whys. It also covers basic statistics, measures of central tendency and variation, sample size calculations, and sampling methodology. The objective is to collect data that can be used to understand processes and identify improvement opportunities.

Uploaded by

dhruvil shah
Copyright © All Rights Reserved

Measure

Objective: Collect reliable data to baseline analysis
Topics covered:
1. What to collect?
• Fishbone Diagram / Ishikawa Diagram / Cause & Effect Diagram
• CEDAC Model
• 5 Why Analysis (Why-Why Analysis) – Potential Root Cause Identification
2. Introduction to Basic Statistics
• Types of data
• Measures for Data Analysis
3. How much to collect?
• Sample vs Population
• Confidence Interval (Margin of Error) & Confidence Level
• Sample size calculations
4. How to collect?
• Sampling Methodology
5. Can I trust the measurement system?
• What is a measurement system?
• Equipment Variation – Repeatability
• Precision & Accuracy
• Operator Variation – Reproducibility
• MSA – Gage R&R ANOVA & AAA
6. Data collection plan & data collection
1. What to collect?
FISHBONE DIAGRAM + 5 WHY
Tools used:
1. For Ishikawa Diagram – Brainstorming – Round Robin Technique

2. For 5 Why Analysis – Brainstorming & Gemba Walk


CEDAC MODEL
2. Introduction to Basic Statistics
Additional Discrete Data Types
Count: If your data is about the number or frequency of something, then you are dealing with a discrete data type.

For example: no. of students in a class, no. of cars manufactured, no. of transactions processed, no. of defects, etc.

Percentage: If your data is about the proportion of something, then you are dealing with a discrete data type. It can be confused with continuous data because it meets all the criteria for continuous data except having a measurement system.

For example: Passing %, Quality %, Defect %, Humidity %, etc.


Measures for Data Analysis

Student A | Student B
77 | 55
79 | 62
85 | 63
95 | 45
71 | 99
Measures of Central Tendency
Mean (Average): Sum of all the values / Count of the values
Excel: =AVERAGE(data set)

For example, data set: 2, 5, 7, 9, 12

The average of the data set is 7 (35 / 5).
7 is representative of the data set and sits at its centre.

Use the mean as the measure of central tendency only when the data is normally distributed.
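The mean formula above can be checked with a few lines of Python. This is a minimal sketch; `mean_of` is an illustrative name, and the standard-library `statistics.mean` is shown only as a cross-check.

```python
import statistics


def mean_of(values):
    """Mean = sum of all the values / count of the values."""
    return sum(values) / len(values)


data = [2, 5, 7, 9, 12]
print(mean_of(data))          # manual formula: 35 / 5
print(statistics.mean(data))  # standard-library cross-check
```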
Median: Represents the position of the central value in a sorted data set. It is robust to outlier values in a data set and is not affected by them.

Procedure: Arrange the data in ascending or descending order. Use (n + 1) / 2 to find the position of the median in the data set.
Excel: =MEDIAN(data set)

For example, data set: 2, 5, 7, 9, 250

The average of the data set is 54.6 (273 / 5).
54.6 is not representative of the data set and should not be used.

We use the median because the data is not normally distributed.

Odd data set: 2, 5, 7, 9, 250
Median: n = 5, hence the position of the median is (5 + 1) / 2 = 3.
The 3rd value in the data set, 7, is the median. It is a true representative of the central value of the data set and is not affected by the outlier value of 250.

Even data set: 2, 5, 7, 9, 12, 250

Median: n = 6, hence the position of the median is (6 + 1) / 2 = 3.5.
Position 3.5 lies between the 3rd and 4th values, i.e. 7 and 9. Hence the median is (7 + 9) / 2 = 16 / 2 = 8.
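The (n + 1) / 2 position rule can be sketched in Python as below. `median_by_position` is a hypothetical helper written for this illustration; `statistics.median` from the standard library is used only to confirm it agrees on the odd and even examples.

```python
import math
import statistics


def median_by_position(values):
    """Median via the (n + 1) / 2 position rule (1-based positions)."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2
    # For odd n both indices are the same; for even n they straddle the centre.
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2


odd = [2, 5, 7, 9, 250]
even = [2, 5, 7, 9, 12, 250]
print(statistics.mean(odd), median_by_position(odd))  # mean is pulled up by 250
print(median_by_position(even), statistics.median(even))
```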
Mode: Represents the value with the highest frequency in a data set. A data set has no mode at all if no value repeats, and it can have several modes if two or more values share the highest frequency.

Procedure: Excel: =MODE(data set) (=MODE returns a single value; =MODE.MULT returns all modes)

For example, data set: 2, 5, 7, 9, 2, 5, 0, 2

The mode is 2 (it occurs three times).

Data set: 2, 5, 6, 5, 2, 5, 5, 9, 12, 2, 2

The modes are 2 and 5 (each occurs four times).

The mode is generally not used for statistical purposes, but rather for situations such as voting.
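A small Python sketch of the mode rules described above: no mode when nothing repeats, and every tied value when several share the highest frequency. The helper `modes` is illustrative; the standard library's `statistics.multimode` behaves similarly, except that it returns every value when none repeats.

```python
from collections import Counter


def modes(values):
    """All values with the highest frequency; [] if no value repeats."""
    counts = Counter(values)  # keeps first-seen order of values
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no repetition, so no mode
    return [value for value, count in counts.items() if count == top]


print(modes([2, 5, 7, 9, 2, 5, 0, 2]))
print(modes([2, 5, 6, 5, 2, 5, 5, 9, 12, 2, 2]))
```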
Normality Testing

Stat > Basic Statistics > Normality Test (Anderson-Darling normality test)
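To show what the Anderson-Darling test does behind the menu path above, here is a from-scratch Python sketch. It assumes the mean and standard deviation are estimated from the data, applies the usual small-sample adjustment A*² = A²(1 + 0.75/n + 2.25/n²), and uses a commonly cited piecewise p-value approximation; it is not the statistical package's own code, so p-values may differ slightly. A p-value below 0.05 is taken to mean the data is not normal.

```python
import math
import statistics


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def anderson_darling_normal(data):
    """Return (adjusted A-squared, approximate p-value) for H0: data is normal."""
    xs = sorted(data)
    n = len(xs)
    mean, sd = statistics.mean(xs), statistics.stdev(xs)
    cdf = [normal_cdf((x - mean) / sd) for x in xs]
    s = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    a2 = -n - s / n
    a = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    # Piecewise p-value approximation for the adjusted statistic.
    if a >= 0.6:
        p = math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)
    elif a >= 0.34:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    elif a >= 0.2:
        p = 1.0 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    else:
        p = 1.0 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)
    return a, min(max(p, 0.0), 1.0)


for sample in ([2, 5, 7, 9, 12], [2, 5, 7, 9, 250]):
    a, p = anderson_darling_normal(sample)
    verdict = "consistent with normal" if p >= 0.05 else "not normal"
    print(sample, round(a, 3), round(p, 4), verdict)
```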


Let's compare the two data sets below. Their measures of central tendency are identical:

Student A | Student B
72 | 67
73 | 72
76 | 76
76 | 84
78 | 76

Measure | Student A | Student B
Mean | 75 | 75
Median | 76 | 76
Mode | 76 | 76
Measures of Variation / Spread
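Range: The difference between the maximum value and the minimum value in the data set.

Variance: The average of the squared distances of each data point from the mean (for a sample, the sum of squared distances is divided by n - 1).

Standard Deviation: The square root of the variance, i.e. the typical distance of a data point from the mean, expressed in the same units as the data.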


Data Set | Mean | Distance from Mean | Squared Distance from Mean
2 | 7 | -5 | 25
5 | 7 | -2 | 4
7 | 7 | 0 | 0
9 | 7 | 2 | 4
12 | 7 | 5 | 25

Sum of Squared Distance from Mean: 58
Sample Variance: 58 / (n - 1) = 58 / 4 = 14.5 (for a sample, the count is taken as n - 1 in statistics)
Standard Deviation: √14.5 = 3.807887 (the square root of the variance)

Excel: Variance =VAR(data set); Standard Deviation =STDEV(data set)
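The worked table can be reproduced step by step in Python. This sketch follows the same columns (distance, squared distance, sum, divide by n - 1, square root) and then cross-checks against the standard library's `statistics.variance` and `statistics.stdev`, which also use the n - 1 sample formula.

```python
import math
import statistics

data = [2, 5, 7, 9, 12]
mean = sum(data) / len(data)                        # 7
squared_distances = [(x - mean) ** 2 for x in data]
ss = sum(squared_distances)                         # sum of squared distances
sample_variance = ss / (len(data) - 1)              # divide by n - 1
sample_sd = math.sqrt(sample_variance)              # square root of variance

print(ss, sample_variance, round(sample_sd, 6))
print(statistics.variance(data), statistics.stdev(data))
```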
Quartiles: Quartiles are represented graphically by a box plot (box & whiskers plot). They divide the sorted data into four parts of 25% each: each whisker and each half of the box covers 25% of the data. The length of each whisker and box section shows the degree of variation in that part of the data.

[Figure: Boxplot of Test Data (Graph > Box Plot); y-axis: Test Data]

Excel: Quartile 1 =QUARTILE(data range, 1); Quartile 3 =QUARTILE(data range, 3)


Now let's compare the spread of the two data sets:

Student A | Student B
72 | 67
73 | 72
76 | 76
76 | 84
78 | 76

Measure | Student A | Student B
Range | 6 | 17
Variance | 6 | 39
Standard Deviation | 2.44949 | 6.244998
Quartile 1 | 73 | 72
Quartile 3 | 76 | 76
IQR | 3 | 4
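The spread comparison for Student A and Student B can be regenerated with Python's `statistics` module. This is a sketch under one assumption: `method="inclusive"` in `statistics.quantiles` is used because it reproduces the Quartile 1 and Quartile 3 values in the table above (other quartile conventions, such as the default `"exclusive"`, give slightly different numbers).

```python
import statistics


def spread_summary(values):
    """Range, sample variance/SD, quartiles and IQR for one data set."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "stdev": statistics.stdev(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


student_a = [72, 73, 76, 76, 78]
student_b = [67, 72, 76, 84, 76]
for name, scores in (("Student A", student_a), ("Student B", student_b)):
    print(name, spread_summary(scores))
```

Same mean, median and mode, but every spread measure shows Student B's scores are far less consistent.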
How much data to collect?
Confidence Level & Confidence Interval
The wider the range we predict, the higher our confidence that the actual value will fall inside it:

200 runs – Confidence Level 1%
195-205 runs – Confidence Level 10%
190-210 runs – Confidence Level 25%
185-215 runs – Confidence Level 50%
180-220 runs – Confidence Level 75%
175-225 runs – Confidence Level 90%
170-230 runs – Confidence Level 95%
165-235 runs – Confidence Level 99%


Sample Size – Continuous Data

$$n = \left(\frac{Z \sigma}{E}\right)^2$$

where Z is the Z score for the chosen confidence level, σ is the (estimated) population standard deviation and E is the desired margin of error.

Margin of Error (Confidence Interval): No sample will be perfect, so you need to decide how much error to allow. The confidence interval determines how much higher or lower than the population mean you are willing to let your sample mean fall. If you've ever seen a political poll on the news, you've seen a confidence interval. It will look something like this: "68% of voters said yes to Proposition Z, with a margin of error of +/- 5%."

Z scores for common confidence levels:
● 90% – Z Score = 1.645
● 95% – Z Score = 1.96
● 99% – Z Score = 2.576
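The Z scores listed above are two-sided values from the standard normal distribution: for a confidence level C, half of the remaining (1 - C) goes in each tail. A minimal Python sketch using the standard library's `statistics.NormalDist` (Python 3.8+); the helper name `z_for_confidence` is illustrative.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided Z score: leaves (1 - level) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> Z = {z_for_confidence(level):.3f}")
```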
Example: An investigator wants to estimate the mean systolic blood pressure in children with
congenital heart disease who are between the ages of 3 and 5. How many children should be
enrolled in the study? The investigator plans on using a 95% confidence interval (so Z=1.96) and
wants a margin of error of 5 units. The standard deviation of systolic blood pressure is unknown,
but the investigators conduct a literature search and find that the standard deviation of systolic
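blood pressures in children with other cardiac defects is 20.

Answer: n = (1.96 × 20 / 5)² = 61.47, so 62 children should be enrolled (sample sizes are always rounded up).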

Example: We want to estimate the mean systolic blood pressure of Malaysian females. The standard
deviation is around 20 mmHg and we wish to estimate the true mean to within 5mmHg with 95%
confidence. What is the required sample size?

Answer

We are given σ = 20, margin of error E = 5 and Z = 1.96.

$$n = \left(\frac{1.96 \times 20}{5}\right)^2 = 61.47$$

Rounding up, 62 women are required.
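The continuous-data sample size formula translates directly into code. A minimal Python sketch; `sample_size_mean` is a hypothetical helper, and rounding up with `math.ceil` reflects the convention that a fractional participant is always rounded up to a whole one.

```python
import math


def sample_size_mean(sigma, margin, z=1.96):
    """n = (Z * sigma / E)^2, returned raw and rounded up."""
    raw = (z * sigma / margin) ** 2
    return raw, math.ceil(raw)


raw, n = sample_size_mean(sigma=20, margin=5)
print(f"raw n = {raw:.2f}, sample {n}")
```

Halving the margin of error to 2.5 would quadruple the required sample, since E enters the formula squared.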
Sample Size – Discrete Data

$$n = \left(\frac{Z}{E}\right)^2 p(1 - p)$$

where:
Z is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., Z = 1.96 for 95%)
E is the desired margin of error.
p is the proportion of successes in the population
Example 1: An investigator wants to estimate the proportion of freshmen at his University who
currently smoke cigarettes (i.e., the prevalence of smoking). How many freshmen should be involved in
the study to ensure that a 95% confidence interval estimate of the proportion of freshmen who smoke
is within 5% of the true proportion?

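Because we have no information on the proportion of freshmen who smoke, we use p = 0.5 (the value that makes p(1 - p) largest, giving the most conservative sample size) to estimate the sample size as follows:

$$n = \left(\frac{1.96}{0.05}\right)^2 \times 0.5 \times (1 - 0.5) = 384.16$$

Rounding up, 385 freshmen should be involved in the study.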

Example 2: To estimate the proportion of Malaysian males who smoke, what sample size is required to
achieve a 95% confidence interval of width ±5% (that is to be within 5% of the true value)? A study
some years ago found that approximately 30% were smokers.

Answer

p = 0.30, E = 0.05 and Z = 1.96

$$n = \left(\frac{1.96}{0.05}\right)^2 \times 0.3 \times (1 - 0.3) = 322.69$$

Rounding up, 323 males are required.
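The discrete-data (proportion) formula as a Python sketch, applied to both examples: p = 0.5 when nothing is known about the proportion, and p = 0.30 from the earlier study. `sample_size_proportion` is an illustrative name, not a library function.

```python
import math


def sample_size_proportion(p, margin, z=1.96):
    """n = (Z / E)^2 * p * (1 - p), returned raw and rounded up."""
    raw = (z / margin) ** 2 * p * (1 - p)
    return raw, math.ceil(raw)


print(sample_size_proportion(p=0.5, margin=0.05))   # no prior estimate of p
print(sample_size_proportion(p=0.30, margin=0.05))  # prior study: about 30% smokers
```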
How to collect – Sampling Methodology

Sampling Method
• Population Based Approach
  • Random Sampling
  • Stratified Random Sampling
  • Systematic Sampling
• Process Based Approach
  • Rational Subgrouping
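The three population-based methods can be sketched with Python's `random` module. Everything here is hypothetical: a population of 100 order IDs, a made-up meal-period label used as the stratum, and a fixed seed for reproducibility. Stratified sampling uses proportional allocation (about 10% from each stratum), which is one common choice, not the only one. Rational subgrouping is process based and is illustrated with the hourly table instead.

```python
import random


def simple_random_sample(population, k, rng):
    """Every unit has the same chance of being selected."""
    return rng.sample(population, k)


def systematic_sample(population, k, rng):
    """Random start, then every step-th unit."""
    step = len(population) // k
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(k)]


def stratified_random_sample(population, stratum_of, fraction, rng):
    """Random sample within each stratum, proportional to stratum size."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def meal(order_id):
    """Hypothetical stratum label for an order ID."""
    return ("Breakfast", "Lunch", "Dinner")[order_id % 3]


orders = list(range(100))  # hypothetical population of order IDs
rng = random.Random(42)    # fixed seed so the sketch is reproducible
print(simple_random_sample(orders, 10, rng))
print(systematic_sample(orders, 10, rng))
print(stratified_random_sample(orders, meal, 0.10, rng))
```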
Subgroup size = 6; rational subgroup = one hour

Hour | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Sample 6 | Variance | Standard Deviation | Range
Hour 1 | 12 | 11 | 11 | 12 | 13 | 12 | 0.567 | 0.753 | 2
Hour 2 | 10 | 10 | 11 | 10 | 11 | 12 | 0.667 | 0.816 | 2
Hour 3 | 11 | 11 | 13 | 13 | 12 | 12 | 0.800 | 0.894 | 2
Hour 4 | 11 | 10 | 10 | 14 | 14 | 12 | 3.367 | 1.835 | 4
Hour 5 | 10 | 13 | 13 | 14 | 15 | 16 | 4.300 | 2.074 | 6
Hour 6 | 11 | 15 | 17 | 16 | 11 | 15 | 6.567 | 2.563 | 6
Hour 7 | 13 | 19 | 18 | 13 | 14 | 18 | 7.767 | 2.787 | 6

The variation within each hour is referred to as Within Subgroup Variation or Short Term Variation.
Comparing one hour with another (e.g. Hour 1 with Hour 7) is referred to as Between Subgroup Variation or Long Term Variation.

Statistical Process Control charts also work on the basis of within-subgroup and between-subgroup variation. This helps us understand the behavior of our process, beyond just collecting data.
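The hourly table can be recomputed in Python to separate the two kinds of variation. This sketch computes the within-subgroup variance, standard deviation and range for each hour, then uses the standard deviation of the hourly means as a simple illustration of between-subgroup variation; SPC software may estimate these components differently.

```python
import statistics

# Rational subgroups from the table: one subgroup of 6 readings per hour.
hourly = {
    "Hour 1": [12, 11, 11, 12, 13, 12],
    "Hour 2": [10, 10, 11, 10, 11, 12],
    "Hour 3": [11, 11, 13, 13, 12, 12],
    "Hour 4": [11, 10, 10, 14, 14, 12],
    "Hour 5": [10, 13, 13, 14, 15, 16],
    "Hour 6": [11, 15, 17, 16, 11, 15],
    "Hour 7": [13, 19, 18, 13, 14, 18],
}


def within_subgroup_stats(values):
    """Within-subgroup (short-term) variation for one hour."""
    return {
        "variance": statistics.variance(values),
        "stdev": statistics.stdev(values),
        "range": max(values) - min(values),
    }


for hour, values in hourly.items():
    s = within_subgroup_stats(values)
    print(hour, round(s["variance"], 3), round(s["stdev"], 3), s["range"])

# Between-subgroup variation: how much the hourly averages drift.
hourly_means = [statistics.mean(v) for v in hourly.values()]
print("SD of hourly means:", round(statistics.stdev(hourly_means), 3))
```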
Can I trust the Measurement System?
Total Variation = True Variation (also called Actual Process Variation, Natural Variation or Part-to-Part Variation) + Equipment Variation + Operator Variation
(The components add as variances, i.e. squared standard deviations.)
What is a Measurement System?
The equipment and the operator together are referred to as the measurement system.

Analyzing the variation arising from the equipment and the operator is referred to as Measurement System Analysis (MSA).

Facts: Variation always exists; it is a universal phenomenon.

We wish to find out how much of the variation comes from the measurement system and then decide whether it is acceptable or not.
No equipment in the world gives consistently accurate readings, hence some variation comes from the equipment.
However experienced you may be, being human you will make errors, hence operator variation is inevitable.
Equipment Variation
Equipment variation is measured by repeatability, also known as the test-retest method. We measure the same part, using the same equipment, by the same operator, multiple times; any variation in the readings is attributed to equipment variation.

Using the observed measurements, we check:

1. Precision
2. Accuracy

to determine whether the equipment used is trustworthy.


Precision
If the standard deviation of the observed measurements is less than 1/10 of the Total Equipment Tolerance, the equipment is precise; otherwise it is not.

We bought a weighing scale and wish to check whether it is trustworthy. A 500 ml milk packet was used as the test part and a repeatability test was performed, which gave these readings:
500, 503, 503, 497, 504, 505, 504, 499, 505, 504, 504, 503, 503, 499, 500
Now we need to check whether the weighing scale is precise and accurate.

Standard Deviation: 2.51
Equipment Tolerance: +/- 0.5
Total Equipment Tolerance: 0.5 + 0.5 = 1
1/10 of Total Equipment Tolerance: 0.1

**Equipment Tolerance is stated on the equipment, as declared by the manufacturer.

Since the standard deviation of 2.51 is not less than 0.1, the equipment is not precise.
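The precision rule applied to the milk-packet readings, as a short Python sketch. The 1/10-of-total-tolerance threshold is the rule stated in this section; the +/- 0.5 tolerance is the example's manufacturer value.

```python
import statistics

# Repeatability readings: same 500 ml packet, same scale, same operator.
readings = [500, 503, 503, 497, 504, 505, 504, 499, 505, 504, 504, 503, 503, 499, 500]
equipment_tolerance = 0.5                  # +/- value declared by the manufacturer
total_tolerance = 2 * equipment_tolerance  # 0.5 + 0.5 = 1
threshold = total_tolerance / 10           # 1/10 of total tolerance

sd = statistics.stdev(readings)            # sample standard deviation
is_precise = sd < threshold
print(f"SD = {sd:.2f}, threshold = {threshold}, precise = {is_precise}")
```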
Accuracy
Accuracy is a measure of how close the observed values are to the target value. It is measured as Bias.

Readings from the same repeatability test:
500, 503, 503, 497, 504, 505, 504, 499, 505, 504, 504, 503, 503, 499, 500

The acceptable bias, declared by the producer of the milk packets after approval from the applicable regulatory body, is +/- 3 ml. Hence, a milk packet reading anywhere from 497 to 503 is acceptable and treated as accurate.

Bias = Average of Observed Measurements – Standard Value

Average: 502.2
Standard Value: 500
Bias: 2.2

The bias of 2.2 is within the acceptable limit of +/- 3, hence the equipment is accurate.
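The bias check as a Python sketch, using the same 15 readings, the 500 ml standard value and the +/- 3 ml acceptable bias from the example.

```python
import statistics

readings = [500, 503, 503, 497, 504, 505, 504, 499, 505, 504, 504, 503, 503, 499, 500]
standard_value = 500   # reference value of the milk packet (ml)
acceptable_bias = 3    # +/- 3 ml, as declared in the example

average = statistics.mean(readings)
bias = average - standard_value  # Bias = average - standard value
is_accurate = abs(bias) <= acceptable_bias
print(f"average = {average:.1f}, bias = {bias:.1f}, accurate = {is_accurate}")
```

Taken together with the precision check, the scale is accurate on average but not precise: its readings centre near the target yet scatter far more than the tolerance allows.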
Operator Variation
Operator variation is checked by reproducibility, wherein the same part is measured multiple times using the same equipment but by different operators. The variation in the measurements is attributed to operator variation.

Gage R&R ANOVA helps us understand operator variation in greater detail, separating it into variation due to the operator and variation due to the operator-by-part interaction.
MSA Basics
MSA is not always mandatory.
MSA must be done only if the data is measured by equipment or by an operator. If the data comes from a system or software such as Cisco, Avaya, etc., MSA is NOT APPLICABLE.

For Continuous Data – Gage R&R ANOVA
(Gage Repeatability & Reproducibility Analysis of Variance)
Minimum data required: 30

For Discrete Data – AAA
(Attribute Agreement Analysis)
Minimum data required: 60
Gage R&R ANOVA
Golden Rule:
If %Study Var is < 10% – the measurement system is excellent.
If %Study Var is between 10% and 30% – management decision: accept with caution.
If %Study Var is > 30% – reject the measurement system. Fix repeatability & reproducibility issues first.
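A Python sketch of the golden rule. It assumes %Study Var is 100 times the ratio of the measurement-system (gage) standard deviation to the total standard deviation, with the equipment and operator variances adding to the part-to-part variance. The variance components below are hypothetical; in practice they would come from the Gage R&R ANOVA output.

```python
import math


def percent_study_var(var_repeatability, var_reproducibility, var_part):
    """%Study Var of total gage R&R, as a ratio of standard deviations."""
    var_gage = var_repeatability + var_reproducibility  # equipment + operator
    var_total = var_gage + var_part                     # variances add
    return 100 * math.sqrt(var_gage) / math.sqrt(var_total)


def msa_decision(pct_study_var):
    """Apply the golden-rule thresholds from the text."""
    if pct_study_var < 10:
        return "Excellent"
    if pct_study_var <= 30:
        return "Management decision: accept with caution"
    return "Reject: fix repeatability & reproducibility first"


# Hypothetical variance components, e.g. read off a Gage R&R ANOVA table.
pct = percent_study_var(var_repeatability=0.04, var_reproducibility=0.01, var_part=1.0)
print(f"%Study Var = {pct:.1f}% -> {msa_decision(pct)}")
```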
Attribute Agreement Analysis
[Figure: Assessment Agreement chart with two panels, "Within Appraisers" and "Appraiser vs Standard"; x-axis: Appraiser (1, 2, 3); y-axis: Percent, with 95.0% CI]
Rule:
If the Agreement % (Kappa value) for any of the above criteria is less than 90%, we must fix the disagreement by providing the required training before the operators are allowed to collect data.
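The two agreement percentages in the chart can be illustrated with a small Python sketch for one appraiser. The study data below is hypothetical (10 items, two trials), and the sketch computes percent agreement only; the kappa statistic mentioned in the rule is not computed here. "Within appraiser" counts items where the appraiser's trials agree with each other; "appraiser vs standard" counts items where every trial matches the known standard.

```python
def within_appraiser_pct(trials):
    """% of items on which all of one appraiser's trials agree."""
    items = list(zip(*trials))
    agree = sum(1 for ratings in items if len(set(ratings)) == 1)
    return 100 * agree / len(items)


def appraiser_vs_standard_pct(trials, standard):
    """% of items on which every trial matches the known standard."""
    items = list(zip(*trials))
    agree = sum(1 for ratings, ref in zip(items, standard)
                if all(r == ref for r in ratings))
    return 100 * agree / len(standard)


# Hypothetical study: 10 items, one appraiser, two trials each.
standard = ["Pass", "Fail", "Pass", "Pass", "Fail", "Pass", "Fail", "Pass", "Pass", "Fail"]
trial_1 = ["Pass", "Fail", "Pass", "Pass", "Fail", "Pass", "Fail", "Pass", "Pass", "Pass"]
trial_2 = ["Pass", "Fail", "Fail", "Pass", "Fail", "Pass", "Fail", "Pass", "Pass", "Pass"]

within = within_appraiser_pct([trial_1, trial_2])
vs_std = appraiser_vs_standard_pct([trial_1, trial_2], standard)
needs_training = min(within, vs_std) < 90  # rule: any criterion below 90%
print(within, vs_std, needs_training)
```

Here the appraiser is consistent with themselves on the last item but consistently wrong against the standard, which is why the two percentages differ.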
Data Collection Plan
DATA COLLECTION PLAN

Measure Name | Measure Type (X or Y) | Data Type (Continuous / Discrete) | Operational Definition | No. of Decimal Places | Format Required | Collected By
Waiting Time | Y | Continuous | Time from when the order is placed until the food is served on the table | 0 | In minutes, e.g. 12 | GB
Chef | X | Discrete - Nominal | Name of the chef working on the order | NA | Chef name | Restaurant Manager
Waiter | X | Discrete - Nominal | Name of the waiter working on the order | NA | Waiter name | GB
Type of Food | X | Discrete - Nominal | Type of food ordered | NA | Starter, Main Course, Sweet Dish | Restaurant Manager
No. of Items in Order | X | Discrete - Count | Count of items in a particular order | NA | e.g. 4 | Restaurant Manager
Time of Order | X | Discrete - Binary | Whether the order was placed during a rush (peak) or no rush (non-peak) | NA | Peak Time, Non-Peak Time | GB
Day Type | X | Discrete - Binary | Whether the order was placed on a weekday or at the weekend | NA | Weekday, Weekend | GB
Cooking Time | X | Continuous | Actual cooking time for all the dishes in an order | 0 | e.g. 8 | Restaurant Manager
Humidity | X | Discrete - Percentage | Percentage of humidity in the restaurant | 0 | e.g. 56 | Team Member 1

Chef and waiter names listed in the plan: Atul, Neeraj, Vikram, Satish, Naman, Rosy, Vikas, Amit, Vijay, Geeta

Frequency of Sampling: Breakfast, Lunch & Dinner
Subgroup size: 5
No. of Samples: 105
Waiting Time (In Minutes) | Waiter | Waiter Gender | Waiter Age Group | Time of Order | No. of Food Items | Chef | Support Available | Actual Cooking Time
19 | Peter | M | 20 - 30 | Peak | 3 | Robin | No | 12
13 | Peter | M | 20 - 30 | Non Peak | 4 | Jack | No | 8
16 | Stacy | F | 30 - 40 | Peak | 2 | Jack | No | 7
13 | Stacy | F | 30 - 40 | Non Peak | 5 | Robin | Yes | 10

Why do we need to create a DCP?

For example, a date can be captured in DD/MM/YY format or MM/DD/YY format.

Time can be captured in hours, minutes or seconds. It's important to specify how we need the data for analysis, because the same value may otherwise be recorded as 10, 10 min, 10 minutes, 600 sec, etc.

If you don't specify this, you will receive the data in different formats and a lot of time will be consumed cleaning it.
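To show the cleanup cost a DCP avoids, here is a hypothetical Python sketch that normalises mixed waiting-time entries to minutes. The unit table and the rule that a bare number means minutes are assumptions made for this illustration; a real DCP would remove the guesswork by fixing one format up front.

```python
import re

# Hypothetical unit table (seconds per unit); bare numbers are assumed to be minutes.
SECONDS_PER_UNIT = {
    "": 60, "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
}


def to_minutes(raw):
    """Convert entries like '10', '10 min' or '600 sec' to minutes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", raw)
    if match is None:
        raise ValueError(f"unrecognised duration: {raw!r}")
    value, unit = float(match.group(1)), match.group(2).lower()
    if unit not in SECONDS_PER_UNIT:
        raise ValueError(f"unknown unit {unit!r} in {raw!r}")
    return value * SECONDS_PER_UNIT[unit] / 60


for entry in ["10", "10 min", "10 minutes", "600 sec"]:
    print(f"{entry!r} -> {to_minutes(entry)} min")
```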
