Chapter 2-Statistical Tools-1
Daniel Ab.
Bahir Dar Institute of Technology (BiT)
Faculty of Mechanical and Industrial Engineering
2.1 Statistical Thinking
Statistical Thinking is a philosophy of learning and acting based on
the following fundamental principles:
• All work occurs in a system of interconnected processes,
• Variation exists in all processes, and
• Understanding and reducing variation are keys to success.
Key concepts of statistical thinking
1. Process and Systems thinking
2. Variation
3. Analysis to increase knowledge
4. Taking action
5. Improvement
Role of data in statistical thinking
• Quantify variation
• Measure effects
1. Categorical variables:
● Numbers and proportions in each category (e.g. a 5-point Likert scale: Highly
satisfied, Satisfied, Neither satisfied nor dissatisfied, Dissatisfied, Highly
dissatisfied)
2. Continuous variables
1. Distributions
❑ Normal (Gaussian)
❑ Non-parametric
2. Central tendency
❑ Mean
❑ Median
3. Scatter
❑ Standard deviation
❑ Range
❑ Inter-quartile range
❑ Standard error of the mean
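The summaries listed above can be sketched in a few lines of Python. This is a minimal illustration only: the survey responses and measurements are hypothetical, the function names are made up for this example, and the scatter measures assume the sample (n - 1) form of the standard deviation.

```python
import math
import statistics
from collections import Counter

# Hypothetical survey responses on a 5-point Likert scale (illustrative data only).
responses = ["Satisfied", "Highly satisfied", "Satisfied", "Dissatisfied", "Satisfied"]

# Hypothetical continuous measurements in mm (illustrative data only).
measurements = [5.21, 5.24, 5.25, 5.25, 5.26, 5.27, 5.30]


def category_proportions(values):
    """Return {category: (count, proportion)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.items()}


def summarize(values):
    """Return central tendency and scatter measures for a continuous variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    sd = statistics.stdev(values)                  # sample standard deviation (n - 1)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": sd,
        "range": max(values) - min(values),
        "iqr": q3 - q1,                            # inter-quartile range
        "sem": sd / math.sqrt(len(values)),        # standard error of the mean
    }


print(category_proportions(responses))
print(summarize(measurements))
```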
Accuracy and Precision
Accuracy
● Accuracy is how close you are to the true value. The true value is sometimes
called the nominal or theoretical value, that is, the value that would be
obtained by a perfect measurement.
● The accuracy of a data set or a measuring instrument refers to the degree
of uniformity of the observations around a desired value such that, on
average, the target value is realized.
[Figure: dot plots of measurements (x) distributed around a target value of 5.25 mm]
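As a rough sketch of the accuracy idea, the average deviation of a set of measurements from the target can be computed directly. The 5.25 mm target comes from the figure; the measurements and the function name `bias` are hypothetical and used only for illustration.

```python
import statistics

TARGET_MM = 5.25  # target (nominal) value shown in the figure

# Hypothetical measurements in mm, for illustration only.
measurements_mm = [5.23, 5.26, 5.25, 5.27, 5.24]


def bias(values, target):
    """Average deviation from the target; a value near zero means accurate on average."""
    return statistics.mean(values) - target


print(f"bias = {bias(measurements_mm, TARGET_MM):.4f} mm")
```

A bias close to zero indicates that, on average, the target value is realized, even though individual measurements still scatter around it.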
2.4 Fundamental Statistical Measures
Measures of Central Tendency
A measure of central tendency gives a single value that acts as a representative or average
of the values of all the outcomes of your experiment. The main measure of central
tendency we will use is the arithmetic mean. While the mean is used the most, two
other measures of central tendency are also employed. These are the median and the
mode. Each has advantages and disadvantages, depending on the data and the intended
purpose.
Mean – given a set of n values, the mean is the sum of these values divided
by n.
Median – the value x for which P(X < x) ≤ 1/2 and P(X > x) ≤ 1/2. In other
words, the median is the value that splits the sample so that half of the
values are larger than it and half are smaller.
Example: Consider the following set of integers:
S = {1, 6, 3, 8, 2, 4, 9}
To find the median, we need the value x for which half the values are
above x and half are below x. Begin by ordering the list in ascending order:
S = {1, 2, 3, 4, 6, 8, 9}
4 is in the middle of the list, so it is the median.
If the number of values is even, there are two values in the middle of the
list. The median can be any number between these two values, but it is
commonly taken to be their average:
Example: S = {1, 2, 3, 4, 6, 8, 9, 12}
4 and 6 are in the middle of the list, so the median is 5 (their average).
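The two median examples can be checked with a short Python sketch. The function below follows the sort-and-pick-the-middle procedure described in the text; its name and the variable names are illustrative choices, and the standard library's `statistics.median` gives the same results.

```python
import statistics


def median(values):
    """Median by sorting: the middle value, or the average of the two middle values."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two


odd_set = [1, 6, 3, 8, 2, 4, 9]
even_set = [1, 2, 3, 4, 6, 8, 9, 12]

print(median(odd_set))   # 4
print(median(even_set))  # 5.0

# Cross-check against the standard library.
assert median(odd_set) == statistics.median(odd_set)
assert median(even_set) == statistics.median(even_set)
```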
Here, the mean is 5 for both S and R, but they are two vastly different data sets.
We need another descriptive statistic besides a measure of central tendency,
which we shall call a measure of variation or measure of dispersion.
We can measure the dispersion or scatter of the values of our data set about the
mean of the data set. If the values tend to be concentrated near the mean, then
this measure shall be small, while if the values of the data set tend to be
distributed far from the mean, then the measure will be large.
The three measures of variation that are usually used are the range,
variance, and standard deviation.
Variance – denoted by σ².
For a set of n numbers x1, x2, …, xn with mean μ, the variance is given by:
$$\sigma^2 = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \dots + (x_n - \mu)^2}{n}$$
Therefore, the variance is a nonnegative number.
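A small Python sketch of the three measures of variation, using the divide-by-n variance formula given above. The two data sets are hypothetical (they are not taken from these slides) and were chosen so that both have a mean of 5 but very different scatter; the function names are illustrative.

```python
import math


def mean(values):
    return sum(values) / len(values)


def population_variance(values):
    """Average squared deviation from the mean (divides by n)."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def population_std_dev(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(population_variance(values))


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical data sets with the same mean but different dispersion.
tight = [4, 5, 6]
wide = [0, 5, 10]

for name, data in (("tight", tight), ("wide", wide)):
    print(name, mean(data), value_range(data),
          round(population_variance(data), 3),
          round(population_std_dev(data), 3))
```

Both sets report a mean of 5.0, while the range, variance, and standard deviation are all much larger for the second set.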
Check sheet layout (tally marks record the number of occurrences):

Problems /               Days/Weeks/Months
Causes of problems       1         2         …         m         Total
Problem 1 / Cause 1      ||||      …         ||
Problem 2 / Cause 2      |||       |||||
…                        ||||||    |
Problem n / Cause n      ||        |||
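A check sheet of this kind can be tallied programmatically. The sketch below is illustrative only: the observation log is hypothetical, and `build_check_sheet` and `render` are made-up helper names.

```python
from collections import Counter

# Hypothetical observation log: (period, problem) pairs, illustrative data only.
observations = [
    (1, "Problem 1"), (1, "Problem 1"), (2, "Problem 2"),
    (1, "Problem 2"), (2, "Problem 1"),
]


def build_check_sheet(log):
    """Return (cell counts keyed by (problem, period), row totals keyed by problem)."""
    cells = Counter((problem, period) for period, problem in log)
    totals = Counter(problem for _, problem in log)
    return cells, totals


def render(cells, totals):
    """Format the check sheet as text lines, one row per problem, using tally marks."""
    periods = sorted({period for _, period in cells})
    lines = []
    for problem in sorted(totals):
        tallies = "  ".join(f"{'|' * cells[(problem, p)]:<6}" for p in periods)
        lines.append(f"{problem:<12}{tallies}  {totals[problem]}")
    return lines


cells, totals = build_check_sheet(observations)
print("\n".join(render(cells, totals)))
```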
• The data obtained from the check sheets can be put into a histogram.
A histogram is a snapshot of the variation of a product or the results
of a process. It often forms the bell-shaped curve which is
characteristic of a normal process.
• This tool is a bar graph that displays a frequency distribution of
the occurrence of the various measurements. It gives a quick look at
the trends of a process.
• A histogram helps to depict and analyze the central tendency (mean)
of the data and its variation (spread). It also shows the range of
measurements, which indicates the process capability: whether the
data fall inside the bell-shaped curve and within specifications.
• The variable being measured is plotted along the horizontal x-axis and is
grouped into ranges of measurements. The frequency of occurrence
of each range is charted along the vertical y-axis.
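The grouping described above can be sketched as follows. This is a minimal text-only illustration that assumes integer-valued measurements and equal-width ranges; the data and the function name `frequency_distribution` are hypothetical.

```python
def frequency_distribution(values, start, bin_width):
    """Count integer values in equal-width ranges [low, low + bin_width)."""
    if min(values) < start:
        raise ValueError("start must not exceed the smallest value")
    n_bins = (max(values) - start) // bin_width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // bin_width] += 1
    # Pair each range's lower edge with its frequency.
    return [(start + i * bin_width, c) for i, c in enumerate(counts)]


# Hypothetical integer measurements, illustrative data only.
lengths = [12, 15, 17, 18, 18, 19, 20, 21, 21, 22, 23, 25, 28]
BIN_WIDTH = 5

# Ranges are listed top to bottom (x-axis); bar length is the frequency (y-axis).
for low, count in frequency_distribution(lengths, start=10, bin_width=BIN_WIDTH):
    print(f"{low:>3}-{low + BIN_WIDTH - 1:<3} {'#' * count}")
```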
A histogram can show characteristics of the process being measured, such
as:
• Do the results show a normal distribution, a bell curve? If not, why
not?
• Does the range of the data indicate that the process is capable of
producing what is required by the customer or the specifications?
• How much improvement is necessary to meet specifications? Is this
level of improvement possible in the current process?
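One simple way to approach the second and third questions is to compare the data with the specification limits. The sketch below is only an illustration: the measurements and the lower and upper specification limits are hypothetical, and the share of measurements inside the limits is a rough indicator rather than a formal capability index.

```python
# Hypothetical measurements in mm and specification limits, illustrative only.
measurements_mm = [5.21, 5.24, 5.25, 5.25, 5.26, 5.27, 5.30, 5.33]
LOWER_SPEC_MM = 5.20
UPPER_SPEC_MM = 5.30


def fraction_within_spec(values, lower, upper):
    """Share of measurements that fall inside the specification limits (inclusive)."""
    inside = sum(1 for v in values if lower <= v <= upper)
    return inside / len(values)


share = fraction_within_spec(measurements_mm, LOWER_SPEC_MM, UPPER_SPEC_MM)
print(f"{share:.1%} of measurements are within specification")
```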
[Figure: negatively skewed histogram]
Flow Chart
• It is a picture of the process.
• It allows a team to identify the actual flow or sequence of events in a
process that any product or service follows.
Phases
Define – identify which process to study and draw the process
flow of the current process.
Measure – identify problems in the current flow chart.
Improve – improve the flow chart to solve the identified problems.
Benefits:
Identify process improvements
[Figure: example flow chart]
Purge System → Mold Part → Remove Part → Visual Inspection by operator
• Light dirt near surface → Buff part
• Bad dirt → Scrap
• Part OK → Trim Part → Assemble part → Functional Inspection by operator
   • Part no good → Scrap
   • Part OK → Package
Pareto Diagrams
Statistically in control vs. Technically in control
A process that is operating with only chance causes of variation present is
said to be in statistical control, whereas a process that is operating in the
presence of assignable causes is said to be out of control.
• Statistically controlled process:
exhibits only natural random fluctuations (common causes)
is stable
is predictable
may yield products out of specification
• Technically controlled process:
presently yields products within specification
need not be stable nor predictable
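The distinction can be illustrated with a rough Python sketch. It rests on assumptions that are not stated in these slides: stability is approximated by checking that every new value lies within the mean ± 3 standard deviations of a baseline period (a common control-chart convention), and all data, limits, and function names are hypothetical.

```python
import statistics


def is_within_spec(values, lower_spec, upper_spec):
    """Technical control: every value lies inside the specification limits."""
    return all(lower_spec <= v <= upper_spec for v in values)


def is_within_control_limits(values, baseline):
    """Rough stability check (assumed 3-sigma rule): every value lies within
    the baseline mean plus or minus three baseline standard deviations."""
    center = statistics.mean(baseline)
    sigma = statistics.pstdev(baseline)
    return all(center - 3 * sigma <= v <= center + 3 * sigma for v in values)


# Hypothetical baseline period and new samples, illustrative data only.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]

stable_run = [10.1, 9.9, 10.3, 9.7]      # only small random fluctuations
print(is_within_control_limits(stable_run, baseline))  # True
print(is_within_spec(stable_run, 9.9, 10.1))           # False: stable, yet out of spec

shifted_run = [10.0, 10.05, 10.6, 9.95]  # one value far from the baseline
print(is_within_control_limits(shifted_run, baseline))  # False
print(is_within_spec(shifted_run, 9.0, 11.0))           # True: in spec, yet not stable
```

The first run shows a statistically controlled process that still yields out-of-specification results under tight limits; the second shows a process that currently meets loose specifications without being stable.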