The Problem Solving, Problem Prevention, Decision-Making Guide
The Problem-Solving,
Problem-Prevention, and
Decision-Making Guide
Organized and Systematic
Roadmaps for Managers
Bob Sproull
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize
to copyright holders if permission to publish in this form has not been obtained. If any copyright material
has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted,
or utilized in any form by any electronic, mechanical, or other means, now known or hereafter
invented, including photocopying, microfilming, and recording, or in any information storage or retrieval
system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the
CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
only after weeks of effort. As a result, they end up making blind decisions
that change perfectly acceptable processes for all the wrong reasons. The
danger in this type of approach is that it typically adds numerous non-
value-added process steps that complicate, create waste, and destabilize
the processes and systems. It is this waste of time, waste of motion, and
waste of materials that drive many companies to financial ruin or, worse
yet, closure.
The real secret to solving problems does not depend upon the num-
ber of sophisticated statistical tools that you know how to use. As a mat-
ter of fact, the secret to solving most problems is to keep your approach
simple and uncomplicated. Where many people go wrong is that they fail
to do what Toyota does so effortlessly. They fail to “go and see.” Solving
problems starts by simply going to the problem, observing it, and fully
understanding it. As you will see in the chapters that follow, if you follow
a structured approach and use only simple tools, you will be able to solve
most problems. The same can be said for making effective decisions and
preventing problems.
The cornerstones of this book are three roadmaps: one for solving problems,
one for preventing problems, and one for making decisions. Together they
give step-by-step explanations of how to solve existing problems, how to prevent
future problems, and how to make effective decisions. I also devote much
of this book to real case studies for each of the techniques presented.
This book also contains the four basic tools that I have successfully used
to solve most problems I have encountered during my career. It is my belief
that if you can master the use of these four basic tools, “go and see” the
problem, and use my roadmaps to guide you, you will become much
better at solving problems, preventing problems, and making good decisions.
I have witnessed many very ordinary people deliver very extraordinary
results by simply following the structured roadmaps presented in
this book. Solving and preventing problems and making effective decisions
do not have to be full of stress and anxiety if you will simply follow
my roadmaps.
Good luck! But remember, my definition of luck is laboring under cor-
rect knowledge. That is, you make your own luck!
1
The DNA of Problems
and Problem Solvers
Scott Peck
function as the individual or team searches for the answer to the problem-
solving conundrum.
1. Being objective
2. Being analytical
3. Being creative
4. Having dedication, commitment, and perseverance
5. Being curious
6. Having courage
7. Having a sense of adventure
8. Being enthusiastic
9. Being patient
10. Being vigilant
when the situation may appear hopeless to the team, but your positive out-
look and enthusiasm will guide you and your team through the process.
Finding root causes and developing solutions to problems are not always
clear-cut, straightforward, or uncomplicated, so a good problem solver
must demonstrate patience, persistence, and staying power. You will, at times,
be pressured to move faster than you would like to or need to, so you must
be resolute and stay the course. Part of learning to be a good problem
solver is learning how to become disciplined and regimented. If you take
your time and systematically work through problems, your success rate
will dramatically improve. Remember, patience truly is a virtue.
Finally, a good problem solver should be vigilant and always expect the
unexpected. Just when you think you may have exposed the root cause
of a problem, or have discovered the causal pathway of the problem, new
information or something unanticipated may come out of the blue and
catch you off guard if you aren’t alert to this possibility. So stay cautious
and attentive: new information that changes your point of view could arrive
at any time.
These are the qualities and behaviors of a good problem solver, but not
all of them are necessarily essential in one person for successfully solving
a problem. As a matter of fact, it’s probably true that if the team possesses
these qualities or behaviors as a group, success will follow. If one person
is, for example, analytical, curious, patient, and dedicated, while someone
else is objective and enthusiastic, and still another has courage, a sense
of adventure, and vigilance, then the team as a whole satisfies these
requirements. It sometimes takes a village to solve problems, so select your team
members with these qualities in mind.
Now let’s take a closer look at each of the three problem types.
1.4 CHANGE-RELATED PROBLEMS
I mentioned earlier that problems tend to make us panic and make unwarranted
changes, so if these are not the behaviors we should be demonstrating,
then what are the right behaviors? Before we answer this, let’s review
the basic concept of what a problem is. Kepner and Tregoe, in their problem-solving
classic The New Rational Manager, characterize problems simply
as deviations from expected performance [1], but let’s look at this more
closely. Kepner and Tregoe tell us that a performance standard is achieved
when all of the conditions required for acceptable performance are operating
as they should. This includes everything in our work environment, that
is, people, materials, systems, processes, departments, pieces of equipment,
and so on. In short, it includes everything. Kepner and Tregoe further tell us
that “if there is an alteration in one or more of these conditions, that is, if
some kind of change occurs, then it is possible that performance will alter
too.” If performance goes from good to bad or positive to negative, then
we feel the pressure. The more serious the decline in performance,
the more pressure there is to find a cause and correct it.
Changes happen every day in our lives, so the question becomes, when is
the deviation that we observe considered to be a problem? It has been my
experience (and that of Kepner and Tregoe) that in order for a deviation to
be considered a problem, one or more of the following requirements must
be satisfied:
long. As pressure mounts to have the problem fixed, more often than
not the symptoms get treated and a quick fix is implemented. This,
in turn, usually prolongs the problem episode, or sets the stage for it
to return or actually deteriorate even further.
If the root cause and the solution are known and implementing it doesn’t
take too long and/or cost too much, then the deviation is not deemed to be
a problem because it simply gets fixed. In effect, it has no visibility within
the organization, at least not in the upper echelon. But when you add the
critical factors of cost, time, lost revenues, and so forth, deviations will
most likely be portrayed and characterized as problems.
Let’s look at an example of a change-related problem. Suppose you are
the plant manager of a company that manufactures electronic equipment
for the automobile industry. Your plant’s budgeted EBITDA% (earnings
before interest, taxes, depreciation, and amortization) is tied to sales
revenue and is variable, but has been averaging roughly 25% per month
throughout the year. This means that if your annual sales are in the $100
million range, then the annual earnings are expected to be in the neigh-
borhood of $25 million for the year. You’ve worked hard and you’ve pretty
much hit budget in each of the first five months of the year. The board is
pleased with the job you’re doing, and you are feeling good about how
well your plant is performing. In July, you get a frenzied call from your
accountant telling you that the numbers are in for June and they don’t look
good at all. You tell him to come to your office with the numbers and you
both assess the results. Figure 1.1 is the monthly EBITDA% graph, and one
look at it tells you that your accountant is right.
FIGURE 1.1
Monthly EBITDA% versus budget.
What had been a picture of steady and stable, on-budget earnings from
January through May has abruptly and unexpectedly gone awry by taking
a nosedive. You ask your
accountant what happened, but he has no idea. A sense of fear, appre-
hension, and, yes, panic comes over you because you know that this very
afternoon the board of directors is coming for its monthly review of your
plant’s performance. You tell your accountant that you need answers and
that you need them right now! Your accountant leaves, and you sit
there racking your brain, trying to understand what has happened. You
know instinctively that something has changed, but what was it? You think
to yourself, “EBITDA% was right on target through May, but in June, it
dropped from 25% to 18%.” Because the drop in performance was sudden
and unexpected, you know you have to find out what changed. That’s the
trouble with change-related problems: they are often unpredictable,
sudden, and unexpected. And when the direction of change is toward
the negative side of the ledger, they often cause people to panic.
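The budget arithmetic in this example is easy to check by hand, but a small sketch makes the relationship explicit. It assumes that EBITDA% means EBITDA divided by sales revenue, times 100. The $8 million June revenue figure is hypothetical, chosen only to show what a drop from 25% to 18% means in dollars.

```python
def ebitda_pct(ebitda: float, revenue: float) -> float:
    """EBITDA expressed as a percentage of sales revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return 100.0 * ebitda / revenue


def ebitda_from_pct(revenue: float, pct: float) -> float:
    """Earnings implied by applying an EBITDA% to a given revenue."""
    return revenue * pct / 100.0


# Annual expectation from the example: about 25% of about $100 million in sales.
annual_budget = ebitda_from_pct(100_000_000, 25)

# Hypothetical June month: $8 million in sales at the observed 18%.
june_revenue = 8_000_000
june_ebitda = ebitda_from_pct(june_revenue, 18)
shortfall = ebitda_from_pct(june_revenue, 25) - june_ebitda

print(f"Annual budgeted EBITDA: ${annual_budget:,.0f}")
print(f"June EBITDA%: {ebitda_pct(june_ebitda, june_revenue):.1f}%")
print(f"June shortfall versus budget: ${shortfall:,.0f}")
```

A seven-point drop in a single month is what turns a routine review into the plant manager's crisis.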
Sometimes, however, conditions improve, positive changes occur, and
things go better than expected. When performance rises or improves
unexpectedly, it clearly does not trigger the same urgent response as a
negative shift does. What if the reverse had been true for our plant manager?
What if his EBITDA% had suddenly improved, as in Figure 1.2?
Would the same sense of urgency or panic have gripped him as when the
EBITDA% declined? Obviously, it wouldn’t have, would it? Why do
you suppose this is true? It is because requirement number one, negative
recognition and perception, trumps both of the other requirements.
FIGURE 1.2
Monthly EBITDA% versus budget.
Since the plant manager would have known that the board would have
been happy with the new EBITDA%, there would have been no negative
feelings expressed by the board, and, therefore, little if any pressure to
even find out why the positive deviation had occurred.
But beware: do not ignore or disregard positive deviations! Why?
Because, in my opinion, unexplained positive changes in performance
have potentially the same consequences as negative changes. That is, if we
don’t understand what prompted the positive change in performance, then
we certainly won’t be able to understand or explain why performance
suddenly changes back to its original, “normal” level, which it will do
at some point in the future. The “positive” performance level that
resulted from the original shift will by then have become the new
expected level of performance. So, when the performance metric shifts
back to the level seen from January through May, it satisfies the first two
requirements of a problem and, therefore, becomes classified as a
problem. For this reason, it is essential to investigate positive
deviations and uncover the root cause or causes.
Because these types of problems are always the result of a change, I refer
to them as change-related problems. Performance is at a certain level and
then a change occurs somewhere in the process resulting in a new perfor-
mance level. When trying to recognize, understand, and solve problems
of this nature, the focus must always be on determining what changed and
when it changed. And when you do find the change, the solution usually is
simply to reverse the change.
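One way to make “when did it change?” concrete for a run of monthly values is to try every possible split point and keep the one where a single step best explains the data, that is, where the total squared deviation from the before and after averages is smallest. This is a standard least-squares way to locate one mean shift, offered as a sketch rather than as the author's method. The monthly values are illustrative, patterned on the 25%-to-18% EBITDA example.

```python
def sse(segment: list[float]) -> float:
    """Sum of squared deviations from the segment's own mean."""
    m = sum(segment) / len(segment)
    return sum((x - m) ** 2 for x in segment)


def locate_shift(values: list[float]) -> int:
    """Index of the first observation after the most likely single step change.

    Tries every split into a non-empty 'before' and 'after' segment and
    returns the split that minimizes the combined squared error.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    return min(range(1, len(values)),
               key=lambda i: sse(values[:i]) + sse(values[i:]))


months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
monthly_pct = [25.0, 25.0, 25.0, 25.0, 25.0, 18.0]  # illustrative values

i = locate_shift(monthly_pct)
before = sum(monthly_pct[:i]) / i
after = sum(monthly_pct[i:]) / len(monthly_pct[i:])
print(f"Step change begins in {months[i]}: {before:.1f}% -> {after:.1f}%")
```

Knowing the month narrows the search: the next question is which documented changes were made around that time.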
Before leaving our discussion of change-related problems, I need to say
something about the change process itself. In many companies I have con-
sulted for, there was no mechanism or system in place to routinely capture
process or system changes. Changes happen every day in most businesses
and, for the most part, the changes are a good thing, provided they are
made under control. (By under control I mean that the change was well
conceived, analyzed for potential problems, and, equally important, fully
documented.) Documentation of changes becomes critical when prob-
lems of this nature are encountered. The documentation should include
the specifics of the change and the timing of the change. At the very least,
the date should be documented, but if you can also record more specific
information, such as the serial numbers or bar code numbers of the first
units produced after the change, then problems can be correlated directly
to the change.
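A change log does not need to be elaborate to be useful. The sketch below assumes a minimal record of each change (date, description, and, when available, the first serial number built under the new condition) and shows the two lookups this paragraph argues for: which changes preceded a problem, and which change a given serial number was built under. The record fields, function names, and log entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class ChangeRecord:
    changed_on: date
    description: str
    first_serial: Optional[int] = None  # first unit built after the change, if recorded


def changes_before(log: List[ChangeRecord], problem_start: date,
                   window_days: int = 30) -> List[ChangeRecord]:
    """Changes made in the window leading up to (and including) the problem's start."""
    earliest = problem_start - timedelta(days=window_days)
    hits = [c for c in log if earliest <= c.changed_on <= problem_start]
    return sorted(hits, key=lambda c: c.changed_on)


def change_for_serial(log: List[ChangeRecord], serial: int) -> Optional[ChangeRecord]:
    """Most recent change whose first serial number is at or below the given serial."""
    candidates = [c for c in log
                  if c.first_serial is not None and c.first_serial <= serial]
    return max(candidates, key=lambda c: c.first_serial, default=None)


# Hypothetical log entries, for illustration only.
log = [
    ChangeRecord(date(2024, 3, 1), "Operators retrained on setup procedure"),
    ChangeRecord(date(2024, 5, 20), "New supplier for raw material", first_serial=10450),
    ChangeRecord(date(2024, 6, 3), "Oven temperature profile adjusted", first_serial=10800),
]

for c in changes_before(log, problem_start=date(2024, 6, 10)):
    print(c.changed_on, c.description)

print(change_for_serial(log, 10900))
```

The date-window lookup works with the minimum documentation; the serial-number lookup is what lets a defective unit be tied to one specific change.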
1.5 CHRONIC PROBLEMS
There is another type of problem that is not necessarily the result of a
change, but rather a problem that has been around seemingly forever.
Many times, when you ask someone how long this problem has existed,
you get a response like “We’ve always had this defect!” or “This machine
has never produced what the others have.” I have named this kind of problem
a chronic problem, and those of you who have ever been involved
with the Fords or GMs or Chryslers of the world, you will recognize it
immediately. Kepner and Tregoe refer to this kind of problem as day-one
problems [1].
As the name implies, it’s the kind of problem that has been around since
day one. Maybe it’s the launch of a new machine that is supposed to be
identical to one or more already in place. But, since the start-up, it has
never performed quite like the others. Or maybe the supplier of a raw
material has two factories, and product received from one factory has
outperformed product from the other factory since the first delivery.
Figure 1.3 is a common example of this type of problem.
In this type of problem there is still an expected level of performance
(the machine target), and the new machine’s actual performance is compared
with that of the other machines making the same or similar product. The
deviation is the difference in output between the lower-performing machine
and the other two, supposedly identical, machines. The same rules for deciding
whether a deviation is a problem apply here, as do the problem-solving
tools and techniques.
FIGURE 1.3
Average monthly machine throughput versus target.
1.6 HYBRID PROBLEMS
Now that we understand the differences between a change-related prob-
lem and the chronic problem, you might wonder if it’s possible to have
both types of problems acting together simultaneously. The answer is an
emphatic and categorical yes! When you have an expected level of perfor-
mance that has never been achieved and it suddenly worsens, you are in
the midst of a hybrid problem.
Consider the situation in Figure 1.4. Here we see actual EBITDA% by
month, compared to budgeted EBITDA%. The actual EBITDA% has been
below budget by approximately 2.5% for the first seven months of the year.
In August, the situation worsens, and the gap between expected perfor-
mance (i.e., EBITDA%) and actual performance grows to about 8%. A
situation that I’m sure was filled with pressure and negative energy just
became worse.
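The three problem types described so far can be told apart mechanically from a run of actual-versus-target values. The sketch below is a simplification under stated assumptions. It treats the first period as the baseline and uses a hypothetical tolerance to ignore small noise. It labels a series chronic if it starts below target, change-related if the gap shifts after a good start, and hybrid if a chronic gap then worsens. The 18% budget used for the Figure 1.4 pattern is an assumed value; only the roughly 2.5-point and 8-point gaps come from the text. The machine throughput numbers are also hypothetical.

```python
def classify_problem(actual, target, tol=1.0):
    """Rough classification of a performance series against its target.

    Gaps are target - actual, so a positive gap means below target.
    """
    gaps = [t - a for a, t in zip(actual, target)]
    if not gaps:
        raise ValueError("need at least one period")
    baseline = gaps[0]
    chronic = baseline > tol                                   # short of target from day one
    worsened = any(g - baseline > tol for g in gaps[1:])       # gap grew after the start
    shifted = any(abs(g - baseline) > tol for g in gaps[1:])   # gap moved in either direction

    if chronic and worsened:
        return "hybrid"
    if chronic:
        return "chronic"
    if shifted:
        return "change-related"
    return "no deviation"


# Figure 1.1 pattern: on budget January to May, then a drop in June.
print(classify_problem([25, 25, 25, 25, 25, 18], [25] * 6))       # change-related
# Figure 1.4 pattern (assumed 18% budget): ~2.5 points short for 7 months, ~8 in August.
print(classify_problem([15.5] * 7 + [10.0], [18.0] * 8))          # hybrid
# A machine that has never reached its target (hypothetical throughput values).
print(classify_problem([820] * 12, [880] * 12, tol=10))            # chronic
```

Because `shifted` looks at the gap in both directions, an unexplained improvement after a good start is also flagged as change-related. That matches the advice to investigate positive deviations too.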
If you were the owner of these dreadful and deplorable financials, imag-
ine how you would feel and what your actions might be. You have two
competing priorities here. On the one hand, you must determine what
changed to make the already dismal situation deteriorate, while on the
other you must close the gap to the budget. You are in the midst of a hybrid
problem, with each part of it competing against the other. The logical
FIGURE 1.4
Monthly EBITDA% versus budget.
When the only tool you own is a hammer, every problem begins to resem-
ble a nail.
Abraham Maslow
In my travels, it has become very apparent to me that many people have
no grasp of basic analysis tools and techniques. One of the prerequisites
for solving problems is having at least a basic understanding of which
tools to use and when to use them.
It is remarkable to me that even after all of the many initiatives and pro-
grams like Total Quality Management (TQM) and Six Sigma, so many
people and companies haven’t embraced or begun to understand the basic
tools and concepts. In this chapter, we will consider four basic tools that
a problem-solving team must make use of if the team is to successfully
determine the root cause of the problem it is addressing. You may be won-
dering if there are other tools available besides these four, and the answer
is yes. But having said this, it is my belief that if you can master and make
use of these four simple and uncomplicated tools, then you will be able to
solve the majority of problems facing you.
The four tools are the run chart, the Pareto chart, the cause-and-effect
diagram, and the causal chain. The run chart will answer the questions of
when the problem started and when it has occurred since it started, and
will then help identify whether it is a change- or launch-related problem.
The Pareto chart will help the team determine things like where the prob-
lem is, which machine is creating the problem, and who has the problem.
The cause-and-effect diagram will facilitate the creation of a list of
potential causes, whereas the causal chain will help the team formulate
the chain of events that led to the problem (i.e., the hypothesis). Although
there are other tools that can be used by the team, I firmly believe that
teams will be much more successful by using just these four simple tools.
Now let’s look at each tool and some examples.
2.1 RUN CHART
One of the keys to solving problems is knowing when the problem began and
when it has occurred since its inception. In addition, the team needs to be able
to measure the impact of any changes made to the process. The run chart will
provide the answer to all of these questions. The run chart or trend chart, as
it is also termed, is a graphical representation of the problem being tracked as
a function of time, with time being any unit (e.g., hours, days). Time is placed
along the horizontal axis (x-axis), and whatever you are measuring is placed
along the vertical axis (y-axis). Let’s look at an example. Suppose we suspect
that temperature is a key factor in the creation of a defect, and we are inter-
ested in knowing what happens to the temperature throughout the day. We
measure the temperature each hour and record it as follows:
Time Temperature
6:00 am 60
7:00 am 62
8:00 am 63
9:00 am 65
10:00 am 70
11:00 am 75
12:00 pm 80
1:00 pm 81
2:00 pm 82
3:00 pm 80
4:00 pm 79
5:00 pm 75
6:00 pm 72
FIGURE 2.1
Temperature versus time of day.
Figure 2.1 is the run chart made from the data collected on temperature
every hour. As you can see, the time the temperature was taken is repre-
sented along the x-axis, and the actual temperature reading is along the
y-axis. From the run chart, we can see that the highest rate of temperature
change happens between 9:00 am and 12:00 noon, and that the maximum
temperature occurs at 2:00 pm before it begins dropping. The run chart
allows us to see exactly what is happening to temperature as a function of
time of day. From a problem-solving perspective, if you now record any
changes you make directly onto the run chart, the effect is seen immedi-
ately. Let’s look at a real example.
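Run charts are normally drawn in a spreadsheet or charting tool, but the same reading of the chart can be sketched in a few lines of Python using the hourly readings from the table above (in whatever units they were recorded). The sketch prints a crude text run chart, then finds the steepest hourly rise and the peak, which are the two observations drawn from Figure 2.1. The text bars are only a stand-in for a real chart, and the function names are illustrative.

```python
readings = [
    ("6:00 am", 60), ("7:00 am", 62), ("8:00 am", 63), ("9:00 am", 65),
    ("10:00 am", 70), ("11:00 am", 75), ("12:00 pm", 80), ("1:00 pm", 81),
    ("2:00 pm", 82), ("3:00 pm", 80), ("4:00 pm", 79), ("5:00 pm", 75),
    ("6:00 pm", 72),
]


def text_run_chart(points, baseline=50):
    """One row per time period; bar length is the value above the baseline."""
    return "\n".join(f"{t:>8} {v:3d} {'#' * (v - baseline)}" for t, v in points)


def hourly_changes(points):
    """(start, end, change) for each consecutive pair of readings."""
    return [(t0, t1, v1 - v0) for (t0, v0), (t1, v1) in zip(points, points[1:])]


print(text_run_chart(readings))

changes = hourly_changes(readings)
fastest = max(d for _, _, d in changes)
steepest = [(t0, t1) for t0, t1, d in changes if d == fastest]
peak_time, peak_value = max(readings, key=lambda r: r[1])

print(f"Fastest rise: {fastest} per hour during {steepest}")
print(f"Peak: {peak_value} at {peak_time}")
```

The fastest rise is 5 per hour in each of the three hours from 9:00 am to 12:00 pm, and the peak of 82 comes at 2:00 pm, matching the reading of Figure 2.1.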
I have used run charts to solve a variety of problems over the years, but
one problem in particular stands out as one of the most distinctive.
(Note: This case study will be covered in detail later in Chapter 20.) This
problem concerned an engineering group for a company that produced
truck bodies for vehicles like moving vans, refrigerated trucks, and land-
scaping trucks. As the company received orders from customers, it routed
the orders through its engineering group, where a cost quote and a build
package were prepared. Historically, the normal backlog of orders in engi-
neering was in the 200- to 400-hour range. The actual time required to pre-
pare the quote was about two days, but because there was a backlog of orders
waiting to be quoted, the actual time to quote the order and communicate it
back to the prospective customer was approximately one week. This amount
of time was acceptable to most customers, so there was no problem.
In early 1999 the backlog had grown from the normal 300 hours to over
1500 hours and, as a result, potential customers were unhappy, because the
new lead time had grown from seven days to almost forty days. The impact
of this increased lead time was that new sales were dropping rapidly and
margins were being negatively affected. Customers just didn’t want to wait
forty days to receive a quote, and this was now recognized as being a sig-
nificant problem.
In a move designed to reduce the backlog, the vice president of engi-
neering ordered all of his engineers to work overtime until this backlog
was reduced. The overtime worked and the backlog was reduced, but sev-
eral months later, due to the cost of this overtime, the VP was told to stop
the overtime. Once again, the backlog grew to over 1200 hours; the VP
was relieved of his position, and I was called in to fix this problem. This
was obviously an example of someone treating the symptoms rather than
finding and eliminating the root cause of the problem.
Since the VP left rather abruptly, I had no communication overlap with
him and had to rely on any existing data and discussions with the engi-
neers to attempt to understand what was happening. The first thing I did
was create a multiyear run chart to get a mental and visual image of what
was going on (see Figure 2.2). Once I saw the data plotted in a run chart, I
knew two important things. One, the problem we were experiencing was
a relatively new problem and, two, whatever changed to cause the increase
in backlog hours had occurred somewhere around February 1999. I knew
that if I could determine what had changed in the February 1999 time
frame, then I had a good chance of solving this problem. Herein lies the
true problem-solving value of run charts. By observing the level of the
response variable (e.g., backlog hours) as a function of time (months),
I could see immediately that the problem was the result of some kind
of change and approximately when the change had been made. This is
extremely important, because if we can find what changed and reverse
it, the response variable should return to its previous acceptable level of
FIGURE 2.2
Engineering backlog, January 1996 to February 2000.
FIGURE 2.3
Engineering backlog, January 1996 to June 2000.
(Chart annotations: overtime start/stop; changed eng. system; returned to old system. Axes: hours versus date.)
FIGURE 2.4
Number of faults by day of the week.
data as a Pareto chart by day of the week, as in Figure 2.4, we see that the
chart validates what we believed was true. The Pareto chart gives a picture
of the days of the week and clearly shows us that we have a severe problem
with faults on Monday; the faults then gradually decrease as the week
progresses until Friday, when there are none. By
knowing that Monday is the worst day for faults, we can focus our efforts
on comparing what is unique or different on Monday compared to the best
day of the week, Friday.
Likewise, if we are working on a problem with quality defects, for exam-
ple, and we suspect operator differences, we can collect data on each oper-
ator and construct a Pareto chart to visually demonstrate the differences.
Figure 2.5 is a Pareto chart displaying each operator’s defect level. Clearly,
Jim has significantly more defects than the other three, particularly
Lisa. This suggests that there could be a work method difference between
what Jim is doing compared to what Andy, Tom, and Lisa are doing. If we
can study Jim’s and Lisa’s methods, find the differences or distinctions,
and then modify Jim’s method, then doesn’t it make sense that Jim’s defect
level would be reduced to near the level of the other operators?
FIGURE 2.5
Number of defects by operator.
Pareto charts are really quite simple to create. Along the horizontal or
x-axis, we simply place what we are comparing (e.g., operators, machines),
and then place whatever we are measuring along the vertical or y-axis.
Now what could be more straightforward than that? In Figure 2.5 we see
that Jim has had forty defects, Andy five, Tom three, and Lisa two. The
important thing to remember about Pareto charts, from a problem-solving
perspective, is that if significant differences in the level of the response
variable exist, then we must search for differences in methods (for operators)
or functions (for machines) that created the difference in performance.
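The construction just described, plus two common Pareto refinements (sorting the bars from largest to smallest and tracking the cumulative percentage), can be sketched as follows using the defect counts from Figure 2.5. The function name and output layout are illustrative, not part of the original method.

```python
def pareto_table(counts):
    """Sort categories by count (largest first) and add cumulative percentages."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no occurrences to chart")
    rows, running = [], 0
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((name, n, 100.0 * running / total))
    return rows


defects_by_operator = {"Jim": 40, "Andy": 5, "Tom": 3, "Lisa": 2}

for name, n, cum in pareto_table(defects_by_operator):
    print(f"{name:<5} {n:3d} {'#' * n:<40} {cum:5.1f}%")
```

Jim alone accounts for 80% of the fifty defects, which is the kind of imbalance that tells the team where to look first.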
Although Pareto charts are easy to develop, most people don’t use them
effectively. In my first book, Process Problem Solving: A Guide for Maintenance
and Operations Teams, I used an example of fiberglass panels that had a
variety of defects on them [2]. We constructed a Pareto chart and found
that blisters were the number one defect problem. From this Pareto, we
created a second, lower-level Pareto of part types and concluded that
one of the six different part types examined (part J40) had 74% of the
blisters. This allowed us to focus on this part type rather than working on all
six parts. We then divided this part into six different physical zones and
determined that Zone A contained 72% of the blisters. By using a series of
Pareto charts, this team had a clear sense of direction and a focal point: it
would focus its efforts on blisters on part J40 in Zone A. If the team was
able to determine the root cause of these blisters on this part and in this
zone, then it might be able to translate this to the other parts and other
zones.
It’s never easy to eat an elephant or an apple in one single bite, but if
we take one bite at a time, we have a much better chance of succeeding.
Multilevel Pareto charts help the team focus and prioritize its efforts.
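The multilevel approach amounts to repeatedly taking the biggest bar and re-charting only the records behind it. A minimal sketch, assuming each defect is logged as a record with fields such as defect type, part, and zone. The records below are hypothetical and are not the fiberglass panel data, so the percentages they produce differ from the 74% and 72% reported in the text; part "K12" is invented.

```python
from collections import Counter


def drill_down(records, levels):
    """At each level, keep only the records in the largest category.

    Returns a list of (level, top category, share of remaining records in percent).
    """
    path = []
    for level in levels:
        counts = Counter(r[level] for r in records)
        if not counts:
            break
        top, n = counts.most_common(1)[0]
        path.append((level, top, 100.0 * n / len(records)))
        records = [r for r in records if r[level] == top]
    return path


# Hypothetical defect log; all counts are invented for illustration.
records = (
    [{"defect": "blister", "part": "J40", "zone": "A"}] * 6
    + [{"defect": "blister", "part": "J40", "zone": "B"}] * 2
    + [{"defect": "blister", "part": "K12", "zone": "A"}] * 2
    + [{"defect": "crack", "part": "J40", "zone": "C"}] * 3
)

path = drill_down(records, ["defect", "part", "zone"])
for level, top, share in path:
    print(f"{level:<6} -> {top:<8} {share:.0f}% of remaining defects")
```

Each level narrows the search to one bite of the elephant: one defect type, then one part, then one zone.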
2.3 CAUSE-AND-EFFECT DIAGRAM
Our third tool, the cause-and-effect diagram, or fishbone diagram (because
its structure resembles the bones of a fish), is one of the most popular
tools ever developed. It was developed by Dr. Kaoru Ishikawa,
a noted Japanese consultant, and is also referred to as the Ishikawa diagram
in his honor. A cause-and-effect diagram is a tool that helps identify,
organize, and display possible causes of a specific problem. It graphically
illustrates the relationship between a given outcome (the effect) and all
the factors that might influence the outcome (the causes). The structure of
the diagram helps the team think in a very systematic way as it looks for
potential causes of the problem it is trying to solve. Figure 2.6 is a typical
layout of a cause-and-effect diagram.
FIGURE 2.6
Cause-and-effect diagram.
The construction of a cause-and-effect diagram starts by identifying and
defining the outcome or effect being studied (i.e., the problem description)
and placing it at the far right end of a straight line. We then establish main
causal categories, such as man, method, machine, and materials, and place
them at the end of diagonal lines drawn from the central spine of the fish,
as is illustrated in Figure 2.6. For each of the main categories, we then
identify other, more specific factors that could be the causes of the effect,
and place them on offshoot bones from the diagonal lines. We continue to
identify more detailed and more explicit causes, and then organize them
on bones that come off the offshoot bones.
Figure 2.7 is a hypothetical cause-and-effect diagram for a person with
diabetes whose blood sugar is out of control. Four major categories were
selected (Food/Nutrition, Medicine, Exercise, and Person) and then more
specific, potential causes for the out-of-control diabetes were added to
each major category. These more specific secondary causes are seen as the
smaller bones on the fish emanating from the major categories at the end
of the diagonal lines as we attempt to zero in on our list of potential causes
of the problem. Finally, even more specific causes are added. For example,
under the category Exercise we see that the “level of exercise” is listed with
“too low” and “none” completing this series of bones.
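Structurally, a cause-and-effect diagram is a tree: the effect at the head, main categories as the large bones, and progressively more specific causes branching off them. The sketch below stores that tree and lists every path from a main category down to its most specific cause, which can be handy for turning the diagram into a checklist. Only the Exercise branch is filled in, because it is the one the text spells out; the other categories are left empty for the team to complete. The data layout and function name are assumptions.

```python
def cause_paths(node, prefix=()):
    """Yield each path from a main category to its most specific listed cause.

    A dict maps labels to sub-bones, a list holds the most specific causes,
    and an empty dict or list marks a bone that has no detail yet.
    """
    if isinstance(node, dict) and node:
        for label, child in node.items():
            yield from cause_paths(child, prefix + (label,))
    elif isinstance(node, list) and node:
        for cause in node:
            yield prefix + (cause,)
    else:
        yield prefix


diabetes_out_of_control = {
    "Food/Nutrition": {},   # to be filled in by the team
    "Medicine": {},
    "Exercise": {"Level of exercise": ["Too low", "None"]},
    "Person": {},
}

for path in cause_paths(diabetes_out_of_control):
    print(" > ".join(path))
```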
FIGURE 2.7
Cause-and-effect diagram for out-of-control diabetes.
2.4 CAUSAL CHAIN
The final tool we will be discussing in this chapter is the causal chain. When
problems occur, we know that a chain of events has taken place to alter the
performance of the process. We aren’t always certain as to what happened,
so we need some kind of tool or technique that will help us develop a
theory as to what did happen. One of the most effective tools available for
accomplishing this is the causal chain. Causal chains are stepwise evolutions of problem causes (Figure 2.8). Each step in the causal chain represents an object in either a normal or abnormal state. The object is placed on the line to the far right of the chain, with its state listed directly beneath it. So, in Figure 2.8, the object in distress is the press and its state is that it has stopped. Although we might use a cause-and-effect diagram to list the variety of reasons why the press stopped, such a diagram does not explain the causal mechanism that actually stopped it.
[Figure 2.8: Gasket fails → Oil level low → Air compressor fails → Air pressure too low → Pressure switch opens → Current stops → Motor stops → Press stops.]
FIGURE 2.8
Single stairstep causal chain.
[Figure 2.9 (legible portions): separate causal chains stacked along the y-axis for a malfunctioning core machine. Filler arm catches inside cores → Filler arm sticks → Core machine overfills, with countermeasures "Select different location for filler arm" and "Install a larger ball on end to limit catching on cores." Opening: cores stuck → Hopper jams → Core machine doesn't fill, with countermeasure "Create larger opening for cores to descend from hopper to bowl."]
FIGURE 2.9
In Figure 2.8 we see that a press has stopped and we ask the question why.
The press stopped because the motor stopped. Why did the motor stop?
Because the current has stopped flowing. We continue asking why until
we reach the end of our chain, and find that the press stopped, because the
motor stopped, because the current stopped, because the pressure switch
opened, because the air pressure was too low, because the air compressor
failed, because the oil level was too low, because of a gasket that failed. We
have now developed a potential theory as to why the press stopped, and
along the way we have identified objects or items (e.g., current, oil level)
that we can test to prove or disprove our theory. Each step is the cause of
the next step and the effect of the preceding step. That is, the information
on the step to the left is always the cause of the information on the step to
the right.
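The repeated "why" questioning can be mimicked mechanically. As a minimal sketch, assuming the chain is stored as a Python list ordered from root cause (left) to observed effect (right), an illustrative representation rather than the book's, each adjacent pair is one cause-and-effect link, and walking the pairs in reverse reproduces the "because" narrative for Figure 2.8.

```python
# Figure 2.8, ordered left (root cause) to right (observed effect).
press_chain = [
    "the gasket failed",
    "the oil level was too low",
    "the air compressor failed",
    "the air pressure was too low",
    "the pressure switch opened",
    "the current stopped",
    "the motor stopped",
    "the press stopped",
]


def why_walk(chain):
    """Start at the effect (right end) and ask 'why' back to the root cause."""
    links = list(zip(chain[:-1], chain[1:]))  # (cause, effect) pairs, left to right
    return [f"{effect} because {cause}" for cause, effect in reversed(links)]


if __name__ == "__main__":
    for statement in why_walk(press_chain):
        print(statement)
    print("Candidate root cause:", press_chain[0])
```

The first statement printed is "the press stopped because the motor stopped", and the leftmost entry is only a candidate root cause until the testable items along the chain (current, oil level, and so on) have been checked.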
What if we have more than one potential cause? How do we handle
that situation? The answer is simple. We just create additional, individual
chains like the one in Figure 2.8 and place them along the y-axis as in
Figure 2.9. This is an actual example from a team that was working on
a core machine that was malfunctioning. Here the team brainstormed
and came up with four different chains. Each individual chain is, in reality, a brief theory, to be proved or disproved, of how the core machine was malfunctioning.
Ultimately, the team either eliminated each chain as a potential cause through testing or developed action items for its potential root cause. In the end, the team performed all of the actions in the gray shaded boxes, and the problem was not only solved, but performance improved beyond its previous state.
Remember, the primary purpose of a causal chain is to develop a step-
wise chain of events that explains why a particular performance shortfall
exists. Once this is complete, hypotheses or theories can be formulated as
to why a problem exists. Causal chains are, in my opinion, one of the simplest and most effective, yet most underutilized, tools available for a team to use.
3
A Structured Approach
to Problem Solving
When a problem comes along, study it until you are completely knowl-
edgeable. Then find that weak spot, break the problem apart, and the rest
will be easy.
The split must be repaired before proceeding to the next step because of the potential leak path it might create. To make matters worse, the split was not always detected during the inspection before the tank was cured, which then required a much more difficult postcure repair. It was a potentially serious problem because of this leak path.
This particular problem had apparently been around for as long as any-
one could remember, and everyone on the team had doubts that this problem could actually be solved. One simple question, asked during the first team meeting, convinced me that it could in fact be solved. I asked the assembled team: “Have you ever produced a tank that did not have this problem?” The team members assured me that they routinely make tanks without this problem and that in fact only 40% of their tanks actually had splits in them. I confidently told them that if they could make one tank without the split, then we would solve the problem. From the looks of doubt that I received, it was apparent that they weren’t buying what I was saying, so I had a room full of disbelievers.
In the next six chapters, we will take a look at how we did solve this
problem by using the following Problem Solving Roadmap. The roadmap
(Figure 3.1) is an amalgam of tools, techniques, and experience that I have used over the years not only to solve problems successfully but also to teach others. It has worked on a variety of problems, in many different industries producing many different products. The roadmap will work
equally well for service companies and can also be used as a guide to pro-
cess improvement. I think you will find that there is no equal to this road-
map when it comes to a detailed, step-by-step methodology for solving
problems in a systematic way.
I cannot overstate the importance of following a logical, systematic, and structured approach to solving problems, for a number of reasons. First, a systematic approach keeps the team focused and discourages it from wandering aimlessly. Many times, I have witnessed teams struggling to find direction or decide what to do next, but once they were presented with a roadmap to follow, they stayed on track and made significant and rapid progress. Second, using a structured approach helps the team understand what information is needed, and then facilitates the organization of
data, thoughts, and information. It separates what’s important from what
isn’t. Third, by using and following a systematic approach, with its logical
progression of tasks and activities, the probability of finding the true root
cause increases significantly. Finally, by using a structured approach in a
FIGURE 3.1
Problem Solving Roadmap.
TABLE 3.1
Six Sigma and Toyota’s Practical Problem Solving versus Problem Solving Roadmap
Six Sigma Element | Problem Solving Roadmap Step | Toyota’s Practical Problem Solving
Define | I. Define, describe, and appraise the problem | 1. Initial problem perception; 2. Clarify the problem (the “real” problem); 3. Locate area/point of cause (where the problem physically occurs)
Measure | II. Collect data | (none)
Analyze | III. Investigate, organize, and analyze the data; IV. Formulate and test a causal theory; V. Select the most probable cause | 4. Investigation of the root cause by using 5 Whys
Improve | VI. Develop, test, and implement countermeasures | 5. Develop countermeasures; 6. Evaluate
Control | VII. Implement, document, and celebrate | 7. Standardize
The Toyota Way [3]. All of the elements of the Problem Solving Roadmap
coincide with the actions needed to improve processes. The implication
here is that it isn’t necessary to wait for a problem to be declared before
using the roadmap. Table 3.1 summarizes these alignments.
Returning to our case study, let’s take a look at this process and see how this team solved the problem of split inner liners. As you go through this process, remember that this team had never received any substantial problem-solving training, but by diligently following the Problem Solving Roadmap, they were able to solve a problem that had plagued their company for years. The team was composed of ordinary people achieving extraordinary results simply by following the roadmap. I’m certain that you too can achieve extraordinary things.
4
Define, Describe, and
Appraise the Problem
The most serious mistakes are not being made as a result of wrong answers.
The truly dangerous thing is asking the wrong questions.
Peter Drucker
problem exists. That is, how many units are affected, and what percentage
of any one unit is affected? For example, suppose you have five machines
producing the same or similar products, and only one machine is affected
by the performance problem. Here you make note that only machine A
is impacted. The most effective tool to answer this question is the Pareto
chart. The Pareto chart provides a graphical representation of the problem
for everyone to see.
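Behind a Pareto chart is a simple calculation: sort the categories by count in descending order and track the cumulative percentage, so the few categories carrying most of the problem stand out. A minimal Python sketch follows; the five-machine counts are hypothetical, chosen so that machine A dominates, and plotting is left out in favor of a text bar.

```python
# Hypothetical defect counts for five machines; machine A dominates.
defects_by_machine = {"B": 3, "A": 24, "D": 1, "C": 2, "E": 0}


def pareto(counts):
    """Sort categories by count (descending) and add a cumulative percentage."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        cumulative = 100.0 * running / total if total else 0.0
        rows.append((category, count, round(cumulative, 1)))
    return rows


if __name__ == "__main__":
    for category, count, cumulative in pareto(defects_by_machine):
        print(f"Machine {category}: {'#' * count:<24} {count:3d}  cum {cumulative:5.1f}%")
```

With these numbers, machine A alone accounts for 80% of the defects, which is the kind of concentration that points the team toward what is different about machine A.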
In our case study, approximately 20% to 30% of the tanks produced had the problem (Note: A run chart will be presented later to show the percentage of total tanks with the problem as a function of time), and the split could run the length of the joint or just a portion of it (the team saw both). The split was also observed at locations directly behind the joint, but only on approximately 3% of the affected tanks.
Who has the problem? If there are humans involved (i.e., operators), then we need to define exactly which operators have the problem. Suppose there are three operators, and only one is exhibiting the performance shortcoming. By defining this point, we can focus our attention on the differences or distinctions that will lead us to the root cause of the problem. Here we can use a Pareto chart or a simple matrix of operators.
In our case study, there are approximately ten inner liner operators and
three cement operators who have produced tanks that had this defect.
(Note: A Pareto chart will be presented later.)
Is the problem change-related, launch-related, or a hybrid of the two? This is important to determine, because our approach to solving the problem differs for each. In a change-related problem, we focus most of our efforts and attention on determining exactly what changed and when the change occurred. In a launch-related problem, the answer lies in accurately determining the differences or distinctions between where or when the problem occurs and where or when it doesn’t or, if the distinction turns out to be a person, between who has the problem and who doesn’t.
In a change-related problem, our primary tool is a trend or run chart, while in a launch-related problem we use Pareto charts or matrices as a means of comparison. Of course, in a hybrid problem, we use both the run chart and the Pareto chart.
In our case study, we were not certain as to whether this was a change-
related problem, a launch-related problem, or a hybrid, because data
had not been collected on inner liner splits in the past. Because of this
[Figure 4.1 chart: y-axis, % of production split (0 to 120); x-axis, date (9/6/2006 to 10/14/2006).]
FIGURE 4.1
Run chart of percent of production with split liners.
The success metric can be as simple as the number of objects with the
problem as a function of time, or it can be the percent of production with
the problem, or any other metric that the team agrees will let them know if
improvement has been achieved. In any event, you want to be able to relate any changes you make to the resulting level of the problem.
The trend chart or run chart is the best tool to accomplish this. You will
recall from the last chapter that the trend chart displays time (hours, days,
weeks, etc.) along the x-axis and the level of what you are measuring along
the y-axis. These charts are simple to create, and they permit the team
to record the actual changes made directly on the chart, making it com-
pletely visible to everyone.
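The success metric and the change annotations can live in one small table. As a minimal sketch, assuming hypothetical daily production counts and a hypothetical change log (none of these numbers come from the case study), the Python below computes the percent of production with the problem for each day and attaches any recorded change to the day it was made, so both appear on the same text run chart.

```python
# Hypothetical daily data: (day label, units produced, units with the defect).
daily = [
    ("Day 1", 10, 3),
    ("Day 2", 12, 3),
    ("Day 3", 9, 2),
    ("Day 4", 11, 1),
]
# Hypothetical change log, keyed by the day the change was made.
changes = {"Day 3": "Change X made (hypothetical)"}


def run_chart_rows(data, change_log):
    """Return (day, % of production with the defect, change note) for each day."""
    rows = []
    for day, produced, defective in data:
        pct = 100.0 * defective / produced if produced else 0.0
        rows.append((day, round(pct, 1), change_log.get(day, "")))
    return rows


if __name__ == "__main__":
    for day, pct, note in run_chart_rows(daily, changes):
        bar = "#" * round(pct / 2)  # one '#' per 2 percentage points
        print(f"{day}  {pct:5.1f}%  {bar:<20} {note}")
```

Using a percentage rather than a raw count keeps days with different production volumes comparable on the same chart.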
In our case study, I prepared a run chart (using Excel) that was used by
the team to track its progress (Figure 4.1). As you can see, the x-axis rep-
resents time and the y-axis is the percent of production with inner liner
splits. Since the team had no historical data to understand when the prob-
lem had begun, we first had to establish a data collection system to capture
data on the defect, which we did. Once the success metric was established,
it was time to move to step 3, the definition of symptoms.
5
Investigate, Organize,
and Analyze the Data
One who returns to a place, sees it with new eyes. Although the place
may not have changed, the viewer inevitably has. For the first time, things
invisible before, become suddenly visible.
(and sometimes taste), then record what our senses tell us. Our next step is
to ask some basic questions relative to our senses:
• What do I see when the problem is present or just prior to the onset
of the problem?
• What do I feel when the problem is present or just prior to the onset
of the problem?
• What do I hear when the problem is present or just prior to the onset
of the problem?
• What do I smell when the problem is present or just prior to the
onset of the problem?
• What do I taste (sometimes) when the problem is present or just
prior to the onset of the problem?
3. The surface of the inner liner appeared tackier to the touch on tanks
with the defect.
4. The team did not hear or smell anything abnormal, and we certainly didn’t taste anything.
of changes that occurred months prior to the onset of the problem, and
there may or may not be complete documentation available.
One example of this might be a problem that is tied to humidity or tem-
perature. Suppose, for example, that a process change was made in July
or August in Alabama, and the problem is caused by low temperatures.
Its effects might not be observed until December, January, or February,
since those are the only months that get cold in Alabama. My advice is that any change made to the process within at least the last six months should be recorded and tracked. Six months is simply a guideline based on my experience, not a hard and fast rule; I have even seen a change made almost a year before the problem appeared turn out to be its root cause. When in doubt, record the change.
Where should the problem-solving team record the changes? My advice is simple: record all changes directly onto the run chart. By recording all known changes on the same document as the defect rate (i.e., the run chart), it is much easier to see the impact of a change that is tied to increases or decreases in the level of the problem. In Chapter 3, in the case of the engineering backlog, we saw the full impact of recording the change directly onto the run chart.
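Once a change is marked on the run chart, a quick first check is whether the level of the problem shifted around it. The Python below is a minimal sketch, assuming hypothetical run-chart values and a change recorded just before the fourth point; it compares the mean before and after the change. A shift that coincides with a change is a clue to investigate and test, not proof that the change is the cause.

```python
from statistics import mean

# Hypothetical run-chart values (% of production with the defect), in date order,
# with a change recorded just before the fourth point.
pct_defective = [30.0, 25.0, 28.0, 10.0, 8.0, 9.0]
change_index = 3


def before_after(series, index):
    """Mean of the metric before and after the point where a change was recorded."""
    if not 0 < index < len(series):
        raise ValueError("need at least one point on each side of the change")
    return mean(series[:index]), mean(series[index:])


if __name__ == "__main__":
    before, after = before_after(pct_defective, change_index)
    print(f"before: {before:.1f}%  after: {after:.1f}%  shift: {after - before:+.1f} points")
```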
In our case study, there were no known (or at least no documented) changes made prior to the onset of the problem, which had begun at some unknown point years earlier.
the answer to the question, “Where or when would you expect to see the problem, but don’t?” If DFCs exist, they can be very valuable to the team, because they can shorten and simplify the team’s work. DFCs can be people, machines, materials, processes, systems, and so forth, and the fact that they exist is proof that the problem can either be completely solved or have its effects minimized.
In our case study, the team had already developed a cause-and-effect
diagram by the time I arrived and had begun the process of eliminating potential causes. As I sat in my first meeting with this team, I noticed that one of the potential causes the team had identified was the operators who produce the first part of the tank. It seemed strange to me that the team had
already eliminated them as a potential cause. I sat silently, but wondered
to myself how or by what criteria they had concluded that operators were
not the cause of the effect they were observing (i.e., splits in the inner liner
material).
After the meeting, I stopped the team leader and asked him how they
had concluded that operators (or their methods) could not have caused the
problem. He simply told me that based upon their experience, operators
were not an issue. Apparently, this decision had been hotly debated, but in
the end, he himself decided to remove it as a cause.
I decided to follow up on this point and look at any available data that might either confirm that the operators were not a factor in this defect or reveal an operator correlation with the defect.
I started with the first step in the process where the inner liner is applied
to a plaster form. I found that there were ten to twelve operators producing
tanks at various work stations.
I created the Pareto chart shown in Figure 5.1, and as you can see, there
was not one operator that stood out as a DFC, so maybe the leader was
right after all about operators not being a cause of the problem.
I then went to the next step in the process where operators apply a
cement to the surface of the inner liner material to prepare it for later
steps in the process. Again, I created a Pareto chart to determine if there
was a cement operator relationship to the problem of split inner liners (see
Figure 5.2). Much to my surprise, one operator had significantly more split
inner liners than the other two.
[Figure 5.1 chart: # of splits (0 to 8) by inner liner operator, A through I.]
FIGURE 5.1
Pareto chart for number of inner liner splits by inner liner operator.
[Figure 5.2 chart: # of tanks (0 to 30) by cement operator, A, B, and C.]
FIGURE 5.2
Pareto chart for number of tanks with split inner liners by cement operator.
[Figure 5.3 chart: # produced and # split for each cement operator, A, B, and C. Legible values: operator A, 74 produced and 28 split; the remaining bars show 61 and 54 produced and 5 and 2 split.]
FIGURE 5.3
Total tanks produced and number of inner liner splits by cement operator.
But what if the operator with the most defects also produces the most tanks? Could that be the reason he has more split inner liners? Again, I went to the data and prepared a Pareto chart (Figure 5.3). Yes, he had made more tanks, but his rate of failure was the key point here. The percentage of his production with split inner liners was significantly higher than that of the other two operators. That is, operator A had almost 80% of the defects but had produced only 39% of the tanks. Operator C had only 6% of the defects and had produced 32% of the tanks. Because of this finding, I could not rule out operators as a potential cause of the problem and decided to meet
[Figure 5.4 chart: # cracked (0 to 25) by tank type, in descending order: F, J, C, L, B, N, G, K, H, O, A.]
FIGURE 5.4
Pareto chart of number of tanks with inner liner splits by tank type.
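The cement-operator comparison in Figure 5.3 rests on normalizing defects by production volume: compare each operator's share of the defects with his share of the tanks, and compute his own split rate. The Python below is a minimal sketch of that arithmetic. Operator A's totals (74 produced, 28 split) are legible in the figure; assigning 54/5 to B and 61/2 to C is an inference from the 6% and 32% figures the text quotes for operator C.

```python
# Cement-operator totals read from Figure 5.3. Operator A's values are explicit;
# the B/C assignment is inferred from the 6% and 32% figures quoted for operator C.
produced = {"A": 74, "B": 54, "C": 61}
split = {"A": 28, "B": 5, "C": 2}


def shares_and_rates(produced, split):
    """Compare each operator's share of defects with share of production and split rate."""
    total_produced = sum(produced.values())
    total_split = sum(split.values())
    table = {}
    for op in produced:
        table[op] = {
            "share_of_defects": round(100 * split[op] / total_split, 1),
            "share_of_production": round(100 * produced[op] / total_produced, 1),
            "split_rate": round(100 * split[op] / produced[op], 1),
        }
    return table


if __name__ == "__main__":
    for op, row in shares_and_rates(produced, split).items():
        print(op, row)
```

On these numbers, operator A carries 80.0% of the splits with 39.2% of production (a 37.8% split rate), while operator C carries 5.7% of the splits with 32.3% of production (3.3%), which matches the rounded figures in the text.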
uniquely different about where or when we see the problem. Suppose you create a run chart and notice that the defect is observed only on Mondays and Tuesdays, or only between 8:00 am and 5:00 pm. What would that tell us? Defect-free configurations can also be related to the time we observe the problem. Suppose, for example, that we researched a problem and discovered that it occurred only on the first shift of a three-shift operation; wouldn’t that be important to know? Of course it would! It implies that something different is going on in the process during the first shift. It could be a temperature difference, an operator difference, or something else we hadn’t considered. So please, when you are looking for a DFC, consider time as well. Now let’s move on to the next step in the process: our search for distinctions.
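Looking for a time-based DFC amounts to grouping the defect rate by a time bucket (shift, weekday, hour) and flagging the buckets where the problem could occur but does not. The Python below is a minimal sketch using hypothetical shift records that mirror the first-shift example above. Treating a zero rate as defect-free is a simplifying assumption; a real study would want enough volume in each bucket before drawing that conclusion.

```python
from collections import defaultdict

# Hypothetical records: (shift, units produced, units with the defect).
records = [
    ("1st", 40, 6), ("2nd", 38, 0), ("3rd", 35, 0),
    ("1st", 42, 5), ("2nd", 41, 0), ("3rd", 36, 0),
]


def defect_rate_by(rows):
    """Percent defective per time bucket (here, per shift)."""
    produced, defective = defaultdict(int), defaultdict(int)
    for bucket, n, bad in rows:
        produced[bucket] += n
        defective[bucket] += bad
    return {b: 100.0 * defective[b] / produced[b] for b in produced}


def dfc_candidates(rates, threshold=0.0):
    """Buckets where the problem could occur but does not (rate at or below threshold)."""
    return [b for b, r in rates.items() if r <= threshold]


if __name__ == "__main__":
    rates = defect_rate_by(records)
    for bucket, rate in rates.items():
        print(f"{bucket} shift: {rate:.1f}% defective")
    print("DFC candidates:", dfc_candidates(rates))
```

Here the second and third shifts come out as DFC candidates, which directs the search for distinctions toward what the first shift does differently.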
step, we are only looking for differences and not why the distinction may
or may not be important. The team recorded this finding as a distinction.
In the case of the three cement operators, the team elected to videotape the operators’ motions, techniques, and methods, and then review the videotapes as a team to look for differences. The team reviewed the videos and compared the operator who had 80% of the split inner liners with the operator who had only 6%. The team discovered major differences in work methods, especially in the amount of cement applied to the inner liner. The operator who had 80% of the defects applied significantly more cement than the operator who had only 6% of the defects. In addition, the team observed differences in the method of applying the cement and recorded each difference for future use. Again, we are only looking for differences and not the reason why a difference might be important. This is extremely important to remember, because differences should be accepted or discounted only after discussion later in the process.
[Figure 5.5 diagram: cause-and-effect diagram with the effect “Split tanks.” Legible causes include different operators (N), excess humidity, air movement (N), temperature nylon (N), composition poly, autoclave effects (N), viscosity nylon, (“hot house”) oven effects, chunky (Y), and gum (Y); bones include Machine-tools and Material. Legend: one marker indicates investigation complete, operating within spec; another indicates additional investigation required (more data); another indicates investigation showed failure, possible operation outside of process specification; (Y) indicates the team believes this item could be a contributing factor; (N) indicates less probability of contribution.]
FIGURE 5.5
Cause-and-effect diagram for split tanks.
[Figure 5.6 diagram: cause-and-effect diagram with the effect “Split tanks.” Legible causes include excess humidity, location of inner liner joint, air movement (N), H2O application to form coating, excessive inner liner joint tension, temperature nylon (N), composition poly, autoclave effects (N), viscosity nylon, condition of form, (“hot house”) oven effects, chunky (Y), and gum (Y); bones include Machine-tools and Material.]
FIGURE 5.6
It is a capital mistake to theorize before one has data. Insensibly one begins
to twist facts to suit theories, instead of theories to suit facts.
6.1 DEVELOP A HYPOTHESIS
To develop a hypothesis is to develop a theory about the sequence of events that caused a problem to appear. Of the tools
available to the team, the causal chain is, for me, the most effective. It is
one thing to create a long list of potential causes, like you would with a
cause-and-effect diagram, but it is quite another to create the actual chain
of events that produced the effect.
Causal chains are stepwise evolutions of problem causes. Again, the thought process behind theory development starts with the problem at hand and then works backward, asking the question why until we arrive at a potential root cause. Each step represents an object in a normal or abnormal state, and is the cause of the next step and the effect of the preceding step. That is, the information on the step to the left is always the cause of the information on the step to the right. In creating the causal chain, we are attempting to find not only the cause or causes of the effect we have observed, but also the series, or string, of events that created the problem.
When creating causal chains, or developing hypotheses, we always
start our journey with the symptoms of the problem, but we also have
to consider the impact of changes, defect-free configurations (DFCs), and
FIGURE 6.1
Inner liner splitting causal chain.
much thought to your theory, which was based on concrete data, observa-
tions, and intuition, and you are now ready to put your theory to the test.
You know that your theory must withstand all attempts to disprove it, so you take your time thinking about how best to test it. Testing involves fully understanding the causal mechanism that led you to your theory, and then either setting up a series of tests or making the defect occur under controlled conditions.
In the case of the inner liners that split, the team had developed two pri-
mary theories that had to be tested. The first theory involved the joint being
placed under tension as it was stretched around the corners of the plaster
form. The team concluded that there were two possible tests it could run.
Either the joint could be moved into an area of less tension (i.e., a flat area
on the form), or a tank type that already had the joint in a flat area could
have the joint moved into the curved area of the form. The team elected to
use the former rather than the latter simply because it wanted to improve
a bad condition rather than worsen a good condition. I agreed with them.
The second theory revolved around the volume of cement applied to the
inner liner material. Again, the team had two choices, and again it elected
to attempt to improve a bad condition by instructing the operator with
the highest proportion of splits to apply the same amount of cement as the
best operator. Again, I agreed with the team, because to my way of think-
ing it is always a better choice when you are moving toward improvement.
Having said that, there is an equal and opposite argument that would tell
you it is always better to create the defect rather than do things to improve
it during the testing phase. In some cases, I agree with this viewpoint, but
in this case I did not.
The team chose to test the first theory, the one that involves tension on
the inner liner joint as it passes around the curved area of the plaster form.
The team could have selected the tank type with the highest proportion of
split inner liners to study, but because the production requirements were
low on this tank, the team instead elected to study the tank type that had
both the highest number of defects and the highest production require-
ments. The rationale was that the higher build rate would provide the
opportunity for faster results, and I agreed.
After receiving approval from the engineering group, the team moved the joint away from the curved area of the form into a flat area. The results came almost immediately: three of the tanks displayed splits within the first day. But now, instead of being at the joint as before, the splits were located in the liner material directly above the curved part of the form, away from the joint. This partially confirmed the tension theory, but the team was not satisfied, so it observed the first step in the build process again, where the inner liner is placed on the form. The team noted that the first-stage operators fold the liner material back onto itself so that they can refresh the form, providing a tacky surface to which the inner liner sticks. The team concluded that this folding action probably caused a permanent deformation that weakened the liner material, and since this area was now under tension, it was susceptible to splitting. The team was actually happy to see the splits in the liner material, because it now had a much better understanding of the factors that create the splitting.
The team now turned its attention to the theory that excess cement (actu-
ally, the organic solvent in the cement) was attacking and deteriorating the
liner material. The team wanted to make this a learning experience for
the operators involved, so all three cement operators were invited to view the videos of themselves. The team explained to the operators that they were looking for differences in work methods, specifically the volume of cement being applied. The operators watched intently, and shortly into the video, the operator who had the highest rate of liner splitting said that he noticed he was applying much more cement than the other operators. The other two operators agreed, and one of them asked which method was better. The team then showed the operators the splitting data for the first time, and all three operators immediately understood how much cement needed to be applied and how to apply it. The results were immediate: the number of split inner liners dropped to near zero.
7
Choose the Most Probable Cause
When solving problems, dig at the roots, instead of just hacking at the
leaves.
Anthony J. D’Angelo
material applied to the surface of the liner material caused the splitting.
But what if the other two operators in question sometimes apply excessive
cement? Could that have caused the other 20% of the splits? The answer to this question can come only from auditing every operator’s work method to make certain that all operators use the proper technique and apply the proper amount of cement.
In this case, the team discovered that the other two operators did, in fact, occasionally apply more than the recommended amount of cement, especially when they were in a rush to complete the tank. When the team discussed this observation with all three operators, they responded by taking more care when applying the cement, and the level of split inner liners went to zero. The problem was solved.
The final conclusion of the team was that the single root cause was exces-
sive application of cement to the inner liner.
The team felt good that it had successfully solved this problem and there
was a feeling of euphoria at the next team meeting. But just like the Grinch
who stole Christmas, I told the team members that they were not yet fin-
ished, because I didn’t agree with their conclusion. I knew that excess
cement was the apparent cause, but they still had to answer the question
of why the operators were applying too much of the cement. The team sat
motionless, looking at each other and me, in disbelief until I asked one
final question, “Why did the operators apply too much cement?” Their disbelief immediately turned to understanding. The team realized that if the work method had specified the correct amount of cement,
and if there had been regular audits to ensure that the correct work method
was actually being accomplished, then the problem probably would never
have occurred. The team agreed and began discussing the solution to this
problem, which leads us to the next step in our problem-solving process.
8
Develop, Test, and Implement Solutions
It’s so much easier to suggest solutions, when you don’t know too much
about the problem.
Malcolm Forbes
8.2 FACTORS TO CONSIDER
There are several important factors to consider when deciding which
solution to implement. The old expression “there’s more than one way to
skin a cat” is especially true when it comes to developing solutions to
problems. For example, we want a solution that will be easy and practical
to implement within the current framework of the process under
investigation. A solution that people have difficulty understanding or
implementing isn’t a good solution, so keep your solution as simple as
possible. If the solution concerns a work method change, it must
monitor the results just to be sure that the change actually eliminated
the root cause.
In the case of multiple potential solutions, the team must select the
“best” solution with respect to complexity, cost, and the other factors
required to make the change. In the case of the split inner liners, the
team had to develop a solution that was robust enough to prevent a
recurrence of the problem, and you must do the same. Good, effective
solutions must be robust enough to weather potential storms that might
arise later.
The split inner liner team developed a new work method that included
thickness measurements of the cement and photos of a tank with the correct
amount of cement on it. The team reasoned that if the operators followed
this new work method, the probability of split inner liners would be
minimized. Now let’s look at how best to implement a solution to a problem.
FIGURE 8.1
Run chart of percent of production with inner liner splits by week.
(Vertical axis: % of production with splits, 0 to 30; horizontal axis:
week ending date, 9/10/2005 through 12/10/2005.)
success metric, and as you can see, the problem that had been around for
years was no longer a problem. The impact of this achievement was
significant: it reduced the daily rework for this defect by eight hours,
and it also improved the cycle time of the tanks, because each repair not
only took eight hours but also had to sit overnight before the tank could
move to the next production step.
Because the team had followed a structured and systematic approach to
this problem, success was achieved. But the team knew that its work wasn’t
quite done yet. Whenever a solution is implemented, it is absolutely essen-
tial that a control of some kind is developed and implemented along with
the solution. As a matter of fact, no solution is really complete without a
control to monitor the performance of the process. We’ll discuss this in
the next chapter.
9
Implement, Document, and Celebrate
Jack Welch
prohibitive, not all processes are capable of utilizing a fail-safe device.
But there are other alternatives.
If it is not possible to implement a fail-safe device, then a manual
measurement combined with a control chart is the second choice. A control
chart is based upon the normal statistical variability of the process:
data are collected, and control limits are calculated for the process
parameter or product characteristic. Control chart theory states that as
long as each measured or calculated data point falls within these limits,
and the process has been shown to be capable, we can be reasonably sure
that defects are not being produced. The only problem with these types of
controls is that they are based upon samples rather than 100 percent
inspection.
A control chart is simply a run chart with control limit lines equally
spaced from the process average. There are many different varieties of con-
trol charts, but in my opinion, the most effective is the x-bar and R chart.
This chart simultaneously monitors the location of the process parameter,
or product characteristic being measured, relative to the historical aver-
age of the process and the amount of variation present. If the calculated
average and range remain within predetermined limits, then the process
is said to be “in control” and is permitted to continue running. If the pro-
cess goes outside these control limits, then the process is said to be “out of
control” and is stopped while corrective action is taken.
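The arithmetic behind an x-bar and R chart can be sketched in a few lines. The sketch below is a minimal illustration, assuming subgroups of equal size and the standard Shewhart constants A2, D3, and D4 from published SPC tables (only subgroup sizes 2 through 5 are included). The function names and the width readings are hypothetical, not taken from the author's chart.

```python
from statistics import mean

# Shewhart chart constants for subgroup sizes 2-5 (standard published SPC values).
CHART_CONSTANTS = {
    2: {"A2": 1.880, "D3": 0.0, "D4": 3.267},
    3: {"A2": 1.023, "D3": 0.0, "D4": 2.574},
    4: {"A2": 0.729, "D3": 0.0, "D4": 2.282},
    5: {"A2": 0.577, "D3": 0.0, "D4": 2.114},
}


def xbar_r_limits(subgroups):
    """Center lines and control limits for an x-bar and R chart."""
    n = len(subgroups[0])
    if any(len(s) != n for s in subgroups):
        raise ValueError("all subgroups must have the same size")
    c = CHART_CONSTANTS[n]
    xbars = [mean(s) for s in subgroups]           # average of each subgroup
    ranges = [max(s) - min(s) for s in subgroups]  # spread within each subgroup
    grand_mean = mean(xbars)                       # x-bar chart center line
    rbar = mean(ranges)                            # R chart center line
    return {
        "xbar_center": grand_mean,
        "xbar_ucl": grand_mean + c["A2"] * rbar,
        "xbar_lcl": grand_mean - c["A2"] * rbar,
        "r_center": rbar,
        "r_ucl": c["D4"] * rbar,
        "r_lcl": c["D3"] * rbar,
    }


def points_outside(values, lcl, ucl):
    """Indices of plotted values that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical history: four subgroups of two width readings each.
history = [[10.0, 10.2], [9.9, 10.1], [10.1, 10.1], [10.0, 9.8]]
limits = xbar_r_limits(history)
print(limits["xbar_lcl"], limits["xbar_ucl"])  # roughly 9.743 and 10.307
```

A point outside either pair of limits is the “out of control” signal described above; the additional run rules found in control chart references are not implemented in this sketch.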
There are many excellent reference books on control charts that summa-
rize other rules that dictate whether a process is out of control, so I suggest
you do a Google search to learn more about them.
Figure 9.1 is an example of a control chart that one of my teams developed
to control product coming off a shear machine. The product was a long
rectangular piece of metal that not only had to be a certain width but
also had to have parallel sides along its entire length. The x-bar (i.e.,
average) portion of the chart was used to control the width of the piece,
and the R (i.e., range) portion was used to control its side-to-side
parallelism. This chart proved to be a very effective method of control for
a problem that had been “fixed” many times in the past. As you can see, the
procedure for collecting and recording data is located in the box at the
far right side of the chart. Directly beneath the procedure are “rules of
action” that define out-of-control conditions and the actions the operator
should take if the process actually goes out of control.
The third, and least attractive, alternative is an audit of the process.
Audits are intended to be a review of a procedure or process or system
FIGURE 9.1
Ring shear machine control chart.
Part numbers covered by the chart (spec center and tolerance range):
069458: 6.643 (6.6274–6.6586)
065971: 6.643 (6.6274–6.6586)
065844: 6.8125 (6.7969–6.8281)
069459: 6.703 (6.6874–6.7186)
X-bar chart: center line 0, UCL 0.0150, LCL –0.0150.
Range chart: Rbar 0.006, UCL 0.020.
The data section of the form has rows for Date, Oper, 1, 2, Avg, Spec,
Xbar, and Range; the sample entries were recorded on 5/5/05 by operator JR
against a spec center of 6.8125.
Procedure
1. Record date in date box.
2. Record operator initials in operator box.
3. Record the spec center in the spec box.
4. Close calipers and push zero button.
5. Open caliper jaws to larger than width of piece and then close, resting
caliper on piece being measured.
6. Record measurement in box 1.
7. Close calipers and push zero button.
8. Move to other end of piece and open caliper jaws to larger than width of
piece, close the calipers, resting caliper on piece being measured.
9. Record measurement in box 2.
10. Add box 1 + box 2, divide by 2, and record in the Avg box.
11. Subtract the spec number from the average and record in Xbar box.
12. Subtract box 2 from box 1 and record in Range box.
13. Plot data point from Avg box on X-bar chart.
14. Plot data point from Range box on range chart.
15. Take action per rules of action as necessary.
Rules of action for out-of-control conditions
1. If one or more points fall outside the UCL or LCL (red hash line),
measure the next 2 spears per steps 1–14. If the 2nd point is outside the
control limits, call supervisor.
2. If 4 of 5 points are on the same side of the Xbar line (inside the green
or yellow line), call supervisor.
3. If any single point is outside the spec tolerance limits, shut down and
call supervisor.
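The arithmetic on the Figure 9.1 form (steps 10 through 12) and its rules of action can also be expressed as a small function, which may help when moving such a chart into a spreadsheet or data-collection system. This is a rough sketch under several assumptions: it uses the limits printed on the form together with the tolerance for part 065844, treats the plotted average as the “point” in rule 3, takes the absolute difference as the range, and simplifies rule 1 by escalating when two consecutive readings fall outside the control limits. The names `record_reading` and `rule_of_action` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    spec_center: float
    tol_low: float
    tol_high: float


# Spec center and tolerance printed on the form for part 065844.
PART_065844 = Part(spec_center=6.8125, tol_low=6.7969, tol_high=6.8281)
XBAR_LCL, XBAR_UCL = -0.0150, 0.0150  # x-bar limits, as deviation from spec center
R_UCL = 0.020                          # range chart upper control limit


def record_reading(box1, box2, part):
    """Steps 10-12: average of the two ends, deviation from spec, and range."""
    avg = (box1 + box2) / 2
    xbar = avg - part.spec_center
    rng = abs(box1 - box2)  # abs() is an assumption so the range is never negative
    return avg, xbar, rng


def outside_control_limits(point):
    _, xbar, rng = point
    return not (XBAR_LCL <= xbar <= XBAR_UCL) or rng > R_UCL


def rule_of_action(points, part):
    """Action for the newest reading; points are (avg, xbar, rng) tuples, oldest first."""
    avg = points[-1][0]
    # Rule 3 (interpreted on the plotted average): outside spec tolerance -> shut down.
    if not (part.tol_low <= avg <= part.tol_high):
        return "shut down and call supervisor"
    # Rule 1 (simplified): outside control limits -> re-measure; if the previous
    # reading was also outside, escalate to the supervisor.
    if outside_control_limits(points[-1]):
        if len(points) >= 2 and outside_control_limits(points[-2]):
            return "call supervisor"
        return "re-measure"
    # Rule 2: at least 4 of the last 5 x-bar points on one side of the center line.
    last5 = [p[1] for p in points[-5:]]
    if len(last5) == 5 and max(sum(x > 0 for x in last5), sum(x < 0 for x in last5)) >= 4:
        return "call supervisor"
    return "continue"


print(rule_of_action([record_reading(6.8130, 6.8120, PART_065844)], PART_065844))  # continue
```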
The actual report does not have to be lengthy. In fact, each of the first
sixteen steps could serve as the format for the report. I always recommend
that companies start a file of “Best Practices,” which is simply a
collection of solved problems, and your report belongs in this file for
future generations of problem solvers.
The range of what we think and do is limited by what we fail to notice. And
because we fail to notice that we fail to notice, there is little we can do to
change, until we notice how failing to notice shapes our thoughts and deeds.
Ronald Laing
Although the two reasons for failure just listed are the most common,
there are other reasons why people fail to solve problems:
there when the effects of the medication (i.e., short-term fix) subside.
Based on my experience, treating symptoms rather than finding and
eliminating the root cause is a very common occurrence when peo-
ple are trying to solve problems.
2. People lack the basic skills necessary for problem solving—Because many
people don’t understand the nature of problems, and because they have
never really had any formal training in problem analysis, they tend to
apply the “change something and see what happens” approach. All this does
is complicate the situation, because you lose track of what has been
changed and will probably make the problem worse. This lack of basic skills
is one of the most significant causes of failed problem-solving
initiatives. It’s not your fault, since you have probably never been
provided any form of effective problem-solving training. Many of you might
have received training on specific tools or techniques, like
cause-and-effect diagrams, Pareto charts, or maybe even trend charts. But
how many of you have learned how and when to use these tools to define a
problem, search for a change, identify a defect-free configuration, or
clarify a distinction? Solving problems requires a systematic approach, and
it’s my bet that many of you have never had this kind of training.
3. Failure to look at problems holistically—Many times people only focus
on the symptoms of the problem and fail to look at the causal mechanism
that created the conditions for the problem to occur. Remember, we are
looking for cause-and-effect relationships here. I absolutely agree with
Kepner and Tregoe on this point, but again, it all comes down to the level
of problem-solving training people have received.
4. Failure to involve the right people—Problem analysis enables people
to work together as a team so as to pool their information. Because
many problems are complex, it is difficult, if not impossible, for one
person to have all of the knowledge and information needed to be
able to solve the problem. Operating in a vacuum not only slows the
problem-solving process, it limits the available scope of knowledge,
setting up the potential to overlook key bits of information.
5. Failure to use a structured and systematic approach—The number one
cause of failure to solve problems is the failure to follow a logical and
systematic process. When problem-solving teams don’t follow a systematic
approach, they have a tendency to wander aimlessly, and usually problems do
not get solved. It is only through being disciplined enough to use a
structured approach that most problems actually get solved. Although this,
again, is a training issue, the reality is that the fault lies with the
leader of the organization. The leader must set the expectation that
structured approaches to problem solving will be used.
6. Failure to define or understand the real problem—Defining the problem
is a critical first step in resolving it. Without the focus provided by a
clear problem definition, the team will not be properly focused and
aligned, and having the entire team aligned is critical to its success.
Otherwise, people will go off in different directions and the team will
most likely fail.
7. No support or expectations from leadership—Support and expecta-
tions from the leadership in any organization are not only impor-
tant, but they are critical pieces of the problem-solving pie. If the
problem solvers do not get the support they need, and if the expecta-
tion of leadership isn’t to use a structured approach to solving prob-
lems, then, quite simply, problems won’t get solved effectively.
11
A Message for Leadership
Arnold Glasgow
in his classic business article “Stop Fighting Fires” [4]: “In business orga-
nizations, there are invariably more problems than people have the time
to deal with. At best, this leads to situations where minor problems are
ignored. At worst, chronic fire fighting consumes an operation’s resources.
Managers and engineers rush from task to task, not completing one before
another interrupts them. Serious problem-solving efforts degenerate into
quick-and-dirty patching. Productivity suffers, and managing becomes
a constant juggling act of deciding where to allocate overworked people,
and which incipient crisis to ignore for the moment.” In effect, the culture
exhibits classic fire-fighting behaviors.
In my travels throughout the world, I have been fortunate (or maybe
I should say unfortunate) enough to see many examples of fire fight-
ing firsthand, and it’s not a pretty sight at all. Bohn and his colleague
Ramchandran Jaikumar observed fire-fighting behaviors in many manu-
facturing and new product development settings, and have actually devel-
oped a list of “fire-fighting symptoms” to be used as a guide to determine
if an organization is a victim of fire fighting [4]. Their claim is that if
your organization exhibits three or more of the following symptoms, then it
is a victim of fire fighting:
1. There isn’t enough time to solve all the problems. The number of prob-
lems outnumbers the number of problem solvers. (I have seen this
one a lot.)
2. Solutions are incomplete. Many problems are simply patched and
never really solved. That is, the superficial effects (symptoms) are
treated, but the root causes are never eliminated. (This is the most
common behavior from my travels.)
3. Problems recur and cascade. Incomplete or haphazard solutions usu-
ally cause old problems to recur and sometimes create new prob-
lems. (Tied to symptom 2.)
4. Urgency supersedes importance. The “squeaky wheel gets oiled first”
syndrome is usually at work. Problem-solving efforts never really come to
fruition because of constant interruptions from fires that must be
extinguished. (This usually happens because priorities aren’t established.)
5. Performance drops. Because problem solving is consistently ineffective,
overall performance usually drops dramatically. (A natural cause-and-effect
relationship.)
6. Many problems become crises. Problems smolder until they flare up,
often just before a deadline, and then they require heroic efforts to
solve. (Thompson’s law tells us that Murphy is an optimist.)
problem, and then, and only then, do they implement solutions that work.
They don’t permit their problem solvers to ride in on white horses and save
the day. As a matter of fact, the successful companies either shoot all
their white horses or put them out to pasture.
11.4 TRAFFIC INTENSITY
Bohn introduces what he refers to as traffic intensity, or the number of
problems relative to the resources devoted to problem solving. Bohn pres-
ents the following equation:
only solved inefficiently but badly as well. “Gut feel” solutions to problems
become the norm until finally the system simply shuts down.
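Traffic intensity is also a standard quantity in queueing theory, where it is defined as the rate at which work arrives divided by the capacity available to do it. The sketch below uses that textbook definition, which may differ in detail from Bohn’s own formulation, together with the single-server M/M/1 queue model (random arrivals, one problem solver) to show why a system near full load degrades so quickly. The function names and all numbers are hypothetical.

```python
def traffic_intensity(problems_per_week, hours_per_problem, solver_hours_per_week):
    """Share of available problem-solving time that incoming problems demand."""
    return problems_per_week * hours_per_problem / solver_hours_per_week


def mm1_average_in_system(rho):
    """Average number of problems waiting or in progress in an M/M/1 queue."""
    if rho >= 1:
        return float("inf")  # demand meets or exceeds capacity: the backlog grows without limit
    return rho / (1 - rho)


# Hypothetical example: 10 new problems a week, 3 hours each, 40 solver-hours a week.
rho = traffic_intensity(10, 3, 40)
print(rho)  # 0.75
for load in (0.5, 0.8, 0.9, 0.95):
    print(load, round(mm1_average_in_system(load), 1))
```

Under this model, the average number of problems in the system is 1 at a traffic intensity of 0.5, 4 at 0.8, 9 at 0.9, and 19 at 0.95. The last few percent of load produce most of the backlog, which is consistent with the breakdown described above.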
FIGURE 11.1
Flowchart of the effects of fire-fighting syndrome. (Flow elements: new
problems and opportunities; growing backlog of problems; managers set
priorities and pressure for quick results; Engineers 1, 2, and 3;
solutions; good permanent solutions.)
FIGURE 11.2
Flowchart of problems. (Flow elements: the world, i.e., customers, bosses,
other departments, new ideas, etc.; queue of problems; manager or
committee; Engineers 1, 2, and 3; neglected problems.)
and so forth, so matching the right person to the right problem isn’t neces-
sarily straightforward.
So, if your organization is mired in fire fighting, what can you do to
resolve this condition? Bohn tells us that fire fighting can be eliminated,
but it requires a level of commitment that did not exist before the situa-
tion escalated to where it is. Bohn suggests that there are primarily three
approaches for eliminating fire fighting:
1. Tactical methods
2. Strategic methods
3. Cultural methods
Albert Einstein
or tails, so the probability of guessing right is one out of two, or fifty
percent. If you were to flip the coin one hundred times, you would most
likely get approximately fifty heads and fifty tails.
In cards, if we want to know the probability of randomly drawing a four of
any suit, for example, we know that there are fifty-two cards in a deck and
four cards of each rank, so the probability of randomly drawing a four is
four in fifty-two, or one in thirteen. This is a bit more complicated than
a simple coin flip but still easy to understand. If we were to calculate
the probability of drawing the four of hearts, it would be one in
fifty-two. And if you play the lottery, you already know by the number of
nonpaying tickets that the odds are astronomical against you selecting the
right numbers. In each of these examples, you are able to calculate the
odds of winning.
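The coin and card probabilities above can be checked directly. The sketch below uses exact fractions for the card draws and the binomial distribution for the hundred coin flips; the helper name `prob_exactly` is illustrative.

```python
from fractions import Fraction
from math import comb

p_heads = Fraction(1, 2)            # one favorable side out of two
p_any_four = Fraction(4, 52)        # four fours in a 52-card deck (reduces to 1/13)
p_four_of_hearts = Fraction(1, 52)  # exactly one four of hearts


def prob_exactly(k, n, p=0.5):
    """Binomial probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


print(p_any_four)                                          # 1/13
print(round(prob_exactly(50, 100), 4))                     # 0.0796
print(sum(prob_exactly(k, 100) for k in range(40, 61)))   # about 0.96
```

Fifty heads is the single most likely result of a hundred flips, yet it occurs only about 8 percent of the time; what is very likely is a count near fifty, with roughly 96 percent of outcomes falling between forty and sixty heads.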
Calculating probabilities is another way of measuring risk, and we all
know that every decision involves a certain amount of risk. When we
operate in a preventive environment, we are attempting to minimize this
risk. The difference between solving a problem and preventing a problem
is simply one of timing. In problem solving we are in a reactive mode,
after the fact, trying to determine why something did happen. In problem
prevention, we are in a proactive mode, trying to determine what could
happen. Unlike games of chance, like the coin flip or the card draw,
predicting or estimating the odds of something specific happening requires
that we rely primarily on our past experience, or perhaps on data we have
collected in similar situations.
FIGURE 12.1
Problem Prevention Roadmap.
involves a future activity. Now let’s look at each of the major segments and
action steps more closely.
These seventeen individual action steps will help you take control of your
own future and destiny. Each of these seventeen actions requires that we
reach deep into our experiences, our bag of tools and techniques, and our
personality traits in a proactive manner.
Before we look at each of these six major segments and seventeen action
steps more closely, you need to understand one simple fact. In general,
people have a hang-up with the future. That is to say, typically, we are so
caught up in the chaos of problems in the present and leftover problems
from the past that we rarely take the time to look into the future. People
want and need to be future oriented, to be proactive, but the pressures of
problems already upon us typically preclude this from happening.
13
Defining High-Risk Areas
Living at risk is jumping off the cliff and building your wings on the way
down.
Ray Bradbury
TABLE 13.1
Chances of Occurring versus Consequences Table
Consequences
Table 13.1 is a simple tool that any organization can use to select the
right areas of vulnerability. We consider both the potential negative
consequences facing the organization and the chances of the negative event
actually happening. Each potential problem or negative event is rated as
low, medium, or high for both occurrence and consequences and placed
directly into the appropriate box within the table. As you might expect,
any potential negative event or proposed project that falls within a red
box should be selected. Conversely, any event or area that falls within a
green box should be crossed off the list. The yellow boxes are borderline
and represent those events that may or may not be included on your final
list.
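A chances-versus-consequences table like Table 13.1 is easy to apply to a list of candidate areas. The text does not specify which boxes are red, yellow, or green, so the sketch below assumes a simple scheme in which low, medium, and high are scored 1, 2, and 3 and the product of the two scores sets the color. The scoring, the function names, and the example events are all hypothetical.

```python
# Assumed scoring scheme, not taken from Table 13.1.
LEVEL_SCORE = {"low": 1, "medium": 2, "high": 3}


def box_color(chance, consequence):
    """Assumed coloring: the product of the two level scores (1-9) picks the box."""
    score = LEVEL_SCORE[chance] * LEVEL_SCORE[consequence]
    if score >= 6:
        return "red"     # select
    if score >= 3:
        return "yellow"  # borderline
    return "green"       # cross off the list


def shortlist(events, include_borderline=False):
    """Names of events to keep, given (name, chance, consequence) tuples."""
    keep = {"red", "yellow"} if include_borderline else {"red"}
    return [name for name, chance, consequence in events
            if box_color(chance, consequence) in keep]


# Hypothetical candidate areas of vulnerability.
events = [
    ("supplier delay", "high", "medium"),
    ("new equipment startup", "medium", "high"),
    ("minor paperwork error", "low", "low"),
    ("operator turnover", "low", "high"),
]
print(shortlist(events))        # ['supplier delay', 'new equipment startup']
print(shortlist(events, True))  # also includes 'operator turnover'
```

Whatever coloring a team adopts, writing it down once keeps the selection consistent from one review to the next.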
Once your most important areas of vulnerability have been defined
and consensus has been achieved, it is now time to define the potential
problems, failure modes, and potential negative effects that could occur
in these areas.
14
Defining Problems, Failure Modes, and Effects
Again and again, the impossible decision is solved when we see that the
problem is only a tough decision waiting to be made.
Progress always involves risk. You can’t steal second base and keep your
foot on first.
Frederick Wilcox
are 100% confident that the problem will occur under certain conditions.
A rating of 1, on the other hand, denotes that there is no possibility the
problem will occur. The numbers between 1 and 10 are used to express how
certain or uncertain you are, but remember to be conservative. Logically,
if you have no idea whether the problem will occur, then you might rate the
problem as a 5.
Table 15.1, of unknown origin, is frequently used in the auto industry
to estimate the probability of occurrence. The table contains three col-
umns, and each column provides a different piece of information that
will help you with your estimate. The first column is the actual numeri-
cal ranking from 10 to 1. The second column is an actual probability
value, and the third column is a description that corresponds to the level
of probability.
A ranking of 10 suggests that the failure mode is almost certain to occur,
whereas 1 specifies that failure is almost an impossibility. Each ranking
in between corresponds to a discrete probability, and the ranking you
choose depends upon your beliefs, past experiences, and level of
confidence. The true value of a table like this is that if you aren’t
particularly comfortable or confident that you can accurately estimate the
numerical probability of the failure, you can still use the descriptive
column to estimate and assign a ranking. In any event, it’s important that
you consider your own personal experiences and those of your team, and then
discuss them thoroughly before assigning a ranking number.
TABLE 15.1
Probability of Occurrence versus Probability of Failure
Ranking  Probability of Occurrence (O)  Probability of Failure
10       ≥1 in 2                        Almost certain to occur
9        1 in 3                         Very high chance of occurring
8        1 in 8                         High chance of occurring
7        1 in 20                        Moderately high chance of occurring
6        1 in 80                        Medium chance of occurring
5        1 in 400                       Low chance of occurring
4        1 in 2000                      Slight chance of occurring
3        1 in 15,000                    Very slight chance of occurring
2        1 in 150,000                   Remote chance of occurring
1        1 in 1,500,000                 Almost impossible to occur
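When a team has a rough numerical estimate of how often a failure mode will occur, Table 15.1 can be read as a lookup. The sketch below assumes that an estimate falling between two rows should be rounded up to the higher ranking, in keeping with the advice to be conservative; that rounding rule and the function name `occurrence_ranking` are illustrative assumptions, not part of the table.

```python
from fractions import Fraction

# Table 15.1 as (ranking, listed probability of occurrence), highest ranking first.
OCCURRENCE_TABLE = [
    (10, Fraction(1, 2)),
    (9, Fraction(1, 3)),
    (8, Fraction(1, 8)),
    (7, Fraction(1, 20)),
    (6, Fraction(1, 80)),
    (5, Fraction(1, 400)),
    (4, Fraction(1, 2000)),
    (3, Fraction(1, 15000)),
    (2, Fraction(1, 150000)),
    (1, Fraction(1, 1500000)),
]


def occurrence_ranking(estimate):
    """Conservative lookup: lowest ranking whose listed probability is >= the estimate."""
    for ranking, listed in reversed(OCCURRENCE_TABLE):
        if listed >= estimate:
            return ranking
    return 10  # anything above 1 in 2 belongs in the top row (>= 1 in 2)


print(occurrence_ranking(Fraction(1, 50)))  # 7: between 1 in 80 and 1 in 20, rounded up
```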
TABLE 15.2
Severity and Severity Criteria
Ranking  Severity (S)  Severity Criteria
10       Hazardous     Hazardous effect without warning. Safety related. Regulatory noncompliant.
9        Serious       Potential hazardous effect. Able to stop without mishap. Regulatory compliance in jeopardy.
8        Extreme       Item inoperable but safe. Customer very dissatisfied.
7        Major         Performance severely affected but functional and safe. Customer dissatisfied.
6        Significant   Performance degraded but operable and safe. Nonvital part inoperable. Customer experiences discomfort.
5        Moderate      Performance moderately affected. Fault on nonvital part requires repair. Customer experiences some dissatisfaction.
4        Minor         Minor effect on performance. Fault does not require repair. Nonvital fault always noticed. Customer experiences minor nuisance.
3        Slight        Slight effect on performance. Nonvital fault noticed most of the time. Customer is slightly annoyed.
2        Very slight   Very slight effect on performance. Nonvital fault may be noticed. Customer is not annoyed.
1        None          No effect.
TABLE 15.3
Probability of Detection
Ranking  Detection (D)         Likelihood of Detection by Design Control
10       Absolute uncertainty  No controls in place to detect the presence of the effects of the potential failure mode.
9        Very remote           Very remote chance that the controls in place will detect the presence of the effects of the potential failure mode.
8        Remote                Remote chance that the controls in place will detect the presence of the effects of the potential failure mode.
7        Very low              Very low chance that the controls in place will detect the presence of the effects of the potential failure mode.
6        Low                   Low chance that the controls in place will detect the presence of the effects of the potential failure mode.
5        Moderate              Moderate chance that the controls in place will detect the presence of the effects of the potential failure mode.
4        Moderately high       Moderately high chance that the controls in place will detect the presence of the effects of the potential failure mode.
3        High                  High chance that the controls in place will detect the presence of the effects of the potential failure mode.
2        Very high             Very high chance that the controls in place will detect the presence of the effects of the potential failure mode.
1        Almost certain        Almost certain that the controls in place will detect the presence of the effects of the potential failure mode.
The total risk factor (TRF) is the product of the three rankings:
TRF = O × S × D
For example, suppose our ranking for occurrence was 9, our severity ranking
was 5, and our detection ranking was 10. The total risk factor would be
9 × 5 × 10 = 450.
TABLE 15.4
Impact versus Action
TRF Value  Impact                    Action
1–10       Minor impact/risk         Minor design changes, process improvements, or increased controls are needed.
11–125     Moderate impact/risk      Moderate design changes, process improvements, or increased controls are needed.
126–700    Major impact/risk         Major design changes, process improvements, and 100% inspection are needed. Production may have to be stopped.
>700       Catastrophic impact/risk  If in production, stop and redesign product or process, 100% improved inspection, etc., until problem is resolved.
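The total risk factor and the Table 15.4 bands can be combined to rank failure modes. In this minimal sketch, the worked example from the text (9, 5, 10) appears alongside two hypothetical failure modes, and a TRF of exactly 125 is assigned to the moderate band because that value sits on the boundary between the moderate and major rows. The function names are illustrative.

```python
def total_risk_factor(occurrence, severity, detection):
    """TRF = O x S x D, where each ranking is on the 1-10 scales of Tables 15.1-15.3."""
    for ranking in (occurrence, severity, detection):
        if not 1 <= ranking <= 10:
            raise ValueError("each ranking must be between 1 and 10")
    return occurrence * severity * detection


def impact_band(trf):
    """Impact band from Table 15.4; a TRF of exactly 125 is treated as moderate."""
    if trf <= 10:
        return "minor"
    if trf <= 125:
        return "moderate"
    if trf <= 700:
        return "major"
    return "catastrophic"


# The worked example from the text, plus two hypothetical failure modes.
failure_modes = {
    "example from text": (9, 5, 10),
    "hypothetical mode A": (3, 4, 2),
    "hypothetical mode B": (8, 9, 10),
}
ranked = sorted(failure_modes.items(),
                key=lambda item: total_risk_factor(*item[1]), reverse=True)
for name, rankings in ranked:
    trf = total_risk_factor(*rankings)
    print(name, trf, impact_band(trf))
```

Sorting by TRF puts the highest total risk problem at the top of the list, which is the one to address first.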
The measure of success is not whether you have a tough problem to deal
with, but whether it is the same problem you had last year.
Sören Kierkegaard
I have a word of caution as you reevaluate each risk factor. Make certain
that you and your team remain objective as you consider the three areas of
risk. If you sense that your team isn’t being impartial, it is always a
good idea to get a sanity check from someone outside the group who has no
stake in the outcome of the evaluation.
18
Implement Preventive Measures Plan
There comes a moment when you have to stop revving up the car and
shove it into gear.
David Mahoney
occasions good plans gone amiss simply because of how the plan was com-
municated, so please take the time to methodically and meticulously com-
municate your plan and then be there during the implementation.
If your control is a measurement or audit, again, be certain that all who
perform the control testing first know how to accurately make the mea-
surement, and then measure or audit often in the early stages of imple-
mentation. As mentioned earlier, an ounce of prevention is worth a pound
of cure!
The surest sign of a crisis is that when you have a major problem, no one
tries to tell you how to do your job.
Anonymous
19.2 BACKGROUND INFORMATION
This case study involves a company that has been producing truck bodies
for the transportation industry since 1958. The truck bodies are designed
according to customer-supplied specifications (usually), and then fabri-
cated and installed on bare chassis purchased from various chassis manu-
facturers. The engineering group is centrally located at the corporate office,
and the completed designs are forwarded to the plants electronically. The
company produced three basic body types to support the various needs
within this industry: stake bodies, which are normally used in applications
like landscaping businesses; dry freight bodies, used to haul such things
as furniture; and refrigerated bodies, used by businesses such as meat
companies and fruit haulers.
The company employed seventeen full-time engineers (degreed and
nondegreed), and the performance metric used to track engineering
performance was the amount of backlog hours in an engineering queue.
Because the engineering group was in an apparent state of chaos, which
I will describe shortly, I was asked to lead this group, but more specifically
to reduce the backlog of hours that had grown to an unmanageable level.
At the time this problem surfaced, I was the vice president of quality and
continuous improvement.
The order process is such that orders are received into the customer
service department and evaluated for completeness, and then a decision is
made as to whether they are for standard truck bodies. If they are standard
designs, then no engineering work is required: the order is simply entered
into the order entry system, and the truck body is constructed on a truck
chassis at whichever of the seven plants around the country is closest to
the point of delivery. Prior to the problem emerging, the normal cycle time
to receive the order, build the trucks, and ship them to the customer was
approximately three weeks, depending upon the size of the order. If the
order required engineering work, then it was forwarded to engineering and
placed into a queue, where it remained until an engineer was available to
work on it. Rather than following a first in, first out (FIFO) process, it
was not uncommon for an engineer to go into the order backlog and
personally select an order to complete. As you will see, this selection
process exacerbated the problem, because the order an engineer selected
was often simply the one that was easiest to complete. As a result, the
more difficult orders sat in the queue longer than the simple ones until a
customer started screaming for them.
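The effect of letting engineers pick the easiest orders can be illustrated with a small calculation. The sketch below assumes a single engineer, a fixed set of orders already in the queue, and hypothetical order sizes; it compares first in, first out with an easiest-first selection. The function name `completion_times` is illustrative.

```python
def completion_times(orders, easiest_first=False):
    """Hours until each order is finished when one engineer works through the queue."""
    queue = sorted(orders, key=lambda order: order[1]) if easiest_first else list(orders)
    clock, finished = 0, {}
    for order_id, hours in queue:
        clock += hours              # the engineer works each order start to finish
        finished[order_id] = clock
    return finished


# Hypothetical orders in arrival order: (order id, engineering hours).
orders = [("A", 8), ("B", 2), ("C", 3), ("D", 1), ("E", 2)]
fifo = completion_times(orders)
picked = completion_times(orders, easiest_first=True)
print(fifo["A"], picked["A"])                             # 8 16
print(sum(fifo.values()) / 5, sum(picked.values()) / 5)   # 12.2 6.6
```

Easiest-first lowers the average completion time (6.6 hours versus 12.2 here), which is why it feels productive, but the largest order now finishes last, at 16 hours instead of 8. That is the pattern described above: the difficult orders sat in the queue until a customer complained.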
The problem that had developed was that the actual engineering
“backlog,” as of May 2000, stood at approximately 1200 hours and was
growing. To put this number into perspective, since the typical order
requiring engineering work averaged approximately three hours to com-
plete, then you will understand the magnitude of this problem. At this
rate, there were somewhere around 400 separate orders in the engineer-
ing queue waiting to be processed through the engineering department,
and most orders were for multiple truck bodies. This backlog translated
into lead times to just complete the engineering design work was in excess
of forty days or eight weeks. Add another three to four weeks for con-
struction and shipment, and you’re looking at a total average cycle time
of eleven to twelve weeks to receive, design, and build the truck bodies.
This was assuming there were no mistakes in the engineering work, which
apparently happened quite frequently. If you were a salesman trying to
sell truck bodies, this amount of time would be an inconceivable barrier.
Because of this extended lead time, delivery dates were being missed, and
the company was rapidly losing market share to its competition. Revenues
were obviously declining rapidly as well.
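The arithmetic behind these lead-time figures can be checked with a short Python sketch. The function name is hypothetical, and the five-day working week used to turn forty days into eight weeks is an assumption, not something the case states.

```python
# Back-of-the-envelope check of the backlog figures quoted in the text.
# Assumption (not from the case): a five-day working week, so that
# "forty days" of engineering lead time corresponds to eight weeks.

def backlog_summary(backlog_hours, hours_per_order, eng_lead_days,
                    build_ship_weeks, workdays_per_week=5):
    """Return (orders in queue, engineering weeks, (min, max) total weeks)."""
    orders_in_queue = backlog_hours / hours_per_order
    eng_weeks = eng_lead_days / workdays_per_week
    total_weeks = tuple(eng_weeks + w for w in build_ship_weeks)
    return orders_in_queue, eng_weeks, total_weeks


orders, eng_weeks, total = backlog_summary(1200, 3, 40, (3, 4))
print(f"about {orders:.0f} orders waiting")                        # about 400
print(f"engineering lead time: {eng_weeks:.0f} weeks")              # 8 weeks
print(f"total cycle time: {total[0]:.0f} to {total[1]:.0f} weeks")  # 11 to 12
```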
One other concern I inherited was the declining morale within the
engineering group, due primarily to the excessive overtime the engineers
were putting in; that overtime was also the root cause of many of the
mistakes being made on the orders. Of course, when a mistake was made, the
order had to go back into this same queue, thus further lengthening the
lead time. Many of the engineers had apparently lost their motivation and
sense of self-worth. Needless to say, it was a complete mess in engineering.
As I mentioned, during the last week of May 2000, I was asked to assume
responsibility for this engineering group. A decision had been made to
replace the incumbent vice president of engineering, because of his inabil-
ity to lead his troops out of this crisis. In other words, the VP had not
reduced the backlog that he had inherited from the previous VP, and
clearly his group’s performance was negatively impacting the financial
well-being of the company. I was given the singular mission of reduc-
ing the backlog to a manageable level, with manageable being defined as
somewhere between 200 and 300 hours. My mandate was clear, but it was
also clear that we didn’t have much time to accomplish it.
124 • The Problem-Solving, Problem-Prevention, and Decision-Making Guide
FIGURE 19.1
Engineering backlog February 1999 to April 2000 (backlog hours by month).
As can be seen in Figure 19.1, the backlog hours decreased steadily from
June 1999 through October 1999, but then increased progressively from that
point on, with no signs of slowing down. So the questions you might ask are:
Why did the backlog hours decrease, and why did they increase again? As
always, the best source of answers to these types of questions is documented
evidence, such as reports or other written records, but in the absence of
documentation, we again turn to interviews with experienced employees (in
this case, engineers).
In this case, no documentation was available, so we were forced to interview
all of the experienced engineers from this time period, and fortunately this
proved productive. According to the engineers, the backlog decrease beginning
in June 1999 was the result of a large amount of mandatory overtime for all
engineers (twenty hours per week per engineer). Although the overtime drove
the backlog down, when the decision was made to end all overtime (engineers
at this company were paid for overtime, so it was costly), not only did the
backlog climb back to an unacceptable level, but morale in engineering also
became a problem, because the engineers were once again facing pressure and
uncertainty. At this point, I decided to look further back in history to
determine whether the problem had always been a problem.
I added three additional years’ worth of data to the run chart, so it now
contained data from January 1996 through May 2000. Based upon what you see
in this expanded run chart (Figure 19.2), what conclusions are you able to
draw? Between June 1996 and December 1998, does the engineering backlog
appear to be under control and “manageable”? And what about the question of
whether the problem has always been a problem? I’m sure you’ll agree with me
that it has not. As you will see, this realization was an extraordinarily
important discovery.
FIGURE 19.2
Engineering backlog January 1996 to February 2000.
If we ask when the problem started, we see that it began in earnest in
January 1999. So, simply by plotting the data on a run chart, we were able
to pinpoint when the engineering hours backlog actually became a problem
(remember that problems of this nature begin with a change).
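In this case the run chart was read by eye. For readers who want a more formal test for when a change occurred, one common option (our addition, not the method used in this case) is an individuals (XmR) chart: compute limits from a period when the process was stable and flag the first later point that falls above the upper limit. The sketch below assumes that approach; the function names are hypothetical and the monthly values are invented for illustration, not the book’s data.

```python
from statistics import mean

def xmr_upper_limit(baseline):
    """Upper natural process limit of an individuals (XmR) chart.

    Uses the conventional factor 2.66 (= 3 / d2, where d2 = 1.128 for
    moving ranges of two consecutive points)."""
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    return mean(baseline) + 2.66 * mean(moving_ranges)


def first_signal(values, baseline_len):
    """Index of the first point after the baseline that lies above the upper
    limit computed from the first `baseline_len` points, or None if none does."""
    ucl = xmr_upper_limit(values[:baseline_len])
    for i in range(baseline_len, len(values)):
        if values[i] > ucl:
            return i
    return None


# Hypothetical monthly backlog hours, for illustration only (not the case data).
months = ["Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar"]
hours = [300, 320, 290, 310, 305, 295, 420, 510, 600]

i = first_signal(hours, baseline_len=6)
print("first month above the limit:", months[i] if i is not None else "none")
```

Once the signal month is known, the next question is the same one the text asks: what changed at that point?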
Ask yourself this question: If I want to solve this problem, what must my
next question be? If you said, “What changed on or around January 1999?”
then you answered correctly. So how can we find out what changed in January
1999? Finding engineering documentation would be the best way, but
unfortunately many companies don’t keep good records, and this company was
no exception. So, once again, we had to rely on interviews with employees
who were present when the change occurred, and that is exactly what we did.
The time period between May 1996 and December 1998, when we did not have
the problem, is considered a defect-free configuration (DFC). You will recall
that a DFC is found by asking the question, “When would you expect to see
the problem, but don’t?” The presence of a DFC raises the question, what is
unique or distinct about the period when the problem exists compared with the
period when it doesn’t? Based upon this apparent DFC, our next course of
action was to determine the distinctions, or differences, between the two
periods of time.
There was little, if any, documentation available that could help us
understand what had changed or what the distinctions were, so we again
conducted one-on-one interviews with engineers and other support groups
who were present when this change had apparently taken place. The
interviews revealed the following:
1. Prior to January 1999, all incoming orders were received and pro-
cessed by a single group within engineering. The new engineering
VP, wanting to make some changes (and make a name for himself),
“restructured” engineering by creating three individual groups to
handle incoming orders:
Group 1: Stake body orders only
Because there were now three separate groups that had become “specialists,”
three distinct silos had been created within engineering, with no provision
or arrangement for communication between the groups. The result of creating
these “silos” was a significant backlog for dry freight bodies, which made
up the bulk of the backlog hours.
Figure 19.3 is a Pareto chart of the distribution of hours in the queue by
body type. It really doesn’t take a rocket scientist to see that if the
groups are staffed equally and over 60% of incoming orders are for straight
(or dry freight) bodies, then a backlog, or bottleneck, will develop for that
body type. It’s exactly the result one would expect from a constraint
operation in a manufacturing process.
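The constraint argument can be made concrete with a simple utilization calculation. In this sketch, only the equal staffing and the 60% share of dry freight orders come from the case; total demand, total capacity, and the split of the remaining 40% are hypothetical. A group whose utilization exceeds 1.0 can never catch up, while a single pooled group handling the same total demand can.

```python
def utilizations(total_demand, total_capacity, demand_shares, n_groups):
    """Utilization (demand / capacity) of each of n equally staffed groups.

    A utilization above 1.0 means work arrives faster than the group can
    complete it, so that group's queue keeps growing."""
    capacity_per_group = total_capacity / n_groups
    return [total_demand * share / capacity_per_group for share in demand_shares]


# Hypothetical weekly figures: demand is 90% of total engineering capacity.
demand, capacity = 90.0, 100.0     # engineering hours per week (assumed)
shares = [0.60, 0.25, 0.15]        # dry freight, refrigerated, stake (assumed split)

split = utilizations(demand, capacity, shares, n_groups=3)
pooled = demand / capacity         # one group handling all incoming orders

for name, u in zip(["dry freight", "refrigerated", "stake"], split):
    print(f"{name:12s} utilization {u:.2f}")
print(f"{'pooled':12s} utilization {pooled:.2f}")
```

Pooling helps because any engineer can take the next order, so capacity follows the demand instead of sitting idle in the less heavily loaded groups.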
So, faced with this problem and not much time available to fix it, what
would your solution be? Would you try to relive the past, in the spirit of
“If it ain’t broke, don’t fix it”? That is exactly what we did! We simply
went back in time and recreated the system that had worked so well before.
We did several things immediately:
FIGURE 19.3
Pareto chart of percent hours in queue by body type (straight, refrigerated, and stake body).
So, let’s summarize what has happened thus far in our search for the root
cause of this problem.
1. Our first step was to define the problem: the engineering hours backlog
was excessive. In fact, we defined the current state as in excess of 1200
hours, and for this problem to be considered resolved, the backlog had to
be at a manageable level of between 200 and 300 hours. We had also
identified a secondary problem of low engineering morale.
2. The second thing we did was to begin collecting facts from existing
data and documentation in the form of a run chart so we could see
some history. The run chart clearly demonstrated when the problem
began. This was important to us, because once we defined when the
problem began, we could begin to look for changes that had occurred
that could have produced the effect (i.e., excessive engineering queue
hours). We said that if we could determine what had changed, then
we could probably reverse the changes and correct the problem.
3. We then determined the changes that had taken place by interviewing
engineers and support groups who were present when the engineering hours
began to become excessive. We would have liked to confirm this with
documentation, but in its absence, conversations and recollections proved
helpful. We determined that a significant change in the organizational
structure of the engineering group had occurred just prior to the onset of
the problem. We asked ourselves the question, “Could the change that
occurred have created the effect we are seeing?” and the answer was yes. The
next step was to change the organization back to its original structure
(i.e., one single group within engineering to handle all incoming orders)
and to begin measuring the response to the change with a run chart and a
daily target line, so that engineers could see their progress.
The next step was to measure the progress, or response, to this change.
Figure 19.4 is a run chart that includes a total-hours target line, the
total hours backlog, daily past due, and the daily hours backlog. As you can
see, the results were pretty astounding. They came immediately: the backlog
decreased from 1200 hours to less than 200 hours in eleven weeks and
remained within the acceptable limits. Figure 19.5 is an extended plot of
hours through the end of September 2000, and it is clear that stability was
regained and engineering hours were, once again, manageable. The most
significant result, however, was the reduction in order-processing lead time
from forty days to an astonishing forty-eight hours. What had been a problem
for the company was now a differentiator in the marketplace that stimulated
sales rather than acting as a barrier. All of this came from a run chart, a
Pareto chart, and a few simple questions.
FIGURE 19.4
Engineering backlog by day (hours, June 1 to August 14, 2000).
FIGURE 19.5
Engineering backlog by day (hours, June 1 to September 22, 2000).
FIGURE 19.6
Engineering backlog by month (January 1996 to June 2000).
One final point regarding the success of this engineering problem-solving
experience. Figure 19.6 is a plot of hours received, hours completed, and
backlog hours, and one thing in it stands out as especially significant.
Remember that I mentioned this success had actually stimulated sales? If you
look at Figure 19.6 closely, you will notice that at no time before this
problem was resolved were the hours received and completed anywhere near the
volumes observed after it was corrected. As a matter of fact, the average
hours received and processed out of engineering were just over 300 hours
when the engineering process was stable (i.e., between June 1996 and December
1998), but after the problem was corrected and sales were stimulated, this
number at one point reached approximately 800 hours. The group not only
managed these additional hours but absorbed and completed them, with no
overtime and no additional engineers. Not only was the process “fixed,” it
was improved. The improvement methodology that was utilized is another
subject completely, as tools like value analysis and
1. Problem definition
2. Problem description
3. Determination of changes relative to when the problem started (run
chart)
4. Discovery of defect-free configurations and the distinctions between
when we had the problem compared to when we didn’t (run chart
and Pareto chart)
5. Collection of key information to generate possible causes (inter-
views, causal chain)
6. Testing for most probable cause
7. Verification of the true root cause
8. Implementing the solution (reversing the change that created the
problem)
9. Implementing a control (run chart with target line and daily feedback)
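Step 9, the control, amounts to comparing each day’s backlog with a target line and making the result visible to the engineers. A minimal sketch, assuming a 300-hour target (the top of the 200 to 300 hour “manageable” range) and hypothetical daily readings; the function name is illustrative.

```python
# Hypothetical sketch of the daily control check: compare each day's
# backlog with a target line and report the gap.
# Assumption: the target is 300 h, the top of the "manageable" range.

def daily_feedback(readings, target=300.0):
    """Yield (day, hours, message) for each (day, hours) reading."""
    for day, hours in readings:
        gap = hours - target
        if gap > 0:
            message = f"{gap:.0f} h over target"
        else:
            message = "at or under target"
        yield day, hours, message


# Illustrative readings only, not data from the case.
for day, hours, msg in daily_feedback([("6/1", 1150), ("7/3", 640), ("8/14", 190)]):
    print(f"{day:>5}  {hours:6.0f} h  {msg}")
```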
John Guinther
20.1 CASE BACKGROUND
Improvement, in any endeavor, requires a detailed knowledge of the cur-
rent situation and a team that understands the intricacies of the process.
That is to say, if improvement is to occur, then adequate data on problems
that exist must be collected so that problems can be defined and the cor-
rect priority established. This data must include things like information
on where and what the problems are, the severity and frequency of the
problems, and the impact of the problems if resolution is achieved. This
case involved a subsupplier of pinions to a major European auto manu-
facturer. Pinions are used in things like turn signals and headlight levers.
The pinions in question had five individual diameters along the surface of
the shaft that were repeatedly wandering outside the customer specification
limits and had to be either reworked or scrapped. The company’s major
customer was not pleased with the quality of the pinions or the on-time
delivery rate, and was threatening to move to another supplier. My job
was to help the auto company understand the nature of this problem, and
develop and implement an immediate and lasting solution. As usual, with
problems of this nature there was an intense sense of urgency.
When I arrived at the facility in France, the only data collection in place
was limited to general information regarding scrap and downtime. The
scrap and downtime being collected included the total amount by shift
and date, but not any specific information as to why it had occurred.
The only exception to this was the existence of finished product inspection
data, which included both the amount of scrap and the general reason it
was scrapped. For example, the scrap data might state that it was scrapped
for diameter, but not which diameter. There was nothing in place to dem-
onstrate the specific causes for scrap or downtime at each of the individual
process steps. Without this type of specific information, it would be nearly
impossible to define, prioritize, and ultimately resolve the problems with
scrap and downtime.
1000 parts per day were being sent outside the facility to an independent
subcontractor for grinding. This turned out to be a very expensive under-
taking, so timely resolution of this problem was critical.
TABLE 20.1
Diameter Tolerances
Diameter   Lower Limit of Tolerance (mm)   Upper Limit of Tolerance (mm)
1          –0.005                          +0.000
2          –0.000                          +0.010
3          –0.005                          +0.000
4          +0.002                          +0.011
5          –0.008                          +0.000
I surmised that the two diameters accounting for the majority of the scrap
would most likely be the two locations (diameters 1 and 3) with a tolerance
band only 0.005 mm (5 microns) wide. The data collection confirmed this
assumption.
Figure 20.1 is a summary of scrap information by cause on grinder 2. (Note:
I have left the causes of scrap in French to add an element of realism.) As
you can see in Figure 20.1, the number one cause of scrap on grinder 2
between October 7 and 11, 2003, was ø Non-Retouche, which translates from
French as diameters that were not repairable because they were too small.
The second leading cause of scrap, ø Retouche, means that the pinion was
repairable, but the repair had failed. Again, this scrap was a diameter
problem, as was the third cause, Reglage (adjustment). Of the 252 scraps
that occurred during this time period, 242 (96%) were caused by the
grinder’s failure to meet diameters 1 and 3, just as I had predicted.
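A Pareto analysis like Figure 20.1 is just a sorted tally with cumulative percentages. In the sketch below, the per-cause counts are illustrative: they were chosen to agree with the 252 total and the 242 diameter-related scraps in the text, but the split among causes is an assumption, not recorded data.

```python
# Illustrative per-cause counts (assumed split; totals match the text).
scrap_counts = {
    "ø Non-Retouche": 218,
    "ø Retouche": 19,
    "Reglage": 5,
    "RZ NC": 5,
    "Divers": 5,
}
DIAMETER_CAUSES = {"ø Non-Retouche", "ø Retouche", "Reglage"}


def pareto(counts):
    """Return (cause, count, percent, cumulative percent), largest first."""
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        pct = 100.0 * n / total
        cumulative += pct
        rows.append((cause, n, round(pct, 1), round(cumulative, 1)))
    return rows


for row in pareto(scrap_counts):
    print(row)

diameter_share = (sum(scrap_counts[c] for c in DIAMETER_CAUSES)
                  / sum(scrap_counts.values()))
print(f"diameter-related share of scrap: {diameter_share:.0%}")  # 242 / 252 -> 96%
```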
With this information in hand, it was clear that we needed to look at the
capability of this machine to hold the 5-micron tolerance. During the same
time frame, grinder 1 was not producing parts because of another technical
problem, so we focused on grinder 2, assuming that the same problems existed
on grinder 1 and that our findings from grinder 2 would carry over to
grinder 1.
The most recent process capability study on the grinders had been performed
three months before my arrival, and the results indicated a total lack of
process capability. The purpose of a capability study is simply to
ascertain how well a process will produce parts that conform to
specification. Two of the calculations frequently used are Pp and Ppk. In
simplistic terms, Pp tells us how well the process could produce parts if
it were perfectly centered, while Ppk is a measure of how well the process
is actually performing. A value of 1.0 means that the natural spread of the
process (the mean plus or minus three standard deviations, which covers
about 99.73% of the parts) just fits the tolerance band: for Pp, that spread
equals the width of the band, and for Ppk, the end of the distribution
nearer to a limit coincides exactly with that limit. So, any value less than
1.0 indicates that the process will produce some parts that are
unacceptable, and a value greater than 1.0 signifies that virtually all
measurements on the parts will fall within the upper and lower limits of the
specification. If the values for Pp and Ppk are significantly below 1.0,
then it is safe to say that many parts will be defective.
FIGURE 20.1
Pareto chart of percent of scrap by cause for grinder 2 (causes: ø Non-Retouche, ø Retouche, Reglage, RZ NC, Divers).
The Case of the Defective Pinions • 137
The minimum target for both of these capability indices is 1.33, so that if
the process shifts naturally or if adjustments are purposely made to center
the process, then the process will continue to produce acceptable parts.
It is also assumed that the process is free of special cause variation and,
therefore, contains only natural variation.
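The Pp and Ppk definitions described above can be written out directly. This sketch uses the overall sample standard deviation, which is what distinguishes Pp/Ppk (performance) from Cp/Cpk (capability); the latter are usually computed from a within-subgroup, short-term estimate of sigma. The function name is hypothetical, and the readings are invented values for a ø21.02 diameter with a +0.000/-0.005 mm tolerance.

```python
from statistics import mean, stdev

def pp_ppk(measurements, lsl, usl):
    """Performance indices from individual measurements.

    Pp  = (USL - LSL) / (6 * s)
    Ppk = min(USL - mean, mean - LSL) / (3 * s)
    where s is the overall sample standard deviation."""
    m, s = mean(measurements), stdev(measurements)
    pp = (usl - lsl) / (6 * s)
    ppk = min(usl - m, m - lsl) / (3 * s)
    return pp, ppk


# Hypothetical ø21.02 readings in mm (tolerance +0.000 / -0.005).
readings = [21.0170, 21.0180, 21.0165, 21.0175, 21.0185, 21.0172, 21.0178, 21.0168]
pp, ppk = pp_ppk(readings, lsl=21.015, usl=21.020)
print(f"Pp = {pp:.2f}, Ppk = {ppk:.2f}")
```

Ppk can never exceed Pp; the two are equal only when the process mean sits exactly at the center of the tolerance.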
Table 20.2 summarizes the capability indices for all five of the diameters
measured on the two grinders. Because the capability indices on grinder 2
are significantly better than those on grinder 1, we would expect to see
more diameter scrap on grinder 1 than on grinder 2, and that was precisely
the case. Remember that the scrap levels on grinder 1 were approximately
five percent, compared to three percent on grinder 2. However, based upon
the Pp and Ppk values on grinder 1, one might also expect to see diameter
scrap in locations other than ø21.02. Since this was not occurring, I
questioned whether the capability studies were valid, or whether
improvements had been made since the studies were run. It was my opinion
that the capability values were closer to 1.0 in the case of diameters 3 and
5, and probably greater than 1.0 on the other diameters.
TABLE 20.2
Diameter Tolerances versus Pp and Ppk
Diameter   Nominal (mm)   Tolerance Limits (mm)   Pp     Ppk
Grinder 2
1          ø22.00         +0.002 / +0.011         1.19   1.11
2          ø12.00         +0.000 / +0.008         1.08   1.05
3          ø21.02         +0.000 / –0.005         0.71   0.49
4          ø22.315        +0.005 / –0.005         1.65   0.70
5          ø21.02         +0.000 / –0.005         0.70   0.56
Grinder 1
1          ø22.00         +0.002 / +0.011         0.71   0.72
2          ø12.00         +0.000 / +0.008         0.67   0.63
3          ø21.02         +0.000 / –0.005         0.53   0.48
4          ø22.315        +0.005 / –0.005         0.88   0.39
5          ø21.02         +0.000 / –0.005         0.57   0.42
When capability indices are close to 1.0, it is not uncommon to see more
scrap caused by overadjusting the process, especially if the decision to
adjust is based upon single data points. I surmised that in the normal
operation of the grinders, if the grinder operator had a single scrap for a
diameter that was too large, then an adjustment was probably made to reduce
the diameter. When the adjustment was made, the population of the data
shifted to a smaller diameter, which produced scrap in the opposite
direction (i.e., diameters too small). I observed several operators on
grinder 2 and found this to be the case. Each time a scrap occurred in
either direction, the grinder was adjusted in the opposite direction for the
ø21.02 diameter, and, predictably, more scrap was created.
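The overadjustment effect is easy to reproduce in a small simulation, in the spirit of Deming’s funnel experiment. This is a simplified model under stated assumptions, not a model of the actual grinders: each part’s diameter is the machine setting plus normal noise, and the single-point rule moves the setting by the full deviation of any scrapped part from target. The limits match the ø21.02 tolerance; the target, the sigma (chosen so that Cpk is about 0.85), and the function name are hypothetical.

```python
import random

def scrap_rate(n_parts, sigma, lsl, usl, target, adjust_on_scrap, seed=2003):
    """Fraction of parts outside [lsl, usl] for a simulated grinder.

    Each part's diameter is the current machine setting plus normally
    distributed noise. If adjust_on_scrap is True, every out-of-tolerance
    part triggers a correction equal to that part's full deviation from
    target (the single-point reaction described in the text)."""
    rng = random.Random(seed)
    setting = target
    scrapped = 0
    for _ in range(n_parts):
        x = rng.gauss(setting, sigma)
        if x < lsl or x > usl:
            scrapped += 1
            if adjust_on_scrap:
                setting -= x - target   # "correct" based on one noisy reading
    return scrapped / n_parts


# Hypothetical centered ø21.02 process; Cpk = 0.0025 / (3 * 0.00098), about 0.85.
LSL, USL, TARGET, SIGMA = 21.015, 21.020, 21.0175, 0.00098
hands_off = scrap_rate(200_000, SIGMA, LSL, USL, TARGET, adjust_on_scrap=False)
tampering = scrap_rate(200_000, SIGMA, LSL, USL, TARGET, adjust_on_scrap=True)
print(f"no adjustment:      {hands_off:.2%}")
print(f"adjust after scrap: {tampering:.2%}")
```

Because each correction is based on a single noisy reading, it pushes a centered process off center, which makes the next scrap more likely; that is the mechanism the operators were unknowingly feeding.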
One way to improve this situation was to base all adjustment decisions on
sample averages rather than on single-point measurements, since the sample
average is an estimate of the population average. Between October 11 and
October 16, a study was run in which the operators were instructed not to
make an adjustment when they had a single scrap, but rather to let the
grinders run normally, collect data on the next four pinions, and then
calculate the sample average. If the average of these five measurements (the
scrapped pinion plus the next four) was outside the specification limits,
then the operators were to make their normal adjustment. (Note: Since each
individual pinion was automatically measured, I knew that the grinder would
alert the operator to each out-of-tolerance pinion diameter, so there was no
worry about defective material reaching the customer.) Prior to the study,
we reset the grinder to produce diameters as close to the center of the
specification as possible. The rationale behind this move was that if we
wanted to simulate the best possible running condition, then centering the
process would produce the greatest number of good parts.
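The reaction rule used in the study can be expressed as a small function. The function name and return strings are hypothetical; the logic follows the text: wait for the scrap plus the next four pinions, and adjust only if their average is outside the limits. The rule works because the standard deviation of an average of n parts is σ/√n, so a five-part average varies less than half as much as a single reading.

```python
from math import sqrt
from statistics import mean

def reaction(readings, lsl, usl, n=5):
    """Reaction rule from the study: after a scrap, collect n readings
    (the scrap plus the next n - 1 pinions) and adjust only if their
    average falls outside the specification limits."""
    if len(readings) < n:
        return "keep running"        # not enough data yet; no adjustment
    avg = mean(readings[:n])
    if avg > usl:
        return "adjust down"         # average diameter too large
    if avg < lsl:
        return "adjust up"           # average diameter too small
    return "no adjustment"           # the single scrap was just noise


# The average of n parts varies sqrt(n) times less than a single part.
print(f"sigma of a 5-part average = {1 / sqrt(5):.2f} x sigma of one part")  # 0.45

# Hypothetical ø21.02 readings (mm): one oversize pinion, then four good ones.
print(reaction([21.0202, 21.0178, 21.0174, 21.0180, 21.0176], 21.015, 21.020))
```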
Figure 20.2 is the run chart of this data that depicts the percent scrap
before, during, and after this study. Prior to running this study, for the
previous 8243 pinions there were a total of 242 scraps for diameter or
2.94% of production. During the study, for the 10,524 pinions produced,
there were a total of 86 scraps or 0.82% of production. Immediately after
the study, on the 8683 pinions produced there were a total of 373 scraps for
diameter or 4.3% of production.
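The scrap percentages for the three periods can be checked directly, and a two-proportion z-test (our addition, not part of the original analysis) gives a rough sense of whether the drop during the study is larger than chance alone would explain. The test assumes the pinions are independent, which is only approximately true for consecutive parts from one grinder; the function names are illustrative.

```python
from math import sqrt

def scrap_percent(scraps, produced):
    """Scrap as a percentage of production."""
    return 100.0 * scraps / produced


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for p1 = x1/n1 versus p2 = x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


periods = {"before": (242, 8243), "during": (86, 10524), "after": (373, 8683)}
for name, (scraps, produced) in periods.items():
    print(f"{name:>6}: {scrap_percent(scraps, produced):.2f}% scrapped for diameter")

z = two_proportion_z(242, 8243, 86, 10524)
print(f"before vs. during: z = {z:.1f}")
```

A z statistic of roughly 11 is far beyond any conventional significance threshold, so the improvement during the study is very unlikely to be random noise.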
Based upon the results of this study, it was apparent that the company,
without spending any money, could improve its scrap levels and outgoing
quality simply by applying some very basic laws of process control and, more
specifically, by changing the rules of action for when and when not to make
machine adjustments. This did not eliminate the need to improve the
capability of the grinders, but an immediate benefit could be realized
simply by changing the rules of action.
FIGURE 20.2
Percent of production scrapped for diameter on grinder 1 (chart title: “Grinder #1 study, without adjustment after scraps”; y-axis: % scrap; x-axis: date, October 2003).
This study clearly demonstrated how overadjusting a process actually causes
it to deteriorate and produce defective parts. If the process Cpk (or Ppk)
were equal to 1.0 and the process mean were identical to the center of the
specification, the laws of the normal distribution tell us that we would
expect about 3 pinions out of 1000 (0.27%) to fall outside the tolerance
limits. Since this process was producing parts outside the tolerance at a
rate of approximately 0.8%, we would estimate the Cpk to be somewhere between
0.8 and 0.9, with the process relatively centered. For example, if the
process average was 21.0174 and the standard deviation was approximately
0.0009, then the short-term Cpk would be about 0.85 and approximately 0.79%
of the parts produced would fall outside the limits for this diameter.
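The normal-distribution reasoning in this paragraph can be checked with Python’s standard library (`statistics.NormalDist`). The limits are those of the ø21.02 diameter (21.015 to 21.020 mm), and the function names are illustrative. Plugging in exactly σ = 0.0009 gives a Cpk of about 0.89 and roughly 0.6% outside the limits; a σ of about 0.00094, which rounds to 0.0009, gives a Cpk of about 0.85 and roughly 0.8%, in line with the figures quoted. Both cases assume the diameters are normally distributed.

```python
from statistics import NormalDist

def cpk(mean, sigma, lsl, usl):
    """Distance from the mean to the nearer limit, in units of 3 sigma."""
    return min(usl - mean, mean - lsl) / (3 * sigma)


def fraction_outside(mean, sigma, lsl, usl):
    """Expected fraction of a normally distributed output outside [lsl, usl]."""
    dist = NormalDist(mean, sigma)
    return dist.cdf(lsl) + (1.0 - dist.cdf(usl))


# Centered process with Cpk = 1.0: limits at the mean +/- 3 sigma.
print(f"Cpk = 1.0, centered: {fraction_outside(0.0, 1.0, -3.0, 3.0):.2%} outside")  # 0.27%

# The ø21.02 diameter: limits 21.015 to 21.020 mm, process mean 21.0174 mm.
LSL, USL, MEAN = 21.015, 21.020, 21.0174
for sigma in (0.0009, 0.00094):
    print(f"sigma = {sigma}: Cpk = {cpk(MEAN, sigma, LSL, USL):.2f}, "
          f"{fraction_outside(MEAN, sigma, LSL, USL):.2%} outside")
```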
The real problem on these two grinders was the excessive amount of
variation within the process with respect to the tolerance width, and in
this case, most of the excessive variation was being caused by an operator
or supervisor overadjusting the process. One of the basic rules of SPC is
never to adjust a process on the basis of a single measurement, but rather
on the average of several parts. If the calculated sample average is
unacceptable, then the process should clearly be
adjusted. Conversely, if the sample average is acceptable, then the process
should be permitted to run without adjustment.
On the basis of this analysis, it was apparent that a problem-solving team
needed to be formed to attack this problem head on. The team consisted of
the grinder 2 operator, the area supervisor, the quality manager, the plant
manager, an engineer, and me. The results of this study and of the current
situation were presented to the team, and then a root cause analysis was
completed. The team used the problem analysis flowchart (PAF) (Figure 20.3),
a tool that I had developed and translated into French. (Note: For details
on how to use a PAF chart, see Chapter 6 in my first book, Process Problem
Solving: A Guide for Maintenance and Operations Teams [2].)
The first step in any meaningful problem-solving event is the develop-
ment of the problem statement. In this case, after hearing a description of
the problem, the team concluded that the problem was excessive variation
in the diameter of the pinions run on grinders 1 and 2. The team then filled
out the remaining steps of the PAF chart (see Figure 20.3).
FIGURE 20.3
Problem analysis flowchart (labels in French; fields for process, date, potential causes, and causal chains).
As with the Pareto chart presented earlier, for realism the PAF chart is
presented in French, just as the team developed it. The specifics of what
actually occurred in this case study aren’t nearly as important as the
process of following a structured approach to problem solving. The customary
questions of what, where, and when, as well as the scope of the problem, were
answered, and a problem statement was developed. The team investigated and
listed symptoms of the problem and any relevant data it believed was
important. The team also listed any known changes that could have affected
the grinders or contributed to the problem with diameters. Next, the team
identified two defect-free configurations (two other types of pinions with
more forgiving diameter tolerances) and the distinctions between where the
problem existed and where it didn’t.
The team was now ready to create a causal chain. The causal chain por-
trays the potential failure modes and concerns of the team members, as
explained in Chapter 4, and then answers the question why until arriving
at a potential root cause. For example, the operator and supervisor, after
seeing the scrap data and basic SPC theory I had presented, believed that
the operators were making diameter adjustments too frequently. When
the question why was asked, it was clear that operator instructions on
when to make adjustments were not defined. That was something that could
be acted upon, and it was a potential root cause of the problem.
Continuing down the causal chain, other potential root causes were
defined and corrective actions were developed. For example, since the
pinions were made of steel, there was a concern that if the machine was
permitted to be down for extended periods of time, then the diameters on the
parts could be affected by temperature changes. One of the reasons for the
grinders being down was that the operators had to shut down the grinder
three times every two hours and measure the surface state of the pinions in
the lab. To do this, the operator had to leave his or her machine, take a
sample to the test lab, and then measure the surface profile personally.
This took the grinder operator approximately three minutes, but it could
easily have been done by quality control workers. This procedure was
changed, thus eliminating the need for the machine to be shut down. In
addition to the potential quality improvement, there was also a gain in
production of nine pieces every two hours, since the machine could now
continue to run instead of being shut down during the measurement. Over the
course of a full production day, the increase in pinion production would be
over 100 pinions on each grinder, or over 200 additional pinions per day
without adding labor.
142 • The Problem-Solving, Problem-Prevention, and Decision-Making Guide
Another potential cause for excessive variation was the device used to
automatically measure diameters during the grinding cycle (i.e., a poka
yoke–type device). The device had been mounted on the ø 12 section of the
pinion instead of on the diameters producing the most scrap. The tolerance
for this diameter had a range of 0.008 mm, compared to 0.005 mm on the
problem diameters. Because of this, it was suggested that the team meet
with the supplier of the measurement device to determine if the measure-
ment could be made on one of the 21.02 mm diameters.
Although the grinder problem-solving team was still in its early stages of
implementing the recommended actions, significant improvements had
already been observed. Figure 20.4 is a run chart of percent scrap due to diame-
ters from October 26 until November 1, 2003. The new scrap reaction technique
employed on both grinders is depicted on the chart, and since implementing
this new technique, the daily average percent scrap for diameter problems had
decreased from 3.1% to 0.73% and appears to be stable at this level.
FIGURE 20.4
Grinder 2 diameter scrap by day (% scrap by date, late October to November 1, 2003).
Grinder 1 displayed an even greater percentage improvement, as depicted in
Figure 20.5. The reduction on grinder 1 was much more dramatic than on
grinder 2 because its initial scrap levels were much higher. The average
scrap level on this grinder had been reduced from nearly 6% to less than 1%
and appears to have stabilized as well. The level of scrap on grinder 1
since implementing the new scrap reaction technique is even better than
Figure 20.5 suggests: on October 31, the day-shift shop manager, in an
effort to improve throughput, changed the cycle time, and twenty scraps
immediately resulted. Without this change, the percent of scrap on grinders
1 and 2 would have been nearly identical.
FIGURE 20.5
Grinder 1 percent of scrap for diameter (% scrap by date, October to November 1, 2003; the start of the new scrap reaction technique is marked on the chart).
As of October 31, 2003, the level of scrap for diameters had been reduced
on grinder 2 from 3.1 percent to 0.73 percent and on grinder 1 from 5.1
percent to 0.82 percent, with zero euros spent. In terms of the number of
pinions saved per day by these actions, the combined number of acceptable
pieces had increased, on average, by over 500 pinions per day!
At the same time the scrap and rework problem existed, another prob-
lem on these same two grinders was unplanned downtime that had been
out of control for some time. In addition to the scrap data, data collection
for downtime by cause was also implemented on these two machines. We were
able to demonstrate that over seventy percent of the
downtime had been caused by the same diameter problems. Over a period
of three weeks, grinder 2 had experienced downtime for diameter prob-
lems, totaling approximately 29 hours while on grinder 1 the total was
approximately 22 hours. Since the average cycle times on grinders 2 and 1
were 33 and 30 seconds respectively, unplanned downtime for this cause
alone had resulted in the loss of approximately 3163 pinions on grinder 2
in 20 days (158/day) and 2640 pinions on grinder 1 in 26 days (103/day).
If the downtime for diameter adjustments on these two grinders were
reduced by only seventy percent, the potential increase in daily output
would be approximately 183 pinions per day. Therefore, by improving the
two grinders’ ability to produce pinions within the specifications, the total
potential throughput increase from scrap and downtime reduction was
approximately 260 pinions per day, or nearly 1600 pinions per week, and
this team had other improvement actions yet to be implemented. If anyone
ever has reservations about whether a correlation between quality and
production throughput exists, this should dispel the doubt.
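The downtime arithmetic above can be checked with a short sketch. It assumes, as the numbers in the text imply, that each machine cycle yields exactly one pinion; the function names are illustrative only, and the per-day losses (158 and 103) are taken directly from the text rather than recomputed.

```python
from typing import Sequence

SECONDS_PER_HOUR = 3600


def pinions_lost(downtime_hours: float, cycle_time_s: float) -> float:
    """Pieces that could have been ground during the downtime.

    Assumes one pinion per machine cycle.
    """
    return downtime_hours * SECONDS_PER_HOUR / cycle_time_s


def recovered_per_day(daily_losses: Sequence[float], reduction: float) -> float:
    """Extra daily output if downtime falls by `reduction` (a fraction from 0 to 1)."""
    return reduction * sum(daily_losses)


grinder2_lost = pinions_lost(29, 33)  # 29 hours of downtime, 33-second cycle
grinder1_lost = pinions_lost(22, 30)  # 22 hours of downtime, 30-second cycle
# Per-day losses as quoted in the text for grinders 2 and 1
extra = recovered_per_day([158, 103], 0.70)

print(f"Grinder 2 lost about {int(grinder2_lost)} pinions")
print(f"Grinder 1 lost about {int(grinder1_lost)} pinions")
print(f"Potential daily gain at a 70% reduction: {extra:.0f} pinions")
```

Keeping the conversion (hours to seconds, divided by cycle time) in its own function makes it easy to rerun the estimate for other machines or other downtime causes.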
I followed up on this supplier several months later and learned that the
team had continued their improvement efforts and the results continued
to demonstrate improved quality and reduced unplanned downtime. They
had met with the supplier of the measurement device and were able to
move it to the smaller diameter on the pinion and achieved even more
144 • The Problem-Solving, Problem-Prevention, and Decision-Making Guide
The most complicated problems will arise at the most remote locations.
Joe Cooch
FIGURE 21.1
Problem solving roadmap.
The Case of the Cracking Rails • 147
FIGURE 21.2
Plant B angle rail cracking by month/year. [Chart: number of units cracked (0 to 10) by month/year, 1990 to 2004; annotation: last crack manufactured November 2001.]
FIGURE 21.3
Plant A mounting angle rail cracks by month/year. [Chart: number of units cracked (0 to 8) by month/year, 1996 to 2005; annotation: last crack manufactured February 2003.]
From these two charts, it appeared as though the cracking problem had
stopped in plant B, but not yet in plant A. The team continued its
investigation and determined that all of the most recent plant A failures had
been detected only on mild steel rails and none on stainless steel rails. This
was an enormously important discovery, because it would allow the team
to focus its problem-solving efforts on mild steel rails in this plant only.
That is, if the team was going to unravel this problem on the mild steel
tankers in plant A, then it needed to uncover what had changed in plant B
to correct the same problem on its mild steel tanker wagons. The team
would also need to search for and locate the design and process
distinctions between plant A and plant B that were related to mild steel
rails. Finally, it would be essential that both plants work together to
determine what change or changes had been made to correct the cracking
problem on stainless steel tankers at both locations.
As a point of distinction, remember in Chapter 2 when I described a
change-related problem and told you that if you could determine what
had changed, you could certainly determine the root cause? Remember that
the inverse is true as well. That is, if a change was made that corrected the
problem, then you can use the same tools and techniques to determine
when and what change had been made. This is just as significant as finding
the root cause of a problem. If this team was not able to isolate and
pinpoint the change that had “fixed” the problem, then the problem would
almost certainly return. So, remember this as we go forward with this case
study.
FIGURE 21.4
Cracking example.
The next action for the team was to review all of the available photos,
warranty claims, and other pertinent information so that a profile of the
problem could be developed (i.e., what the problem was, where it was
happening, and when it was happening). In doing so, the team determined
that all of the cracks (i.e., the what of the problem) had occurred on the
front of the mounting rail along the weld seam (i.e., the where of the
problem). Figure 21.4 is a representative photo of the cracking that the team
had observed. The most common crack had occurred at the mounting rail
weld and continued to propagate along various lengths of the weld. The
team now had the information it needed to develop its problem statement
as follows:
The existence of DFCs told the team that there were significant
differences or distinctions between where or when you have the problem
and where or when you don’t. By finding the distinctions, the team
could determine the root cause, so it was imperative that the team
determine all of the distinctions that existed.
Although these differences existed between the two plants, the team was
now challenged to determine if, in fact, these differences could explain
the performance differences between the two facilities on mild steel rails.
FIGURE 21.5
Causal chain for mounting rail cracking. [Recoverable labels: undercuts present; weld rollover present; welding point design conducive to high point stress; action items.]
The team was challenged to understand why the things we know and
believe to be true are in fact true, and then to develop theories as to why the
cracking was occurring. The team brainstormed and concluded the following:
21.10 HYPOTHESIZE/TEST FOR POTENTIAL ROOT CAUSES
The team brainstormed and developed theories for the cracking problem
and, where appropriate, developed additional tests to either validate or
invalidate the theories. For each of the things the team knew to be true
All three of these findings point to the original design not being robust
enough. In addition to the preceding findings, the team also determined
that weld undercuts, rollover welds, and off-centering of the undercarriage
create areas of high stress concentration, which, when combined with the
original design, probably played a role in the cracking problem.
Bernard Baruch
22.1 BACKGROUND INFORMATION
In this chapter we will discuss a case study that demonstrates how the use
of a structured problem-solving process not only solved two chronic
quality problems but also significantly improved throughput within the same
department. It is important to understand that, as we solve a problem,
there are often positive side benefits that we may not have anticipated.
The company we will be studying is a manufacturer of stainless steel
pressurized vessels used to hold a variety of liquids and gases. Sheets of
stainless steel are rolled into tubes that are then welded together before
structural rings are mounted and welded to the exterior of these large
stainless steel tubes to provide the needed strength to counteract the
applied pressures of the liquids and gases that are held and transported
inside the tanks.
Two common problems encountered when welding products of this
nature are pinholes and weld spatter. Weld spatter occurs when the weld-
ing arc virtually explodes and coats the surrounding area with bits of
welding wire that must be ground off for aesthetic purposes. Pinholes are
FIGURE 22.1
Problem Solving Roadmap.
problem exactly the same. In order to do this, the following questions had
to be answered:
1. What? What is the specific object with the performance problem, and
what specifically on the object is considered to be the fault, defect, or
performance problem? In this case study, the specific object was the
stainless steel tanks, and the faults were pinholes and weld spatter,
especially in the area of the external structural rings.
2. Where? Where is the object with the problem and where physically
on the object is the defect or fault located? In this case study, the
physical location was the company’s tank finishing area, and the
location on the tank was in close proximity to the external structural
support rings.
3. When? When is the performance problem observed? That is, when,
from a time perspective, and when in the life cycle of the object is
the defect or fault seen? In our case study, the faults or defects were
observed during inspection in the finishing area, after the external
support rings had been welded. It is important to note that since no
historical data was available, the team could not be certain how long the
problems with weld spatter and pinholes had actually existed. This
fact is important, because the team could not associate the problem
with a change; it would, therefore, be considered a launch-type
problem, or a problem that had always existed to some degree.
4. Scope? What is the scope of the performance problem? That is, how
many objects have the defect, and how much of each object is affected
by the defect? In our case study, 100% of the tanks produced had weld
spatter, pinholes, or both, primarily close to the structural rings.
5. Trend? What is the current rate of the performance problem and is
the problem spreading to the remaining parts of the object? Is the
performance problem increasing, decreasing, or remaining con-
stant? The team evaluated the data it had collected and concluded
that the trend was somewhat constant.
format using the current time cards was initiated to capture rework infor-
mation by defect type and by shift, but it did not provide specific informa-
tion on things like which operator might have caused the defect or where
the defect was located, so the team elected to simply review each tank and
manually assemble the information on location and welding operator.
Figure 22.2 is a Pareto chart of the data that this team assembled, and as
you can easily discern, grind and chip spatter and weld pinholes were the
top two problems in tank finishing (also referred to as the barrel area).
After reviewing Figure 22.2, the team had no trouble identifying and
deciding upon which problems to attempt to solve. The data clearly told
the team to attack both pinholes and weld spatter.
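A Pareto chart is simply the causes sorted from largest to smallest, usually with a running cumulative percentage. The sketch below is a minimal Python version of that calculation. It uses the bar heights that appear in Figure 22.2 (in rework hours), but because the cause labels did not survive, the bars are given hypothetical names by rank.

```python
from typing import List, Tuple


def pareto(hours: List[Tuple[str, float]]) -> List[Tuple[str, float, float]]:
    """Sort causes from largest to smallest and add a cumulative percentage."""
    total = sum(h for _, h in hours)
    ranked = sorted(hours, key=lambda item: item[1], reverse=True)
    result = []
    running = 0.0
    for cause, h in ranked:
        running += h
        result.append((cause, h, 100.0 * running / total))
    return result


# Bar heights read from Figure 22.2; "cause 1", "cause 2", ... are hypothetical labels
bars = [20.25, 14.5, 10.05, 8.5, 7.8, 7.05, 6.5, 6, 5.5, 5, 4, 3.5, 2.8, 2.3, 2]
rows = pareto([(f"cause {i}", h) for i, h in enumerate(bars, start=1)])

for cause, h, cumulative in rows[:3]:
    print(f"{cause}: {h} hours, cumulative {cumulative:.1f}%")
```

The cumulative column is what makes the chart useful for choosing targets: it shows how much of the total rework the first few causes account for.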
The team then collected weld spatter and pinhole rework hours for fif-
teen consecutive days, then plotted both on a run chart (see Figure 22.3).
For the fifteen-day period, the combined rework hours for pinholes and
weld spatter averaged approximately eleven hours per day. This run
chart would serve as a baseline for improvement for this
team and would be used as the team’s success metric to measure progress
against these two defects.
FIGURE 22.2
Pareto chart of rework by cause in barrel area. [Chart: rework hours by cause, sorted from highest to lowest; bar values 20.25, 14.5, 10.05, 8.5, 7.8, 7.05, 6.5, 6, 5.5, 5, 4, 3.5, 2.8, 2.3, and 2 hours. The cause labels are not recoverable.]
FIGURE 22.3
Daily weld spatter and pinholes rework hours. [Chart: hours of rework for pinholes and for spatter, days 1 to 15.]
It is important to mention here that any time data collection is involved,
we must be certain that the inspection system used to gather the data is
calibrated appropriately, so that all of the inspectors amassing the data
are doing so in the same manner. This is especially important when the
inspection entails a visual judgment against a predetermined standard.
The team discussed this need to provide assurance that all inspectors
would detect and classify the two defects in the same manner. Since no
standard existed, the team had to develop one. To this end, a series of pho-
tos of the two defects that would serve as standards was developed. The
photos were reviewed and classified independently by each inspector, and
then the results were evaluated to determine the accuracy and precision of
the inspection. Along with the photos, the team developed a visual rating
system based on a scale of 1 to 10 to be used to determine if improvements
had been made. The team implemented the new inspection criteria, which
actually worked quite well.
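The text does not say exactly how the accuracy and precision of the inspection were calculated. One common, simple approach for visual (attribute) inspections is to measure each inspector's percent agreement with the reference standard and the percent of photos on which all inspectors agree. The sketch below assumes that approach; the inspector names and classifications are hypothetical.

```python
from typing import Dict, List


def agreement_with_standard(ratings: List[str], standard: List[str]) -> float:
    """Percent of photos an inspector classified the same way as the standard."""
    if len(ratings) != len(standard) or not standard:
        raise ValueError("ratings and standard must be non-empty and equal in length")
    matches = sum(r == s for r, s in zip(ratings, standard))
    return 100.0 * matches / len(standard)


def agreement_between_inspectors(ratings_by_inspector: Dict[str, List[str]]) -> float:
    """Percent of photos on which every inspector gave the same classification."""
    photos = list(zip(*ratings_by_inspector.values()))
    if not photos:
        raise ValueError("no ratings supplied")
    unanimous = sum(len(set(photo)) == 1 for photo in photos)
    return 100.0 * unanimous / len(photos)


# Hypothetical reference standard and classifications for five photos
standard = ["spatter", "pinhole", "spatter", "acceptable", "pinhole"]
inspectors = {
    "inspector_a": ["spatter", "pinhole", "spatter", "acceptable", "pinhole"],
    "inspector_b": ["spatter", "spatter", "spatter", "acceptable", "pinhole"],
}

for name, ratings in inspectors.items():
    print(name, agreement_with_standard(ratings, standard))
print("all inspectors agree:", agreement_between_inspectors(inspectors))
```

Agreement with the standard speaks to accuracy; agreement among inspectors speaks to consistency. Low values on either would suggest retraining before trusting the rework data.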
Based upon the information collected and team discussions, the team
agreed on the following problem statement:
Pinholes and weld spatter are occurring on 100% of the stainless steel
tanks, creating significant amounts of rework at a rate of approximately
eleven hours per day. Based upon a recent study,
as much as 100 hours per week of rework time has been observed for these
two problems, and the trend is constant.
Now that all members of the team understood the problem in exactly the
same way, they could begin the process that would ultimately lead them to
the root cause and corrective action. At the risk of being redundant, this step must
be completed before the team can move forward in the problem-solving
process.
The Case of the Weld Spatter • 163
22.4 INVESTIGATE, ORGANIZE, AND ANALYZE THE DATA
The next step for the team to accomplish was to investigate the problem, in
much the same manner as a police force would investigate a crime. There
are clues and bits of evidence at the scene of the crime that, when assem-
bled in a logical and structured fashion, will lead the team of investigators
to the root cause of these problems. So, the team investigated, organized,
and then analyzed all of the pertinent and relevant information.
1. There was a noticeable and inconsistent gap between the tank and
the structural rings prior to welding the rings in place. Where the
gap existed, there was an excessive amount of weld spatter present.
2. Not all welders were operating at the same speed setting.
3. The tanks were located on variable speed, floor-mounted rollers that
turned the tanks as the welds were made. The turning speeds were
not the same for all tanks.
4. As the tanks turned on the rollers, oftentimes there would be a jerk-
ing or slipping action that tended to speed up and slow down the
tanks during the welding of the rings.
5. The antispatter material used was not applied consistently by all the
weld operators.
6. Not all welders were using the same welding wire type.
7. The welding angle used by the welders was inconsistent from opera-
tor to operator.
8. The legs on the rings did not appear to be symmetrical from side to
side.
The team had done a wonderful job of reviewing the process and gener-
ating a list of symptoms, and I was certain that the list of symptoms would
be valuable information as the team attempted to solve this problem.
team drew heavily on the list of symptoms and concluded that the gap
between the tank and the structural rings was a clear distinction. The team
noted that when there was a gap present, the weld operator was required
to fill in the gap with weld wire. No other noticeable distinctions were
found.
FIGURE 22.4
Causal chain for weld spatter.
eliminated was the gas mix being wrong at run-out. The others were all
seen as process deficiencies that required actions to correct them. For
example, it was clear that the company needed a preventive maintenance
program, so the team created a simple, operator-based preventive mainte-
nance program to avoid some of these problems in the future. In addition,
it was clear that welding training was a problem, so the team contacted
the manufacturer of the welding machines, who graciously agreed to hold
training classes free of charge.
FIGURE 22.5
Control chart.
FIGURE 22.6
Weld spatter and pinholes average daily rework by month.
was simply overwhelming to the leadership of this plant. Once again, this
team of very ordinary employees had achieved very extraordinary things
by simply following a structured and systematic approach as outlined in
the Problem Solving Roadmap. The problem-solving team and the man-
agement team celebrated!
23
A Case Study in Problem Prevention
Brian Adams
23.1 CASE BACKGROUND
One of the responsibilities and obligations of leadership in any organiza-
tion is to look into the future, see it, understand it, and then make plans to
protect it. The leadership of an organization has the authority and influ-
ence to walk into the future, find potential areas of risk and vulnerability,
and then do something about them now in the present to either prevent
the problems from occurring, or mitigate and neutralize their effects
if they were to occur. In this case study, we will look at a company that
hadn’t spent much time examining and analyzing the future, so this was
new for it. But even though the company had not peered into the future
before, it was able to learn simple techniques to do so, and what it found
and ultimately accomplished was nothing short of remarkable.
Our case study involves a company that makes plastic bottles used to
hold a variety of liquids, including spring water and soft drinks. Its cur-
rent process produced bottles at rates approaching 200 bottles per minute.
This company’s bottle sales were sluggish and somewhat stagnant, and it
was searching for new markets to penetrate. Sales hadn’t met targets in the
past two years, so after much internal debate, the company decided to
engage the services of an unbiased outsider.
second, except for the maintenance plan, their plans lacked continuity
with respect to what they believed was the most urgent and pressing
problem facing their company: sales. At least the maintenance executive
had anticipated a sales increase and developed a plan to produce more
products.
In light of my observations in that first meeting, the next day I reassem-
bled the staff and presented the Problem Prevention Roadmap (Figure 23.1)
as a way of looking into the future, anticipating any areas that might be
at risk, prioritizing them, and then developing a preventive measures plan.
The series of meetings that followed with the CEO and his staff proved to
be a very valuable exercise for this team.
1.0 Define and record high-risk areas
   A. Which part of our business or which processes have the potential to cause problems and why?
   B. What areas of our business or plant could cause us problems and why?
   C. Prioritize the high-risk areas and select the area of highest risk.
II. Define problems, failure modes, and effects
2.0 Define potential problems, failure modes, and effects
   A. What is the potential problem? (Describe specifics)
   B. Who might have the potential problems? (All operators?)
   C. What are the negative effects of the potential problem?
   D. Where could the problem occur? (Specific location?)
   E. When could the problem occur? (Any time?)
   F. What would the scope of the problem be if it occurs?
   G. Develop a potential problem statement that incorporates who, what, where, when, and scope for each potential problem.
III. Identify highest total risk problem
3.0 Estimate probability of occurrence (O)
   For each potential failure mode, estimate the probability of occurrence.
4.0 Estimate the severity (S)
   If the problem were to occur, estimate the impact or severity of the problem on the organization.
5.0 Estimate the probability of detection (D)
   If the potential problem were to occur, estimate the probability that you could detect it.
6.0 Calculate the total risk factor (TRF)
   Multiply the probability of occurrence (O) times the estimate of severity (S) times the probability of detection (D) for each potential problem.
7.0 Prioritize and select the highest TRF
   Prioritize the total risk factors by sorting them from highest to lowest values and select the problem with the highest TRF value.
IV. Determine most probable cause
8.0 Brainstorm possible causes
   For the highest TRF value, brainstorm and list possible causes using a cause-and-effect diagram.
9.0 Eliminate obvious noncauses
   Using experience, logic, data, and common sense, eliminate potential causes that are obvious from the cause-and-effect diagram.
10.0 Define most obvious root causes
   Develop causal chains for the remaining potential root causes.
11.0 Select most probable causes
   Select the most probable cause of the potential problem through testing, logic, data, and experience.
12.0 Identify preventive actions
   Brainstorm to identify specific actions that could prevent the appearance of the failure mode and its effects.
13.0 Identify detective controls
   Analyze and determine the effectiveness of existing controls. If additional detective controls are needed, list additional controls.
14.0 Identify actions to reduce severity
   Brainstorm to identify specific actions that could reduce the impact or severity of the failure mode and its effects if it occurs.
15.0 Estimate (O), (S), and (D), and recalculate TRF
   Based upon the actions identified in 12.0, 13.0, and 14.0, estimate the new values for (O), (S), and (D) and calculate the new TRF. If TRF is acceptable, continue to 16.0. If TRF is not acceptable, return to 11.0.
VI. Implement preventive measures plan
16.0 Finalize and implement prevention plan
   As soon as the TRF value is within the acceptable range, finalize and implement the preventive measures plan.
17.0 Audit effectiveness of prevention plan
   After the preventive measures plan is implemented, audit the effectiveness of the plan. If the plan is effective, then return to 7.0 and select the next highest TRF. If the plan is not effective, return to 12.0.
FIGURE 23.1
Problem Prevention Roadmap.
FIGURE 23.2
Chances of occurring versus consequences matrix.
The team then evaluated each of these final three concerns and decided,
simply by using logic and experience, that the concern over resin
supply would be the issue of greatest impact on the organization. The
team reasoned that if resin could not be supplied in adequate volumes,
then the concern over the installation of the new high-speed bottling
machine would become a moot point; there would be no need to even
install the new machine. Likewise, a shortage of resin or a spike in the
price of oil would affect not only the company’s own ability to produce
and ship bottles but also that of its competitors. As it turned out, this was
an excellent decision, because eight months later the price of crude oil
did spike dramatically, and shortages of resin were a common occurrence.
The team was now ready to formulate its potential problem statement
using all of its answers to these questions, and it did so as follows: “At any
time in the future (but especially if OPEC announces a cutback of oil
supply), there could be a shortage of oil-based resin to produce enough
bottles, which could negatively affect 80% of our customers.”
In this case study, the identification of potential failure modes was not
a difficult assignment. The team brainstormed again and created a list of
potential failure modes that could lead to the potential problem of reduced
oil-based resin. The list was as follows:
Now that the team had identified the failure modes that could lead to
a shortage of oil-based resin, it was time to evaluate each for probability
of occurrence, severity, and the company’s ability to detect the problem
either prior to it happening or immediately after.
A Case Study in Problem Prevention • 179
they were rated low because their impact would not be felt immediately.
Table 23.3 includes the final ranking for each of the eight failure modes.
TRF = O × S × D
The team did so and entered the results into the appropriate box in
Table 23.5.
TABLE 23.3
Matrix of Occurrence, Severity, and Detection with Occurrence and Severity
Failure Mode | Occurrence (O) | Severity (S) | Detection (D) | Total Risk Factor (TRF) | Action Item | New (O) | New (S) | New (D) | New TRF
OPEC announces cutback | 7 | 7 | | | | | | |
Catastrophic accident | 5 | 9 | | | | | | |
Federal regulation change | 6 | 8 | | | | | | |
Terrorist strike | 5 | 7 | | | | | | |
Labor strike at supplier | 4 | 9 | | | | | | |
China’s and India’s economies | 9 | 5 | | | | | | |
Middle East war | 7 | 5 | | | | | | |
Catastrophic weather event | 5 | 9 | | | | | | |
TABLE 23.4
Matrix of Occurrence, Severity, and Detection with Occurrence, Severity, and Detection
Failure Mode | Occurrence (O) | Severity (S) | Detection (D) | Total Risk Factor (TRF) | Action Item | New (O) | New (S) | New (D) | New TRF
OPEC announces cutback | 7 | 7 | 5 | | | | | |
Catastrophic accident | 5 | 9 | 9 | | | | | |
Federal regulation change | 6 | 8 | 5 | | | | | |
Terrorist strike | 5 | 7 | 9 | | | | | |
Labor strike at supplier | 4 | 9 | 7 | | | | | |
China’s and India’s economies | 9 | 5 | 3 | | | | | |
Middle East war | 7 | 5 | 4 | | | | | |
Catastrophic weather event | 5 | 9 | 9 | | | | | |
TABLE 23.5
Matrix of Occurrence, Severity, and Detection with TRF Calculated
Failure Mode | Occurrence (O) | Severity (S) | Detection (D) | Total Risk Factor (TRF) | Action Item | New (O) | New (S) | New (D) | New TRF
OPEC announces cutback | 7 | 7 | 5 | 245 | | | | |
Catastrophic accident | 8 | 9 | 9 | 648 | | | | |
Federal regulation change | 6 | 8 | 5 | 240 | | | | |
Terrorist strike | 5 | 7 | 9 | 315 | | | | |
Labor strike at supplier | 4 | 9 | 7 | 252 | | | | |
China’s and India’s economies | 9 | 5 | 3 | 135 | | | | |
Middle East war | 7 | 5 | 4 | 140 | | | | |
Catastrophic weather event | 5 | 9 | 9 | 405 | | | | |
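Steps 6.0 and 7.0 of the Problem Prevention Roadmap (multiply O, S, and D, then sort from highest to lowest) can be sketched in a few lines of Python. The values below are copied from Table 23.5 as printed; note that Table 23.5 lists O = 8 for a catastrophic accident, while Tables 23.3 and 23.4 list 5, and the sketch simply uses the Table 23.5 figure. The class name is illustrative.

```python
from typing import List, NamedTuple


class FailureMode(NamedTuple):
    name: str
    occurrence: int  # O
    severity: int    # S
    detection: int   # D

    @property
    def trf(self) -> int:
        """Total risk factor, TRF = O x S x D."""
        return self.occurrence * self.severity * self.detection


# O, S, and D values as printed in Table 23.5
modes: List[FailureMode] = [
    FailureMode("OPEC announces cutback", 7, 7, 5),
    FailureMode("Catastrophic accident", 8, 9, 9),
    FailureMode("Federal regulation change", 6, 8, 5),
    FailureMode("Terrorist strike", 5, 7, 9),
    FailureMode("Labor strike at supplier", 4, 9, 7),
    FailureMode("China's and India's economies", 9, 5, 3),
    FailureMode("Middle East war", 7, 5, 4),
    FailureMode("Catastrophic weather event", 5, 9, 9),
]

# Step 7.0: sort from highest to lowest TRF and pick the top one
ranked = sorted(modes, key=lambda m: m.trf, reverse=True)
for m in ranked:
    print(f"{m.trf:4d}  {m.name}")
```

The first row of the sorted output is the failure mode the team would carry forward into the cause analysis.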
FIGURE 23.3
Causal chain of catastrophic accident occurring.
Catastrophic accident occurs
   Why? Accident rate at resin supplier very high
      Why? Safety initiative not active
         Why? Safety director not present
            Why? Safety director resigned
               Why? Safety at supplier not taken seriously
   Why? Equipment at supplier deteriorating
      Why? Preventive maintenance not performed
         Why? Preventive maintenance plan not in place
            Why? Reactive maintenance is the method used
               Why? Shutdowns for PMs not done
                  Why? Production mentality present
interim, resin shortages would occur. With the most probable causes in
place, the team was now ready to develop actions that could either prevent
or mitigate the negative effects of the problem. After doing so, the team
would re-estimate O, S, and D, and then recalculate the total risk factor to
estimate the effect of its actions on the potential problem.
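The re-estimation step can be sketched as a simple check: recompute TRF from the new O, S, and D values and compare it against whatever level the team deems acceptable. The text does not give that acceptable level or the new estimates, so the threshold of 200 and the values (3, 9, 5) below are hypothetical.

```python
def trf(o: int, s: int, d: int) -> int:
    """Total risk factor on 1-to-10 scales: TRF = O x S x D."""
    for value in (o, s, d):
        if not 1 <= value <= 10:
            raise ValueError("O, S, and D are expected on a 1-to-10 scale")
    return o * s * d


def next_step(new_o: int, new_s: int, new_d: int, acceptable_max: int) -> str:
    """Decide whether to finalize the plan or go back to cause selection."""
    new_trf = trf(new_o, new_s, new_d)
    if new_trf <= acceptable_max:
        return f"TRF {new_trf} acceptable: finalize and implement the plan"
    return f"TRF {new_trf} too high: return to 11.0"


print(trf(8, 9, 9))                # catastrophic accident before any action
print(next_step(3, 9, 5, 200))     # hypothetical estimates after the actions
print(next_step(8, 9, 9, 200))     # no improvement: loop back
```

Keeping the acceptance threshold as a parameter reflects that each organization sets its own tolerance for risk.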
In any moment of decision, the best thing you can do is the right thing, the
next best thing is the wrong thing, and the worst thing you can do is nothing.
Theodore Roosevelt
24.1 MAKING CHOICES
One thing that is almost certain in life is that every day we will be called
on to choose between different alternatives. We start each weekday
deciding whether we will get up and go to work, what to wear, what to eat
for breakfast, and when to leave for work. Fortunately for us, these
decisions are simple and instinctive and, as such, are undemanding
and uncomplicated. As we move through the day at our jobs, the
decisions and choices we make become more complex and difficult. Each of
our decisions involves deciding what it is that we need and want, imagining
alternatives that will supply our needs and wants, evaluating any risks and
consequences that might arise, and then making a choice. Soon after we
choose, we begin wondering if our choice of alternatives was correct. We
worry until the results of our decision come to light, and when they do,
we usually find out right away whether we have made the right decision or
the wrong one.
For many, decisions are stressful, but do they really have to be? The
answer is no, they do not. Making difficult decisions can actually be
accomplished with little or no worry if we simply follow a structured and
systematic process. I’m sure by now you’ve noticed that in both problem
prevention and problem solving, there were decisions that needed to be
II. Define the Criteria for the Decision
2.0 Define the “Mandatory and Optional” Criteria
   A. List the “Mandatory criteria” for your decision.
   B. List the “Optional criteria” for your decision.
3.0 Rate the Optional Criteria
   A. On a scale of 1 to 10, score the Optional criteria based upon their relative order of importance, with 10 being the most important and 1 being the least important.
   B. Rank order the Optional criteria from highest to lowest.
III. Develop List of Potential Options
4.0 Create a List of Options
   A. Based upon the Mandatory and Optional criteria, develop and list potential options.
   B. If an option does not satisfy the Mandatory criteria, then either eliminate it or make it an Optional criterion.
5.0 Evaluate Each Option
   A. Compare each option to each Optional criterion.
   B. On a scale of 1 to 10, rate each option as to the degree it satisfies each of the wants, with 10 being the option that best satisfies the Optional criteria and 1 being the option that least satisfies the Optional criteria.
   C. Rank order each option from highest to lowest value.
IV. Assess the Risks of Each Option
6.0 List the Potential Negative Consequences
   A. For each option, list the potential negative consequences.
   B. On a scale of 10 to 1, with 10 being the least likely to occur, estimate the probability of occurrence for each negative consequence.
   C. On a scale of 10 to 1, with 10 being the least severe and 1 being the most severe, estimate the potential severity of each consequence if it were to occur.
V. Calculate the Decision Factors
10.0 Calculate the Total Option Score
   A. Calculate the Total Option Score by multiplying the Satisfaction Index (SI) times the Option Risk Index (ORI).
   B. Rank order the Total Option Scores from highest to lowest.
VI. Make and Implement Your Decision
11.0 Select the Best Option Score
   A. From the prioritized ranking of Total Option Scores listed in 10.0 B, select the highest value option.
12.0 Implement the Best Option
   A. Develop a Best Option Implementation Plan that ensures complete and successful implementation of the best option.
   B. Implement the Best Option Implementation Plan.
FIGURE 24.1
Decision Making Roadmap.
Decisions, Decisions, Decisions • 195
guarantee that our final decision will be successful, and then there is
everything else that falls into the optional criteria category. When we
consider options later in this process that are intended to deliver our
statement of purpose, an option either satisfies the mandatory criteria
or it does not. If the option doesn’t meet all of the mandatory criteria,
then we must either eliminate it or transfer it to our list of optional cri-
teria. All mandatory criteria must be measurable and clearly defined,
with no ambiguity.
Optional criteria, on the other hand, are the things that are not
mandatory, but they would be nice to have if we could get them. For
example, if you were buying a car, one of your mandatory criteria might
be that the price must be under $30,000, and any car that is over this
price would automatically be rejected. On the
other hand, if you wanted a red car but would buy a car that was a
different color, then the color red would be an optional criterion. The
options that we are considering will be judged on their relative perfor-
mance against the optional criteria. Remember, we are simply compar-
ing options against each other, and optional criteria help us draw this
comparison.
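To make the distinction concrete, the car example can be sketched in Python as a two-stage screen: the mandatory criterion acts as a pass/fail filter, and the optional criterion only ranks the options that survive it. This is a minimal sketch rather than part of the roadmap itself; the three cars, their prices and colors, and the 10-or-1 scoring of the red color preference are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Car:
    name: str
    price: int  # US dollars
    color: str

# Mandatory criterion from the example: price must be under $30,000.
MAX_PRICE = 30_000

def meets_mandatory(car: Car) -> bool:
    # Pass/fail: any car at or over the limit is rejected outright.
    return car.price < MAX_PRICE

def optional_score(car: Car) -> int:
    # Optional criterion: only used to compare surviving options.
    # Hypothetical scoring: a red car scores 10, any other color scores 1.
    return 10 if car.color == "red" else 1

# Hypothetical candidates (not from the text).
cars = [
    Car("Car 1", 28_500, "blue"),
    Car("Car 2", 31_000, "red"),
    Car("Car 3", 29_900, "red"),
]

candidates = [c for c in cars if meets_mandatory(c)]           # Car 2 is rejected
ranked = sorted(candidates, key=optional_score, reverse=True)  # highest to lowest
print([c.name for c in ranked])  # ['Car 3', 'Car 1']
```

Note that Car 2 never reaches the comparison step, even though it is red: failing a mandatory criterion cannot be offset by doing well on an optional one.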
SI = 9 × 6 = 54
Pretty straightforward, isn’t it? Once you have completed the math for all
options and all optional criteria, arrange them in numerical order from
highest to lowest. The results you get may surprise you.
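The arithmetic behind the Satisfaction Index can be sketched in a few lines of Python. This is a minimal sketch that assumes the SI for one option on one optional criterion is the criterion's importance rating (step 3.0) multiplied by the option's score (step 5.0), as in the 9 × 6 = 54 example, and that an option's overall SI is the sum of those products across all optional criteria. The criterion names, ratings, and scores are hypothetical.

```python
def criterion_si(rating: int, score: int) -> int:
    """SI for a single option/criterion pair, e.g. 9 x 6 = 54."""
    return rating * score

def option_si(ratings: dict[str, int], scores: dict[str, int]) -> int:
    """Assumed overall SI for one option: sum of rating x score over all Optional criteria."""
    return sum(criterion_si(ratings[c], scores[c]) for c in ratings)

# Hypothetical criterion ratings (step 3.0) and option scores (step 5.0).
ratings = {"price": 10, "delivery": 7, "support": 4}
options = {
    "Option X": {"price": 8, "delivery": 5, "support": 9},
    "Option Y": {"price": 6, "delivery": 9, "support": 3},
}

totals = {name: option_si(ratings, s) for name, s in options.items()}
# Arrange from highest to lowest SI.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

print(criterion_si(9, 6))  # 54
print(ranked)              # [('Option X', 151), ('Option Y', 135)]
```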
or
ORI = O × S
Total Option Score (TOS) = Satisfaction Index (SI) × Option Risk Index (ORI)
or
TOS = SI × ORI
Suppose we had calculated the SI to be 54 and the ORI to be 72; the TOS would then be
TOS = 54 × 72 = 3888
Calculate the Total Option Score for each of the options, and then rank
order them from highest to lowest.
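The two risk formulas chain together in the same way. In this minimal Python sketch, the occurrence and severity ratings of 9 and 8 are hypothetical values chosen only to reproduce the ORI of 72 used above, and the function names and the two ranked options are illustrative rather than part of the roadmap.

```python
def option_risk_index(occurrence: int, severity: int) -> int:
    """ORI = O x S."""
    return occurrence * severity

def total_option_score(si: int, ori: int) -> int:
    """TOS = SI x ORI."""
    return si * ori

def rank_by_tos(options: dict[str, tuple[int, int]]) -> list[str]:
    """Return option names ordered from highest to lowest TOS.

    Each value is an (SI, ORI) pair.
    """
    return sorted(options, key=lambda name: total_option_score(*options[name]), reverse=True)

ori = option_risk_index(9, 8)       # hypothetical O = 9, S = 8
print(ori)                          # 72
print(total_option_score(54, ori))  # 3888

# Hypothetical (SI, ORI) pairs for two options.
print(rank_by_tos({"Option P": (54, 72), "Option Q": (60, 50)}))  # ['Option P', 'Option Q']
```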
A real decision is measured by the fact that you’ve taken a new action. If
there’s no action, you haven’t truly decided.
Anthony Robbins
25.1 CASE BACKGROUND
You will recall the high-speed bottling company in Chapter 23 that conducted a problem prevention exercise involving a potential shortage of oil-based resin. The company had identified several potential problems with its resin supplier and had put plans in place to reduce the likelihood of a shortage. The plans were primarily concerned with the resin supplier implementing effective safety and preventive maintenance programs. The bottling company had informed the supplier that if it didn't comply with both of these initiatives, it would source a new supplier. Things worked well for a period of time, but it wasn't long before shortages started occurring. The leadership team decided to go ahead with its plan to source and certify a new resin supplier.
The bottling company called me in to help with the selection, but I explained that the only thing I could do for them was help them decide which supplier would be the best option. The CEO agreed and called his team together for a meeting. I asked them what they had done so far in their search for this new resin supplier and, not surprisingly, all they had done was bicker over who the supplier should be. There had been no structured attempt to develop a statement of purpose or even to define any criteria for selecting the new resin supplier.
II. Define the Criteria for the Decision
2.0 Define the “Mandatory and Optional” Criteria
A. List the “Mandatory criteria” for your decision.
B. List the “Optional criteria” for your decision.
3.0 Rate the Optional Criteria
A. On a scale of 1 to 10, score the Optional criteria based upon their relative order of importance, with 10 being the most important and 1 being the least important.
B. Rank order the Optional criteria from highest to lowest.
III. Develop List of Potential Options
4.0 Create a List of Options
A. Based upon the Mandatory and Optional criteria, develop and list potential options.
B. If an option does not satisfy the Mandatory criteria, then either eliminate it or make it an Optional criterion.
5.0 Evaluate Each Option
A. Compare each option to each Optional criterion.
B. On a scale of 1 to 10, rate each option as to the degree it satisfies each of the Optional criteria, with 10 being the option that best satisfies the Optional criteria and 1 being the option that least satisfies the Optional criteria.
C. Rank order each option from highest to lowest value.
IV. Assess the Risks of Each Option
6.0 List the Potential Negative Consequences
A. For each option, list the potential negative consequences.
B. On a scale of 10 to 1, with 10 being the least likely to occur, estimate the probability of occurrence for each negative consequence.
C. On a scale of 10 to 1, with 10 being the least severe and 1 being the most severe, estimate the potential severity of each consequence if it were to occur.
V. Calculate the Decision Factors
10.0 Calculate the Total Option Score
A. Calculate the Total Option Score by multiplying the Satisfaction Index (SI) times the Option Risk Index (ORI).
B. Rank order the Total Option Scores from highest to lowest.
VI. Make and Implement Your Decision
11.0 Select the Best Option Score
A. From the prioritized ranking of Total Option Scores listed in 10.0 B, select the highest value option.
12.0 Implement the Best Option
A. Develop a Best Option Implementation Plan that ensures complete and successful implementation of the best option.
B. Implement the Best Option Implementation Plan.
FIGURE 25.1
Decision Making Roadmap.
Once again, the team had compiled a very good list of optional criteria and
was now ready for the next step in the process.
TABLE 25.1
Optional Criteria
Optional Criteria Rating
1. Within 500 miles of our production facility 10
2. Active R&D program with a history of new product introductions 8
3. Able to deliver resin by rail 7
4. Demonstrated Lean manufacturing principles 5
5. ISO 9000 certified 2
6. Six Sigma program in place 1
TABLE 25.2
Optional Criteria Matrix With Scores
Optional Criteria                                               Company A   Company B   Company C   Company D
Within 500 miles of our production facility                         9           6           5           7
Active R&D program with a history of new product introductions      3           6           9           9
Able to deliver resin by rail                                       1           8           7           9
Demonstrated Lean manufacturing principles                          9           8           8           9
ISO 9000 certified                                                 10          10          10          10
Six Sigma program in place                                          1          10           5          10
TABLE 25.3
Matrix of Potential Negative Consequences
Company A Company B Company C Company D
Potential Negative Consequence (O, S)
Rumor of a potential strike by the union could cause a long-term interruption of resin. (O 5, S 3)
Historical PPM quality levels were greater than X that could return. (O 6, S 3)
Rumor that the company will be sold to competitor, which could negate any long-term contracts. (O 4, S 4)
Safety record in past was not acceptable, which could interrupt our supply of resin. (O 6, S 6)
CEO announced his retirement within a year and new CEO could change the direction of the company. (O 1, S 2)
Quality manager resigned last month, which could negatively impact quality of incoming resin. (O 1, S 3)
Three years ago, company had an explosion that shut down its resin reactor. If it were to happen again, resin supply could be in jeopardy. (O 4, S 7)
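The ORI formula can be applied to each row of Table 25.3. This minimal Python sketch copies the occurrence and severity ratings from the table (with the consequence descriptions shortened as hypothetical labels) and lists the O × S products from highest to lowest. It stops at the per-consequence products and does not assign them to companies. As a cross-check, the seven products total 118, the same as the sum of the four ORI values in Table 25.6 (17 + 46 + 19 + 36), which is consistent with each company's ORI being the sum of O × S over its own consequences; treat that reading as an assumption, not a rule stated here.

```python
# Occurrence (O) and severity (S) ratings from Table 25.3 (labels shortened).
consequences = {
    "Union strike rumor": (5, 3),
    "Return of historical PPM quality levels": (6, 3),
    "Rumored sale to a competitor": (4, 4),
    "Unacceptable past safety record": (6, 6),
    "CEO retirement within a year": (1, 2),
    "Quality manager resignation": (1, 3),
    "Repeat of the resin reactor explosion": (4, 7),
}

# Apply ORI = O x S to each consequence.
products = {name: o * s for name, (o, s) in consequences.items()}

# List the products from highest to lowest.
for name, value in sorted(products.items(), key=lambda item: item[1], reverse=True):
    print(f"{value:>3}  {name}")

print("Total:", sum(products.values()))  # Total: 118
```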
TABLE 25.6
Total Option Score
Option     Satisfaction Index   Option Risk Index   Total Option Score
Company A                 188                  17                 3196
Company B                 244                  46                11224
Company C                 248                  19                 4712
Company D                 290                  36                10440
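The figures in Table 25.6 can be checked against TOS = SI × ORI and then rank ordered as in step 10.0 B. This short Python sketch does both using only the values from the table. It ranks Company B just 784 points ahead of Company D, a narrow margin consistent with the team's view that either option would have been acceptable.

```python
# Values from Table 25.6: (Satisfaction Index, Option Risk Index, Total Option Score).
table_25_6 = {
    "Company A": (188, 17, 3196),
    "Company B": (244, 46, 11224),
    "Company C": (248, 19, 4712),
    "Company D": (290, 36, 10440),
}

# Check that each reported TOS equals SI x ORI.
for company, (si, ori, tos) in table_25_6.items():
    assert si * ori == tos, f"{company}: {si} x {ori} != {tos}"

# Rank order from highest to lowest Total Option Score (step 10.0 B).
ranking = sorted(table_25_6, key=lambda c: table_25_6[c][2], reverse=True)
print(ranking)  # ['Company B', 'Company D', 'Company C', 'Company A']
```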
The team was also more impressed with how quickly Company D responded to all of its requests. Intuition, although not data-based, is certainly an important part of any decision, and the team felt good about its final decision. Either option would have been acceptable, so I supported the team’s decision.
Jon Madonna
As you continue using the roadmaps, you will find that you no longer need to follow each individual step in a logical, regimented way. The steps will become instinctive and automatic as you work through existing problems, prevent future problems, or make decisions and choices among the options you have.
[Flowchart: yes/no questions (“Is there a problem to be solved?”, “Does the problem already exist?”, “Is there a problem to be prevented?”, “Is there a decision involved?”, “Has the decision been made?”) that route the reader to “Use the problem solving roadmap” or “Use the problem prevention roadmap.”]
FIGURE 26.1
Needs Assessment Roadmap.
26.3 CONCLUSION
We are all confronted with decisions and problems every day, most certainly in our work lives, and knowing what to do can be filled with pressure and uncertainty. It is my hope that the roadmaps I have developed will help guide you through the sometimes overwhelming inventory of problems and decisions that we all face on a routine basis. The common thread that binds all of these roadmaps is the fundamental need to follow a structured and systematic process. I hope I have helped to provide a coherent and effective pathway. I wish you good luck, but as I told you in the Preface, my definition of luck is laboring under correct knowledge. You make your own luck!
Appendix: Problem Analysis Flowchart
[Form: Problem Analysis Flowchart worksheet with fields for Process, Date, Problem statement (What? Where? When? Scope? Trend? Statement), Symptoms, Changes, Potential causes, Causal chain, Distinctions, Defect-free configurations, Relevant data, Tests/corrections, When made, Results, Conclusions, Most probable cause(s) and comments, and Corrections/controls (short term and long term).]
Index
Crisis, 12, 13, 83, 84, 117; see also Problem solvers; specific problem solving
  and failure, 85–86
  4 C’s of problem solving, 12–13
Curious, problem solvers, 3
Customer-supplied specifications, 122
D
Data collection system, 134–135
Data investigation, organization, and analysis, 147–149, 163
Decision criteria, 196–197
  mandatory, 204–206
  optional, 204–207
  potential options that satisfy, 198
Decision making process
  decision criteria, 196–197
    potential options that satisfy, 198
  make your decision, 201
  making choices, 193–195
  option
    assess the risks of each, 198–199
    evaluate and rate, 198
    implement, 201
  optional criteria, 197, 208f
  option risk index (ORI), 200
  roadmap, 194, 195f, 205f
  satisfaction index, 199, 200
  statement of purpose
    create, 196
    write, 196
  total option score, 200–201
Decision making (bottling company case)
  background information, 203–204
  decision, criteria for, 204–206
    optional, 206
  factors, 209
  and implementing, 213–214
  negative consequences, 209
  options
    evaluation, 207–208
    implement, 214
    potential, 207
  risk assessment, 208–209
  risk index, 210–211
  satisfaction index, 210
  total option score, 213
Dedication, perseverance, and commitment, problem solvers, 3
Defect-free configurations (DFCs), 42–45, 66–67, 126
  definition, 42–43
  operator correlation, 43
  Pareto Chart, 43, 44f, 45f
  potential causes elimination, 43
  record, 150, 164
  time and, 46
  type, 54
Defective input material, 60
Defective measurement tools, 60
Defective pinions case (problem-solving process)
  background information, 133–134
  data collection system, 134–135
  Pareto chart, 136f
  problem analysis flowchart (PAF), 140, 140f
  scrap summary and analysis, 135–144
Defective replacement parts, 61
Definition of problem, 147
Desire to believe, 61
Detective controls, identify, 190–191
Deviation or performance shift, 6
DFC, see Defect-free configurations (DFCs)
Disregard positive deviations, 9
Distinctions, 46–47, 148, 151, 164
Documentation, 122
  of changes, 9
Document your success, 74–75
E
Efforts, chronic problems, 11
Engineering backlog case (problem-solving process)
  background information, 122–123
  history and analysis, 124–132, 125f, 129f, 130f
  Pareto chart, 127f
  questions, 121
Enthusiasm, problem solvers, 3–4