Why AI/Data Science Projects Fail: How to Avoid Project Pitfalls
Synthesis Lectures on Computation and Analytics (ISBN 9781636390383)
Synthesis Lectures on
Computation and Analytics
This series focuses on advancing education and research at the interface of qualita-
tive analysis and quantitative sciences. Current challenges and new opportunities are
explored with an emphasis on the integration and application of mathematics and
engineering to create computational models for understanding and solving real-world
complex problems. Applied mathematical, statistical, and computational techniques
are utilized to understand the actions and interactions of computational and analyt-
ical sciences. Various perspectives on research problems in data science, engineering,
information science, operations research, and computational science, engineering, and
mathematics are presented. The techniques and perspectives are designed for all those
who need to improve or expand their use of analytics across a variety of disciplines
and applications.
Why AI/Data Science Projects Fail: How to Avoid Project Pitfalls
Joyce Weiner
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means—electronic, mechanical, photocopy, recording, or any
other except for brief quotations in printed reviews, without the prior permission of the publisher.
DOI 10.2200/S01070ED1V01Y202012CAN001
Morgan & Claypool Publishers
ABSTRACT
Recent data shows that 87% of Artificial Intelligence/Big Data projects don’t make
it into production (VB Staff, 2019), meaning that most projects are never deployed.
This book addresses five common pitfalls that prevent projects from reaching deploy-
ment and provides tools and methods to avoid those pitfalls. Along the way, stories
from actual experience in building and deploying data science projects are shared to
illustrate the methods and tools. While the book is primarily for data science practi-
tioners, information for managers of data science practitioners is included in the Tips
for Managers sections.
KEYWORDS
data science, project management, AI projects, data science projects, project planning,
agile applied to data science, Lean Six Sigma
Contents
Preface  ix
4  Define Phase  19
   4.1  Project Charter  19
   4.2  Supplier-Input-Process-Output-Customer (SIPOC) Analysis  23
   4.3  Tips for Managers  28
7  Model-Building Phase  41
   7.1  Keep it Simple  41
   7.2  Repeatability  42
   7.3  Leverage Explainability  42
   7.4  Tips for Managers  43
9  Deployment Phase  53
   9.1  Plan for Deployment from the Start  53
   9.2  Documentation  54
   9.3  Maintenance  55
   9.4  Tips for Managers  56
References  63
Author Biography  65
Figures
Figure 4.1: Example project charter  22
Figure 4.2: Supplier-input-process-output-customer (SIPOC) analysis table. The SIPOC is completed in three parts following the numbered steps  24
Figure 4.3: Example SIPOC for engineering dispositioning material  24
Figure 4.4: Example SIPOC with both Part 1, Process, and Part 2, Output and Customer, completed  25
Figure 4.5: Example SIPOC with Part 3, Suppliers and Inputs, started  26
Figure 4.6: Example SIPOC with all parts completed  27
Figure 8.1: Example presentation slide  50

Tables

Table 1.1: Five project pitfalls  2
Table 1.2: Alignment between data science project phases and Lean Six Sigma DMAIC framework  3
Table 3.1: Connection between the methods to avoid pitfalls and the five project pitfalls  13
Table 3.2: Questions to ask at retrospectives  17
Table 4.1: Key components of a project charter  20
Table 5.1: Deliverables and metrics for various types of data science projects  29
Table 5.2: Example calculation for time saved  31
Table 5.3: Types of waste with manufacturing and office examples  32
Table 5.4: Common metrics and dollar conversion  33
Table 8.1: Data science project types and typical final deliverables  45
Table 8.2: Data visualization reading list  48
Preface
Who is this book for? To answer that question, I need to give a little background. My
degrees are in physics. I have a physics undergraduate degree and a Master’s degree
in optical science. In physics there are two disciplines: theoretical physics and experi-
mental physics. Similarly, I have observed that in data science, there are theorists who
focus on developing algorithms, and practitioners who use algorithms and apply data
science. This book is primarily for data science practitioners. I’ve also included infor-
mation for managers of data science practitioners in the Tips for Managers sections.
This book is organized into chapters. The first two chapters introduce common
project pitfalls and the methods to avoid them. The next chapters are based on project
phases of data science projects. In each of those chapters, I'll include tools you can use
to help avoid the project pitfalls, highlighting which of the five methods each tool
supports. To get you started, the five methods are: (1) ask questions; (2) get alignment;
(3) keep it simple; (4) leverage explainability; and (5) have the conversation.
Throughout this book I use the term “data science projects” as an all-encom-
passing term that includes Artificial Intelligence (AI) and Big Data projects. AI itself
is an inclusive term. Machine learning and deep learning are both types of AI. AI is
defined by the Oxford English Dictionary as the theory and development of computer
systems able to perform tasks that normally require human intelligence, such as visual
perception, speech recognition, decision-making, and translation between languages
(Oxford Languages, 2020). That means it encompasses any method of computation
that mimics human intelligence, not only what we often think of—machine learning
and neural nets—but also expert systems and optimization algorithms. I’m using
“customer” to mean the person getting value from the project and the end user of the
project. I’m using “management” to mean your leadership and decision-making chain
within your organization.
CHAPTER 1
Introduction and Background
Over my career, I’ve worked on all sizes of data science projects. I’ve built small
reports, dashboards, and large predictive models. For any project there are risks or
pitfalls that can cause a project to fail that are in the control of the people working on
that project. The five pitfalls are:
1. the scope of the project is too big;
2. scope creep;
3. the model couldn't be explained, hence there was a lack of trust in the solution;
4. the model was too complex and therefore difficult to maintain; and
5. the project solved the wrong problem.
An area of interest for me throughout my career has been to use data to drive
efficiency improvements. I began my career working as a process engineer in manu-
facturing. While working in manufacturing, I became interested in process improve-
ment and earned my Lean Six Sigma black belt.1 Lean Six Sigma uses the DMAIC
framework of Define, Measure, Analyze, Improve, and Control as a strategy for
improving processes. During a talk I attended on data science project phases, I real-
ized that the phases for data science projects (1. Define project objectives; 2. Acquire
and explore data; 3. Model data; 4. Interpret and communicate; and 5. Implement,
document, and maintain) lined up with the DMAIC framework (Table 1.2). This
made me think that some of the tools from Lean Six Sigma would be very helpful in
overcoming the five pitfalls.
¹ If you are interested in learning more about Lean Six Sigma, an excellent overview is What is Lean Six Sigma? (George, Rowlands, and Kastle, 2003).
Table 1.2: Alignment between data science project phases and Lean Six Sigma DMAIC framework

Data Science Project Phases        | Lean Six Sigma DMAIC Framework
Define project objectives          | Define
Acquire and explore data           | Measure
Model data                         | Analyze
Interpret and communicate          | Improve
Implement, document, and maintain  | Control
Getting back to the Transform 2019 panel session, the reasons they gave for
projects to fail were: need for leadership support, access to data, collaboration across
teams, long-term ownership of solutions, and keeping it simple. Of these reasons,
leadership support, access to data, and collaboration across teams are related to com-
munications, and having management alignment. Keeping it simple is just that—sim-
ple. Sometimes not easy, but simple. Long-term ownership of a solution is a matter
of planning—planning for deployment and building for maintainability. There is one
other potential problem that the panelists didn’t mention. That is lack of understand-
ing and therefore discomfort with a model. If you can’t explain why a model is predict-
ing a particular outcome, and having that explanation is important to management or
the customer for the model, then your project will fail.
Fundamentally, treat a data science project the way you would any other project.
A data science project is not a quick and dirty one-off skunk works kind of thing—if
you want it to be deployed. To get a project to production you need to plan it and do
up-front work. If you go slowly at the start of a project, doing the work of defining
the problem and getting alignment with the end customer, then you will only have to
solve the problem once. By going slower up front you can end up going faster in
the end.
So, what exactly can you do to avoid project pitfalls? There are five methods
altogether.
1. Ask questions.
2. Get alignment.
3. Keep it simple.
4. Leverage explainability.
5. Have the conversation.
Asking questions enables communications and helps start the process of get-
ting management alignment. Asking questions up front ensures you are positioned
to start a project that will deliver results that the customer wants. Asking questions
fosters a collaborative atmosphere which will help if you need to get assistance from
other teams.
Explaining your intentions and documenting the project helps with alignment.
Having metrics supports accountability. You can use them to request help and re-
sources and get support from management. Aligning with management and getting
support from the start of the project will ensure that management is aware of your
project and can help if needed later.
Keeping it simple prevents a project from becoming so big that it can never be
finished. It also prevents problems with maintaining a project. If it is simple to explain,
and simple to execute, it can be simple to transfer to a different owner who can sustain
it long term. Starting with a simple problem allows you to build a solution and then
decide if adding on is needed or desired. This is the crawl, walk, run methodology.
Leveraging explainability is tied to both asking questions and getting align-
ment. If your management isn’t that comfortable with AI, they may prefer models
with very clear connections between the inputs and outputs. Knowing this in advance
will allow you to select the type of model that works for the project and meets their
criteria. In some cases, they may not care at all. This is important to know in advance
and asking questions and getting alignment helps to ensure you will build something
that your management and customer are happy with.
Lastly, all of these are about having the conversation. As a project goes along
there are many decision points. If you know what the items of key importance are for
your customer in advance, and have alignment with your management, you can make
these decisions quickly and move the project forward. Even with up-front engagement
and alignment you will need to continue to have conversations with management and
your customer as the project progresses. The best way to handle this is to understand
that this need will occur and plan for it by having regular check-ins with both your
customer and management.
CHAPTER 2
Project Phases and Common Pitfalls
3. the model couldn’t be explained, hence there was lack of trust in the
solution;
ambitious. “Sure,” we say, “we can have it do this and that, and that other feature, and
automatically control everything.” The problem then becomes if you go off to build
that all as one piece. If you do that, it puts you at risk of not finishing within the cus-
tomer’s desired timeframe because it has become a really huge project. It also puts you
at risk of another pitfall, solving the wrong problem.
The solution to a big project is breaking it into smaller deliverable pieces that
can be put into production as you go—incremental development. This gives you a
chance to check in with the customer. A benefit of an incremental development
framework like in Agile development2 is that this check is built into the process. Avoid
waiting until the very end to show your project to the customer and get feedback. If
you build incrementally and share as you go, you can learn if they will be satisfied by a
smaller scope. I’ve only had two cases where I’ve ever scoped a project too small, and
that was easily corrected by increasing the scope based on what we had learned so far.
So, having too big a scope can be addressed by asking questions at the start of
the project, and by delivering incrementally and asking questions at each delivery. As
in the case of the do-it-yourself couple, ask the equivalent questions to “What’s in this
wall?”: Where are the boundaries of the project? Does it feel big? Can it be broken into
pieces that can be delivered separately?
To give you an example of breaking a project into pieces, let’s look at the case of
taking a manual process and adding AI. To do this, you need data and a data pipeline.
If those are not yet available, that needs to be the first part of the project. You need
to automate the process and establish data collection. Then, later, you can add AI.
Trying to do it all at once just sets you up for failure. It’s also clear from this example
why having alignment is necessary. Without that conversation there would likely be a
misalignment in expectations for the project deliverables. While you would know that
you need to first set up automation so data for a future AI model can be collected,
management might only see that this project is taking forever and may pull the plug.
By having the conversation with management and getting alignment on the project,
you not only prevent the project from being canceled, but also get recognition for the
value delivered from automating a manual process as part of the project deliverables.
The second reason for a project to fail is scope creep. In this case, you start the
project off fine and then decide to add some more features. Additional features lead to
complexity. Complexity can lead to bugs in code. Complexity can also cause problems
with deployment, which can cause a project to fail.
Scope creep can cause you to not finish a project because there is no defined
“end.” If you decide in advance what the first-pass accuracy criteria will be for the
² http://www.agilemanifesto.org/
model, you won’t fall into the trap of polishing it endlessly. The same applies to features
of the overall project. The thing to keep in mind to fight scope creep is that you can
always go back and make improvements later. Having a working model in deployment
is better than not having anything. As my Lean sensei has quoted, “don’t let perfect
be the enemy of good.” In other words, don’t hold off putting a working model into
deployment because it isn’t perfect.
To fight scope creep, it’s important to define the project’s objectives and scope
up front. You can always go back and revise these as you learn, but you then have
guidelines to keep you to a smaller project size that you can deliver to production.
Say, for example, I am going to be building a report. To help get alignment with
the customer, I’ll have a meeting to understand the decision they want to make based
on the report and what their needs are. Then, before I start pulling data or writing
code, I’ll draw out a mockup of the report and share it to get their feedback. I do this
drawing on paper or on a whiteboard so we can iterate and make changes real time in
the meeting. Once what the report should look like has been established, I have a good
feel for the scope of the project. If the customer calls me and asks for a new feature
before I’ve deployed the first version, I’ll suggest we wait for the first version to go out,
and then see if the feature is needed. Depending on what they are asking, we can have
a conversation about what currently planned feature to swap out for that new feature.
One way to establish a boundary on scope is to have a fixed timeline for de-
livering the project. In Agile development we learn about the iron triangle of project
management: resources, time, and features. Two out of the three are fixed, the last one
is flexible. Traditionally, resources and features are considered “fixed” and then you have
problems with slipping timelines and projects that never end. In Agile, resources and
time are fixed, the features that go into a project are flexible. If you set a fixed timeline
for the project, i.e., a two-week sprint with a product release at the end of each sprint,
and adjust the features you deliver to fit inside that timeline, you can prevent scope
creep. You also have to keep to the rule that you can’t add any features unless you have
completed all the features you initially agreed to, and you still have time left, or, that
you learned something about what the customer wants and will swap one feature for
another.
The third reason for a project not to get to deployment is that the model couldn't
be explained; there is concern, typically from management, about how it works and a
lack of comfort in the solution. To avoid this pitfall, keep it simple.
The simpler a model is, the easier it is to explain. While it might be exciting for
you, as the data scientist, to use the latest model that you were recently reading about,
it may not be the best solution for the project. It can also backfire if that latest model
doesn’t have the explainability that your customer desires. To avoid this pitfall, have the
conversation in advance with your customer. Do they need to know the reasoning
behind a given prediction and the input values that influenced it, or are they OK with
just the predicted values? Will they want to know causality or just correlation? In my work, I prioritize
using machine learning over neural nets, and physical models over machine learning,
because physical models are the easiest to explain, followed by machine learning al-
gorithms, while neural nets can be a black box. If there is a known equation for the
process you are working on, there is no need to get fancy—just use that physical model.
When I was working in semiconductor manufacturing, we used physical models
to predict critical dimensions. We created a system that compared the calculated mea-
surement based on the input parameters to actual measurements and made a feedback
loop to automatically adjust the lithography machine. This is an example of AI using
a simple physical model.
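That feedback loop can be sketched in a few lines. This is a hedged illustration, not the actual production system: the linear "physical model," the dose and critical-dimension (CD) numbers, and the gain are all hypothetical.

```python
# Sketch of a model-based feedback loop (all names and numbers hypothetical).
# A linear "physical model" relates exposure dose to a critical dimension (CD);
# the controller compares the measured CD to the target and nudges the dose.

MODEL_SLOPE = -2.0  # nm of CD change per unit of dose (assumed linear model)

def adjust_dose(dose, measured_cd, target_cd, gain=0.5):
    """Proportional correction: move dose to close the CD error."""
    # The model says d(CD)/d(dose) = MODEL_SLOPE, so the dose change that
    # would close the error is (target - measured) / MODEL_SLOPE; the gain
    # damps the correction so measurement noise does not cause oscillation.
    return dose + gain * (target_cd - measured_cd) / MODEL_SLOPE

# Simulate a process whose true behavior differs slightly from the model.
dose, target = 20.0, 110.0
for _ in range(20):
    measured = -2.0 * dose + 152.0   # stand-in for an actual measurement
    dose = adjust_dose(dose, measured, target)

print(round(dose, 3), round(measured, 3))  # settles near dose 21, CD 110
```

Because the correction is model-driven, the loop is trivially explainable: every dose change can be traced to one measured error and one known slope.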
Keeping with models, the fourth risk for your project not to get to deployment
is that the model is too complex. Again, the solution to this risk is to keep it simple.
The simpler the model, the faster the calculation will be for inference. The simpler the
model, the faster it will be to train, or if it is really simple, no training will be needed
at all. Additionally, simpler models are easier to maintain.
As you are building a model, think about the number of parameters needed for
inference. Will that data be available every time your customer will want a prediction?
How hard or easy is it to gather the data needed? The simpler the model, the fewer
parameters it has. The fewer parameters needed, the easier it is to ensure all the re-
quired data is available.
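As a concrete sketch of that availability check, the snippet below verifies that every input a model needs is actually present before requesting a prediction. The feature names are made up for illustration.

```python
# Sketch: confirm all required model inputs are available before inference.
# Feature names are hypothetical.

REQUIRED_FEATURES = ["temperature", "pressure", "line_speed"]

def missing_features(record, required=REQUIRED_FEATURES):
    """Return the required features that are absent or None in this record."""
    return [f for f in required if record.get(f) is None]

reading = {"temperature": 74.2, "pressure": None}
gaps = missing_features(reading)
if gaps:
    print("Cannot run inference; missing inputs:", gaps)
```

The fewer parameters the model has, the shorter that required list is, and the less often this guard fires.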
Think about how long it takes to train your model (assuming you aren’t using a
physical model). How often will the model need to be retrained and how long will the
training take? If a model takes many hours to build—say a deep learning model—but
you only need to retrain infrequently, that might work. If it takes many hours to train
the model, and you’ll need to retrain monthly, that may not work. These are some of
the questions you’ll need to ask to gather this information. How often do the input
parameters change? How often does the process that the model supports change?
Understanding these parameters and having the conversation with the customer of the
model about training frequency is important to ensure your project gets to deployment.
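Those retraining questions reduce to simple arithmetic: hours per training run times runs per month, compared against the time available for maintenance. A sketch with hypothetical numbers:

```python
# Back-of-the-envelope retraining budget check (all numbers hypothetical).

def monthly_training_hours(hours_per_run, retrains_per_month):
    """Total hours per month spent retraining a model."""
    return hours_per_run * retrains_per_month

budget_hours = 10  # time available each month for model maintenance

candidates = {
    "deep learning model": monthly_training_hours(12.0, 4),  # weekly retrain
    "simple regression":   monthly_training_hours(0.5, 4),
}

for name, cost in candidates.items():
    verdict = "fits" if cost <= budget_hours else "exceeds"
    print(f"{name}: {cost:g} h/month, {verdict} the {budget_hours} h budget")
```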
The fifth risk for your project is that you solve the wrong problem. Of the five
reasons, solving the wrong problem is the most devastating for a project team. After
working long and hard on a project, getting to the end, only to learn that you haven’t
delivered a helpful solution is difficult. It is also not that uncommon.
To avoid solving the wrong problem, deliberately proceed through the project phases: Define project objectives; Acquire
and explore data; Model data; Interpret and communicate; and Implement, document,
and maintain. Check in with your customer throughout the project to make sure you
have alignment, and keep management updated on your progress and what has been
delivered so far. Keep things simple and iterate. This ensures you have an achievable
scope to your project. Iterating on the project means you can deliver value as you go
without having the full scope of the project finished.
• Are we maximizing value delivered for the time we are spending on the
project?
Help your team ensure they are not spending more effort than is actually required. Lean Six Sigma suggests asking why five times.
In practice, I have found it best to ask why until you get to a fundamental constraint
like, “that’s how physics works.”
Another two of the pitfalls relate to models. Make sure your team discusses
with you the type of model they intend to use. Help them evaluate the customer’s and
the business’ need for explainability and select an appropriate model that meets those
requirements. Ask questions to gain understanding about the model to be used, so
that you are comfortable with the model and can help explain to others the method
your team is using. Help your team avoid building models that are not tied to reality
and avoid over-processing in model building. It can be tempting for data scientists to
work on improving model accuracy indefinitely. Help your team ensure that they are
not chasing diminishing returns.
The last pitfall relates to solving the wrong problem. Two things will assist in
avoiding this pitfall. (1) Work with your team to ask questions at the start of a project.
(2) Support your team in getting alignment with customers and stakeholders.
Guide your team to go slow to go fast. Make planning both what will be done
and how it will be done a normal part of working on a project. Ask questions about
how the team is planning the project and the work. Ask from the very start how they
plan to deploy the project. Make sure they are thinking about the end state as they
build things. Allow them to iterate rather than wait for perfection to deploy a solution.
CHAPTER 3
Five Methods to Avoid Common Pitfalls
Table 3.1: Connection between the methods to avoid pitfalls and the five project pitfalls

Method                   | Avoids These Pitfalls
Ask Questions            | Scope is too big; Scope creep; Model couldn't be explained; Solved the wrong problem
Get Alignment            | Scope is too big; Scope creep; Solved the wrong problem
Keep it Simple           | Model couldn't be explained; Model was too complex
Leverage Explainability  | Model couldn't be explained; Solved the wrong problem
Have the Conversation    | Scope is too big; Scope creep; Model couldn't be explained; Model was too complex; Solved the wrong problem
If you can get a simple version of the project deployed, you can always go back
and refine the model or add features. You get a benefit and deliver business value from
having the project deployed. Added complexity delays that benefit. In a model, com-
plexity can cause difficulty in the ability to maintain the model. Increased complexity
often comes with increased time required to train a model. I’ll discuss this in more
detail in Chapter 7.
Keep deployment of the project simple. If there is an existing system, use it to
deploy your project. If there are existing business processes, make sure your project
works within them, unless the intent of the project is to change them. I’ll cover this
topic in more detail in Chapter 9.
Development, have the team’s Scrum Master help with protecting the team from dis-
tractions, and require that your team only accept user stories from outside the team if
they come via the Product Owner.
Check in with your team—Are they keeping it simple? Ask them about what
the cheapest and quickest solution would be. Coach them to choose simple methods.
Make it OK to put a solution into production and then go back and improve it in
the future.
Train your team to learn from past projects by performing retrospectives. A
retrospective is time set aside for the team to reflect on a deployed project and
compare the project charter to actual results. Ask what was supposed to happen
and what did happen during the project. Ask what they have learned, and what they
would like to change or continue in the future. Keep it positive. This is about learning,
not recriminations.
The first time a team does a retrospective, they will be nervous that you will
be looking to find fault. Finding fault is counterproductive and will not help the
team improve and grow. Make it safe to discuss failure and examine what happened
without placing blame on individuals. Keep in mind that your team is doing their
best. No one comes to work wanting to mess up. To quote W. Edwards Deming, “A
bad system will beat a good person every time” (Deming, 1993). Support your team
to think about where your organization’s systems and business processes are holding
them back and take steps to make improvements. Start with small things your team
controls. If you only change one thing after that first retrospective as a result of their
feedback, they will feel empowered and engaged in the process, and future retrospec-
tives will be very fruitful.
CHAPTER 4
Define Phase
The first phase of a project is where you set the scope and determine the deliver-
ables—what the outcomes of the project will be. As discussed previously, it is im-
portant to do this up-front work because it prevents future problems that can cause
your project to fail. Work done in this step ensures you have alignment with both
management and the end customer, and that the scope of the project is defined. This
protects against scope creep and having too big a scope. The questions you ask and the
conversations you have during the define phase of the project protect against solving
the wrong problem.
There are two tools from Lean Six Sigma you can use to help ensure you ask the
questions, get alignment, and have the needed conversations at the beginning of the
project. These tools are: (1) a project charter and (2) a Supplier-Input-Process-Out-
put-Customer (SIPOC) analysis.
4.1 Project Charter
The project charter establishes, in writing, what the project will deliver and who the
decision maker is. This keeps the alignment between the project as delivered and the
customer's expectations, which ensures your project will make it to production.
The format of the charter document isn’t as important as its content. The key
components of the charter are problem statement, scope, how to measure success of
the project, stakeholders, and decision maker (Table 4.1).
The project charter starts with the problem statement. Why are you undertaking
this project in the first place? What is the problem that needs to be solved? The prob-
lem statement should contain the basic facts of the problem. Include why the issue
matters. This ensures there is alignment for why we are doing the project.
In writing the problem statement ensure you are keeping to just the facts of the
problem and not sneaking in possible solutions. The problem is not that we don’t have
a report for monthly sales figures. The problem is that we want to know if our new
marketing campaign is working. To do that, maybe looking at monthly sales figures is
the correct approach. Maybe there is another metric that would be as good or better.
If you assume the answer in setting up the problem, you limit your thinking.
To make sure you aren’t unintentionally limiting your thinking by including
solutions in your problem statement, you can use a standard format for problem state-
ments. A good problem statement answers: who has the problem, what the problem
is, where it occurs, when it occurs, and what the impact of the problem is to the busi-
ness. You can use a fill-in-the-blanks style format like this: During (period of time for
baseline), the (primary business measure) for (business process) was (baseline). This is
a gap of (objective target vs. baseline) from (business objective) and represents (cost
impact of the gap) of cost impact.
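That fill-in-the-blanks format can even be kept as a literal template. The snippet below renders it; every filled-in value is a hypothetical example, not a real baseline.

```python
# Sketch: the problem statement boilerplate as a string template.
# All filled-in values are hypothetical.

TEMPLATE = (
    "During {baseline_period}, the {business_measure} for {business_process} "
    "was {baseline}. This is a gap of {gap} from {business_objective} and "
    "represents {cost_impact} of cost impact."
)

statement = TEMPLATE.format(
    baseline_period="Q3 2019",
    business_measure="first-pass yield",
    business_process="widget assembly",
    baseline="87%",
    gap="8 percentage points",
    business_objective="the 95% objective",
    cost_impact="$120,000 per quarter",
)
print(statement)
```

Keeping the template literal makes it hard to sneak a solution into the problem statement: there is simply no blank for one.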
Next, the charter records how you will measure success. What metrics will show that you
have addressed the problem sufficiently? Are there customer requirements for model
accuracy? Are there customer requirements for timeliness? Have those requirements
been addressed satisfactorily by the project? These measurements help you know
when you are done and are a layer of protection against scope creep. The expected
return on investment from the project should be documented as well. That helps
with prioritizing work, and with ensuring management understands the benefit of
resourcing the project.
Lastly, the charter should include who will provide input to the project—the
stakeholders, and who will be the final decision maker. For each decision, there
should be only one person who decides. To go fast, it is helpful to have established
in advance who that will be.
Project Charter

Project Name: New Marketing Campaign Dashboard
Project Owner: J. Weiner
Problem Statement: "During the new marketing campaign the sales of widgets was $50,000, this was $5,000 more than the period of the same duration prior to the campaign."
Scope: Design and deploy a report that shows the impact of marketing campaigns on widget sales
Metrics:
    Item                                     Current Value    Goal          Note
    Baseline sales of widgets                $45,000.00       $75,000.00    Q3 2018 data
    Widget sales during marketing campaign   $50,000.00       $75,000.00    Q3 2019 data
Stakeholders: Widget marketing team, J. Smith (Widget marketing team manager), Data science team, D. Jones (Data science team manager), Widget sales department
Decision Maker: J. Smith
Having the stakeholders and decision maker written down in the charter docu-
ment does two things. One, at the beginning of the project it makes you think through
who the project stakeholders are so you can get their input and feedback at the start
of the project. They may be customers of the end result of the project, or they may
be suppliers of data or other input. Writing down who they are means you have a list
and can check in with them as the project progresses. Writing down the stakeholders
is also a check on project scope. If you start to have a large number of stakeholders,
then it might be good to scope the project down or break it into pieces that can be
delivered separately to different sub-sets of stakeholders. The other thing that writing
down the stakeholders and decision maker does is that it helps with alignment. Stake-
holders can see who else will be giving input into the project, and they can see who
the final decision maker will be. This prevents all your stakeholders from assuming,
naturally, that they are the key decision maker for the project. It avoids the problem
of too many bosses, which makes finishing a project difficult due to scope expansion
and misalignment.
4.2 SUPPLIER-INPUT-PROCESS-OUTPUT-CUSTOMER
(SIPOC) ANALYSIS
A SIPOC (pronounced “sigh-pock”) summarizes the inputs and outputs of a process
in a table format; see Figure 4.2. More importantly, it includes the requirements for
those inputs and outputs. For a data science project, this helps ensure that the model
selected provides the expected accuracy and establishes what data are needed to gen-
erate that model. For a project delivering a report or dashboard, it helps determine the
criteria for frequency of use, who the audience will be, and what data are needed to
create the report or dashboard.
A SIPOC is created in three parts. The first part establishes the start and end of
the process that is being worked on. This could be to fix a problem or to enhance a pro-
cess with automation or AI. For example, if you are intending to build a report to help
make a decision, the process that is used to make the decision would be in the center
of the SIPOC. The second part focuses on the output of the process and the customer,
including the customer requirements. The third part looks at the inputs needed for the
process to deliver the outputs, and what the requirements are for those inputs.
24 4. DEFINE PHASE
Let’s use the following as an example and build the SIPOC. Say I am planning
to build a report to help engineers determine if material meets certain criteria for qual-
ity and can continue to be processed, or if it should be scrapped. That decision-making
process would go in the center of the SIPOC table. I would write down the start and
end points: start—material is on hold for engineering decision; end—material is dis-
positioned, as shown in Figure 4.3.
The next phase is to examine the outputs. For this example, I write down the
outputs or deliverables from the process: dispositioned material. But let’s think that
through. Really, the output of the process is quality parts, so let’s make that change.
The other output of the process is timely decision making. Not only do we want quality
parts, we want to have our material flow through the factory and not be waiting for a
long time.
Next, I identify the customers who receive deliverables from the process: man-
ufacturing, engineering, and engineering management. Manufacturing is a customer
because they want to keep material flowing through the factory, and also want to make
sure the manufacturing process is making quality parts. Engineering is a customer be-
cause the reason that material needs to be dispositioned is useful for troubleshooting
the manufacturing process and making adjustments to ensure the process produces
quality parts. Engineering management is a customer because they are responsible for
their team to disposition material quickly and correctly and fully address problems, so
they don’t re-occur.
In thinking about the customers and why they are customers, we start to see
what their requirements might be. The best way to know for certain is to go ask the
customer. You can start this conversation by sharing what you think their requirements
might be, and then listen to their feedback and additions. Finally, I write down the
requirements for each output from each customer; see Figure 4.4.
Figure 4.4: Example SIPOC with both Part 1, Process, and Part 2, Output and Cus-
tomer completed.
correct choice, so material is not wasted or the final customer is not upset by receiving
material of poor quality. Engineering wants the report to have all the necessary infor-
mation in one place so they can quickly make a decision, they also want the report to
be accurate so they can disposition the material correctly. Knowing the outputs then
means I have a sense of what inputs are required.
The third part of completing a SIPOC analysis is to look at the inputs to the
process. This starts with writing down what inputs are required to enable the process
to occur. Then you look at who or what supplies each of those inputs. Finally, you
document the requirements for each input from each supplier.
In our example, the engineer needs to know what material is waiting for them to
disposition, why the material has been flagged for them to look at, and what happened
on the equipment when the material was being processed. The list of material is neces-
sary for timely decision making. Why the material was flagged and what happened on
the equipment provide the engineer with information so they can make corrections to
the process and prevent similar errors in the future. There is a clear connection between
the outputs that meet the customer’s needs and the inputs required to build the report.
Once we know the needed inputs, we can identify who (or what system) sup-
plies that information. Then we can determine the requirements for each of the inputs
so that our report can meet the needs of our customers.
In our example, as we complete the suppliers and requirements, we notice we
need one other input. We need to know how long the material has been waiting to
be able to meet the manufacturing requirement of not waiting longer than 12 hours.
The data need to be accurate and include detail on what was measured compared to
the goal for that parameter. An example of this is statistical process control limits.
This information is needed by the engineer to be able to answer why the material
was flagged. Suppliers can be teams or systems. In this example, manufacturing is a
supplier—if they have entered comments into the shop floor control system. Other
suppliers are the databases for the statistical process control and shop floor control
systems; see Figure 4.6.
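The completed analysis can be captured in a lightweight data structure so it travels with the project documentation. A minimal sketch based on the worked example above; the exact wording of the entries and the requirement details are paraphrased, not definitive:

```python
# A sketch of the example SIPOC as a plain data structure.
# Entries paraphrase the worked disposition example; requirements
# are illustrative.
sipoc = {
    "process": "Engineering disposition of material on hold",
    "start": "Material is on hold for engineering decision",
    "end": "Material is dispositioned",
    "outputs": {
        "Quality parts": ["Correct disposition decision"],
        "Timely decision making": ["Material waits no longer than 12 hours"],
    },
    "customers": ["Manufacturing", "Engineering", "Engineering management"],
    "inputs": {
        "List of material waiting for disposition": "Shop floor control database",
        "Reason the material was flagged": "Statistical process control database",
        "Equipment history during processing": "Shop floor control system",
        "How long the material has been waiting": "Shop floor control database",
    },
}

# A simple completeness check: every input must name its supplier.
assert all(supplier for supplier in sipoc["inputs"].values())
```

Storing the SIPOC this way also makes it easy to revisit when the stakeholder list in the charter needs updating.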
After completing the SIPOC, you might need to go back to your project charter
and update the list of stakeholders based on the findings from listing out the custom-
ers and suppliers. You may also need to adjust the scope of the project based on what
you’ve learned.
The SIPOC is helpful in the phase of your data science project where you are
acquiring and exploring data. From the SIPOC analysis you know the requirements
for the inputs to your process, report, or model which will help you select data sources.
From our example, we know we need to extract data for our report from the statistical
process control database and from the shop floor control database because we have
completed the SIPOC and understand the requirements to enable us to create the
output the customer desires. As you do the data acquisition and exploration, show the
results to your customer and get their feedback.
The project charter and SIPOC help you ask the questions at the start of a
project that set you up to succeed in the end and deliver a project to production. The
project charter establishes the problem you will be working on, so you don’t solve
the wrong problem, and sets the scope and definitions of success to prevent too big
a scope and scope creep. The SIPOC allows you to think through stakeholders and
scope the project requirements as well as get alignment on the expected deliverables
and requirements for those deliverables. This is useful for defining what “done” looks
like for your project.
The other factor in determining if you are done with a project is to compare
the expected business value to the business value delivered. Achieving the expected
business value early is a reason to re-assess the project scope and maybe stop working
on it, after having a conversation with your stakeholders and final decision maker. Not
meeting the expected business value after all the planned work is complete can be due
to factors outside your control and is worth assessing if further work should be done, or
if things are good enough as is. Again, this is a joint decision between the team work-
ing on the project and the final decision maker. In the next chapter, we’ll investigate
how to calculate business value for data science projects.
CHAPTER 5
Table 5.1: Deliverables and metrics for various types of data science projects

Data analysis
    Deliverables: root cause determination; problem-solving support; problem identification
    Metrics: productivity; time to decision; decision quality; risk reduction

Automation
    Deliverables: decision support
    Metrics: time to decision; time savings; waste eliminated; decision quality

Business process improvement
    Deliverables: standardization; standards and business processes
    Metrics: excursion prevention; quality improvement; risk reduction

Data mining
    Deliverables: new insights; learned something new
    Metrics: improved model accuracy; decision quality; risk reduction

Improved data science
    Deliverables: increased capability; advanced algorithms
    Metrics: productivity; decision quality; risk reduction
30 5. MAKING THE BUSINESS CASE: ASSIGNING VALUE TO YOUR PROJECT
To build out these metrics, let’s look at the types of data science projects I’ve
been involved in over the course of my career. I can group the projects I’ve done into
five broad categories. I’ve done data analysis projects. I’ve built reports and automated
processes. I’ve done projects to devise standards and improve business processes. I’ve
delivered insights from data mining. I’ve done projects which improved my organi-
zation’s ability to do data science. Each type of project has different deliverables, and
different metrics to measure those deliverables (Table 5.1).
Let me take a little time here to talk about decision quality. The quality of a
decision is not defined by whether the outcome of the decision is good or bad. We
could make a good decision and still have a bad outcome. Say we are deciding to plant
a crop. We could make a quality decision and still have a bad outcome if the weather
changed from its forecast.
A good quality decision has the following characteristics: it is framed appro-
priately; there are alternatives; data and information are used to decide; the value
and trade-offs are clear; logical reasoning is used; and there is commitment to follow
through and to take action based on the decision. We can use these metrics to gauge
the decision quality. If our analysis is providing the data and information needed to
decide, how can we assess how much we have improved the quality versus not having
that data?
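These six characteristics can double as a simple scorecard. A minimal sketch follows; the criteria come from the text, but the scoring approach and the example answers are my own illustration:

```python
# The six decision-quality characteristics, used as a checklist.
CRITERIA = [
    "framed appropriately",
    "alternatives considered",
    "data and information used",
    "values and trade-offs clear",
    "logical reasoning used",
    "commitment to follow through",
]

def decision_quality(met):
    """Fraction of the quality criteria a decision satisfies."""
    return sum(met[c] for c in CRITERIA) / len(CRITERIA)

# Hypothetical assessment of one decision: five of six criteria met.
example = {c: True for c in CRITERIA}
example["alternatives considered"] = False  # illustrative gap
print(f"Decision quality score: {decision_quality(example):.0%}")
```

Scoring the same decision before and after the analysis is one way to show how much the data and information improved the decision.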
3. Increased market segment share can be difficult to quantify.
4. See Table 5.3.
CHAPTER 6
on the test system’s hard drive, then transferred to the engineer’s computer for analysis.
By switching the storage location for the collected data to a network drive, we can
more easily explore data across multiple test systems. It is a simple change, much more
straightforward than setting up a relational database to store the data. Long term,
we may want to move in the direction of storing the data in a database. Our simple
change has opened up the potential for analysis of data from multiple systems that we
can start using right away.
Another example of starting small and simple is when a team I worked with
used a SharePoint5 list to collect data. We needed to forecast hardware use and would
do the forecasts twice a year. These forecasts were kept in various office document
formats like presentations or spreadsheets. Our problem was that we weren’t able to
use the historic data from past forecasts because they were not saved anywhere sys-
tematically. By developing a standard location for the forecasts on our SharePoint and
designing the SharePoint list to match formats people had been previously using, we
made it easy for the team to enter the data and build a history.
When you need to collect data from people, make it easy for them to enter the
data. If you have a form or data collection tool that has a lot of fields, people will tend
not to fill things in, or not fill fields in completely, if they feel it takes too long or the
form is too big. When you make the fields required fields, typically people will do the
minimum amount required, even if they may know more about the situation and could
add information by filling in other fields. When you are collecting data from people,
think about the experience from their perspective. People are busy. People typically
will do the minimum unless they are passionate about something—for example, I’ve
had situations where a manufacturing technician would be very frustrated by an on-
going problem and add a ton of helpful content into a comment field, because they
were angry that the problem kept happening. In that case, our normal data collection
systems didn’t transfer the complete information well from the techs to the engineers.
Sometimes the people who enter the data don’t understand the value that can
be gained from using the data they provide, so they do the minimum, or do it quickly
and maybe with less attention to detail than you would like. Help them help you by
minimizing the burden to enter data. Help them understand the value they deliver in
collecting the data and entering it. Circle back to the people who provide the data with
results from your analyses that have been made possible by their data and share what
can be learned. Because of these difficulties with manual data collection, I advocate for
automating data collection wherever possible.
5. Other marks are the properties of their respective owners.
6.4 WHAT DOES THE CUSTOMER WANT TO KNOW? 37
Starting small and starting simple is helpful also because you have then tested a
system, which is valuable information when you need to expand or grow it. For exam-
ple, if you have started some manual collection, and determined that there are valuable
insights that can be delivered from that data, it is easier to build a case to add sensors
or other measurement instruments and collect the data automatically. In cases like
this, I’ve collaborated with my IT department to develop systems that will automate
the data acquisition. By starting small and already having data collected, I then have
a good idea of the amount of effort that will be needed to automate the system, and
a sense of the benefit that will be delivered. You can then calculate a return on invest-
ment for the effort which, in addition to information from your project charter and
SIPOC, is useful for convincing the IT department to work on your project.
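The return-on-investment arithmetic itself is simple. A sketch with illustrative figures; none of these numbers come from the text:

```python
def roi(benefit, cost):
    """Return on investment expressed as a fraction of cost."""
    return (benefit - cost) / cost

# Hypothetical example: automating data collection saves 10
# engineer-hours per week at $100/hour over a year, against
# $20,000 of IT effort to build the automation.
annual_benefit = 10 * 100 * 52  # $52,000
cost = 20_000
print(f"ROI: {roi(annual_benefit, cost):.0%}")  # 160%
```

Pairing a number like this with the charter and SIPOC gives the IT department a concrete case for prioritizing the work.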
CHAPTER 7
Model-Building Phase
Two of the project pitfalls relate directly to the model building phase of a data science
project: couldn’t explain the model, and the model was too complex. The tools to ad-
dress these pitfalls are to keep things simple and leverage explainability.
between the model and what is actually happening in the real world. To be useful, a
model needs to be tied to reality and based on real measurements.
7.2 REPEATABILITY
Models need to be repeatable. If I provide the same inputs, I should get the same
results. This is easy to ensure if my model is simple. If different people run my model,
they should get the same results given the same inputs. If the model is run on a differ-
ent machine, I should get the same results. To be useful, models should be transport-
able—meaning I can share a model with another team. This is much easier to ensure
if they are simple.
Good coding practices help with repeatability and the ability to share models
and code between teams. This is something to think about during the modeling phase
of your project. Can you modularize your code so that other teams can use pieces of
your project? How will you test your model? How will you verify that the predictions
from your model are accurate?
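One concrete repeatability habit is to seed any randomness your model uses, so two runs with the same inputs agree exactly. A minimal sketch; the "training" step is a toy stand-in, not a real model:

```python
import random

def train_model(data, seed=42):
    """Toy 'training' step: average of a random subsample.
    Using a local, seeded RNG (no hidden global state) makes the
    result repeatable across runs, people, and machines."""
    rng = random.Random(seed)
    sample = rng.sample(data, k=5)
    return sum(sample) / len(sample)

data = list(range(100))
run1 = train_model(data)
run2 = train_model(data)
assert run1 == run2  # same inputs, same results
```

The same idea applies to any real framework: record the seed alongside the code so another team can reproduce your results.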
husky and wolf. In this case, the training set was deliberately selected to be biased in
this way for the purposes of the paper. The risk in using a neural net comes when this
type of problem occurs unintentionally, and you are not aware of the problem in the
training set.
Between concerns around explainability and the desire to keep things simple, I
typically prioritize using models in this order from simplest to most complex:
1. physical models;
4. neural nets.
Of course, model selection is highly dependent on the type of data that will be used.
Neural nets are particularly useful for visual analytics and natural language processing.
the training data set becomes very high. The problem is then that the model becomes
overly specific, and the accuracy on other data will be lower than optimal. This is a
place where keeping it simple helps. Testing on data not used to train the model will
help your team detect overfitting.
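The held-out-data check can be shown in a few lines. A sketch with a deliberately overfit "model" that memorizes its training set; the data and model are illustrative, not a real technique you would deploy:

```python
import random

random.seed(0)  # seeded so the demonstration is repeatable

# Hypothetical noisy data: y depends linearly on x, plus noise.
data = [(x, 2.0 * x + random.gauss(0, 5)) for x in range(100)]
random.shuffle(data)
train, test = data[:80], data[80:]

def memorizing_model(x, train_pts):
    """Overly complex 'model': returns the y of the nearest
    training x (it has effectively memorized the training set)."""
    return min(train_pts, key=lambda p: abs(p[0] - x))[1]

def mse(points, train_pts):
    return sum((memorizing_model(x, train_pts) - y) ** 2
               for x, y in points) / len(points)

train_err = mse(train, train)  # perfect on data it has seen
test_err = mse(test, train)    # much worse on held-out data
assert train_err == 0.0 and test_err > train_err  # overfitting exposed
```

A large gap between training error and held-out error is the signal to simplify the model.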
Support your team in applying good coding practices. Provide a source code
control system and ensure your team uses it. Have them create standards and doc-
ument how they write code so there is consistency across the team. Ensure that all
dependencies for your team’s code are documented and included so other teams can
reuse code your team has created, and that your team can repurpose code from others.
Support your team in creating tests for code and models. Consider requiring continuous integration, which uses automation tools to build and test code after each change (Manturewicz, 2019).
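A continuous-integration setup can be quite small. Here is a sketch in GitHub Actions syntax, as one possibility; the workflow name, file paths, and commands are illustrative, not a prescribed setup:

```yaml
# A minimal sketch of a CI workflow that builds and tests on
# every change. Names and paths below are illustrative.
name: model-ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest tests/   # code and model tests run on each change
```

Any comparable CI system works; the point is that no change lands without the tests running.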
Ask about model maintenance and how the model will be supported for the
long term. What is the plan to maintain the model? What will trigger the need to
retrain the model? Who will own the model long term? What are the systems that are
in place to support the model? What does your team need to build and what business
processes need to be developed?
CHAPTER 8
Table 8.1: Data science project types and typical final deliverables
Type of Project Typical Final Deliverables
Data analysis Presentation
Automated report/dashboard
Automation Automated report/dashboard
Deployed model
Improved business process Automated report/dashboard
Deployed model
Data mining Presentation
Improved data science Presentation
For data analysis projects, the deliverables are usually determining root cause for
a problem, supporting the problem-solving process, or identifying problems that need
to be fixed. This means that data analysis projects most often end by you presenting
your findings in a meeting. Sometimes you then create an automated report or dash-
board to enable your customer to continue to monitor various metrics resulting from
your analysis.
In projects where the goal is to automate a process, the typical deliverable is a
report or dashboard. Sometimes I am asked to automate an analysis, often one which
requires input from multiple sources. Sometimes I am automating the process of gath-
ering data by generating a single report which includes all the information needed to
46 8. INTERPRET AND COMMUNICATE PHASE
make a given decision. The example I used in describing how the SIPOC works in
Chapter 4 is this type of project.
Automation projects can also result in deployed models. An example of this is
the process control project I mentioned in Chapter 7. The process we were automating
was a manual adjustment of equipment parameters based on statistical process control
values. We developed and deployed a physical model that would automatically make
the adjustments.
Improving a business process typically includes developing standards for how
work is done. The data science deliverable for this type of project is a report or dash-
board that either is the standard, for example a report can provide one standard way
to extract and view certain data in order to make decisions, or measures the business
process and helps maintain the new systems. There can also be deployed models to sup-
port this type of change, depending on the level of automation in the business process.
Data mining projects result in new insights. If those insights are not communi-
cated, no business value is generated. This communication is usually done in meetings
through presentations. It can also be accomplished via emailed reports or through
writing a paper.
Lastly, projects which improve data science should result in a presentation or
paper to share that increased capability or new algorithm. You may wind up with im-
proved data science capability as a side benefit to a project. No matter if it is the main
intent or an additional outcome, sharing what you have learned with your organization
increases the ability of the organization as a whole. It is worth spending the time to
write up the learning as a paper or presentation.
fluence? What are the primary questions your audience will want to answer with the
information you are providing?
When I talk of reports, I mean automated reports or dashboards. When I talk
of presentations, I mean you sharing information typically in the form of a slide deck
to a group. When I talk of models, I mean AI models that make predictions. I will
tackle each one separately.
8.2 REPORTS
For reports there are three rules.
1. Keep it simple.
2. Keep it clear.
3. Use good visuals.
6. If your report is for an Israeli audience, flip it and put the most important information on the top right. This may also apply for Chinese or Japanese audiences. Ask about where your user expects the most important information to be on a page.
8.3 PRESENTATIONS
For presentations, there are four rules. The first three are in common with reports: (1)
keep it simple; (2) keep it clear; and (3) use good visuals. There is one additional rule
for presentations and that is, (4) make your presentation tell a story. The goal of your
presentation is to guide your audience, interpret the results of your findings to meet
their requirements, and highlight points of interest. Spending time to understand who
it is you will be presenting to gets you set up to accomplish those tasks.
The first rule is to keep it simple. For presentations, put your interpretation of
your findings at the very start of your slide deck. Knowing your audience helps here.
Do you need to provide background information for them to understand the context?
Is the forum you will be presenting to one of those that doesn’t let speakers get past
the first slide without a ton of questions? If it is that type of forum, build your presen-
tation to accommodate their style, by having one slide with the key point you want to
communicate, and then links to backup information to answer anticipated questions.
No matter what type of audience they are, keep to one idea per slide.
Keep your presentation clear. Don’t confuse your audience with all the analysis
you did to get to your final conclusion. Save that information in the backup of your
presentation in case of questions. Retain only the key graphs you made which got you
to the conclusion in your presentation, not every graph you generated in exploring
the data.
Part of keeping things clear is to not clutter your slides with extra information.
If you have a dense slide, consider separating the information into multiple slides, or
use builds to walk the audience through the information. Match what you say to what
is on the screen at that moment. A very good piece of advice is to plan out what you
will say for each slide and write it up either in the speaker notes section or in a separate
word document. Depending on how formal the presentation will be, consider prac-
ticing your presentation. If your presentation will be timed, practicing your delivery is
particularly important. Practicing will allow you to gauge if you have too much or too
little content and will help get you used to working within the time limits.
Think about what the key point is that you want to communicate. Since each
slide only has one idea, make that idea the title of the slide. For maximum impact, I
express my titles as headlines. For example, “Increased marketing budget correlates to
increased sales” is the title of a slide with a graph showing marketing budget versus
sales (Figure 8.1). Notice also how I have included a text box with my analysis and
suggested course of action. Each slide should have a key takeaway message. To make
that message clear to the audience, I include it in a text box at the bottom of the slide
and use a build to allow them time to read the graph, before sharing my conclusion.
[Figure 8.1: A scatter plot of Marketing Budget (dollars, $0.00 to $35,000.00 on the x-axis) versus Sales Revenue (dollars, $0.00 to $60,000.00 on the y-axis), annotated with a text box: "The more spent on marketing so far, the higher the sales revenue has been. It may be starting to top out at $30K. We should test that over the next two months by spending $32K each month." Footnote: Mocked-up data from marketing and sales database, extracted August 28, 2010 by Joyce Weiner.]
The third rule of presentations is to use good visuals. Use graphics and visual-
izations to underscore your point. As you present, verbally walk the audience through
your visualization, if they may not be familiar with reading that type of graph or table.
Include the data source and when the data was extracted as footnotes. Good visuals
often “grow legs,” meaning a graph that really explains something well is often copied
and used in other presentations. This means you did an excellent job in capturing an
insight in a visual. Make sure your name is on the visuals you create, so that when they
are shared, you get credit.
For presentations, make your presentation tell a story. Like a story, your presen-
tation should have a plot and a beginning, middle, and end. Telling a story makes your
presentation easier to follow and makes it memorable. In the beginning of the story,
let your audience know what to expect in your presentation. In the middle, deliver
the content and the value of the presentation, and in the end, summarize what you
covered. This is the classic, “Tell them what you’re going to tell them, tell them, and
tell them what you told them.” There is a reason it’s a classic—because it is effective.
Stories need plots. Some possible plots for presentations are: “My problem
and how I solved it,” “The current problem, options for solving it and the one I like,”
and “The current problem and the help I need from you.” In “my problem and how I
solved it” you are sharing data analysis used to identify solutions to a problem and ver-
ify that the problem has been solved. Or, you might be presenting on improvements
you made to model accuracy or a new algorithm. Another variation on this plot is
“my problem and how I found it” where you report on data mining analysis used to
uncover a problem.
In “the current problem, options for solving it and the one I like” you are pro-
viding analysis to support decision making. You are also providing analysis of possible
solutions and guiding the audience with your assessment of which is the best option
and why. In “The current problem and the help I need from you” you are presenting on
analysis of a problem and what resources are needed to move forward with a solution.
Of course, for all these stories, you are providing supporting evidence in the form of
charts, graphs, and tables.
8.4 MODELS
For models, interpretation of the information is built into the system that has been
put in place around the model and that uses the model’s output. For example, we have
a system that predicts the need for preventative maintenance. Based on the results of
the model’s prediction, the system will flag a user that maintenance is needed, or even
schedule maintenance through existing systems. If the model predicts a tool needs
adjustment, that might trigger a report with adjustment suggestions to engineering
or might trigger an automated adjustment. The method that is used depends on trust
level with that model, and how much experience the users have had with the model.
So, depending on the user’s level of comfort with the model, projects involving
a model might need reports, or they might need systems which interface with the
model. These systems need to provide the inputs required by the model to generate a
prediction and have rules or other methods to interpret that prediction.
CHAPTER 9
Deployment Phase
Begin planning for deployment from the beginning of your project. Excitement is a
common trap and when you start a new project, you may just want to get some data
and explore it and do some model building. The trouble is that then you have started
down a path without fully thinking it through. Taking the time at the beginning of the
project to think about deployment gives your project an edge and can help you beat the
odds and be one of the 13% of projects that get to the deployment phase.
Although tempting, don’t fall into the trap of minimally cleaning the data and
rushing to build a model. This method of execution of a data science project makes
deployment difficult because you haven’t thought about or planned for maintaining
the model in deployment. Once you have something created it is really disappointing
when you realize it is all wrong and needs to be scrapped. It is a much easier decision
to make at the beginning before you have spent any time building.
be enhanced, adjusted to meet those needs, or possibly that your report is no longer
needed and you can stop running it.
Using systems that are already in place means less work is needed to get your
report or model into production. Inserting a model into an existing system, or even
building the model in a spreadsheet is easier than creating an all new application or
building a website for your model. When I worked in manufacturing, all our factory
equipment had associated computers to control the equipment. It was very easy to
deploy models to that controlling computer and harness the existing systems.
9.2 DOCUMENTATION
A favorite quote of mine is, “documentation is a gift you give your future self.” Having
started with a project charter and a SIPOC, you have begun to document the project
at the very start. It is best to continue with this practice of documenting as you go,
rather than waiting until the very end to write the project documentation. When you
wait, the burden of remembering what you did and why can become a mountain of a
task and make documentation hard to complete.
Projects frequently fail for lack of good documentation. A project may
be implemented once but can’t easily be maintained because there was no transfer of
knowledge. A good model can’t be reused because it doesn’t run on someone else’s
computer due to undocumented dependencies. What often happens in these
cases is that teams redo the project, reinventing what already existed because
they are unable to use it.
When thinking about deployment, think about the documentation for the proj-
ect. Where will you store the code and the documentation? Make sure all information
about the project is in one place. This can be a wiki, a shared drive, or a code repository.
While you are documenting your project, take the time to write up all the deci-
sions you made and why you made them. Did you investigate multiple models before
selecting the one that worked the best? That is great information to have for the future,
and for sharing with other teams in your organization. Did you select a particular lan-
guage to use for scripting because it was easy to interface into existing systems? Again,
great information to capture and keep for the long term.
At the end of the project, when it has been deployed, take time to reflect and
document the learnings you had over the course of the project. This should include
things like new insights that were gained, new algorithms that were developed, and
general learning that occurred. Also, take the time to go back and revisit the project
charter. Did you accomplish what you planned to do? Why or why not? Write up your
reflections and include them with the final business value delivered by your project.
9.3 MAINTENANCE
A big consideration in the deployment phase is who will maintain the report or model.
The SIPOC is helpful in defining this as it gives information about who will be using
the model and what they expect. If your user is expecting 24×7 support, you need to
plan for that before putting your solution into production.
I ran into the problem of not planning for maintenance early in my career. I
developed a report to make it easier for manufacturing to make some production deci-
sions, and my report stopped working at 2 am. Of course, I was called in the middle of
the night to fix the report and get production running again. While that was all right
as a one-time solution, it would not work in the long term. I needed to convert my
report to work with existing systems that were supported on a round-the-clock basis.
If I had thought this through at the beginning, it would have prevented a scramble
and a re-write of the report.
Before your project goes into deployment, think about how you will know if the
report or model has stopped working. For a model, this is about error checking and
verification. Will you have a defined testing cycle? Can you detect errors automatically?
For a report this can be as simple as having a programmatically generated time
stamp at the top. The user can then check to see that the report has updated before
using the information. There is nothing worse than learning that decisions have been
made based on a report or dashboard that hasn’t been updated in two weeks. I include
a timestamp and a support phone number or email in my reports, so if a user identifies
that the report has stopped, they can contact the report owner or support team and
notify them of the problem.
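As a minimal sketch of this practice (the function name and support address are illustrative placeholders, not from the book), a report script can build its own header at generation time:

```python
from datetime import datetime, timezone

def report_header(support_contact: str) -> str:
    """Build a header line for a generated report.

    The timestamp is created at generation time, so a user can see at
    a glance whether the report is current. `support_contact` is a
    placeholder for your real report owner or support team address.
    """
    generated = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (f"Report generated: {generated} | "
            f"Problem with this report? Contact {support_contact}")

print(report_header("report-support@example.com"))
```

Placing a header like this at the top of every refresh gives users the stale-report check described above without any extra infrastructure.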
Think about when you will update the model, and what will trigger an update.
The same goes for reports. Will you establish a time-based update cycle? Is there a
specific event that would trigger an update? For a model, you can measure accuracy,
and if it drops below a certain threshold, trigger retraining of the model. Sometimes
there are external factors that influence when you should retrain, such as changes to
the process that the model is built for, or changes to automation.
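The accuracy-threshold trigger described above can be sketched as follows; the 5-point tolerance is an illustrative value, and in practice you would choose one that matches the cost of a wrong prediction in your setting:

```python
def needs_retraining(recent_accuracy: float,
                     deployment_accuracy: float,
                     tolerance: float = 0.05) -> bool:
    """Flag a model for retraining when accuracy measured on recent
    labeled data drops more than `tolerance` below the accuracy
    recorded at deployment time."""
    return recent_accuracy < deployment_accuracy - tolerance

# Deployed at 92% accuracy; this week's labeled sample scored 84%.
print(needs_retraining(0.84, 0.92))  # True: the 8-point drop exceeds 5
```

In production such a check would run on a schedule against a freshly labeled sample; the retraining it triggers can then be a manual or an automated step.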
Finally, think about the expected lifetime for your project. What will trigger
obsolescence? It is unlikely that you will be running the same report or model forever.
How will you know if it is no longer being used? For reports, I like to have some way
of telling whether users are interacting with or viewing them. When that usage count falls off, it’s
time to have a conversation with the main decision maker about the usefulness of the
report. At this point you have two choices: you can update the report to meet the new
needs, or you can cancel it because it is no longer useful.
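A usage check like the one described can be sketched as follows; the 30-day window and five-view floor are placeholder thresholds, not recommendations from the book:

```python
from datetime import date, timedelta

def usage_has_fallen_off(view_dates: list[date],
                         window_days: int = 30,
                         min_views: int = 5) -> bool:
    """Return True when a report drew fewer than `min_views` views in
    the last `window_days` days, signaling it may be time to talk to
    the decision maker about updating or retiring the report."""
    cutoff = date.today() - timedelta(days=window_days)
    recent_views = [d for d in view_dates if d >= cutoff]
    return len(recent_views) < min_views

# Six views today clearly keeps the report alive; no views does not.
print(usage_has_fallen_off([date.today()] * 6))  # False
print(usage_has_fallen_off([]))                  # True
```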
When you no longer need a report or model, having documentation about all
the pieces and dependencies is incredibly helpful in completing end of life tasks. You
don’t want to remove or delete something that other models, reports, or teams rely on.
Cleaning up after a report or model frees up shared resources (compute, storage, etc.).
Finally, plan for the long-term support of your reports and models. Either build that
capability within your team, or establish systems and methods for transferring projects to
other teams for long-term support.
CHAPTER 10
3. the model couldn’t be explained, hence there was lack of trust in the
solution;
4. the model was too complex and therefore was difficult to maintain; and
2. get alignment;
3. keep it simple;
Ask these questions before you start gathering data or building a model. Use the project charter and SIPOC analysis
tool to guide you in asking these questions.
Asking questions is not a one-and-done activity. Continue to check in with
the project stakeholders and customers as you go to ensure you are solving the right
problem and meeting their requirements. Update the charter and SIPOC as you gain
clarity and learn more about your customer’s needs. Show your customer the results of
your initial data exploration and ask for feedback. Ask about explainability, and how
the model will be used. Ask about long-term considerations like who will own main-
taining the model. Ask for feedback at the end of the project.
Consider simpler models, like physical models or simple tree models, to support
explainability. Take advantage of the new explainability techniques that are currently
being researched.
References
Beck, K. and Beedle, M. (2001). Principles behind the Agile Manifesto. Retrieved
from agilemanifesto.org: https://agilemanifesto.org/principles.html. 10
Berinato, S. (2016). Visualizations that really work. Retrieved from Harvard Business
Review: https://hbr.org/2016/06/visualizations-that-really-work. 49
Deming, W. E. (1993). A bad system will beat a good person every time. Retrieved
from The W. Edwards Deming Institute: https://deming.org/a-bad-system-
will-beat-a-good-person-every-time/. 17
Few, S. (2012). Show Me the Numbers: Designing Tables and Graphs to Enlighten. El
Dorado Hills, CA: Analytics Press. 48
Few, S. (2013). Information Dashboard Design. Analytics Press. 49
George, M. L., Rowlands, D., and Kastle, B. (2003). What is Lean Six Sigma?
McGraw-Hill Education. 2
Johnson, G. (2014). Designing with the Mind in Mind: Simple Guide to Understanding
User Interface Design Guidelines. 2nd Edition. Morgan Kaufmann. 48
Knaflic, C. N. (2015). Storytelling With Data: A Data Visualization Guide for
Business Professionals. Hoboken, NJ: John Wiley and Sons, Inc. DOI:
10.1002/9781119055259. 48
Manturewicz, M. (2019). What is CI/CD—all you need to know. Retrieved from
https://codilime.com/; https://codilime.com/what-is-ci-cd-all-you-need-
to-know/. 44
Oxford Languages. (2020). Artificial intelligence definition. Retrieved from google.
com: https://tinyurl.com/y6zwlnkw. xi
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). “Why should I trust you?”: Explain-
ing the predictions of any classifier. arXiv 1602.04938 [cs.LG]. Retrieved
from https://arxiv.org/abs/1602.04938. DOI: 10.18653/v1/N16-3020. 42
Royal Society. (2019). Explainable AI: the basics Policy Briefing. Retrieved from Royal
Society: https://www.exploreaiethics.com/reports/explainable-ai-the-ba-
sics/. 15
Segel, E. and Heer, J. (2010). Narrative visualization: Telling stories with data. IEEE
Transactions on Visualization and Computer Graphics (Proc. InfoVis). DOI:
10.1109/TVCG.2010.179. 49
Stone, M. (2006). Choosing colors for data visualization. Retrieved from Perceptual
Edge: https://www.perceptualedge.com/articles/b-eye/choosing_colors.pdf.
48
Tufte, E. (2001). The Visual Display of Quantitative Information. Cheshire, CT: Graph-
ics Press. 48
Tufte, E. (1990). Envisioning Information. Cheshire, CT: Graphics Press. 48
Tufte, E. (2006). Beautiful Evidence. Cheshire, CT: Graphics Press. 48
VB Staff. (2019). Why do 87% of data science projects never make it into production?
Retrieved from Venturebeat.com: https://venturebeat.com/2019/07/19/
why-do-87-of-data-science-projects-never-make-it-into-production/. vi, 1
Author Biography