Data Science For Supply Chain Forecasting 2nd Edition Extract
Nicolas Vandeput
Second Edition
In Raphael's fresco The School of Athens, we can see philosophers—practitioners and
theorists alike—debating and discussing science. I like to use the same approach when
working on projects: discussing ideas, insights, and models with other supply chain data
scientists worldwide. It is always a pleasure for me.
For the second edition of Data Science for Supply Chain Forecasting, I surrounded myself
with a varied team of supply chain practitioners who helped me to review chapter after
chapter, model after model. I would like to thank each of them for their time, dedication,
and support.
I would like to thank my dear friend Gwendoline Dandoy for her work on this book. She
helped me to make every single chapter as clear and as simple as possible. She worked
tirelessly on this book, as she did on my previous book Inventory Optimization: Models and
Simulations. Along with her help to review this book, I could always count on her support,
iced tea, and kindness. Thank you, Gwendoline.
Thanks to Mike Arbuzov for his help with the advanced machine learning models. It is
always a pleasure to exchange ideas with such a data science expert.
I had the chance this year to be surrounded by two inspiring, brilliant entrepreneurs: João
Paulo Oliveira, co-founder of BiLD analytic (a data science consultancy company) and
Edouard Thieuleux, founder of AbcSupplyChain (do not hesitate to visit his website abcsupplychain.com for supply chain training material and coaching). I would like to thank
them both for their help reviewing the book. It is always a pleasure to discuss models and
entrepreneurship with them.
I would like to thank Michael Gilliland (author of The Business Forecasting Deal—the
book that popularized the forecast value added framework) for his help and numerous
points of advice on Part III of the book.
I was also helped by a group of talented supply chain practitioners, both consultants and
in-house experts. I would like to thank Léo Ducrot for his help on the third part of the book.
Nicolas Vandeput
September 2020
[email protected]
First Edition
Discussing problems, models, and potential solutions has always been one of my favorite
ways to find new ideas—and test them. As with any other big project, when I started to
write Data Science for Supply Chain Forecasting, I knew discussions with various people
would be needed to receive feedback. Thankfully, I have always been able to count on
many friends, mentors, and experts to share and exchange these thoughts.
First and foremost, I want to express my thanks to Professor Alassane Ndiaye, who has
been a true source of inspiration for me ever since we met in 2011. Not only does Alassane
have the ability to maintain the big picture and stay on course in any situation—especially
when it comes to supply chain—but he also has a sense of leadership that encourages
each and every one of us to shine and come face to face with their true potential. Thank you for
your trust, your advice, and for inspiring me, Alassane.
Furthermore, I would like to thank Henri-Xavier Benoist and Jon San Andres from Bridgestone for their support, confidence, and the many opportunities they have given me. Together, we have achieved many fruitful endeavors, knowing that many more are to come in the future.
Of course, I also need to mention Lokad’s team for their support, vision, and their incredible
ability to create edge models. Special thanks to Johannes Vermorel (CEO and founder)
for his support and inspiration—he is a real visionary for quantitative supply chain models.
I would also like to thank the all-star team, Simon Schalit, Alexandre Magny, and Rafael
de Rezende for the incredible inventory model we have created for Bridgestone.
There are few passionate professionals in the field of supply chains who can deal with the
business reality and the advanced quantitative models. Professor Bram De Smet is one of
those. He has inspired me, as well as many other supply chain professionals around the
globe. In February 2018, when we finally got the chance to meet in person, I shared my
idea of writing a book about supply chain and data science. He simply said, “Just go for it
and enjoy it to the fullest.” Thank you, Bram, for believing in me and pushing me to take
that first step.
Just like forests are stronger than a single tree by itself, I like to surround myself with
supportive and bright friends. I especially would like to thank each and every one of the
following amazing people for their feedback and support: Gil Vander Marcken, Charles Hoffremont, Bruno Deremince, Emmeline Everaert, Romain Faurès, Alexis Nsamzinshuti,
François Grisay, Fabio Periera, Nicolas Pary, Flore Dargent, and Gilles Belleflamme. And
of course, a special thanks goes to Camille Pichot. They have all helped me to make this
book more comprehensive and more complete. I have always appreciated feedback from
others to improve my work, and I would never have been able to write this book without
the help of this fine team of supportive friends.
On another note, I would also like to mention Daniel Stanton for the time he took to share
his experience with business book publishing with me.
Last but not least, I would like to truly thank Jonathan Vardakis. Without his dedicated
reviews and corrections, this book would simply not have come to its full completion.
Throughout this collaboration, I have realized that we are a perfect fit together to write
a book. Many thanks to you, Jon.
Nicolas Vandeput
November 2017
[email protected]
About the Author
Nicolas Vandeput is a supply chain data scientist specialized in demand forecasting and
inventory optimization. He founded his consultancy company SupChains in 2016 and co-
founded SKU Science—a smart online platform for supply chain management—in 2018.
He enjoys discussing new quantitative models and how to apply them to business reality.
Passionate about education, Nicolas is both an avid learner and an enthusiastic university teacher; he has taught forecasting and inventory optimization to master's students in Brussels, Belgium, since 2014. He published Data Science for Supply Chain Forecasting in
2018 and Inventory Optimization: Models and Simulations in 2020.
Contents
Introduction xvii
I Statistical Forecasting 1
1 Moving Average 3
1.1 Moving Average Model 3
1.2 Insights 4
1.3 Do It Yourself 6
2 Forecast KPI 11
2.1 Forecast Error 11
2.2 Bias 13
2.3 MAPE 15
2.4 MAE 17
2.5 RMSE 19
2.6 Which Forecast KPI to Choose? 22
3 Exponential Smoothing 29
3.1 The Idea Behind Exponential Smoothing 29
3.2 Model 30
3.3 Insights 33
3.4 Do It Yourself 36
4 Underfitting 41
4.1 Causes of Underfitting 42
4.2 Solutions 44
5.3 Insights 48
5.4 Do It Yourself 52
6 Model Optimization 57
6.1 Excel 57
6.2 Python 61
8 Overfitting 71
8.1 Examples 72
8.2 Causes and Solutions 74
10 Outliers 93
10.1 Idea #1 – Winsorization 94
10.2 Idea #2 – Standard Deviation 97
10.3 Idea #3 – Error Standard Deviation 100
10.4 Go the Extra Mile! 102
13 Tree 133
13.1 How Does It Work? 134
13.2 Do It Yourself 137
15 Forest 151
15.1 The Wisdom of the Crowd and Ensemble Models 151
15.2 Bagging Trees in a Forest 152
15.3 Do It Yourself 154
15.4 Insights 158
23 Clustering 229
23.1 K-means Clustering 230
23.2 Looking for Meaningful Centers 232
23.3 Do It Yourself 235
Bibliography 295
Glossary 301
Index 305
Foreword – Second Edition
In a recent interview, I was asked what the two most promising areas in the field of
forecasting were. My answer was “Data Science” and “Supply Chain,” which, combined, were going to fundamentally shape the theory and practice of forecasting in the future, providing
unique benefits to business firms able to exploit their value. The second edition of Data
Science for Supply Chain Forecasting is essential reading for practitioners in search of
information on the newest developments in these two fields and ways of harnessing their
advantages in a pragmatic and useful way.
Nicolas Vandeput, a supply chain data scientist, is an academic practitioner perfectly knowledgeable in the theoretical aspects of the two fields, having authored two successful books,
but who is also a consultant, having founded a successful company, and well connected
with some of the best known practitioners in the supply chain field as referenced in his
acknowledgments. The third part of this book has benefited from advice from another
prominent practitioner, Michael Gilliland, author of The Business Forecasting Deal (2nd ed.), while Evangelos Spiliotis, my major collaborator in the M4 and M5 competitions, has
reviewed in detail the statistical models included in the book.
Nicolas’ deep academic knowledge combined with his consulting experience and frequent
interactions with experienced experts in their respective fields are the unique ingredients of
this well-balanced book covering equally well both the theory and practice of forecasting.
This second edition expands on his successful book, published in 2018, with more than 50% new content and a large second part of over 150 pages describing machine learning (ML) methods, including gradient boosting methods similar to LightGBM, which won the M5 accuracy competition and was used by the great majority of the top 50 contestants. In addition, there are two new chapters in the third part of the book, covering the critical areas of judgmental forecasting and forecast value added, aimed at guiding the effective supply chain implementation process within organizations.
The objective of Data Science for Supply Chain Forecasting is to show practitioners how
to apply the statistical and ML models described in the book in simple and actionable “do-
it-yourself” ways by showing, first, how powerful the ML methods are, and second, how
to implement them with minimal outside help, beyond the “do-it-yourself” descriptions
provided in the book.
Tomorrow’s supply chain is expected to provide many improved benefits for all stakeholders,
and across much more complex and interconnected networks than the current supply chain.
Today, the practice of supply chain science is striving for excellence: innovative and integrated solutions are based on new ideas, new perspectives and new collaborations, thus
enhancing the power offered by data science.
This opens up tremendous opportunities to design new strategies, tactics and operations
to achieve greater anticipation, a better final customer experience and an overall enhanced
supply chain.
As supply chains generally account for between 60% and 90% of all company costs (excluding financial services), any drive toward excellence will undoubtedly be equally impactful
on a company’s performance as well as on its final consumer satisfaction.
This book, written by Nicolas Vandeput, is a carefully developed work emphasizing how
and where data science can effectively lift the supply chain process higher up the excellence
ladder.
This is a gap-bridging book from both the research and the practitioner’s perspectives; it is a great source of information and value.
Firmly grounded in scientific research principles, this book deploys a comprehensive set
of approaches particularly useful in tackling the critical challenges that practitioners and
researchers face in today’s and tomorrow’s (supply chain) business environment.
In the same way electricity revolutionized the second half of the 19th century, allowing
industries to produce more with less, artificial intelligence (AI) will drastically impact the
decades to come. While some companies already use this new electricity to cast new light
upon their business, others are still using old oil lamps or even candles, using manpower to
manually change these candles every hour of the day to keep the business running.
As you will discover in this book, AI and machine learning (ML) are not just a question of
coding skills. Using data science to solve a problem will require a scientific mindset more than coding skills. We will discuss many different models and algorithms in the later chapters. But as you will see, you do not need to be an IT wizard to apply these models. There
is another more important story behind these: a story of experimentation, observation, and
questioning everything—a truly scientific method applied to supply chain. In the field of
data science as well as supply chain, simple questions do not come with simple answers.
To answer these questions, you need to think like a scientist and use the right tools. In
this book, we will discuss how to do both.
Supply Chain Forecasting Within all supply chains lies the question of planning. The
better we evaluate the future, the better we can prepare ourselves. The question of future
uncertainty, how to reduce it, and how to protect yourself against this unknown has always
been crucial for every supply chain. From negotiating contract volumes with suppliers to
setting safety stock targets, everything relates to the ultimate question:
Old-school Statistics and Machine Learning One could think that these statistical
models are already outdated and useless as machine learning models will take over. But
this is wrong. These old-school models will allow us to understand and see the demand
patterns in our supply chain. Machine learning models, unfortunately, won't provide us with any explanation or understanding of the different patterns. Machine learning is only focused
on one thing: getting the right answer. The how does not matter. This is why both the
statistical models and the machine learning models will be helpful for you.
Concepts and Models The first two parts of this book are divided into many chapters:
each of them is either a new model or a new concept. We will start by discussing statistical
models in Part I, then machine learning models in Part II. Both parts will start with simple
models and end with more powerful (and complex) ones. This will allow you to build your
understanding of the field of data science and forecasting step by step. Each new model
or concept will allow us to overcome a limitation or to go one step further in terms of
forecast accuracy.
On the other hand, not every single existing forecast model is explained here. We will only
focus on the models that have proven their value in the world of supply chain forecasting.
Do It Yourself We also made the decision not to use any black-box forecasting function
from Python or Excel. The objective of this book is not to teach you how to use software.
It is twofold. Its first purpose is to teach you how to experiment with different models on
your own datasets. This means that you will have to tweak the models and experiment
with different variations. You will only be able to do this if you take the time to implement
these models yourself. Its second purpose is to allow you to acquire in-depth knowledge
on how the different models work as well as their strengths and limitations. Implementing
the different models yourself will allow you to learn by doing as you test them along the
way.
At the end of each chapter, you will find a Do It Yourself (DIY) section that will show
you a step-by-step implementation of the different models. I can only advise you to start
testing these models on your own datasets ASAP.
4 Even though we will focus on supply chain demand forecasting, the principles and models explained in this book can be applied to other forecasting problems as well.
This is especially true for machine learning, where there is definitely no one-size-fits-all model or silver bullet:
machine learning models need to be tailor-fit to the demand patterns at hand.
You do not need technical IT skills to start using the models in this book today. You
do not need a dedicated server or expensive software licenses—only your own computer.
You do not need a PhD in Mathematics: we will only use mathematics when it is directly
useful to tweak and understand the models. Often—especially for machine learning—a
deep understanding of the mathematical inner workings of a model will not be necessary
to optimize it and understand its limitations.
Unfortunately, many people forget that experimenting means trial and error, which
means that you will face the risk of failing. Experimenting with new ideas and models
is not a linear task: days or weeks can be invested in dead-ends. On the other hand,
a single stroke of genius can drastically improve a model. What is important is to fail
fast and start a new cycle rapidly. Don’t get discouraged by a couple of trials without
improvement.
Automation is the key to fast experimentation. As you will see, we will need to run
tens, hundreds, or even thousands of experiments on some datasets to find the best
model. In order to do so, only automation can help us out. It is tremendously important
to keep our data workflow fully automated to be able to run these experiments without
any hurdle. Only automation will allow you to scale your work.
Automation is the key to reliability. As your model grows in complexity and your
datasets grow in size, you will need to be able to reliably populate results (and act
upon them). Only an automated data workflow coupled with an automated model will
give you reliable results over and over again. Manual work will slow you down and create
random mistakes, which will result in frustration.
Don’t get misled by overfitting and luck. As we will see in Chapter 8, overfitting
(i.e., your model will work extremely well on your current dataset, but fail to perform
well on new data) is the number one curse for data scientists. Do not get fooled by
luck or overfitting. You should always treat astonishing results with suspicion and ask
yourself the question: Can I replicate these results on new (unseen) data?
Sharing is caring. Science needs openness. You will be able to create better models if
you take the time to share their inner workings with your team. Openly sharing results
(good and bad) will also create trust among the team. Many people are afraid to share
bad results, but it is worth doing so. Sharing bad results will allow you to trigger a
debate among your team to build a new and better model. Maybe someone external
will bring a brand-new idea that will allow you to improve your model.
Simplicity over complexity. As a model grows bigger, there is always a temptation
to add more and more specific rules and exceptions. Do not go down this road. As
more special rules add up in a model, the model will lose its ability to perform reliably well on new data. And soon you will lose the understanding of all the different
interdependencies. You should always prefer a structural fix to a specific new rule (also known as a quick fix). As the pile of quick fixes grows bigger, the potential number of interdependencies will increase exponentially, and you will not be able to identify why your model works in a specific way.
Communicate your results. Communication skills are important for data scientists.
Communicating results in the best way is also part of the data science process: clarity comes from simplicity. When communicating your results, always ask yourself the
following questions:
- Who am I communicating to?
Perfection is achieved not when there is nothing more to add, but when there
is nothing left to take away.
Antoine de Saint-Exupéry
Excel
Excel is the data analyst’s Swiss knife. It will allow you to easily perform simple calculations
and to plot data. The big advantage of Excel compared to any programming language is
that we can see the data. It is much easier to debug a model or to test a new one if you
see how the data is transformed at each step of the process. Therefore, Excel can be the
first go-to in order to experiment with new models or data.
Excel also has many limitations. It won’t perform well on big datasets and will hardly allow
you to automate difficult tasks.
Python
Python is a programming language initially published in 1991 by Guido van Rossum, a
Dutch computer scientist. If Excel is a Swiss knife, Python is a full army of construction
machines awaiting instructions from any data scientist. Python will allow you to perform
computations on huge datasets in a fast, automated way. Python also comes with many
libraries dedicated to data analysis (pandas), scientific computations (NumPy and SciPy),
or machine learning (scikit-learn).5 These will soon be your best friends.
5 See Pedregosa et al. (2011); Virtanen et al. (2020); Oliphant (2006); Hunter (2007); McKinney
(2010).
Why Python? We chose to use Python over other programming languages as it is both
user-friendly (it is easy to read and understand) and one of the most used programming
languages in the world.
The code shown throughout this book is not necessarily the most concise that an experienced Python user could produce, but the implementations are easy to understand—which is the primary goal here.
Python Libraries Throughout the book, we will use some of Python's well-known libraries. As you can see below, we will use the usual import conventions. For the sake of clarity, we won't show the import lines over and over again in each code extract.
1 import numpy as np
2 import pandas as pd
3 import scipy.stats as stats
4 from scipy.stats import norm
5 import matplotlib.pyplot as plt
Other Resources
You can download the Python code shown in this book as well as the Excel templates on
supchains.com/resources-2nd (password: XXXXXXXXXX). There is also a Glossary (and an
Index) at the end of the book, where you can find a short description of all the specific
terms we will use. Do not hesitate to consult it if you are unsure about a term or an
acronym.
Part I
Statistical Forecasting
Chapter 1
Moving Average
The first forecast model that we will develop is the simplest. As supply chain data scientists,
we love to start experimenting quickly. First, with a simple model, then with more complex
ones. Hence, this chapter is more of a pretext to set up our first forecast function in
Python and our Excel template—we will use both in all of the following chapters.
The moving average model simply forecasts the demand of the next period as the average of the last n demand observations:

f_t = \frac{1}{n}\sum_{i=1}^{n} d_{t-i}

where
f_t is the forecast for period t,
n is the number of periods we take the average of, and
d_t is the demand during period t.
Initialization As you will see for further models, we always need to discuss how to initialize the forecast for the first periods. For the moving average method, we won't have a
forecast until we have enough historical demand observations. So the first forecast will be
done for t = n + 1.
Future Forecast Once we are out of the historical period, we simply define any future
forecast as the last forecast that was computed based on historical demand. This means
that, with this model, the future forecast is flat. This will be one of the major restrictions
of this model: its inability to extrapolate any trend.
Notation
In the scientific literature, you will often see the output you want to predict noted as
y. This is due to the mathematical convention where we want to estimate y based on x. A prediction (a forecast in our case) would then be noted ŷ. This hat represents the idea that we do an estimation of y. To make our models and equations as simple to read
and understand as possible, we will avoid this usual convention and use something more
practical:
Demand will be noted as d
Forecast will be noted as f
When we want to point to a specific occurrence of the forecast (or the demand) at time
t, we will note it ft (or dt ). Typically:
d0 is the demand at period 0 (e.g., first month, first day, etc.)
f0 is the forecast for the demand of period 0
We will call the demand of each period a demand observation. For example, if we measure
our demand on monthly buckets, it means that we will have 12 demand observations per
year.
1.2 Insights
In Figure 1.1, we have plotted two different moving average forecasts.
As you can see, the moving average forecast where n = 1 is a rather specific case:
the forecast is the demand with a one-period lag. This is what we call a naïve forecast:
“Tomorrow will be just as today.”
Naïve Forecast
A naïve forecast is the simplest forecast model: it always predicts the last available
observation.
A naïve forecast is interesting as it will instantly react to any variation in the demand, but
on the other hand, it will also be sensitive to noise and outliers.
Figure 1.1: Moving averages. (The plot shows the demand together with two moving average forecasts, one with n = 8 and one with n = 1, over time.)
Noise
In statistics, the noise is an unexplained variation in the data. It is often due to the
randomness of the different processes at hand.
To decrease this sensitivity, we can go for a moving average based on more previous
demand observations (n > 1). Unfortunately, the model will also take more time to react
to a change in the demand level. In Figure 1.1, you can observe that the moving average
with n = 8 takes more time to react to the changing demand level during the first phase.
But during the second phase, the forecast is more stable than the naïve one. We have
to make a trade-off between reactivity and smoothness. As you will see, we will have
to make this trade-off repeatedly in all the exponential smoothing models that we will see
later.
Limitations
There are three main limitations at the core of a moving average.
1. No Trend The model does not see any trend (and therefore won’t project any).
We will learn how to include those in Chapters 5 and 7.
2. No Seasonality The model will not properly react to seasonality. We will include
seasonality in Chapters 9 and 11.
3. Flat Historical Weighting A moving average will allocate an equal weight to all
the historical periods that are taken into account. For example, if you use a moving average with n = 4, you allocate a weight (importance) of 25% to each of the last four periods. But the latest observation should somehow be more important than the one
four periods ago. For example, if you want to forecast June based on the demand
of the latest months, you should give more importance to the demand observed in
May compared to any other month.
That said, you might also want to give some importance to the demand observed five periods ago: May might be the most interesting month, but there may be some interest in looking at last December as well.
We will solve this in Chapter 3 by using an exponential—rather than a flat—weighting of the historical periods.
1.3 Do It Yourself
Excel
Let’s build an example with a moving average model based on the last three demand
occurrences (n = 3), as shown in Figure 1.2, by following these steps:
1. We start our data table by creating three columns:
Date in column A
Demand in column B
Forecast in column C
You can define the first line as Date = 1 (cell A2=1) and increase the date by one
on each line.
2. For the sake of the example, we will always use the same dummy demand in Excel.
You can type these numbers (starting on date 1 until date 10): 37, 60, 85, 112,
132, 145, 179, 198, 150, 132.
3. We can now define the first forecast on date 4. You can simply use the formula
C5=AVERAGE(B2:B4) and copy and paste it until the end of the table. You can continue until row 12 (date 11), which will be the last forecast based on historical
demand.
4. The future forecasts will all be equivalent to this last point. You can use C13=C12
and copy and paste this formula as far as you want to have a forecast. In the
example, we go until date 13.
5. You should now have a table that looks like Figure 1.2.
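As a quick sanity check, the first forecast (date 4, cell C5) should be AVERAGE(37, 60, 85) ≈ 60.7, and the last forecast based on historical demand (date 11, cell C12) should be AVERAGE(198, 150, 132) = 160.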
Python
If you are new to Python and you want to get a short introduction, I introduce the most
useful concepts in Appendix A. This will be enough to understand the first code extracts
and learn along the way.
Throughout the book, we will implement multiple models in separate functions. These
functions will be convenient as they will all use the same kind of inputs and return similar
outputs. These functions will be the backbone of your statistical forecast toolbox, as by
keeping them consistent you will be able to use them in an optimization engine, as shown
in Chapter 6.
We will define a function moving_average(d, extra_periods=1, n=3) that takes three
inputs:
d A time series that contains the historical demand (can be a list or a NumPy array)
extra_periods The number of periods we want to forecast in the future
n The number of periods we will average
15 f[t+1:] = np.mean(d[t-n+1:t+1])
20 return df
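Only two lines of the listing (15 and 20) are reproduced in this extract. A minimal sketch of the complete function, consistent with the inputs and outputs described in this section and with the import conventions shown in the Introduction, could look as follows; the book's exact listing may differ.

def moving_average(d, extra_periods=1, n=3):
    # Transform the input into a NumPy array and measure the historical length
    d = np.array(d)
    cols = len(d)
    # Append np.nan placeholders to cover the future periods we want to forecast
    d = np.append(d, [np.nan] * extra_periods)
    # Forecast array, initialized with np.nan dummy values
    f = np.full(cols + extra_periods, np.nan)
    # Create all the forecasts over the historical period
    for t in range(n, cols):
        f[t] = np.mean(d[t-n:t])
    # Future forecast: flat, equal to the last forecast based on historical demand
    f[t+1:] = np.mean(d[t-n+1:t+1])
    # Error defined here as demand minus forecast, matching the sign of the bias
    # shown in Chapter 2 (flip the sign if you prefer the e_t = f_t - d_t convention)
    df = pd.DataFrame.from_dict({'Demand': d, 'Forecast': f, 'Error': d - f})
    return df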
If you are new to Python, you will notice that we have introduced two special
elements in our code:
np.nan is a way to represent something that is not a number (nan stands for
not a number).
In our function, we used np.nan to store dummy values in our array f, until they
get replaced by actual values. If we had initialized f with actual digits (e.g., ones
or zeros), this could have been misleading as we wouldn’t know if these ones or
zeros were actual forecast values or just the dummy initial ones.
np.full(shape, value) returns an array of the given shape, filled with the given value.
In our function, we used np.full() to create our forecast array f.
You can easily plot any DataFrame simply by calling the method .plot() on it. Typically,
if you want to plot the demand and the forecast that we just populated, you can simply
type:
1 df[['Demand','Forecast']].plot()
By default, .plot() will use the DataFrame index as the x axis. Therefore, if you want to display a label on the x axis, you can simply name the DataFrame index.
1 df.index.name = 'Period'
Figure 1.3: .plot() output. (The plot shows the demand and the moving average forecast against the period index, with the flat future forecast at the end.)
As you can see in Figure 1.3, the future forecast (as of period 20) is flat. As discussed,
this is due to the fact that this moving average model does not see a trend and, therefore,
can’t project any.
Chapter 2
Forecast KPI
Accuracy
The accuracy of your forecast measures how much spread you had between your
forecasts and the actual values. The accuracy gives an idea of the magnitude of
the errors, but not their overall direction.
Bias
The bias represents the overall direction of the historical average error. It measures
if your forecasts were on average too high (i.e., you overshot the demand) or too
low (i.e., you undershot the demand).
Of course, as you can see in Figure 2.1, what we want to have is a forecast that is both
accurate and unbiased.
et = ft − dt
Note that with this definition, if the forecast overshoots the demand, the error will be
positive; if the forecast undershoots the demand, the error will be negative.
DIY
Excel You can easily compute the error as the forecast minus the demand.
Starting from our example from Section 1.3, you can do this by inputting =C5-B5 in cell D5. This formula can then be dragged down over the range D5:D11.
Python You can access the error directly via df['Error'] as it is included in the
DataFrame returned by our function moving_average(d).
2.2 Bias
The (average) bias of a forecast is defined as its average error.
bias = \frac{1}{n}\sum e_t
Where n is the number of historical periods where you have both a forecast and a demand
(i.e., periods where an error can be computed).
The bias alone won’t be enough to evaluate your forecast accuracy. Because a positive
error in one period can offset a negative error in another period, a forecast model can
achieve very low bias and not be accurate at the same time. Nevertheless, a highly biased
forecast is already an indication that something is wrong in the model.
Scaling the Bias The bias computed as in the formula above will give you an absolute
value like 43 or -1400. As a demand planner investigating your product forecasts, you
should ask yourself the following question: Is 43 a good bias? Without information about
the product’s average demand, you cannot answer this question. Therefore, a more relevant
KPI would be the scaled bias (or normalized bias). We can compute it by dividing the total
error by the total demand (which is the same as dividing the average error by the average
demand).
bias\% = \frac{\frac{1}{n}\sum e_t}{\frac{1}{n}\sum d_t} = \frac{\sum e_t}{\sum d_t}
Pro-Tip
It usually brings no insights to compute the bias of one item during one period.
You should either compute it for many products at once (during one period) or
compute it for a single item over many periods (best to perform its computation
over a full season cycle).
When computing the scaled bias, it is important to divide the bias by the average demand
during the periods where we could measure forecast error. We define the relevant demand
average in line 2, where we select only the rows where an error is defined.
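The kpi() listing itself is not reproduced in this extract. A minimal sketch, consistent with the description above (the demand average is computed in line 2, over the rows where an error is defined) and with the output below, could be the following; the book's own implementation may differ.

def kpi(df):
    dem_ave = df.loc[df['Error'].notnull(), 'Demand'].mean()
    bias_abs = df['Error'].mean()
    bias_rel = bias_abs / dem_ave
    print('Bias: {:0.2f}, {:.2%}'.format(bias_abs, bias_rel))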
Finally, you should obtain these results:
1 d = [37, 60, 85, 112, 132, 145, 179, 198, 150, 132]
2 df = moving_average(d, extra_periods=4, n=3)
3 kpi(df)
4 >> Bias: 22.95, 15.33%
2.3 MAPE
The Mean Absolute Percentage Error (or MAPE) is one of the most commonly used
KPIs to measure forecast accuracy. MAPE is computed as the average of the individual
absolute errors divided by the demand (each period is divided separately). To put it simply,
it is the average of the percentage absolute errors.
MAPE = \frac{1}{n}\sum \frac{|e_t|}{d_t}
MAPE is a strange forecast KPI. It is quite well-known among business managers, despite
being a really poor accuracy indicator. As you can see in the formula, MAPE divides each
error individually by the demand, so it is skewed: high errors during low-demand periods
will have a major impact on MAPE. You can see this from another point of view: if you
choose MAPE as an error KPI, an extremely low forecast (such as 0) can only result in a
maximum error of 100%, whereas any too-high forecast will not be capped to a specific
percentage error. Due to this, optimizing MAPE will result in a strange forecast that will
most likely undershoot the demand. Just avoid it. If MAPE is mentioned in this book, it
is not to promote its use, but as a plea not to use it.
Going Further
You can use an array formula to compute the MAPE without the two extra error
columns. You can define cell J3 as:
J3 = AVERAGE(ABS(D5:D11)/B5:B11)
If you are not familiar with Excel array formulas, these are formulas that can perform
operations over multiple cells (hence the term array ). In order to use one, simply
type your formula (for example, the one above), then validate the cell by pressing
CTRL+SHIFT+ENTER (and not simply ENTER). If the formula is properly validated, you
should see it being surrounded by { }. Array formulas are powerful tools in Excel,
but can be confusing for the users. Use them with care.
Note that, unlike for the bias, here we don't need to worry about selecting the proper demand range to compute the MAPE. As we divide df['Error'] directly by df['Demand'] before computing the mean, the rows where the demand is not defined result in NaN values, which pandas excludes from the mean by default.
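Concretely, the MAPE computation added to the kpi() sketch above might look like this (again, a sketch; the book's own listing may differ):

def kpi(df):
    dem_ave = df.loc[df['Error'].notnull(), 'Demand'].mean()
    bias_abs = df['Error'].mean()
    bias_rel = bias_abs / dem_ave
    print('Bias: {:0.2f}, {:.2%}'.format(bias_abs, bias_rel))
    # Each absolute error is divided by the demand of its own period;
    # the future rows (where the demand is NaN) are ignored by .mean()
    MAPE = (df['Error'].abs() / df['Demand']).mean()
    print('MAPE: {:.2%}'.format(MAPE))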
You should obtain the following results:
1 kpi(df)
2 >> Bias: 22.95, 15.33%
3 >> MAPE: 29.31%
2.4 MAE
The Mean Absolute Error (MAE) is a very good KPI to measure forecast accuracy. As
the name implies, it is the mean of the absolute error.
MAE = \frac{1}{n}\sum |e_t|
As for the bias, the MAE is an absolute number. If you are told that MAE is 10 for a particular item, you cannot know if this is good or bad. If your average demand is 1,000, an MAE of 10 is, of course, excellent; but if the average demand is 1, an MAE of 10 is very poor accuracy. To solve this, it is common to divide MAE by the average demand to get a
scaled percentage:
MAE\% = \frac{\frac{1}{n}\sum |e_t|}{\frac{1}{n}\sum d_t} = \frac{\sum |e_t|}{\sum d_t}
Attention Point
Many practitioners use the MAE formula and call it MAPE. This can cause a lot of
confusion. When discussing forecast error with someone, I advise you to explicitly
specify how you compute the forecast error to be sure to compare apples with
apples.
2.5 RMSE
The Root Mean Square Error (RMSE) is a difficult KPI to interpret, as it is defined as
the square root of the average squared forecast error. Nevertheless, it can be very helpful,
as we will see later.

RMSE = \sqrt{\frac{1}{n}\sum e_t^2}
Just as for MAE, RMSE is not scaled to the demand, so it needs to be put in percentages
to be understandable. We can then define RMSE% as:
RMSE\% = \frac{\sqrt{\frac{1}{n}\sum e_t^2}}{\frac{1}{n}\sum d_t}
Actually, many algorithms—especially for machine learning—are based on the Mean Square
Error (MSE), which is directly related to RMSE.
MSE = \frac{1}{n}\sum e_t^2
Many algorithms use MSE instead of RMSE since MSE is faster to compute and easier to
manipulate. But it is not scaled to the original error (as the error is squared), resulting in
a KPI that we cannot relate to the original demand scale. Therefore, we won’t use it to
evaluate our statistical forecast models.
J5 = SQRT(AVERAGE(G5:G11))
The RMSE% can be computed in cell K5 by dividing the RMSE by the average demand:
K5 = J5/AVERAGE(B5:B11)
Period 1 2 3 4 5 6 7 8 9 10 11 12
Demand 10 12 14 8 9 5 8 10 12 11 10 15
Forecast #1 12 14 15 10 7 4 5 8 12 14 13 8
Error #1 2 2 1 2 -2 -1 -3 -2 0 3 3 -7
Forecast #2 12 14 15 10 7 4 5 8 12 14 13 9
Error #2 2 2 1 2 -2 -1 -3 -2 0 3 3 -6
The only difference in the two datasets is the forecast on the latest demand observation:
forecast #1 undershot it by 7 units and forecast #2 undershot it by only 6 units. Note
that for both forecasts, period 12 is the worst period in terms of accuracy. If we look at
the KPI of these two forecasts, this is what we obtain:
What is interesting here is that by just changing the error of this last period (the one with
the worst accuracy) by a single unit, we decrease the total RMSE by 6.9% (2.86 to 2.66),
but MAE is only reduced by 3.6% (2.33 to 2.25), so the impact on MAE is roughly half as large. Clearly, RMSE puts much more importance on the largest errors, whereas MAE gives
the same importance to each error. You can try this for yourself and reduce the error of
one of the most accurate periods to observe the impact on MAE and RMSE.
Spoiler: There is nearly no impact on RMSE.1
As we will see later, RMSE has some other very interesting properties.
1 Remember, RMSE is not so much impacted by low forecast error. So reducing the error of the period
that has already the lowest forecast error won’t significantly impact RMSE.
Let's take the example of a product with the following daily demand, observed over five weeks (columns W1 to W5):

W1 W2 W3 W4 W5
Mon 3 3 4 1 5
Tue 1 4 1 2 2
Wed 5 5 1 1 12
Thu 20 4 3 2 1
Fri 13 16 14 5 20
Now let’s imagine we propose three different forecasts for this product. The first one
predicts 2 pieces/day, the second one 4 and the last one 6. Let’s plot the actual demand
and the forecasts in Figure 2.6.
Figure 2.6: Demand and forecasts. (The plot shows the daily demand together with the three flat forecasts of 2, 4, and 6 pieces per day.)
You can see in Table 2.1 how each of these forecasts performed in terms of bias, MAPE,
MAE, and RMSE on the historical period. Forecast #1 was the best during the historical
periods in terms of MAPE, forecast #2 was the best in terms of MAE, and forecast #3
was the best in terms of RMSE and bias (but the worst on MAE and MAPE).
RMSE
2 The median is the value for which half the dataset is higher and half of the dataset is lower.
If you set MSE as a target for your forecast model, it will minimize it. You can minimize
a mathematical function by setting its derivative to zero. Let’s try this.
\frac{\partial\,\mathrm{MSE}}{\partial f} = \frac{\partial}{\partial f}\left[\frac{1}{n}\sum (f_t - d_t)^2\right]

\frac{2}{n}\sum (f_t - d_t) = 0

\sum f_t = \sum d_t
Conclusion To optimize a forecast’s (R)MSE, the model will have to aim for the total
forecast to be equal to the total demand. That is to say that optimizing (R)MSE aims to
produce a prediction that is correct on average and, therefore, unbiased.
MAE
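The derivation itself is not reproduced in this extract; following the same approach as for MSE (and assuming a single forecast level f), it goes roughly as follows:

\frac{\partial\,\mathrm{MAE}}{\partial f} = \frac{1}{n}\sum \mathrm{sign}(f_t - d_t) = 0

which is satisfied when the forecast overshoots the demand in as many periods as it undershoots it.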
Conclusion To optimize MAE (i.e., set its derivative to 0), the forecast needs to be as
many times higher than the demand as it is lower than the demand. In other words, we are
looking for a value that splits our dataset into two equal parts. This is the exact definition
of the median.
MAPE
Unfortunately, the derivative of MAPE doesn't show any elegant and straightforward property. We can simply say that MAPE is promoting a very low forecast, as it allocates a high
weight to forecast errors when the demand is low.
Conclusion
As we saw in the previous section, we have to understand that a big difference lies in the
mathematical roots of RMSE, MAE, and MAPE. The optimization of RMSE will seek to
be correct on average. The optimization of MAE will try to overshoot the demand as often
as undershoot it, which means targeting the demand median. Finally, the optimization of
MAPE will result in a biased forecast that will undershoot the demand. In short, MAE is
aiming at the demand median, and RMSE is aiming at the demand average.
Bias
For many products, you will observe that the median demand is not the same as the average
demand. The demand will most likely have some peaks here and there that will result in
a skewed distribution. These skewed demand distributions are widespread in supply chain,
as the peaks can be due to periodic promotions or clients ordering in bulk. This will cause
the demand median to be below the average demand, as shown in Figure 2.7.
Figure 2.7: Median vs. Average. (The plot shows a skewed demand probability distribution, with the median lying below the average.)
This means that a forecast that is minimizing MAE will result in a bias, most often
resulting in an undershoot of the demand. A forecast that is minimizing RMSE will not
result in bias (as it aims for the average). This is definitely MAE’s main weakness.
Sensitivity to Outliers
As we discussed, RMSE gives a bigger importance to the highest errors. This comes at
a cost: a sensitivity to outliers. Let’s imagine an item with a smooth demand pattern as
shown in Table 2.2.
Table 2.2: Demand without outliers (median: 8.5, average: 9.5).
Period 1 2 3 4 5 6 7 8 9 10
Demand 16 8 12 9 6 12 5 7 6 14
The median is 8.5 and the average is 9.5. We already observed that if we make a forecast
that minimizes MAE, we will forecast the median (8.5) and we would be on average
undershooting the demand by 1 unit (bias = -1). You might then prefer to minimize
RMSE and to forecast the average (9.5) to avoid this situation. Nevertheless, let’s now
imagine that we have one new demand observation of 100, as shown in Table 2.3.
The median is still 8.5—it hasn’t changed!—but the average is now 18.1. In this case,
you might not want to forecast the average and might revert back to a forecast of the
median.
Generally speaking, the median is more robust to outliers than the average. In a supply
chain environment, this is important because we can face many outliers due to demand
peaks (marketing, promotions, spot deals) or encoding mistakes. We will discuss outliers
further in Chapter 10.
Intermittent Demand
Let's now imagine a product that a single client orders, on average, one week out of three, without any recognizable pattern. The client always orders the product in batches of 100. We then have an average weekly demand of 33 pieces and a demand median of... 0.
We have to populate a weekly forecast for this product. Let’s imagine we do a first forecast
that aims for the average demand (33 pieces). Over the long term, for each three-week cycle, we will obtain a total squared error of 6,667 (RMSE of 47) and a total absolute error of 133 (MAE of 44).
Now, if we forecast the demand median (0), we obtain a total absolute error of 100 (MAE of 33) and a total squared error of 10,000 (RMSE of 58).
As we can see, MAE is a bad KPI to use for intermittent demand. As soon as you have
more than half of the periods without demand, the optimal forecast is... 0!
Conclusion
MAE provides protection against outliers, whereas RMSE provides the assurance of an unbiased forecast. Which indicator should you use? There is, unfortunately, no definitive
answer. As a supply chain data scientist, you should experiment: if using MAE as a KPI
results in a high bias, you might want to use RMSE. If the dataset contains many outliers,
resulting in a skewed forecast, you might want to use MAE.
Note as well that you can choose to report forecast accuracy to management
using one or more KPIs (typically MAE and bias), but use another one (RMSE?)
to optimize your models. We will further discuss how to manage the forecasting
process in Part III.
Part II
Machine Learning
Chapter 12
Machine Learning
Tell us what the future holds, so we may know that you are gods.
Isaiah 41:23
For a machine learning algorithm to learn how to make predictions, we will have to feed it
with both the inputs and the desired respective outputs. It will then automatically understand the relationships between these inputs and outputs.
Another important difference between using machine learning and exponential smoothing
models to forecast our demand is that machine learning algorithms will learn patterns
from our entire dataset. Exponential smoothing models will treat each item individually
and independently from the others. Because it uses the entire dataset, a machine learning
algorithm will apply what works best to each product. One could improve the accuracy
of an exponential smoothing model by increasing the length of each time series (i.e.,
providing more historical periods for each product). Using machine learning, we will be able
to increase our model’s accuracy by providing more of the products’ data to be ingested
by the model.
Welcome to the world of machine learning.
For our forecasting problem, we will basically show our machine learning algorithm different
extracts of our historical demand dataset as inputs and, as a desired output, what the
very next demand observation was. In our example in Table 12.1, the algorithm will learn
the relationship between the last four quarters of demand, and the demand of the next
quarter. The algorithm will learn that if we have 5, 15, 10, and 7 as the last four demand
observations, the next demand observation will be 6, so that its prediction should be 6.
Next to the data and relationships from product #1, the algorithm will also learn from
products #2, #3, and #4. In doing so, the idea is for the model to use all the data
provided to give us better forecasts.
Most people will react to this idea with two very different thoughts. Either people will think
that “it is simply impossible for a computer to look at the demand and make a prediction”
or that “as of now, the humans have nothing left to do.” Both are wrong.
As we will see later, machine learning can generate very accurate predictions. And as the
human controlling the machine, we still have to ask ourselves many questions, such as:
- Which data should we feed to the algorithm for it to understand the proper relationships? We will discuss how to include other data features in Chapters 20, 22, and
23; and how to select the relevant ones in Chapters 18 and 24.
- Which machine learning algorithm should be used? There are many different ones:
we will discuss new models in Chapters 13, 15, 17, 19, 21, and 25.
- Which parameters should be used in our model? As you will see, each machine
learning algorithm has some parameters that we can tweak to improve its accuracy
(see Chapter 14).
As always, there is no definitive, one-size-fits-all answer. Experimentation will help you find
what is best for your dataset.
Naming Convention During our data cleaning process, we will use the standard data
science notation and call the inputs X and the outputs Y . Specifically, the datasets X_train
and Y_train will contain all the historical demand that we will use to train our algorithm
(X_train being the inputs and Y_train the outputs). The datasets X_test and Y_test
will be used to test our model.
You can see in Table 12.2 an example of a typical historical demand dataset you should
have at the beginning of a forecasting project.
We now have to format this dataset to something similar to Table 12.1. Let’s say for now
that we want to predict the demand of a product during one quarter based on the demand observations of this product during the previous four quarters.1 We will populate the
datasets X_train and Y_train by going through the different products we have, and, each
time, create a data sample with four consecutive quarters as X_train and the following
quarter as Y_train. This way, the machine learning algorithm will learn the relationship(s)
between one quarter of demand and the previous four.
You can see in Table 12.3 an illustration for the first iterations. Loop #1 uses Y1Q1 to
Y1Q4 to predict Y2Q1, Loop #2 is shifted by one quarter: we use Y1Q2 to Y2Q1 to
forecast Y2Q2, etc.
Our X_train and Y_train datasets will look like Table 12.4.
Remember that our algorithm will learn relationships in X_train to predict Y_train. So
we could write that as X_train → Y_train.
In order to validate our model, we need to keep a test set aside from the training set.
Remember, the data in the test set won’t be shown to the model during its training phase.
The test set will be kept aside and used after the training to evaluate the model accuracy
against unseen data (as a final test).2
2 See Chapters 4 and 8 for a discussion about training and test sets, as well as underfitting and
overfitting, respectively.
In our example, if we keep the last loop as a test set (i.e., using demand from Y2Q4 to
Y3Q3 to predict Y3Q4) we would have a test set as shown in Table 12.5.
That means that our algorithm won’t see these relationships during its training phase.
It will be tested on the accuracy it achieved on these specific prediction exercises. We
will measure its accuracy on this test set and assume its accuracy will be similar when
predicting future demand.
Dataset Length
It is important for any machine learning exercise to pay attention to how much data is fed
to the algorithm. The more, the better. On the other hand, the more periods we use to
make a prediction (we will call this x_len), the fewer we will be able to loop through the
dataset. Also, if we want to predict more periods at once (y_len), it will cost us a part of
the dataset, as we need more data (Y_train is longer) to perform one loop in our dataset.
Typically, if we have a dataset with n periods, we will be able to make 1+n−x_len−y_len
loops through it.
loops = 1 + n − x_len − y_len
Also keep in mind that you will have to keep some of those loops aside as a test set.
Optimally, you should have enough loops of test set to cover a full season (so that you
are sure that your algorithm captures all demand patterns properly).
For the training set, a best practice is to keep—at the very least—enough runs to loop
through two full years. Overall, for a monthly dataset, you should then have at least
35 + x_len + y_len periods (as shown below) in order for the algorithm to have two full
seasonal cycles to learn any possible relationships and a full season to be tested on.
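As an illustration of this arithmetic: the Norway car sales dataset used in the next section contains n = 121 monthly periods, so with x_len = 12 and y_len = 1 we get 1 + 121 - 12 - 1 = 109 possible loops; keeping the last 12 loops aside as a test set (as we will do below) leaves 97 loops for training.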
Data Collection
The dataset creation and cleaning is a crucial part of any data science project. To illustrate all the models we will create in the next chapters, we will use the historical
sales of cars in Norway from January 2007 to January 2017 as an example dataset. You
can download this dataset on supchains.com/download.3 You will get a csv file called
norway_new_car_sales_by_make.csv. This dataset, graphed in Figure 12.1, contains the
sales of 65 carmakers across 121 months. On average, a bit more than 140,000 new cars
are sold in Norway per year. If we assume that the price of a new car is around $30,000
on average in Norway, the market can be roughly estimated to be worth $4B.
Attention Point
This dataset is modest in terms of size, allowing for fast training and manipulation.
This is perfect for a learning exercise and trying out new models and ideas. But this
dataset is not big enough to showcase all the power of advanced machine learning
models—they should show better results on bigger datasets.
3 The data is compiled by the Opplysningsrådet for Veitrafikken (OFV), a Norwegian organization in
the automotive industry. It was initially retrieved by Dmytro Perepølkin and published on Kaggle.
Figure 12.1: Cars sold per year in Norway. (Yearly totals from 2007 to 2016; on average, a bit more than 140,000 cars are sold per year.)
In the next chapters, we will discuss various models and apply them to
this example dataset. But what we are actually interested in is your own dataset.
Do not waste any time, and start gathering some historical demand data so that
you can test the following models on your own dataset as we progress through the
different topics. It is recommended that you start with a dataset of at least 5 years
of data and more than a hundred different products. The bigger, the better.
We will discuss in the following chapters many models and ideas and apply them to
this dataset. The thought processes we will discuss can be applied to any demand
dataset; but the results obtained, as well as the parameters and models selected,
won’t be the same on another dataset.
We will make a function import_data() that will extract the data from this csv and format
it with the dates as columns and the products (here, car brands) as rows.
1 def import_data():
2     data = pd.read_csv('norway_new_car_sales_by_make.csv')
3     data['Period'] = data['Year'].astype(str) + '-' + data['Month'].astype(str).str.zfill(2)
4     df = pd.pivot_table(data=data, values='Quantity', index='Make', columns='Period', aggfunc='sum', fill_value=0)
5     return df
Note that the machine learning models we will use later (from the scikit-learn library) will need 1D arrays if we forecast only one period.
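The datasets() listing itself is not reproduced in this extract. A minimal sketch, consistent with the behavior described in this section (the last test_loops loops are kept aside as a test set; with test_loops=0 the latest observations are returned in X_test), could look as follows; the book's exact implementation may differ.

def datasets(df, x_len=12, y_len=1, test_loops=12):
    D = df.values  # one row per product, one column per period
    rows, periods = D.shape

    # Create all the possible loops: x_len periods as inputs, y_len as outputs
    loops = periods + 1 - x_len - y_len
    train = []
    for col in range(loops):
        train.append(D[:, col:col + x_len + y_len])
    train = np.vstack(train)
    X_train, Y_train = np.split(train, [x_len], axis=1)

    if test_loops > 0:
        # Keep the last test_loops loops aside as a test set
        X_train, X_test = np.split(X_train, [-rows * test_loops], axis=0)
        Y_train, Y_test = np.split(Y_train, [-rows * test_loops], axis=0)
    else:
        # No test set: X_test holds the latest observations to forecast the future
        X_test = D[:, -x_len:]
        Y_test = np.full((X_test.shape[0], y_len), np.nan)  # dummy values

    # scikit-learn models expect 1D targets when we forecast a single period
    if y_len == 1:
        Y_train = Y_train.ravel()
        Y_test = Y_test.ravel()

    return X_train, Y_train, X_test, Y_test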
We can now easily call our new function datasets(df) as well as import_data().
1 df = import_data()
2 X_train, Y_train, X_test, Y_test = datasets(df, x_len=12, y_len=1, test_loops=12)
We obtain the datasets we need to feed our machine learning algorithm (X_train and
Y_train) and the datasets we need to test it (X_test and Y_test). Note that we set
test_loops as 12 periods. That means that we will test our algorithm over 12 different
loops (i.e., we will predict 12 times the following period [y_len=1] based on the last 12
periods [x_len=12]).
Forecasting Multiple Periods at Once You can change y_len if you want to forecast
multiple periods at once. In the following chapters, we will keep y_len = 1 for the sake of
simplicity.
Future Forecast If test_loops==0, the function will return the latest demand observations in X_test (and Y_test set as a dummy value), which will allow you to populate the
future forecast, as we will see in Section 12.5.
What About Excel? So far, Excel could provide us with an easy way to look at the data
and at our statistical models. But it won't get us any further. Unfortunately, Excel does
not provide the power to easily format such datasets into the different parts we need
(X_train, X_test, Y_train, and Y_test). Moreover, with most of our machine learning
models, the dataset size will become too large for Excel to handle correctly. Finally,
another major blocking point is that Excel does not provide any machine learning algorithms.
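The listing that builds this linear regression benchmark is not reproduced in this extract.
A minimal sketch, assuming scikit-learn's LinearRegression and the datasets created above,
could look like this:

from sklearn.linear_model import LinearRegression

# Fit the benchmark on the training set only
reg = LinearRegression()
reg.fit(X_train, Y_train)

# Populate predictions over both the training and the test sets
Y_train_pred = reg.predict(X_train)
Y_test_pred = reg.predict(X_test)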
As you can see, we created a model object called reg that we fit to our training data
(X_train, Y_train), thanks to the method .fit() (remember, we should fit our model to
the training set and not to the test set). We then populated a prediction based on X_train
and X_test via the method .predict().
KPI Function
We can now create a KPI function kpi_ML() that will display the accuracy of our model.
This function is similar to the one defined in Chapter 2. Here, we use a DataFrame
in order to print the various KPIs in a structured way.
import numpy as np
import pandas as pd

def kpi_ML(Y_train, Y_train_pred, Y_test, Y_test_pred, name=''):
    df = pd.DataFrame(columns=['MAE', 'RMSE', 'Bias'], index=['Train', 'Test'])
    df.index.name = name
    # All KPIs are scaled by the mean demand and expressed in %
    df.loc['Train', 'MAE'] = 100*np.mean(abs(Y_train - Y_train_pred))/np.mean(Y_train)
    df.loc['Train', 'RMSE'] = 100*np.sqrt(np.mean((Y_train - Y_train_pred)**2))/np.mean(Y_train)
    df.loc['Train', 'Bias'] = 100*np.mean((Y_train - Y_train_pred))/np.mean(Y_train)
    df.loc['Test', 'MAE'] = 100*np.mean(abs(Y_test - Y_test_pred))/np.mean(Y_test)
    df.loc['Test', 'RMSE'] = 100*np.sqrt(np.mean((Y_test - Y_test_pred)**2))/np.mean(Y_test)
    df.loc['Test', 'Bias'] = 100*np.mean((Y_test - Y_test_pred))/np.mean(Y_test)
    df = df.astype(float).round(1)  # Round numbers for display
    print(df)
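Assuming the benchmark predictions from the sketch above, the function can then be called
as follows (the name argument is only a label for the printed table):

kpi_ML(Y_train, Y_train_pred, Y_test, Y_test_pred, name='Regression')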
You can see that the RMSE is much worse than the MAE. This is usually the case, as
demand datasets contain a few exceptional values that drive the RMSE up.5
Car sales per month, per brand, and at the national level are actually easy to forecast, as
these demand time series are stable and show little seasonality. This is why the linear
benchmark provides such good results here. On top of that, we only predict one month at
a time, and a linear approximation works well in the short term. On a different dataset,
with a longer forecast horizon (and more seasonality), linear regressions might not be up
to the challenge. Therefore, don't be surprised to face much worse results on your own
datasets. In the following chapters, we will discuss much more advanced models that can
keep up with longer forecasting horizons, seasonality, and various other external drivers.
Attention Point
Do not worry if the MAE and RMSE of your dataset are much higher than those of the
example benchmark presented here. In some projects, I have seen MAE as high as 80 to
100%, and RMSE well above 500%. Again, we use a linear regression benchmark
precisely to get an order of magnitude of the complexity of a dataset.
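The listing of Section 12.5 that generates the future forecast is not reproduced in this
extract. A minimal sketch, assuming the same linear regression benchmark and the
datasets() behavior sketched earlier, could be:

from sklearn.linear_model import LinearRegression
import pandas as pd

# With test_loops=0, X_test holds the latest demand observations of each brand
X_train, Y_train, X_test, Y_test = datasets(df, x_len=12, y_len=1, test_loops=0)

# Retrain on all available windows and predict the next period for each brand
reg = LinearRegression()
reg.fit(X_train, Y_train)
forecast = pd.DataFrame(data=reg.predict(X_test), index=df.index)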
The DataFrame forecast now contains the forecast for the future periods.
5 See Chapter 2 for a thorough discussion about forecast KPI and the impact of outliers and extreme
values.
print(forecast.head())
>>                       0
>> Make
>> Alfa Romeo     6.187217
>> Aston Martin   1.032483
>> Audi         646.568622
>> BMW         1265.032834
>> Bentley        1.218092
Now that we have a proper dataset, a benchmark to beat, and a way to generate a future
forecast, let's see how far machine learning can get us.
Glossary
accuracy The accuracy of your forecast measures how much spread you had between
your forecasts and the actual values. The accuracy of a forecast gives an idea of the
magnitude of the errors but not their overall direction. See page 11
alpha A smoothing factor applied to the demand level in the various exponential
smoothing models. In theory: 0 < α ≤ 1; in practice: 0 < α ≤ 0.6. See page 30
array A data structure defined in NumPy. It is a list or a matrix of numeric values. See
page 288
bagging Bagging (short for bootstrap aggregation) is a method for aggregating
multiple sub-models into an ensemble model by averaging their predictions with equal
weighting. See page 153
beta A smoothing factor applied to the trend in the various exponential smoothing
models. In theory: 0 < β ≤ 1; in practice: 0 < β ≤ 0.6. See page 45
bias The bias represents the overall direction of the historical average error. It measures
if your forecasts were on average too high (i.e., you overshot the demand) or too low
(i.e., you undershot the demand). See page 12
Boolean A Boolean is a value that is either True or False: 1 or 0. See page 222
boosting Boosting is a class of ensemble algorithms in which models are added se-
quentially, so that later models in the sequence will correct the predictions made by
earlier models in the sequence. See page 184
bullwhip effect The bullwhip effect is observed in supply chains when small variations
in the downstream demand result in massive fluctuations in the upstream supply chain.
See page 50
classification Classification problems require you to classify data samples into different
categories. See page 133
data leakage In the case of forecast models, data leakage describes a situation where
a model is given pieces of information about future demand. See page 32
DataFrame A DataFrame is a table of data as defined by the pandas library. It is
similar to a table in Excel or an SQL database. See page 290
demand observation This is the demand for a product during one period. For example,
a demand observation could be the demand for a product in January last year. See page
4
ensemble An ensemble model is a (meta-)model constituted of many sub-models. See
page 152
epoch One epoch consists of running the neural network learning algorithm through
all the training samples once. The number of epochs is the number of times the learning
algorithm will run through the entire training dataset. See page 262
Euclidean distance The Euclidean distance between two points is the length of a
straight line between these two points. See page 231
evaluation set An evaluation set is a set of data that is left aside from the training set
to be used as a monitoring dataset during the training. A validation set or a holdout
set can be used as an evaluation set. See page 210
feature A feature is a type of information that a model has at its disposal to make a
prediction. See page 134
gamma A smoothing factor applied to the seasonality (either additive or multiplicative)
in the triple exponential smoothing models. In theory: 0 < γ ≤ 1; in practice: 0.05 <
γ ≤ 0.3. See page 75
holdout set Subset of the training set that is kept aside during the training to validate
a model against unseen data. The holdout set is made of the last periods of the training
set to replicate a test set. See page 178
inertia In a K-means model, the inertia is the sum of the distances between each data
sample and its associated cluster center. See page 231
instance An (object) instance is a technical term for an occurrence of a class. You
can see a class as a blueprint and an instance as a specific realization of this blueprint.
The class (blueprint) defines what each instance will look like (which variables will
constitute it) and what it can do (which methods or functions it will be able to perform).
See page 137
level The level is the average value around which the demand varies over time. See
page 29
Mean Absolute Error $\mathrm{MAE} = \frac{1}{n}\sum_t |e_t|$. See page 17
Mean Absolute Percentage Error $\mathrm{MAPE} = \frac{1}{n}\sum_t \frac{|e_t|}{d_t}$. See page 15
Mean Square Error $\mathrm{MSE} = \frac{1}{n}\sum_t e_t^2$. See page 19
naïve forecast The simplest forecast model: it always predicts the last available ob-
servation. See page 4
noise In statistics, the noise is an unexplained variation in the data. It is often due to
the randomness of the different processes at hand. See page 5
NumPy One of the most famous Python libraries. It is focused on numeric computa-
tion. The basic data structure in NumPy is an array. See page 288
Index
AdaBoost, 184, 186–188, 190, 192, 193
    compared to XGBoost, 207, 213, 214
    computation time, 188, 192, 193, 217
    learning rate, see learning rate, AdaBoost
    multiple periods, 193
    XGBoost, 208
bias, 11–15, 22, 24, 25, 27, 102, 138, 140, 275, 276, 278, 279, 281, 282
categorical data, 219, 220, 222, 225, 226, 239
computation time, 159, 179
    AdaBoost, see AdaBoost, computation time
    ETR, see ETR, computation time
    forest, see forest, computation time
    k-fold cross validation, 147, 148, 176
    multi-threading, 147
    XGBoost, see XGBoost, computation time
data leakage, 32, 48
    seasonality, 81
early stopping
    XGBoost, 215
ensemble, 151, 152, 183, 185, 214
ETR, 165, 166, 168, 169, 173, 180, 193, 213, 214
    computation time, 169, 192, 217
feature importance, 161–163, 171, 180, 242
    plot, 163
    selection, 243
forecast value added, 272, 278–281, 283
forest, 140, 153–159, 161–163, 165–169, 171–173, 180, 183, 192, 193, 213, 214
    computation time, 156, 158, 169, 192
gradient boosting, 207, 213
gradient descent, 254–256
holdout set, 180
    as evaluation set, 212
    compared to test set, 178, 179, 181
    creation, 178, 181, 239
intermittent demand, 27
    KPI, 26, 27
judgmental forecast, 280, 281, see Chapter 26