
A Modern Approach to Teaching Introduction to Optimization

Warren B. Powell
Professor Emeritus, Princeton University
Chief Innovation Officer, Optimal Dynamics
[email protected]

November 8, 2023

Summary
“Optimization” is widely taught in departments such as operations research, industrial
engineering, and sometimes applied math as the study of complex, multidimensional (and often
very high-dimensional) problems that can be formulated as linear, nonlinear or integer
programs. Introductory courses are often centered on linear programming, the simplex
algorithm and duality theory.

“Optimization” should be the study of making good decisions, and should start with the simplest
(but nontrivial) decisions that are familiar to every student. Linear programs solve a very tiny
fraction of decision problems, even in areas such as business where linear programming is
often taught. I will note that almost no-one in business without formal training in linear
programming has even heard the term. More distressingly, only a fraction of undergraduates or
masters students who take linear programming ever solve a linear program (even with a
package). And no-one outside a tiny core of specialists has ever programmed the simplex
algorithm.

This document outlines a new way of teaching optimization that starts with some basic machine
learning problems that are very familiar today given the attention that “AI” has attracted. Few
people recognize that basic machine learning is actually solving a stochastic optimization
problem by assuming we are given a training dataset. I leverage this idea to introduce students
to some simple sequential decision problems that involve decisions that everyone faces. We
make these decisions using methods (policies) in the form of simple, parameterized rules or
functions that can be optimized exactly as we solve our machine learning problem.

I then present a series of topics that progress from simple problems that are familiar to
everyone, eventually reaching topics such as linear, integer, and nonlinear programming.
However, I minimize the attention given to the design of algorithms given that there are widely
available packages (it will be the rare beginning student who ever transitions to an algorithmic
designer). Instead, I focus on modeling and evaluating the solution which is typically done in
the presence of uncertainty.

This book is aimed at faculty who are already teaching an introductory optimization course, or
who have a background in optimization and are designing an optimization course. It is also
useful to anyone with conventional training in optimization, since it will show you how to think
about optimization differently. The presentation consists of a set of topics to guide the design of
lectures, leaving considerable flexibility in terms of how much emphasis is placed on individual
topics.

The presentation starts with sequential problems, since these are the simplest nontrivial
decision problems that are most familiar to students. It then transitions to classical static
optimization problems (linear, integer and nonlinear programming) but in each case we show
how static optimization models can often be viewed as methods for making decisions in a
sequential setting.

Table of contents

Summary
1.0 Introduction
2.0 Audience
  2.1 Academic departments
  2.2 Students
  2.3 Faculty
  2.4 Students/professionals with prior optimization training
3.0 Course outline
4.0 Readings
5.0 Lectures
  Topic 0: Applications
  Topic 1: Machine learning
    1.1 Linear models
    1.2 Nonlinear models
    1.3 Neural networks (absolutely optional)
    1.4 Teaching notes for machine learning
  Topic 2: Sequential decision problems I
    2.1 Introduction to sequential decision problems
    2.2 Asset selling
    2.3 Inventory planning
    2.4 Teaching notes
  Topic 3: Adaptive optimization - the newsvendor problem
  Topic 4: Optimal learning - Finding the best treatment
  Topic 5: Shortest path problems
    5.1 Static shortest paths
    5.2 Dynamic shortest paths
  Topic 6 – General concepts
    6.1 Modeling static optimization problems
    6.2 Modeling sequential decision problems
    6.3 Designing policies
    6.4 Evaluating policies
  Topic 7 – Linear programming
    7.1 As a static problem – The simplex algorithm I
    7.2 The simplex algorithm II – with the matrix linear algebra
    7.3 As a policy for a dynamic problem
  Topic 8: Dynamic inventory problems - Energy storage
  Topic 9: Integer programming
    9.1 Static facility location
    9.2 Types of integer programs
  Topic 10: Dynamic facility location
    10.1 Notation
    10.2 Single-period model with uncertain demands
    10.3 Evaluating the policy for a multiperiod problem
    10.4 Alternative facility location policies
  Topic 11: Nonlinear programming
    11.1 Static portfolio optimization
    11.2 Dynamic portfolio optimization

1.0 Introduction
Making decisions is a universal human activity, something we have all done since we were born.
Making good decisions is how we perform better, whether we are running a business, managing
a health or energy system, designing and controlling a supply chain, inventing new products and
materials, or creating new drugs. Optimization, I claim, is the science of making the best
decisions that we can.

The academic community has largely limited the scope of optimization to pet methods for
different communities. Industrial engineering and operations research equate optimization with
linear programming (as a starter), progressing into even more specialized fields such as
nonlinear and integer programming. Dynamic programming is considered a very advanced
topic, rarely taught at the undergraduate level. Faculty in engineering (mechanical, electrical,
chemical) and economics focus on the field of optimal control, a close cousin of dynamic
programming, typically using fairly advanced mathematics. Computer scientists will either study
combinatorics (a close cousin of integer programming) or, more recently, reinforcement learning
(a cousin of dynamic programming). All of these approaches to optimization are typically taught
at a fairly advanced level.

I am going to suggest a new way to teach introductory optimization to undergraduates and
masters that is far more useful and addresses the modeling and algorithmic challenges that
arise in decision problems that occur in practice. To do this, I am going to take on two of the
titans of optimization: George Dantzig and Richard Bellman. In this course, we will still teach
linear programming, but it is demoted to a method that is used only for a small number of
problems, and while I will illustrate the simplex algorithm using a network problem (where the
steps of the simplex algorithm can be described visually), I do not drag students through the
simplex algorithm for general linear programs.

By contrast, we will pay considerably more attention to sequential decision problems (also
known as dynamic programs), but we will largely ignore Bellman’s equation (or Hamilton-Jacobi
equations). Also, we note that the origins of linear programming were in the context of
sequential decision problems, and we will make the distinction between linear programming as
a static problem, and as a policy for sequential decision problems. We do the same for integer
and nonlinear programming.

My approach will be to start with the simplest nontrivial problems, and progress to more
complex settings. We start with some basic machine learning problems (with linear and
nonlinear models) that lay the foundation for modeling and solving sequential decision problems
(SDPs). SDPs, which usually involve making relatively simple decisions over time, are
problems that are pervasive in business, economics, engineering, the sciences, and even
everyday life. SDPs include problems where the decision may be binary (which web page
design to use, when to stop and sell an asset), discrete (choosing the best catalyst, drug,
product, path, supplier, employee, …), and scalar continuous (finding the best price, dosage,
temperature, budget). We make decisions using the three classes of policies that are widely
used in practice (albeit in an ad hoc way), but show students how to properly model and tune
these policies. And we do not use Bellman’s equation other than a brief illustration in a shortest
path problem. Bellman requires a level of sophistication for both modeling and algorithmic work
that is not suitable for an introductory course, and it is only useful for a very narrow subset of
problems.

In the process of using sequential decision problems, we are going to have to introduce the
dimension of modeling sequential information processes. While this potentially opens a
Pandora’s box into the complex arena of stochastic modeling, we avoid this by exclusively
working with random samples, just as is done with machine learning where a training dataset is
a sample of the random observations.

The course is aimed at undergraduates (or masters) using a minimal amount of calculus or
linear algebra. We use very simple Monte Carlo simulation, but do not require a course in
probability or statistics. There are many opportunities for a faculty member to adjust the
examples and scope of the material to their own students. However, we urge faculty to resist
the typical style of teaching this material which emphasizes the methods that have a strong
theoretical foundation. This material can be taught at a very advanced level, but this course
focuses on teaching students how to think about making decisions.

2.0 Audience
This book was originally written to help schools teaching introductory optimization courses
adjust their pedagogical approach to a modern style that is much more useful and relevant to
students. For this purpose, I discuss below the audience from three perspectives: what
departments can use this approach, the students that I am targeting, and the faculty who might
be interested in teaching this.

A fourth audience is people who have already taken a traditional course in (typically
deterministic) optimization. Simply reading these notes will provide a different perspective that
draws on the skills you have already learned.

2.1 Academic departments


I think the style of this course can be used in any department that involves making quantifiable
decisions. This includes engineering (all departments), the physical sciences (laboratories are
full of sequential decision problems), the social sciences, economics, business schools, politics,
and psychology.

“Optimization” (linear, nonlinear and integer programming) is traditionally taught in operations
research, industrial engineering and applied math. In engineering there is much more emphasis
on control theory (a form of sequential decision problem), while economics (along with
advanced courses in OR and IE) will emphasize dynamic programming. Computer science for
the past decade has taught reinforcement learning, a field that addresses “Markov decision
processes” which is just a different form of control problem. These courses all tend to be taught
with a moderate to high level of mathematical sophistication, ignoring the reality that virtually
everyone needs to solve sequential decision problems.

I will note that none of these classical courses on optimization traditionally deal with what I call
“optimal learning” problems, which are problems where the decision controls what information to
collect to improve your understanding of a process so you can make better decisions in the
future. Optimal learning problems arise in laboratory experimentation (either physical
experiments or computer simulations) and a host of field settings (what is the best product to
recommend, what is the best price to charge, what is the best medical treatment, what process
to use to make a material, ….). I taught a course called Optimal Learning for 10 years at
Princeton at the undergraduate level. The course was quite popular, and deals with problems
that are much more familiar to our students than linear programs.

The course I am proposing covers both static problems (such as linear programs) as well as
sequential decision problems (which includes dynamic programming, optimal control and
reinforcement learning). Topics like linear programming can be ignored (for fields where these
problems simply do not arise), covered very briefly (a single lecture), or extensively (some
courses will spend 4-6 weeks just on linear programming).

The real novelty of our teaching style is how we approach sequential decision problems, which
arise in virtually every problem domain, and yet are often ignored in introductory courses in
optimization. Our approach to teaching this topic emphasizes practical solution methods that
reflect what is used in practice but placed in a framework that formalizes the evaluation and
tuning of policies.

Our point of departure from traditional approaches for teaching sequential decision problems is
that we largely ignore methods based on solving Bellman’s equation (known as Hamilton-Jacobi
equations in the optimal control field). Methods based on HJB equations are mathematically
elegant, but only apply to a very narrow set of problems, which is a reason why people who
routinely make decisions over time have never even heard of HJB equations.

[Side note: I spent 20 years of my career, and wrote a popular 500-page book, on approximate
dynamic programming (also known as reinforcement learning) which is a field that focuses on
methods for solving HJB equations approximately. My conclusion that this approach has limited
practical value is based on many years of research.]

2.2 Students
The course is aimed at undergraduates or masters students with no prior training in
optimization. The following skills will be useful:

• Students will need some calculus, but only at the level of understanding a derivative and
gradient (which, to be honest, can be presented very quickly). When we do use
derivatives/gradients, these can often be estimated numerically instead of using the
analytical formulas stressed in introductory calculus courses.
• We use a very modest amount of linear algebra – much less than traditional courses in
optimization. For example, there are perhaps two places where we use the concept of
an inverse of a matrix. Students without any prior training in linear algebra could be
taught the basic idea of a vector and matrix in a short tutorial session.
• We will occasionally use some very basic concepts from statistics, but we will do this in a
way that does not require a prior course in statistics. For example, it is very easy to
introduce a student to a mean and variance. In this course there is no need for
familiarity with different probability distributions.

2.3 Faculty
I am assuming that the faculty teaching this course are already trained in the core fields of
linear, nonlinear and integer programming if you wish to cover this material, but there are entire
fields (such as computer science) where students are typically not introduced to linear
programming. These topics are covered, but not nearly in the depth that the faculty member
might remember from their own training. Times have changed, and there is no longer a need to
teach, for example, algorithms for solving linear programs (packages will be used).

2.4 Students/professionals with prior optimization training


Reading (even skimming) these notes by someone who has already had a course in
(deterministic) optimization should change how they think the solution of the optimization model
should be evaluated. This means appreciating that in many (most? almost all?) settings,
deterministic optimization models are actually policies (methods for making decisions) that need
to be evaluated over time, under uncertainty. This then opens a door to improving the solution
of their deterministic model in terms of real-world performance.

3.0 Course outline


This section provides a sketch of the course. The material is divided into 11 topics, which are
described in much more depth in section 5.0. Here, I summarize each of the topics, focusing on
the development of key concepts.

The course will transition from the simplest (but nontrivial) decision problems to more complex
settings. We will start with basic machine learning problems partly because they are very
familiar today, but also because they are well motivated and easy to understand. The machine
learning problems will also lay the foundation for how we handle random observations in
sequential decision problems which are of fundamental importance (since so many real
problems are, in fact, sequential in nature).

The lectures are organized to follow a natural progression from simpler decisions (binary,
discrete, continuous scalar) to more complex ones (continuous vectors, integer variables,
nonlinear functions). Our emphasis is on formulating optimization models, including the critical
(but historically overlooked) problem of understanding how to handle decision problems when
they are made sequentially over time. We present algorithms when one or both of the following
applies:
• An understanding of the algorithm can help students appreciate the behavior of the
solution, even if they never implement an algorithm.
• Students may need to program the algorithm if they are to solve the problem (we limit
these to relatively simpler algorithms).

This document presents the course as a series of “topics” that can be adapted to the interests of
the faculty member, and the background and interests of the students. Most topics can be
presented in 1 to 3 lectures, but some topics can be extended to as many as 6-8 lectures
depending on the interests of the faculty member and the skills and background of the students.

Below I will list each topic and describe the key points being covered, emphasizing the transition
from simpler to more complex problems. This is a sharp departure from introductory courses in
“optimization” that turn out to be courses in “linear programming with extensions.”

• Topic 1 – Machine learning – We start with fitting a linear model to introduce the idea of
solving a convex optimization problem exactly. Students should learn the minimum
amount of data required to fit a linear model (for example, n ≥ p) in addition to other
conditions on the data. We then transition to nonlinear models and introduce gradient
search and the issue of multiple optima (a major topic with neural networks that are so
prominent today). Also note that with nonlinear estimation, we no longer require n ≥ p,
implying that we can fit, say, a neural network with 100 million parameters with a single
datapoint.

A particularly important piece of pedagogy in Topic 1 is the idea (often overlooked) that
estimation problems are, in fact, stochastic optimization problems, where random
variables are replaced with sampled observations. This is going to set the style for
handling uncertainty which will run throughout our handling of sequential decision
problems. We note that this style allows students to do “stochastic optimization” without
any training in stochastic optimization, probability, or even a course in statistics.

• Topic 2 - Sequential decision problems – We next introduce the concept of a sequential
decision problem, and then use some important and visible problems to illustrate how to
model these. Most important is the concept of a policy which is a method (that is, a
function) for making decisions, that is controlled by tunable parameters. This closely
parallels fitting a model to data (as we do in Topic 1). We start with a basic example for
selling an asset that uses historical data, and then transition to an inventory problem
where we need to randomly generate observations. This is done using a very basic
introduction to Monte Carlo sampling.

The policies we introduce are both forms of policy function approximations (PFAs) which
are widely used by individuals as well as corporations. PFAs help us create a natural
bridge to estimating functions in machine learning, but students also learn how to set up
an objective function for sequential decision problems. This skill will stay with us as we
progress to more complex decision problems.

• Topic 3 – Adaptive optimization – In this topic I use the newsvendor problem to introduce
the idea of using sampled information to compute a gradient. This is widely known as a
stochastic gradient in the literature, but the gradient is based on a sample, which means
we are taking the derivative of a deterministic function. This problem will require
generating random variables dynamically rather than creating a sample in advance as
we did in Topic 2. It is important to recognize that while the newsvendor problem is
perhaps the most widely studied stochastic optimization problem, the algorithm is quite
simple, and outside of generating random samples, all of the steps use deterministic
methods.

• Topic 4 – Optimal learning – This is a topic where the optimization problem is making
decisions of what to observe, such as how a patient responds to a drug, how many
clicks a website attracts, and the market demand for a product at a particular
(discretized) price (applications of this model are endless, and familiar to everyone). We
use a policy called interval estimation (a form of upper confidence bounding) that is very
popular with tech companies (e.g. Google and Facebook) for maximizing ad-clicks.
Optimal learning will also play a role any time we need to do parameter tuning, which will
turn out to be a common type of optimization problem.

Upper confidence bounding policies represent a form of “cost function approximation”
(CFA) where the policy for making decisions has an imbedded optimization problem.
For UCB policies, this optimization problem involves a simple sort, but it introduces the
idea of a policy that involves solving an optimization problem to make a decision. This
means we have an optimization problem (the sort) within a larger optimization problem
of tuning the parameter in the UCB policy.

● Topic 5 – Shortest path problems - Here we introduce our first nontrivial static,
deterministic optimization problem which is also a very special form of linear program
(but that comes later). This is the only time we use Bellman’s equation in the course,
although there are problems (in Topic 10) where we could draw on Bellman again.

After presenting the model and algorithm for a deterministic, static shortest path
problem, we then transition to show how this can be used in a dynamic setting, as would
happen if we are modeling a path through a dynamic network. We show how to model
this problem, and then show how to create a classic deterministic direct lookahead
approximation (deterministic DLA). We show how to evaluate the shortest path problem
as a policy, and how to parameterize it so that it works better in a stochastic, dynamic
environment.

We are going to copy this setting as we move into more general optimization problems.
We will start by presenting a basic, static optimization problem (this could be a linear,
nonlinear or integer program), and then show how it is often used as a policy in a
dynamic setting. In my experience, the vast majority of “optimization problems” are
actually policies used in a sequential problem setting, something that is typically
overlooked in standard texts on optimization.

● Topic 6 – General concepts – We pause at this point to discuss two important
dimensions of sequential decision problems:
○ Classes of policies – So far, we have illustrated four ways of making decisions,
each of which comes from the four classes of policies. These four classes cover
every possible method that we might use to make decisions, including any
method people are already using.
○ Evaluating policies – The biggest difference between people who make decisions
in an ad hoc way versus someone with formal training is their understanding of
the concept of a policy, and how to evaluate it. In this topic (typically a single
lecture) we start by reviewing how we have evaluated policies in topics 2 – 5.
We then list different ways of evaluating policies such as cumulative reward for
online learning, and final reward for learning in a lab. We also differentiate
between expected performance versus risk. While the academic literature deals
with risk with a considerable amount of mathematical sophistication, we are
going to show students how to model risk in a way that can be easily computed
in a spreadsheet.

● Topic 7 – Linear programming – Here is where we introduce linear programming. This
can be done in a single lecture (which I recommend for an introductory optimization
course) or expanded given the time available, interests of the students, and the interests
of the faculty member teaching the course.

Our preferred style for an introductory course is to teach the idea of a linear program and
then transition to the idea that “algorithms exist” for solving it. An in-depth presentation
of the simplex algorithm is simply not appropriate at this stage, since no-one is ever
going to implement their own simplex algorithm. Modern implementations of the simplex
algorithm use a variety of sophisticated strategies to improve performance; these
strategies are never discussed in textbook treatments of the simplex algorithm, so it is
not clear what a student is learning from these streamlined presentations. In addition,
production implementations might combine strategies such as dual simplex or even
interior point methods. This material is simply not appropriate for an introductory course.

This said, we illustrate the simplex algorithm graphically using a network problem which
helps students understand, in a highly visual way, the concept of a basis, pivoting, and
most important, dual variables. This can be done without any linear algebra, but we do
have a section where we present the simplex algorithm (for a network problem) both
graphically, and then algebraically. We leave it to the instructor to decide which
presentation best suits their students.

We begin by presenting linear programming as the solution to a static problem, but we
then transition to using linear programming as a policy for sequential decision problems.
We suspect that most linear programming applications arise in the context of sequential
decision problems (and this is certainly true of the original motivating applications used
by George Dantzig). I think the failure to recognize that linear programs are often used as
policies for sequential decision problems is one of the great failures of the math
programming community. Topic 8 illustrates how a deterministic linear program might
be used in a sequential inventory problem.

● Topic 8 – Dynamic inventory problem – Here we are going to copy what we did for our
dynamic shortest path problem but use the context of an energy storage problem in a
highly dynamic setting with rolling forecasts (a topic that has been completely
overlooked in the operations research literature). This requires solving a series of
simple linear programs, even though the decision at a point in time is a scalar (we get
the LP because we are optimizing over a planning horizon, which means our decision
variable is now a vector). The lookahead LP will be parameterized to help mitigate the
errors in the rolling forecasts, and we will show that this produces a much better result
than using typical point forecasts. The challenge, as always, will be the tuning, a
problem we first saw in the machine learning problem in Topic 1. We will suggest a
strategy that is fairly easy to implement.

● Topic 9 – Integer programming – Here we introduce the idea of integer variables in the
context of a facility location problem. As with our shortest path problem, we will start
with a simple, static facility location problem. Then, Topic 10 shows how the static
model can be used as a policy in a fully sequential problem.

Optimization books tend to become drawn into the fairly sophisticated algorithms
required to solve integer programs. However, since the year 2000, commercial packages
have conquered wide classes of even very large integer programs, although some care
has to be used since there is a wide range of integer programming problems, and some
still require specialized algorithms. The best packages (such as Gurobi and Cplex) can
be dramatically better than free software that students can download over the internet.
As with algorithms for linear programs, teaching algorithms for integer programs is
pointless for an introductory course – these algorithms are quite sophisticated and no-one
today would implement their own. However, it is important for students to be able to
recognize which types of integer programming problems are likely to be solvable with a
general purpose package.

● Topic 10 – Dynamic facility location – As we did with linear programming, we start by
presenting a static integer programming problem using facility location, and then extend
it here to a dynamic setting. We start by making the case that any facility location
problem would have to be implemented in a stochastic environment. We separate the
decision of where to locate facilities, which is made using forecasted demands, and the
“real world” decisions of how to meet demands which are revealed after we make the
decision to locate facilities.

We then recognize that decisions to locate facilities are themselves decisions that are
made over time, in the presence of the uncertainties about demands. We illustrate a few
strategies for solving the dynamic facility location problem.

● Topic 11 – Nonlinear programming – We have already seen nonlinear programming in
topic 1 when we fitted a nonlinear model, but here we are going to address this rich topic
in more depth. As with linear and integer programming, we are going to present
nonlinear programming in two stages: first as a static problem, and then as a policy in a
fully sequential problem. Nonlinear programming is a rich topic that can be introduced in
a single lecture but can span an entire course. It is up to the professor to decide how
much time to spend on this topic given the interests of the students.

We are going to introduce students to a quadratic programming problem that arises
when optimizing investments over a portfolio. We will first introduce this problem in its
classical formulation as a static problem, and then transition to solving it sequentially
over time, treating it as a policy in a fully sequential decision problem (based on actual
practice on Wall St.). This will be a sophisticated (but very real) extension of the asset
selling problem we introduced in Topic 2.

4.0 Readings
Many of the topics are organized around sequential decision problems presented in

Warren Powell, Sequential Decision Analytics and Modeling, NOW Press, 2022 (available for
free download from https://tinyurl.com/sdamodeling). Below I refer to this as “SDAM.”

Readings from SDAM are indicated at the beginning of each topic (or subtopic).

Occasionally I refer to material in my graduate-level book:

Warren Powell, Reinforcement Learning and Stochastic Optimization, Wiley, 2022 (see
https://tinyurl.com/RLandSO/ for an overview). Below I refer to this as “RLSO.”

RLSO is not appropriate for an introductory course such as this, but I recommend that the
instructor have a copy of the book.

There are blocks of material on mature topics like linear, integer, and nonlinear programming. I
assume that any professor teaching a course in optimization will already have a favorite book
they like to use for these topics. We encourage, for an introductory course like this, putting
more emphasis on describing what these problems are and how they are used, with less
emphasis on algorithms, especially when these are widely available in packages.

5.0 Lectures
In this section I sketch out a sequence of topics that steadily transition from relatively simple
decision problems to more complex ones. There is considerable flexibility in terms of how much
time is spent on each topic. For example, linear programming can easily be taught in a single
lecture (basically defining what a linear program is), but it is easy to fill half a course. There are
also topics on integer programming and nonlinear programming which can also be taught in a
single lecture, but there are entire (graduate level) courses dedicated to each of these topics. In
an introductory course, I think students should be introduced to these topics, but in a modest
way.

Topic 0: Applications
It always helps to start an introductory course such as this with a series of applications. This will
be very dependent on the department where the course is being taught. Below I give some
illustrative applications that I used when I was teaching this material.

• Machine learning problems – These require optimizing parameters to make a model fit a
training dataset.
• Management of physical resources – Physical resources might be:
o People (hiring, firing, training, moving)
o Equipment (trucks, drones, robots, medical equipment, …)
o Facilities (building, leasing, closing, resizing)
o Product (planning inventories for retail sale, or parts used in manufacturing)
• Management of financial resources – These include
o Planning cash reserves
o Making Investments
o Arranging different financial instruments (loans, insurance contracts, …)
o Setting budgets
• Information acquisition and communication
o Running experiments in a lab or the field
o Running medical tests
o Sending/sharing information about the status of a system
• Finding the best ways of making decisions
o Choosing the best method for making decisions
o Tuning parameters used by a method

These decision problems can come in two forms:

• Static problems – These are problems we solve once and then use the solution
• Sequential decision problems – These are decisions that are made repeatedly over time
as new information is arriving.

Sequential decision problems are quite rich, and typically involve relatively simple decisions.
However, the sequential nature, and in particular the flow of new information, can introduce
significant complexities. We are going to steer around these complexities and show how to
model and solve problems that everyone encounters in their own personal activities, or any of a
wide range of problems in business, engineering and the sciences.

The powerpoint slides I used in my first lecture can be downloaded by going to
https://tinyurl.com/RLSOcourses/. Scroll down to the heading “Undergraduate/masters course
in sequential decision analytics” and then scroll down to “Lecture 1” and download the slides.
However, it is very important that these applications be chosen based on the interests of the
students.

Topic 1: Machine learning


One of the most visible (and accessible) optimization problems today arises in machine
learning, where we have to find the best fit of a model to a training dataset (this is also known as
supervised machine learning).

Below we describe the optimization problems and solution methods that arise when we are
fitting linear models and nonlinear models. Each setting will allow us to illustrate different
optimization strategies, from finding an optimal solution analytically with linear models to using a
derivative-based search algorithm for nonlinear models. We will also learn some properties of
optimal solutions along with the necessary conditions for optimality in each setting.

1.1 Linear models


We are going to start by assuming we have a basic training dataset that we can write as

$(x^1, y^1), (x^2, y^2), \ldots, (x^N, y^N).$

We are going to assume that we have a model of the form

$$ y = f(x|\theta) = \sum_{f \in \mathcal{F}} \theta_f \phi_f(x), \qquad (1.1) $$

where φ_f(x) is known as a “feature” which is some function of the input data x. Given the
features (chosen manually) our optimization problem is given by

$$ \min_{\theta} \sum_{n=1}^{N} \big(y^n - f(x^n|\theta)\big)^2. \qquad (1.2) $$

I would start by deriving the well-known normal equations, given by

$$ \theta^* = [X^T X]^{-1} X^T Y, $$

where X is the design matrix given by

$$ X = \begin{bmatrix} x_1^1 & x_2^1 & \cdots & x_p^1 \\ x_1^2 & x_2^2 & \cdots & x_p^2 \\ \vdots & \vdots & & \vdots \\ x_1^N & x_2^N & \cdots & x_p^N \end{bmatrix}, $$

and Y is our vector of observations (also called responses or labels)

$$ Y = \begin{bmatrix} y^1 \\ y^2 \\ \vdots \\ y^N \end{bmatrix}. $$
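
If you want to sketch where the normal equations come from, one standard derivation (assuming, for simplicity, that the features are arranged so that f(x^n|θ) is the n-th entry of Xθ) sets the gradient of the squared error to zero:

$$ \nabla_\theta \|Y - X\theta\|^2 = -2X^T(Y - X\theta) = 0 \;\;\Rightarrow\;\; X^T X\,\theta = X^T Y \;\;\Rightarrow\;\; \theta^* = [X^T X]^{-1} X^T Y. $$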

This should be a warmup using their linear algebra and setting up a quadratic optimization
problem that can be solved analytically.

Bring out that we need n observations ≥ p (the number of parameters), and that the data has to be
well behaved (so that the [X^T X] matrix is invertible). It is easy to illustrate this with a simple
example – fitting a line through two datapoints.
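
As a minimal sketch (not part of the original notes), the normal equations take only a couple of lines in NumPy; the data here is synthetic and the dimensions are arbitrary.

```python
import numpy as np

# Synthetic training data: n = 50 observations, p = 3 parameters (constant plus two features).
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p - 1))])  # design matrix
theta_true = np.array([2.0, -1.5, 0.7])
Y = X @ theta_true + rng.normal(0, 0.5, size=n)                         # noisy responses

# Normal equations: theta* = [X^T X]^{-1} X^T Y (requires n >= p and an invertible X^T X).
theta_star = np.linalg.solve(X.T @ X, X.T @ Y)
print(theta_star)   # should be close to theta_true
```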

In the teaching notes below, we argue that the optimization problem in (1.2) is actually a
stochastic optimization problem. It looks deterministic here (and it is) because we are working
with a sample of the random variables y. The sequence (y^1, ..., y^N) is actually a sample of the
random observations. We are going to see later that we can turn a lot of stochastic optimization
problems into problems requiring deterministic methods by using samples, so this starter
problem not only introduces an important application (machine learning), it is setting the stage
for how we are going to solve a wide range of sequential problems that involve random
information.

1.2 Nonlinear models

Next we transition from a linear model (where f(x|θ) is linear in the parameters) to a nonlinear
model. One example is a logistic regression such as

$$ f(x|\theta) = \frac{e^{\theta_0 + \theta_1 x}}{1 + e^{\theta_0 + \theta_1 x}}, $$

or

$$ f(x|\theta) = \begin{cases} -1 & x < \theta_1 \\ \;\;\,0 & \theta_1 \le x \le \theta_2 \\ +1 & x > \theta_2. \end{cases} $$

Or, our nonlinear model could be a neural network with millions (or billions) of parameters θ
(normally drawn as a layered diagram of nodes connected by weighted links).

We now have the optimization problem

$$ \min_{\theta} g(\theta) = \sum_{n=1}^{N} \big(y^n - f(x^n|\theta)\big)^2. \qquad (1.3) $$

We can solve this using a gradient-based search algorithm that looks like:

$$ \theta^{n+1} = \theta^n - \alpha_n \nabla_\theta g(\theta^n). \qquad (1.4) $$

(Note that we use the negative gradient because we are minimizing). You need to talk the
students through the process of finding the gradient (and possibly explaining what this is). One
key step is finding the stepsize α_n which is done by solving the one-dimensional search
problem:

$$ \alpha_n = \arg\min_{\alpha > 0} g\big(\theta^n - \alpha \nabla_\theta g(\theta^n)\big). \qquad (1.5) $$

The figure below illustrates the search process.



One issue that often arises is that the function g(θ) may have local minima (illustrated below),
which means your gradient algorithm may produce different local optima depending on the starting
point.
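
A small sketch of what this looks like in code may help; it is not taken from the readings, the data is made up, and the stepsize search in (1.5) is replaced by a crude try-a-few-values rule. Running it from several starting points shows that the nonlinear fit need not return the same answer.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 30)                                         # e.g., prices
y = 1 / (1 + np.exp(-(2.0 - 0.6 * x))) + rng.normal(0, 0.05, 30)   # noisy observations

def f(x, theta):                                    # logistic model f(x|theta)
    z = np.clip(theta[0] + theta[1] * x, -50, 50)   # clip to avoid overflow in exp
    return np.exp(z) / (1 + np.exp(z))

def g(theta):                                       # objective (1.3): sum of squared errors
    return np.sum((y - f(x, theta)) ** 2)

def grad(theta, h=1e-5):                            # numerical gradient of g (no calculus required)
    e = np.eye(len(theta))
    return np.array([(g(theta + h * e[i]) - g(theta - h * e[i])) / (2 * h) for i in range(len(theta))])

def fit(theta0):
    theta = np.array(theta0, dtype=float)
    for _ in range(200):
        d = grad(theta)
        # crude stand-in for the stepsize search (1.5): keep the best of a few candidate steps
        candidates = [theta] + [theta - a * d for a in (1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0)]
        theta = min(candidates, key=g)
    return theta, g(theta)

for start in [(0.0, 0.0), (5.0, -5.0), (-5.0, 5.0)]:
    print(start, fit(start))      # different starts can end at different (local) solutions
```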

Also – I would note that while we need n ≥ p to use the normal equations for our linear model
(in Topic 1), we no longer require this for the nonlinear model. We can apply the gradient
algorithm for any value of n, even if it is smaller (and potentially much smaller) than p. Students
need to understand that just because the algorithm returns an estimate of the best value of θ,
that does not mean that it is guaranteed to be a good value that will work well on future
datasets.

You can start by illustrating it using a linear model, and then show how to use it for a nonlinear
model. For example, fit a logistic regression to predict demand as a function of price, or
probability of winning a bid for placing an ad on Google or Facebook. Then extend to a simple
neural network. Be sure to highlight the presence of multiple optima and the need to use
multiple starting points. Then, show that you get an answer even when n < p, an issue that
becomes important with deep neural networks.

This will create a basis for later discussing a well-known issue with neural networks (all students
will have heard of this as “AI”) which use models where p ≫ n. Students will also learn that
there is not a unique solution when we optimize parameters for nonlinear models. Finally, I
would also make the point that while linear models require n ≥ p, nonlinear models do not. We
can fit a neural network with 100 million parameters with a single datapoint (!).

1.3 Neural networks (optional)


A special kind of nonlinear model is a neural network. Given the visibility of this technology, an
instructor may wish to introduce students to the basic idea of a neural network, since the
calculations are relatively simple. It is easy to show how to construct a basic neural network,
and how to take a vector of inputs and translate it to an output (or set of outputs). Section 3.9.3
of RLSO describes how to compute the output of a neural network through a simple forward
pass.
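
If you do show this, the forward pass is just a couple of matrix multiplications. The sketch below is a generic illustration (the layer sizes, random weights, and tanh activation are arbitrary choices made for the example, not taken from RLSO):

```python
import numpy as np

rng = np.random.default_rng(2)

# A tiny two-layer network: 3 inputs -> 5 hidden units -> 1 output.
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)   # the weights are the tunable parameters theta
W2, b2 = rng.normal(size=(1, 5)), np.zeros(1)

def forward(x):
    h = np.tanh(W1 @ x + b1)   # hidden layer: linear transformation followed by a nonlinearity
    return W2 @ h + b2         # output layer

print(forward(np.array([1.0, 0.5, -2.0])))
```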

You can then jump to section 5.5 of RLSO if you wish to show how to compute the gradient of a
neural network with respect to the weights on each link (these are the tunable parameters). Key
to neural networks is that these derivatives are easy to compute. Given the interest in deep
neural networks, you can show how these calculations can be done in parallel, which is why
chips such as those by Nvidia (which are designed for massively parallel computation, originally
for the graphics in video games) are so popular.

I would also point out that there are packages such as TensorFlow that do these calculations
very efficiently. However the calculations are performed, the core ideas are the same as what
we illustrated using the logistic regression example in section 1.2. Remember – the idea in an
introductory course is not to prepare students to actually do research on this topic; it is only to
introduce them to important optimization problems and how we go about solving them. For
example, the issue of multiple local optimal solutions that we introduced in our section on
nonlinear models above is important, since it helps to understand that the “optimal” solution we
obtain is not truly optimal, and that small changes in inputs might result in finding a different
local solution.

Also – remember that while we require n ≥ p in a linear model, we have no such requirement for
nonlinear models, including neural networks. We might have a neural network with 100 million
parameters, but we will still get a number if we try to train it with just one data point. While this
seems silly, it is quite common to fit neural networks where the number of observations (that is,
the size of the training dataset) is quite a bit smaller than the number of parameters.

I don’t personally encourage including this material, but it fits very nicely here, and might go a
long way to attracting student interest.

1.4 Teaching notes for machine learning


There are some key points that should be brought out in this topic:
• The linear model is a nice opportunity to remind students of some basic linear algebra
when deriving the normal equations.
• Be sure to bring out the requirement that [X^T X], where X is the “design matrix” of data,
must be invertible, which requires that n ≥ p. I suggest illustrating with the problem of
fitting a line to a single data point, and then to two data points where x is the same for
each one.
• The objective functions (equations (1.2) and (1.3)) look like deterministic optimization
problems, but they are not. Fitting a function (linear or nonlinear) to data is actually a
stochastic optimization problem which should be written

$$ \min_{\theta} \mathbb{E}_X \mathbb{E}_{Y|X} \big(Y - f(X|\theta)\big)^2. \qquad (1.6) $$

Here we view X, the explanatory variables (also known as independent variables or
covariates), and the response Y, as random variables. We assume we are given
a sample of these variables

$(x^1, y^1), (x^2, y^2), \ldots, (x^N, y^N),$

which we call the training dataset. However, this is just a sample of the random variables
X and Y, which turns our stochastic optimization problem (1.6) into a deterministic
optimization problem (1.2 or 1.3). We are going to use this technique over and over again
in this course to handle virtually any form of uncertainty. The only difference in
future applications is that we may need a way of generating our own random sample. A
small sketch below illustrates the idea.
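
Here is a tiny sketch of that point (everything in it is made up for illustration): the average error over a training dataset is just a sample estimate of the expectation in (1.6), and a very large sample gets close to it.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample(n):                                   # draw n observations of (X, Y)
    X = rng.uniform(0, 10, n)
    Y = 3.0 + 2.0 * X + rng.normal(0, 1.0, n)    # underlying model plus noise
    return X, Y

def avg_loss(theta, X, Y):                       # (1.2) averaged over the dataset, comparable across sample sizes
    return np.mean((Y - (theta[0] + theta[1] * X)) ** 2)

theta = (3.0, 2.0)
X_small, Y_small = sample(20)                    # a small training dataset
X_huge, Y_huge = sample(1_000_000)               # a huge sample approximates the expectation in (1.6)
print(avg_loss(theta, X_small, Y_small), avg_loss(theta, X_huge, Y_huge))
```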

Topic 2: Sequential decision problems I


The vast majority of all decision problems are sequential decision problems. Later, we are
going to motivate harder decision problems that use linear, nonlinear and integer programming,
but to start we are going to use simpler problems which are still important as well as
challenging.

2.1 Introduction to sequential decision problems


Readings: SDAM Chapter 1 (introduction to sequential decision problems).

A sequential decision problem is any problem that consists of the sequence:

decision, information, decision, information, …

where each decision receives a contribution or incurs a cost. Sequential decision problems
cover an extremely broad class of optimization problems. Most important is that they cover
problems that arise in almost any setting: business, health (all kinds), energy, economics,
laboratory experiments, field experiments, …, the list is endless, which means it is possible to
illustrate these problems with applications that are suitable to any class.

Decisions are made with methods that we call “policies.” There are two broad strategies for
designing policies, and each of these produces two classes, creating four classes of policies:

Strategy 1 – Policy search – This is where we identify a class of functions for making decisions,
and then search for the best function that works well over time. The two classes of policies in
this strategy are:
1. Policy function approximations (PFAs) – These are analytical functions that take what
we know to determine what decision to make now. Examples are order-up-to policies for
inventory, or buy low, sell high policies in finance.
2. Cost function approximations (CFAs) – These are simplified (usually deterministic)
optimization models that have been parameterized to work well under uncertainty. We
will see CFAs when we introduce optimal learning (Topic 4). We will also see these
when we introduce linear, integer and nonlinear programming in topics 7-11.

Strategy 2 – Lookahead approximations – We estimate the value of a decision by combining the
immediate cost or reward of a decision plus some approximation of downstream costs and
rewards from the initial decision. This strategy can be divided into two classes:
3. Value function approximations (VFAs) – Here we find the decision that optimizes the
immediate cost or reward plus an approximate value of the state that the decision takes
us to. These are the only policies that use Bellman’s equation (Hamilton-Jacobi if you
are a controls person).

4. Direct lookahead approximations (DLAs) – Finally we optimize the immediate cost or
reward plus an estimate of downstream costs or rewards computed by solving an
approximate model of the future.

Note that sequential decision problems arise throughout human activities. PFAs are the simplest
class and are the most widely used. Most important, these are parameterized functions, just
like the parametric models in statistics that we saw in Topic 1.

Below is the slide I use to present the elements of a sequential decision problem. It illustrates
the notation of states S_t (what we know at time t), decisions x_t (what decision we choose from a
set of feasible decisions), and the exogenous information W_{t+1} that we learn only after we make
the decision x_t. Decisions are made with a method (policy) that we designate as X^π(S_t|θ) that
often depends on tunable parameters θ. Also shown is the contribution (if we are maximizing) or
cost (if minimizing) C(S_t, x_t) which may depend on information in the state S_t (such as
dynamically changing prices or costs) in addition to the decision x_t. The transition function
S^M(S_t, x_t, W_{t+1}) gives the updated state S_{t+1} given the information in S_t, the decision x_t, and the
exogenous information W_{t+1}.
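
This notation maps directly onto a short simulation loop. The sketch below uses a toy inventory example invented purely for illustration (the order-up-to policy, Poisson demand, prices and costs are all assumptions, not taken from the slides); what matters is the pattern of decision, information, contribution, transition, repeat.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 20

def policy(S, theta):                  # X^pi(S_t|theta): order up to theta if inventory S is below it
    return max(theta - S, 0)

def contribution(S, x, W, price=5.0, cost=3.0):   # C(S_t, x_t): revenue from satisfied demand minus ordering cost
    return price * min(S + x, W) - cost * x

def transition(S, x, W):               # S^M(S_t, x_t, W_{t+1}): leftover inventory after demand W is served
    return max(S + x - W, 0)

def simulate(theta):
    S, total = 0, 0.0
    for t in range(T):
        x = policy(S, theta)           # decision made knowing only S_t
        W = rng.poisson(4)             # exogenous information (demand) revealed after the decision
        total += contribution(S, x, W)
        S = transition(S, x, W)        # move to the next state S_{t+1}
    return total

print([(theta, round(simulate(theta), 1)) for theta in (2, 4, 6, 8)])
```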

The next slide (below) is one I use to compare machine learning with sequential decisions. The
difference between machine learning and sequential decisions is that with machine learning,
you need a training dataset to fit a general-purpose function, whereas sequential decisions
(which do not need a training dataset) need a model of the underlying problem (such as the
evolution of inventories). It is important to emphasize this difference, since it is often overlooked
(especially in discussions of “AI”).

There is a wide range of sequential decision problems that will be familiar to students (unlike
linear programs). The challenge is that sequential decision problems involve the arrival of new
information, which puts even simple decision problems into a class that has been treated under
a wide variety of names, with the most common being “dynamic programs.” This will produce
gasps of “oh, that is stochastic optimization…you can’t teach stochastic optimization in an
introductory optimization course.” For most faculty, “dynamic programming” means teaching
Bellman’s equation.

We are not going to go down that path. First, we observe that both of the machine learning
problems in Topic 1 (the linear and nonlinear models) are forms of stochastic optimization problem. We avoid the
complexities that would normally arise by tuning our statistical models using a training dataset
that we assumed was given to us.

We are going to follow the same strategy for sequential decision problems, emphasizing three
types of policies that are widely used, and which students will typically be familiar with. These
are:

• PFAs – These are the simplest policies such as buy low, sell high in finance or order-up-
to policies for inventory planning. Note that PFAs include every possible model that
might be used in machine learning, which might be anything from a linear model to a
deep neural network.
• CFAs – These will be simplified (always deterministic) optimization problems that might
involve nothing more than a sort. For more complex problems, we will introduce linear
programs, and then show how we can parameterize a linear program to make it work
well over time. This powerful idea is widely used in practice but has been completely
ignored in the academic literature.

• Deterministic DLAs – Google maps is a nice example of a deterministic DLA, since it plans a path to a destination (which requires looking into the future). For problems where we need to plan into the future, the most widely used strategy is to use a deterministic approximation. We might use tunable parameters to make these perform better over time, but not always.

Below we are going to illustrate some simple sequential decision problems using PFAs, which
require tuning parameters. The approach exactly parallels what we did in Topic 1 for machine
learning.

2.2 Asset selling


Readings: SDAM Chapter 2.

We are going to start with a simple buy-low, sell-high policy in finance, which offers an important
simplification to help us get started: the random information (the price at which we can sell our
asset) is drawn from history, allowing us to treat it just as we did the observations y^n in our
machine learning problems in Topic 1.

We are going to assume that we are given a historical sequence of prices

    (p_{t-H}, p_{t-H+1}, ..., p_{t'}, ..., p_t)

where time t' is any point in time in the history where we might sell an asset (assume for
simplicity that we can only sell at the end of each day). The price p_t would be the most recent
price we have available. A nice feature of this problem is that we can assume that the prices are
independent of any decisions that we make.

Next imagine that we are going to use the following policy that determines when to buy, sell or
hold a single asset:

    X^π1(S_t|θ) =  +1 (Sell)   if p_t > p̄_{t-1} + θ_1
                    0 (Hold)   if p̄_{t-1} − θ_2 ≤ p_t ≤ p̄_{t-1} + θ_1        (2.1)
                   −1 (Buy)    if p_t < p̄_{t-1} − θ_2

where

    p̄_t = 0.5 p_t + 0.35 p_{t-1} + 0.15 p_{t-2}        (2.2)

is a smoothed estimate of prices. The variable S_t, which is called the "state variable," captures
all the information we know at time t that we need to make a decision (that is, to compute our
policy X^π1(S_t|θ)), as well as any other information we might need. For this problem, the state
variable consists of

    S_t = (p_t, p_{t-1}, p_{t-2})

It is easy to see that our policy 𝑋𝑋 𝜋𝜋1 (𝑆𝑆𝑡𝑡 |𝜃𝜃) is one of many possible strategies we could use. This
policy has two tunable parameters. To tune the parameters, we need an objective function.

The objective function we would use to perform our tuning would look like

    F(θ) = Σ_{t'=t-H}^{t} X^π(S_{t'}|θ) p_{t'}.        (2.3)

Note that our policy depends only on information that we would know at time t', even though we
have an entire history of prices (p_{t-H}, p_{t-H+1}, ..., p_{t'}, ..., p_t). This type of tuning is widely used in
finance, and is known as "backtesting" since it requires using historical prices to evaluate the
policy.

This objective closely parallels the one we used to fit nonlinear models in Topic 1. If we let
θ_1 = θ_2, then we have a one-dimensional problem, which could be easily optimized in a spreadsheet
(it is not much harder to do this over two dimensions).
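To make the tuning step concrete, here is a minimal Python sketch of the backtest in equations (2.1)-(2.3). The price series is synthetic and the function names (smoothed_price, policy, backtest) are purely illustrative; in class the same loop could live in a spreadsheet.

    import numpy as np

    def smoothed_price(p, t):
        # Smoothed price estimate from equation (2.2)
        return 0.5 * p[t] + 0.35 * p[t - 1] + 0.15 * p[t - 2]

    def policy(p, t, theta1, theta2):
        # Buy-low, sell-high policy from equation (2.1): +1 = sell, 0 = hold, -1 = buy
        pbar = smoothed_price(p, t - 1)
        if p[t] > pbar + theta1:
            return 1
        if p[t] < pbar - theta2:
            return -1
        return 0

    def backtest(p, theta1, theta2):
        # Objective (2.3): sum over the history of (decision at t') * (price at t')
        return sum(policy(p, t, theta1, theta2) * p[t] for t in range(3, len(p)))

    # Synthetic price history standing in for real data
    np.random.seed(0)
    prices = 50 + np.cumsum(np.random.uniform(-1.0, 1.0, 200))

    # One-dimensional tuning with theta1 = theta2, done by a simple grid search
    best_value, best_theta = max((backtest(prices, th, th), th) for th in np.arange(0.0, 5.0, 0.1))
    print(f"best objective {best_value:.2f} at theta = {best_theta:.1f}")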

This exercise accomplishes several educational objectives:


• It introduces the idea of a policy for making decisions.
• It uses historical data to represent a sample of outcomes, just as we created a training
dataset for machine learning (see Topics 1 and 2).

A different way to perform the tuning is to create a mathematical model of prices, and use this to
create a brand new set of prices for each iteration of the algorithm. We do not have to do this
for this specific problem, although the tuning would be more robust if we did. However, creating
mathematical models of prices is relatively difficult – we would need to capture both the
distribution of prices (which is fairly easy) in addition to the correlations of prices over time (this
is quite difficult). Working with historical prices avoids these complications.

2.3 Inventory planning


Reading: Section 1.3 of SDAM

Inventory problems are one of the most popular applications of sequential decision problems.
This will be a fairly minor variation of the asset selling problem, but it opens the door to an
incredibly rich set of applications that arise in supply chain management.

For this topic, we will start with a vanilla inventory problem where we define:

    R_t = the amount of inventory at time t,
    x_t = the amount of new inventory we order, where we assume it arrives right away,
    D̂_{t+1} = the demand for product that arises between t and t+1 (and after we make the decision x_t),
    p = the unit price at which we sell the product,
    c = the unit cost of ordering more product.

The basic equation for updating the inventory R_t is given by

    R_{t+1} = max{0, R_t + x_t − D̂_{t+1}}.        (2.4)

Unlike our asset selling problem, we generally do not get to observe the actual demands D̂_t,
which means we cannot just use a set of observations from history. Instead, we are going to
need to generate a set of observations of demands using a random number generator. We are
going to take advantage of a standard function built into all computer languages to generate a
random number between 0 and 1. For example, in Excel this function is called RAND(). In
Python it is random.uniform(0, 1). We are going to just let the function be represented by
U().

If we want a random number R that is uniformly distributed between a and b, we use

    R = a + (b − a) U().        (2.5)

Now imagine that we know that on average the demand is μ, where our actual demand D̂_t is
given by

    D̂_t = μ + ε,        (2.6)

where ε is a random error term that is uniformly distributed between −μ and +μ. We can
generate random observations of ε using

    ε = −μ + 2μ U().        (2.7)

If we want 100 observations of demands, we need to generate 100 observations of ε.
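As a quick illustration (a minimal sketch; the mean μ = 10 is made up), the following Python lines generate 100 demand observations using (2.5)-(2.7):

    import random

    mu = 10.0
    # D_t = mu + eps, with eps uniform on [-mu, +mu], per equations (2.6)-(2.7)
    demands = [mu + (-mu + 2 * mu * random.uniform(0, 1)) for _ in range(100)]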



Now that we have a set of random demands D̂_1, D̂_2, ..., D̂_t, ..., D̂_T, we need a method for making
decisions. We are going to use a policy known in the literature as an "(s, S)" policy, where s and
S are tunable parameters. We like to use θ for our tunable parameters, so we are going to
replace s and S with θ^min and θ^max. Our policy is then given by

    X^π(S_t|θ) =  θ^max − R_t   if R_t < θ^min        (2.8)
                  0             otherwise.

Our challenge now is to find the best value of θ = (θ^min, θ^max), which we do by optimizing the
objective function

    max_θ Σ_{t=0}^{T} [ p min{R_t + X^π(S_t|θ), D̂_{t+1}} − c X^π(S_t|θ) ],        (2.9)

where the inventory R_t evolves according to equation (2.4).

The optimization problem in (2.9) can be approached just as we did to optimize our nonlinear
model in Topic 1 (equation (1.3)) or the asset selling problem in section 2.2.
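A minimal Python sketch of this tuning loop is shown below. The prices p and c, the demand mean, and the grid of values for θ^min and θ^max are made-up illustrations; what matters is the structure: simulate the policy, compute (2.9), and repeat over a grid of θ.

    import random

    def simulate_inventory_policy(theta_min, theta_max, demands, p=10.0, c=7.0):
        # Simulate the order-up-to policy (2.8) and return the objective (2.9)
        R = 0.0
        total = 0.0
        for D in demands:
            x = theta_max - R if R < theta_min else 0.0   # X^pi(S_t | theta)
            total += p * min(R + x, D) - c * x            # revenue minus ordering cost
            R = max(0.0, R + x - D)                       # inventory transition (2.4)
        return total

    # Generate demands as in equations (2.5)-(2.7)
    random.seed(1)
    mu = 10.0
    demands = [mu + (-mu + 2 * mu * random.uniform(0, 1)) for _ in range(100)]

    # Grid search over theta = (theta_min, theta_max)
    best = max(
        (simulate_inventory_policy(tmin, tmax, demands), tmin, tmax)
        for tmin in range(0, 21, 2)
        for tmax in range(tmin, 41, 2)
    )
    print("best profit %.1f with theta_min = %d, theta_max = %d" % best)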

2.4 Teaching notes

• We have now seen that optimizing a parameterized policy closely parallels optimizing
the parameters of a nonlinear function for machine learning. One difference is that we
can generally compute the gradient of the nonlinear function in machine learning,
whereas this is typically not true when we are simulating a policy.
• The simplest approach to use right now is to introduce the idea of numerical derivatives.
• There is also a wide range of derivative-free methods, but these should be introduced
slowly over the course, depending on the interests of the professor and the backgrounds
of the students.

Topic 3: Adaptive optimization - the newsvendor problem


Readings: Chapter 3 in SDAM

Arguably one of the most widely encountered problems when managing resources is the
newsvendor problem, which is a nice illustration of a decision problem involving uncertainty.
We will show how to solve this using a simple stochastic gradient algorithm that can be
implemented in an online (learn as you go) setting. This is a natural extension of the gradient-
based method we used in section 1.2 for optimizing the parameters of a nonlinear statistical
model.

In Topic 1, we solved the following machine learning problem

    min_θ F̂(x|θ) = (1/N) Σ_{n=1}^{N} ( y^n − f(x^n|θ) )².        (3.1)

This is a sampled estimate of the function

    min_θ F(x|θ) = E ( Y − f(X|θ) )²,        (3.2)

where Y is a random variable representing the response given the random input X, and where

    (x^1, y^1), (x^2, y^2), ..., (x^n, y^n), ..., (x^N, y^N)

is a sample of N observations of the variables (X, Y).

The idea of using a sampled estimate to transform a stochastic optimization problem (3.2) into a
deterministic optimization problem (3.1) is a powerful and widely used strategy when optimizing
functions of random variables.

One of the most popular stochastic optimization problems is the newsvendor problem, where we
need to choose a quantity x to meet an unknown demand D. The challenge is that we have to
purchase x units of a product at a unit cost c, to meet the demand D, receiving a revenue p for
each unit sold. The problem is that we cannot sell more than the demand, giving us the
objective function

    max_x F(x|D) = p min{x, D} − c x.        (3.3)

The function F(x|D) assumes we know the demand D, but we do not. What we have to do is to
find the quantity x that maximizes the expected value of F(x|D) over the random quantity D.
This would be written

    max_x g(x) = E_D F(x|D) = E_D { p min{x, D} − c x }.        (3.4)

Imagine that we have a historical dataset of order quantities and demands which we can write
out as

    (x^1, D^1), (x^2, D^2), ..., (x^n, D^n), ..., (x^N, D^N).

If we have this data, we could solve our problem just as we did our machine learning problem in
(3.1):

    max_x F̂(x) = (1/N) Σ_{n=1}^{N} ( p min{x, D^n} − c x ).        (3.5)

The problem with this approach is that we never have a set of observations of demands
D^1, D^2, ..., D^N because we do not observe demands – we observe what we sell, which is the
smaller of what we ordered x^n and the true demand D^n. If we order x^n = 6 and observe sales
of 6, it might be that D^n = x^n, but more often it means that D^n > x^n.

There is a simple way to get around this problem. We are going to use a basic gradient search
algorithm, but we cannot take the derivative of the function 𝑔𝑔(𝑥𝑥). What we are going to do
seems magical (that is, it seems as if we should not be able to do it). We are going to assume
that we choose a quantity 𝑥𝑥 = 𝑥𝑥 𝑛𝑛 , and then observe a demand 𝐷𝐷 𝑛𝑛+1. Note that we have
introduced a subtle shift in how we are indexing 𝑥𝑥 𝑛𝑛 and the demand 𝐷𝐷 𝑛𝑛+1; this way of indexing
means that 𝑥𝑥 𝑛𝑛 depends on 𝐷𝐷1 , … , 𝐷𝐷 𝑛𝑛 but does not depend on 𝐷𝐷 𝑛𝑛+1 .

After we observe the demand, we now have the deterministic function

    g(x|D^{n+1}) = p min{x, D^{n+1}} − c x.        (3.6)

Next we are just going to take the derivative of g(x|D^{n+1}) with respect to x:

    ∇g(x|D^{n+1}) = dg(x|D^{n+1})/dx =  p − c   if x ≤ D^{n+1}        (3.7)
                                        −c      if x > D^{n+1}.

We now use the same gradient-based search algorithm we first introduced in Topic 1 for
nonlinear models:

    x^{n+1} = x^n + α_n ∇g(x^n|D^{n+1}).        (3.8)

Unlike our first use of gradient-based methods in Topic 1, we can no longer find the stepsize α_n
by solving a one-dimensional search problem as we did in equation (1.5). Instead, we are going
to use something that is much simpler:

    α_n = 1/n.        (3.9)

Incredibly, we can show that this stepsize rule will produce a sequence of decisions
x^1, x^2, ..., x^n, where

    lim_{n→∞} x^n → x*.        (3.10)

This means that if we run this algorithm an infinite number of times, it will find the optimal
solution! The bad news is that it is possible that the algorithm will be quite slow. A lot of
research has gone into finding better stepsize formulas. One way to speed up the algorithm is to
insert a tunable parameter, giving us

    α_n(θ^step) = θ^step / (θ^step + n − 1),        (3.11)

where θ^step is a parameter that has to be tuned. OK, so we have another problem that involves
tuning a parameter, but the core idea here is quite simple!
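The algorithm is short enough to show in full. Below is a minimal Python sketch of the stochastic gradient iteration (3.7)-(3.8) with the parameterized stepsize (3.11); the prices, demand distribution, and the value of θ^step are illustrative assumptions.

    import random

    def newsvendor_stochastic_gradient(p=10.0, c=7.0, mu=20.0, theta_step=5.0, N=5000):
        x = 0.0
        for n in range(1, N + 1):
            D = mu + (-mu + 2 * mu * random.uniform(0, 1))   # observe demand D^{n+1}
            grad = (p - c) if x <= D else -c                 # gradient (3.7)
            alpha = theta_step / (theta_step + n - 1)        # stepsize (3.11)
            x = max(0.0, x + alpha * grad)                   # update (3.8), keeping x >= 0
        return x

    random.seed(2)
    print("estimated order quantity:", round(newsvendor_stochastic_gradient(), 2))
    # With these numbers the critical ratio is (p - c)/p = 0.3, so x should settle near 12
    # for demand that is uniform on [0, 40].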

Stochastic gradient algorithms tend to be taught in advanced stochastic optimization classes.
However, they are perfectly appropriate for an introductory optimization course, especially in the
context of unconstrained problems. Nonnegativity constraints and upper bounds are easy to
handle.

Topic 4: Optimal learning - Finding the best treatment


Readings: Chapter 4 in SDAM

An important class of optimization problems falls under the umbrella of "optimal learning," where
the decision involves what to observe, or what experiment to run. We then use the information
gained from the observation (or experiment) to update our beliefs, which we use to make a
choice of design, or product, or price.

There is an endless array of optimal learning problems – a sample might include:

• Medical treatments (choice of drug, dosage)


• Which product to advertise on a website
• Which supplier to use to supply materials or components
• Which schools to visit to interview for employees
• What price to charge for a product (from a set of prices)
• Who should be starters on a basketball team
• Which path to take through a congested network to get to work
• Which product to recommend on a webpage to attract the most clicks
• Which of several webpage designs to use to maximize traffic
• …(This is a very long list)

In this section we will use the context of finding the best medical treatment for a patient.

Learning the best of a set of “treatments” is widely known as a “multiarmed bandit problem”
which is a mathematically rich (and computationally complex) problem. However, the approach
we are going to use here is quite simple and very popular at companies like Google and
Facebook for optimizing ads (students should resonate with this). In the process, we are also
going to introduce a new class of policy that is going to open the door to a much more complex
class of problems. These policies are known as “cost function approximations” (or CFAs).

We are going to start with a policy called "interval estimation" that helps us choose the best
treatment x ∈ X = {x_1, ..., x_K}. Let μ̄_x^n be our estimate of the effectiveness of treatment x after
we have performed n experiments using any of the choices. Then let σ̄_x^n be the standard
deviation of μ̄_x^n (remind students how to compute the standard deviation of a mean, and how it goes to
zero as n → ∞).

After n experiments, our beliefs might look like those shown in the figure above – this might
represent the potential sales of different products, but there are many settings where this
applies. We represent our beliefs using the belief state variable B^n = (μ̄_x^n, σ̄_x^n)_{x∈X}, which captures our
beliefs about the performance of each choice x. For this problem, the state variable S^n = B^n
equals the belief state, but there are problems where we may have information other than the
belief state (such as the budget remaining to run experiments). For now, we are just going to
limit the state variable to the belief state.

The interval estimation policy is given by

    X^IE(S^n|θ^IE) = argmax_{x∈X} ( μ̄_x^n + θ^IE σ̄_x^n ),        (4.1)

where x^n = X^IE(S^n|θ^IE) is the design we are going to choose for the (n+1)st experiment, which
produces an observation W^{n+1}.

With our inventory problem, we updated our state variable (the inventory) using the inventory
equation (2.4). With a learning problem, we have to update our beliefs, which we are going to
do using some simple recursions. Assume that we observe performance W_x^{n+1} when we test
choice x = x^n. First, we are going to replace the variances (σ̄_x^n)² with their inverses, which we
call the precision, given by

    β_x^n = 1 / (σ̄_x^n)².        (4.2)

We are also going to assume that when we observe the results of an experiment, which we
represent by W_x^{n+1}, this observation is random with a known variance σ_W² and precision

    β^W = 1 / σ_W².        (4.3)

We can use the precision to write the updating equations for the means and precisions using

    μ̄_x^{n+1} = ( β_x^n μ̄_x^n + β^W W_x^{n+1} ) / ( β_x^n + β^W ),        (4.4)

    β_x^{n+1} = β_x^n + β^W.        (4.5)

Equations (4.4) and (4.5) represent the transition equations S^{n+1} = S^M(S^n, x^n, W^{n+1}) for this
problem.

An important feature of our interval estimation policy (4.1) is that imbedded in the policy is an
optimization problem. Here the optimization (given by the "argmax") requires nothing more than
a simple sort over the alternatives to find the one with the best value of μ̄_x^n + θ^IE σ̄_x^n. Later, we are
going to replace this with more sophisticated optimization problems. For example, in Topic 5,
our imbedded optimization problem will be a shortest path problem. In Topic 7 the imbedded
optimization problem will be a linear program, which we see again in Topic 8 when we are
dynamically planning energy storage. In Topic 10 the imbedded optimization problem will be an
integer program, and in Topic 11 (section 11.2) the imbedded optimization problem will be a
nonlinear programming problem.

The interval estimation policy is very easy to implement, but it is important not to overlook the
need to tune the parameter θ^IE. One way to do this is to run a simulation where we assume
that we know the true performance of each alternative x, which we denote by μ_x. If we choose
to test alternative x, we cannot observe μ_x perfectly – instead, we can only perform a noisy
observation where we add a noise term ε. This means that the observed performance of x = x^n
would be given by

    W_x^{n+1} = μ_{x^n} + ε.        (4.6)

We can use the methods we presented for the inventory planning problem (see equations (2.5)-
(2.7)) to generate random observations of ε. Now, create N (say, N = 100) observations of W_x^n
for each alternative x and store these. Now, simulate our interval estimation policy X^IE(S^n|θ^IE),
which we evaluate using

    F(θ) = Σ_{n=1}^{N} W_{x^n}^n,        (4.7)

where x^n = X^IE(S^n|θ^IE) and where the state variable S^n is updated using equations (4.4) and
(4.5). Note that we have written (4.7) as if we are running a single simulation. We can do this,
but it will be noisy. Instead of pre-generating the outcomes of W_x^n and using these in the
simulation of the policy, it makes more sense to generate them on the fly (this is very fast using
any programming language such as Python). Now compute the sum in (4.7), say, 1000 times
and take an average. Finally, repeat this for a discrete set of values of θ^IE such as 0, 0.1, 0.2, ...,
4.0 and choose the value of θ^IE that works the best.
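Here is a minimal Python sketch of that tuning procedure. The true means mu_true, the noise model (Gaussian here, rather than the uniform noise of section 2.3), the prior, and the grid of θ^IE values are all illustrative assumptions; the structure is the interval estimation policy (4.1) with the updates (4.4)-(4.5), evaluated by (4.7).

    import random

    def simulate_ie_policy(theta_IE, mu_true, sigma_W=1.0, N=100):
        # One simulation of the interval estimation policy (4.1) with updates (4.4)-(4.5)
        K = len(mu_true)
        beta_W = 1.0 / sigma_W ** 2
        mu_bar = [0.0] * K                 # prior means
        beta = [1.0 / 100.0 ** 2] * K      # weak prior precision (prior std dev of 100)
        total = 0.0
        for n in range(N):
            # argmax of mu_bar + theta_IE * sigma_bar, where sigma_bar = 1/sqrt(beta)
            x = max(range(K), key=lambda k: mu_bar[k] + theta_IE / beta[k] ** 0.5)
            W = mu_true[x] + random.gauss(0.0, sigma_W)    # noisy observation, as in (4.6)
            total += W
            mu_bar[x] = (beta[x] * mu_bar[x] + beta_W * W) / (beta[x] + beta_W)   # (4.4)
            beta[x] += beta_W                                                     # (4.5)
        return total

    random.seed(3)
    mu_true = [1.0, 1.2, 0.8, 1.5, 1.1]
    for theta in [0.0, 0.5, 1.0, 2.0, 3.0]:
        avg = sum(simulate_ie_policy(theta, mu_true) for _ in range(1000)) / 1000.0
        print("theta_IE = %.1f   average cumulative reward = %.1f" % (theta, avg))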

Topic 5: Shortest path problems


We are going to use shortest path problems as our first small step into a nontrivial decision
problem. While this is technically a linear program, we will be able to demonstrate a simple
algorithm based on Bellman’s equation that everyone can understand. Shortest path problems
also provide a nice visual setting to later illustrate the idea of a basis and pivoting (but this is for
later).

We are going to first present the static shortest path problem, and then transition to using this in
a dynamic setting as would occur when you are driving from one location to the next while your
navigation system is responding to new information. It is in the dynamic setting that we will see
that a shortest path problem is a form of policy (specifically a direct lookahead policy, or DLA)
that can, if we like, be parameterized to help deal with uncertainty.

5.1 Static shortest paths


This lecture is based on chapter 5 of SDAM.

This will be our first peek at a specialized linear program that we will solve using a classical
Bellman iteration. This is very simple in the context of a network problem since the “state”
variable is just the node where the traveler is located.

Assume we are trying to find the shortest path from origin s = 1 to destination r = 11. Let

    v_i = the minimum cost to get from node i to the destination node r = 11. We are going
    to initialize v_11 = 0, and set v_i equal to some large number for all other nodes i ≠ 11.

Let

    N_i^+ = the set of nodes we can reach from node i.



A simple (but not very efficient) algorithm for finding the shortest path from each node to node
11 would be to repeatedly compute, for every node i,

    v_i = min_{j∈N_i^+} ( c_ij + v_j ).        (5.1)

The idea is to repeatedly loop over all nodes i and compute (5.1) until none of the node values
change. We can store the optimal solution by letting

    x_ij =  1   if j = argmin_{k∈N_i^+} ( c_ik + v_k )        (5.2)
            0   otherwise.

Equation (5.1) is known as Bellman’s equation and is very popular in the academic literature for
solving a wide range of sequential decision problems. In practice, it only works for a very small
subset of problems, but this happens to be one where it works very well (although commercial
algorithms use a lot of shortcuts).

We can trace the shortest path by starting at the origin node s = 1 and then traversing from any
node i to the node j where x_ij = 1, continuing until we reach the destination node r = 11.
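A minimal Python sketch of this iterative procedure is shown below; the small network and costs are made up, and the document's 11-node example would drop in the same way.

    def shortest_paths(links, destination):
        # Repeatedly apply Bellman's equation (5.1) until no node value changes.
        # links is a dict {(i, j): cost}; returns node values v and the decisions from (5.2).
        nodes = {i for i, j in links} | {j for i, j in links}
        v = {i: float("inf") for i in nodes}
        v[destination] = 0.0
        next_node = {}
        changed = True
        while changed:
            changed = False
            for (i, j), cost in links.items():
                if cost + v[j] < v[i]:
                    v[i] = cost + v[j]      # Bellman update (5.1)
                    next_node[i] = j        # remember the best outgoing link (5.2)
                    changed = True
        return v, next_node

    links = {(1, 2): 4, (1, 3): 2, (3, 2): 1, (2, 4): 5, (3, 4): 8}
    v, next_node = shortest_paths(links, destination=4)
    print(v)          # minimum cost from each node to node 4
    print(next_node)  # next node to visit from each node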

5.2 Dynamic shortest paths


Readings: SDAM Chapter 6

Here we introduce the problem faced by Google maps. We have to route a traveler through a
network where new information is arriving over time. Imagine that time steps forward one
increment each time we traverse a link. When the traveler arrives at node i at time t, Google
receives updated estimates of travel times and recomputes the shortest path. This is a dynamic
system where the “state” now includes two pieces of information: the node where the traveler is
located (node i), and the updated estimates of the travel times over the entire network.

To keep the notation simple, assume that we get updates of estimated travel times at each time
period, although in reality we only need to update the travel times each time a traveler arrives at
a node and has to make a decision. Now, instead of a fixed cost c_ij, we have costs that are
updated and depend on time t, so we define

    c_tij = the estimated travel time on each link (i, j) given the information we have at time t.

We then solve the same static shortest path problem we did above, but instead we are using the
updated costs c_tij, and obtain the updated path x_tij. We are not going to implement the entire
shortest path – instead, if we are at some node i at time t, we are going to choose to go to node
j if x_tij = 1.

We can write our shortest path problem at time t using our vocabulary of policies. At time t, our
state variable S_t captures what we know, which includes the node i_t where we are located, and
the current estimates of all the link costs, which we can write as

    c_t = (c_tij) for all links (i, j) in the network.

So we would write our state variable as

    S_t = (i_t, c_t).

Our policy is to solve the static shortest path problem using our updated vector of costs c_t, but
the policy only returns what the traveler should do at time t, which we can write as

    X_t^π(S_t) = x*_{i_t, j}.

This means that X_t^π(S_t) is a vector of 0's with a 1 in the entry corresponding to x*_{i_t, j_t} = 1. We will
let

    X_{t,i_t,j}^π(S_t) =  1   if j = j_t
                          0   otherwise.

Let's introduce a twist. Imagine that we have a goal of reaching our destination by a particular
time, and while the shortest path suggests that we will arrive in time, we recognize that there is
uncertainty in the travel times. The costs (times) c_tij are just the estimated means from the
sampled observations we have from watching individual travelers. Instead of using an average,
what if we use the 80th percentile, or the 90th, or the 50th? These are easy to compute
from the raw data.

Let

    θ^pctile = the percentile of the travel time for a link,
    c_tij(θ^pctile) = the travel time corresponding to the θ^pctile percentile of the travel times.

Now we use c_tij(θ^pctile) for the travel times (instead of the means c_tij). We would then write
our policy as X_t^π(S_t|θ^pctile) to express the dependence on θ^pctile. The performance of the
policy X_t^π(S_t|θ^pctile) depends on both the actual travel time and on how late the traveler is for
their appointment. Typically we would add a penalty per unit of time that the traveler is late; we
make this precise below (equation (5.3)), where the penalty is denoted η.

Now we have another tuning problem just like we saw with PFAs (Topic 2), which we can solve
using the same methods we saw in Topic 1. To evaluate our policy, let

    ĉ_tij^n = the sample realization of the actual time to traverse link (i, j) that we reach at time t.
    These samples are not used to plan a path – they are only used to evaluate the policy for
    making decisions. We can generate ĉ_tij^n using the Monte Carlo simulation methods we
    introduced in section 2.3.

Next let

    F̂^n(θ) = the actual travel time over the entire path for the nth trial, using costs ĉ_tij^n
            = Σ_{t=1}^{T} Σ_{ij} X_{tij}^π(S_t|θ) ĉ_tij^n.

F̂^n(θ) is the actual travel time we experience on our nth trip following policy X_t^π(S_t|θ) while
experiencing link costs ĉ_tij^n. Let's say that we have to finish the trip in time τ to arrive in time for
our appointment. Let

    η = the penalty per unit time for being late.

The total cost (time plus late penalty) for the nth trip is then

    Ĉ^n(θ) = F̂^n(θ) + η max{0, F̂^n(θ) − τ}.        (5.3)

We can then write the performance of our policy by averaging over N trips as

    C̄^π(θ) = (1/N) Σ_{n=1}^{N} Ĉ^n(θ).

We now have another instance of needing to tune the parameter of a policy.
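The tuning loop itself is simple; the only expensive piece is simulating a trip under the policy. The sketch below (Python) shows the structure: simulate_trip is only a stand-in that returns a made-up trip time – a real version would re-solve the shortest path with the costs c_tij(θ^pctile) at each node and accumulate the sampled times ĉ_tij along the path actually taken – while average_cost implements (5.3) and the average C̄^π(θ).

    import random

    def simulate_trip(theta_pctile):
        # Stand-in for one simulated trip under the theta-percentile policy of this section.
        # A real implementation would replan with c_tij(theta_pctile) at every node and sum
        # the sampled link times actually experienced. The formula below is made up purely
        # so the tuning loop underneath runs: higher percentiles give slower but more
        # predictable trips.
        base = 60.0 - 5.0 * theta_pctile
        return base + random.uniform(0.0, 40.0 * (1.0 - theta_pctile))

    def average_cost(theta_pctile, tau=75.0, eta=2.0, N=1000):
        # Average of C-hat^n(theta) in equation (5.3) over N simulated trips
        total = 0.0
        for _ in range(N):
            F = simulate_trip(theta_pctile)
            total += F + eta * max(0.0, F - tau)   # travel time plus late penalty
        return total / N

    random.seed(4)
    for theta in [0.5, 0.6, 0.7, 0.8, 0.9]:
        print("theta = %.1f   average cost = %.1f" % (theta, average_cost(theta)))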

This lecture is setting the stage for parameterizing linear programs. When we present linear
programs in Topic 7, we are going to start by presenting a basic static, deterministic linear
program (just as we did with our initial shortest path problem), and then transition to recognizing
that the linear program we chose is typically solved repeatedly over time, just as we have done
above with our shortest path problem.

Topic 6 – General concepts


Up to now we have been approaching problems in what might appear an ad hoc manner.
Actually our problems have been carefully chosen to illustrate some important dimensions of
modeling and solving different types of decision problems.

Below we are going to step back and talk more generally about modeling, designing policies,
and evaluating policies.

So far we have seen two types of optimization problems:

• Static problems (the machine learning problems in Topic 1, and the shortest path
problem in Topic 5, section 5.1).
• Sequential decision problems (Topics 2, 3 and 4, and the dynamic shortest path
problem in Topic 5, section 5.2).

Below we are going to provide modeling frameworks for each of these problems.

6.1 Modeling static optimization problems

For our machine learning problems, we faced the problem of minimizing a nonlinear function
(the sum of squares of errors) of a set of tunable coefficients θ. If we let

    F(θ) = Σ_{n=1}^{N} ( y^n − f(x^n|θ) )²,        (6.1)

then we can write our optimization problem as

    min_θ F(θ).        (6.2)

We refer to F(θ) as our objective function. The standard form that we might use would be to
replace θ with x, which is our standard notation for a decision variable, and let C(x) be a generic
"cost" function (assuming we are minimizing, which is standard in deterministic
optimization). Our problem would then be written

    min_x C(x).        (6.3)

We can use this style for our shortest path problem, where x = (x_ij) is the vector of flows over
the network. We would let

    x_ij =  1   if link (i, j) is in the shortest path from the origin to the destination
            0   otherwise.

We would then write our objective function as

    C(x) = Σ_{i,j} c_ij x_ij,        (6.4)

where c_ij is the cost of traversing link (i, j). The problem with this formulation is that the optimal
solution is C(x) = 0, since we would just set x_ij = 0. We need to introduce constraints so that
we guarantee that our optimal solution returns a shortest path from the origin r to the
destination s. We do this by introducing constraints, which would be written

    Σ_j x_ij − Σ_j x_ji = 0    for all nodes i ≠ r or s,        (6.5)
    Σ_j x_ij − Σ_j x_ji = 1    for i = r,        (6.6)
    Σ_j x_ij − Σ_j x_ji = −1   for i = s.        (6.7)

We then typically impose the requirement that x_ij cannot be negative by writing

    x_ij ≥ 0.        (6.8)

We often want to include upper bounds u_ij. In our shortest path problem, equations (6.5)-(6.7)
would allow a solution where x_ij > 1, implying a path that is running in circles (which is clearly
not a good idea). If we want upper bounds, we would then write

    x_ij ≤ u_ij.        (6.9)

Equations (6.5)-(6.7) can be written in matrix form

    Ax = b        (6.10)

by suitably constructing the matrix A and the vector b. By doing this, we can now write our
optimization problem in the form

    min_x C(x) = Σ_{ij} c_ij x_ij = c^T x,        (6.11)

subject to the constraints

    Ax = b,        (6.12)
    x ≤ u,        (6.13)
    x ≥ 0.        (6.14)

Equations (6.11)-(6.14) are a fairly standard way of writing a static optimization problem. What is
most important about this framework is that it is a language for thinking about a very large class
of decision problems.
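For students who want to see the canonical form in action, free solvers accept it almost verbatim. A minimal sketch using SciPy's linprog is shown below; the numbers in c, A, b and u are made up purely to show how (6.11)-(6.14) map onto the solver's arguments.

    import numpy as np
    from scipy.optimize import linprog

    # min c^T x  subject to  Ax = b,  0 <= x <= u   (equations (6.11)-(6.14))
    c = np.array([4.0, 6.0, 3.0])
    A = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])
    b = np.array([10.0, 8.0])
    u = np.array([10.0, 6.0, 8.0])

    result = linprog(c, A_eq=A, b_eq=b, bounds=[(0.0, ub) for ub in u], method="highs")
    print(result.x)    # optimal decision vector
    print(result.fun)  # optimal objective value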

An important dimension of modeling optimization problems is the use of constraints such as
(6.12)-(6.14). We first saw constraints in our shortest path problem, but these were enforced
by the concept of finding a path. The optimization problems in Topics 1 and 2 were
unconstrained. Linear programs (which we introduce in Topic 7) are meaningless without
constraints. In fact, we are going to see that expressing problem characteristics through
constraints is an art form that is a necessary skill for people who want to use linear programs.

Constraints represent a major feature of static optimization problems. They come in a number
of flavors:

• Unconstrained problems, which we saw in Topic 1 for machine learning.
• Upper and lower bounds (also known as "box" constraints), such as (6.13) and (6.14).
• General linear constraints such as (6.12). With some creativity, we can handle a wide
range of constrained problems using linear constraints. For our network problem, linear
constraints are natural, but later (in Topic 10, section 10.3) we are going to need some
creativity to express constraints using linear equations.
• Nonlinear constraints – an example might be x_i (1 − x_j) = 0. We do not deal with
nonlinear constraints in this course.

Easily the most widely used algorithmic search strategy depends on gradients. The simplest
search algorithms are gradient-based, as we saw in section 1.2 for optimizing the parameters of
a nonlinear machine learning model, and again in topic 3 for our newsvendor problem. In Topic
7, we are going to see how to use a gradient-based algorithm in the presence of linear
constraints using a method that is widely known as the simplex algorithm.

The literature on optimization problems is incredibly rich, but often ignores that an optimization
problem is just solving a decision problem at one point in time, whereas the real application
involves decisions that are made sequentially over time.

6.2 Modeling sequential decision problems


Modeling any sequential decision problem starts by answering three questions:

1. What are the performance metrics?


2. What types of decisions are being made (and for larger problems, who makes each type
of decision)?
3. What are the sources of uncertainty?

The answers to these three questions lay the foundation for building our model.

Any model of a sequential decision problem can be broken down into five components:

1. State variables 𝑆𝑆𝑡𝑡 – A “state variable” 𝑆𝑆𝑡𝑡 is all the information we need to model the
system from time t onward. In other words, the state variable can be viewed as the
“state of information” or, more precisely, the “state of knowledge,” capturing everything
we know or believe that we need to a) compute the objective function (e.g. changing
prices and costs), b) make a decision, which means the information needed to represent
the available set of actions (e.g. what product to choose among those that are available)
or the constraints that determine the feasibility of 𝑥𝑥, and c) any other information needed
to model the evolution of information in (a) or (b) (see the “transition function” below).

It is useful to distinguish between the initial state 𝑆𝑆0 that might include static information
that never changes (such as the maximum speed of a truck), from the dynamic state
variables 𝑆𝑆𝑡𝑡 , 𝑡𝑡 > 0 which includes information that is changing over time such as the
location of a vehicle moving over a network, the amount of inventory being held, or the
evolving prices of an asset.

State variables come in three flavors:

• R_t = Vector of physical and financial resources (people, product, equipment,
facilities). This can be inventories, or the location of a person or piece of
equipment that is moving around. It can also be the amount of cash on hand,
investments of different kinds, loans, …
• 𝐼𝐼𝑡𝑡 = Other information about the system not included in 𝑅𝑅𝑡𝑡 such as prices,
weather, market conditions.
• 𝐵𝐵𝑡𝑡 = Beliefs about quantities and parameters that are not known perfectly. This
could be the mean and variance of a normal distribution, a vector of probabilities
of discrete values of parameters such as a cost or constraint (we first saw belief
state variables in Topic 4).

There is tremendous (and surprising) confusion about state variables in the academic
literature. There are entire fields (Markov decision processes, reinforcement learning)
that never even define a state variable. For a more in-depth discussion see the
webpage

https://tinyurl.com/onstatevariables/

2. Decision variables 𝑥𝑥𝑡𝑡 – These are the variables that we are controlling. We distinguish
between initial decisions 𝑥𝑥0 which are made once (called design decisions), versus
𝑥𝑥𝑡𝑡 , 𝑡𝑡 > 0 which are made over time (called control decisions). We represent the feasible
decisions by creating a set X_t that can be a feasible set of actions (e.g. what link to
traverse, which person to hire) or the feasible region defined by constraints such as
(6.12)-(6.14). Finally, we introduce a function X^π(S_t|θ), which we call a policy, which
determines how we make a decision from the information available in the state variable
𝑆𝑆𝑡𝑡 . Our policies usually depend on tunable parameters 𝜃𝜃 (but not always). Most
important: We will determine the policy later!

3. Exogenous information 𝑊𝑊𝑡𝑡+1 – This is information that we did not know at time t, but
which became available by time t+1 when we have to determine the decision 𝑥𝑥𝑡𝑡+1 . In
our examples above, the information 𝑊𝑊1 , … , 𝑊𝑊𝑡𝑡 , … , 𝑊𝑊𝑇𝑇 is generated in advance, either
from history (as we did with the asset selling example) or from Monte Carlo simulation,
as we did in the inventory planning example. Although we created this information in
advance, we never made a decision 𝑥𝑥𝑡𝑡 using the information in 𝑊𝑊𝑡𝑡+1 or later.

Up to now, we have generated a single random sequence of observations
W_1, ..., W_t, ..., W_T, which we then used just as we used the training data in
our machine learning problems in Topic 1. However, there are settings where W_{t+1}
depends on the state variable S_t, or the decision x_t, or both. We can still use a sample of
the sequence W_1, ..., W_t, ..., W_T, but we cannot generate it in advance. Instead, we have
to generate it as the system evolves.

Sometimes (in fact, frequently) we want to explicitly capture that there may be more than
one sequence of exogenous information. Imagine that we are generating the demands
for our inventory problem. Instead of generating one sample, we generate 10 as shown
below. Following standard practice from the modeling literature, we let the Greek letter
𝜔𝜔 (“omega”), index the sample paths. So, if we generate 10 samples of the demands, 𝜔𝜔
would range from 1 to 10.

To indicate a particular realization of W_t, we would write W_t(ω) (if we are using demands
D_t, we would write D_t(ω)). We can use this notation to compute, say, the average
demand at time t using

    D̄_t = (1/10) Σ_{ω=1}^{10} D_t(ω).        (6.15)

4. Transition function – This is the function that describes how the information in the state
variable evolves over time. We write this function using

    S_{t+1} = S^M(S_t, x_t, W_{t+1}).        (6.16)

In other words, given what we know (or believe), which is captured in S_t, the decision we
made x_t, and the new information that arrived from outside the system (which is not
known at time t), given by W_{t+1}, the transition function returns the updated state variable
S_{t+1}. The notation S^M(·) stands for "state transition model" (or if you like, "system
model").

A transition function can be a single equation such as the inventory equation (2.4) for our
inventory planning problem. However, for complex systems (supply chains, trucking
companies, energy systems, health systems) the transition function may require many
thousands of lines of code.

5. Objective function – With deterministic optimization, objective functions are very
straightforward, typically summing costs that are minimized, or some performance metric
to be maximized.

With sequential decision problems, objective functions can come in different styles. We
start by assuming that we are maximizing the cumulative contribution (or reward) over
time, as we did in the problems in Topic 2. Let’s start by writing the contribution in each
time period as

𝐶𝐶(𝑆𝑆𝑡𝑡 , 𝑥𝑥𝑡𝑡 ) = the contribution from decision 𝑥𝑥𝑡𝑡 made at time t, using the information in 𝑆𝑆𝑡𝑡
such as a dynamically varying cost or price, such as the price 𝑝𝑝𝑡𝑡 in our
asset selling problem.

We may write the contribution using 𝐶𝐶(𝑆𝑆𝑡𝑡 , 𝑋𝑋 𝜋𝜋 (𝑆𝑆𝑡𝑡 |𝜃𝜃)) to reflect the dependence on the
policy, since 𝑥𝑥𝑡𝑡 = 𝑋𝑋 𝜋𝜋 (𝑆𝑆𝑡𝑡 |𝜃𝜃).

Now imagine that we have created in advance a random sample of the information
sequence W_1, ..., W_T, as we did in our asset selling example or the inventory planning
example. Our objective function would then be written

    F(θ) ≈ Σ_{t=0}^{T} C(S_t, X^π(S_t|θ)),        (6.17)

where the transition function S_{t+1} = S^M(S_t, x_t = X^π(S_t|θ), W_{t+1}) is computed with our
sampled set of observations W_1, ..., W_T.

If we wish to use more than one sample of W_1, ..., W_T, we can assume we have a set
W_t(ω) for ω = {ω_1, ..., ω_n, ..., ω_N}. Now we would model our transition function by
reflecting the dependence on the sample ω, which we can write using

    S_{t+1}(ω) = S^M(S_t(ω), x_t(ω) = X^π(S_t(ω)|θ), W_{t+1}(ω)).        (6.18)

We then calculate an estimate of our objective function using

    F(θ) ≈ (1/N) Σ_{n=1}^{N} Σ_{t=0}^{T} C(S_t(ω_n), X^π(S_t(ω_n)|θ)).        (6.19)

Here we are just running N simulations and taking an average. However we compute
the objective function, our optimization problem is then written

    max_θ F(θ).        (6.20)

Instead of listing constraints as we did in our static objective function (these are built into
the design of our policy X^π(S_t|θ)), we would follow the statement of the objective in
(6.20) with the transition function (6.16) (or (6.18)) and the information given to the
model in the form of the initial state S_0 and the information W_1, ..., W_T.

We see that modeling a sequential decision problem is much richer than a static optimization
model, but it is roughly a mathematical statement of the simulations that we have already
illustrated in Topic 2.
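The mathematical statement above translates almost line for line into a generic simulator. The sketch below (Python) is illustrative – the function names policy, transition and contribution stand for X^π, S^M and C, and any of the problems from Topic 2 can be plugged in.

    def evaluate_policy(policy, transition, contribution, S0, W_samples, theta):
        # Sample-average estimate of F(theta) in equation (6.19).
        # W_samples is a list of N sample paths, each a list (W_1, ..., W_T).
        total = 0.0
        for W_path in W_samples:          # loop over omega_1, ..., omega_N
            S = S0
            for W in W_path:              # loop over t = 0, ..., T-1
                x = policy(S, theta)                 # x_t = X^pi(S_t | theta)
                total += contribution(S, x)          # C(S_t, x_t)
                S = transition(S, x, W)              # S_{t+1} = S^M(S_t, x_t, W_{t+1}), as in (6.18)
        return total / len(W_samples)

For the inventory problem of section 2.3, for example, S would be the inventory R_t, W the demand D̂_{t+1}, and the three functions would be the policy (2.8), the transition (2.4), and the profit term inside (2.9).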

Our modeling framework can be used to model any sequential decision problem, although there
are other choices for the objective function. Next we are going to turn to the issue of designing
policies, which is a little richer than we have indicated above.

6.3 Designing policies


Above, we wrote our policy as 𝑋𝑋 𝜋𝜋 (𝑆𝑆𝑡𝑡 |𝜃𝜃) which seems to imply that we already have some
functional form for the policy, and then have to tune the parameters 𝜃𝜃. However, just as we
have to choose between different models in machine learning (such as the linear and nonlinear
models we saw in Topic 1), we also have to choose among different functional forms for
policies. However, the set of choices becomes much broader.

We can divide all the different types of policies into four classes which cover every possible
method for making decisions. These are organized into two broad strategies as follows:

Strategy I: Policy Search. This strategy searches over parameterized functions to identify the
ones that work best over time. These come in two classes:
1) Policy function approximations (PFAs). Analytical functions that map the information
in the state variable directly to a decision. Some examples are:
a. Buy low, sell high policies in finance - see asset selling in Topic 2.

b. Order-up-to policies for inventories: if the inventory is below θ^min, order up to
θ^max, and we then have to tune θ = (θ^min, θ^max) – see inventory planning in
Topic 2.
c. Linear decision rules are a special case where we can write the policy as
   X^π(S_t|θ) = Σ_{g∈G} θ_g φ_g(S_t).
d. A PFA can be any lookup table, parametric function (linear or nonlinear, including
neural networks) or nonparametric function.
We note that PFAs include every possible functional form that we might use in machine
learning.

2) Cost function approximations (CFAs). A parameterized optimization problem that is
typically a deterministic approximation, in which parameters have been introduced to
make it work well under uncertainty (see [7] for an introduction). CFAs are widely used in
industry in an ad-hoc way, but I have not been able to find this strategy formally studied
in the research literature. Some examples include:
a. Solving the shortest path over a network with random link times, but using the θ-
percentile of the travel times (instead of the mean) – see Dynamic shortest paths
in Topic 5.
b. Scheduling aircraft using an integer program, while inserting slack in the
schedule to account for weather delays.
c. Scheduling nurses but limiting their time to 32 hours per week to provide slack in
case emergencies arise.
Strategy II: Lookahead approximations. These policies identify good decisions by optimizing
across the current cost or contribution plus an approximation of the effect of a decision now on
the future. Again, these come in two classes:
3) Value function approximations (VFAs). Policies based on VFAs cover all methods
based on Bellman's equation, which approximates the downstream value of landing in a
state. This approach has attracted tremendous attention under names such as
approximate dynamic programming, adaptive dynamic programming, neurodynamic
programming, and most commonly today, reinforcement learning. We illustrate
Bellman's equation in Static shortest paths (Topic 5, section 5.1), but there the value
functions are exact, since we take advantage of the fact that our "state" variable is simply
the node where the traveler is located.

We can formulate virtually any sequential decision problem using Bellman’s equation,
but the vast majority cannot be solved exactly. There is by now a substantial literature
for estimating value functions approximately, although this approach is more popular in
the academic literature rather than used in practice. An in-depth investigation of this
strategy is beyond the scope of this course.

4) Direct lookahead approximations (DLAs). This is where we explicitly plan into the
future to help make a decision now. DLAs can be split into two subclasses:
a. Deterministic DLAs are when we ignore uncertainty to create a deterministic
lookahead model, a strategy that is often called a rolling (or receding) horizon
procedure, or model predictive control. We do this in Dynamic Shortest Paths
(section 5.2).

It is possible to parameterize the lookahead to help make it more robust to
uncertainty, producing a hybrid CFA/DLA, as we do in section 5.2.
b. Stochastic DLAs create an approximate stochastic lookahead model, typically
using sampled approximations of random outcomes. This covers stochastic
programming (with scenario trees), robust optimization and approximate dynamic
programming, which can be used to solve a simplified stochastic lookahead.
This policy is beyond the scope of this course.

These four (meta)classes of policies are universal – they include any method proposed in the
research literature or used in practice. None of these methods is a panacea – depending on the
specific characteristics of a problem, any one of these may work best. However, some are more
useful than others. If we divide DLAs into two classes (deterministic lookahead and stochastic
lookahead), we have five types of policies. These can be organized into three categories,
ranging from the most to least widely used:

• Category 1: PFAs, CFAs and deterministic DLAs – This category is absolutely the most
widely used. The choice of PFA, CFA and deterministic DLA tends to be obvious from
the application.
• Category 2: Stochastic DLAs – There is a handful of problems where we need to plan
into the future, and where we have to recognize that the future is uncertain.
• Category 3: Policies based on VFAs – Value function approximations are incredibly
popular in the academic literature, but the number of applications in practice for VFAs is
quite small.

This course focuses primarily on the policies in Category 1. These are by far the most widely
used, and therefore are most appropriate for an introductory course on optimization. Categories
2 and 3 are useful for very special classes of problems, but are beyond the scope of an
introductory course.

6.4 Evaluating policies


The most common way to evaluate a policy is to simulate its performance using historical data
as we did with the asset selling example, or simulated data as we did for inventory planning
(see equation (6.17)). If we have access to multiple samples of the information process
𝑊𝑊1 , … , 𝑊𝑊𝑇𝑇 which we represented using 𝑊𝑊𝑡𝑡 (𝜔𝜔) for 𝜔𝜔 = {𝜔𝜔1 , … , 𝜔𝜔𝑁𝑁 }, we can use the average

performance in equation 6.19. Both versions sum the contributions over the time periods 𝑡𝑡 =
0,1, … , 𝑇𝑇.

There are times when we are running a series of experiments, as we did in Topic 4 for finding
the best treatment. Now imagine that we are trying to find the best combination of chemicals
to produce a new material, or we are testing different processes to create a new drug that we
are evaluating in a lab. Alternatively, we might be using a simulator in a computer to test
different sizes of a fleet of vehicles.

These are settings where we do not care how well we do along the way – instead we just care
how well we do at the end.

There are two objective functions we can use to evaluate a policy:

• Optimize the cumulative cost or contribution – Here we add up costs (or contributions) to
evaluate the policy over time (or experiments). I like to call this the “cumulative reward”
objective.
• Optimize the final cost or contribution – Here we run a series of experiments where we
learn from the experiments, but we are not concerned with how well we do. Then, after
using up our budget for learning, we have to make a final decision of what is the best
choice, and then evaluate the performance of this choice. I call this the “final reward”
objective.

A second issue we have to recognize is that we often have to distinguish between how well we
expect to perform, and the risk that the performance might trigger a red flag that we would like
to avoid. For example:

• In an inventory problem, we may want to minimize average costs (where we have a cost
for lost demand), but where we want to make sure we cover at least 97 percent of
demands.
• In a financial problem, we may want to place special emphasis on avoiding losses beyond
some acceptable amount when we sell our asset.

We have seen an example of risk in our dynamic shortest path problem where we need to
minimize the risk of arriving late for an appointment. This is handled in equation (5.3) where we
add a penalty for late arrivals.

Risk is a very rich issue, but is beyond what should be covered in an introductory course.

Topic 7 – Linear programming


We finally get to linear programming. Unlike traditional optimization courses that might start
with linear programming, we recognize that these are complex problems that only arise in very
specialized situations. By this point in the course, students have seen optimization problems
that everyone encounters. Now we are going to move into an important class of resource
allocation problems that are much higher dimensional. These problems are important and arise
in many business settings, but it is unlikely that students will have experienced these problems.

Linear programming is initially presented as a static problem, where we formulate a problem as
a linear program, solve it, get the optimal solution, and then implement it (in some way).
Virtually all of the original motivating applications for linear programs are, in fact, sequential
decision problems, which means the linear program is actually a policy where the decisions are
implemented over time, almost always in the presence of some form of uncertainty. We
address this perspective after we treat the static problem.

We are going to progress in three steps:

• Section 7.1 – We are going to use a basic resource allocation problem to illustrate a
linear program. This will be the first time that we address a decision that is in the form of
a vector. We are going to illustrate the simplex algorithm using a purely graphical
approach (networks make this easy). I would note that I do not think it is necessary to
teach the simplex algorithm, but it is popular material, and it does help in understanding
dual variables.
• Section 7.2 – In this section I repeat the simplex algorithm but this time I show how to
perform each step using linear algebra. I consider this material completely optional, but
for faculty who enjoy presenting the simplex algorithm, the network problem makes it
very easy to walk through the steps without having to resort to a two-dimensional linear
program.
• Section 7.3 – Now we show that our so-called static linear program is actually a
sequential decision problem, which means that our original LP is actually a policy for a
sequential decision problem. This exactly parallels the transition from a static shortest
path problem (section 5.1) to a dynamic shortest path problem (section 5.2).

7.1 As a static problem – The simplex algorithm I


There are many ways to illustrate the need for a linear program – one is the network problem
below where we have supplies of resources at three locations, and we need to satisfy demands
at four locations. Finding the optimal way to distribute these supplies to meet the demands can
be solved as a linear program. In this section we illustrate the simplex algorithm applied to
networks (this is known as "network simplex") where the entire presentation is graphical – no
linear algebra. In section 7.2 I repeat the steps, but this time I include the linear algebra that
goes with each step.

Using what we learned in Topic 6, we can write this out as a linear program using our canonical
model

    Ax = b,        (7.1)
    x ≤ u,        (7.2)
    x ≥ 0.        (7.3)

Writing out the constraints gives us

    Σ_j x_ij = R_i    for i = 1, 2, 3,        (7.4)
    Σ_i x_ij ≤ D_j    for j = 4, 5, 6, 7,        (7.5)
    x_ij ≥ 0          for i = 1, 2, 3 and j = 4, 5, 6, 7.        (7.6)

If you want to teach the simplex method, a nice way is to use the network above and show the
steps of simplex graphically, rather than the usual treatment using matrices. We are going to
start by having all the flows exit through a super sink with zero-cost links moving from each
destination node (nodes 4-7) with upper bounds equal to the demand at the node:

The supersink, while not necessary for the algorithm, will help to simplify the presentation.

We first need an initial feasible solution, and this solution has to represent what is known as a
basis. A basis (for a network problem) is a set of links with flows that satisfy all flow
conservation constraints, along with all upper and lower bounds.

There are different ways to get an initial feasible solution. For this network problem, we can start
by putting the required flow into each destination node (nodes 4-7) on the link into the
supersink. Then, we start at node 4, find a link from a supply node (node 1), and put as much
flow on this link as we can. Since the demand at node 4 is only 15, we cannot put more than
15. Also, since the supply at node 1 is 30, we have to take the smaller of 30 and 15 and put this
amount (15) on the link (1-4).

Now we go to node 1 where we still have 15 units of unassigned flow. Node 5 needs 35 units of
flow, but we only have 15 remaining, so we put 15. Now we go to node 5, where we still have
an unsatisfied demand of 20, and look to the next supply node, node 2, which has 45 units of
available flow. We take 20 of these to put on the link (2-5) which now gives us our required flow
of 35 into node 5.

Next we move to node 2 where we still have 25 unassigned units of flow. We turn to node 6,
which needs 30 units of flow, and push all 25 units of flow from 2 to 6. We still need 5 more units
of flow at node 6, so we move down to node 3 and take 5 out of the available 25 units of flow.

Finally, we move the remaining 20 units of flow at node 3 to node 7.

We are not quite done. To be a basis (for a network), the set of links in the basis must satisfy
two conditions:

1) All links with flow strictly between the upper and lower bound must be in the basis.
2) The set of links in the basis must form a tree.

So, we quickly see our links fail condition 2 – the set of links with flow do not form a tree. But
we are not required to keep links with flow if the flow is at the upper or lower bound. Of the four
links into the supersink, only one can be in the basis, so we are going to arbitrarily choose the
first one, which gives us the basis:

We are not claiming that this is optimal (it is not), but we now have a way of finding the optimal
solution. The first step is to compute a “value” (known as a “dual variable”) which is the cost of
moving a unit of flow from each node to the supersink. Remember that all the links directly
attached to the supersink have 0 cost. So, the value at node 4 would be 0, since this is the cost
of the path from 4 to SS.

To get from node 1 to SS, we have to move 1-4 (cost 4) and then 4-ss (cost 0) which is a cost of
4. A better way to get the dual at node 1 is to see that the path moves from 1 to 4 (at a cost of
4), and then just add the dual at 4 (which is 0).

To get from node 5 to SS, we first have to go backwards on the link (1-5), so this is a cost of
minus 12. The dual at 1 is 4, so the dual at 5 is -12 + 4 = -8.

The dual at node 2 would be the cost of 2 to get 2-5, plus the dual at 5 = -8, so the dual at 2 = 2
+ (-8) = -6. Continuing this logic for the remaining nodes gives us the duals:

You can quickly check that each dual v_i is the cost from that node to the supersink, following
links that are in the basis.

Note at the same time that there is always a single path from each node to the supersink using
links in the basis. This is a key property of the basis, which we are about to exploit.

Now let’s optimize. We are going to do this by looking at the nonbasic links without flow, and
ask: What is the value of increasing flow on a nonbasic link?

Let’s take the link (1-6). To move one more unit of flow from 1 to 6, we are going to first add the
flow from 1-6 (at a cost of 7), then we are going to move one unit of flow from 6 to the super
sink, at a cost 𝑣𝑣6 = −16 (the dual variable for node 6), and then we are going to move one unit
of flow from the super sink back to node 1. The cost of moving flow from SS to node 1 is
negative the dual for node 1 (since this is the cost to go from 1 to SS). This means moving a
unit of flow from 1 to 6, then 6 to SS, and finally SS back to 1, is

𝑐𝑐̅16 = 𝑐𝑐16 + 𝑣𝑣6 − 𝑣𝑣1 .

The cost c̄_16 is called the reduced cost since the cost is "reduced" by the changes needed to
guarantee that we still satisfy all the constraints. This is a very simple calculation, which means
we can do this calculation for every link that is not in the basis (note that the reduced cost for
any link in the basis is equal to zero). For link 1-6, the reduced cost is

    c̄_16 = c_16 + v_6 − v_1 = 7 + (−16) − 4 = −13.

The reduced cost captures the change in all the costs if we add one more unit of flow from 1 to
6, and then make all the other adjustments needed to ensure that we are still satisfying all the
other constraints. Since the reduced cost is negative, this means that for each unit of flow, total
costs will go down by 13, which means we get a better solution.

Now we just have to figure out how much flow we can move. We first note that adding one unit of flow from 6 to SS, and then moving one unit of flow from SS to 1, means there is no change on links 1-4 and 4-SS. The only links where flow actually changes are the links on the path along the basis from 6 back to 1. This means flow increases on the links 1-6 and 2-5, while flow decreases on links 1-5 and 2-6. We want to move flow until the first link hits its lower bound (or upper bound if we had these, but we don’t). The link 1-5 has 15 units of flow, while link 2-6 has 25 units of flow, so link 1-5 will be the first link to hit zero. This means we can move 15 units of flow, at which point we would stop and drop link 1-5 from the basis (since it no longer has flow), while we now add link 1-6 to the basis.

We now have a new basis which means we have to update all our dual variables v_i. There are
ways to do this very quickly, but these are technical issues that are not important for our
discussion. Remember – we are not trying to teach students how to code the simplex algorithm
– we are trying to teach important concepts in optimization.

With the new dual variables, we now have to recompute the reduced costs for all the nonbasic
links (note that linear programming packages have tricks to do this very quickly). If we find
another link with a negative reduced cost, then we have to repeat this exercise. We keep doing
this until we no longer find any links with a negative reduced cost. At this point, we have found
the optimal solution.

It is possible that when we route flow around the cycle, two links may hit zero at the same time.
If this happens, we drop only one from the basis, and leave the other link with zero flow in the
basis. This is known as a degenerate basis. A byproduct of a degenerate basis is that it is possible that we may find that the amount of flow we can move around the cycle (when we find a nonbasic link with negative reduced cost) is zero. There is nothing wrong with this – it is actually fairly common.
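For instructors who want to let students check these calculations in code, the short Python sketch below mirrors the two steps above: it computes each dual variable by following the basis tree to the supersink, and then prices a nonbasic link with its reduced cost. Only the nodes needed for link (1-6) are included, and the cost of link 2-6 is an assumed value (the figure with the full network is not reproduced here), chosen so the duals match the values quoted above.

# Dual variables and reduced costs on a tree basis (a minimal sketch).
cost = {(1, 4): 4, (1, 5): 12, (2, 5): 2, (2, 6): 10, (1, 6): 7,   # cost of (2,6) is assumed
        (4, 'SS'): 0}                         # links attached to the supersink cost 0

root = 'SS'
# parent[i] = the basis link that starts the path from node i toward the supersink
parent = {4: (4, 'SS'), 1: (1, 4), 5: (1, 5), 2: (2, 5), 6: (2, 6)}

duals = {root: 0.0}
def dual(i):
    """Cost of moving one unit of flow from node i to the supersink along the basis tree."""
    if i not in duals:
        a, b = parent[i]
        nxt = b if a == i else a              # the next node on the path to the supersink
        sign = 1 if a == i else -1            # +cost if the link is traversed forward, -cost if backward
        duals[i] = sign * cost[(a, b)] + dual(nxt)
    return duals[i]

def reduced_cost(i, j):
    """Cost of pushing one unit i -> j, then j -> SS and SS -> i along the basis."""
    return cost[(i, j)] + dual(j) - dual(i)

print(dual(1), dual(5), dual(2), dual(6))     # 4.0 -8.0 -6.0 -16.0
print(reduced_cost(1, 6))                     # 7 + (-16) - 4 = -13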

This is a peek into the simplex method for network problems. The simplex algorithm works for
linear programs that are not networks, and in this case we cannot draw these pretty pictures.
But the basic idea is the same. Modern implementations of the simplex algorithm involve a vast
array of engineering tricks to make the algorithm extremely fast. What is most important for
students to understand is that we have exceptionally fast algorithms for solving linear programs,
and free software is widely available.

Next, we are going to illustrate a dynamic inventory problem where we repeatedly solve linear
programs over time.

7.2 The simplex algorithm II – with the matrix linear algebra


For a first course on optimization, I do not think it is necessary to show the matrix linear algebra
behind the simplex algorithm, but there will be instructors who will want to include this material.
This section includes all the slides from one of my lectures that presents the network simplex
algorithm alongside the matrix calculations. You can download the PowerPoint slides that
contain this material from https://tinyurl.com/PowellNetworkSimplex/.

The illustration below uses the network in section 7.1, but without the supersink. We choose
node 1 arbitrarily to be the root node (and we explain why we need the concept of a “root node”
for networks in the discussion below).

A generic linear programming model for a network problem is written as follows:

min_x Σ_i Σ_j c_ij x_ij    (where i ranges over supply nodes and j over demand nodes)

Subject to the constraints:

Σ_j x_ij = S_i = supply at node i

Σ_i x_ij = D_j = demand at node j

x_ij ≥ 0

We can write these constraints in matrix form as

Ax = b
x ≥ 0

Now assume we are going to solve the numerical example below. Our constraints would look like

x_14 + x_15 + x_16 = 12

x_24 + x_25 + x_26 + x_27 = 18

x_36 + x_37 = 15

-x_14 - x_24 = -8

-x_15 - x_25 = -19

-x_16 - x_26 - x_36 = -12

-x_27 - x_37 = -6

x_14, x_15, x_16, x_24, x_25, x_26, x_27, x_36, x_37 ≥ 0

We would write our constraint matrix A with a row for each constraint, and a column for each variable x_ij, giving us the matrix:

Some notes:
• If we add the rows of the matrix A, they sum to zero. This means that one of the rows is redundant: if we drop one of the rows, all the constraints will still be satisfied. In other words, if we have a network with n nodes, we need n - 1 constraints.
• The simplest way to illustrate this property of networks is to consider a network with two nodes and one link:

• If we enforce flow conservation at node 1, this means we will be sending 6 units of flow from 1 to 2, which automatically satisfies the flow conservation constraint at node 2.
• So, for larger networks, we can pick any node and drop its constraint. This node is called the root node. If r is the root node, then the dual variable v_r = 0 (remember – this is the cost of moving a unit of flow from that node to the root node).
• For our network above, we can arbitrarily pick node 1 as the root node. This gives us the following constraint matrix:

The simplex algorithm for linear programming requires that we start with a basis, which is a set of variables x_ij that we are going to adjust to guarantee that the constraints are satisfied any time we change a variable that is not in the basis. Our only requirement for a nonbasic variable is that it must be at its lower or upper bound. In our numerical example (without the supersink), we do not have any variables with upper bounds, so all nonbasic variables must equal 0.

These concepts are best explained by example.



We have to start by creating our basis. We need to find a set of flows that satisfies the
constraints. We do not care about the quality of the solution – our simplex algorithm can start
with any feasible solution that satisfies the rules for our basic and nonbasic variables.

Professional linear programming packages have sophisticated logic for creating starting
solutions. For our simple problem, we are going to use a simple strategy called the “northwest
corner rule” where we literally start in the northwest corner of our graph (that is, node 1), and
start assigning flow to the northeast corner (which would be node 4). We assign as much as we
can (that is, the smaller of either the supply at node 1 or the demand at node 4). We then move
to either the next supply node (if we allocated all the supply from node 1) or the next demand
node (if we satisfied all the demand at node 4), and keep repeating the process using the
remaining nodes with unused supply or unsatisfied demand.

This process produces the network below, along with the basic vector x^B (the vector of links x_ij for links in the basis) and the nonbasic vector x^N (the links not in the basis):

We can now partition our A-matrix into a square matrix A^B of the basic variables (it will always be square) and the remaining matrix A^N comprised of the non-basic columns:

We can now restate our constraints using the vectors of basic and nonbasic variables, and the corresponding columns of the A-matrix. The constraints

Ax = b

become

[A^B  A^N] [x^B; x^N] = b,

which is the same as

A^B x^B + A^N x^N = b.

We can now solve for the basic variables x^B in terms of the nonbasic variables x^N:

x^B = [A^B]^-1 (b - A^N x^N).

For our network problem, we have no upper bounds so x^N = 0, but this will not always be the case. Note that we can guarantee that the matrix A^B is, in fact, invertible by how we have constructed the basis.

For general linear programming problems we need some reasonably sophisticated linear algebra to handle the matrix inversion [A^B]^-1. Remember that linear programming models have been solved with millions of variables. In fact, network problems (which have a lot of structure) can be solved even with tens of millions of variables. We would not even be able to store a matrix A^B with millions of rows and columns. This is where specialists use a lot of tricks.

In fact, we are going to show you how you can invert the basis matrix A^B for our network problem by inspection!

Recall that each row of the basis matrix A^B corresponds to a node, while each column corresponds to a link (that is, a decision variable x_ij). This is why we often call A^B a “node-arc incidence matrix.” It turns out that each row of the inverse [A^B]^-1 corresponds to a link, while each column corresponds to a path from a node to the root node. The element of the matrix [A^B]^-1 indicates if the link for that row is in the path from the node for that column. We use 0 if the link is not in the path, and then +1 or -1 to indicate if the link is in the path, and whether you have to traverse the link in the forward direction or backwards. The choice of whether it is +1 or -1 depends on what sign convention you have used in your flow conservation constraints (that is, did you write flow out minus flow in, or the reverse). Below is the inverse that we get for our problem.

To verify that [A^B]^-1 is in fact the inverse of A^B, we can perform the multiplication A^B [A^B]^-1 and check that we get the identity matrix. This is done below:

Note that if we perform this multiplication and find we get -I, then you just have to switch your sign convention. Obviously you only have to check this for one element.

We can rewrite our objective function using

min c^T x = (c^B)^T x^B + (c^N)^T x^N
          = (c^B)^T ([A^B]^-1 (b - A^N x^N)) + (c^N)^T x^N
          = (c^B)^T [A^B]^-1 b - (c^B)^T [A^B]^-1 A^N x^N + (c^N)^T x^N
          = (c^B)^T [A^B]^-1 b + c̄^N x^N

where c̄^N is the vector of “reduced costs” for the nonbasic links given by

c̄^N = (c^N)^T - (c^B)^T [A^B]^-1 A^N

Reduced costs tell us if we should increase the flow on a nonbasic link, while adjusting flows on all the basic links so that the flow conservation (plus upper and lower bound) constraints are satisfied. For our numerical example above, the reduced costs are calculated as follows:

Continuing the calculations:

We see that (c^B)^T [A^B]^-1 is the inner product of the link costs (for basic links) times the link-path incidence matrix [A^B]^-1. This means that (c^B)^T [A^B]^-1 is the vector of dual variables, which as we have seen are the path costs along the basis from each node to the root node. All of this reduces to the simple relationship (for networks) between costs and reduced costs for nonbasic links

c̄^N = [c_16 - v_1 + v_6,   c_24 - v_2 + v_4,   c_27 - v_2 + v_7].

We can compute the path costs (dual variables) v_i just by following the path along the basis from each node to the root node (node 1). Remember we have to subtract the cost for any link that we traverse in the reverse direction of the link. This gives us

v_1 = 0
v_2 = 16 - 8 = 8
v_3 = 8 - 4 + 16 - 8 = 12
v_4 = -14
v_5 = -8
v_6 = -4 + 16 - 8 = 4
v_7 = -5 + 8 - 4 + 16 - 8 = 7

The reduced costs on the nonbasic links are then

c̄^N = [c̄_16   c̄_24   c̄_27]
     = [c_16 - v_1 + v_6,   c_24 - v_2 + v_4,   c_27 - v_2 + v_7]
     = [9 - 0 + 4,   15 - 8 + (-14),   17 - 8 + 7]
     = [13   -7   16]
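These calculations are easy to reproduce with a few lines of numpy, which makes a nice exercise. The sketch below rebuilds the constraint matrix from the flow conservation equations written earlier (dropping the row for root node 1), partitions it using the northwest corner basis, and recovers the duals and reduced costs shown above. The individual link costs are read off the dual and reduced cost calculations in the text, since the figure itself is not reproduced here.

import numpy as np

links = ['14', '15', '16', '24', '25', '26', '27', '36', '37']
cost  = {'14': 14, '15': 8, '16': 9, '24': 15, '25': 16, '26': 4, '27': 17, '36': 8, '37': 5}
nodes = [2, 3, 4, 5, 6, 7]                    # node 1 is the root node, so its row is dropped

# Node-arc incidence matrix: +1 in the row of the tail node, -1 in the row of the head node.
A = np.zeros((len(nodes), len(links)))
for col, link in enumerate(links):
    tail, head = int(link[0]), int(link[1])
    if tail in nodes:
        A[nodes.index(tail), col] = 1.0
    if head in nodes:
        A[nodes.index(head), col] = -1.0

basis    = ['14', '15', '25', '26', '36', '37']        # from the northwest corner rule
nonbasis = [l for l in links if l not in basis]

A_B = A[:, [links.index(l) for l in basis]]
A_N = A[:, [links.index(l) for l in nonbasis]]
c_B = np.array([cost[l] for l in basis], dtype=float)
c_N = np.array([cost[l] for l in nonbasis], dtype=float)

v     = c_B @ np.linalg.inv(A_B)              # dual variables: path costs to the root node
c_bar = c_N - v @ A_N                         # reduced costs on the nonbasic links

print({n: float(val) for n, val in zip(nodes, v)})         # {2: 8.0, 3: 12.0, 4: -14.0, 5: -8.0, 6: 4.0, 7: 7.0}
print({l: float(val) for l, val in zip(nonbasis, c_bar)})   # {'16': 13.0, '24': -7.0, '27': 16.0}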

The reduced cost tells us how much total costs will change if we increase the flow on each nonbasic link by one unit. We now search for nonbasic links with a negative reduced cost, since this means that increasing the flow on that link, and then adjusting the flows on the basic links so that we maintain flow conservation, will reduce total costs.

There are different strategies for choosing the nonbasic link, since networks can be quite large (from many thousands of links to millions of links). It makes sense to choose the nonbasic link with the most negative reduced cost, but this would mean calculating all the reduced costs and finding the smallest. Computer scientists have refined these strategies to balance the time spent searching for a good nonbasic link against the speed of convergence it produces.

From our list of reduced costs above, we see that the only nonbasic link with a negative reduced
cost is link (2,4). We want to increase flow from 2 to 4. Then, to maintain flow conservation, for
each unit of flow we push from 2 to 4, we want to push a unit from 4 to the root node (node 1),
and then from the root node to node 2, always limiting ourselves to links in the basis. We note
that there is always exactly one path between any node and the root node.

As we see in the graph to the right, the path from 4 to the root node means reducing a unit of flow on the link from 1 to 4. Then, we have to push a unit of flow from node 1 to node 2, which means increasing flow on link (1,5) and then decreasing flow on link (2,5).

The next step is to figure out how much flow to move. The answer is simple: we move as much as possible. We look at each link that is losing flow, calculate how much flow is on each of these links, and take the link with the smallest flow. If links have upper bounds, then we also look at each link that is gaining flow, and take the link that can increase flow by the smallest amount before hitting its upper bound. Finally, we choose the link that can lose (or gain) the smallest amount of flow, and that link determines how much flow we can move. The constraining link is also the link that we drop from the basis. After we complete the process of adding the new nonbasic link and dropping the constraining link, we have to recompute the dual variables v_i. Again, we note that commercial software uses a variety of programming tricks to accelerate this process.

Some notes:

• It is possible that there may be a tie – two links losing flow have the same amount of flow, or the link that can lose the least flow matches the link that can gain the least flow. In case of ties, we just pick one link arbitrarily to drop from the basis. After adding the new nonbasic link, we regain a valid basis (that is, the links in the basis always form a tree).
• It is also possible that the amount of flow that we can move is zero. This can (and will)
happen, and is known as a “degenerate pivot.” Even when the amount of flow we can
move is zero, we still go through the same process of adding the new nonbasic link to
the network, and dropping one of the constraining links (if there is more than one).

We note that all of our calculations seem to depend on our choice of root node. It turns out that
changing the root node has the effect of changing all of the dual variables by a constant, which
means that the reduced costs, which all involve differences between dual variables, are not
affected. Below are two networks with different root nodes to illustrate this property.

This nice property also hints at a limitation in the interpretation of a dual variable. It is common to think of a dual variable v_i as being the marginal value of the resources entering or leaving the network at node i. Since the sum of the supplies and demands for our network problem must sum to zero, it does not make sense to perturb the flow entering or leaving the network at node i, since we also have to specify the change in the flow at some other node so that the supplies and demands remain balanced. We have actually addressed this problem by dropping one of the constraints (the root node). This means that if we perturb the supply or demand at some node i, we are implicitly balancing this change by adding or subtracting the same amount of flow at our root node so that the sum of supplies and demands remains balanced.

7.3 As a policy for a dynamic problem


Let’s start with the same network problem we used above, but now assume that we are using
this to match available inventories of products in distribution centers to the demands of retailers
looking to restock their inventories. We might reasonably solve this problem daily (or weekly),
but in this case we are now solving a sequential decision problem, where our “policy” requires
solving the linear programming problem.

Recognizing that we are solving this problem each time period, our optimization problem at time t would be written

min_{x_t} Σ_{i,j} c_tij x_tij

subject to

Σ_j x_tij = R_ti      for i = 1,2,3,                        (7.7)

Σ_i x_tij ≤ D_tj      for j = 4,5,6,7,                      (7.8)

x_tij ≥ 0             for i = 1,2,3 and j = 4,5,6,7,        (7.9)

where the supplies at time t are given by R_ti and the demands are given by D_tj. We have also allowed our costs c_t to be time-dependent. We quickly see that we can minimize costs by not satisfying any demand, so a better model might be to maximize profits. Let

p_tj = the price we receive for satisfying a unit of demand at node j.

Our objective function would then be

max_{x_t} ( Σ_j p_tj min{D_tj, Σ_i x_tij} - Σ_{i,j} c_tij x_tij )        (7.10)

A more realistic model needs to be modified to reflect the possibility that total supply may be greater than total demand, or less than total demand. We have already written the demand constraint (7.8) as an inequality. However, we need to add the option that allows us to hold excess inventory until time t+1. In fact, we may even want to hold inventory while not satisfying demand. Remember that in our model we are allowing costs and prices to vary over time. There may be a time period where costs rise and/or prices drop, at which point we prefer to hold our inventory for a future time period when prices and costs may be more favorable.

We may face uncertainty in our available inventories (these might include inventories arriving
soon, but perhaps they are delayed), or the demands (which may be higher or lower than
expected). We might, then, replace constraints (7.7)-(7.9) with

Σ_j x_tij = θ^R R_ti      for i = 1,2,3,                        (7.11)

Σ_i x_tij ≤ θ^D D_tj      for j = 4,5,6,7,                      (7.12)

x_tij ≥ 0                 for i = 1,2,3 and j = 4,5,6,7.        (7.13)

The objective function (7.10) with constraints (7.11)-(7.13) looks like another optimization problem, but there is an important difference. This is just a problem at time t, where we want to maximize profits over time, not just at a point in time. If we choose to maximize (7.10) to get our decisions of what to do at time t, then we would say that this problem is now a policy which should be written

X^π(S_t | θ) = argmax_{x_t} ( Σ_j p_tj min{D_tj, Σ_i x_tij} - Σ_{i,j} c_tij x_tij )        (7.14)

which has to be solved subject to the constraints (7.11)-(7.12). Using (7.14) as a policy completely ignores the impact of decisions now on the future. For example, imagine that at time t the prices p_t are unusually low; we might prefer to hold our inventory and sell it later when prices recover, but a policy that only looks at time t cannot see this. Thus, we can use our simplex algorithm to find the optimal solution of our linear program, but solving (7.14) is not an optimal policy!

We might create a more sophisticated policy by optimizing into the future, just as we did in our
dynamic shortest path problem where we would plan a path to the destination, which would then
be updated as new information came in. Such a direct lookahead (DLA) policy could be written
X_t^π(S_t | θ) = argmax_{x̃_tt, ..., x̃_t,t+H} Σ_{t'=t}^{t+H} ( Σ_j p̃_tt'j min{D̃_tt'j, Σ_i x̃_tt'ij} - Σ_{i,j} c̃_tt'ij x̃_tt'ij )        (7.15)

subject to, for t' = t, t+1, ..., t+H:

Σ_j x̃_tt'ij = θ^R R̃_tt'i      for i = 1,2,3,                        (7.16)

Σ_i x̃_tt'ij ≤ θ^D D̃_tt'j      for j = 4,5,6,7,                      (7.17)

x̃_tt'ij ≥ 0                   for i = 1,2,3 and j = 4,5,6,7.        (7.18)

Note that we have put tildes on all the variables used to plan into the future. Also, each of the tilde-variables has two time indices: time t, since we are solving the problem at time t, and time t', which is the time period we are planning in the future. Other variables such as prices or demands are forecasts of future prices or demands made at time t.

Note that we do not care about the entire optimal solution over the horizon t, ..., t+H. We are only going to implement the first time period, which means we would set

x_t = x̃_tt.

Note that our DLA policy in (7.15) - (7.18) is still not an optimal policy, but it might be quite good.
One challenge is that we have to tune the parameters

θ = (θ^R, θ^D).

We need to first write out how we would simulate our policy X_t^π(S_t | θ). This requires identifying how random events (e.g. random demands, travel times, ...) affect how the solution x_t behaves in practice. This is a key step that is almost always overlooked when modeling and solving linear programs. What you have to do is to imagine that you are simulating the process. The equations that describe this process are the transition function in our sequential decision model, which we represent compactly as

S_{t+1} = S^M(S_t, x_t, W_{t+1}).

Assume as we have done before that we can generate a series of sample paths where the n-th sample path is

ω^n = (W_1^n, W_2^n, ..., W_T^n).

Let S_t^n be the state at time t while we are following sample path ω^n. Specifying ω^n refers to a specific sample of anything random (demands, prices, travel times, ...). The value of the policy for sample path ω^n might be written

F^n(θ) = Σ_{t=0}^T C(S_t(ω^n), X_t^π(S_t(ω^n) | θ)),

where X_t^π(S_t(ω^n) | θ) is the policy computed from (7.15) - (7.18). If we generate n = 1, ..., N sample paths, we can evaluate the policy by taking an average

F̄^π(θ) = (1/N) Σ_{n=1}^N Σ_{t=0}^T C(S_t(ω^n), X_t^π(S_t(ω^n) | θ)).

There are, of course, different ways to parameterize the policy. This means that when we optimize over policies, it requires designing different parameterizations, and then tuning each one. Let f ∈ F be a family of parameterizations (we have to make these up by hand), and let θ ∈ Θ^f be the parameters for parameterization f. Our policy π = (f, θ) consists of the parameterization and its tunable parameters.

We can now write out our optimization problem as

min_{π=(f,θ)} F̄^π(θ).

This discussion has illustrated that we are going to need to identify and compare different types
of policies (such as (7.15) - (7.18)) as well as tuning any parameters that we have inserted.
Coming up with different types of policies parallels choosing between the linear and nonlinear
models we illustrated in Topic 1 on machine learning. Tuning the parameters for the policies
parallels fitting our parametric models in machine learning.

While this process may seem ad hoc, it is no more ad hoc than searching among different
statistical models in machine learning. Furthermore, this is precisely how the vast majority of
sequential decision problems are solved in practice.
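Before moving on, it may help to see the simulation loop in code. The sketch below is a toy: the inventory dynamics, contribution function and order-up-to rule are made-up stand-ins for the policy defined by (7.15)-(7.18), but the structure (simulate the policy over N sample paths, sum the contributions, average, and then search over the tunable parameter) is exactly the computation behind F̄^π(θ).

import numpy as np

T, N = 52, 100

def policy(state, theta):
    """Order-up-to rule: a stand-in for the argmax of the lookahead linear program."""
    return max(0.0, theta - state)                       # x_t = X^pi(S_t | theta)

def transition(state, x, W):
    """S_{t+1} = S^M(S_t, x_t, W_{t+1}): add what we order, subtract what we sell."""
    return max(0.0, state + x - W)

def contribution(state, x, W):
    """C(S_t, x_t): revenue for the demand we serve minus the cost of what we order."""
    return 10.0 * min(state + x, W) - 3.0 * x

def F_bar(theta):
    rng = np.random.default_rng(0)                       # same sample paths for every theta
    total = 0.0
    for n in range(N):                                   # sample paths omega^1, ..., omega^N
        demands = rng.poisson(5.0, size=T)               # W_1, ..., W_T for this sample path
        S = 0.0
        for W in demands:
            x = policy(S, theta)
            total += contribution(S, x, W)
            S = transition(S, x, W)
    return total / N

print(max(np.arange(0.0, 12.0), key=F_bar))              # crude search over the tunable parameter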

Topic 8: Dynamic inventory problems - Energy storage


Readings: SDAM Chapter 9 - This is also where we introduce linear programming

Here we address a more complex energy storage problem:

We have to decide how much energy to draw from a windfarm (with variable supply) and from the grid (with variable prices) to meet a predictable load (demand) for a building, using an energy storage device to absorb variations. We have rolling forecasts of wind which change quite a bit from hour to hour:

We are going to solve this much as we solve our dynamic shortest path problem, where we look
into the future and pretend the various forecasts (such as wind) are perfectly accurate. This
“lookahead model” is another example of a linear program. Even though the decision at each
point in time is a scalar, we have to optimize the decisions over the entire horizon, which means
we have to optimize over the entire vector of decisions.

Now, to handle uncertainty, we insert a coefficient θ_{t'-t} for our forecast, made at time t, of what the energy from the wind farm will be at time t'. Note that θ_{t'-t} is not a function of t; it is just a function of the difference t' - t. So if we look 24 hours into the future, we would have 24 coefficients. A simpler strategy to get us started would be to assume that there is just one coefficient for all forecasts. The modified lookahead linear program looks like:

This closely parallels the idea of solving a shortest path with modified costs. We solved that
problem using our shortest path algorithm. This time, we need the full power of a linear
program that we introduced in Topic 7.
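To make the lookahead model concrete, here is a sketch of one way it might be written in Python using the cvxpy modeling package (any LP solver would do). All of the data (prices, load, the rolling wind forecast and the initial storage level) is made up for illustration, and the real model would also use rolling forecasts of prices and load; the point is the structure: the wind forecast is multiplied by the tunable coefficients θ, the whole horizon is optimized, and only the first hour of the solution is implemented.

import numpy as np
import cvxpy as cp

H = 24                                           # lookahead horizon (hours)
price = np.full(H, 40.0); price[17:21] = 120.0   # grid prices over the horizon (illustrative)
load = np.full(H, 5.0)                           # predictable building load
wind_forecast = 3.0 + 2.0 * np.random.rand(H)    # rolling forecast of wind energy
theta = np.full(H, 0.8)                          # one coefficient per hour into the future

grid = cp.Variable(H, nonneg=True)               # energy bought from the grid
wind = cp.Variable(H, nonneg=True)               # wind energy we plan to use
chg  = cp.Variable(H, nonneg=True)               # energy moved into storage
dis  = cp.Variable(H, nonneg=True)               # energy drawn from storage
R    = cp.Variable(H + 1)                        # storage level

constraints = [R[0] == 2.0, R >= 0, R <= 10.0,
               wind <= theta * wind_forecast,                # the parameterized forecast
               wind + grid + dis - chg == load,              # energy balance in each hour
               R[1:] == R[:-1] + chg - dis]                  # storage dynamics

cp.Problem(cp.Minimize(price @ grid), constraints).solve()
x_now = (grid.value[0], wind.value[0], chg.value[0], dis.value[0])   # implement only hour 0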

Our next challenge is to tune the parameter vector θ, which could be a scalar (if we use one parameter for all the forecasts) or, in our case, a 24-dimensional vector (one for each hour into the future). We first need to write out how we are going to evaluate our policy. If X_t^{D-LA}(S_t | θ) is our policy above, assume that it returns a decision vector x_t that determines what to do right now.

Let C(S_t, x_t) be our performance metric (e.g. total costs) that occurs just at time t. Now let W_1, W_2, ..., W_t, ..., W_T be a sample realization of all the new information arriving, where W_t is the information that arrives between t-1 and t. This would include the energy from the wind farm and the grid price, along with the latest set of rolling forecasts of demands, grid prices and the energy from wind. If we have access to real data, we could use that. Otherwise, we would likely use Monte Carlo simulation, which we described in Topic 2.

We need to write a simulator that we represent using our transition function

S_{t+1} = S^M(S_t, x_t, W_{t+1}),

where S_t captures everything we know at time t, our decision comes from our policy x_t = X_t^{D-LA}(S_t | θ), and the new information W_{t+1} comes from our historical data or simulation. The

transition function describes how our state variable changes over time (for example, this is
where we update how much energy is in our storage device).

Running the simulation on the data W_t, with decisions from the policy X_t^{D-LA}(S_t | θ), allows us to estimate the performance of the policy:

F^{D-LA}(θ) = Σ_{t=0}^T C(S_t, x_t = X_t^{D-LA}(S_t | θ)).

Next we have to turn to algorithms to search for the best value of θ. If we assume there is only a single parameter θ for all time periods in the future, this would just be a one-dimensional search. If it is a vector, we could use a gradient-based search such as what we illustrated in Topic 1 for nonlinear models. [See section 5.4 in RLSO for a more thorough discussion of how to compute gradients numerically.]

We can set up our simulation so that the rolling forecast is perfectly accurate. In this case, we would expect the best value of θ to be 1.0. The figure below confirms this.

Now run the experiment where the rolling forecasts (say of wind) are not accurate, as would occur in practice. In this case, we get the graph below, where the best values of θ are quite different from 1.0.

The figure below shows that we can achieve approximately a 30 percent improvement using the optimized θ.

This application illustrates the very powerful idea of parameterizing an optimization problem. I
suspect that there is a wide array of optimization problems that are actually policies for fully
sequential problems. The idea of parameterizing an optimization problem is widely used in
industry, but in an ad hoc way.

Topic 9: Integer programming


We now introduce integer programming, using the classic problem of a facility location problem.
Some schools teach entire courses on integer programming, so this lecture is more of a
placeholder. An instructor can cover this material very briefly (e.g. at the level of these notes),
or delve into the richer modeling challenges. We note that as of this writing, the best integer
programming solvers (such as Gurobi or Cplex) are quite powerful for a wide range of problems.

As we did with linear programming, we are going to start with a basic integer programming
problem which can be solved using commercial packages. It is for this reason that, as with
linear programming, the traditional emphasis on algorithms is simply not appropriate for an
introductory course.

We are going to introduce integer programming using a classical (and widely used) application
of optimizing a set of locations to build or lease warehouses. This is a problem that can be
solved to near optimality by high quality commercial packages. Unlike linear programs, integer
programs come in different flavors, some of which are easier to solve while others still require
specialized algorithms. In section 9.2 we provide a summary of some of the major classes of
integer programming problems.

As we did with linear programming, we start with static models to illustrate integer programming.
We follow in Topic 10 by extending the facility location problem to a dynamic setting.

9.1 Static facility location


A problem faced by many companies is locating facilities – these might be manufacturing points,
distribution centers, warehouses, and even retail locations (if the company is a retailer).

Imagine that we are trying to design the network in the graphic below. We might assume we
have a known manufacturing location in Mexico, but we have to optimize the location of
distribution centers (black squares) and local warehouses (the smaller circles). To model this
problem, let:

I^facility = set of candidate locations for distribution centers and warehouses (to be chosen).
I^prod = set of production facilities (fixed in advance).
I^retail = set of retailers where product is sold (fixed in advance).
x^trans_ij = flow of goods (in pounds per quarter) from i to j.
x^trans = (x^trans_ij)_{i,j}.
x^facility_i = 1 if we build facility i, 0 otherwise, for i ∈ I^facility.
x^facility = (x^facility_i)_{i ∈ I^facility}.

We can combine these into a single vector:

x = (x^trans, x^facility).

Next define the costs:

c^facility_i = cost per quarter to lease/operate facility i.
c^trans_ij = transportation cost per pound for moving freight from i to j.

We have to move the freight from places where we have supplies, which includes initial inventories as well as the ability to produce the product (presumably only at the manufacturing facility). For this we let

q^prod_i = production capacity of manufacturing facility i ∈ I^prod, where we assume there is sufficient production capacity to meet the total market demand.

We then have to satisfy demands, given by

D_i = demand for goods (in pounds) at retailer i over the quarter.

Our optimization problem to find the location of facilities can then be stated as

min_{x=(x^trans, x^facility)} Σ_{i ∈ I^facility} c^facility_i x^facility_i + Σ_{i,j} c^trans_ij x^trans_ij        (9.1)

This has to be optimized subject to the constraints:

Σ_{j ∈ I^facility} x^trans_ij ≤ q^prod_i                     for all i ∈ I^prod          (9.2)
Σ_{k ∈ I^facility} x^trans_ki = D_i                          for all i ∈ I^retail        (9.3)
Σ_{i ∈ I^prod} x^trans_ij ≤ q^facility x^facility_j          for all j ∈ I^facility      (9.4)
Σ_{k ∈ I^retail} x^trans_jk ≤ q^facility x^facility_j        for all j ∈ I^facility      (9.5)
x^trans_ij ≥ 0                                               for all i, j                (9.6)
x^facility_i ∈ {0, 1}                                        for all i ∈ I^facility      (9.7)

Equation (9.2) makes sure the total flow out of each production facility does not exceed the
production capacity. Equation (9.3) requires that we meet the retail demand.

Equations (9.4) and (9.5) make sure that we do not ship out of or into any facility that has not
been built, and if it has been built, that we do not exceed its capacity.

Equation (9.6) imposes the obvious condition that flows cannot be negative.

Equation (9.7) is where we insist that the facility variables x^facility_i must be 0 or 1, and nothing in between. It is this constraint that makes this an integer programming problem.
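In an introductory course, the most useful skill is expressing this model in a modeling language and letting a solver do the work. The sketch below uses PuLP (a free Python modeling package with a bundled solver). All of the locations, costs, capacities and demands are made-up placeholders, and the flows are restricted to plant-to-facility and facility-to-retailer legs, but the constraints mirror (9.2)-(9.5) and the binary restriction (9.7); a flow conservation constraint is added at each facility to tie the inbound and outbound legs together.

import pulp

plants, facilities, retailers = ['MEX'], ['DAL', 'ATL', 'CHI'], ['NYC', 'MIA', 'LAX', 'SEA']
c_fac = {'DAL': 9000, 'ATL': 8000, 'CHI': 8500}          # lease/operate cost per quarter
q_fac, q_prod = 50000, {'MEX': 120000}                   # facility and production capacities
D = {'NYC': 30000, 'MIA': 20000, 'LAX': 25000, 'SEA': 15000}
c_in  = {('MEX', 'DAL'): 1.0, ('MEX', 'ATL'): 1.4, ('MEX', 'CHI'): 1.6}        # $/lb inbound
c_out = {('DAL', 'NYC'): 2.2, ('DAL', 'MIA'): 1.8, ('DAL', 'LAX'): 1.9, ('DAL', 'SEA'): 2.6,
         ('ATL', 'NYC'): 1.2, ('ATL', 'MIA'): 0.9, ('ATL', 'LAX'): 2.8, ('ATL', 'SEA'): 3.1,
         ('CHI', 'NYC'): 1.1, ('CHI', 'MIA'): 1.7, ('CHI', 'LAX'): 2.4, ('CHI', 'SEA'): 2.5}

m = pulp.LpProblem('facility_location', pulp.LpMinimize)
y = pulp.LpVariable.dicts('open', facilities, cat='Binary')                    # x^facility
xin  = pulp.LpVariable.dicts('in',  list(c_in),  lowBound=0)                   # inbound flows
xout = pulp.LpVariable.dicts('out', list(c_out), lowBound=0)                   # outbound flows

m += (pulp.lpSum(c_fac[j] * y[j] for j in facilities)
      + pulp.lpSum(c_in[a] * xin[a] for a in c_in)
      + pulp.lpSum(c_out[a] * xout[a] for a in c_out))                         # objective (9.1)

for i in plants:                                                               # (9.2)
    m += pulp.lpSum(xin[i, j] for j in facilities) <= q_prod[i]
for k in retailers:                                                            # (9.3)
    m += pulp.lpSum(xout[j, k] for j in facilities) == D[k]
for j in facilities:                                                           # (9.4)-(9.5)
    m += pulp.lpSum(xin[i, j] for i in plants) <= q_fac * y[j]
    m += pulp.lpSum(xout[j, k] for k in retailers) <= q_fac * y[j]
    m += pulp.lpSum(xin[i, j] for i in plants) == pulp.lpSum(xout[j, k] for k in retailers)

m.solve(pulp.PULP_CBC_CMD(msg=False))
print({j: int(y[j].value()) for j in facilities})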

Prior to year 2000, optimization problems with integer variables such as this could not be solved
using commercial packages, and you can still find textbook authors referring to these problems
as “hard.” Today, commercial packages such as Gurobi (I think this is the leader) or its
predecessor, Cplex, can handle a wide range of integer programming problems. The solution
times will be slower than similar sized problems without integer variables, but the best packages
can handle problems of realistic size without difficulty. Students do need to understand that
while there are a number of freeware packages for solving linear programs, integer programs
are harder, and the best packages may be quite a bit better than free software that you can get
from the internet.

While we do make an attempt to indicate how linear programs can be solved with the simplex
method (although I consider this optional – it really depends on the types of students),
algorithms for integer programs are quite tedious. I strongly recommend leaving these to more
advanced courses. Students learning optimization for the first time just have to understand that
the complexity of integer programs means that more care has to be used when choosing a
package.

9.2 Types of integer programs


It is important to recognize that there are major classes of integer programs, ranging from those
that are no harder than solving a linear program to exceptionally hard problems that typically
require specialized algorithms. Below is a list of some major classes of integer programs:

• Assignment problems (people/equipment to task) – These are problems where we are


assigning discrete resources (people, machines) to discrete tasks. All the flows are 0 or
1. As long as a resource can only be assigned to at most one task, and each task
requires only one resource, this is an easy problem that can be solved using a general
purpose linear programming code and be guaranteed that the optimal decisions will be 0
or 1.
• Network flow problems as we illustrated in Topic 7 – This is another example of
problems that can be solved using general purpose linear programming solvers, and still
be guaranteed that the optimal solution will be integer as long as all the supplies,
demands, and upper or lower bounds, are integer. Network flow problems are not limited
to settings where a resource can cover just one task – the only limitation is that there
can only be two types of constraints: flow conservation (flow out = flow in) and
upper/lower bounds on flows.
• Network design (as we illustrated above) – Our facility design problem used to be
considered a hard integer programming problem, but today the most advanced solvers
(such as Gurobi or Cplex) can handle these problems. Run times will be much slower
than if we drop the integrality constraints, but reasonable-sized problems (hundreds,
even thousands of integer variables) can be solved with reasonable times.
• Vehicle routing problems – The simplest routing problem is the traveling salesman
problem, and even this problem is beyond the capability of standard integer
programming solvers. The problem arises when specifying constraints. It is easy to see
that we need flow conservation constraints so that the flow into each node equals the
flow out, but if we just include these constraints, it is possible to create cycles where a
vehicle goes from city 1 to city 2 to city 3 back to city 1, without ever passing through the
home depot for the vehicle. We can eliminate these “subtours” with “subtour elimination constraints” (their general form is sketched after this list), but we need an exponentially large number of these. Much harder problems
include vehicles that have to make multiple stops to deliver goods. Even harder are
problems where the vehicle must visit cities within a time window. There is an extensive
literature on these problems.
• Sequencing and scheduling problems – These are arguably the hardest class of integer
programming problems. These often arise when determining when to use a machine or
trained technician to perform a set of jobs within time constraints (loose constraints are
harder than tight ones). These problems tend to be solved using a class of methods
known as constraint programming.
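For reference, the subtour elimination constraints mentioned in the vehicle routing bullet have a simple form. If x_ij = 1 when the tour travels directly from city i to city j, then for every subset S of cities that excludes the depot we require

Σ_{i ∈ S} Σ_{j ∈ S, j ≠ i} x_ij ≤ |S| - 1,

which prevents the links inside S from forming a closed loop of their own. Since the number of subsets S grows exponentially with the number of cities, these constraints cannot all be written down in advance, which is why this class of problems requires specialized algorithms.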

Topic 10: Dynamic facility location


The optimization problem we formulated in Topic 9 for our facility location problem is possibly
one of the most standard problems used to illustrate integer programming. It is also widely used
in industry, so we need to emphasize that this is a very useful model. However, it is important
to understand how these decisions are actually made over time, and how the resulting network
actually performs.

We begin by transitioning our original static, deterministic facility location problem into a two-
stage problem where we first make the decision of locating facilities using forecasted demands
to produce an estimate of the flows. Then, after we see the real demands, we reoptimize the
flows. Our choice of where to locate facilities ignores the effect of these decisions on the future.

We then build on this “two-stage” model and use it as a policy for a fully sequential problem (in Topic 10) where decisions about which facilities to open or close are made sequentially over time.

We start by describing the notation that we use.

10.1 Notation
To prepare for our dynamic models, we are going to begin by indexing all of our variables by t. So, our decision variables become:

x^facility_ti = 1 if we decide to activate facility i ∈ I^facility at time t (we assume it becomes active right away). If it is already active, then x^facility_ti = 1 means to keep it active, while x^facility_ti = 0 means to deactivate it.

x̃^trans_{t,t+1,ij} = the estimated flow of product from i to j in period (t, t+1), using forecasted demands since this is all we know at time t. We are not going to implement these flows – these are estimates of what we think the flows might be using information that does not become available until time t+1, using our best estimate (the forecast) at time t. We put a tilde on this variable so that it does not become confused with the decision variables that we are implementing (such as where to put a facility). We use a double time index – the first index t indicates when we are making the decision, which controls what we know, while the second time index captures the time period being modeled.

D̂_{t+1,i} = the demand at retailer i that does not become known until time t+1.

f^D_{t,t+1,i} = the forecast of the demand D̂_{t+1,i} that is made at time t.

We need a variable for the flows that are made after we learn D̂_{t+1,i}. We use:

x^trans_{t+1,ij} = the actual flows that we determine after we see D̂_{t+1,i}. Just as the facility decision x^facility_ti is implemented using the information known at time t, x^trans_{t+1,ij} would be the transportation flows that actually happen, given that they are computed using the actual demands D̂_{t+1,i}.

We are next going to introduce variables for the state of facilities, which we model using:

R^facility_ti = 1 if facility i is operating at time t, 0 otherwise.

We can now make a decision to bring a facility into the network (if R^facility_ti = 0) or force it to leave the network (if R^facility_ti = 1). Our facility decision is then

x^facility_ti = 1 if we want facility i in the network at time t+1, 0 if we want facility i out of the network at time t+1.

Of course, we can only add a facility if R^facility_ti = 0, and we can only drop a facility if R^facility_ti = 1.

We assume that if we add a facility at time t that it becomes available for the flows that are moved between t and t+1 while meeting the demands D̂_{t+1}. We could introduce a longer delay, but it just complicates the model.

We have different costs for adding a facility to the network versus dropping it from the network, so we define:

c^add_ti = cost of adding facility i at time t.
c^drop_ti = cost of dropping facility i at time t.

We need variables that indicate whether we added or dropped facility i, or made no change, so we introduce

x^add_ti = 1 if we add a facility at i, 0 otherwise.
x^drop_ti = 1 if we drop a facility at i, 0 otherwise.

We then need to compute these variables using the “language” of linear constraints. We can do this with the following (remember that we want the smallest possible values of x^add_ti and x^drop_ti):

x^add_ti ≥ x^facility_ti - R^facility_ti,
x^add_ti ≥ 0,
x^drop_ti ≥ R^facility_ti - x^facility_ti,
x^drop_ti ≥ 0.

We have to record the facility decision at time t in the facility state variable at time t+1:

R^facility_{t+1,i} = x^facility_ti.

10.2 Single-period model with uncertain demands


In this section, we are going to argue that people often overlook the process of how a facility location problem is implemented. Typically it is understood that we solve the problem, and use the facility variables x^facility_i to determine where we open or close facilities. On the other hand, we do not actually implement the flows contained in the flow variables x^trans_ij, which are included in the model only as an approximation of what will actually happen in the field.

For example, the optimal solution might specify that a retailer gets their product from a particular
warehouse (as indicated in the figure below), but on a particular day the warehouse may stock
out, and the retailer would get their product from the next closest warehouse. We do not model
these dynamics in the facility location model simply because it would make the model too large
and complex. However, this means that our objective function (9.1) is nothing more than a
rough approximation of how well the solution will perform in practice.

It is possible that the objective function (9.1) is a reasonable approximation, but not necessarily.
A student used this model in a business game I was teaching at Princeton (famously known as
the “orange juice game”) to determine which of 50 possible locations should be used for
warehouses. The cost of shipping to the warehouses was much lower than the cost of shipping
from warehouses to retailers. As a result, the optimization model produced a solution
recommending building a warehouse at each of the 50 locations.

The solution worked terribly in practice since it ignored the randomness in demands. When the
solution was implemented, there were many stockouts because the model had not made any
effort to capture the effects of random demands. When there are 50 warehouses, buffer stocks
have to be larger in proportion to the averages. Using 5 warehouses means the flow through
each warehouse is much larger, which produces a solution that is less sensitive to variations in
the flow.

We are going to capture the effect of uncertainty by first assuming that we are going to locate
our facilities using only forecasted demands.

Using our new notation, the constraints (9.2)-(9.7) in the static model become, at time t:

Σ_{j ∈ I^facility} x̃^trans_{t,t+1,ij} ≤ q^prod_ti                       for all i ∈ I^prod,         (10.1)
Σ_{k ∈ I^facility} x̃^trans_{t,t+1,ki} = f^D_{t,t+1,i}                   for all i ∈ I^retail,       (10.2)
Σ_{i ∈ I^prod} x̃^trans_{t,t+1,ij} ≤ q^facility R^facility_{t,j}         for all j ∈ I^facility,     (10.3)
Σ_{k ∈ I^retail} x̃^trans_{t,t+1,jk} ≤ q^facility R^facility_{t,j}       for all j ∈ I^facility,     (10.4)
x̃^trans_{t,t+1,ij} ≥ 0                                                  for all i, j.               (10.5)

Finally, we include the allowed values of x^facility_ti for i ∈ I^facility:

x^facility_ti = 1 if we want facility i in the network at time t+1, 0 if we want facility i out of the network at time t+1.        (10.6)

The optimization problem for facilities is given by

min_{x^facility_t, x̃^trans_{t,t+1}} ( Σ_{i ∈ I^facility} c^facility_i x^facility_ti + Σ_{i,j} c^trans_ij x̃^trans_{t,t+1,ij} ).        (10.7)

Solving (10.7) gives us the facility decisions x^facility_t (which are implemented), and the planned transportation decisions x̃^trans_{t,t+1}, which are not implemented.

Once we make the decisions x_t = (x^facility_t, x̃^trans_{t,t+1}), we then have to evaluate the quality of the solution. The planned transportation decisions x̃^trans_{t,t+1} were made based on the forecasted demands f^D_{t,t+1}. However, we are going to assume that the actual transportation decisions are only made after we see the actual demands D̂_{t+1}.

We first need to update the facility state variable R^facility_t using the decisions x^facility_t computed from solving (10.7) using

R^facility_{t+1,i} = R^facility_{t,i} + x^facility_ti.        (10.8)

To find the actual transportation decisions, we let

x^trans_{t+1,ij} = the actual transportation flows based on the demands D̂_{t+1}.

We do this by solving the problem above using the actual demands, and where the facility decisions have already been made (but not yet implemented). We can write this problem as

min_{x^trans_{t+1}} Σ_{i,j} c^trans_ij x^trans_{t+1,ij}        (10.9)

which has to be solved subject to the constraints

Σ_{j ∈ I^facility} x^trans_{t+1,ij} ≤ q^prod_ti                       for all i ∈ I^prod,         (10.10)
Σ_{k ∈ I^facility} x^trans_{t+1,ki} = D̂_{t+1,i}                      for all i ∈ I^retail,       (10.11)
Σ_{i ∈ I^prod} x^trans_{t+1,ij} ≤ q^facility R^facility_{t+1,j}       for all j ∈ I^facility,     (10.12)
Σ_{k ∈ I^retail} x^trans_{t+1,jk} ≤ q^facility R^facility_{t+1,j}     for all j ∈ I^facility,     (10.13)
x^trans_{t+1,ij} ≥ 0                                                  for all i, j.               (10.14)

Note that equation (10.11) is using the actual demands D̂_{t+1,i}, whereas before we were using the forecasted demands f^D_{t,t+1,i} in equation (10.2).

We now want to evaluate the solution (x^facility_t, x^trans_{t+1}). Our performance metrics can be divided between facility costs (including the cost of adding and dropping facilities) and the actual transportation costs. These costs are given by

C^facility(x^facility_t) = Σ_{i ∈ I^facility} ( c^facility_i x^facility_ti + c^add_i x^add_ti + c^drop_i x^drop_ti ),        (10.12)

C^trans(x^trans_{t+1}, D̂_{t+1}) = Σ_{i,j} c^trans_ij x^trans_{t+1,ij}.        (10.13)

We assume that x^facility_t is the optimal solution from solving (10.7), and x^trans_{t+1} is the optimal solution from solving (10.9)-(10.14). Of course, this means that we cannot compute C^trans(x^trans_{t+1}, D̂_{t+1}) until after we have observed D̂_{t+1}. Let the total costs be

C(x_t, D̂_{t+1}) = C^facility(x^facility_t) + C^trans(x^trans_{t+1}, D̂_{t+1}).        (10.14)

To evaluate our facility decision x^facility_t, we have to simulate different values of D̂_{t+1}, and from this create different values of x^trans_{t+1}. Assume we create n = 1, ..., N samples of the vector D̂_{t+1}, which we designate

D̂^1_{t+1}, D̂^2_{t+1}, ..., D̂^n_{t+1}, ..., D̂^N_{t+1}.

For each sample D̂^n_{t+1}, we compute a new set of transportation flows x^{trans,n}_{t+1}, which allows us to compute a new set of costs C^trans(x^{trans,n}_{t+1}, D̂^n_{t+1}). Finally, we evaluate the cost of our facilities decision x^facility_t using

C̄^facility(x^facility_t) = C^facility(x^facility_t) + (1/N) Σ_{n=1}^N C^trans(x^{trans,n}_{t+1}, D̂^n_{t+1}).        (10.15)
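A small simulation makes equation (10.15) concrete. In the sketch below, all of the locations, costs and demand distributions are invented, and the second-stage transportation problem is replaced by a crude "cheapest open facility" assignment (ignoring capacity) rather than re-solving the linear program in (10.9)-(10.14); the structure of the calculation, a fixed facility cost plus the average of the re-optimized transportation costs over N demand samples, is the point.

import numpy as np

rng = np.random.default_rng(1)
facilities = ['DAL', 'ATL', 'CHI']
retailers = ['NYC', 'MIA', 'LAX', 'SEA']
c_fac = {'DAL': 9000.0, 'ATL': 8000.0, 'CHI': 8500.0}
c_trans = {('DAL', 'NYC'): 2.2, ('DAL', 'MIA'): 1.8, ('DAL', 'LAX'): 1.9, ('DAL', 'SEA'): 2.6,
           ('ATL', 'NYC'): 1.2, ('ATL', 'MIA'): 0.9, ('ATL', 'LAX'): 2.8, ('ATL', 'SEA'): 3.1,
           ('CHI', 'NYC'): 1.1, ('CHI', 'MIA'): 1.7, ('CHI', 'LAX'): 2.4, ('CHI', 'SEA'): 2.5}
mean_demand = {'NYC': 30000, 'MIA': 20000, 'LAX': 25000, 'SEA': 15000}

def transportation_cost(open_facilities, demand):
    """Stand-in for re-solving (10.9)-(10.14): serve each retailer from the cheapest open facility."""
    return sum(min(c_trans[j, k] for j in open_facilities) * demand[k] for k in retailers)

def C_bar_facility(open_facilities, N=200):
    """Equation (10.15): facility cost plus the average transportation cost over N demand samples."""
    fixed = sum(c_fac[j] for j in open_facilities)
    samples = [transportation_cost(open_facilities,
                                   {k: rng.uniform(0.8, 1.2) * mean_demand[k] for k in retailers})
               for _ in range(N)]
    return fixed + np.mean(samples)

print(C_bar_facility(['DAL', 'ATL']), C_bar_facility(['DAL', 'ATL', 'CHI']))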

In the next section, we are going to recognize that we do not make facility decisions just once – these are made repeatedly over time. Since the decisions we make at time t depend on what facilities have already been created (and which have not), this means that our decisions now will impact the future, and we need to understand how a decision now affects future decisions. In other words, it is a sequential decision problem!

10.3 Evaluating the policy for a multiperiod problem


In the section above we evaluated the facility location policy for a single time period t, which illustrates the need to compute the transportation flows twice: first using forecasted demands, which we do in the optimization model where we choose the facilities, and then after we see the actual demands.

This realization highlights that we may not be doing as well as we could when we optimize
facilities, since we are using point estimates of the demands that could produce a solution to the
facility location problem that is vulnerable to variations in demands. We are going to address
this issue in this section, but first we are going to transition to a full multiperiod setting,
recognizing that we are not going to solve our facility location problem just once – we will need
to keep solving it over and over.

In practice, it is likely that we would optimize flows daily, while we might reoptimize facilities
monthly or quarterly. To keep the notation as simple as possible, we will continue to assume
that we solve both problems at each time period.

In the previous section we averaged over N samples of D̂_{t+1}. Here, we are going to simulate our policy over time, using just a single sample of D̂_{t+1} for time t+1. This time, however, we will simulate over a planning horizon t = 0, 1, ..., T. We are going to introduce a minor notational change – we are now going to explicitly model state variables, which capture the information we need when we make a decision. Since we have two decisions (and therefore two policies), we have two state variables: the facility state variable that captures the information we use to optimize facilities,

S^facility_t = (R^facility_t, f^D_t),

and the transportation state variable which we use to optimize flows after the demands D̂_{t+1} have become known (at time t+1):

S^trans_{t+1} = (R^facility_{t+1}, D̂_{t+1}).

We now write the optimization problem for determining the facilities (10.7) in the form of a policy

X^facility_t(S^facility_t) = argmin_{x^facility_t, x̃^trans_{t,t+1}} ( Σ_{i ∈ I^facility} c^facility_i x^facility_ti + Σ_{i,j} c^trans_ij x̃^trans_{t,t+1,ij} ).        (10.16)

Note that we are only interested in x^facility_ti; we do not care about our determination of x̃^trans_{t,t+1,ij}, since we are only computing the transportation flows to help us find x^facility_ti.

After we optimize facilities, we update the vector that stores where we have facilities:

R^facility_{t+1,i} = R^facility_{t,i} + x^facility_ti.        (10.17)

After we determine x^facility_t, we assume we see the demands D̂_{t+1}. This information, along with the decision x^facility_ti, determines the state for the transportation decision.

We next write the optimization problem for determining the flows (10.9) in the form of a policy

X^trans_{t+1}(S^trans_{t+1}) = argmin_{x^trans_{t+1}} Σ_{i,j} c^trans_ij x^trans_{t+1,ij}.        (10.18)

Solving our transportation problem gives us x^trans_{t+1}. We then return to choosing the facilities x^facility_{t+1} for time t+1, where we assume we are given a new set of forecasts f^D_{t+1,t+2}.

To evaluate our policies X^facility_t(S^facility_t) and X^trans_{t+1}(S^trans_{t+1}), we simply extend what we did in the previous section to multiple periods. We can simulate a single sequence of demands D̂_1, D̂_2, ..., D̂_T (for this problem, the demands do not depend on decisions, so we can generate these in advance). Using these simulated demands, we can evaluate the performance of our policies using

F̂^π = Σ_{t=0}^T ( C^facility(X^facility_t(S^facility_t)) + C^trans(X^trans_{t+1}(S^trans_{t+1}), D̂_{t+1}) ).        (10.19)

This approach evaluates the policy based on a single sample realization, which is just what we did in our machine learning problems in Topic 1, as well as the asset selling and inventory planning problems in Topic 2. We could create a sequence of samples

D̂_1^n, D̂_2^n, ..., D̂_t^n, ..., D̂_T^n,   for n = 1, ..., N.

We would then let S^{facility,n}_t and S^{trans,n}_{t+1} be the state variables created when following the n-th set of demands. We then simulate our policy N times and take an average:

F̄^π = (1/N) Σ_{n=1}^N Σ_{t=0}^T ( C^facility(X^facility_t(S^{facility,n}_t)) + C^trans(X^trans_{t+1}(S^{trans,n}_{t+1}), D̂^n_{t+1}) ).        (10.20)
𝑁𝑁 𝑛𝑛=1 𝑡𝑡=0

At this point we can comment on the quality of the solution produced by our policies.

𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓
𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓
We note that the decision 𝑥𝑥𝑡𝑡𝑡𝑡 impacts both 𝑆𝑆𝑡𝑡+1 as well as 𝑆𝑆𝑡𝑡+1 . If we assume that
inventory is not held from one time period to the next, then it would mean that the transportation
𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡
decisions 𝑥𝑥𝑡𝑡+1 do not impact the future, which means that our myopic policy (9.28) is optimal;
otherwise, we can improve the policy by capturing the impact of decisions at time 𝑡𝑡 on the
future.

The facility policy X^facility_t(S^facility_t) in (10.16) is not optimal because facility decisions definitely impact the future. In fact, we anticipate that we should be able to improve this policy significantly by capturing the impact of decisions on the future.

Our challenge now is designing better policies. However we design the policies, we can evaluate them using equation (10.20).

10.4 Alternative facility location policies


There are two reasons why our policy for optimizing facilities is not optimal:

1) The transportation flows – Although we are not implementing the estimates of the flows x̃^trans_{t,t+1}, it is still the case that our facility decisions x^facility_t depend on our approximate model of the transportation flows, which uses the forecasted demands rather than the actual demands.
2) Facility decisions made at time t depend on the facilities that are in use from the previous time period, R^facility_{t-1}. The facility decisions x^facility_t then have an impact on R^facility_{t+1}, which would then have an impact on the facility decisions x^facility_{t+1} in the next time period. Our policy does not capture the impact of decisions now on the future, so while we may be solving an optimization problem at time t, it would not be an optimal policy.

Below we introduce two new policies. The first is designed to handle the random demands as we did in section 10.2 for the single-period model. The second policy is designed to handle limits on how many facilities we can add or drop in any time period. This constraint requires that we plan into the future. We illustrate this strategy using a simple deterministic lookahead model, paralleling how we solved the dynamic shortest path problem in section 5.2. Both of these policies would still need to be evaluated using the objective function in (10.20), which uses the average performance of the policy over N simulations.

10.4.1 Adjustment for random demands


We can modify our policy in a simple way to help with errors from using the forecasted demands f^D_{t,t+1} instead of the actual demands D̂_{t+1}. Using point estimates means we may not be prepared to handle sudden spikes in demand. Our constraints (10.12) and (10.13) require that we cannot exceed the capacity of the facilities. While we might satisfy these constraints for the forecasted demands when we are making our facility decisions (as we did with equations (10.1)-(10.5)), we still have to satisfy the corresponding constraints (10.12) and (10.13) when we are using the actual demands.

A simple way to resolve this problem would be to introduce a buffer θ^facility < 1 that factors down the capacity q^facility within the facility policy, giving us modified constraints (10.12a) and (10.13a):

∑_{i∈I^prod} x_{ij}^trans ≤ θ^facility q^facility R_{t-1,j}^facility   for all j ∈ I^facility,        (10.12a)
∑_{k∈I^retail} x_{jk}^trans ≤ θ^facility q^facility R_{t-1,j}^facility   for all j ∈ I^facility.        (10.13a)

Our facility policy X_t^facility(S_t^facility) (equation (9.26)) now depends on the parameter θ^facility, which means we should write it as a parameterized policy X_t^facility(S_t^facility | θ^facility). We would then write our objective function (10.20) as

F̄(θ^facility) = ∑_{t=0}^{T} [ C^facility( S_t^facility, X_t^facility(S_t^facility | θ^facility) ) + C^trans( S_{t+1}^trans, X_{t+1}^trans(S_{t+1}^trans), D̂_{t+1} ) ].        (10.21)

This gives us a new optimization problem, just as we have seen earlier (for example, in Topic 2), where we have to optimize the tunable parameter θ^facility. We might write this as

min_{θ^facility} F̄(θ^facility).        (10.22)

For this optimization problem to make sense we would have to introduce penalties for not
satisfying demands in the event that total demand exceeds the capacities of the facilities.

The optimization problem in (10.22) is no different from any of the other parameter-tuning problems we have seen for machine learning in Topic 1, or the parameter optimization problems for the policies in Topic 2.
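Since θ^facility is a single scalar in (0, 1], a simple grid search, where each candidate value is scored with the noisy simulator behind (10.20), is usually all that is needed. The sketch below assumes a function simulate_F(θ) that runs one simulation of the parameterized policy and returns its cost; this function is a hypothetical stand-in for the evaluation loop sketched earlier.

    import numpy as np

    def tune_buffer(simulate_F, thetas=np.linspace(0.6, 1.0, 9), n_reps=20):
        """Grid search for theta^facility in (10.22), averaging replications to reduce noise."""
        best_theta, best_cost = None, np.inf
        for theta in thetas:
            cost = np.mean([simulate_F(theta) for _ in range(n_reps)])
            if cost < best_cost:
                best_theta, best_cost = theta, cost
        return best_theta, best_cost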

There are other strategies we might introduce to capture the effect of changes to facilities at time t on future time periods, but the presentation here communicates the idea that a so-called optimal solution to an integer program is not the same as an optimal policy (and in fact it will virtually never be one).

10.4.2 Deterministic lookahead model


A more realistic model of facility location would recognize that we are not going to open (or close) an entire set of facilities all at once. This is especially true if we actually have to construct the facility, but let's say we are just leasing space. Even so, suppose we have a limit on how many facilities we can open or close in each time period. We do this for simplicity; we might instead have a constraint on how much we can spend opening and closing facilities, but this simpler model will illustrate a way of planning into the future.

Recall from section 10.1 that x_{ti}^add = 1 if we open facility i, and x_{ti}^drop = 1 if we close facility i. We express the constraint on the number of openings and closings using

∑_{i∈I^facility} ( x_{ti}^add + x_{ti}^drop ) ≤ U_t^facility.        (10.23)

Given the constraint (10.23), to make a decision of what to build now, we could optimize over a horizon t, …, t+H. This is a lookahead model, so we use tildes to indicate that these are variables for the lookahead model rather than the base model. Our decision variables are given by x̃_{t,t',i}^facility and x̃_{t,t',ij}^trans, where the index t captures that we are solving this problem at time t to determine the facility decisions x_{ti}^facility = x̃_{t,t,i}^facility. We optimize the decisions x̃_{t,t',i}^facility for t' = t+1, …, t+H just to help us make the decision x̃_{t,t,i}^facility that is actually implemented (as x_{ti}^facility). We still have to optimize the transportation decisions x̃_{t,t',ij}^trans but, as before, we only model the transportation decisions to capture the impact of facility location decisions on transportation costs.

We are going to solve this just as we solved the dynamic shortest path problems (section 5.2)
where we optimize into the future, but only implement the decision in the first time period. We
use a single set of forecasted demands made at time 𝑡𝑡, for all the time periods into the future:

X_t^facility(S_t^facility) = argmin_{x̃_{t,t'}^facility, x̃_{t,t'}^trans, t'=t,…,t+H} ∑_{t'=t}^{t+H} [ ∑_{i∈I^facility} c_i^facility x̃_{t,t',i}^facility + ∑_{i,j∈I^facility} c_{ij}^trans x̃_{t,t',ij}^trans ],        (10.24)

subject to the constraints for t' = t, …, t+H:

∑_{j∈I^facility} x̃_{t,t',ij}^trans ≤ q_{t',i}^prod   for all i ∈ I^prod,        (10.25)
∑_{k∈I^facility} x̃_{t,t',ki}^trans = f_{t,t',i}^D   for all i ∈ I^retail,        (10.26)
∑_{i∈I^prod} x̃_{t,t',ij}^trans ≤ q^facility R_{t',j}^facility   for all j ∈ I^facility,        (10.27)
∑_{k∈I^retail} x̃_{t,t',jk}^trans ≤ q^facility R_{t',j}^facility   for all j ∈ I^facility,        (10.28)
x̃_{t,t',ij}^trans ≥ 0   for all i, j ∈ I^facility.        (10.29)

We then add a version of our constraint on the number of facilities to add or drop:

∑_{i∈I^facility} ( x̃_{t,t',i}^add + x̃_{t,t',i}^drop ) ≤ U_{t'}^facility   for t' = t, …, t+H.        (10.30)

This problem is easy to write down, although solving it may be challenging even for commercial solvers. It was a major breakthrough when we could solve a single facility location problem, which might involve optimizing over 100 (or several hundred) possible locations for facilities. Now we are multiplying that problem by the number of time periods in our planning horizon.

Assuming that we can solve the lookahead model, the policy X_t^facility(S_t^facility) needs to be simulated and evaluated using equation (10.20), just as we evaluated our original myopic policy. We can also introduce parameters as we did in section 10.2 to help accommodate the uncertainty in the flows.
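The structure of simulating this lookahead policy is a receding-horizon loop: at each time t we build and solve the lookahead model (10.24)-(10.30), keep only the first-period decisions, and step the simulation forward. In the sketch below, solve_lookahead and make_forecasts are hypothetical helpers standing in for the integer program and the forecasting model; the other arguments play the same roles as in the earlier evaluation sketch.

    def run_lookahead_policy(solve_lookahead, make_forecasts, transition,
                             sample_demands, facility_cost, transport_cost,
                             S0, T, H, rng):
        """Simulate a deterministic lookahead (receding-horizon) policy on one sample path."""
        S_fac, S_trn = S0
        demands = sample_demands(T, rng)                 # realized demands used to evaluate the policy
        total = 0.0
        for t in range(T):
            forecasts = make_forecasts(t, H)             # the forecasts f^D_{t,t'} used inside the lookahead
            x_fac_plan, x_trn_plan = solve_lookahead(S_fac, forecasts, H)
            x_fac, x_trn = x_fac_plan[0], x_trn_plan[0]  # implement only the decisions for time t
            total += facility_cost(x_fac) + transport_cost(x_trn, demands[t])
            S_fac, S_trn = transition(S_fac, S_trn, x_fac, x_trn, demands[t])
        return total                                     # one sample; average N of these as in (10.20)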

Topic 11: Nonlinear programming


As with linear and integer programming, we are going to start with classical “static” nonlinear
programming problems, where we will introduce the formulation of portfolio optimization as a
quadratic programming problem. We will then transition to solving this in a fully sequential
manner.

11.1 Static portfolio optimization


Readings: This model is a streamlined version of the model in RLSO, section 13.2.4.

We are going to address a real problem solved by financial funds, which have to continuously make decisions about how much to invest in a set of assets. The approach we describe here is based on an actual policy. It starts out looking like a basic quadratic programming problem, but we are then going to see that it is really solved sequentially, and as a result it is a policy that needs to be tuned over multiple time periods, as we have been doing in topics 7, 8 and 10.

We start by providing some notation:

R_i = Amount currently invested in asset i ∈ I^asset.
R_0 = Current cash on hand.
p_i = Current price of asset i.
p_i^fcast = Projected price in the future (say, 3 months out).
c_i^trans = Transaction cost for buying or selling a share of asset i.
x_i = Number of shares of asset i purchased (if x_i > 0) or sold (x_i < 0).

Our purchases and sales have to respect our available cash on hand, given by the constraint:

∑_i p_i x_i ≤ R_0.        (11.1)

The total transaction costs are given by

C^trans(x) = ∑_i c_i^trans |x_i|,        (11.2)

where we take the absolute value since buying and selling transactions cost the same. To eliminate the absolute value (which complicates the formulation as an optimization problem), we first introduce a variable x_i^trans = |x_i| that we compute by introducing two constraints:

x_i^trans ≥ x_i,        (11.3)
x_i^trans ≥ -x_i.        (11.4)

Since the objective penalizes transaction costs, the optimization will press x_i^trans down until it equals |x_i| at the optimum.

Using this new variable, our transaction costs can now be written

C^trans(x) = ∑_i c_i^trans x_i^trans.        (11.5)

A major issue in portfolio management is minimizing risk, which we measure by the standard deviation of the total return of the portfolio, since the future price p̂_i of asset i will deviate from its current price p_i. We can use past data to compute the covariance matrix Σ of the prices of the stocks, where element Σ_ij is

Σ_ij = Cov(p̂_i, p̂_j).        (11.6)

The covariance Cov(p̂_i, p̂_j) has units of "dollars squared." One way this information is sometimes presented is using the correlation coefficient ρ_ij. To compute this, we compute the standard deviation of price p̂_i using

σ_i = sqrt( Var(p̂_i) ),        (11.7)

and then form

ρ_ij = Cov(p̂_i, p̂_j) / (σ_i σ_j).        (11.8)

The correlation coefficient has the property that

-1 ≤ ρ_ij ≤ +1,        (11.9)

where ρ_ij = 1 means that the prices of assets i and j are perfectly correlated. Highly correlated stocks increase the volatility of the overall portfolio, since if one stock drops, other stocks that are highly correlated with it tend to drop as well.
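As a small illustration of (11.6)-(11.8), both the covariance matrix and the correlation coefficients can be estimated directly from a history of prices. The sketch below assumes prices is a NumPy array with one row per historical observation and one column per asset; the numbers are made up for illustration.

    import numpy as np

    # prices: shape (n_observations, n_assets), one column per asset
    prices = np.array([[10.0, 20.0, 5.0],
                       [10.5, 19.5, 5.2],
                       [10.2, 20.5, 5.1],
                       [10.8, 19.8, 5.4]])

    Sigma = np.cov(prices, rowvar=False)     # covariance matrix; element (i, j) is Cov(p_i, p_j), as in (11.6)
    sigma = np.sqrt(np.diag(Sigma))          # standard deviations, equation (11.7)
    rho = Sigma / np.outer(sigma, sigma)     # correlation coefficients, equation (11.8)

    # variance of the value of a fixed set of holdings, in the spirit of (11.11)
    holdings = np.array([100.0, 50.0, 200.0])
    portfolio_var = holdings @ Sigma @ holdings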

We have two objectives when optimizing a portfolio. One is to maximize the return on the portfolio, where p_i^fcast - p_i is our estimate of how much the price might be increasing (or decreasing). If we have R_i shares and purchase (or sell) x_i, we would then have R_i + x_i, and a measure of our total return would be

C^return(R, x) = ∑_i (R_i + x_i)(p_i^fcast - p_i) - C^trans(x).        (11.10)

The variance of the return of the portfolio is given by the quadratic form

C^risk(R, x) = ∑_i ∑_j (R_i + x_i)(R_j + x_j) Cov(p̂_i, p̂_j)
             = (R + x)^T Σ (R + x).        (11.11)

We combine the return and risk in a single objective function that we write as

C^total(R, x | θ^risk) = C^return(R, x) - θ^risk C^risk(R, x).        (11.12)

Note that we subtract C^risk(R, x) since this is something we wish to minimize. The parameter θ^risk handles the scaling problem since the units of C^return(R, x) and C^risk(R, x) are different.

Our optimization problem to determine the allocation x is given by

max_x C^total(R, x | θ^risk).        (11.13)

The optimization problem in (11.13) is a quadratic programming problem, which can be solved with several available packages. Of course, one algorithmic strategy is to use the methods we used in section 1.2 for fitting nonlinear machine learning models.

Setting the risk parameter θ^risk typically involves solving (11.13) for a range of values of θ^risk, plotting C^return(R, x) versus C^risk(R, x), and then choosing the value of θ^risk that seems to strike the right balance for a particular situation.
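A minimal sketch of how (11.13) might be set up and swept over θ^risk, assuming the cvxpy package is available. The data arrays (current holdings R, prices p, forecasted prices p_fcast, transaction costs c_trans, covariance Sigma, and cash) are placeholders the user would supply; the auxiliary variable x_abs plays the role of x^trans in (11.3)-(11.4).

    import cvxpy as cp

    def solve_portfolio(R, p, p_fcast, c_trans, Sigma, cash, theta_risk):
        """Solve the mean-variance problem (11.13) for one value of theta^risk."""
        n = len(R)
        x = cp.Variable(n)                    # shares bought (> 0) or sold (< 0)
        x_abs = cp.Variable(n)                # plays the role of x^trans in (11.3)-(11.4)

        ret = (p_fcast - p) @ (R + x) - c_trans @ x_abs   # C^return, equation (11.10)
        risk = cp.quad_form(R + x, Sigma)                 # C^risk, equation (11.11)

        constraints = [p @ x <= cash,         # budget constraint (11.1)
                       x_abs >= x,            # (11.3)
                       x_abs >= -x]           # (11.4)
        cp.Problem(cp.Maximize(ret - theta_risk * risk), constraints).solve()
        return x.value, ret.value, risk.value

    # Sweeping theta^risk traces out the return-risk tradeoff described above:
    # for theta in [0.01, 0.1, 1.0, 10.0]:
    #     x_opt, ret_val, risk_val = solve_portfolio(R, p, p_fcast, c_trans, Sigma, cash, theta)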

Instructor: at this point go into as much detail as you want for nonlinear programming. Possible
topics are:

• Gradient-based search, unconstrained and then constrained.


• Performing one-dimensional searches for gradient-based search
• Starting points
• Second-order algorithms.
• Mirror descent
• Nonconvex problems

11.2 Dynamic portfolio optimization


Readings: Section 13.2.4 from RLSO

The portfolio optimization problem in section 11.1 is clearly a problem that has to be solved
repeatedly, over time, as new information is arriving. In other words, just as we illustrated with
linear and integer programming, this is a sequential decision problem, where the optimization
problem is actually a policy.

As with our facility location problem, we have to begin by indexing all the variables that are
changing with a time index. This gives us the following notation:

R_{ti} = Amount currently invested in asset i ∈ I^asset at time t.
R_{t0} = Current cash on hand at time t.
p_{ti} = Current price of asset i at time t.
p_{ti}^fcast = Projected price in the future (say, 3 months out) given what we know at time t.
x_{ti} = Number of shares of asset i purchased (if x_{ti} > 0) or sold (x_{ti} < 0) at time t.

We are going to add an adjustment term that contains variables that we feel help to reflect economic conditions. Examples of these variables might be

y_{ti,1} = producer price index for asset i,
y_{ti,2} = index of retailer inventories associated with asset i,
y_{ti,3} = manufacturing index for asset i.

As before, our transactions are limited by our budget constraint

∑_i p_{ti} x_{ti} ≤ R_{t0}.        (11.14)

For a dynamic system, we need to define the state variable 𝑆𝑆𝑡𝑡 that captures all the relevant
information at time 𝑡𝑡. For our problem, this would be

S_t = (R_t, p_t, p_t^fcast, y_t).
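It can help to see what this state variable looks like in code. A minimal sketch, using a Python dataclass with hypothetical field names:

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class PortfolioState:
        """The state S_t = (R_t, p_t, p_t^fcast, y_t) of the dynamic portfolio problem."""
        R: np.ndarray        # R_t: current holdings in each asset (with the cash position R_t0 tracked alongside)
        p: np.ndarray        # p_t: current prices
        p_fcast: np.ndarray  # p_t^fcast: forecasted future prices
        y: np.ndarray        # y_t: economic indicator variables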

We still have our transaction variables

x_{ti}^trans ≥ x_{ti},        (11.15)
x_{ti}^trans ≥ -x_{ti},        (11.16)

which allow us to calculate our transaction costs using

C^trans(x_t) = ∑_i c_i^trans x_{ti}^trans.        (11.17)

Our portfolio return is still given by

C^return(S_t, x_t) = ∑_i (R_{ti} + x_{ti})(p_{ti}^fcast - p_{ti}) - C^trans(x_t).        (11.18)
We are going to include an adjustment term using the economic variables y_t. We may want to transform each of these (square, log, …), so we are going to write our adjustment term as

C^adj(y_t | θ^adj) = ∑_i ∑_k θ_{ik}^adj φ_k(y_{ti,k}).        (11.19)
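As a concrete reading of (11.19), the adjustment term is just a weighted sum of transformed economic indicators. A minimal sketch, with an assumed set of basis functions (square, log, absolute value) standing in for the φ_k:

    import numpy as np

    def adjustment(y_t, theta_adj, basis=(np.square, np.log1p, np.abs)):
        """C^adj(y_t | theta^adj) = sum_i sum_k theta_adj[i, k] * phi_k(y_t[i, k])."""
        n_assets, n_vars = y_t.shape
        return sum(theta_adj[i, k] * basis[k](y_t[i, k])
                   for i in range(n_assets) for k in range(n_vars))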

We still have our risk component that we wish to minimize:

C^risk(S_t, x_t) = (R_t + x_t)^T Σ_t (R_t + x_t),        (11.20)



where the covariance matrix Σ_t is computed from a rolling set of observations from time periods t-H, …, t.

The total adjusted return that we wish to optimize is now

C^total(S_t, x_t | θ^risk, θ^adj) = C^return(S_t, x_t) + C^adj(y_t | θ^adj) - θ^risk C^risk(S_t, x_t).        (11.21)

Our policy is then to solve the following problem at time t:

X^π(S_t | θ^risk, θ^adj) = argmax_{x_t} C^total(S_t, x_t | θ^risk, θ^adj).        (11.22)

The way we can evaluate our policy is through a process known in finance as "backtesting," which is to take a historical sequence of prices for each asset i,

p_{t-H,i}, …, p_{t-1,i}, p_{ti},

along with a historical sequence of forecasts,

p_{t-H,i}^fcast, …, p_{t-1,i}^fcast, p_{ti}^fcast,

and the economic variables

y_{t-H,k}, …, y_{t-1,k}, y_{tk}.

Using this data from history, we can evaluate our policy using

F̄^π(θ^risk, θ^adj) = ∑_{t'=t-H}^{t} C^total( S_{t'}, x_{t'} = X^π(S_{t'} | θ^risk, θ^adj) | θ^risk, θ^adj ).

We assume that we have chosen a value for the risk parameter 𝜃𝜃 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟 as we described for the
static model, but we can tune the parameters 𝜃𝜃 𝑎𝑎𝑎𝑎𝑎𝑎 using the optimization problem

max_{θ^adj} F̄^π(θ^risk, θ^adj).

Once again we are tuning a policy that is itself a deterministic optimization problem, but this time
it is a nonlinear programming problem.
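A minimal sketch of the backtesting loop, assuming we already have a routine that returns the decision X^π(S_t | θ^risk, θ^adj) (for example, the quadratic programming solver from section 11.1 extended with the adjustment term). The history object, the policy function, and the total_cost function are hypothetical placeholders; the point is that each candidate θ^adj is scored by running the policy over the historical data, and any derivative-free search can then be wrapped around this score.

    import numpy as np

    def backtest(policy, total_cost, history, theta_risk, theta_adj):
        """Run the policy over a historical window and accumulate the objective (an estimate of F^pi)."""
        total = 0.0
        state = history.initial_state()          # holdings, prices, forecasts, and economic variables
        for t in range(len(history)):
            x = policy(state, theta_risk, theta_adj)
            total += total_cost(state, x, theta_risk, theta_adj)
            state = history.step(state, x, t)    # roll the state forward using the recorded data
        return total

    def tune_theta_adj(policy, total_cost, history, theta_risk, candidates):
        """Choose the theta^adj with the best backtested performance (a simple exhaustive search)."""
        scores = [backtest(policy, total_cost, history, theta_risk, th) for th in candidates]
        return candidates[int(np.argmax(scores))]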
