
MA324

Mathematical Modelling
and Simulation
Lecture Notes
Winter Term 2023-24

Dr Aled Williams
[email protected]

Department of Mathematics
London School of Economics and Political Science
These notes are based in part on notes by Ahmad Abdi, Katerina Papadaki,
Gregory Sorkin, Giacomo Zambelli and on the textbooks [6, 21].

Contents

Contents i

1 Initial Information and Orientation 1


1.1 General Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Teaching Arrangements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Syllabus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Course Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

I Mathematical Modelling 7

2 An Introduction to Optimisation and Modelling in Operational Research 9


2.1 An Introduction to Operational Research . . . . . . . . . . . . . . . . . . . . 9
2.2 Procedure for Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 An Introduction to Linear Programming . . . . . . . . . . . . . . . . . . . . . 11
2.4 Feasible and Optimal Solutions to Linear Programs . . . . . . . . . . . . . . 15
2.5 Assumptions of a Linear Programming Problem . . . . . . . . . . . . . . . . 19
2.6 Standard (In)equality Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.7 Further Linear Programming Examples . . . . . . . . . . . . . . . . . . . . . 20
2.8 Exercises for Self-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3 Integer and Mixed Integer Programming Applications 27


3.1 Integer and Mixed Integer Programming . . . . . . . . . . . . . . . . . . . . 29
3.2 The Branch and Bound Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Integer Programming Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4 The Knapsack Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.5 The Set Covering Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.6 Exercises for Self-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4 Modelling Tricks 51
4.1 Fixed Costs and the Big-M Method . . . . . . . . . . . . . . . . . . . . . . . . 51
4.2 Facility Location and the Big-M Method . . . . . . . . . . . . . . . . . . . . . 53
4.3 Facility Location and Indicator Variables . . . . . . . . . . . . . . . . . . . . . 54
4.4 Expressing Logical Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.5 Modelling “or” Constraints (Disjunctions) . . . . . . . . . . . . . . . . . . . . 57
4.6 Semi-Continuous Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.7 Binary Polynomial Programming . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.8 Exercises for Self-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5 Sensitivity Analysis 65
5.1 A Brief Review of Dual LPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Exercises for Self-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

6 Optimisation Problems on Graphs 81


6.1 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.2 Minimum Cost Flow Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.3 Integer Solutions to Minimum Cost Flow Problems . . . . . . . . . . . . . . 88
6.4 If Total Demand and Total Supply are Different in a Minimum Cost Flow
Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.5 The Transportation Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.6 The Assignment Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.7 The Shortest Path Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.8 The Maximum Flow Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.9 Minimum s-t Cuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.10 Maximum Flows vs Minimum Cuts . . . . . . . . . . . . . . . . . . . . . . . . 100
6.11 The Traveling Salesman Problem . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.12 Exercises for Self-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

7 Nonlinear Optimisation Models 113


7.1 An Introduction to Nonlinear Optimisation . . . . . . . . . . . . . . . . . . . 113
7.2 Global and Local Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.3 Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.4 Univariate Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
7.5 Minima of Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.6 Convex Optimisation Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.7 Example: Chemical Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . 124

7.8 Quadratic Optimisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125


7.9 Second Order Cone Programming . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.10 Second Order Cone Programming Representable Sets . . . . . . . . . . . . 130
7.11 Applications of Second Order Cone Programming . . . . . . . . . . . . . . . 132
7.12 Exercises for Self-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

II Simulation 137

8 Statistics and Probability Background 139


8.1 Sample Space and Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.2 Axioms of Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.3 Conditional Probability and Independence . . . . . . . . . . . . . . . . . . . 140
8.4 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
8.5 Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
8.6 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.7 Chebyshev’s Inequality and the Laws of Large Numbers . . . . . . . . . . . 145
8.8 Some Discrete Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.9 Some Continuous Random Variables . . . . . . . . . . . . . . . . . . . . . . . 150
8.10 Exercises for Self-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

9 Random Numbers 159


9.1 Pseudorandom Number Generation . . . . . . . . . . . . . . . . . . . . . . . . 160
9.2 Using Random Numbers to Evaluate Integrals . . . . . . . . . . . . . . . . . 161
9.3 Exercises for Self-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

10 Generating Discrete Random Variables 165


10.1 The Inverse Transform Method . . . . . . . . . . . . . . . . . . . . . . . . . . 165
10.2 Generating a Poisson Random Variable . . . . . . . . . . . . . . . . . . . . . . 168
10.3 Generating Binomial Random Variables . . . . . . . . . . . . . . . . . . . . . 170
10.4 The Acceptance-Rejection Technique . . . . . . . . . . . . . . . . . . . . . . . 171
10.5 Exercises for Self-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

11 Generating Continuous Random Variables 175


11.1 The Inverse Transform Method . . . . . . . . . . . . . . . . . . . . . . . . . . 175
11.2 The Acceptance-Rejection Method . . . . . . . . . . . . . . . . . . . . . . . . . 178
11.3 Generating a Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
11.4 Generating a Nonhomogeneous Poisson Process . . . . . . . . . . . . . . . . 183

11.5 Exercises for Self-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

12 Discrete Event Simulation 187


12.1 Simulation via Discrete Events . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
12.2 A Queuing System with a Single Server . . . . . . . . . . . . . . . . . . . . . 189
12.3 A Queuing System with Two Servers in Series . . . . . . . . . . . . . . . . . 192
12.4 A Queuing System with Two Servers in Parallel . . . . . . . . . . . . . . . . 196
12.5 An Inventory Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
12.6 Exercises for Self-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

13 Statistical Analysis of Simulated Data 203


13.1 The Sample Mean and Sample Variance . . . . . . . . . . . . . . . . . . . . . 204
13.2 Interval Estimates of a Population Mean . . . . . . . . . . . . . . . . . . . . . 211
13.3 Exercises for Self-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

14 Variance Reduction Techniques 215


14.1 The Use of Antithetic Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
14.2 The Use of Control Variates . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
14.3 Exercises for Self-Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

15 A Brief Introduction to Markov Chains 229

16 Markov Chain Monte Carlo methods 231

Bibliography 233

Chapter 1

Initial Information and Orientation

1.1 General Information

Welcome to Mathematical Modelling and Simulation (MA324). The convener for the
course is Dr Aled Williams. My room number and email address are COL.5.05 (Columbia
House) and [email protected], respectively.
You have two primary options if you would like to discuss anything related to this
course. You can firstly post your question to the anonymous discussion forum (avail-
able through Moodle). This approach will likely yield a much faster response time and
your peers will additionally benefit from the question. It should be emphasised that
the anonymous forum is completely anonymous (meaning that no one, including the
lecturer, can see who has posted). Further, it is very valuable if you can answer questions
posted by your peers, as studies demonstrate that teaching others is one of the most
effective ways to deepen your understanding and enhance learning gains.
You can instead come to my office hours. Note that you do not need to book an
appointment. My office hours during the Winter Term (WT) are 13:30-14:30 each Monday
in COL.5.05 (Columbia House). (Note that my office hours will begin in week 3 this
term; until then, if you have additional questions please post on the anonymous discussion
forum and I will be more than happy to help.) These are generally in-person, however,
if you would like to instead meet remotely please drop me an email. It should be noted
that these times could change, however, any such change will be outlined on the course
announcements forum (available through Moodle).

1.2 Teaching Arrangements

Lecture Notes

The lecture notes have been designed as a self-contained study resource for the MA324
Mathematical Modelling and Simulation course at the LSE. It should be noted that the

lecture notes are by design “gappy”, which means that gaps are interspersed throughout
and during the lectures we will fill in these gaps together. It should be noted that this
means studying the lecture notes without also attending lectures will not be sufficient
for the course.
The current chapter, Chapter 1, is an orientation section. This chapter provides
information about the course arrangements including lectures, classes, computer work-
shops and course assessment. The other chapters are split across two parts, namely
Mathematical Modelling and Simulation.

Lectures

There is one live lecture each week, taking place on Tuesday 14:00-16:00 (SAL.G.03, Sir
Arthur Lewis Building, formerly 32 Lincoln’s Inn Fields). These will run from WT week
1 through to WT week 10. The aim of these sessions is to help you consolidate your
understanding of the material and the lecture notes. Further, because the notes are by
design “gappy”, throughout the lectures we will fill in the gaps. It should be noted that
we may not cover all the material in the lecture notes during the lectures.
The lectures will be additionally recorded and you can access the videos through
Moodle. Note that there may be a time lag of two days before you can access the
recorded lecture for technical reasons.

Classes

There is one class each week, which takes place either on Monday 9:00-10:00 or Monday
10:00-11:00 (FAW.3.02, Fawcett House). Your personal timetable indicates which of
the two classes you should attend. These classes will run from WT week 2 through
to WT week 11. The aim of these sessions is to strengthen your understanding of the
course material through a combination of individual and group work. Please note that
attendance at these sessions is compulsory and attendance will be recorded. If you have
a good reason to be absent then please email me ahead of the session. The classes will
not be recorded as per departmental policy.

Computer Workshops

There are computer workshops during WT weeks 3, 5, 7, 9 and 11. These take place
on Thursday 9:00-10:00 or Thursday 10:00-11:00 (FAW.4.02, Fawcett House). These
workshops are optional and your personal timetable indicates which of the two sessions
you are timetabled to attend.

The primary aim of these sessions is to support you in learning how to utilise the
relevant software (AMPL and R) in order to answer real world problems. No material
will be covered during these sessions, however, you can ask programming questions on
programming examples that were covered in the lectures and classes, or instead get help
with using the software on the mock project. Because of this, you are completely free
to drop in and out of either computer workshop as you see fit. The computer workshops
will not be recorded.

The Course Forums

Recall that there are two course forums that can be accessed through Moodle. One
of the forums (course announcements) is for general announcements that will be made
throughout the term. The other forum (anonymous discussion) is intended for any
questions you may have about the course. It should be noted that it is expected that you
try to answer questions posed by your peers via the anonymous forum as this will have
a positive impact on deepening your understanding of the course material.

Exercises for Self-Study

There are some exercises marked “exercises for self-study” at the end of each chapter of
the lecture notes. These exercises are designed to deepen your understanding through
applying the lecture material. Note that completing these exercises is entirely optional
and complete solutions to these exercises will not be provided.

1.3 Syllabus
By the end of the academic year we will have covered a wide range of mathematical
topics that provide a broad introduction to mathematical modelling and simulation.
Throughout this course there will be an applied focus as we make use of appropriate
computer software to solve real world problems. The topics we discuss can be split
into two rather broad categories, namely mathematical modelling and simulation. For
mathematical modelling we will discuss:

• linear programming,

• integer programming,

• optimisation problems on graphs,

• sensitivity analysis, and



• nonlinear optimisation.

For simulation we will discuss:

• generating discrete random variables,

• generating continuous random variables,

• Monte Carlo simulation,

• discrete event simulation,

• variance reduction techniques, and

• Markov chain Monte Carlo methods.

1.4 Course Assessment


The course assessment can be split into two categories, namely formative and summative
assessment.

Weekly Exercises (Formative)

You will be given weekly exercises to complete. These exercises will be initiated at the
end of each class and you will complete them at home. The deadline for each weekly
submission is Thursday at 5pm in the week in which your class takes place. Your first class
for instance takes place on Jan 22 (Monday) and the deadline for that submission is Jan
25 (Thursday) at 5pm.
Your solution should be submitted electronically as a PDF file plus any code files
you utilise. You will receive both individual and collective feedback on your work. The
feedback will emphasise key ideas and draw attention to common mistakes and miscon-
ceptions.
It should be emphasised that this work is formative and as such will not contribute
towards your final grade, however, some of the homework will feature questions that
are similar in nature to what will be expected from the project.

Mock Project (Formative)

You will be given a formative mock project in the second half of WT. In particular, you
will be given a mock project in WT week 5 (Feb 14, 1pm) and have three complete
weeks to complete the work (Mar 11, midnight). This mock project will not contribute

towards your final grade, however, it will give a good indication of what to expect from
the final assessment. You will receive individual and collective feedback on your work.
The mock project will be approximately one third of the size of the final project.

Summative Project

There will be one individual summative project in the Spring Term (ST) worth 100% of
your final mark. This will cover mathematical modelling and/or simulation and will be
a report of around 15-20 pages, along with a copy of any computer code that you make
use of. More information will come later.
Part I

Mathematical Modelling


Chapter 2

An Introduction to Optimisation and


Modelling in Operational Research

2.1 An Introduction to Operational Research

Operational research is a discipline of applying advanced analytical methods in order
to make better decisions. In non-British usage, this domain is known as operations
research, however, in both cases we normally just write OR.
OR comprises a rather broad range of both theoretical methods and application
areas, where applications include manufacturing, transportation, telecommunication,
health care, public policy, marketing, revenue management and construction. The mod-
ern field of OR was established in World War II, where the massive logistic and organ-
isational questions arising in the planning of military operations engaged the efforts of
scientists from various disciplines including mathematics. The most fundamental tool
of OR, namely linear programming, was developed in this era by George Dantzig and
John von Neumann in the United States and by Leonid Kantorovich in the USSR (see
e.g. [10]). Further, the celebrated simplex method for solving linear programs (LPs)
was developed around 1947 by Dantzig [8, 9]. This was the first method that solved
linear programming problems efficiently in “most cases” and remains the most heavily
used algorithm for solving large-scale LPs in various applications.
One key step in OR applications is modelling. A model is a structure that exhibits
the essential and relevant features and characteristics of some underlying real-world
problem. Note that one may build a model for many reasons including gaining an un-
derstanding about the underlying problem, evaluating or informing decision making
and to perform experiments. The main components of a model are:

1. an objective function: a function that you need to maximise or minimise

2. decision variables: the values of these variables are under our control and influence the outcome

3. constraints: restrictions on the values of certain decision variables

After formulating such a model, the goal is to find the values of the decision variables
that give the best possible value of the objective function. The general process of
modelling is discussed in the next section. The process of finding such decision variables
is a mathematical optimisation problem.
The term mathematical optimisation (or mathematical programming) refers to using
mathematical tools for optimally allocating and using limited resources when planning
activities. It should be noted that the term mathematical “programming” here carries
an older meaning, namely planning, and does not refer to the process of creating a set
of instructions that tell a computer how to perform a task.
Mathematical optimisation deals with optimisation problems. An optimisation prob-
lem consists of maximising or minimising some function, known as the objective or cost
function, subject to some constraints. The objective function could for example repre-
sent total profit or cost, total number of staff or total carbon emissions associated with
a project, while, the constraints represent limitations on the available resources or on
the way these can be used.
There are perhaps unsurprisingly a wide variety of such mathematical models. In
particular, some of the broad types of models are:

• linear and nonlinear models: a linear model is one in which the objective function
and constraints are linear, else we have a nonlinear model.

• integer and noninteger models: if one or more decision variable must be integer
then the optimisation model is an integer model, else the model is noninteger.

• deterministic and stochastic models: we have a deterministic model when the
value of the objective function and whether or not the constraints are satisfied is
known with certainty, else we have a stochastic model.

It should be emphasised that the above distinctions do not necessarily correspond to
models that are in some sense “easy” (solvable) or “hard” (unsolvable).

2.2 Procedure for Modelling


When tasked with solving a real-world decision problem, the following general proce-
dure is followed, as illustrated by Figure 2.1. Firstly, when given a real-world decision
problem one should try to formulate a mathematical model of the problem. It should be
noted that such a formulation need not exist in general since, for example, it could be

the case that you do not have a single objective function or that perhaps your objectives
are conflicting. Further, such a model may not capture all details of your real life situa-
tion since, for example, a model may simply not exist for your problem or perhaps you
can formulate a sufficiently detailed model but not solve this in practice. This suggests
there may be some trade off between the accuracy of your model and your ability to
actually solve your model.

Figure 2.1: Procedure for solving a real-world decision problem

After formulating the model, one should then implement it using some programming
language. In particular, we will be using AMPL (A Mathematical Programming
Language), which is specifically designed for mathematical optimisation. To find the
computer solution we make use of a solver, of which AMPL has a variety available. The
computer solution will yield our mathematical solution, which should then be interpreted
as a real life solution, namely something that can be implemented in the real world. It may
be that the real life solution is accepted by the organisation that set the initial problem,
however, it could turn out that the solution is unsatisfactory, in which case one would go
back and change the model.

2.3 An Introduction to Linear Programming

The focus of the following sections is on linear programming, a simple yet fundamental
mathematical optimisation model. Recall that a linear model is one in which both
the objective function and constraints are linear. A linear programming problem is to
determine values of the decision variables in order to maximise (or minimise) a linear
objective function subject to linear constraints. This is the fundamental problem in
mathematical optimisation.
Formally, a general linear programming problem is of the form

    minimise or maximise   c_1 x_1 + c_2 x_2 + … + c_n x_n
    subject to             a_i1 x_1 + a_i2 x_2 + … + a_in x_n ⋚ b_i ,   i = 1, 2, …, m,

where the number of variables is n, the number of constraints is m and the symbol ⋚
denotes any one of the ≥, ≤ or = relations.
In matrix algebra terms, a linear programming problem is

    minimise or maximise   c^T x
    subject to             Ax ⋚ b,

where c = (c_1, c_2, …, c_n)^T ∈ ℝ^n is an n-dimensional column vector of the real
coefficients of the objective function, x = (x_1, x_2, …, x_n)^T is an n-dimensional column
vector of the decision variables, A ∈ ℝ^(m×n) is a real matrix with m rows and n columns,
⋚ denotes a mixture of ≥, ≤ and = constraints and b ∈ ℝ^m is an m-dimensional column
vector of the right-hand sides of the constraints. Recall the main components of a model
are an objective function, decision variables and constraints. In this setting, c^T x is the
objective function, the x_j's are the decision variables and Ax ⋚ b are the constraints.
In OR, it turns out that most of the time the decision variables are restricted to be
nonnegative. For this reason, we will distinguish the nonnegativity constraints, which are
those of the form x_i ≥ 0 for i ∈ {1, 2, …, n}, from the other constraints. The remaining
constraints will be referred to as resource constraints.
A linear programming problem in which all variables are nonnegative is of the form

    minimise or maximise   c_1 x_1 + c_2 x_2 + … + c_n x_n
    subject to             a_11 x_1 + a_12 x_2 + … + a_1n x_n ⋚ b_1
                           a_21 x_1 + a_22 x_2 + … + a_2n x_n ⋚ b_2
                               ⋮                                            (2.1)
                           a_m1 x_1 + a_m2 x_2 + … + a_mn x_n ⋚ b_m
                           x_j ≥ 0 ,   j = 1, 2, …, n,

where the number of variables is n, the number of resource constraints is m and there
are n nonnegativity constraints x_j ≥ 0 for all j ∈ {1, 2, …, n}. In such a case we will say
that the linear program (LP) has size m × n, namely that the LP has m resource
constraints and n variables.
Similarly, in matrix algebra terms, a linear programming problem in which all variables
are nonnegative can be written in the form

    minimise or maximise   c^T x
    subject to             Ax ⋚ b
                           x ≥ 0,

where 0 = (0, 0, …, 0)^T denotes the n-dimensional zero vector.
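To make the matrix notation concrete, the following plain-Python sketch (illustrative only; the course software is AMPL, and the data below are hypothetical rather than taken from the notes) checks whether a point x satisfies the resource constraints Ax ≤ b and the nonnegativity constraints x ≥ 0, and evaluates the objective value c^T x:

```python
def is_feasible(A, b, x):
    """Check the resource constraints Ax <= b (componentwise)
    and the nonnegativity constraints x >= 0."""
    resources_ok = all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
        for row, b_i in zip(A, b)
    )
    return resources_ok and all(x_j >= 0 for x_j in x)

def objective(c, x):
    """Evaluate the objective value c^T x."""
    return sum(c_j * x_j for c_j, x_j in zip(c, x))

# Hypothetical 2x2 example data (not from the notes):
# maximise 3x_1 + 2x_2 subject to x_1 + x_2 <= 4, 2x_1 + x_2 <= 5, x >= 0.
A = [[1, 1], [2, 1]]
b = [4, 5]
c = [3, 2]

print(is_feasible(A, b, [1, 2]))   # True: both resource constraints hold
print(is_feasible(A, b, [3, 2]))   # False: x_1 + x_2 = 5 > 4
print(objective(c, [1, 2]))        # 7
```

In AMPL, of course, one would simply declare the variables and constraints and hand the model to a solver; this sketch only spells out what "feasible" means componentwise.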


The following example demonstrates how we can formulate a mathematical optimi-
sation problem using linear programming.

Example. (Furniture production) A furniture factory produces two types of benches using
steel and wood. The benches are then sold to furniture shops for £3000 and £1000 per
dozen of units of each type, respectively. Wood is only used for benches of type 1 and steel
is used in both. It requires 1 tonne of steel to produce a dozen benches of type 2, while, the
same amount of benches of type 1 requires 1 tonne of steel and 1 tonne of wood. In total,
the factory has 3 tonnes of steel and 2 tonnes of wood available in the next month. The
question facing the factory is, given the limited availability of materials, what quantity (in
dozens) of each product should the company produce in the next month, in order to achieve
the maximum total profit?

It should be noted that it is not always quite so straightforward to formulate a
mathematical model from the given information. In light of this, it is often useful to ask
the following questions:

1. What can the company decide to do?

2. What is the company’s objective?

3. What restricts the company’s objective?

In the previous example, the factory can decide how many benches of each type can
be produced (which corresponds to the decision variables), the factory’s objective was
to achieve the maximum total profit (corresponding to the objective function) and the
factory was restricted by limited availability of materials (corresponding to the resource
constraints). Further, note that the nonnegativity constraints were deduced as it would
not make sense to produce a negative number of benches in a month.
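As an illustration of the three questions above, here is a plain-Python sketch of the furniture example (the course itself uses AMPL; the formulation below, with x1 and x2 denoting the dozens of benches of each type produced, is one possible reading of the example, since the notes fill in the formulation during lectures). It finds the best corner point of the feasible region by solving each pair of constraint boundary lines at equality:

```python
from itertools import combinations

# One possible formulation: maximise 3000*x1 + 1000*x2 subject to
#   steel: x1 + x2 <= 3,  wood: x1 <= 2,  x1, x2 >= 0.
# Every constraint (including nonnegativity) as a*x1 + b*x2 <= rhs:
constraints = [
    (1, 1, 3),    # steel
    (1, 0, 2),    # wood
    (-1, 0, 0),   # -x1 <= 0, i.e. x1 >= 0
    (0, -1, 0),   # -x2 <= 0, i.e. x2 >= 0
]

def profit(x1, x2):
    return 3000 * x1 + 1000 * x2

def intersection(c1, c2):
    """Solve the 2x2 system given by two constraints held at equality."""
    (a, b, e), (c, d, f) = c1, c2
    det = a * d - b * c
    if det == 0:
        return None                       # parallel boundary lines
    return ((e * d - b * f) / det, (a * f - e * c) / det)

def feasible(pt, tol=1e-9):
    return all(a * pt[0] + b * pt[1] <= rhs + tol for a, b, rhs in constraints)

# Candidate extreme points: intersections of pairs of boundary lines.
vertices = [p for c1, c2 in combinations(constraints, 2)
            if (p := intersection(c1, c2)) and feasible(p)]
best = max(vertices, key=lambda p: profit(*p))
print(best, profit(*best))   # (2.0, 1.0) 7000.0
```

Under this reading, the factory should produce two dozen benches of type 1 and one dozen of type 2, for a monthly profit of £7000. (Searching only corner points is justified by a fact established later in this chapter.)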

Let us briefly look at an abstract production model. For this purpose, consider some
company that produces n products using m types of material. For the next time pe-
riod the unit prices for the n products are projected to be c1 , c2 , . . . , cn . The amount of
materials available to the company in the next month are given by b1 , b2 , . . . , bm . The
amount of material i ∈ {1, 2, …, m} consumed by a unit of product j ∈ {1, 2, …, n} is
given by a_ij ≥ 0, where some a_ij's can take value zero if j does not use material i. Given
the limited availability of materials, what quantity of each product should the company
produce in the next time period in order to achieve maximum total profit?
The above information can be formulated as an LP, namely

    maximise    c_1 x_1 + c_2 x_2 + … + c_n x_n
    subject to  a_i1 x_1 + a_i2 x_2 + … + a_in x_n ≤ b_i ,   i = 1, 2, …, m        (2.2)
                x_j ≥ 0 ,   j = 1, 2, …, n.

Note once more that the nonnegativity constraints were deduced since it would not make
sense to expect the company to produce a negative amount of a product. In addition, if
one product, say product j, is not profitable, i.e. if c_j < 0, then without the nonnegativity
constraints the model would produce a solution with x_j < 0, which would generate a
profit c_j x_j > 0. In fact, as a negative amount of a product j would not consume any
material but instead “generate” materials, one could in such case drive the profit towards
positive infinity by forcing x_j to tend to negative infinity.
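The unboundedness argument above can be checked numerically; in this sketch c_j is hypothetical data for a single unprofitable product:

```python
# Hypothetical data: a single unprofitable product with unit price c_j < 0.
c_j = -5.0

def profit(x_j):
    return c_j * x_j

# If the constraint x_j >= 0 is dropped, ever more negative production
# levels generate ever larger "profit":
profits = [profit(-t) for t in (1, 10, 100, 1000)]
print(profits)   # [5.0, 50.0, 500.0, 5000.0]
```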
Our objective in the LP (2.2) is to maximise the combined unit prices for the items
produced subject to bounds on the availability of the m types of material. In a similar
fashion, in matrix algebra terms, the abstract production model corresponds to the LP

    maximise    c^T x
    subject to  Ax ≤ b
                x ≥ 0.
2.4 Feasible and Optimal Solutions to Linear Programs

Consider the LP

    maximise    2x_1 + x_2
    subject to  5x_1 + 11x_2 ≤ 90
                x_2 ≤ 5                                                    (2.3)
                x_1, x_2 ≥ 0.

We can represent each constraint in (x_1, x_2) space. Note that the nonnegativity
constraints on the variables imply that only the positive quadrant including the axes
needs to be considered. This example is illustrated in Figure 2.2.

Figure 2.2: The feasible region for the LP (2.3) is represented by the area shaded grey.

Notice that it is not the case that all points in the (x_1, x_2) space satisfy all of the
constraints. The point (0, 6) for example does not satisfy the second constraint, while,
the point (0, 0) in contrast does satisfy each of the constraints. This observation inspires
the following definitions. A point is called a feasible point for an LP if it satisfies all
constraints. The set of all feasible points forms the feasible region of the problem. The
feasible region of the above LP (2.3) is depicted in grey in Figure 2.2.

Observe that the optimal solution to the LP (2.3), namely (18, 0), is a corner point of
the feasible region. This is not a coincidence, as it is true in general that the optimum is
achieved at some "corner point". For this purpose, we define what we mean by "cor-
ner points", which in the language of linear programming are called extreme points. An
extreme point is a feasible point that satisfies at equality n (in this case n = 2) indepen-
dent linear constraints. Note that in the above example (18, 0) and (0, 0) are examples of
two extreme points, while (1, 1) and (5, 0) are not extreme. One of the most fundamen-
tal and useful facts in the theory of linear programming is that if an LP has a feasible point
that satisfies at equality n independent linear constraints and the LP admits an optimal
solution, then there exists some optimal solution that is an extreme point of the feasible
region. This fact is useful as it tells us that, when solving an LP, it is enough to search
among the extreme points and pick one with the best objective function value.
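This extreme-point property suggests a naive but instructive way to solve small LPs: enumerate every intersection of two constraint boundaries, discard the infeasible ones (what remains contains all extreme points), and keep the best objective value. The following Python sketch does this for the LP (2.3); it is an illustration of the idea, not how practical solvers such as the one behind AMPL work.

```python
from itertools import combinations

# LP (2.3): maximise 2*x1 + x2 subject to
#   5*x1 + 11*x2 <= 90,  x2 <= 5,  -x1 <= 0,  -x2 <= 0
cons = [((5, 11), 90), ((0, 1), 5), ((-1, 0), 0), ((0, -1), 0)]
c = (2, 1)

best = None
for (a1, b1), (a2, b2) in combinations(cons, 2):
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if abs(det) < 1e-12:
        continue  # boundary lines are parallel: no intersection point
    x = ((b1 * a2[1] - b2 * a1[1]) / det,
         (a1[0] * b2 - a2[0] * b1) / det)
    # keep the point only if it satisfies every constraint; such a point
    # is an extreme point of the feasible region
    if all(a[0] * x[0] + a[1] * x[1] <= b + 1e-9 for a, b in cons):
        val = c[0] * x[0] + c[1] * x[1]
        if best is None or val > best[0]:
            best = (val, x)

print(best)
```

Running this reports a best objective value of 36, attained at the extreme point (18, 0).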
The above Figure 2.2 further shows that there are typically many feasible solutions
to our LP. It should be noted that if there are multiple feasible solutions, then
the feasible region is neither empty nor a single point. A natural question to ask here is:
which of the feasible solutions is "best"? This question inspires the following
definition. Given a maximisation (or minimisation) linear programming problem, an
optimal solution for the LP is a feasible point at which the largest (or smallest, in the case
of minimisation) value of the objective function is attained among all feasible points.
The LP represented in Figure 2.2 has one optimal solution; however, it should be em-
phasised that there may be cases where multiple optimal solutions exist. Suppose
for example that, instead of maximising 2x1 + x2 in the LP (2.3), we were tasked
with minimising the value of x2, namely that we have the LP

minimise x2
subject to 5x1 + 11x2 ≤ 90
x2 ≤ 5
x1, x2 ≥ 0.

Note that the optimal value must be between 0 and 5 because the constraints of the
problem yield that 0 ≤ x2 ≤ 5. Further, observe that the points (0, 0) and (1, 0) are both
feasible solutions that attain objective value 0 (since both points have second coordi-
nate equal to zero). It follows in consequence that this problem has multiple optimal
solutions. In particular, all points of the form (a, 0) are optimal, where 0 ≤ a ≤ 18.
It can happen that a linear programming problem has no feasible solution. In such a
case, we say that the LP is infeasible. Note that, equivalently, we could state that an LP
is infeasible if its feasible region is empty. Consider the LP

maximise 2x1 + x2
subject to x1 + x2 ≤ 4    (2.4)
x2 ≥ 5
x1, x2 ≥ 0.

It can be seen from a diagram that this LP (2.4) is indeed infeasible. Alternatively, one
can see algebraically that this problem is infeasible: if x2 ≥ 5 and both variables x1, x2
are nonnegative, then their sum x1 + x2 must have value at least 5, meaning that the
first constraint is violated.
It can happen that a linear programming problem has feasible solutions yet there
does not exist an optimal solution. Note that this means the LP may have a nonempty
feasible region but not attain some maximum or minimum value for the objective func-
tion. Consider the LP
maximise x1 + x2
subject to x1 + x2 ≥ 1
-x1 + x2 ≤ 2    (2.5)
x1 - 2x2 ≤ 2
x1, x2 ≥ 0.

The LP (2.5) is illustrated in Figure 2.3. From the diagram, it is evident that there exist
feasible solutions for the LP with arbitrarily large value in the objective function. This
observation inspires the following definition. A maximisation (minimisation) linear pro-
gramming problem is unbounded if there exist feasible solutions with arbitrarily large
positive (negative) value in the objective function. Informally, an LP is unbounded if it
is feasible but its objective function can be made arbitrarily “good” or “bad”.
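To make the unboundedness concrete, one can exhibit a feasible ray along which the objective grows without bound. The sketch below takes the constraints of the LP (2.5) to be x1 + x2 ≥ 1, -x1 + x2 ≤ 2 and x1 - 2x2 ≤ 2 together with nonnegativity, and checks points of the form (t, t):

```python
def feasible(x1, x2):
    # constraints of the LP (2.5): x1 + x2 >= 1, -x1 + x2 <= 2, x1 - 2*x2 <= 2
    return (x1 + x2 >= 1 and -x1 + x2 <= 2 and x1 - 2 * x2 <= 2
            and x1 >= 0 and x2 >= 0)

# points (t, t) stay feasible for every t >= 0.5, while the objective
# x1 + x2 = 2t grows without bound along this ray
for t in [1, 10, 100, 1000]:
    assert feasible(t, t)
    print(t, t, "objective:", 2 * t)
```

Since the feasible region contains the whole ray {(t, t) : t ≥ 0.5}, no feasible point can maximise x1 + x2.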
Observe that an LP is unbounded only if its feasible set is an unbounded set. How-
ever, an unbounded feasibility set does not necessarily imply that the LP is itself un-
bounded. If for example in the LP (2.5) we were asked to instead minimise x 1 + x 2 then
we clearly have at least one optimal solution.
It should be noted that while the unboundedness of an LP is a mathematical possi-
bility, in real-world scenarios a message from the solver indicating that the problem is
unbounded usually means that some error has occurred during the mathematical for-
mulation of the problem, typically the omission of at least one constraint.
In summary, for any linear programming problem, exactly one of the following sce-
narios must occur:

1. the LP is infeasible;

2. the LP is unbounded;

3. the LP admits an optimal solution.

Figure 2.3: An LP that is unbounded.

It should be emphasised that the fact we mention LPs in the above statement is crucial.
In particular, the statement may be false when given an optimisation problem that is not
linear. For example, consider the following nonlinear optimisation problem

minimise x2
subject to x1 · x2 ≥ 1
x1, x2 ≥ 0.

In this scenario, notice that the problem is feasible, as (1, 1) for example satisfies the
constraints, and the problem is not unbounded, since it is a minimisation problem whose
objective is bounded below by 0. Despite this, there is no optimal solution: there are
feasible solutions for which x2 takes values arbitrarily close to 0, however, no feasible
solution with x2 = 0 exists in light of the first constraint.
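This failure of attainment can be checked directly: the points (t, 1/t) are feasible for every t > 0 and drive x2 toward 0, yet no feasible point achieves x2 = 0. A short sketch using exact rational arithmetic:

```python
from fractions import Fraction

# feasible points (t, 1/t) satisfy x1 * x2 = 1 exactly, so the infimum of
# the objective x2 is 0, but it is never attained by a feasible point
for t in [1, 10, 100, 10_000]:
    x1, x2 = Fraction(t), Fraction(1, t)
    assert x1 * x2 >= 1 and x1 >= 0 and x2 > 0
    print("x2 =", float(x2))
```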

2.5 Assumptions of a Linear Programming Problem

It is natural to ask here: when is it appropriate to formulate an LP for modelling a real-
world problem? It turns out that there are a number of assumptions that you are making.
If these hold true, then formulating an LP will be appropriate. If the assumptions do not
apply to your situation, then formulating an LP may still yield a good approximation, or
it could be of no use. The assumptions are:

1. Proportionality: the contribution of each decision variable to the objective function
and to each constraint is directly proportional to the value of that variable.

2. Additivity: the contributions of the decision variables to the objective function and
to each constraint are independent of one another and simply add up.

3. Divisibility: the decision variables are allowed to take fractional (noninteger) values.

4. Certainty: the data of the problem (the objective coefficients, constraint coefficients
and right-hand sides) are known with certainty.

It should be emphasised that the first assumption would not normally hold because of
economies or diseconomies of scale, as the per unit profit is not independent of the
number of items sold. The second assumption may not hold as two products may can-
nibalise one another. The third assumption makes a significant difference in scenarios
where the variables must take integer values and simply rounding would not be suf-
ficient. The fourth assumption may be overcome provided the level of uncertainty is
“small” by making use of tools from sensitivity analysis, which is a topic we study later,
however, if the level of uncertainty is “large”, then one may have to tackle such problems
using tools from stochastic programming or robust optimisation.

2.6 Standard (In)equality Forms

When dealing with linear programming problems, it is often convenient to assume that
they are of some specific form. We will often consider problems in one of the following
forms.
An LP is in standard form if it is of the form

maximise c^T x
subject to Ax ≤ b
x ≥ 0,

where A ∈ R^(m×n), c ∈ R^n and b ∈ R^m and x is a vector of variables in R^n.


20 Chapter 2. An Introduction to Optimisation and Modelling in Operational Research

An LP is in standard equality form if it is of the form

maximise c^T x
subject to Ax = b
x ≥ 0.
It should be noted that it is possible to show that any general LP "can be brought" into
either of the above forms. This means that it is always possible to take any LP and write
an "equivalent" LP in either standard form or standard equality form as desired. An
important note is that if an LP is a minimisation problem, then it can be turned into a
maximisation problem by replacing min c^T x with max -c^T x subject to the same con-
straints (negating the resulting optimal value recovers the minimum). For this reason,
from a mathematical standpoint it does not make a difference in linear programming
whether we are maximising or minimising.
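As a small worked illustration of these transformations (using a made-up two-variable problem, not one from the text), a minimisation problem with a ≥-constraint can be brought into standard equality form by negating the objective and introducing a nonnegative surplus variable s:

```latex
\min\; 3x_1 + 2x_2 \;\;\text{s.t.}\;\; x_1 + x_2 \ge 4,\; x_1, x_2 \ge 0
\quad\Longleftrightarrow\quad
\max\; -3x_1 - 2x_2 \;\;\text{s.t.}\;\; x_1 + x_2 - s = 4,\; x_1, x_2, s \ge 0.
```

The two problems have the same optimal solutions in (x_1, x_2), and their optimal values differ only in sign.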

2.7 Further Linear Programming Examples


The following example demonstrates how we can formulate a mathematical optimisa-
tion problem using linear programming.

Example. (Phone manufacturing) A phone manufacturer has just announced two new
phones, a budget model and a high-end model, to be released in a particular region where
upper bounds on the demand for these models are known. Suppose for simplicity that the
making of a phone can be simplified down to two resources, namely the minutes of machine
time and the material required. The resources are limited since we have only 2500 minutes
of machine time per month and 4800 units of material. The required resources per phone,
the demand per month and the sales price for each model are given in the table below.
Model this scenario as an LP and solve.

Machine Minutes Material Demand Sales Price


Budget Model 5 5 400 200
High-end Model 7 17 270 400
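One way to read the table is that each phone consumes machine minutes and material as listed, the demand figures cap monthly sales of each model, and the sales price is the per-unit objective coefficient. Under that (assumed) reading, with illustrative variable names xb and xh for the number of budget and high-end phones built per month, the model is: maximise 200 xb + 400 xh subject to 5 xb + 7 xh ≤ 2500, 5 xb + 17 xh ≤ 4800, xb ≤ 400, xh ≤ 270 and xb, xh ≥ 0. As a quick sanity check outside AMPL, the sketch below enumerates intersections of constraint boundaries and keeps the best feasible one:

```python
from itertools import combinations

# assumed model: xb, xh = budget / high-end phones built per month
#   5*xb +  7*xh <= 2500   (machine minutes)
#   5*xb + 17*xh <= 4800   (material)
#   xb <= 400, xh <= 270   (demand), xb, xh >= 0
cons = [((5, 7), 2500), ((5, 17), 4800), ((1, 0), 400),
        ((0, 1), 270), ((-1, 0), 0), ((0, -1), 0)]
c = (200, 400)  # sales prices as objective coefficients

best = None
for (a1, b1), (a2, b2) in combinations(cons, 2):
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if abs(det) < 1e-12:
        continue  # parallel boundary lines
    x = ((b1 * a2[1] - b2 * a1[1]) / det,
         (a1[0] * b2 - a2[0] * b1) / det)
    if all(a[0] * x[0] + a[1] * x[1] <= b + 1e-9 for a, b in cons):
        val = c[0] * x[0] + c[1] * x[1]
        if best is None or val > best[0]:
            best = (val, x)

print(best)  # best revenue 127600 at (xb, xh) = (178, 230)
```

Under these assumptions the best revenue is 127,600, producing 178 budget and 230 high-end phones per month (both machine-time and material constraints are tight there).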

The following example further demonstrates how we can formulate a mathematical
optimisation problem using linear programming.

Example. (Fast food restaurant) A popular fast food restaurant in Covent Garden makes
burgers from some combination of high quality meat and some cheaper meat with a higher
fat content. The company keeps precise details a secret, however, the restaurant guarantees
that its burgers have a fat content of no more than 25%. The high quality meat costs 80p per
kilogram and comprises 80% lean meat and 20% fat. The cheaper meat costs 60p per
kilogram and comprises 68% lean meat and 32% fat. How much of each kind of meat
should the restaurant use in each kilogram of burger meat if it wants to minimise its cost
and ensure the fat content is no more than 25%?
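A hedged sketch of one possible model: let h and c be the kilograms of high-quality and cheaper meat in one kilogram of burger meat, so h + c = 1, the fat constraint is 0.20h + 0.32c ≤ 0.25, and the cost 80h + 60c (in pence) is to be minimised. Substituting h = 1 - c reduces everything to one variable, which exact rational arithmetic handles neatly:

```python
from fractions import Fraction

# substituting h = 1 - c, the fat constraint 0.20*h + 0.32*c <= 0.25
# becomes 0.20 + 0.12*c <= 0.25, i.e. c <= 5/12; the cost 80 - 20*c is
# decreasing in c, so take c as large as the fat constraint allows
c_max = (Fraction(25, 100) - Fraction(20, 100)) / (Fraction(32, 100) - Fraction(20, 100))
h = 1 - c_max
cost = 80 * h + 60 * c_max
print(c_max, h, cost)  # prints: 5/12 7/12 215/3
```

This suggests using 7/12 kg of high-quality meat and 5/12 kg of cheaper meat per kilogram of mix, at a cost of 215/3 ≈ 71.7p.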

It should be noted that the examples we have seen to date have not required a large
amount of data. This will not be the case for all examples and, for that reason, in some
scenarios it may be useful to change the right-hand sides, i.e. the b_i's appearing in the
LP (2.1) with i ∈ {1, 2, . . . , m}, to parameters. In particular, for each constraint, it may be
useful to replace the right-hand side with some parameter, save all the data (parameter
values) in a data file and call upon this data before using the solver via AMPL.

2.8 Exercises for Self-Study

1. For each of the following constraints, draw a separate graph to show the nonneg-
ative solutions that satisfy the constraint.

a) x1 + 3x2 ≤ 6.

b) 4x1 + 3x2 ≤ 12.

c) 4x1 + x2 ≤ 8.

Combine these constraints into a single graph to show the feasible region, where
these constraints are our resource constraints plus nonnegativity.

2. Consider the maximisation LP corresponding to the above resource constraints


plus nonnegativity where the (linear) objective function is unknown, namely

maximise c1 x1 + c2 x2
subject to x1 + 3x2 ≤ 6
4x1 + 3x2 ≤ 12
4x1 + x2 ≤ 8
x1, x2 ≥ 0,

for some c1, c2 ∈ R. Find values of c1 and c2 such that the problem:

a) has a unique optimal solution,

b) has multiple optimal solutions,

c) is infeasible, and

d) is unbounded.

If for some case there does not exist such c1 and c2 , then you should justify why
this is the case.

3. Consider the maximisation LP

maximise 2x1 + x2
subject to x2 ≤ 10
2x1 + 5x2 ≤ 60
x1 + x2 ≤ 18
3x1 + x2 ≤ 44
x1, x2 ≥ 0.

a) Solve this LP graphically.

b) Solve this LP using the solver via AMPL.

4. Recall the maximisation LP given in the previous exercise. Which (if any) of the
constraints can be removed without changing the optimal solution? Verify your
answer using the solver via AMPL.

5. This is your lucky day. You have just won a £10,000 prize. You are setting aside
£4,000 for taxes and general partying expenses, however, you have decided to in-
vest the other £6,000. Upon hearing the news, two different friends have offered
you an opportunity to become a partner in two different entrepreneurial ventures,
one planned by each friend. In both cases, this investment would involve expend-
ing some of your time next summer as well as putting up cash. Becoming a full
partner in the first friend's venture would require an investment of £5,000 and
400 hours, where your estimated profit (ignoring the value of your time) would
be £4,500. The corresponding figures for the second friend's venture are £4,000
and 500 hours, with an estimated profit to you of £4,500. However, both friends
are flexible and would allow you to come in at any fraction of a full partnership
you would like. If you choose a fraction of a full partnership, all the above figures
given for a full partnership (money investment, time investment and your profit)
would be multiplied by this same fraction.

Because you were looking for an interesting summer job anyway (for a maximum
of 600 hours), you have decided to participate in one or both friends’ ventures
in whichever combination would maximise your total estimated profit. You now
need to solve the problem of finding the best combination. Model this scenario as
an LP and solve using AMPL.

6. The Household Web sells many household products online. The company needs
substantial warehouse space for storing its goods. Plans now are being made for
leasing warehouse storage space over the next 5 months. Just how much space
will be required in each of these months is known. However, since these space
requirements are quite different, it may be most economical to lease only the
amount needed each month on a month-by-month basis. On the other hand, the
additional cost for leasing space for additional months is much less than for the
first month, so it may be less expensive to lease the maximum amount needed for
the entire 5 months. Another option is the intermediate approach of changing the
total amount of space leased (by adding a new lease and/or having an old lease
expire) at least once but not every month.

Month   Required Space (sq. ft)
1       30,000
2       20,000
3       40,000
4       10,000
5       50,000

Leasing Period (months)   Cost per sq. ft Leased
1                         £65
2                         £100
3                         £135
4                         £160
5                         £190

The space requirements and the leasing costs for the various leasing periods are
given in the tables above. The objective is to minimise the total leasing
cost for meeting the space requirements. Formulate a linear programming model
for this problem and solve.

7. The London School of Economics maintains a powerful mainframe computer for


research use by its faculty, PhD students, and research associates. During all work-
ing hours, an operator must be available to operate and maintain the computer,
as well as to perform some programming services. Beryl Ingram, the director of
the computer facility, oversees the operation.

It is now the beginning of the Lent Term and Beryl is confronted with the problem
of assigning different working hours to her operators. Because all the operators
are currently enrolled in the university, they are available to work only a limited
number of hours each day, as shown in the following table.

Maximum Hours of Availability


Operators Wage Rate (per hour) Mon. Tue. Wed. Thu. Fri.
K. C. £25 6 0 6 0 6
D. H. £26 0 6 0 6 0
H. B. £24 4 8 4 0 4
S. C. £23 5 5 5 0 5
K. S. £28 3 0 3 8 0
N. K. £30 0 0 0 6 2

There are six available operators, namely four undergraduate students and two
graduate students. They all have different wage rates because of differences in
their experience with computers and programming. The above table outlines their
wage rates, along with the maximum number of hours that each operator can work
each day.

Each operator is guaranteed a certain minimum number of hours per week that
will maintain an adequate knowledge of the operation. This level is set arbitrarily
at 8 hours per week for the undergraduate students (K. C., D. H., H. B., and S. C.)
and 7 hours per week for the graduate students (K. S. and N. K.).

The computer facility is to be open for operation from 8am to 10pm Monday
through Friday with exactly one operator on duty during these hours. On Satur-
days and Sundays, the computer is to be operated by other staff.

Because of a tight budget, Beryl has to minimise costs. She wishes to determine
the number of hours she should assign to each operator on each day. Formulate a

linear programming model for this problem and solve.

8. The shaded area in the following graph represents the feasible region of a linear
programming problem whose objective function is to be maximised.

Label each of the following statements as True or False, justifying your answer
based on a graphical method. In each scenario, give an example of an objective
function that illustrates your answer.

a) If (3, 3) produces a larger value of the objective function than (0, 2) and
(6, 3), then (3, 3) must be an optimal solution.
b) If (3, 3) is an optimal solution and multiple optimal solutions exist, then
either (0, 2) or (6, 3) must also be an optimal solution.
c) The point (0, 0) cannot be an optimal solution.

9. The Metalco Company desires to blend a new alloy of 40 percent tin, 35 percent
zinc and 25 percent lead from several available alloys. The properties of the avail-
able alloys are outlined in the following table.

Alloy
Property 1 2 3 4 5
Percentage of Tin 60 25 45 20 50
Percentage of Zinc 10 15 45 50 40
Percentage of Lead 30 60 10 30 10
Cost (£/kg) 77 70 88 84 94

The objective is to determine the proportions of these alloys that should be blended
to produce the new alloy at a minimum cost. Formulate a linear programming
model for this problem and solve.
Chapter 3

Integer and Mixed Integer Programming Applications

Let us begin this chapter with a simple motivating example. This is of an optimisation
problem that is known as the “healthy” diet problem. Suppose that Caleb is a mathe-
matics student who does not particularly enjoy going to the gym. Despite this, Caleb
still wants to live a somewhat healthy lifestyle and decides to compose a diet that meets
the daily reference intake of vitamins with the minimal amount of calories. Unfortu-
nately for Caleb, they can only eat pizzas and burritos because of where they live. For
this purpose, the (fictional) nutritional values of a slice of pizza and a burrito are shown
below, in Table 3.1.

A C D Calories
Pizza 225 120 200 600
Burrito 600 100 75 300
Intake 1800 550 600
Table 3.1: The (fictional) nutritional values of a slice of pizza and a burrito, as well as
the required daily intake of vitamins A, C and D.

Note that in this case a solution to this optimisation problem yields a diet of some
combination of slices of pizza and burritos. The variables of the diet are the number
of slices of pizza, x_p ≥ 0, and the number of burritos, x_b ≥ 0. Since Caleb has a
preference for lower-calorie meals, the objective function will be the minimisation of
calories consumed, namely

minimise 600 x_p + 300 x_b.
Observe that if one simply requires x_p, x_b ≥ 0, the optimal solution for the above ob-
jective function would be to simply eat nothing at all. However, to achieve the daily
reference intake, we have to respect the constraints on the amount of vitamins, namely

225 x_p + 600 x_b ≥ 1800    (vitamin A)
120 x_p + 100 x_b ≥ 550    (vitamin C)
200 x_p + 75 x_b ≥ 600    (vitamin D).
These constraints will be our resource constraints. Following the techniques outlined
in the previous chapter, the optimisation problem can be modelled by formulating the LP

minimise 600 x_p + 300 x_b
subject to 225 x_p + 600 x_b ≥ 1800
120 x_p + 100 x_b ≥ 550
200 x_p + 75 x_b ≥ 600
x_p, x_b ≥ 0.
Using AMPL, we find that this LP has optimal solution (x_p, x_b)^T = (75/44, 38/11)^T with
optimal value 22650/11 ≈ 2059.09. This tells us that Caleb's optimal diet consists of
eating approximately 1.70455 slices of pizza and 3.45455 burritos each day, for a total
of around 2059.09 calories.
Despite the solution (x_p, x_b)^T = (75/44, 38/11)^T being optimal, it is not particularly
practical, since expecting Caleb to eat 1.70455 slices of pizza and 3.45455 burritos each
day is rather difficult. In light of this, it would be useful if we could find a solution
with integer entries since, in that case, Caleb would have a food plan that requires
consuming a specific whole number of slices of pizza and whole burritos. It
should be emphasised that in such a case we are interested in solving the above LP with
the additional requirement that the variables take integer values.
Perhaps at this point one may naturally think that simply rounding the above optimal
solution, with each entry rounded to the nearest integer, would be sufficient. If we
do this, we yield the rounded solution (x_p, x_b)^T = (2, 3)^T, which corresponds to eating
two slices of pizza and three burritos. Upon substituting this rounded solution into the
above resource constraints, we notice that the second constraint becomes

120 · 2 + 100 · 3 = 540 < 550,

which tells us that this rounded solution is not feasible, i.e. that eating this combination
does not meet the daily intake requirement for vitamin C. Further, it is not particularly
clear how we should proceed without either relying on a graphical method (which would
not be of use with more variables) or solving this optimisation problem via some
brute-force approach.
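Since any sensible diet here involves at most a handful of items, a brute-force scan over small integer values of (x_p, x_b) is in fact a workable way to answer the integer question; the constraint data below are taken from Table 3.1:

```python
# brute-force search over small integer diets:
# xp = slices of pizza, xb = burritos, requirements from Table 3.1
def feasible(xp, xb):
    return (225 * xp + 600 * xb >= 1800      # vitamin A
            and 120 * xp + 100 * xb >= 550   # vitamin C
            and 200 * xp + 75 * xb >= 600)   # vitamin D

best = min(((600 * xp + 300 * xb, xp, xb)
            for xp in range(20) for xb in range(20)
            if feasible(xp, xb)))
print(best)  # → (2400, 0, 8)
```

The scan reports a minimum of 2400 calories, attained for example by eight burritos and no pizza (there are ties, such as two slices of pizza and four burritos), noticeably above the fractional optimum of roughly 2059 calories.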

3.1 Integer and Mixed Integer Programming


In problems such as the "healthy" diet problem, it turns out that we often need to deal
with integral and inseparable quantities. Further, the motivating example illustrates
that in general it is not enough to simply round a fractional solution to an integer
solution (by rounding each of the entries to the nearest integer), as such a point need not
be feasible. Discrete optimisation is the mathematical field that deals with optimisation
problems where the variables must be integer.

An integer linear programming problem is a linear programming problem in which
some variables are required to take integer values. We can therefore write any integer
linear programming problem as the integer program (IP)

maximise c^T x
subject to Ax ≤ b
x_i ∈ Z, i ∈ I,
where c = (c1, c2, . . . , cn)^T ∈ R^n is an n-dimensional column vector of the real coef-
ficients of the objective function, x = (x1, x2, . . . , xn)^T is a column vector of the deci-
sion variables, A ∈ R^(m×n) is a real matrix with m rows and n columns, b ∈ R^m is
an m-dimensional column vector of the real right-hand sides of the constraints, 0 =
(0, 0, . . . , 0)^T denotes the n-dimensional zero vector and I ⊆ {1, 2, . . . , n} is an index set
of the integer variables. The other variables, namely the variables x_j with j ∉ I, are
referred to as the continuous variables. Further, with nonnegativity, we yield the IP

maximise c^T x
subject to Ax ≤ b    (3.1)
x ≥ 0
x_i ∈ Z, i ∈ I.

Note that if the problem has both integer and continuous variables, namely if I ≠ ∅
and I ≠ {1, 2, . . . , n}, then the problem is called a mixed-integer linear programming
problem. In that case, we call the above integer program a mixed-integer program
(MIP). If instead all variables are integer, i.e. if I = {1, 2, . . . , n}, then the problem is
called a (pure) integer linear programming problem. If all variables are binary, namely
if x_i ∈ {0, 1} for all i ∈ {1, 2, . . . , n}, then the problem is called a binary linear program-
ming problem. In that scenario, we call the above integer program a binary integer
program (BIP).
The set

{x ∈ R^n : Ax ≤ b, x ≥ 0, x_i ∈ Z for i ∈ I}

is the feasible region of the IP (3.1). Figure 3.1 illustrates the feasible regions for some
mixed-integer and pure integer linear programming problems.

Figure 3.1: The feasible regions associated with some mixed-integer and pure integer
linear programming problems.

There is a natural linear programming problem associated with (3.1), namely

maximise c^T x
subject to Ax ≤ b    (3.2)
x ≥ 0.

The LP (3.2) is called the linear relaxation of the IP (3.1). Let IP(c, A, b) and LP(c, A, b)
denote the optimal values, i.e. the maximum objective function values, of the IP (3.1)
and the LP (3.2), respectively. An easy but surprisingly useful fact is that

IP(c, A, b) ≤ LP(c, A, b).    (3.3)

Indeed, if x* is an optimal solution to (3.2) and z* is an optimal solution to (3.1), then
z* is a feasible solution to the LP (3.2) and hence

IP(c, A, b) = c^T z* ≤ c^T x* = LP(c, A, b)

holds. Informally, because more restrictions are imposed on the variables in the IP (3.1)
compared to the LP (3.2), we would not expect the objective value of (3.1) to be greater
than the objective value of the relaxed problem (3.2).

Remark. Observe that we defined integer linear programming problems as "maximi-
sation" problems. Integer linear programming problems can be defined equivalently as
minimisation problems. If we instead consider a minimisation IP, then the above relation
(3.3) is reversed, namely IP(c, A, b) ≥ LP(c, A, b).

Furthermore, in light of (3.3), observe that if we find a feasible integer solution x̄
for the IP (3.1) such that

c^T x̄ = LP(c, A, b),

then we can conclude that x̄ is indeed an optimal solution for the IP (3.1) and that the
equality LP(c, A, b) = IP(c, A, b) holds.

If LP(c, A, b) = IP(c, A, b) holds, then there exists an optimal solution to the IP which
is also optimal for the corresponding LP relaxation. Despite this, in general IP(c, A, b)
may be different from LP(c, A, b) and, in practice, "it almost always is". Note that similar
deductions can be made in the setting of MIPs rather than IPs.
A natural question to ask, since these two values do not in general coincide, is: how
"challenging" is it to solve each problem? It turns out that an LP can be solved efficiently
in both theory and practice. Being a little more precise, an LP in practice can be solved
efficiently in “most cases” by making use of the celebrated simplex method. The first
polynomial time algorithm was the ellipsoid method, demonstrated by Khachiyan [17]
in 1979, however, the later polynomial time interior-point method of Karmarkar [16]
was arguably of greater theoretical and practical importance. Note that problems that
can be solved in polynomial time are thought of as “easy” or “tractable” since the running
time of the algorithm is upper bounded by a polynomial expression in the size of the
input for the algorithm.
Despite the seeming similarity between (M)IPs and LPs, it turns out that such
problems cannot in theory be solved efficiently, although some instances of the problem
may be solvable depending on the formulation. Formally, integer and mixed-integer
programming is NP-hard, a mathematical concept of hardness in computational com-
plexity theory, which informally means that one should not expect to solve an arbitrary
instance of the problem in polynomial time unless P = NP (see e.g. [13]). Despite
this, in practice solvers make use of two main approaches, namely branch and bound
and cutting planes, to solve (M)IPs. It should be emphasised that, unlike LP solvers,
(M)IP solvers cannot guarantee fast solution times in all instances, and the running time
is heavily dependent on the underlying formulation. The branch and bound algorithm
will be discussed in the next section.
It is important to keep in mind the trade-off between the vastly increased expressive
power of IP models and the increased difficulty of solving IPs compared to LPs. Prac-
tically speaking, state-of-the-art solvers can solve LPs with hundreds of thousands of
variables. For IPs, the time required to compute a solution is heavily dependent on the
specific instance. For example, there are IP instances that can be solved within
minutes, however, there exist "tiny" instances with a few hundred variables that are out
of reach of any state-of-the-art solver.


Before we proceed with IP modelling, we provide the following simple example,
meant to illustrate how the same problem admits infinitely many formulations, how
some formulations are better than others, and how simple ideas such as solving the LP
relaxation and rounding the solution to the nearest integer do not work in general.

Example. (Unsuccessful rounding) Consider the following IP

maximise x2
subject to 2k x1 - x2 ≥ 2k
2k x1 + x2 ≤ 4k
x1, x2 ≥ 0
x1, x2 ∈ Z,

where k is any positive number.


Note that the only two feasible points are (1, 0) and (2, 0), as the above constraints
imply that 1 ≤ x1 ≤ 2. Further, if x1 = 1 or x1 = 2, then we must have x2 = 0. This implies
that the optimal value of the above problem is IP(c, A, b) = 0. In contrast, the feasible
region of the LP relaxation is a triangle with vertices (1, 0), (2, 0) and (1.5, k), see Figure 3.2.

Figure 3.2: An integer programming problem with large (additive) integrality gap. We
set k = 3 in this figure.

Note that each choice of k provides an integer programming formulation which yields
the same feasible region, namely {(1, 0), (2, 0)}. The (additive) integrality gap

LP(c, A, b) − IP(c, A, b) = k − 0 = k

between the IP optimum IP(c, A, b) = 0 and the LP relaxation optimum LP(c, A, b) = k gets arbi-
trarily large for larger choices of k. Further, notice that rounding the optimal solution of
the LP relaxation, (1.5, k), to a "nearby" integer point, say (1, k) or (2, k) if we set k to be
an integer, does not yield a feasible solution; in particular, the rounded solution can be
made arbitrarily far from the feasible solutions of the above IP.
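A quick numerical check of this example, taking the constraints to be 2k·x1 - x2 ≥ 2k and 2k·x1 + x2 ≤ 4k (which match the feasible points and the LP vertex described above):

```python
k = 100  # any positive integer; the integrality gap then equals k

def feasible(x1, x2):
    return (2 * k * x1 - x2 >= 2 * k and 2 * k * x1 + x2 <= 4 * k
            and x1 >= 0 and x2 >= 0)

# the LP relaxation optimum sits at (1.5, k); both nearby integer points fail
assert feasible(1.5, k)
assert not feasible(1, k) and not feasible(2, k)

# the only feasible integer points are (1, 0) and (2, 0), so the IP optimum is 0
pts = [(x1, x2) for x1 in range(5) for x2 in range(5 * k) if feasible(x1, x2)]
print(pts)  # → [(1, 0), (2, 0)]
```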

3.2 The Branch and Bound Algorithm


A lot of research has been and is being devoted to general-purpose methods for solving IPs.
The most successful approach to date is the Branch and Bound algorithm, developed
in 1960 by Ailsa Land and Alison Doig at the LSE Operational Research Department
(the predecessor of the Operations Research Group). All the commercial mathematical
programming systems offer an integer programming solution procedure based on some
variation of this celebrated method.
Branch and Bound can be thought of as a systematic exploration of the feasible
region and an elimination of those parts which can be shown either not to contain an
integer solution or not to contain the optimal solution. The Branch and Bound algorithm
is a divide-and-conquer approach: The feasible region is partitioned into a collection of
smaller regions. Each region is itself represented by linear constraints and the objective
function can be optimised over these smaller regions. The parts of the feasible region in
the partition can themselves be further partitioned into even smaller regions. For this
reason, the algorithm is best represented by a tree graph, as we see shortly. In practice,
the partitioning of the feasible region is performed sequentially.
In order to illustrate the branch and bound algorithm, consider the IP

maximise 2x1 + 5x2
subject to 12x1 + 5x2 ≤ 60
2x1 + 10x2 ≤ 35
x1, x2 ≥ 0
x1, x2 ∈ Z.

The feasible region is illustrated in Figure 3.3, where the dots are the feasible integer
points. The linear programming optimum is at the point x1 = 3.864, x2 = 2.727, where the
objective function 2x1 + 5x2 takes value 21.363. Graphically, we can see that the possible
candidate optimum points are (2, 3) with objective value 19, and (4, 2) with objective
value 18. Hence, the optimal solution is x1 = 2 and x2 = 3.

Figure 3.3: The feasible region associated with the above IP.

The branch and bound algorithm starts by solving the LP relaxation. This corre-
sponds to the root node of the branch and bound tree. If the solution satisfies the inte-
grality constraints, we stop, we have the optimal integer solution. Otherwise, we branch
by selecting a variable whose value is noninteger and create two new subproblems. The
constraints of the new subproblems are chosen so that the current noninteger solution
is infeasible in both subproblems.
In the example, the solution at the root node is x1 = 3.864 and x2 = 2.727 with
objective value 21.363. In light of the relation (3.3), it follows that this objective value
is an upper bound on the value of the optimal integer solution x*, i.e. that

cᵀx* ≤ 21.363,

where c = (2, 5)ᵀ. Further, since the objective coefficients of the IP are integer, we know
that the optimal value is integer and thus we can round down 21.363 to 21 to yield the
stronger upper bound

cᵀx* ≤ 21.

This is called the bounding step.


We choose to branch on x1 as x1 is fractional (i.e. noninteger). Note that in this
scenario we could have equivalently selected to branch on x2. This branching is done by
creating two subproblems, where the first problem features the extra constraint x1 ≤ 3
and the other features x1 ≥ 4, as illustrated in Figure 3.4(a).

(a) The root node of the branch and bound tree (x1 = 3.864, x2 = 2.727, obj = 21.363)
with its two branches x1 ≤ 3 and x1 ≥ 4. (b) The geometric interpretation of the first
branching step.

Figure 3.4: The branch and bound tree and geometric interpretation of the first steps.

The two subproblems are illustrated in Figure 3.4(b). This step in the algorithm is
called the branching step.

1. We choose to solve the left-hand side problem first, namely the relaxed LP with the
added constraint x1 ≤ 3. This gives a solution x1 = 3 and x2 = 2.9 with objective
value 20.5. Notice that because an additional constraint, namely x1 ≤ 3, has been
added to the original IP, the value of the objective function either stays the same
or decreases. This gives a tighter upper bound of 20 for the value of the optimal
integer solution of the current subproblem, after rounding down from 20.5. It
should be emphasised that the upper bound of 20 does not apply to the original
problem.

As x2 is fractional, two further subproblems with the constraints x2 ≤ 2 and x2 ≥ 3
are created. See Figure 3.5; there are now three unsolved problems.

a) If we now solve the leftmost problem, namely x1 ≤ 3 and x2 ≤ 2, this gives
the integer solution x1 = 3 and x2 = 2 with objective value 16. We call this
solution an incumbent solution, which means that it is the best known integer
solution so far. This integer solution provides a lower bound on the original
problem, namely

16 ≤ cᵀx* ≤ 21.

In other words, this tells us that the optimal IP value is an integer value be-
tween 16 and 21. Note that whereas the LP relaxation gives an upper bound,

(a) The root node (obj = 21.363) with branches x1 ≤ 3 and x1 ≥ 4; the subproblem
x1 ≤ 3 has solution x1 = 3, x2 = 2.9, obj = 20.5 and is branched further on x2 ≤ 2
and x2 ≥ 3. (b) The geometric interpretation of the second branching step.

Figure 3.5: The branch and bound tree and geometric interpretation of the next steps.

finding feasible integer solutions yields lower bounds. For this subproblem
we have found the best integer solution, so we do not branch any further.
This branch or subproblem, namely x1 ≤ 3 and x2 ≤ 2, is said to be fathomed
or conquered by integrality.
b) Now, the algorithm backtracks and selects one of the unsolved subproblems.
Choosing the most recently formed unsolved subproblem, namely x1 ≤ 3 and
x2 ≥ 3, and solving it gives the solution x1 = 2.5 and x2 = 3 with objective
value 20. Since x1 is fractional, two subproblems are created, namely x1 ≤ 2
and x1 ≥ 3. See Figure 3.6.
i. Moving to the subproblem with constraints x1 ≤ 3, x2 ≥ 3 and x1 ≤ 2
is equivalent to solving the subproblem with constraints x1 ≤ 2 and
x2 ≥ 3. This produces the solution x1 = 2, x2 = 3.1 with objective
value 19.5. This gives an upper bound of 19 on the value of the optimal
integer solution of the current subproblem. Since x2 is fractional, from
here we branch with respect to x2 by adding constraints, namely the
constraints x2 ≤ 3 and x2 ≥ 4.
A. Solving the relaxation of the left subproblem with x2 ≤ 3 gives an
integer solution x 1 = 2 and x 2 = 3 with objective value 19, which is
higher than the value of the current incumbent solution. As a result
x 1 = 2 and x 2 = 3 becomes the incumbent solution, and we say that
this subproblem is fathomed by incumbent solution. Thus, we do not
branch on this subproblem. The integer solution here provides a

lower bound on the original problem, namely

19 ≤ cᵀx* ≤ 21.

B. Now, if we solve the relaxation of the right subproblem with the


constraint x2 ≥ 4, we obtain an infeasible problem. Thus, we will not
be able to find any more solutions by branching further, so we do
not branch on this subproblem. We say that this branch is fathomed
by infeasibility.
ii. Continuing, we solve the relaxation of subproblem x1 ≤ 3, x2 ≥ 3 and
x1 ≥ 3, which is equivalent to adding the constraints x1 = 3 and x2 ≥ 3,
and we find that the problem is infeasible. Thus, we ignore this sub-
problem and do not branch on it.

We have now considered all options that branch from the subproblem x1 ≤ 3. The
resulting subtree is shown in Figure 3.6. The incumbent solution at this stage is
(2, 3) with objective value 19 and the bounds are

19 ≤ cᵀx* ≤ 21.

This means that if no optimal integer solution satisfies x1 ≥ 4, then (2, 3) is an
optimal integer solution.

2. At this point in our process, the entire branch of the tree where x1 ≤ 3 is completely
fathomed. So we backtrack to the very beginning and solve the subproblem
with the single extra constraint x1 ≥ 4. Solving the subproblem with x1 ≥ 4 yields
the solution x1 = 4 and x2 = 2.4 with objective function value 20. Since x2 is
fractional, we add the constraints x2 ≤ 2 and x2 ≥ 3, respectively. Continuing in
the same fashion as before and subsequently branching gives the tree in Figure 3.7.

a) At the subproblem with x1 ≥ 4 and x2 ≤ 2, the LP optimum value is 18.3.
As extra constraints are added to form the subproblems, the value of the
objective function cannot increase, so it either stays the same or decreases.
Consequently, in all the deeper subproblems on this branch, we get solutions
with objective value less than or equal to 18.3. However, we have a better
incumbent solution of value 19. Consequently, there is no point in continuing
to branch. We say that the subproblem x1 ≥ 4 and x2 ≤ 2 is fathomed by
bound.
b) The subproblem x1 ≥ 4 and x2 ≥ 3 is infeasible. Thus, this branch is fathomed
by infeasibility.

Root (obj = 21.363)
    x1 ≤ 3 (obj = 20.5)
        x2 ≤ 2: x1 = 3, x2 = 2, obj = 16 (fathomed by integrality)
        x2 ≥ 3 (obj = 20)
            x1 ≤ 2: x1 = 2, x2 = 3.1, obj = 19.5
                x2 ≤ 3: x1 = 2, x2 = 3, obj = 19 (fathomed, incumbent solution)
                x2 ≥ 4: infeasible
            x1 ≥ 3: infeasible
    x1 ≥ 4: not yet explored

Figure 3.6: The first branch and bound subtree.

Thus, no better solution has come out of branch x1 ≥ 4.

The incumbent, and thus optimal, solution to the problem at the end of the entire branch
and bound search is x1 = 2 and x2 = 3 with an objective function value of 19.
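Although the point of branch and bound is precisely to avoid enumerating the feasible region, this instance is small enough that the result can be verified by brute force, as in the following Python sketch (the loop bounds are implied by the constraints):

```python
# Brute-force check of the worked branch and bound example.
best = None
for x1 in range(0, 6):          # 12*x1 <= 60  implies  x1 <= 5
    for x2 in range(0, 4):      # 10*x2 <= 35  implies  x2 <= 3
        if 12 * x1 + 5 * x2 <= 60 and 2 * x1 + 10 * x2 <= 35:
            value = 2 * x1 + 5 * x2
            if best is None or value > best[0]:
                best = (value, x1, x2)

print(best)  # (19, 2, 3), agreeing with the branch and bound search
```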

Node Termination

In the Branch and Bound Algorithm, there are three ways that a subproblem can be
so-called fathomed, which means conquered or dismissed from consideration:

• Fathomed by integer solution: When the solution of the LP relaxation of the sub-
problem is integer, then there is no need to branch further since we have solved

Figure 3.7: The second branch and bound subtree.

optimally the IP subproblem. If this integer solution is better than the current
incumbent solution, then it becomes the new incumbent and we say that it is
fathomed by the incumbent solution.

• Fathomed by infeasibility: When the LP relaxation of the subproblem is infeasible,


then we know that the IP subproblem is also infeasible and thus we do not branch
any further.

• Fathomed by bound: When the solution of the LP relaxation of the IP subproblem


has objective value (or rounded objective value, if objective coefficients are inte-
ger) less than the objective value of the incumbent solution, then we know that
if we branch further we will only get worse integer solutions and thus we do not
branch any further.

The branch and bound algorithm for (pure) IPs can be summarised as:

• Initialise: Apply the bounding step, fathoming step and optimality test to the
whole problem. If not fathomed, then classify this problem as one remaining
subproblem and perform the iteration steps below:

1. Branching step: Amongst the remaining unfathomed problems select one.


Branch on one of the variables that did not have integer value in the linear
programming relaxation.

2. Bounding step: For each new subproblem, apply the Simplex Method to its
linear programming relaxation to obtain an optimal solution and an objective
value. If the objective coefficients are integer, then round down this value
(round up if minimising). This rounded objective value is an upper bound
on the objective value of the IP subproblem (lower bound if minimising).
3. Fathoming step: For each subproblem, apply the three fathoming tests sum-
marised above and discard all the subproblems that are fathomed by any of
the tests.

• Optimality test: Stop when there is no remaining subproblem. The current in-
cumbent solution is optimal.

Remarks.

1. We did not specify above how to pick the next subproblem. We can pick the one
with the highest bound (lowest if minimising) because this subproblem would be
the most promising one to contain an optimal solution to the whole problem. Or
we could pick the one that was created most recently (this is what we did above),
so the solver could use re-optimisation techniques to solve it faster.

2. If the objective coefficients are not integer, then we should not round the bound
in the bounding step.

3. If the variables were binary integer variables, then for branching variable say x 1 ,
the branches would simply be equalities x 1 = 1 and x 1 = 0.

4. To solve the problem in the previous section, which had just two variables, we
solved 11 linear programs. In general, the number of steps “blows up” exponentially:
if there are k binary variables, the number of subproblems can be as large
as 2^k. This is essentially why solving an IP requires substantially more work than
solving an LP. For this reason, great care should be taken in setting up an IP and,
in particular, one should always check whether there is a way to formulate the
problem in a more economical way with fewer integer variables.

5. We described the Branch and Bound Algorithm for pure integer programs. How-
ever, we can also apply the algorithm, though with some minor changes, to mixed-
integer programs, which contain both integer and continuous variables. The mi-
nor changes are listed as follows:

a) Branching step: We only branch on variables required to be integer.



b) Bounding step: We do not round the bound of the linear programming relax-
ation since the objective value of the mixed-integer program is most likely
fractional.

c) Fathoming step: A solution is considered incumbent if it takes integer values


for the variables required to be integer.

3.3 Integer Programming Examples

The following example demonstrates how we can formulate simple mathematical opti-
misation problems using integer linear programming.

Example. (Computer sales) A shop needs to put together two types of computer systems
to sell. They are identical except that one contains one monitor and 3 hard drives and the
other has 2 monitors and 1 hard drive. The profit for the two systems is the same, £300.
The shop has 70 monitors and 63 hard drives available to put into the systems. How many
of each computer should it make?
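One way to set this example up as an IP, together with a brute-force check of the resulting model, is sketched below in Python (the variable names and the formulation in the comments are our own reading of the example):

```python
# Computer-sales example as an IP: x1 systems with (1 monitor, 3 hard
# drives) and x2 systems with (2 monitors, 1 hard drive), each earning
# £300, subject to 70 monitors and 63 hard drives:
#
#   maximise    300*x1 + 300*x2
#   subject to  x1 + 2*x2 <= 70    (monitors)
#               3*x1 + x2 <= 63    (hard drives)
#               x1, x2 >= 0 and integer.
#
# The instance is small, so brute force suffices to check a solution.
best = None
for x1 in range(0, 22):          # 3*x1 <= 63  implies  x1 <= 21
    for x2 in range(0, 36):      # 2*x2 <= 70  implies  x2 <= 35
        if x1 + 2 * x2 <= 70 and 3 * x1 + x2 <= 63:
            profit = 300 * (x1 + x2)
            if best is None or profit > best[0]:
                best = (profit, x1, x2)

print(best)  # (12000, 10, 30): 40 systems for a £12,000 profit
```

Note that (11, 29) also attains the maximum of 40 systems, so the optimal solution is not unique here.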

3.4 The Knapsack Problem

We now consider a classical problem known as the knapsack problem. It should be noted
that the word knapsack was the usual name for a rucksack or backpack until around the
middle of the 20th century.
Suppose we are given a knapsack which can carry a maximum weight b and that
there are n types of items that we could take. Suppose further that an item of type
i ∈ {1, 2, . . . , n} has weight a_i > 0 and value c_i ∈ R. The knapsack problem is to load
the knapsack with items (possibly several items of the same type) without exceeding
the knapsack capacity b to maximise the total value of the knapsack. In order to model
this, let variable x_i represent the number of items of type i to be loaded.

Then the knapsack problem can be modelled as

maximise    ∑_{i=1}^n c_i x_i
subject to  ∑_{i=1}^n a_i x_i ≤ b
            x ≥ 0, x ∈ Z^n.

An important variant of this problem is known as the binary knapsack problem, when
only one unit of each item type can be selected. In this case we use binary variables
instead of general integers. The binary knapsack set can be formulated as
maximise    ∑_{i=1}^n c_i x_i
subject to  ∑_{i=1}^n a_i x_i ≤ b
            x ∈ {0, 1}^n,

where x ∈ {0, 1}^n denotes that x is an n-dimensional column vector whose entries are
either 0 or 1.

Example. (Project management) The project manager of a company has five projects that
they would like to undertake. It is sadly not possible for the company to undertake all five
projects due to budgetary limitations. In particular, the available budget is £85,000. Each
project has some positive value to the company and requires certain investment. The value
and costs are presented in the following table.

Project 1 Project 2 Project 3 Project 4 Project 5


Value (£1000s) 43 21 12 25 50
Cost (£1000s) 30 17 8 21 37

Which of the projects should be undertaken in order to maximise the total value of these
projects subject to the aforementioned budgetary constraint?
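The LP relaxation of a binary knapsack has a particularly simple greedy solution: sort the items by value-to-weight ratio, fill the knapsack in that order, and take a fraction of the first item that no longer fits. This makes the bounding step of Section 3.2 cheap, and the following Python sketch (function name and implementation details are our own) applies branch and bound with this bound to the project selection example:

```python
# A compact branch and bound for the binary knapsack problem, using the
# LP (fractional) relaxation as the bounding step.
def knapsack_bb(values, weights, capacity):
    n = len(values)
    # Sort by value-to-weight ratio so the LP relaxation is greedy.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]

    best_value, best_take = 0, []

    def bound(k, value, room):
        # Upper bound: LP relaxation over the remaining items k, k+1, ...
        for i in range(k, n):
            if w[i] <= room:
                room -= w[i]
                value += v[i]
            else:
                return value + v[i] * room / w[i]   # fractional item
        return value

    def branch(k, value, room, take):
        nonlocal best_value, best_take
        if k == n:
            if value > best_value:
                best_value, best_take = value, take[:]
            return
        if bound(k, value, room) <= best_value:
            return                                  # fathomed by bound
        if w[k] <= room:                            # branch x_k = 1
            take.append(order[k])
            branch(k + 1, value + v[k], room - w[k], take)
            take.pop()
        branch(k + 1, value, room, take)            # branch x_k = 0

    branch(0, 0, capacity, [])
    return best_value, sorted(best_take)

value, chosen = knapsack_bb([43, 21, 12, 25, 50], [30, 17, 8, 21, 37], 85)
print(value, chosen)  # 114, projects 1, 2 and 5 (0-indexed: [0, 1, 4])
```

The greedy bound is only valid because the items are sorted by ratio first; on this instance it identifies projects 1, 2 and 5, with total cost £84,000 and total value £114,000.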

3.5 The Set Covering Problem


Set covering is a classical model in integer programming which has many applications
and has been extensively studied. As a concrete example, consider the problem of se-
lecting where to place ambulances in a city while they wait for a call. Suppose that there
are m neighbourhoods to be served by the ambulances and n candidate locations among
which we could place an ambulance. The time that it would take for an ambulance
waiting in location j for j ∈ {1, 2, . . . , n} to reach neighbourhood i for i ∈ {1, 2, . . . , m} is
known. We need to decide in which locations to place an ambulance so that every
neighbourhood has at least one ambulance within a 15 minute travel time. The objective is to
minimise the total number of ambulances needed (which is equivalent to selecting the
locations as we will need to place one ambulance in each selected location). It should be
noted that we are “covering” all neighbourhoods by ensuring that each neighbourhood
has access to an ambulance within a 15 minute travel time.
This can be thought of more abstractly. In the set covering problem we are given a
set of m elements {1, 2, . . . , m} and n subsets S_1, S_2, . . . , S_n ⊆ {1, 2, . . . , m} (in the above
example, S j denotes the subset of the neighbourhoods {1, 2, . . . , m} that can be reached
by location j in at most 15 minutes) and we need to select the minimum number of
sets that cover the whole set {1, 2, . . . , m}, meaning that every i ∈ {1, 2, . . . , m} must be
contained in at least one of the selected sets (where in the example, we are seeking to
select the minimum number of locations such that every neighbourhood i is in at least
one of the sets S j corresponding to a selected location). In applications, rather than
trying to minimise the total number of sets selected, it may be the case that we have
some cost c j associated with each set S j and that we want to minimise the total cost of
our selection.
This problem can be modelled as a pure binary problem as follows. We have binary
variables x j , where j = 1, 2, . . . , n, whose intended meaning is that x j = 1 if and only if
set S j is selected (where in the above example we have x j = 1 if and only if location j

is selected). The objective function is

minimise ∑_{j=1}^n c_j x_j.
For the constraints, we define the following m × n matrix whose entries take value
either 0 or 1. For each i ∈ {1, 2, . . . , m} and each j ∈ {1, 2, . . . , n} we set

a_ij = 1 if i ∈ S_j, and a_ij = 0 otherwise.

The coverage constraints can then be expressed as

∑_{j=1}^n a_ij x_j ≥ 1    for all i ∈ {1, 2, . . . , m}.
It should be noted that when x_j ∈ {0, 1} for all j, the above coverage constraints require
that each neighbourhood i ∈ {1, 2, . . . , m} is covered by at least one of the n subsets
S_j ⊆ {1, 2, . . . , m}.
In matrix form, this can be expressed by

minimise    cᵀx
subject to  Ax ≥ 1
            x ∈ {0, 1}^n,

where A = (a_ij) ∈ Z^{m×n} is the m × n matrix defined above with a_ij denoting the entry
in the i-th row and j-th column, 1 denotes the m-dimensional vector of all ones and
c = (c1, c2, . . . , cn)ᵀ is the n-dimensional vector with entries c_j for j ∈ {1, 2, . . . , n}.
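To illustrate the model, the following Python sketch solves a tiny made-up instance of set covering by brute force, directly mirroring the constraints Ax ≥ 1 (the neighbourhood data are invented for illustration; realistic instances require an IP solver):

```python
from itertools import combinations

# Toy ambulance instance: m = 5 neighbourhoods, 5 candidate locations.
m = 5
subsets = [          # S_j: neighbourhoods reachable from location j
    {0, 1},
    {1, 2, 3},
    {3, 4},
    {0, 4},
    {2, 3, 4},
]

# Try selections of increasing size; the first selection whose union is
# the whole set of neighbourhoods is a minimum cover.
universe = set(range(m))
best = None
for k in range(1, len(subsets) + 1):
    for choice in combinations(range(len(subsets)), k):
        if set().union(*(subsets[j] for j in choice)) == universe:
            best = choice
            break
    if best is not None:
        break

print(best)  # (0, 4): two ambulances suffice on this instance
```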

The following illustrates a real-world application of the set covering problem that is
known as the crew scheduling problem.

Example. (Airline crew allocation) Airlines routinely solve massive set covering problems
in order to allocate crews to aircrafts. This is known as the crew scheduling problem.
An airline wants to operate its daily flight schedule using the smallest number of crews
to make use of the available resources efficiently. A crew is on duty for a certain number
of consecutive hours and may therefore operate several flights. A crew assignment is a
sequence of flights that may be operated by the same crew within its duty time. For instance,
some crew assignment may consist of the 8:30-10:00am flight from Pittsburgh to Chicago,
then the 11:30am-1:30pm Chicago-Atlanta flight and finally the 2:45-4:30pm Atlanta-
Pittsburgh flight. The problem is to find the smallest number of crew assignments such that
every flight is covered by at least one of the selected crew assignments.

This is a set covering problem, where n is the number of crew assignments, m is the
number of flights to be operated and, for each crew assignment j, S j denotes the set of
flights that are included in crew assignment j. Since we want to minimise the total number
of crews needed, the cost of each set S j is 1.

It should be noted that in an optimal solution to such a problem a flight may be


covered by more than one crew. For example, it may be the case that one crew operates
the flight while the other crew occupies passenger seats. The number of columns, i.e. the
number of possible crew schedules, is typically enormous. Due to the dimension of the
problem, solving such a problem via “standard approaches” would likely be inefficient
and as such a technique called column generation is often applied.

3.6 Exercises for Self-Study


1. Consider the IP

maximise    3x1 + 5x2
subject to  5x1 − 7x2 ≥ 3
            x1, x2 ≤ 3
            x1, x2 ≥ 0
            x1, x2 ∈ Z.

a) Solve this problem graphically.


b) Use the branch and bound algorithm to solve this problem, where for each
subproblem you should solve its linear programming relaxation using the
solver via AMPL.

2. Use the branch and bound algorithm to solve the BIP

maximise    2x1 − x2 + 5x3 − 3x4 + 4x5
subject to  3x1 − 2x2 + 7x3 − 5x4 + 4x5 ≤ 6
            x1 − x2 + 2x3 − 4x4 + 2x5 ≤ 0
            x1, x2, . . . , x5 ∈ {0, 1},

where for each subproblem you should solve its linear programming relaxation
using the solver via AMPL.

3. Use the solver via AMPL in order to solve the IPs appearing in the previous two
exercises. Note that your solutions should coincide with the solutions previously
found by applying the branch and bound algorithm.

4. Consider the following statements about any pure integer programming problem
in maximisation form and its LP relaxation. Label each of the following statements
as True or False, justifying your answer.

a) The feasible region for the LP relaxation is a subset of the feasible region for
the IP.

b) If an optimal solution for the LP relaxation is an integer solution, then the


optimal value of the objective function is the same for both problems.

c) If a noninteger solution is feasible for the LP relaxation, then the nearest


integer solution (rounding each variable to the nearest integer) is a feasible
solution for the IP problem.

5. Eve and Steven are a young couple and want to divide their main household
weekly chores between them such that each has two allocated tasks but the total
time they spend on household duties is kept to a minimum. The main household
chores are cleaning, cooking, dishwashing and laundry. Their efficiencies on these
tasks differ, where the time each would need to perform the task is outlined in the
following table.

Time Needed per Week


Cleaning Cooking Dishwashing Laundry
Eve 4.5 hours 7.8 hours 3.6 hours 2.9 hours
Steven 4.9 hours 7.2 hours 4.3 hours 3.1 hours

Formulate a BIP model for this problem and solve.

6. Vincent Cardoza is the owner and manager of a machine shop that does custom
order work. This Wednesday afternoon, they received calls from two customers
who would like to place rush orders. One is a trailer hitch company which would
like some custom-made heavy-duty tow bars. The other is a mini-car-carrier com-
pany which needs some customized stabilizer bars. Both customers would like as
many as possible by the end of the week (two working days). Since both products
would require the use of the same two machines, Vincent needs to decide and in-
form the customers this afternoon about how many of each product he will agree
to make over the next two days.

Each tow bar requires 3.2 hours on machine 1 and 2 hours on machine 2. Each
stabilizer bar requires 2.4 hours on machine 1 and 3 hours on machine 2. Machine
1 will be available for 16 hours over the next two days and machine 2 will be

available for 15 hours. The profit for each tow bar produced would be $130 and
the profit for each stabilizer bar produced would be $150.

Vincent now wants to determine the mix of these production quantities that will
maximize the total profit. Formulate an integer programming model for this prob-
lem and solve.

7. An American real estate development firm, Peterson and Johnson, is considering


five possible development projects. The following table shows the estimated long-
run profit (net present value) that each project would generate, as well as the
amount of investment required to undertake the project, in units of millions of
dollars.

Development Project
1 2 3 4 5
Estimated Profit 1 1.8 1.6 0.8 1.4
Capital Required 6 12 10 4 8

The owners of the firm, Dave Peterson and Ron Johnson, have raised $20 million
of investment capital for these projects. Dave and Ron now want to select the
combination of projects that will maximize their total estimated long-run profit
(net present value) without investing more than $20 million. Formulate an integer
programming model for this problem and solve.

8. Northeastern Airlines is considering the purchase of new long-, medium-, and


short-range jet passenger airplanes. The purchase price would be $67 million for
each long-range plane, $50 million for each medium-range plane, and $35 million
for each short-range plane. The board of directors has authorized a maximum
commitment of $1.5 billion for these purchases. Regardless of which airplanes
are purchased, air travel of all distances is expected to be sufficiently large that
these planes would be utilized at essentially maximum capacity. It is estimated
that the net annual profit (after capital recovery costs are subtracted) would be
$4.2 million per long-range plane, $3 million per medium-range plane, and $2.3
million per short-range plane.

It is predicted that enough trained pilots will be available to the company to crew
30 new airplanes. If only short-range planes were purchased, the maintenance
facilities would be able to handle 40 new planes. However, each medium-range
plane is equivalent to 4/3 short-range planes, while each long-range plane is
equivalent to 5/3 short-range planes in terms of their use of the maintenance facilities.

The information given here was obtained by a preliminary analysis of the prob-
lem. A more detailed analysis will be conducted subsequently. However, using the
preceding data as a first approximation, management wishes to know how many
planes of each type should be purchased to maximize profit. Formulate an integer
programming model for this problem and solve.

9. GreenPower are a renewable energy developer who have been tasked with select-
ing the best five out of ten possible sites for the construction of new wind farms
in the UK. The sites and expected profits associated with each site are s1 , s2 , . . . , s10
and p1 , p2 , . . . , p10 , respectively. Requirements from UK planning permission en-
force that if site s2 is selected, then site s3 must also be selected.
Development restrictions enforce that selecting sites s1 and s7 prevents the se-
lection of s8 . Further, these restrictions also enforce that selecting sites s3 or s4
prevents the selection of s5 . Formulate an integer program that could determine
the best selection scheme.

10. There are six cities (labelled cities 1-6) in Kilroy County. The county must deter-
mine where to build fire stations. The county wants to build the minimum number
of fire stations needed to ensure that at least one fire station is within a 15 minute
drive of each city. The times in minutes required to drive between the cities in
Kilroy County are shown in following table.

To
From City 1 City 2 City 3 City 4 City 5 City 6
City 1 0 10 20 30 30 20
City 2 10 0 25 35 20 10
City 3 20 25 0 15 30 20
City 4 30 35 15 0 15 25
City 5 30 20 30 15 0 14
City 6 20 10 20 25 14 0

Formulate and solve an IP that will tell Kilroy how many fire stations should be
built and where they should be located.

11. StockCo is considering four investments. Investment 1 will yield a net present
value (NPV) of $16, 000, investment 2 yields an NPV of $22, 000, investment 3
yields an NPV of $12, 000 and investment 4 yields an NPV of $8000. Each in-
vestment requires a certain cash outflow at the present time, namely $5000 for
investment 1, $7000 for investment 2, $4000 for investment 3 and $3000 for in-
vestment 4. There is presently $14,000 available for investment. Formulate

an IP whose solution will tell StockCo how to maximize the NPV obtained from
investments 1-4.

12. Modify the StockCo formulation from the previous exercise to account for each of
the following requirements:

a) StockCo can invest in at most two investments,

b) If StockCo invests in investment 2, they must also invest in investment 1, and

c) If StockCo invests in investment 2, they cannot invest in investment 4.



Chapter 4

Modelling Tricks

It turns out that we can make use of IPs, MIPs or BIPs when the objective function or the
constraints do not appear to be linear at first sight. In this chapter we outline different
modelling tricks that allow us to apply IPs, MIPs or BIPs to a broader set of problems.

4.1 Fixed Costs and the Big-M Method

Suppose a company which manufactures a number of products has asked us to model


their optimisation problem. Their objective is to minimise the cost of production. We
have accomplished this via an LP, where our LP includes a decision variable z ∈ R which
expresses the amount of a certain product to be manufactured. This variable appears in
the objective function as c · z as we were told that each unit of this product costs c.
Upon showing the production managers our LP, they point out an oversight. This
oversight is that this product incurs some fixed cost. If any of this product is to be
produced, then the company incurs a one-time cost of f > 0 as they would have to turn
on a new machine. In other words, the production costs associated with this specific
product are

cost(z) = 0 if z = 0,  and  cost(z) = f + c·z if z > 0.
While this cost function may seem linear, it actually is not linear. Observe that the
function f + cz evaluated at z = 0 is f and not 0. This raises the question as to how
should we express the production cost of this product because if the costs are no longer
linear, then we cannot rely on a standard LP.
It turns out that binary (or indicator) variables can come to our rescue. In order to
revise our model, we require two things. First, we need a new binary variable y ∈ {0, 1},
where y = 0 indicates that the product is not produced and y = 1

indicates that the product is produced. Secondly, we require an upper bound M on the
decision variable z. We can obtain the upper bound M from the manufacturer, who will
have an absolute upper bound on the total number of units produced.
We can now revise our linear program as follows. We firstly add the following two
constraints

z ≤ M·y,   y ∈ {0, 1}                                                        (4.1)

to the system of constraints. In addition, we replace c·z in our objective function with
f·y + c·z.

Let us argue that this revision now models our problem. Observe that if y = 0, then
(4.1) forces z = 0 and hence the production cost becomes f·y + c·z = 0 as required.
If instead y = 1, then (4.1) becomes z ≤ M and hence the production cost becomes
f·y + c·z = f + c·z. Because z ≤ M is always satisfied by definition, the inequality does
not limit the range of possible values for z.
Observe that y = 0 indicates that z = 0 and, in an optimal solution, we have that if
y = 1, then z > 0. The second implication follows as the objective is to minimise the
total costs and it would not make sense to not produce any of a certain product yet pay
the fixed costs associated with turning on the new machine, i.e. an optimal solution
could not have both y = 1 and z = 0. It follows that the new revised program, which is
now a MIP, correctly models this manufacturing problem.
Note that in the above we introduced a new binary variable y ∈ {0, 1} modelling the
logical statement

if z > 0, then y = 1.

That is, y behaves akin to an on/off switch that is turned on once z > 0. The previous
logical statement is logically equivalent to the contrapositive statement

if y = 0, then z = 0.

It follows consequently that y = 1 if and only if z > 0 (or equivalently z = 0 if and only
if y = 0) holds when minimising total costs.
In the above we utilised the big-M method, a widely applicable modelling strategy
that requires an upper bound M on the possible values of z. In the next section, we
illustrate the big-M method on a more complex modelling problem.
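The argument above can also be checked numerically. The Python sketch below (with made-up values for f, c and M, and with y denoting the binary switch) confirms that minimising f·y + c·z subject to the big-M constraint z ≤ M·y reproduces the fixed-charge cost function:

```python
# Numeric sanity check of the big-M trick (toy numbers: f = 100, c = 3,
# M = 50 are our own choices, not from the text).
f, c, M = 100, 3, 50

def mip_cost(z):
    # Smallest value of f*y + c*z over y in {0, 1} with z <= M*y feasible.
    feasible = [f * y + c * z for y in (0, 1) if z <= M * y]
    return min(feasible)

def true_cost(z):
    # The fixed-charge cost function we are trying to model.
    return 0 if z == 0 else f + c * z

for z in range(0, M + 1):
    assert mip_cost(z) == true_cost(z)
print("big-M model matches the fixed-charge cost for z = 0, ..., M")
```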

4.2 Facility Location and the Big-M Method


A logistics company is given a set T of stores and a set F of facilities. The goal of the
company is to supply the stores from the facilities while minimising the cost.
Each store t ∈ T has to be supplied with exactly r_t units of goods, where the goods
can come from any number of facilities. The facilities are currently closed and the
company can choose an arbitrary set of these facilities to open. However, for each facility
f ∈ F, there is a significant opening cost b_f associated with facility f that has to be paid
irrespective of how intensively the facility is used. Moreover, the cost of transporting
one unit of good from facility f to store t is c_ft. The stores are to be supplied only from
those open facilities.
For each facility f ∈ F and store t ∈ T, let the variable x_ft express the amount of
goods supplied by facility f to store t. Note that since each store t needs a supply of r_t
units of goods, we have

∑_{f∈F} x_ft = r_t    for all t ∈ T.

We have so far incurred the transportation costs ∑_{t∈T} ∑_{f∈F} c_ft x_ft in the objective
function; however, we have not taken account of opening costs.
For this purpose, we similarly need a binary variable y_f ∈ {0, 1} for each facility f,
indicating whether or not the facility is open. More precisely, acting as an on/off switch,
we use y_f to model the following logical statement

if the facility f is opened, then y_f = 1.

This is logically equivalent to the statement

if y_f = 0, then the facility f is closed.

If the facility f is closed, then every x_ft for t ∈ T must be 0, so we need a big-M constraint
enforcing this. A natural upper bound on x_ft is r_t as this is the total amount of goods
needed for store t. We model the above statement via the two big-M constraints

x_ft ≤ r_t · y_f    for all t ∈ T

y_f ∈ {0, 1}.

In addition, we include

the opening cost of facility f = b f f



in our objective function. Observe that if δ_f = 0, then x_{ft} = 0 for all t ∈ T and the
opening cost is b_f δ_f = 0. If instead δ_f = 1, then x_{ft} ≤ r_t and the opening cost is
b_f δ_f = b_f. Notice that since x_{ft} ≤ r_t always holds, the inequality then does not
constrain anything. Moreover, since the objective is to minimise cost, it does not
make sense to set δ_f = 1 when each x_{ft}, t ∈ T, is 0.
In summary, our final model for this scenario is the following MIP

    minimise    Σ_{f∈F} b_f δ_f + Σ_{t∈T} Σ_{f∈F} c_{ft} x_{ft}

    subject to  Σ_{f∈F} x_{ft} = r_t   for all t ∈ T
                x_{ft} ≤ r_t δ_f       for all f ∈ F, t ∈ T
                x_{ft} ≥ 0             for all f ∈ F, t ∈ T
                δ_f ∈ {0, 1}           for all f ∈ F.
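This MIP can be sanity-checked on a small instance by brute force: once the set of open facilities is fixed, each store optimally buys all of its r_t units from the cheapest open facility, so it suffices to enumerate the subsets of facilities. A minimal Python sketch (all data below is invented purely for illustration):

```python
from itertools import combinations

b = {"f1": 10.0, "f2": 12.0}                             # opening costs b_f (invented)
c = {("f1", "t1"): 1, ("f1", "t2"): 4, ("f1", "t3"): 5,
     ("f2", "t1"): 3, ("f2", "t2"): 1, ("f2", "t3"): 2}  # unit costs c_ft (invented)
r = {"t1": 2, "t2": 3, "t3": 1}                          # demands r_t (invented)

def total_cost(open_set):
    # With no capacity limits, each store buys everything from the
    # cheapest open facility, so the transport cost separates by store.
    return (sum(b[f] for f in open_set)
            + sum(r[t] * min(c[(f, t)] for f in open_set) for t in r))

facilities = list(b)
best = min(total_cost(s)
           for k in range(1, len(facilities) + 1)
           for s in combinations(facilities, k))
print(best)  # optimal value of the tiny instance
```

An MIP solver would of course be used for realistically sized instances; the enumeration above merely confirms the model's logic on toy data.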

4.3 Facility Location and Indicator Variables


Consider the facility location problem in the previous section. Let us now include an
additional restriction, namely that every store must choose one facility and get its entire
supply from that facility. In light of this, how should we revise our above MIP?
Let y_{ft} be a binary variable which indicates whether or not store t is supplied from
facility f. Firstly, we need the condition that stores can only be supplied from those
facilities that are open. In other words, we cannot have y_{ft} = 1 when δ_f = 0. This can
be ensured via the inequality

    y_{ft} ≤ δ_f   for all f ∈ F and t ∈ T.

Instead of the big-M constraints, we must formulate the requirement that the amount of
goods x_{ft} supplied by facility f to store t is either the r_t units of goods needed by t,
if f is the facility selected, or zero otherwise. This is captured by the constraint

    x_{ft} = r_t y_{ft}   for all f ∈ F and t ∈ T.


For each store t ∈ T, instead of the previous constraint Σ_{f∈F} x_{ft} = r_t that ensures that
the supply required is met across all facilities, we can instead ask that store t is supplied
from exactly one facility. That is, among the values y_{ft}, f ∈ F, we must have exactly
one 1, while all other values are 0. Furthermore, upon recalling that the y_{ft}'s are binary
variables, the above is enforced by

    Σ_{f∈F} y_{ft} = 1   for all t ∈ T.

Our revised MIP is


    minimise    Σ_{f∈F} b_f δ_f + Σ_{t∈T} Σ_{f∈F} c_{ft} x_{ft}

    subject to  Σ_{f∈F} y_{ft} = 1    for all t ∈ T
                x_{ft} = r_t y_{ft}   for all f ∈ F, t ∈ T
                y_{ft} ≤ δ_f          for all f ∈ F, t ∈ T
                x_{ft} ≥ 0            for all f ∈ F, t ∈ T
                δ_f ∈ {0, 1}          for all f ∈ F
                y_{ft} ∈ {0, 1}       for all f ∈ F, t ∈ T.

Note that in this revised program, the variables x_{ft} are not necessary and could simply
be replaced by x_{ft} = r_t y_{ft}. In particular, notice that the above constraints on x_{ft}
are implied by the constraints on the y_{ft} variables. Removing the x_{ft}'s would leave us
with the variables δ_f and y_{ft} only. This replacement yields

    minimise    Σ_{f∈F} b_f δ_f + Σ_{t∈T} Σ_{f∈F} c_{ft} r_t y_{ft}

    subject to  Σ_{f∈F} y_{ft} = 1   for all t ∈ T
                y_{ft} ≤ δ_f         for all f ∈ F, t ∈ T
                δ_f ∈ {0, 1}         for all f ∈ F
                y_{ft} ∈ {0, 1}      for all f ∈ F, t ∈ T.

Note that in contrast to the previous models, this problem is a pure IP.
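The pure IP can likewise be checked by brute force on a tiny invented instance: enumerate all single-source assignments of stores to facilities, charging each used facility's opening cost exactly once. A sketch (data invented for illustration):

```python
from itertools import product

b = {"f1": 10.0, "f2": 12.0}                  # opening costs b_f (invented)
c = {("f1", "t1"): 1, ("f1", "t2"): 4,
     ("f2", "t1"): 3, ("f2", "t2"): 1}        # unit transport costs c_ft (invented)
r = {"t1": 2, "t2": 3}                        # demands r_t (invented)

stores, facilities = list(r), list(b)

def cost(assign):
    # assign[i] is the single facility supplying stores[i]; each facility
    # that appears in the assignment is opened (and paid for) once.
    opened = set(assign)
    return (sum(b[f] for f in opened)
            + sum(c[(f, t)] * r[t] for f, t in zip(assign, stores)))

best = min(cost(a) for a in product(facilities, repeat=len(stores)))
print(best)  # minimum total cost under single sourcing
```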

4.4 Expressing Logical Conditions


In the previous examples, we made use of binary variables to express certain logical
conditions. Let us now approach this in a more systematic way.
Consider for this purpose two (possibly correlated) events X_1 and X_2, where each
event can either take place or not. For each i ∈ {1, 2}, we associate a binary variable
x_i ∈ {0, 1} which has value x_i = 1 if X_i takes place, while x_i = 0 otherwise. The
following logical events concerning X_1 and X_2 can be expressed with the corresponding
constraints in terms of the corresponding binary variables x_1 and x_2, respectively. These
logical events yield the constraints:

    "X_1 or X_2":        x_1 + x_2 ≥ 1
    "X_1 and X_2":       x_1 = 1, x_2 = 1 (equivalently, x_1 + x_2 ≥ 2)
    "not X_1":           x_1 = 0 (i.e. 1 - x_1 = 1)
    "if X_1, then X_2":  x_1 ≤ x_2

It should be emphasised that “X 1 or X 2 ” means that at least one (or possibly both) of
the events occur, while, “X 1 and X 2 ” means that both events must occur.
Logical conditions such as these can be transformed into other equivalent logical
conditions, where two conditions are called logically equivalent if they always have
the same truth value. Further, such equivalences can be shown using the algebraic
expressions above. For example, notice that

• “not (X_1 and X_2)” is equivalent to “(not X_1) or (not X_2)”: Observe that “not (X_1
and X_2)” is equivalent to x_1 + x_2 ≤ 1. This can be rewritten as (1 - x_1) + (1 - x_2) ≥ 1,
which is equivalent to “(not X_1) or (not X_2)” as required.

• “if X_1, then X_2” is equivalent to “(not X_1) or X_2”: Observe that “if X_1, then X_2”
is equivalent to x_1 ≤ x_2. This can be rewritten as (1 - x_1) + x_2 ≥ 1, which is
equivalent to “(not X_1) or X_2” as required.
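Both equivalences can be verified mechanically by enumerating the four assignments of (x_1, x_2); a short Python truth-table check:

```python
from itertools import product

# Truth-table check: each constraint on the indicators x1, x2 holds
# exactly when the corresponding logical condition on X1, X2 holds.
for x1, x2 in product((0, 1), repeat=2):
    assert (x1 + x2 >= 1) == (x1 == 1 or x2 == 1)                     # "X1 or X2"
    assert (x1 + x2 >= 2) == (x1 == 1 and x2 == 1)                    # "X1 and X2"
    assert ((1 - x1) + x2 >= 1) == ((x1 == 0) or (x2 == 1))           # "if X1, then X2"
    assert ((1 - x1) + (1 - x2) >= 1) == (not (x1 == 1 and x2 == 1))  # "not (X1 and X2)"
print("all equivalences hold")
```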

Remark. The constraints for the condition “X 1 and X 2 ” have been mentioned here since
they can be used when an “and” condition is part of a larger and perhaps more complex
logical expression. In the case of a simple “and” statement, we do not need to use
indicator variables. For example, for the constraints appearing in linear and integer
linear programming problems, it is assumed that there is an “and” relationship between
the constraints. In particular, we assume the first constraint holds “and” the second
constraint holds “and” so on. This implies that for an expression like “X 1 and X 2 ”, it is
normally sufficient to simply add the expressions X 1 and X 2 as regular constraints.

Furthermore, we can generalise the above to the case of more than two events. This
generalisation allows us to express longer and more complicated conditions using binary
variables. Consider for this purpose n events X_1, X_2, ..., X_n with corresponding indicator
variables x_1, x_2, ..., x_n, respectively. Upon noting that x_i ∈ {0, 1} for all i ∈ {1, 2, ..., n},
it follows that for some 0 ≤ k ≤ n we can use the following inequalities to express the
corresponding logical conditions, namely

    "at least k of the events occur":   x_1 + x_2 + ··· + x_n ≥ k
    "at most k of the events occur":    x_1 + x_2 + ··· + x_n ≤ k
    "exactly k of the events occur":    x_1 + x_2 + ··· + x_n = k.

Note that the constraint Σ_{f∈F} y_{ft} = 1 for all t ∈ T of the facility location problem that
ensured that only one facility f ∈ F can supply store t was of this type.

4.5 Modelling “or” Constraints (Disjunctions)


Suppose that a small brewery in London has to decide how much lager and how much
ale they should produce during the next quarter. Whereas they are willing to brew
both types, the management has the vision of building a strong brand with a dominant
product that it can be associated with. Consequently, they want to either produce more
lager than ale by at least 6,000 barrels or more ale than lager by at least 4,000 barrels.
They do not wish to produce anything in-between, i.e. they want to avoid the scenario
that the two amounts are roughly the same. Further, the maximum amount they can
produce is in total 10,000 barrels during the quarter.
Denote by x_1 and x_2 the amounts of lager and ale to be produced during the next
quarter, respectively. In addition to other constraints that need to be satisfied, we have
the following unusual requirement that

       x_1 - x_2 ≥ 6,000      (4.2a)
    or x_2 - x_1 ≥ 4,000.     (4.2b)

It should be emphasised that a standard program admits only “and” constraints and not
the “or” constraint that we have in this scenario. It turns out that once more binary
variables and the big-M method can come to our rescue.
For this purpose, let us introduce a binary variable δ ∈ {0, 1}, where δ = 1 if our
production is “lager-dominant”, i.e. if (4.2a) holds, whereas δ = 0 if our production is
“ale-dominant”, i.e. if (4.2b) holds. In other words, our aim is to keep (4.2a) and make
(4.2b) void if δ = 1, and conversely, keep (4.2b) and make (4.2a) void if δ = 0.

Recall that the big-M method requires some upper bound on the underlying decision
variables. Because total production is at most 10,000 barrels in the next quarter, we
obtain the big-M bounds

    -10,000 ≤ x_1 - x_2 ≤ 10,000.

Consider now the following two “and” constraints

    x_1 - x_2 ≥ 6,000δ - 10,000(1 - δ)
    x_2 - x_1 ≥ 4,000(1 - δ) - 10,000δ        (4.3)
    δ ∈ {0, 1}.

Observe that if δ = 1, then the first inequality from (4.3) becomes (4.2a), while the
second inequality from (4.3) becomes x_1 - x_2 ≤ 10,000. This inequality is void as it
always holds in light of the constraint on total production during the next quarter. If
instead δ = 0, then the first inequality from (4.3) becomes the void inequality
x_1 - x_2 ≥ -10,000, while in this case the second inequality from (4.3) becomes (4.2b). Thus, it
follows that (4.3) correctly models our problem.
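Since x_1, x_2 ≥ 0 and x_1 + x_2 ≤ 10,000, we can also confirm the construction numerically: on a grid of production plans, some choice of δ ∈ {0, 1} satisfies the pair of big-M constraints exactly when (4.2a) or (4.2b) holds. A brute-force check:

```python
# Verify the big-M reformulation of the "or" condition on a grid of
# feasible production plans (x1 + x2 <= 10,000, both nonnegative).
def disjunction(x1, x2):
    return (x1 - x2 >= 6_000) or (x2 - x1 >= 4_000)   # (4.2a) or (4.2b)

def big_m(x1, x2, d):
    # the two "and" constraints, for a fixed value d of the binary variable
    return (x1 - x2 >= 6_000 * d - 10_000 * (1 - d) and
            x2 - x1 >= 4_000 * (1 - d) - 10_000 * d)

step = 500
for x1 in range(0, 10_001, step):
    for x2 in range(0, 10_001 - x1, step):
        assert disjunction(x1, x2) == any(big_m(x1, x2, d) for d in (0, 1))
print("big-M model matches the disjunction on the grid")
```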

This can be done in general. Suppose that, within our problem, we have several
linear constraints, denoted by

    a_1^T x ≤ b_1,  a_2^T x ≤ b_2,  ...,  a_k^T x ≤ b_k,

where a_i ∈ R^n and b_i ∈ R for each i ∈ {1, 2, ..., k}, and that we want to impose the
condition that at least one of them is satisfied. In other words, suppose that

       (a_1^T x ≤ b_1)
    or (a_2^T x ≤ b_2)
       ...                      (4.4)
    or (a_k^T x ≤ b_k).

It should be emphasised that here we are interested in imposing the condition that at
least one of the k constraints is satisfied. It is perhaps surprising that it is impossible
in general to write as an IP the condition that exactly one of the above constraints is
satisfied. Note that in the previous example (4.2a) and (4.2b) could not both be satisfied
at the same time, and as such imposing that at least one of the two is satisfied was the
same as imposing that exactly one of the two is satisfied; however, this cannot be
ensured in general.
Suppose that M denotes an upper bound on a_i^T x for all i, namely that

    a_i^T x ≤ M   for all i.

Note that the choice of M means that a_i^T x ≤ M holds for every feasible solution x of
our underlying problem. In a similar fashion to before, we can formulate (4.4) by introducing a
binary variable δ_i for every i ∈ {1, 2, ..., k}, where δ_i = 1 if the i-th constraint a_i^T x ≤ b_i
is satisfied.
We can express this with the system

    a_i^T x ≤ b_i δ_i + M(1 - δ_i)   for all i ∈ {1, 2, ..., k}
    δ_1 + δ_2 + ··· + δ_k = 1                                       (4.5)
    δ_i ∈ {0, 1}                     for all i ∈ {1, 2, ..., k}.

Observe that the constraint δ_1 + δ_2 + ··· + δ_k = 1 forces all the δ_i's to take value 0, except
for exactly one, say δ_h, which takes value 1. For each i such that δ_i = 0, notice that the
constraint a_i^T x ≤ b_i δ_i + M(1 - δ_i) becomes a_i^T x ≤ M, which by our assumption is always
satisfied and hence does not impose any further restriction. For δ_h = 1, notice that the
constraint a_h^T x ≤ b_h δ_h + M(1 - δ_h) becomes a_h^T x ≤ b_h. In particular, δ_h = 1 enforces
that the h-th constraint must be satisfied. It follows that the system (4.5) imposes that at
least one (in this case the h-th) of the constraints is satisfied by x. Note for completeness
that the equality constraint δ_1 + δ_2 + ··· + δ_k = 1 appearing in (4.5) could be replaced
by δ_1 + δ_2 + ··· + δ_k ≥ 1 without impacting on the solutions.

4.6 Semi-Continuous Variables

An example of the above is in applications where we want to ensure that, whenever a
variable takes a positive value, it takes a “large enough” value. For example, in portfolio
optimisation it is common to require that if we decide to invest any money in a certain
asset, then we invest at least a predetermined minimum amount. Another example is in
production problems, where typically, if a company has to decide whether to introduce a
new product to the market, they will want to do so only if it is profitable to produce it to
at least some minimum specified amount.
The above situations may be modelled using semi-continuous variables. Given some
decision variable x_1 and a bound ℓ_1 > 0, we want to enforce the nonlinear constraint

    x_1 = 0  or  x_1 ≥ ℓ_1.        (4.6)

This can be modelled once more using the big-M method. Suppose that we have
knowledge of a value M > 0 that is “large enough” such that x_1 ≤ M is guaranteed in
every optimal solution. Further, let us define a binary variable δ which we want to take
the following meaning

    δ = 0, if x_1 = 0,
    δ = 1, if x_1 > 0.

This allows us to model the constraint (4.6) using

    x_1 ≤ Mδ          (4.7)
    x_1 ≥ ℓ_1 δ       (4.8)
    δ ∈ {0, 1}.

Observe that if δ = 0, then (4.7) becomes x_1 ≤ 0 and (4.8) becomes x_1 ≥ 0, which
together force x_1 = 0. If instead δ = 1, then (4.7) becomes x_1 ≤ M, which does not cut
off any optimal solution by the choice of M, while (4.8) becomes x_1 ≥ ℓ_1. It follows
that the above system does indeed enforce the (nonlinear) constraint (4.6) as
required.
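A quick numerical check of this argument, with invented values M = 100 and ℓ_1 = 10:

```python
# Check that x1 <= M*d and x1 >= l1*d with d in {0, 1} admit exactly
# the semi-continuous values x1 = 0 or l1 <= x1 <= M (M, l1 invented).
M, l1 = 100.0, 10.0

def feasible(x1):
    # x1 is feasible if some choice of the binary d satisfies (4.7)-(4.8)
    return any(x1 <= M * d and x1 >= l1 * d for d in (0, 1))

for x1 in [0.0, 0.5, 5.0, 9.99, 10.0, 50.0, 100.0, 100.5]:
    assert feasible(x1) == (x1 == 0.0 or l1 <= x1 <= M)
print("big-M pair models the semi-continuous condition")
```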

4.7 Binary Polynomial Programming


A binary polynomial program is an optimisation problem of the form

    minimise    z = f(x)
    subject to  g_i(x) = 0,     i = 1, 2, ..., m        (4.9)
                x_j ∈ {0, 1},   j = 1, 2, ..., n,

where the functions f and g_i (i = 1, ..., m) are polynomials. This model is clearly
nonlinear in general; however, it turns out that the functions can be linearised when the
variables only take the values 0 and 1. In particular, any binary polynomial program (4.9) can
be equivalently formulated as a (pure) BIP by introducing additional variables.
To see why this is the case, observe that for any integer exponent k ≥ 1, the binary
variable x_j, where j ∈ {1, 2, ..., n}, satisfies the equality

    x_j^k = x_j.

It follows that we can replace every expression of the form x_j^k with x_j for all j. This
ensures that no variable appears in the functions f or g_i, where i ∈ {1, 2, ..., m}, with
an exponent greater than 1.

Note that this linearises all expressions featuring only one variable. It remains to
consider how we linearise expressions with more than one binary variable. The product
x_j · x_l of two binary variables, where j, l ∈ {1, 2, ..., n}, can be replaced by a new binary
variable y_{jl} related to x_j and x_l by linear constraints. In particular, in order to ensure
that we have

    y_{jl} = x_j · x_l

when x_j and x_l are binary variables, it suffices to impose the linear constraints

    y_{jl} ≤ x_j
    y_{jl} ≤ x_l
    y_{jl} ≥ x_j + x_l - 1

in addition to x_j, x_l, y_{jl} ∈ {0, 1}. If more variables are featured in an expression,
then we can apply a similar procedure to linearise.
For example, consider the objective function f defined by

    f(x) = x_1^5 x_2 + 4 x_1 x_2 x_3^2.

Upon applying the above linearisation sequentially, the function f is replaced initially
by the function

    z = x_1 x_2 + 4 x_1 x_2 x_3

for the binary variables x_j, where j = 1, 2, 3. We then introduce binary variables y_12 in
place of x_1 x_2 and y_123 in place of y_12 x_3. The objective function is as such replaced by
the linear function

    z = y_12 + 4 y_123,

where we additionally impose the restrictions

    y_12 ≤ x_1
    y_12 ≤ x_2
    y_12 ≥ x_1 + x_2 - 1
    y_123 ≤ y_12
    y_123 ≤ x_3
    y_123 ≥ y_12 + x_3 - 1
    y_12, y_123, x_1, x_2, x_3 ∈ {0, 1}.

It should be noted that it is possible to replace the fourth and sixth constraints above by
other constraints if one would prefer not to make use of the new binary variable y12 in
the right-hand sides.
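The exactness of the construction can be confirmed by enumerating all binary points: for each (x_1, x_2, x_3), the linear constraints admit exactly one feasible pair (y_12, y_123), equal to the products being replaced:

```python
from itertools import product

# Exhaustive check that the constraints force y12 = x1*x2 and y123 = x1*x2*x3.
checked = 0
for x1, x2, x3 in product((0, 1), repeat=3):
    feas = [(y12, y123)
            for y12 in (0, 1) for y123 in (0, 1)
            if y12 <= x1 and y12 <= x2 and y12 >= x1 + x2 - 1
            and y123 <= y12 and y123 <= x3 and y123 >= y12 + x3 - 1]
    # exactly one feasible pair, and it equals the products replaced
    assert feas == [(x1 * x2, x1 * x2 * x3)]
    checked += 1
print(f"verified on all {checked} binary points")
```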

4.8 Exercises for Self-Study


1. Consider the following integer nonlinear programming problem

       maximise    4x_1^2 - x_1^3 + 10x_2^2 - x_2^4 + x_1 x_2^7
       subject to  x_1 + x_2 ≤ 3
                   -x_1 + x_2 ≤ 3
                   x_1, x_2 ≥ 0
                   x_1, x_2 ∈ {0, 1}.

   Formulate as a BIP and solve using the solver via AMPL.

2. The Research and Development Division of the Progressive Company has been de-
veloping four possible new product lines. Management must now make a decision
as to which of these four products actually will be produced and at what levels.
Therefore, an operations research study has been requested to find the most prof-
itable product mix. A substantial cost is associated with beginning the production
of any product, as given in the first row of the following table. Management’s ob-
jective is to find the product mix that maximizes the total profit (total net revenue
minus start-up costs).

Product
1 2 3 4
Start-up Cost $50,000 $40,000 $70,000 $60,000
Marginal Revenue $70 $60 $90 $80

Let the continuous decision variables x 1 , x 2 , x 3 , and x 4 be the production levels


of products 1, 2, 3 and 4, respectively. Management has imposed the following
policy constraints on these variables:

a) No more than two of the products can be produced.


b) Product 3 or product 4 can be produced only if product 1 or product 2 is
also produced.
c) Either 5x_1 + 3x_2 + 6x_3 + 4x_4 ≤ 6,000, or 4x_1 + 6x_2 + 3x_3 + 5x_4 ≤ 6,000.

Introduce auxiliary binary variables to formulate and solve a mixed BIP model for
this problem.

3. Suppose that a mathematical model fits linear programming except for the restriction
that |x_1 - x_2| = 0, or 3, or 6. Show how to reformulate this restriction to fit
an MIP model.

4. The Toys 4 U Company has developed two new toys for possible inclusion in its
product line for the upcoming Christmas season. Setting up the production fa-
cilities to begin production would cost $50, 000 for toy 1 and $80, 000 for toy 2.
Once these costs are covered, the toys would generate a unit profit of $10 for toy
1 and $15 for toy 2.

The company has two factories that are capable of producing these toys. However,
to avoid doubling the start-up costs, just one factory would be used, where the
choice would be based on maximizing profit. For administrative reasons, the same
factory would be used for both new toys if both are produced.

Toy 1 can be produced at the rate of 50 per hour in factory 1 and 40 per hour in
factory 2 . Toy 2 can be produced at the rate of 40 per hour in factory 1 and 25 per
hour in factory 2. Factories 1 and 2 , respectively, have 500 hours and 700 hours
of production time available before Christmas that could be used to produce these
toys. It is not known whether these two toys would be continued after Christmas.
Therefore, the problem is to determine how many units (if any) of each new toy
should be produced before Christmas to maximize the total profit.

Formulate and solve an MIP model for this problem.

5. Suppose that a mathematical model fits linear programming except for the restric-
tions that

a) at least one of the two inequalities

       3x_1 - x_2 - x_3 + x_4 ≤ 12
       x_1 + x_2 + x_3 + x_4 ≤ 15

   holds.

b) at least two of the inequalities

       2x_1 + 5x_2 - x_3 + x_4 ≤ 30
       x_1 + 3x_2 + 5x_3 + x_4 ≤ 40
       3x_1 - x_2 + 3x_3 - x_4 ≤ 60

   hold.

Show how to reformulate these restrictions to fit an MIP model.

6. A contractor, Susan Meyer, has to haul gravel to three building sites. She can
purchase as much as 18 tons at a gravel pit in the north of the city and 14 tons at

one in the south. She needs 10, 5, and 10 tons at sites 1, 2, and 3, respectively.
The purchase price per ton at each gravel pit and the hauling cost per ton are
given in the table below.

Hauling Cost per Ton at Site


Pit 1 2 3 Price per Ton
North $100 $190 $160 $300
South $180 $110 $140 $420

Susan wishes to determine how much to haul from each pit to each site to minimise
the total cost for purchasing and hauling gravel.

a) Formulate and solve an appropriate model for this problem.

b) Susan now needs to hire the trucks (and their drivers) to do the hauling.
Each truck can only be used to haul gravel from a single pit to a single site.
In addition to the hauling and gravel costs specified above, there now is a
fixed cost of $150 associated with hiring each truck. A truck can haul 5 tons,
but it is not required to be full. For each combination of pit and site, there
are now two decisions to be made: the number of trucks to be used and the
amount of gravel to be hauled.
Formulate and solve an appropriate model for this problem.

Chapter 5

Sensitivity Analysis

5.1 A Brief Review of Dual LPs

For a real matrix A ∈ R^{m×n} with m rows and n columns, b ∈ R^m and c ∈ R^n, consider
the maximisation LP given in standard form

    maximise    c^T x
    subject to  Ax ≤ b        (5.1)
                x ≥ 0.

The dual problem is

    minimise    b^T y
    subject to  A^T y ≥ c     (5.2)
                y ≥ 0,

where A^T, b^T and c^T represent the transposes of A, b and c, respectively. Observe that
if we start from an LP problem and take the dual of its dual, then we get back to the
original problem, i.e. the dual of the dual is the primal.
The original problem is often referred to as the primal. Primal and dual problems
have the following relations:

• If the primal problem is a maximisation problem, the dual is a minimisation prob-


lem, while, if the primal is a minimisation problem, the dual is a maximisation
problem.

• For every variable in the primal problem, there is a constraint in the dual.

• For every constraint in the primal, there is a variable in the dual.

• The objective function coefficients in the primal are the right-hand side coefficients
in the dual, and vice versa.
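Mechanically, passing to the dual amounts to transposing A and swapping the roles of b and c. A small sketch with invented data:

```python
# For the primal  max c^T x  s.t.  Ax <= b, x >= 0,  the dual is
# min b^T y  s.t.  A^T y >= c, y >= 0:  rows of A (primal constraints)
# become columns (dual variables), and b and c swap roles.
A = [[2, 1],
     [4, 3]]          # primal constraint matrix (invented data)
b = [10, 24]          # primal right-hand sides -> dual objective coefficients
c = [3, 5]            # primal objective        -> dual right-hand sides
AT = [list(col) for col in zip(*A)]   # transpose: dual constraint matrix
print(AT)  # [[2, 4], [1, 3]]
# Dual: minimise 10*y1 + 24*y2
#       subject to 2*y1 + 4*y2 >= 3,  y1 + 3*y2 >= 5,  y1, y2 >= 0.
```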

The most important result in linear programming is a theorem that connects the
primal and dual problems. This is known as the Strong Duality Theorem of Linear
Programming and the result is stated below without proof.

Theorem. If a linear programming problem admits an optimal solution, then its dual also
admits an optimal solution. Furthermore, the optimal values of the primal problem and of
its dual coincide.

Further, duality enables us to provide convenient conditions for verifying if some


solution to an LP is optimal. Recall that if an LP has a solution that satisfies at equality n
independent linear constraints and the LP admits an optimal solution, then there exists
some optimal solution that is an extreme point of the feasible region.
Before stating the conditions we introduce two definitions. The resource constraints
that are defining for some extreme point are called the effective constraints at that point,
while the remaining resource constraints are called ineffective. If some nonnegativity
constraint, say x_j ≥ 0, is defining at some extreme point, then we say that x_j is a nonbasic
variable. The other variables are basic variables at that point.
The conditions are presented in the following result which is stated without proof.

Theorem. An extreme solution x* to an LP is optimal if and only if
there exists a feasible solution y* for the dual that satisfies the following conditions:

• y* satisfies at equality all the dual constraints corresponding to basic variables.

• For every ineffective constraint for x*, the corresponding component of y* is 0.

Furthermore, such a solution y* is optimal for the dual.

Consider a constraint of the form

    a_i^T x ≤ b_i.

Given a point x̄, the difference b_i - a_i^T x̄ is called the slack of the constraint at x̄.
Observe that if x̄ is feasible, then the slack is nonnegative. The following is known as
complementary slackness.

Theorem. Consider the primal LP (5.1) and its dual LP (5.2), where A ∈ R^{m×n}, b ∈ R^m
and c ∈ R^n. Given a feasible solution x* for the primal (5.1) and a feasible solution y*
for the dual (5.2), the following statements are equivalent:

• x* is optimal for the primal (5.1) and y* is optimal for the dual (5.2),

• for i = 1, 2, ..., n, either x_i* = 0 or the slack of the i-th constraint of A^T y* ≥ c is 0,
and
for j = 1, 2, ..., m, either y_j* = 0 or the slack of the j-th constraint of Ax* ≤ b is 0.

It should be noted that complementary slackness provides us with an approach to
decide whether a feasible solution is optimal. Let us consider a primal feasible solution
x* for this purpose. In light of the above theorem, we note that x* is optimal if, and
only if, there exists a dual feasible solution y* such that x* and y* simultaneously satisfy
the complementary slackness conditions.

5.2 Sensitivity Analysis


One of the weaknesses of linear programming is the underlying assumption that all the
coefficients are known with certainty. It is a deterministic model of the real situation.
However, often important characteristics of the real-world problem depend on random
eventualities, including for example demand, prices or resource availability.
A useful feature of linear programming is that one can analyse how sensitive the
optimal solution is to changes in the data. That means that one can evaluate, without the
need to re-solve the LP, how changing the values of certain coefficients affects the optimal
solution and its value in the objective function. This analysis is known as sensitivity
analysis. There are, however, two main limitations that one should keep in mind:
sensitivity analysis can only provide information on the impact of changing one
coefficient at a time, and only for changes within a certain range (which depends on
the problem and on the specific coefficient).

The Dual Values as Marginal Values

Consider the linear programming problem

    maximise    2x_1 + 8x_2
    subject to  2x_1 + x_2 ≤ 10
                x_1 + 2x_2 ≤ 10
                x_1 + x_2 ≤ 6
                x_1 + 3x_2 ≤ 12          (5.3)
                -3x_1 + x_2 ≤ 0
                -x_1 - 4x_2 ≤ 4
                x_1, x_2 ≥ 0.

As apparent from the diagram in Figure 5.1, the optimum is the extreme point x* defined
by constraints 4 and 5, which is the point of coordinates x_1 = 1.2, x_2 = 3.6. The
maximum objective function value is 31.2. Recall that an extreme point is a feasible
point that satisfies at equality n independent constraints from the above system.

Figure 5.1: Feasible region, shaded in gray, and optimal contour of the objective func-
tion. The direction of maximisation is represented by the arrow perpendicular to the
objective function contour.

The dual problem to (5.3) is

    minimise    10y_1 + 10y_2 + 6y_3 + 12y_4 + 0y_5 + 4y_6
    subject to  2y_1 + y_2 + y_3 + y_4 - 3y_5 - y_6 ≥ 2
                y_1 + 2y_2 + y_3 + 3y_4 + y_5 - 4y_6 ≥ 8
                y_1, y_2, y_3, y_4, y_5, y_6 ≥ 0.
By complementary slackness, the optimal dual solution y* satisfies y_1 = y_2 = y_3 = y_6 =
0, since the corresponding primal constraints are not satisfied at equality, and it must also
satisfy the first and second dual constraints at equality, since x_1 > 0 and x_2 > 0. This
yields the linear system

    y_4 - 3y_5 = 2
    3y_4 + y_5 = 8,

which uniquely determines the values y_4* = 2.6 and y_5* = 0.2. Observe that the dual
problem has minimum objective function value equal to 2.6 · 12 + 0.2 · 0 = 31.2, affirming
that the primal and dual objective function values coincide.
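The 2×2 system can be solved directly, e.g. by Cramer's rule, recovering the stated dual values and the common optimal value:

```python
# Solve  y4 - 3*y5 = 2,  3*y4 + y5 = 8  by Cramer's rule, then recover
# the dual objective value 12*y4 + 0*y5 (all other dual variables are 0).
a11, a12, b1 = 1.0, -3.0, 2.0
a21, a22, b2 = 3.0, 1.0, 8.0
det = a11 * a22 - a12 * a21          # = 1 + 9 = 10
y4 = (b1 * a22 - a12 * b2) / det     # = (2 + 24) / 10 = 2.6
y5 = (a11 * b2 - b1 * a21) / det     # = (8 - 6) / 10 = 0.2
print(y4, y5)        # y4 = 2.6, y5 = 0.2
print(12 * y4)       # dual objective value, matching the primal optimum 31.2
```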
How does the optimal value change if we were to change the value of the right-hand
side of constraint 5 by some amount θ, namely from 0 to 0 + θ? Note that θ could take
both positive and negative values. The new (primal) problem is

    maximise    2x_1 + 8x_2
    subject to  2x_1 + x_2 ≤ 10
                x_1 + 2x_2 ≤ 10
                x_1 + x_2 ≤ 6
                x_1 + 3x_2 ≤ 12
                -3x_1 + x_2 ≤ θ
                -x_1 - 4x_2 ≤ 4
                x_1, x_2 ≥ 0,

while the new corresponding dual problem is

    minimise    10y_1 + 10y_2 + 6y_3 + 12y_4 + θy_5 + 4y_6
    subject to  2y_1 + y_2 + y_3 + y_4 - 3y_5 - y_6 ≥ 2
                y_1 + 2y_2 + y_3 + 3y_4 + y_5 - 4y_6 ≥ 8
                y_1, y_2, y_3, y_4, y_5, y_6 ≥ 0.

If the amount of change θ is “too large”, there is not much one can say without
simply re-solving the problem. However, let us assume that the change is “small enough”
that the optimal solution will still be defined by constraints 4 and 5. We will later discuss
how small θ must be in order to satisfy this assumption.
Denote by z* = z*(θ) the optimal value of the modified problem. Assuming that
the optimum point is still defined by constraints 4 and 5, the optimal dual solution y*
satisfies y_1 = y_2 = y_3 = y_6 = 0 and it must also satisfy the first and second dual
constraints at equality, as x_1 > 0 and x_2 > 0 holds. Note that the constraints of the dual
are unchanged (since θ only appears in the objective function) and as such y_4* = 2.6
and y_5* = 0.2 as before.

It follows that the new optimal value is

    z*(θ) = 12y_4* + θ y_5* = 12 · 2.6 + θ · 0.2 = 31.2 + 0.2θ.

In particular, a change by θ in the right-hand side of constraint 5 leads to a change of
0.2θ in the optimal value, where 0.2 is the dual value of constraint 5.
This observed relationship is true in general. In particular, the size and sign of a dual
value give the rate of change of the objective function value as the right-hand side of a
constraint changes. In symbols, we have

    change in optimal objective value = y_i · (change in the right-hand side b_i).

Notice that the change in resource availability can be either up or down. If the i-th
constraint is a ≤ constraint and the problem is a maximisation problem, so that the dual
value y_i is nonnegative, then if the resource increases, so will the objective function
value, while if the available resource decreases, then so will the objective function value.
This is to be expected, as increasing the right-hand side of a ≤ constraint makes the
problem less constrained, and therefore there could exist solutions with higher objective
function value, whereas decreasing the right-hand side makes the problem more
constrained, and as such the previous optimal solution might become infeasible and the
new optimum would have a lower value.

Analogously, if the constraint is a ≥ constraint and the problem is a maximisation
problem, so that at the optimum the dual value y_i of the i-th constraint is nonpositive,
then if the right-hand side increases, the optimal objective value will decrease, while if
the resource decreases, the optimal objective value will increase.

If instead the dual variable has value 0, i.e. yi = 0, then for small enough changes in
the right-hand side of the i-th constraint, there will be no change in the optimal objective
function value.

Figure 5.2: Feasible region of the original and modified problem and corresponding
optimal contours of the objective function. The new optimal solution is indicated by the
black dot. Note that the optimal solution changes but it is still defined by constraints 4
and 5. The diagram corresponds to the value θ = 3.

Ranges for the Coefficients of the Right-hand Side

The interpretation of the dual value as the rate of change of the objective function that
results from a change in a resource is only true for a limited range of values of the
right-hand side of the constraint. The derivation of the ranges is easy for the ineffective
constraints. The general derivation of this range for effective constraints is outside the
scope of this course, however we will compute it for the previous example and give
a geometric intuition through diagrams. Recall that the resource constraints that are
defining for some extreme point are called the effective constraints at that point, while,
the remaining resource constraints are called ineffective.

Ineffective Constraints

If the ineffective constraint is a ≤ constraint, then clearly the right-hand side can increase
to infinity without affecting the solution. The right-hand side can decrease until it is low
enough for the constraint to be satisfied at equality, namely to the value below which
the current solution would become infeasible. Similarly, the right-hand side of an ineffective
≥ constraint could decrease to minus infinity and increase to the value above which the
current solution would become infeasible.
For example, consider the LP (5.3). At the optimal solution x_1 = 1.2, x_2 = 3.6, the
constraints 1, 2, 3, and 6 are all ineffective. By how much can we change the right-hand
side of these constraints before the optimal solution changes? For constraint 1, we have
that

    2 · 1.2 + 1 · 3.6 = 6.

Since the right-hand side of the first constraint is 10, it can decrease to 6 without
affecting the optimal solution, and it can increase to infinity. That is, the optimal solution
x_1 = 1.2, x_2 = 3.6 does not change as long as the right-hand side b_1 of the first constraint
is within the range 6 ≤ b_1 ≤ +∞.
In a similar fashion, we can compute the ranges for the other constraints, namely

    Constraint 2 :  1 · 1.2 + 2 · 3.6 = 8.4      ⟹   8.4 ≤ b_2 ≤ +∞
    Constraint 3 :  1 · 1.2 + 1 · 3.6 = 4.8      ⟹   4.8 ≤ b_3 ≤ +∞
    Constraint 6 :  -1 · 1.2 - 4 · 3.6 = -15.6   ⟹   -15.6 ≤ b_6 ≤ +∞
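These ranges can be computed in one pass by evaluating each ineffective constraint's left-hand side at the optimum x* = (1.2, 3.6):

```python
# At x* = (1.2, 3.6), the left-hand side value of each ineffective <=
# constraint of (5.3) is the lower limit to which its right-hand side
# can drop before x* becomes infeasible (the upper limit is +infinity).
x1, x2 = 1.2, 3.6
rows = {1: (2, 1, 10), 2: (1, 2, 10), 3: (1, 1, 6), 6: (-1, -4, 4)}
limits = {}
for i, (a1, a2, bi) in rows.items():
    limits[i] = a1 * x1 + a2 * x2     # slack at x* is bi - limits[i] >= 0
    print(f"constraint {i}: range {limits[i]:.1f} <= b{i} < +inf (current rhs {bi})")
```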

Effective Constraints

Let us consider the previous example (5.3), where we change the right-hand side of
constraint 5. From the diagram on the left in Figure 5.3, it is apparent that if we increase
the right-hand side of constraint 5, the optimal solution of the problem remains defined
by constraints 4 and 5 until the border of constraint 5 passes through the intersection
of constraint 4 and the nonnegativity constraint x_1 ≥ 0.

Figure 5.3: Representation of the largest and smallest values that the right-hand side of
constraint 5 can take in order for constraints 4 and 5 to remain effective.

By solving a suitable linear system, we can compute that the intersection point of
constraint 4 and x 1 = 0 is the point (0, 4). Constraint 5 passes through this point when
its right-hand side becomes
−3 · 0 + 1 · 4 = 4.

Similarly, from the diagram on the right of Figure 5.3, it is apparent that if we decrease
the right-hand side of constraint 5, the optimal solution of the problem remains defined
by constraints 4 and 5 until the boundary of constraint 5 passes through the intersection of
constraints 3 and 4. One can compute that this point has coordinates (3, 3). Constraint
5 passes through this point when its right-hand side becomes

−3 · 3 + 1 · 3 = −6.

It follows that the range for the right-hand side of constraint 5 is

6  b5  4.

It is possible to derive these bounds formally as follows. As we have seen, the primal
optimal solution is defined by constraints 4 and 5 as long as such solution is feasible (be-
cause the corresponding dual solution, as well as the dual constraints, are unchanged).
The basic solution defined by constraints 4 and 5 is the unique solution to the system

x1 + 3x2 = 12
−3x1 + x2 = θ,

where we denote by θ the new right-hand side of constraint 5. Solving the system, we
get that the solution is given by x1(θ) = 1.2 − 0.3θ and x2(θ) = 3.6 + 0.1θ. We need to

find the values of θ for which the solution is feasible. This is done by substituting x(θ)
into the primal constraints:

Constraint 1 : 2x1(θ) + x2(θ) ≤ 10 ⟺ θ ≥ −8
Constraint 2 : x1(θ) + 2x2(θ) ≤ 10 ⟺ θ ≥ −16
Constraint 3 : x1(θ) + x2(θ) ≤ 6 ⟺ θ ≥ −6
Constraint 6 : x1(θ) − 4x2(θ) ≤ 4 ⟺ θ ≥ −172/7 (≈ −24.6)
Nonnegativity : x1(θ), x2(θ) ≥ 0 ⟺ θ ≤ 4 and θ ≥ −36.

In this case, observe that θ satisfies the above conditions if and only if −6 ≤ θ ≤ 4, as
we had previously determined.
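The same feasibility computation can be automated. A minimal NumPy sketch (using the constraint data of LP (5.3)) recovers the range of θ by treating each substituted constraint as an affine function of θ:

```python
import numpy as np

# Basic solution defined by constraints 4 and 5 of LP (5.3), as the
# right-hand side of constraint 5 is replaced by theta.
def x_of_theta(theta):
    # solve x1 + 3 x2 = 12, -3 x1 + x2 = theta
    A = np.array([[1.0, 3.0], [-3.0, 1.0]])
    return np.linalg.solve(A, np.array([12.0, theta]))

# Remaining constraints written as a . x <= rhs (including x >= 0).
A_rest = np.array([[2.0, 1.0], [1.0, 2.0], [1.0, 1.0],
                   [1.0, -4.0], [-1.0, 0.0], [0.0, -1.0]])
rhs = np.array([10.0, 10.0, 6.0, 4.0, 0.0, 0.0])

lo, hi = -np.inf, np.inf
for a, r in zip(A_rest, rhs):
    # a . x(theta) is affine in theta: c0 + c1 * theta <= r
    c0 = a @ x_of_theta(0.0)
    c1 = a @ x_of_theta(1.0) - c0
    if c1 > 1e-12:            # gives an upper bound on theta
        hi = min(hi, (r - c0) / c1)
    elif c1 < -1e-12:         # gives a lower bound on theta
        lo = max(lo, (r - c0) / c1)

print(round(lo, 6), round(hi, 6))
```

The binding conditions come from constraint 3 below and the nonnegativity of x1 above, recovering the range −6 ≤ θ ≤ 4.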

Coefficients of the Objective Function

In this subsection, we study how changes in one of the objective function coefficients
affect the value of the optimal solution. As before, if the coefficient falls within a certain
range, the new optimal objective value can be computed without having to re-solve
the problem. Computing this range is a straightforward task if the change occurs to the
coefficient of a variable that is nonbasic, while it is more complicated for basic variables.
Recall that if some nonnegativity constraint, say xj ≥ 0, is defining at some extreme
point, then we say that x j is a nonbasic variable. The other variables are basic variables
at that point.

Nonbasic Variables

If the problem is a maximisation and a variable xj is nonbasic, then clearly its coefficient
cj in the objective function can decrease to minus infinity and the variable xj will
remain at the value zero. As the coefficient cj increases, there will be a value at which
the solution becomes multiply optimal; above that value, the current solution will be
suboptimal, and the new optimal solution will have the variable xj at a nonzero value.

Consider the LP
maximise 8x1 + 3x2
subject to 2x1 + x2 ≤ 10
x1 + 2x2 ≤ 10
x1 + x2 ≤ 6
x1 + 3x2 ≤ 12
−3x1 + x2 ≤ 0
x1 − 4x2 ≤ 4
x1, x2 ≥ 0.

At the optimal solution, x 1 is basic with a value of 5 and x 2 is nonbasic. The effective
constraint is constraint 1 with dual value y1 = 4. Suppose we replace the objective
function coefficient of the nonbasic variable x 2 with another number c2 . Since all other
coefficients in the LP are unchanged, in order to check optimality of the solution x 1 = 5,
x 2 = 0, we only need to confirm that the dual constraint relative to the variable x 2
remains satisfied by the dual solution. That is

y1 + 2y2 + y3 + 3y4 + y5 − 4y6 ≥ c2.

Thus we need to confirm that

1 · 4 + 2 · 0 + 1 · 0 + 3 · 0 + 1 · 0 − 4 · 0 = 4 ≥ c2.

The solution would be multiply optimal if the value of c2, currently 3, were to increase to
4. Beyond the value of 4, the solution would be non-optimal. Thus the upper limit of
the value of c2 is 4 and its lower limit is −∞.
More generally, suppose we have a maximisation LP. Given some optimal extreme
point x*, let y* be an optimal dual solution, and let xj be a nonbasic variable for x*. The
solution x* remains optimal if we change the corresponding objective function coefficient
cj within the range

−∞ ≤ cj ≤ a1j y1* + a2j y2* + · · · + amj ym*.

If instead we were given a minimisation LP, then the solution x* remains optimal if we
change the corresponding objective function coefficient cj for nonbasic xj within the
range

a1j y1* + a2j y2* + · · · + amj ym* ≤ cj ≤ +∞.

Basic Variables

Consider once more the problem (5.3), represented in Figure 5.1. At the optimal
solution x* of coordinates x1 = 1.2 and x2 = 3.6, both variables are basic. Suppose we
change the coefficient of x1 in the objective function, which is 2x1 + 8x2. As the coefficient
c1 of x1 increases, the optimal contour of the objective function rotates clockwise
around the point x* until it becomes parallel to constraint 4, namely x1 + 3x2 ≤ 12.
This happens when c1 = 8/3. At this value of c1, the solution x* is no longer the unique
optimum, while for c1 > 8/3 the optimal solution becomes the point x′ at the intersection
of constraints 3 and 4. This is illustrated in Figure 5.4(a).

(a) When c1 is increased to 8/3, the optimal extreme points are x* and x′. (b) When c1 is decreased to −24, the optimal extreme points are x* and x″.

Figure 5.4: Effect of changes in objective coefficient of a basic variable.

Similarly, as the coefficient of x1 decreases, the optimal contour of the objective
function rotates anticlockwise around the point x* until it becomes parallel to constraint
5, which is −3x1 + x2 ≤ 0. This happens when c1 = −24. At this value of c1, the solution
x* is no longer the unique optimum, while for c1 < −24 the optimal solution becomes the
point x″ at the intersection of constraints 5 and 6. This is illustrated in Figure 5.4(b).
Therefore, the solution x* remains optimal for all values of c1 in the range

−24 ≤ c1 ≤ 8/3.
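One way to confirm this range is simply to re-solve the LP for different values of c1. A sketch using scipy.optimize.linprog (which minimises, so the objective is negated); the constraint data is that of LP (5.3):

```python
import numpy as np
from scipy.optimize import linprog

# Constraints of LP (5.3); the objective is c1*x1 + 8*x2 (maximisation).
A_ub = [[2, 1], [1, 2], [1, 1], [1, 3], [-3, 1], [1, -4]]
b_ub = [10, 10, 6, 12, 0, 4]

def argmax(c1):
    # negate the objective to turn maximisation into minimisation
    res = linprog(c=[-c1, -8.0], A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
    return res.x

print(np.round(argmax(0.0), 4))  # inside (-24, 8/3): stays at (1.2, 3.6)
print(np.round(argmax(3.0), 4))  # above 8/3: moves to x' = (3, 3)
```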

The following example demonstrates how AMPL allows us to perform sensitivity


analysis in a real-world scenario to analyse various “what-if” questions.

Example. (Computer manufacturing) A computer manufacturer produces 5 families of


laptops. The company has to cope with shortages from suppliers of two components, namely
solid state drives (SSDs) and memory boards. The table below illustrates the 5 types of
computers and the use of components.

System L1 L2 L3 L4 L5
Price (£) 2000 1400 1000 800 500
# CPUs 1 1 1 1 1
# SSDs 1 0.7 0 0.3 0
# Memory Boards 6 4 4 2 2

For example, 7 out of 10 laptops from family L2 make use of SSDs (the remaining
3 out of 10 use regular hard disks) and the average price of a model in family
L2 is £1400. The following difficulties are anticipated for the next quarter:

• the supplier of CPUs can provide at most 8000 units,

• the supplier of SSDs can provide at most 3000 units, and

• the supplier of memory boards can provide at most 14000 units.

A demand of 5000 units is estimated for the first two types of laptops, while a demand
of 4000 is estimated for the last three types. Furthermore, there are already 700 orders
placed for laptops L2 and L5.
The company would like to devise a production plan for the next quarter in order to
maximise their profit. Further, the company wants to analyse the following “what-if” sce-
narios:

• The company could purchase 2000 extra memory boards from a different supplier, at
a cost of £200,000. Should they consider it?

• Marketing estimates that spending £150,000 in advertising would boost demand for
the lower priced laptops L3, L4, L5 by one thousand units in the next quarter. Should
the company invest the money in advertising?

• The company realises that it is losing money on its cheapest line of laptops, so they
intend to scrap production. However, the company would face a £120,000 penalty
for the missed delivery of the orders that have already been placed. What should they
do?

• The company realises that it has priced its top-of-the-range laptop too low. At what
price should they sell it in order for it to become profitable?

• Higher labour prices in the factory producing laptops of type 4 will reduce profit on
each unit by £100. Should the company consider changing its production plan? How
will this affect its profits?

Model this scenario and provide recommendations on the decisions that should be made
in each of the “what-if” scenarios above.

5.3 Exercises for Self-Study


1. For each of the following primal LPs, write the corresponding dual problem.

a) The LP
maximise 4x 1 + x 2 + 3x 3
subject to x 1 + 4x 2 1
3x 1 x2 + x3  3
3x 1 x2 + x3  3
x1 x2, x3 0.
b) The LP
minimise x1 + 7x2 + 17x3
subject to x1 + 4x2 ≥ 13
x1 − 11x2 + x3 ≤ 3
x1, x2, x3 ≥ 0.
c) The LP
maximise x1 − 2x2
subject to x1 + 2x2 − x3 + x4 ≥ 0
4x1 + 3x2 + 4x3 − 2x4 ≤ 3
x1 − x2 + 2x3 + x4 = 1
x2, x3 ≥ 0.

2. For each of the following linear programming models, give your recommendation
on which is probably the more efficient way to obtain an optimal solution: applying
the simplex method directly to the primal problem, or applying the simplex method
to the dual problem instead. Justify your answer.

a) The LP
maximise 10x1 + 4x2 + 7x3
subject to 3x1 − 2x2 + 2x3 ≤ 25,
x1 − 2x2 + 3x3 ≤ 25,
5x1 + x2 + 2x3 ≤ 40,
x1 + x2 + x3 ≤ 90,
2x1 − x2 + x3 ≤ 20,
x1, x2, x3 ≥ 0.

b) The LP
maximise 2x1 + 5x2 + 3x3 + 4x4 + x5
subject to x1 + 3x2 + 2x3 + 3x4 + x5 ≤ 6
4x1 + 6x2 + 5x3 + 7x4 + x5 ≤ 15
x1, x2, . . . , x5 ≥ 0.

3. Construct a pair of primal and dual problems, each with two decision variables and
two resource constraints, such that the primal problem has no feasible solutions
and the dual problem has an unbounded objective function.

4. Consider the LP
maximise 3x1 − 8x2
subject to x1 − 2x2 ≤ 10
x1, x2 ≥ 0.

a) Construct the dual problem and find its optimal solution by inspection.
b) Use the complementary slackness property and the optimal solution to the
dual problem to find the optimal solution to the primal problem.
c) Suppose that c1 , the coefficient of x 1 in the primal objective function, actually
can have any value in the model. For what values of c1 does the dual problem
have no feasible solutions? For these values, what does duality theory then
imply about the primal problem?

5. Consider the maximisation LP

maximise x1 + 2x2 + x3 + x4
subject to 2x1 + x2 + 5x3 + x4 ≤ 8
2x1 + 2x2 + 4x4 ≤ 12
3x1 + x2 + 2x3 ≤ 18
x1, x2, x3, x4 ≥ 0.

a) Solve this LP using the solver via AMPL.


b) What will be an optimal solution to the problem if the objective function is
changed to 3x 1 + 2x 2 + x 3 + x 4 ?
c) What will be an optimal solution to the problem if the objective function is
changed to x 1 + 2x 2 + 0.5x 3 + x 4 ?
d) What will be an optimal solution to the problem if the second constraint’s
right-hand side is changed to 26?

6. SugarCo can manufacture three types of candy bar. Each candy bar consists entirely
of sugar and chocolate. The compositions of each type of candy bar and the profit
earned from each candy bar are shown in the table below. Fifty oz of sugar and
100 oz of chocolate are available.

Amount of Sugar (ounces) Amount of Chocolate (ounces) Profit (cents)


Bar 1 1 2 3
Bar 2 1 3 7
Bar 3 1 1 5

a) Formulate the LP that SugarCo should solve.

b) Solve this LP.

c) If 60 oz of sugar were available, what would SugarCo's profit be? How many
of each candy bar should they make? Could these questions be answered if
only 30 oz of sugar were available?

d) Suppose a type 1 candy bar used only 0.5 oz of sugar and 0.5 oz of chocolate.
Should SugarCo make type 1 candy bars?

e) SugarCo is considering making type 4 candy bars. A type 4 candy bar earns
17 cents profit and requires 3 oz of sugar and 4 oz of chocolate. Should
SugarCo manufacture any type 4 candy bars?

7. For each of the objective coefficients in the previous exercise, find the range of
values for which the optimal solution remains optimal.

Chapter 6

Optimisation Problems on Graphs

In this chapter, we introduce an important class of linear programming problems that are
known as network flow problems. These problems are important for various reasons. One
reason is that they can be represented not only as a linear programming problem, but
additionally as a specific mathematical object called a graph. It turns out that looking
at problems in terms of graphs can be a helpful way of analysing them which often
provides a fresh perspective on the problem.
Moreover, because of their mathematical structure, network flow problems can be
sometimes solved much faster than general linear programming problems. From a prac-
tical perspective, the modeler is therefore in a very convenient situation if it is possible
to represent a problem as a network flow problem. Finally, network flow problems are
guaranteed to have solutions that are integer if the right-hand sides of the constraints are
integer. This can be very important in practical applications when fractional solutions
do not make sense. Throughout this chapter, we consider some of the most relevant
types of network flow problems including minimum cost flow problems and transporta-
tion problems.

6.1 Graphs

We can represent any network flow problem as a graph. Here we introduce some basic
terminology about graphs before we come back to different optimisation problems.
An undirected graph (or simply a graph), denoted by G = (V, E) , consists of two
(finite) sets V and E. The elements of V are called the vertices or nodes and the elements
of E are called the edges of the graph G. Each edge is an unordered pair of vertices,
which are called its endnodes.
Consider for example the undirected graph with

V = {a, b, c, d} and E = {{a, b}, {b, c}, {b, d}, {a, d}}.



In order to simplify notation, we will usually just write ab instead of {a, b} to represent
the unordered pair of vertices. It should be emphasised that in an undirected graph
ab and ba are the same edge. Graphs have natural visual representations in which each
vertex is represented by a point and each edge by a line joining its endnodes. Figure 6.1
illustrates this undirected graph.

Figure 6.1: The representation of the undirected graph ({a, b, c, d}, {ab, bc, bd, ad}).

It should be noted that for the above graph, illustrated by Figure 6.1, some pairs of
vertices, including a, d and b, c, are “connected” in the sense that there exists some edge
connecting them. This observation inspires the following definition. Two vertices x, y
of a graph G are said to be adjacent if xy ∈ E. If e = xy is an edge of G, then we say
that e is incident with x and y.
Notice that if x and y are adjacent vertices in an undirected graph, then it follows
that y and x are adjacent. Informally, this tells us that the above notion of connected
is commutative over undirected graphs. From a modelling viewpoint, it turns out that
this commutativity property is rather restrictive as we can only represent relationships
that are “symmetrical”. In order to overcome this limitation, we introduce the following
definition of a directed graph.
A directed graph (or digraph, or network), denoted by G = (N, A), consists of two
(finite) sets N and A. The elements of N are called the vertices or nodes and the elements
of A are the arcs of the digraph G, where each arc is an ordered pair of vertices. Consider
for example the directed graph with

N = {a, b, c, d} and A = {(a, b), (b, c), (b, d), (d, b), (a, d)}.

Observe that (b, d) is not the same as (d, b) because (b, d) is the arc from b to d, while
(d, b) is the arc from d to b. In particular, making use of arcs in this manner has enabled

us to overcome the aforementioned restrictive commutativity property. Every arc in a
directed graph has a specific direction, which can be visually represented by an
arrow. Figure 6.2 illustrates this directed graph.

Figure 6.2: The representation of the above directed graph, namely ({a, b, c, d},
{(a, b), (b, c), (b, d), (d, b), (a, d)}).

It will be useful in applications for us to formalise the above intuitive notion of some
graph or digraph being “connected”. In an undirected graph, a path is a sequence of
vertices v1, v2, . . . , vk ∈ V such that {vi, vi+1} is an edge for each i = 1, 2, . . . , k − 1. In
other words, a path is a sequence of vertices with the property that each vertex in the
sequence is adjacent to the vertex next to it within the sequence. A graph G is said to
be connected if any two vertices of G are joined by a path.
In a similar fashion, in a directed graph, a directed path (or simply a path) is a
sequence of vertices v1, v2, . . . , vk such that (vi, vi+1) is an arc for each i = 1, 2, . . . , k − 1;
this is a path from v1 to vk. In other words, a directed path is a path with the added
restriction that the arcs must all be directed in the same direction. A directed graph D
is said to be connected if, for any two nodes x, y, there exists both a directed path from
x to y and a directed path from y to x.

6.2 Minimum Cost Flow Problems

During this section, we introduce the minimum cost network flow problem in a general
setting before providing examples. For the minimum cost network flow problem, the
inputs are:

• a digraph G = (N , A) , where N and A denote the nodes and arcs of G, respectively,



• a set S ⊆ N of supply nodes,

• a set D ⊆ N of demand nodes,

• supplies ai ≥ 0 available at each node i ∈ S,

• demands bi ≥ 0 at each node i ∈ D,

• costs cij per unit of “flow” travelling on arc (i, j), and

• lower and upper capacity bounds ℓij and uij for arc (i, j), respectively.

It should be noted that the “flow” corresponds to whatever it is that is supplied and
demanded (which could, for example, be products in a logistics network or electricity
in an electrical distribution system) as it passes through the given network. Suppose that
each lower capacity bound is no larger than the corresponding upper bound, i.e. that

ℓij ≤ uij.

It will often be the case in applications that flows must be nonnegative, meaning that
the lower capacity bounds satisfy ℓij ≥ 0. Further, suppose that the total supply equals the
total demand, namely

∑i∈S ai = ∑i∈D bi.
It should be noted that we can make this assumption without loss of generality provided
the problem is feasible, which will be explained in detail later. The aim of the problem is
to send flow from the specified supply nodes S to demand nodes D at minimum cost in
order to satisfy the demands, while, not exceeding capacities and satisfying the bounds
on the flow on each arc.
For example, suppose the directed graph G = (N , A) represents some supply chain
network, where the set S ✓ N represents the warehouses and D ✓ N represents the
stores. Further, suppose that ai represents the supply of a certain product available at
warehouse i ∈ S and bi represents the demand at each store i ∈ D. In this case, the
problem is that we wish to send products from warehouses to stores to meet demand at
minimum total cost, while not violating the capacity constraints on the arcs. Note that
the capacity constraints could in this case correspond to, say, the capacities of the
vehicles available in different locations.

The decision variables xij for every (i, j) ∈ A are defined to capture the amount of
flow on arc (i, j). In this scenario, we need to solve the following LP

minimise ∑(i,j)∈A cij xij
subject to ∑j:(j,i)∈A xji − ∑j:(i,j)∈A xij = −ai for every i ∈ S,
∑j:(j,i)∈A xji − ∑j:(i,j)∈A xij = 0 for every i ∈ N\(S ∪ D),
∑j:(j,i)∈A xji − ∑j:(i,j)∈A xij = bi for every i ∈ D,
ℓij ≤ xij ≤ uij for every (i, j) ∈ A.


The objective is to minimise the total cost of the flows, namely the sum of the flows
multiplied by the costs on the arcs. There are then three different types of flow-balance
constraints. For the supply vertices i ∈ S, the net out-flow must be at most the available
supply. For the demand vertices i ∈ D, the net in-flow must be at least the demand.
For the other vertices i ∈ N\(S ∪ D), there is no supply or demand and hence the net
out-flow must be equal to the net in-flow. It should be noted that our assumption that
the total supply equals the total demand means that we actually require equality at each
node. The final set of constraints are the aforementioned upper and lower capacities
bounds on the flows.
Note that for the supply nodes S we wrote the constraint in the form “net in-flow =
minus supply”. The reason for this is that all flow-balance constraints then have the
same left-hand sides and, as such, this allows us to express the corresponding constraint
matrix in a simpler way. This will be explained in more detail below. Notice that the
number of decision variables is the number of arcs, while the number of flow-balance
constraints is the number of vertices.
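The formulation above can be solved directly with an off-the-shelf LP solver. A minimal sketch using scipy.optimize.linprog; the instance below (nodes, arcs, costs, capacities) is hypothetical and chosen only for illustration:

```python
from scipy.optimize import linprog

# A small hypothetical instance: node 1 supplies 10 units, node 4 demands 10,
# nodes 2 and 3 are transhipment nodes.  Arcs: (tail, head, cost, capacity);
# capacity None means "no upper bound".
arcs = [(1, 2, 1.0, 6), (1, 3, 2.0, None), (2, 3, 1.0, None),
        (2, 4, 3.0, None), (3, 4, 1.0, None)]
nodes = [1, 2, 3, 4]
d = {1: -10, 2: 0, 3: 0, 4: 10}   # net in-flow required at each node

# Flow-balance rows: +1 for arcs entering the node, -1 for arcs leaving it.
A_eq = [[(1 if j == v else 0) - (1 if i == v else 0) for (i, j, _, _) in arcs]
        for v in nodes]
b_eq = [d[v] for v in nodes]
res = linprog(c=[c for (_, _, c, _) in arcs], A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, u) for (_, _, _, u) in arcs])
print(res.fun)   # minimum total cost
```

For this data the minimum cost is 30: six units are routed 1 → 2 → 3 → 4 (the arc (1, 2) is at capacity) and four units 1 → 3 → 4.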
In matrix form, we can write the above LP as

minimise cT x
subject to AG x = d
ℓ ≤ x ≤ u,
where AG is a matrix that depends on the directed graph G with a row for every node
and a column for every arc, d is the vector with a component for every node where each
component is equal to the negative supply, the demand or zero depending on whether
the component corresponds to a supply node, a demand node or neither, respectively
and ℓ and u are vectors whose entries are the corresponding lower and upper capacity
bounds on each arc, respectively.

Recall that each decision variable xij corresponds to arc (i, j). Further, recall that
the columns and rows of AG correspond to the arcs and the nodes of G, respectively. It
follows in light of the structure of the problem that the column of AG corresponding to
arc (i, j) has exactly two non-zero elements, namely one −1 in the row corresponding
to node i and one +1 in the row corresponding to node j. The matrix AG is called the
incidence matrix of the digraph G. In other words, entry (i, e) of the incidence matrix
AG is −1 if arc e leaves node i, +1 if arc e enters node i, and 0 otherwise.
This will be demonstrated during the examples below.
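In code, an incidence matrix can be assembled directly from an arc list; a small sketch applied to the digraph of Figure 6.2:

```python
import numpy as np

def incidence_matrix(nodes, arcs):
    """Incidence matrix A_G: one row per node, one column per arc,
    with -1 at the tail of each arc and +1 at its head."""
    idx = {v: k for k, v in enumerate(nodes)}
    A = np.zeros((len(nodes), len(arcs)), dtype=int)
    for col, (i, j) in enumerate(arcs):
        A[idx[i], col] = -1   # arc leaves node i
        A[idx[j], col] = +1   # arc enters node j
    return A

# The digraph of Figure 6.2 (nodes a-d, arcs ab, bc, bd, db, ad).
nodes = ["a", "b", "c", "d"]
arcs = [("a", "b"), ("b", "c"), ("b", "d"), ("d", "b"), ("a", "d")]
print(incidence_matrix(nodes, arcs))
```

Note that every column sums to zero, since each arc leaves exactly one node and enters exactly one node.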


The following example demonstrates how we can construct an incidence matrix
when we are given a directed graph, namely the digraph shown in Figure 6.3.

Example. (Constructing an incidence matrix from a digraph) Consider the directed graph
illustrated in Figure 6.3. The digraph can be represented by the incidence matrix given in
the table below.

Figure 6.3: An example of a directed graph with five nodes.



The next example demonstrates how we can construct an LP for the minimum cost
network flow problem from a directed graph, namely the digraph shown in Figure 6.4.

Example. (Constructing an LP from a digraph) Consider the digraph illustrated in Figure


6.4 below. In this case, node 1 is the origin with a supply of 20, nodes 4 and 5 are desti-
nations with respective demands 5 and 15, while nodes 2 and 3 are transhipment nodes,
i.e. nodes where goods are neither supplied nor demanded. Further, each arc in our directed
graph has a cost, e.g. arc (1, 3) has a cost of 4. In addition, some of the arcs have an upper
capacity (which are shown in squares), e.g. arc (1, 3) has a capacity of 8. In this case, the
lower capacities are zero for all arcs, meaning that we require the flows to be nonnegative.
The objective is to find the minimum cost pattern of flows that satisfies the demands.

Figure 6.4: An example of a directed graph with five nodes, where the capacities (within
squares) and costs associated with each arc are shown.

6.3 Integer Solutions to Minimum Cost Flow Problems


There may be many situations when one additionally requires that the flows are integer.
The flows for example may represent indivisible quantities, such as items of products,
or even binary decisions, as we will see in other examples. Recall (from Chapter 2) that
integer and mixed-integer programming is in general NP-hard, which informally
means that one should not expect to solve a random instance of the problem in polynomial
time unless P = NP (see e.g. [13]). Further, one would expect that the number
of steps “blows up” exponentially with the number of decision variables, corresponding
here to the number of arcs in our digraph. It is not hard to imagine that in
countless applications the number of arcs is likely large and, as such, it is natural to
ask how one deals with this complexity.
An important observation is that in situations where one additionally requires the
flows to be integer, all the given supplies ai for i ∈ S, demands bi for i ∈ D and
capacities ℓij, uij for (i, j) ∈ A should be integer.
This observation allows us to state the following incredibly useful property for min-
imum cost flow problems. The useful property is that if all supplies, demands and ca-
pacities are integers, then all the extreme points of the linear programming formulation
of the minimum cost flow problem are integer. This is formalised more precisely in the
following theorem.

Theorem (Integrality Theorem for Incidence Matrices of Digraphs). Let G = (N, A) be a
directed graph. Let d, f be vectors with entries di, fi ∈ ℝ ∪ {+∞, −∞} for all i ∈ N and
let ℓ, u, c be vectors with entries ℓe, ue ∈ ℝ ∪ {+∞, −∞} and ce ∈ ℝ for all e ∈ A. Consider the LP

minimise cT x
subject to d ≤ AG x ≤ f (6.1)
ℓ ≤ x ≤ u.

If all right-hand sides d, f, ℓ, u of the constraints are integer, then all the extreme solutions
to the LP are integer. If all entries of c are integer, then all extreme optimal dual solutions
of the above LP are also integer (even if d, f, ℓ, u are not integer).

It should be noted that the minimum cost network flow problem corresponds to a
special case of the above theorem, where d = f. The above theorem ensures that, whenever
we need the values of the flows to take integer values, we get this “for free” because any
optimal extreme solution to the LP will have integer components. Furthermore, the
second part of the statement says that if the cost function c is also integer, then the
extreme optimal dual solutions are integer as well. This is a very useful property as

in many situations where an integer solution is sought, it is not necessary to resort to
costly integer programming algorithms like branch-and-bound; rather, we can find
an integer solution in polynomial time.
We will not present a detailed proof of the above theorem here. Despite this, note for
completeness that the property of AG which ensures that any optimal extreme solution to
the LP will have integer components is that the incidence matrix of any digraph is totally
unimodular, where a matrix A ∈ ℝ^{m×n} is totally unimodular if every square submatrix
has determinant −1, 0 or 1. The result then follows upon making use of the celebrated
Cramer’s rule [2, Section 3.4].
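For small matrices, total unimodularity can be verified by brute force (the check enumerates all square submatrices, so it is exponential in the matrix size and intended only for illustration); here it is applied to the incidence matrix of the digraph of Figure 6.2:

```python
import itertools
import numpy as np

def is_totally_unimodular(A):
    """Brute-force check: every square submatrix has determinant -1, 0 or 1."""
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for rows in itertools.combinations(range(m), k):
            for cols in itertools.combinations(range(n), k):
                det = round(np.linalg.det(A[np.ix_(rows, cols)]))
                if det not in (-1, 0, 1):
                    return False
    return True

# Incidence matrix of the digraph of Figure 6.2 (arcs ab, bc, bd, db, ad).
A_G = np.array([[-1,  0,  0,  0, -1],
                [ 1, -1, -1,  1,  0],
                [ 0,  1,  0,  0,  0],
                [ 0,  0,  1, -1,  1]])
print(is_totally_unimodular(A_G))  # -> True
```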
The following example demonstrates how within a production and inventory based
scenario, a minimum cost flow formulation can be used.

Example. (Product production) Consider a factory producing a particular product, the


demand for which varies over time. There are three methods of adjusting the volume of
output to meet the fluctuating demand, namely:

i) regular production up to a limit of r units per month at a cost of £a per unit,

ii) overtime production up to a limit of v units per month at a cost of £b per unit, or

iii) store the product from one month to the next at a cost of £c per unit per month.

The planning horizon for this factory is 3 months, where the demand is d1 , d2 and d3 units
in each of these months, respectively. The cost of meeting demand at minimum total cost
can be formulated as a minimum cost flow problem.
Figure 6.5 illustrates the network structure. The nodes M1, M2 and M3 are the three
months. The nodes RT1, RT2 and RT3 are regular time in those three months, while OT1,
OT2 and OT3 are overtime.

Figure 6.5: The network corresponding to the production and inventory example.
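A sketch of this formulation with scipy.optimize.linprog, using hypothetical data (r = 100, v = 50, a = 10, b = 15, c = 2 and demands 80, 150, 60); the storage variables play the role of the month-to-month arcs in the network:

```python
from scipy.optimize import linprog

# Hypothetical data: regular cap r=100 at £10/unit, overtime cap v=50 at
# £15/unit, storage at £2/unit/month, demands 80, 150, 60.
r, v = 100, 50
a, b, c = 10.0, 15.0, 2.0
demand = [80, 150, 60]

# Variables: R1, R2, R3 (regular), O1, O2, O3 (overtime), s1, s2 (storage).
cost = [a] * 3 + [b] * 3 + [c] * 2
A_eq = [  # flow balance: production + stock in - stock out = demand
    [1, 0, 0, 1, 0, 0, -1,  0],
    [0, 1, 0, 0, 1, 0,  1, -1],
    [0, 0, 1, 0, 0, 1,  0,  1],
]
bounds = [(0, r)] * 3 + [(0, v)] * 3 + [(0, None)] * 2
res = linprog(c=cost, A_eq=A_eq, b_eq=demand, bounds=bounds)
print(res.fun)   # minimum total cost of meeting demand
```

For these numbers the optimum produces at the regular limit in month 1 and stores 20 units (cost 12 per unit, cheaper than the 15 per unit of overtime), so only 30 units of month 2 overtime are needed, for a total cost of 3090.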

The next example from [1] demonstrates how a network flow model can be used
within a medical setting.

Example. (Network flow for the left ventricle) This application describes a network flow
model for reconstructing the three-dimensional shape of the left ventricle from biplane an-
giocardiograms that the medical profession uses to diagnose heart diseases. To conduct this
analysis, we first reduce the three-dimensional reconstruction problem into several two-
dimensional problems by dividing the ventricle into a stack of parallel cross sections. Each
two-dimensional cross section consists of one connected region of the left ventricle.

Figure 6.6: Using X-ray projections to measure a left ventricle.



During a cardiac catheterization, doctors inject a dye known as Roentgen contrast agent
into the ventricle; by taking X-rays of the dye, they would like to determine what portion
of the left ventricle is functioning properly, i.e. permitting the flow of blood. Conventional
biplane X-ray installations do not permit doctors to obtain a complete picture of the left
ventricle; rather, these X-rays provide one-dimensional projections that record the total
intensity of the dye along two axes (see Figure 6.6). The problem is to determine the dis-
tribution of the cloud of dye within the left ventricle and thus the shape of the functioning
portion of the ventricle, assuming that the dye mixes completely with the blood and fills the
portions that are functioning properly.

We can conceive of a cross section of the ventricle as a p × r binary matrix, where a 1 in
a position indicates that the corresponding segment allows blood to flow and a 0 indicates
that it does not permit blood to flow.
that it does not permit blood to flow. The angiocardiograms give the cumulative intensity
of the contrast agent in two planes which we can translate into row and column sums of
the binary matrix. The problem is then to construct the binary matrix given its row and
column sums.

This can be modelled by a network with a supply node for every row, where the supply
at a given row equals the cumulative dye intensity in that row, and a demand node for every
column, where the demand at a given column equals the cumulative dye intensity in that
column. Each entry (i, j) of the matrix corresponds to an arc (i, j) in the network, with
lower and upper capacities 0 and 1, respectively. An integer flow will therefore correspond to a binary assignment to the entries
of the matrix so that the row sums and column sums equal the corresponding cumulative
dye intensities.

To reconstruct a plausible shape of the left ventricle, we can use a priori information:
after some small time interval, the cross sections might resemble cross sections determined in
a previous examination. In consequence, we might attach a probability pi j that a solution
will contain an element (i, j) of the binary matrix and might want to find a feasible solution
with the largest possible total probability. This problem is equivalent to a minimum cost
flow problem.
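A toy sketch of the reconstruction step with scipy.optimize.linprog; the 2 × 3 cross section, its row and column sums, and the prior probabilities pij below are all hypothetical. Total unimodularity of the constraint matrix is what guarantees the extreme optimal solution is a 0/1 matrix:

```python
from scipy.optimize import linprog

# Hypothetical 2x3 cross section: row sums (2, 1), column sums (1, 1, 1),
# and a prior probability p_ij for each cell; maximise total probability.
row_sums = [2, 1]
col_sums = [1, 1, 1]
p = [0.9, 0.2, 0.4,    # row 1
     0.1, 0.8, 0.3]    # row 2; variables ordered row by row

A_eq, b_eq = [], []
for i in range(2):     # row-sum constraints
    A_eq.append([1 if k // 3 == i else 0 for k in range(6)])
    b_eq.append(row_sums[i])
for j in range(3):     # column-sum constraints
    A_eq.append([1 if k % 3 == j else 0 for k in range(6)])
    b_eq.append(col_sums[j])

# maximise sum p_ij x_ij  ==  minimise -sum p_ij x_ij
res = linprog(c=[-q for q in p], A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 6)
print([round(x) for x in res.x])   # an integral 0/1 assignment
```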

6.4 If Total Demand and Total Supply are Different in a Minimum Cost Flow Problem

Recall that during our statement of the minimum cost flow problem we assumed that
total supply was equal to total demand. This assumption was not necessary and, as
such, we could simply state the minimum cost network flow problem as the LP

\begin{align*}
\text{minimise} \quad & \sum_{(i,j) \in A} c_{ij} x_{ij} \\
\text{subject to} \quad & \sum_{j:(j,i) \in A} x_{ji} - \sum_{j:(i,j) \in A} x_{ij} \ge -a_i && \text{for every } i \in S, \\
& \sum_{j:(j,i) \in A} x_{ji} - \sum_{j:(i,j) \in A} x_{ij} \ge 0 && \text{for every } i \in N \setminus (S \cup D), \\
& \sum_{j:(j,i) \in A} x_{ji} - \sum_{j:(i,j) \in A} x_{ij} \ge b_i && \text{for every } i \in D, \\
& \ell_{ij} \le x_{ij} \le u_{ij} && \text{for every } (i,j) \in A.
\end{align*}

Note that if total demand is strictly larger than total supply, then the above problem
is unsurprisingly infeasible. If instead the total supply is strictly larger than the total
demand, i.e. $\sum_{i \in S} a_i > \sum_{i \in D} b_i$, then we can write the problem in the original (equality)
form by rebalancing supply and demand: we introduce a new "dummy" demand
node, say $z_{\text{dummy}}$, whose demand is exactly the excess supply $\sum_{i \in S} a_i - \sum_{i \in D} b_i$, together
with a new arc $(i, z_{\text{dummy}})$ with cost 0 from every supply node $i \in S$. It should
be noted that in this "rebalanced" network, it is indeed the case that total demand is
equal to total supply and the excess capacity at each supply node is sent at cost 0 to the
dummy node. This explains why we can indeed assume that total supply is equal to total
demand without loss of generality. It should be noted that the constraints corresponding
to $N \setminus (S \cup D)$ could be written as equality constraints (with right-hand side 0) if all costs
$c_{ij}$ are positive; if this is not the case, however, there may be a benefit to sending
additional flow.
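The rebalancing step can be sketched in a few lines. The supplies, demands and costs below are illustration data, and `z_dummy` plays the role of the dummy demand node described above.

```python
# Sketch: rebalance a min cost flow instance whose total supply exceeds
# total demand by adding a dummy demand node (illustration data only).

supply = {"s1": 7, "s2": 5}          # a_i for the supply nodes
demand = {"d1": 4, "d2": 6}          # b_i for the demand nodes
costs = {("s1", "d1"): 3, ("s1", "d2"): 2,
         ("s2", "d1"): 1, ("s2", "d2"): 4}

excess = sum(supply.values()) - sum(demand.values())
if excess > 0:
    demand["z_dummy"] = excess       # dummy node absorbs the excess supply
    for i in supply:                 # zero-cost arcs from every supply node
        costs[(i, "z_dummy")] = 0

# after rebalancing, total supply equals total demand
assert sum(supply.values()) == sum(demand.values())
print(demand["z_dummy"])
```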

6.5 The Transportation Problem

Suppose that there are known quantities of a (homogeneous) commodity available at


m sources in set S = {1, 2, . . . , m} and known quantities required at n destinations in set
D = {1, 2, . . . , n}. Suppose that the cost of transport from each source to each destination
is known and that the problem is to find the pattern of distribution from sources to
destinations that minimises transport costs.
Let $c_{ij}$ denote the cost of transporting a unit of the commodity between source $i \in S$
and destination $j \in D$. Let $a_i$ denote the availability of the commodity at source
$i \in S$ and let $b_j$ be the demand at destination $j \in D$. Suppose for feasibility that total
demand is not greater than total supply, namely that
$$\sum_{j \in D} b_j \le \sum_{i \in S} a_i.$$
The nonnegative variable $x_{ij}$ is the amount of the commodity to be transported from
source $i$ to destination $j$. The problem can be formulated as the LP
\begin{align*}
\text{minimise} \quad & \sum_{i \in S} \sum_{j \in D} c_{ij} x_{ij} \\
\text{subject to} \quad & \sum_{j \in D} x_{ij} \le a_i && \text{for every } i \in S, \\
& \sum_{i \in S} x_{ij} \ge b_j && \text{for every } j \in D, \\
& x_{ij} \ge 0 && \text{for every } i \in S,\ j \in D.
\end{align*}
In other words, our objective here is to minimise total transportation costs. The con-
straints ensure that the amount of the commodity leaving each source $i \in S$ is not greater
than the available supply $a_i$ at source $i$ and, similarly, that the demand $b_j$ for the com-
modity at every destination $j \in D$ is met. It should be emphasised that the transportation
problem is a special case of the minimum cost flow problem, where all nodes are either
supply nodes or demand nodes and where every arc goes directly from a supply node to
a demand node.
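To make the formulation concrete, the following brute-force sketch solves a tiny transportation instance by enumerating all integer shipment patterns. The data is made up for illustration; a real instance would be solved as an LP.

```python
from itertools import product

# Brute-force a tiny transportation instance (illustration data only).
a = [3, 4]               # supplies at the two sources
b = [3, 3]               # demands at the two destinations
c = [[4, 6],             # c[i][j]: unit cost from source i to destination j
     [5, 3]]

best_cost, best_x = None, None
for x in product(range(max(a) + 1), repeat=4):
    x = [[x[0], x[1]], [x[2], x[3]]]
    supply_ok = all(sum(x[i]) <= a[i] for i in range(2))         # supply limits
    demand_ok = all(x[0][j] + x[1][j] >= b[j] for j in range(2)) # demands met
    if supply_ok and demand_ok:
        cost = sum(c[i][j] * x[i][j] for i in range(2) for j in range(2))
        if best_cost is None or cost < best_cost:
            best_cost, best_x = cost, x

print(best_cost, best_x)  # optimal cost and shipment pattern
```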
The following example demonstrates how the transportation problem may be used
for running targeted internet advertising.

Example. (Targeted advertising) A large company called Faces hosts internet sites and it
derives a large proportion of its overall revenue from advertising. Faces’s customers are other
companies who intend to advertise their services and products throughout Faces's webpages.
When an ad is displayed on a page, Faces is paid a fee if the visitor clicks on the link.
In order to increase yield, Faces intends to resort to more targeted advertising. For ex-
ample, visitors on a sports news webpage may be more inclined to click on an advertisement
for sporting goods than, say, readers of a cooking forum.
Webpages hosted by Faces are divided into many context clusters, which include sports,
entertainment, technology, weather and politics. Let m be the number of these clusters.
Suppose within a given time unit that there are $n$ advertisements to be displayed. Faces
have an estimate of the probability $p_{ij}$ that visitors will click on advertisement $j \in \{1, 2, \ldots, n\}$
when it is displayed on a page in cluster $i \in \{1, 2, \ldots, m\}$.
Faces’s customers understandably want their ad to appear in at least a certain number
of pages within the given time unit, where $b_j$ denotes the minimum number of times that

ad j should appear. Further, due to bounds on available webpage space, within a given
time unit only a limited number of ads can appear in each cluster, where ai denotes the
maximum number of ads that can appear in cluster i in the given time unit.
Faces's objective is to maximise the expected total number of clicks on the advertisements
displayed, which increases the fees paid to them.
This problem can be cast as a (maximisation) transportation problem as follows. Each
cluster i corresponds to a supply vertex, with available supply ai (to ensure that each cluster
has no more than ai ads in total). Each ad j corresponds to a demand vertex, with demand
$b_j$ (to ensure that each ad is shown at least $b_j$ times across all clusters). The "transported
profit" from source $i$ to destination $j$ is the probability $p_{ij}$.
For each ad $j \in \{1, 2, \ldots, n\}$ and each cluster $i \in \{1, 2, \ldots, m\}$, the decision variable
$x_{ij}$ represents the number of times ad $j$ appears in cluster $i$. The objective function is to
maximise total profit, namely
$$\text{maximise} \quad \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij} x_{ij},$$
which is to maximise the total expected number of times that visitors will click on some ad.

6.6 The Assignment Problem


Consider the problem of assigning n workers to n jobs in order to minimise the total
time required to complete all the jobs. Suppose that worker $i$ takes time $t_{ij}$ to complete
job $j$ and each worker is assigned to precisely one job. The decision variables $x_{ij}$ for
every $i, j \in \{1, \ldots, n\}$ are defined as
$$x_{ij} = \begin{cases} 1, & \text{if worker } i \text{ is assigned to job } j, \\ 0, & \text{otherwise.} \end{cases}$$
Then we can formulate the problem as the following binary program, namely
\begin{align*}
\text{minimise} \quad & \sum_{i=1}^{n} \sum_{j=1}^{n} t_{ij} x_{ij} \\
\text{subject to} \quad & \sum_{j=1}^{n} x_{ij} = 1 && \text{for every } i \in \{1, 2, \ldots, n\}, \\
& \sum_{i=1}^{n} x_{ij} = 1 && \text{for every } j \in \{1, 2, \ldots, n\}, \\
& x_{ij} \in \{0, 1\} && \text{for every } i, j \in \{1, 2, \ldots, n\}.
\end{align*}



Alternatively, we can consider this as a transportation problem. In that case, we create
$n$ sources and $n$ sinks which correspond to the $n$ workers and $n$ jobs, respectively. We then
create an arc going from each worker to each job whose cost is the time it takes for
that worker to complete that job. The supply of each worker and the demand of each
job is set to 1. Due to the integrality theorem, any optimal extreme solution to the
LP relaxation will automatically have binary entries and, as such, we do not need
to enforce the constraints $x_{ij} \in \{0, 1\}$. In particular, the theorem means that we can
replace the binary constraints by the constraints $x_{ij} \ge 0$.
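Because each worker takes exactly one job, a feasible assignment is simply a permutation of the jobs, so a small instance can be brute-forced directly. The time matrix below is made up for illustration.

```python
from itertools import permutations

# Brute-force the assignment problem on a small made-up time matrix t,
# where t[i][j] is the time worker i needs for job j.
t = [[9, 2, 7],
     [6, 4, 3],
     [5, 8, 1]]

n = len(t)
# Each permutation perm assigns worker i to job perm[i]; pick the cheapest.
best = min(permutations(range(n)),
           key=lambda perm: sum(t[i][perm[i]] for i in range(n)))
best_time = sum(t[i][best[i]] for i in range(n))
print(best, best_time)
```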

Figure 6.7: The assignment problem.

Note that the X-ray projection example is a special case of the assignment problem.

6.7 The Shortest Path Problem

The shortest path problem is a particular network flow model that has received much
attention for both practical and theoretical reasons. This problem can be stated as fol-
lows. Given a directed graph G = (N , A) with (possibly negative) costs ci j associated
with each arc (i, j) 2 A, find the cheapest path through the network from a specified
source (or origin) s 2 N to a specified sink (or destination) t 2 N .
The theoretical interest in the shortest path problem arises since the problem has
a special structure and, in addition, the underlying network results in very efficient
solution procedures. The practical interest in this problem is also unsurprising: for
example, when you ask Google/Apple Maps for directions, it must solve a shortest path
problem in order to tell you the cheapest or fastest route.
Representing this problem as a network flow problem is straightforward. Figure 6.8
illustrates a directed graph that corresponds to the shortest path problem, where we
wish to find the shortest path from node s = 1 to node t = 8.

Figure 6.8: A directed graph corresponding to a shortest path problem, where we wish
to find the shortest path from the source s = 1 to the sink t = 8.

The problem is to send one unit of flow from the source node s to the sink node t
at the minimum cost (or distance). Let $x_{ij}$ be the flow from node $i$ to node $j$; then the
formulation of the shortest path problem is
\begin{align*}
\text{minimise} \quad & \sum_{(i,j) \in A} c_{ij} x_{ij} \\
\text{subject to} \quad & \sum_{j:(j,i) \in A} x_{ji} - \sum_{j:(i,j) \in A} x_{ij} = \begin{cases} -1, & \text{if } i = s, \\ 1, & \text{if } i = t, \\ 0, & \text{otherwise,} \end{cases} \\
& x_{ij} \ge 0 \quad \text{for every } (i,j) \in A.
\end{align*}

Note that if $i = s$, then we are at the source node and, as such, the flow into $s$ minus the
flow out of $s$ is $-1$. Similarly, if $i = t$, we are at the sink node and, as such, the flow into $t$
minus the flow out of $t$ is 1. For all other vertices, the in-flow minus the out-flow must be 0.
It should be noted that because of the integrality property, every extreme optimal
solution to the above problem will have integer components, where the components will
be 0 or 1. In particular, the arcs with flow 1 will form a path from s to t and the cost
of the corresponding flow will be exactly the cost of the path. Hence, the above LP will
provide a solution to the shortest path problem. This problem can be solved efficiently
via usual linear programming algorithms, however, there are in practice more efficient
algorithms available for this problem such as the Bellman-Ford algorithm [12, 4].
We finally make a short observation in the case that our digraph has a directed cycle
with negative total cost. In a directed graph, a directed cycle is a nonempty directed path
from some node to itself. Observe that if the digraph has a directed cycle of negative
total cost, then the above LP will be unbounded. This intuitively follows since the cost
in such a case is $-\infty$: you can "go around" the cycle forever, where the cost decreases
each time you "go around". Hence, the above LP is of no use for finding a
shortest path if there is a directed cycle of negative total cost.
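A minimal sketch of the Bellman-Ford algorithm follows, which also detects a negative-cost directed cycle (exactly the case in which the LP above is unbounded). The arc data is illustrative, not the digraph of Figure 6.8.

```python
# Minimal Bellman-Ford sketch: shortest path costs from a source s in a
# digraph with (possibly negative) arc costs but no negative cycles.
def bellman_ford(nodes, arcs, s):
    """arcs: dict mapping (i, j) -> cost. Returns distances from s."""
    dist = {v: float("inf") for v in nodes}
    dist[s] = 0
    for _ in range(len(nodes) - 1):          # at most |N|-1 relaxation rounds
        for (i, j), c in arcs.items():
            if dist[i] + c < dist[j]:
                dist[j] = dist[i] + c
    for (i, j), c in arcs.items():           # a further improvement means a
        if dist[i] + c < dist[j]:            # negative-cost directed cycle
            raise ValueError("negative cycle: the LP would be unbounded")
    return dist

# Illustration data only.
arcs = {(1, 2): 4, (1, 3): 2, (3, 2): -1, (2, 4): 3, (3, 4): 6}
print(bellman_ford([1, 2, 3, 4], arcs, 1))
```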

6.8 The Maximum Flow Problem


In this problem we have a directed graph $G = (N, A)$ with two special nodes, namely the
source $s$ and the sink $t$. Every arc $e \in A$ has a strictly positive capacity $u_e > 0$. We want
to send as much flow as possible from $s$ to $t$ while not exceeding the capacities on the
arcs. Note that this problem is often called the maximum $s$–$t$ flow problem. It should
be noted that there are no costs on the arcs but rather there are the positive capacities.
We can for example think of this as a problem of trying to maximise the amount of some
commodity shipped from a factory to a store.
Figure 6.9 illustrates a digraph corresponding to a maximum flow problem, where
a and f are the source and sink nodes, respectively. In this directed graph, observe that
the red flow is feasible since the flow does not exceed the capacities (black numbers) on
the arcs. Further, observe that for this red flow, the in-flow equals the out-flow over all
nodes (other than the source and sink).

Figure 6.9: A directed graph illustrating an optimal solution to the maximum flow prob-
lem. The red numbers here represent positive flows, while, black numbers denote the
positive arc capacities. Note that missing red numbers represent 0 flows. The maximum
flow through this digraph has value 5.

The maximum flow problem can be modelled as a minimum cost flow problem.
Recall that in the maximum flow problem the in-flow equals the out-flow at all
nodes (other than the source $s$ and sink $t$) and, as such, the above problem is equivalent
to maximising the flow out of the source $s$ over all feasible flows.

In order to remove this technicality regarding the in-flow out-flow equality every-
where other than the source and sink, it is useful to add an artificial arc (t, s) from
the sink to the source, illustrated in Figure 6.10. The objective in the maximum flow
problem is to then maximise the flow value on that artificial arc.
More precisely, let $A' := A \cup \{(t, s)\}$ denote the set of arcs from our original digraph
$G$ plus the artificial arc. The general formulation of the maximum flow problem is then
\begin{align}
\text{maximise} \quad & x_{ts} \nonumber \\
\text{subject to} \quad & \sum_{j:(j,i) \in A'} x_{ji} - \sum_{j:(i,j) \in A'} x_{ij} = 0 && \text{for every } i \in N, \tag{6.2} \\
& 0 \le x_{ij} \le u_{ij} && \text{for every } (i,j) \in A. \nonumber
\end{align}
It should be noted that here we present the above problem in maximisation form; how-
ever, we could equivalently have presented the objective function as
$$\text{minimise} \quad \sum_{(i,j) \in A'} c_{ij} x_{ij},$$
where
$$c_{ij} = \begin{cases} -1, & \text{if } (i,j) = (t,s), \\ 0, & \text{otherwise.} \end{cases}$$
Observe that, in the above formulation, the variable $x_{ts}$ on the artificial arc is unrestricted,
meaning that there are no upper or lower bounds imposed on it. In the digraph illustrated
in Figure 6.10, we wish to find the maximum flow from vertex a to vertex b. The values
associated with the solid arcs are the capacities. The dashed arc is introduced to the
digraph to model this as a maximum flow problem.

Figure 6.10: A directed graph corresponding to a maximum flow problem, where the
artificial arc (dashed) connects the sink to the source.

Note for completeness that there are similarly more efficient ways of solving maximum
flow problems, such as the algorithms described by Ford and Fulkerson [11].
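A compact sketch of the Ford-Fulkerson idea in its Edmonds-Karp form (BFS augmenting paths) is given below. The edge data is a made-up instance, not the digraph of Figure 6.9.

```python
from collections import deque

# Edmonds-Karp sketch: repeatedly find a shortest augmenting path by BFS
# in the residual network and push the bottleneck flow along it.
def max_flow(edges, s, t):
    """edges: dict (i, j) -> capacity. Returns the maximum s-t flow value."""
    cap = dict(edges)
    for (i, j) in edges:
        cap.setdefault((j, i), 0)            # residual (reverse) arcs
    value = 0
    while True:
        parent = {s: None}                   # BFS tree for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for (i, j), c in cap.items():
                if i == u and c > 0 and j not in parent:
                    parent[j] = u
                    queue.append(j)
        if t not in parent:                  # no augmenting path: optimal
            return value
        path, v = [], t                      # recover the path, find bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[e] for e in path)
        for (i, j) in path:                  # update residual capacities
            cap[(i, j)] -= push
            cap[(j, i)] += push
        value += push

# Illustration data only.
edges = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 2, ("b", "t"): 3, ("a", "b"): 1}
print(max_flow(edges, "s", "t"))
```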

6.9 Minimum s–t Cuts

The maximum flow problem can be seen as a measure of resilience of the connection
between the source s and sink t. Another possible measure of this connection can be
provided by the minimum cut.

Under the same data as the maximum flow problem, we have a digraph $G = (N, A)$
with two special nodes, the source $s$ and the sink $t$, where every arc $e \in A$ has a strictly
positive capacity $u_e > 0$. In this problem we would like to remove a set of arcs of minimum
total capacity in order to disconnect $s$ from $t$. Being more precise, an $s$–$t$ cut is a
set $C \subseteq A$ of arcs such that there is no path from $s$ to $t$ in the graph $(N, A \setminus C)$, which is
obtained from the original digraph $G$ by removing the arcs in $C$. We want to find an
$s$–$t$ cut $C$ of minimum total capacity, namely a cut $C$ minimising
$$\sum_{e \in C} u_e.$$

Notice that if $S$ is the set of nodes that can be reached from $s$ in $(N, A \setminus C)$, then
because $C$ is an $s$–$t$ cut, we see that the sink $t$ cannot be in $S$. In particular, it follows
that any optimal $s$–$t$ cut $C$ contains all arcs leaving $S$. This set is denoted by
$$\delta^+(S) = \{(i,j) \in A : i \in S,\ j \notin S\}.$$
In particular, since every set $\delta^+(S)$ with $s \in S$ and $t \notin S$ is itself an $s$–$t$ cut (as every
path from $s$ to $t$ must at some point use an arc that goes from some node in $S$ to some
node not in $S$), it follows that a minimum $s$–$t$ cut must indeed be of the form $\delta^+(S)$
for some $S \subset N$ such that $s \in S$ and $t \notin S$.

This problem can be modelled as an IP as follows. Let us introduce binary decision
variables $y_i$ for every $i \in N$, where we want $y_i = 1$ if $i \in S$. Since we want $s \in S$ and
$t \notin S$, we set $y_s = 1$ and $y_t = 0$. Further, we introduce a binary variable $z_{ij}$ for every
arc $(i, j) \in A$, where we want $z_{ij} = 1$ if $y_i = 1$ and $y_j = 0$, while $z_{ij} = 0$ otherwise. In
other words, $z_{ij}$ takes value 1 if $(i, j) \in C$, while $z_{ij}$ takes value 0 otherwise. This can be
achieved by the constraints $z_{ij} \ge y_i - y_j$ for every $(i, j) \in A$.

The integer programming problem associated with the minimum $s$–$t$ cut problem

is therefore
\begin{align}
\text{minimise} \quad & \sum_{e \in A} u_e z_e \nonumber \\
\text{subject to} \quad & y_j - y_i + z_{ij} \ge 0 && \text{for all } (i,j) \in A, \nonumber \\
& y_s = 1, \tag{6.3} \\
& y_t = 0, \nonumber \\
& y_i \in \{0, 1\} && \text{for all } i \in N, \nonumber \\
& z_e \in \{0, 1\} && \text{for all } e \in A. \nonumber
\end{align}

It should be noted that despite there being no constraint forcing $z_{ij} = 0$ when $y_i \le y_j$,
this will indeed be the case in an optimal solution. This follows because in such a case the
constraint $z_{ij} \ge y_i - y_j$ is implied by $z_{ij} \ge 0$ and it is optimal to choose $z_{ij} = 0$.
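Since a minimum s–t cut must be of the form δ⁺(S), a tiny instance can be checked by simply enumerating every S that contains s but not t. The digraph below is illustration data only.

```python
from itertools import combinations

# Brute-force the minimum s-t cut on a tiny illustration digraph by
# enumerating all node sets S with s in S and t not in S.
edges = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 2, ("b", "t"): 3, ("a", "b"): 1}
nodes = {"s", "a", "b", "t"}

def cut_capacity(S):
    """Total capacity of the arcs leaving S, i.e. of delta+(S)."""
    return sum(c for (i, j), c in edges.items() if i in S and j not in S)

others = [v for v in nodes - {"s", "t"}]
best = min(cut_capacity({"s", *subset})
           for r in range(len(others) + 1)
           for subset in combinations(others, r))
print(best)
```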

6.10 Maximum Flows vs Minimum Cuts

Recall that the maximum flow and minimum cut problems provide measures of the
connection between the source and the sink within a digraph. It turns out that the
minimum s–t cut problem is in a sense the "dual" of the maximum s–t flow problem
and throughout this section we argue why this is the case.
Let us firstly write the dual of the maximum flow LP (6.2). Note that the LP has
a flow-balance constraint for each node $i \in N$, meaning that the dual will have a cor-
responding variable $y_i$, and the LP has a capacity constraint $x_e \le u_e$ for every $e \in A$,
meaning the dual will have a corresponding variable $z_e$. The right-hand sides of the
flow-balance constraints of (6.2) are all 0, while the right-hand side of the capacity
constraint $x_e \le u_e$ for arc $e \in A$ is $u_e$; therefore the objective of the dual is
$$\text{minimise} \quad \sum_{e \in A} u_e z_e.$$
Note the LP (6.2) has a variable for each arc $e \in A$ and an extra variable for the
artificial arc $(t, s)$. Further, recall that in the LP (6.2), for each $(i, j) \in A$, the decision
variable $x_{ij}$ has a zero coefficient in the objective function; it appears in the flow-balance
constraint for node $i$ with coefficient $-1$, in the flow-balance constraint of node $j$ with
coefficient 1 and it appears in the constraint $x_{ij} \le u_{ij}$ with coefficient 1. It follows that
the corresponding dual constraint is $y_j - y_i + z_{ij} \ge 0$. For the remaining variable $x_{ts}$ that
corresponds to the artificial arc, the objective coefficient is 1; however, the variable has
no capacity constraint and is free (since we did not enforce nonnegativity). It follows therefore
that the corresponding dual constraint is $y_s - y_t = 1$.

Hence, the dual of the LP (6.2) is
\begin{align}
\text{minimise} \quad & \sum_{e \in A} u_e z_e \nonumber \\
\text{subject to} \quad & y_j - y_i + z_{ij} \ge 0 && \text{for all } (i,j) \in A, \tag{6.4} \\
& y_s - y_t = 1, \nonumber \\
& z_e \ge 0 && \text{for all } e \in A. \nonumber
\end{align}

Observe that the LP (6.4) is similar to the IP (6.3) corresponding to the minimum
$s$–$t$ cut problem. We show that the two are indeed equivalent. Firstly, observe that
by the integrality property discussed previously, extreme solutions to the LP (6.4) are
integer. Secondly, observe that every constraint involving the $y_i$ variables in (6.4) always
contains one variable with coefficient 1 and one with coefficient $-1$.
Furthermore, notice that the $y_i$'s do not appear in the objective function. This means
that we can "translate" the $y_i$'s without changing the value of the solution. That is, for
any solution $(\bar{y}, \bar{z})$ to (6.4), if we replace all $\bar{y}_i$ for $i \in N$ with $\bar{y}_i + \theta$ for any value
$\theta$, we obtain another feasible solution with the same objective function value. We can
in consequence assume without loss of generality that $y_t = 0$ in an optimal solution,
which then implies $y_s = 1$ since $y_s - y_t = 1$.
Finally, given an optimal integer solution to (6.4) such that $y_t = 0$, if we define
$$S = \{i \in N : y_i \ge 1\},$$
then the constraints $z_{ij} \ge y_i - y_j$ force $z_{ij} \ge 1$ for all $(i,j) \in \delta^+(S)$ and therefore
$$\sum_{e \in A} u_e z_e \ge \sum_{e \in \delta^+(S)} u_e.$$
It follows that the value of (6.4) is always greater than or equal to the value of the
minimum $s$–$t$ cut, which is equal to the value of (6.3). Moreover, in an optimal
solution to (6.4) where we set $y_t = 0$, the variables $y_i$ for $i \in N$ and $z_e$
for $e \in A$ will always take binary values even without explicitly using binary variables.
This implies that the optimum value of the LP (6.4) equals the value of the minimum
$s$–$t$ cut. Because the LP (6.4) is the dual of the maximum flow problem, it follows
by strong duality that the two problems have the same objective function value. This
argument yields the following classical result.

Theorem (Max-flow Min-cut Theorem). The maximum value of an $s$–$t$ flow equals the
minimum total capacity of an $s$–$t$ cut.

Note that in all types of networks with flow conservation (meaning that in-flow
equals out-flow), the amount of flow that can flow through the network is intuitively
restricted by the weakest connection between disjoint parts of the network. This weakest
connection can be thought of as a “bottleneck” in the digraph. It turns out that this
“bottleneck” is precisely the minimum cut, namely a minimal set of edges that stops the
flow through the network. Figure 6.11 illustrates the connection between maximum
flows and minimum cuts.

Figure 6.11: The optimal flow of value 5 along with an $s$–$t$ cut of total capacity 5. In
this case, we have $S = \{a, b, c\}$ and $\delta^+(S)$ is represented by the bold arcs.

6.11 The Traveling Salesman Problem

In the traveling salesman problem, the salesperson must visit n cities and return to the
city they started from. This will be called a tour. Given the costs $c_{ij}$ of travelling from
city $i$ to city $j$ for each $1 \le i, j \le n$ with $i \ne j$, in which order should the salesperson
visit the cities in order to minimise the cost of the tour? This problem is the famous
(asymmetric) traveling salesman problem (ATSP). Note that the acronym TSP is usually
reserved for the symmetric version of the problem, where $c_{ij} = c_{ji}$ for all arcs $(i, j)$.
Further, note that traffic collisions and one-way streets between cities are examples of
how this symmetry could break down.
The ATSP and TSP can unsurprisingly be viewed as optimisation problems
on graphs. In the ATSP, we have a directed graph where the cities are
the vertices, the paths between cities are the arcs and a path's distance is the corresponding
cost; we have a minimisation problem starting and finishing at a specified vertex,
subject to visiting each other vertex exactly once. In the TSP, we instead have
an undirected weighted graph that is defined in a similar fashion.


To model this problem as an integer programming problem, we firstly introduce a
binary decision variable $x_{ij}$ for all $i, j = 1, 2, \ldots, n$ such that $i \ne j$. The variable is
defined as
$$x_{ij} = \begin{cases} 1, & \text{if the tour visits city } j \text{ immediately after city } i, \\ 0, & \text{otherwise.} \end{cases}$$

Given a subset of cities $S \subset \{1, 2, \ldots, n\}$, denote its complement by $\bar{S}$, namely $\bar{S} :=
\{1, 2, \ldots, n\} \setminus S$. Further, denote the cardinality of a subset of cities $S$ by $|S|$.
The TSP can be formulated as
\begin{align*}
\text{minimise} \quad & \sum_{i} \sum_{j \ne i} c_{ij} x_{ij} \\
\text{subject to} \quad & \sum_{j \ne i} x_{ij} = 1 && \text{for every } i \in \{1, 2, \ldots, n\}, \\
& \sum_{i \ne j} x_{ij} = 1 && \text{for every } j \in \{1, 2, \ldots, n\}, \\
& \sum_{i \in S,\, j \in \bar{S}} x_{ij} \ge 1 && \text{for every nonempty } S \subset \{1, 2, \ldots, n\} \text{ with } |S|, |\bar{S}| \ge 2, \\
& x_{ij} \in \{0, 1\} && \text{for every } i, j \in \{1, 2, \ldots, n\} \text{ such that } i \ne j,
\end{align*}
where $S \subset \{1, 2, \ldots, n\}$ is used to denote that $S$ is a proper subset of $\{1, 2, \ldots, n\}$, namely
that $S \subseteq \{1, 2, \ldots, n\}$ with $S \ne \{1, 2, \ldots, n\}$. This is the classical and most widely used
formulation of the TSP as developed by Dantzig, Fulkerson and Johnson [7].
Note that the first two sets of constraints guarantee that the traveling salesman visits
each city exactly once. It should be emphasised that these two sets of constraints alone
are not sufficient. In particular, one can find solutions satisfying the first two sets of
constraints that do not correspond to one tour, but rather multiple subtours.
In order to prevent this from happening, we impose the third set of constraints,
known as the subtour elimination constraints. Indeed, for any proper nonempty subset
$S$ of the cities, to "reach" the cities in $\bar{S}$ from the cities in $S$ the tour must leave $S$ and, thus,
there must be a city in $\bar{S}$ that is immediately preceded in the tour by a city in $S$. This
condition is enforced by the third set of constraints.
Note that we need the subtour elimination constraints only for sets $S$ such
that both $S$ and $\bar{S}$ contain at least two elements. If instead $|S| = 1$, then the subtour
elimination constraint relative to $S$ is implied by the equation in the first set of constraints
relative to the only node in $S$. In a similar fashion, if $|\bar{S}| = 1$, then the subtour elimination
constraint relative to $S$ is implied by the equation in the second set of constraints relative
to the only node in $\bar{S}$.
Let us consider an example with five cities, namely $\{1, 2, \ldots, 5\}$, in order to
demonstrate how the subtour elimination constraints prevent subtours. The subtour
elimination constraints correspond to every nonempty subset $S \subset \{1, 2, \ldots, 5\}$ with
$|S|, |\bar{S}| \ge 2$. Notice that in this case there is one subtour elimination constraint for
each subset $S$ with either two or three elements. Firstly, consider the subset
$S = \{1, 2\}$ with $\bar{S} = \{3, 4, 5\}$. The subtour elimination constraint ensures that there ex-
ists at least one arc connecting a city in $S$ (namely 1 or 2) to a city in $\bar{S}$ (namely
3, 4 or 5). This prevents any solution from containing a subtour that only includes 1 and 2
or only includes 3, 4 and 5. Consider now the subset $S = \{3, 4\}$ with $\bar{S} = \{1, 2, 5\}$. In a
similar fashion, the corresponding constraint ensures that there must be at least one
arc that goes from either 3 or 4 to 1, 2 or 5. Upon applying these constraints to all
subsets of cities with at least two cities in each of $S$ and $\bar{S}$, the formulation ensures
that the final solution is a single tour that covers all cities without forming smaller, dis-
connected tours. It should be noted that the inequality cannot necessarily be replaced
with an equality in the subtour elimination constraints: doing such a replacement could
over-constrain the problem, where it could then be impossible even to find a feasible
solution (since it is not clear there is necessarily a solution with the rigid condition that
there is exactly one arc between $S$ and $\bar{S}$ over all subsets of suitable size).
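For very small instances, the TSP can be brute-forced by fixing a start city and trying every ordering of the remaining cities, which automatically rules out subtours. The cost matrix below is made up for illustration.

```python
from itertools import permutations

# Brute-force the (A)TSP on a made-up 4-city cost matrix c, where
# c[i][j] is the cost of travelling from city i to city j.
c = [[0, 2, 9, 10],
     [1, 0, 6, 4],
     [15, 7, 0, 8],
     [6, 3, 12, 0]]
n = len(c)

best_cost, best_tour = float("inf"), None
for perm in permutations(range(1, n)):       # fix city 0 as the start
    tour = (0, *perm, 0)                     # a full tour visits every city once
    cost = sum(c[tour[k]][tour[k + 1]] for k in range(n))
    if cost < best_cost:
        best_cost, best_tour = cost, tour

print(best_tour, best_cost)
```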
Notice that such a formulation has an exponential number of constraints, since the
number of proper subsets of $\{1, 2, \ldots, n\}$ is $2^n - 1$. Despite the exponential number of
constraints, this is the formulation that is most widely used in practice. Initially, one
solves the linear programming relaxation that only contains the first and second sets of
constraints and $0 \le x_{ij} \le 1$. The subtour elimination constraints are generally added
later, on the fly, only when needed. It should be noted that this formulation is not the
only way of modelling the TSP. In particular, two other approaches of interest, namely
the MTZ [20] or SCF [14] formulations, are carefully outlined in the document entitled
Remarks on Modelling the TSP available on Moodle.
It turns out that the TSP is hard both in theory and in practice. More for-
mally, the TSP is an $\mathcal{NP}$-hard problem, which recall informally means that
one should not expect to solve a random instance of the problem in polynomial time
unless $\mathcal{P} = \mathcal{NP}$ (see e.g. [13]). The difficulty of this problem is not simply posed
by the exponential number of constraints in the above formulation. In contrast to the
minimum cost network flow problem, the extreme points of the linear programming
relaxation are typically not integer and the optimal value of the linear programming
relaxation can be very far from the optimal value of a tour. Figure 6.12 below (taken from
https://www.math.uwaterloo.ca/tsp/uk/index.html) illustrates how the TSP
problem was remarkably solved in order to calculate an optimal 49,687 stop pub crawl
of the UK. The computation of this tour required 14 months, which is equivalent to 250
years of computation time on a single processor.

6.12 Exercises for Self-Study

1. You need to take a trip by car to another town which you have never visited be-
fore. Therefore, you are studying a map to determine the shortest route to your
destination (from the origin). Depending on which route you choose, there are
five other towns (call them A, B, C , D, E) that you might pass through on the way.
The map shows the mileage along each road that directly connects two towns
without any intervening towns. These numbers are summarized in the following
table, where a dash indicates that there is no road directly connecting these two
towns without going through any other towns.

Miles between Adjacent Towns

Towns     A     B     C     D     E     Destination
Origin    40    60    50    —     —     —
A               10    —     70    —     —
B                     20    55    40    —
C                           —     50    —
D                                 10    60
E                                       80

Formulate and solve this problem as a shortest path problem by drawing a network
where nodes represent towns, links represent roads and numbers indicate the
length of each link in miles.

2. At a small but growing airport, the local airline company is purchasing a new
tractor for a tractor-trailer train to bring luggage to and from the airplanes. A
new mechanized luggage system will be installed in 3 years, so the tractor will
not be needed after that. However, because it will receive heavy use, the running
and maintenance costs will increase rapidly as the tractor ages, so it may still
be more economical to replace the tractor after 1 or 2 years. The following table
gives the total net discounted cost associated with purchasing a tractor (purchase
price minus trade-in allowance, plus running and maintenance costs) at the end
of year i and trading it in at the end of year j (where year 0 is now).

           j = 1      j = 2      j = 3
i = 0    $13,000    $28,000    $48,000
i = 1               $17,000    $33,000
i = 2                          $20,000

The problem is to determine at what times (if any) the tractor should be replaced
to minimise the total cost for the tractors over 3 years. Formulate and solve this
problem as a shortest path problem.

3. The next diagram depicts a system of aqueducts that originate at three rivers
(nodes R1, R2, and R3) and terminate at a major city (node T), where the other
nodes are junction points in the system.

Using units of thousands of acre feet, the tables below show the maximum amount
of water that can be pumped through each aqueduct per day.
        To                           To                          To
From     A     B     C        From    D     E     F        From    T
R1      130   115    —        A      110    85    —        D      220
R2       70    90   110       B      130    95    85       E      330
R3       —    140   120       C       —    130   160       F      240

The city water manager wants to determine a flow plan that will maximise the flow
of water to the city. Formulate and solve this problem as a maximum flow problem
by identifying a source, a sink and the transhipment nodes and by drawing the
complete digraph that shows the capacity of each arc.

4. The Texaco Corporation has four oil fields, four refineries and four distribution
centers. A major strike involving the transportation industries now has sharply
curtailed Texaco’s capacity to ship oil from the oil fields to the refineries and to

ship petroleum products from the refineries to the distribution centers. Using units
of thousands of barrels of crude oil (and its equivalent in refined products), the
following tables show the maximum number of units that can be shipped per day
from each oil field to each refinery, and from each refinery to each distribution
center.

Refinery
Oil Field New Orleans Charleston Seattle St. Louis
Texas 11 7 2 8
California 5 4 8 7
Alaska 7 3 12 6
Middle East 8 9 4 15

Distribution Centre
Refinery Pittsburgh Atlanta Kansas City San Francisco
New Orleans 5 9 6 4
Charleston 8 7 9 5
Seattle 4 6 7 8
St. Louis 12 11 9 7

The Texaco management now wants to determine a plan for how many units to
ship from each oil field to each refinery and from each refinery to each distribu-
tion center that will maximise the total number of units reaching the distribution
centers.

a) Draw a rough map that shows the location of Texaco’s oil fields, refineries
and distribution centers. Add arrows to show the flow of crude oil and then
petroleum products through this distribution network.

b) Redraw this distribution network by lining up all the nodes representing oil
fields in one column, all the nodes representing refineries in a second col-
umn, and all the nodes representing distribution centers in a third column.
Then add arcs to show the possible flow.

c) Modify the network in part b) if needed to formulate this problem as a maxi-


mum network flow problem with a single source, a single sink and a capacity
for each arc.

d) Solve the problem using the solver via AMPL.

5. The MK Company is a fully integrated company that both produces goods and sells
them at its retail outlets. After production, the goods are stored in the company’s

two warehouses until needed by the retail outlets. Trucks are used to transport
the goods from the two plants to the warehouses, and then from the warehouses
to the three retail outlets.

Using units of full truckloads, the following table outlines each plant’s monthly
output, its shipping cost per truckload sent to each warehouse, and the maximum
amount that it can ship per month to each warehouse.

To
Unit Shipping Cost Shipping Capacity Output
From Warehouse 1 Warehouse 2 Warehouse 1 Warehouse 2
Plant 1 $1175 $1580 375 450 600
Plant 2 $1430 $1700 525 600 900

For each retail outlet (RO), the next table shows its monthly demand, its shipping
cost per truckload from each warehouse, and the maximum amount that can be
shipped per month from each warehouse.

To
Unit Shipping Cost Shipping Capacity
From RO1 RO2 RO3 RO1 RO2 RO3
Warehouse 1 $1370 $1505 $1490 300 450 300
Warehouse 2 $1190 $1210 $1240 375 450 225
Demand 450 600 450

Management now wants to determine a distribution plan (the number of truckloads
shipped per month from each plant to each warehouse and from each warehouse
to each retail outlet) that will minimise the total shipping cost.

a) Draw a network that depicts the company’s distribution network. Identify
the supply nodes, transshipment nodes and demand nodes in this network.

b) Formulate and solve this problem as a minimum cost flow problem by insert-
ing all the necessary data into this network.

6. Consider an assignment problem having the following cost table, where all times
are given in hours. Note that this cost table shows that we are tasked with
assigning four workers (labelled A, B, C, D) to four tasks (labelled 1, 2, 3, 4)
with the objective of minimising the total time required to complete the tasks.

a) Draw the network representation of this assignment problem.



                   Task
              1    2    3    4
          A   8    6    5    7
Assignee  B   6    5    3    4
          C   7    8    4    6
          D   6    7    5    6

b) Formulate and solve this problem as an assignment problem.

7. Four cargo ships will be used for shipping goods from one port to four other ports
(labeled 1, 2, 3, 4). Any ship can be used for making any one of these four trips.
However, because of differences in the ships and cargoes, the total cost of loading,
transporting and unloading the goods for the different ship-port combinations
varies considerably, as shown in the following table.

          Port
          1      2      3      4
      1   $500   $400   $600   $700
Ship  2   $600   $600   $700   $500
      3   $700   $500   $700   $600
      4   $500   $400   $600   $600

The objective is to assign the four ships to four different ports in such a way as to
minimise the total cost for all four shipments. Formulate and solve this problem
as an appropriate optimisation problem on a graph.

8. The coach of an age group swim team needs to assign swimmers to a 200m medley
relay team to send to a regional swimming competition. Since most of their best
swimmers are relatively fast in more than one stroke, it is not immediately clear
which swimmer should be assigned to each of the four strokes. The five fastest
swimmers and the best times (in seconds) they have achieved in each of the strokes
over 50m are outlined in the following table.

Stroke         Carl   Chris   David   Tony   Ken
Backstroke     37.7   32.9    33.8    37.0   35.4
Breaststroke   43.4   33.1    42.2    34.7   41.8
Butterfly      33.3   28.5    38.9    30.4   33.6
Freestyle      29.2   26.4    29.6    28.5   31.1

The coach wishes to determine how to assign four swimmers to the four different
strokes to minimize the sum of the corresponding best times. Formulate and solve
this problem as an appropriate optimisation problem on a graph.
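Although the exercise asks for a graph formulation, a problem this small can also be
checked by brute force. The following plain-Python sketch (not the intended AMPL
model) enumerates all 5P4 = 120 ways of giving each stroke a distinct swimmer, using
the times from the table above.

```python
from itertools import permutations

# Best 50m times in seconds; rows are strokes, columns are the five swimmers,
# copied from the table above.
swimmers = ["Carl", "Chris", "David", "Tony", "Ken"]
times = [
    [37.7, 32.9, 33.8, 37.0, 35.4],  # backstroke
    [43.4, 33.1, 42.2, 34.7, 41.8],  # breaststroke
    [33.3, 28.5, 38.9, 30.4, 33.6],  # butterfly
    [29.2, 26.4, 29.6, 28.5, 31.1],  # freestyle
]

# Enumerate every way of assigning a distinct swimmer to each of the 4 strokes.
best_total, best_perm = min(
    (sum(times[stroke][perm[stroke]] for stroke in range(4)), perm)
    for perm in permutations(range(5), 4)
)
print(round(best_total, 1), [swimmers[i] for i in best_perm])
```

One of the five swimmers is necessarily left out of the relay, which is why the
permutations are taken over 4 of the 5 columns.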

9. a) Recall that each maximum flow defines a minimum capacity cut. Is this min-
imum capacity cut unique? That is, for a given maximum flow, could there
be more than one minimum capacity cut for this flow? Justify your answer.

b) Does each minimum capacity cut define a unique maximum flow?

10. Consider the maximum flow problem described by the following digraph, where
the source is node A, the sink is node F and the arc capacities are the numbers
shown next to these directed arcs.

Formulate and solve this problem as a maximum flow problem. Further, determine
a minimum cut in the network.
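The digraph for this exercise is given only in a figure, but the mechanics of computing
a maximum flow are easy to sketch. Below is a minimal Edmonds–Karp implementation
(breadth-first search for augmenting paths in the residual graph) run on a small
hypothetical network with source A and sink F; the capacities are invented for
illustration and are not those of the exercise’s figure.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along a shortest residual path."""
    # residual[u][v] = remaining capacity on arc (u, v)
    residual = {u: dict(arcs) for u, arcs in capacity.items()}
    for u, arcs in capacity.items():
        for v in arcs:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse arcs start at 0
    flow = 0
    while True:
        # breadth-first search for an augmenting path in the residual graph
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left, so the flow is maximum
        # find the bottleneck capacity along the path, then push flow along it
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

# Hypothetical capacities, not those of the exercise's figure.
caps = {"A": {"B": 5, "C": 4}, "B": {"D": 3}, "C": {"D": 2, "E": 4},
        "D": {"F": 4}, "E": {"F": 3}, "F": {}}
print(max_flow(caps, "A", "F"))  # the cut {D->F, E->F} has capacity 7
```

By the max-flow min-cut theorem, the value returned (7 for this toy network) is
certified optimal by the cut consisting of the two arcs entering F.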

11. Joe State lives in Gary, Indiana. He owns insurance agencies in Gary, Fort Wayne,
Evansville, Terre Haute and South Bend. Each December, he visits each of his
insurance agencies. The distance between each pair of agencies (in miles) is
shown in the following table.

Gary Fort Wayne Evansville Terre Haute South Bend


Gary 0 132 217 164 58
Fort Wayne 132 0 290 201 79
Evansville 217 290 0 113 303
Terre Haute 164 201 113 0 196
South Bend 58 79 303 196 0

In what order should Joe visit his agencies to minimise the total distance travelled?
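This is a five-city travelling salesman problem, small enough to solve by exhaustively
enumerating the 4! = 24 tours that start and end at Gary. A plain-Python sketch
(rather than a formal graph formulation):

```python
from itertools import permutations

cities = ["Gary", "Fort Wayne", "Evansville", "Terre Haute", "South Bend"]
dist = [  # symmetric distance matrix in miles, from the table above
    [0, 132, 217, 164, 58],
    [132, 0, 290, 201, 79],
    [217, 290, 0, 113, 303],
    [164, 201, 113, 0, 196],
    [58, 79, 303, 196, 0],
]

def tour_length(order):
    """Length of the closed tour Gary -> order -> Gary."""
    stops = [0] + list(order) + [0]
    return sum(dist[a][b] for a, b in zip(stops, stops[1:]))

best_length, best_order = min(
    (tour_length(p), p) for p in permutations(range(1, 5))
)
print(best_length, [cities[i] for i in best_order])
```

Since each tour and its reversal have the same length, only 12 of the 24 tours are
genuinely distinct; enumerating all 24 is harmless at this size.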

12. Find a minimum s–t cut for each of these networks. The numbers along the edges
represent maximum capacities.

13. Find all minimum s–t cuts in the following digraph. The capacity of each arc
appears as a label next to the arc.

14. Suppose we have a directed graph with nonnegative capacities on the arcs. Prove
or disprove the following statements.

a) If all arcs have distinct capacities, then the minimum cut is unique.

b) Multiplying all capacities by a number λ > 0 does not change the minimum
cuts.

c) Adding a number λ > 0 to all capacities does not change the minimum cuts.

Figure 6.12: An optimal 49,687-stop pub crawl of the UK, presented at
https://www.math.uwaterloo.ca/tsp/uk/index.html. The computation of
this tour required 14 months, which is equivalent to 250 years of computation time
on a single processor.

Chapter 7

Nonlinear Optimisation Models

Recall that there is a wide variety of mathematical models, some of which fall into the
rather broad categories of linear and nonlinear, integer and noninteger, and deterministic
and stochastic models. Note that up to this point we have only considered deterministic
models with linear constraints, in which some of the variables may be required to take
integer values, through our study of LPs, IPs, MIPs and BIPs. In this chapter we
introduce another class of mathematical optimisation models, namely those that are
nonlinear.

7.1 An Introduction to Nonlinear Optimisation


Given a real-valued function f : R^n → R, we denote by dom(f) the domain of the
function f, namely the set of points x = (x_1, x_2, ..., x_n)^T ∈ R^n for which f(x) is
defined. For example, note that:

• the domain of the function f : x ↦ log x is the set dom(f) = {x ∈ R : x > 0},

• the domain of the function f : R^2 → R defined by (x_1, x_2) ↦ x_1/x_2 is the set
dom(f) = {(x_1, x_2)^T ∈ R^2 : x_2 ≠ 0}, and

• the domain of the function f : R^n → R defined by f(x) = 0 for all x ∈ R^n is the
set dom(f) = R^n.

A general nonlinear optimisation problem is of the form

    minimise    f_0(x)
    subject to  f_i(x) ≤ 0   for i ∈ {1, 2, ..., m},        (7.1)

where f_i : R^n → R for i ∈ {0, 1, 2, ..., m}. In other words, a nonlinear optimisation
problem asks to minimise a real-valued function subject to some finite number of
real-valued inequality constraints. It should be emphasised that the above examples
demonstrate that such a nonlinear optimisation problem is not necessarily defined
everywhere.

For this reason, we define the domain D of the problem (7.1) as the set of points for
which both the objective function and the constraint functions are defined, that is

    D = dom(f_0) ∩ dom(f_1) ∩ ... ∩ dom(f_m),

where ∩ denotes the (set-theoretic) intersection. The feasible region is the set X of all
points in D satisfying the constraints. Note that X ⊆ D; however, the converse inclusion
is not in general true.
This allows us to restate the above general nonlinear optimisation problem as

    minimise    f_0(x)
    subject to  f_i(x) ≤ 0   for i ∈ {1, 2, ..., m},        (7.2)
                x ∈ D,

which means the problem concerns minimising a real-valued function subject to some
finite number of real-valued inequality constraints, where all inputs belong to the
problem’s domain.
Linear programming is a special case of the above problem in which all the f_i’s for
i ∈ {0, 1, 2, ..., m} are affine functions, namely functions of the form

    f_i(x) = a_i^T x + b_i   for a_i ∈ R^n and b_i ∈ R.

For completeness, note that the word affine comes from the Latin affinis, which trans-
lates roughly to “connected with”. Further, in geometry an affine transformation (or
mapping) between two vector spaces consists of a linear transformation followed by a
translation, meaning that an affine transformation is informally “connected with” some
linear transformation by a translation.
Note that an important property of LPs with n decision variables is that the prob-
lem’s domain is R^n. This follows because affine functions are defined everywhere on
R^n. Despite this, it should be emphasised that the feasible region associated with an
LP does not usually equal R^n.
Nonlinear programming problems are those problems of the general form (7.2) that do
not satisfy this linearity assumption. This definition is far too generic to say anything
useful about actually solving such problems. Observe, for example, that such a general
problem includes any problem with binary variables, because x_i ∈ {0, 1} can be
expressed by the equality x_i(1 − x_i) = 0, i.e. via the two inequalities x_i(1 − x_i) ≥ 0
and x_i(1 − x_i) ≤ 0. In particular, this argument shows that nonlinear programming is
in general at least as hard as binary programming, which is an NP-hard problem.
Recall that integer and mixed-integer programming are NP-hard; nevertheless,
depending on the underlying structure and formulation of the problem, it is sometimes
possible to solve the problem to optimality. In a similar fashion, whether or not a
nonlinear programming problem can be solved to optimality depends on the specific
structure of the problem. In this chapter we focus particularly on nonlinear problems
in forms that are amenable to state-of-the-art solvers.
With nonlinear programming, an important first distinction is between convex and
non-convex optimisation problems. Broadly speaking, convex optimisation problems
can typically be solved, while solving non-convex problems is often far more difficult.
Below we give an informal explanation of the difficulties arising when dealing with
non-convex optimisation problems before describing why convexity is such a helpful
property.

7.2 Global and Local Optimality

Given a real-valued function f : R^n → R and a set X ⊆ dom(f) ⊆ R^n, we are interested
in determining

    minimise or maximise { f(x) ∈ R : x ∈ X }.

We say that a point x* ∈ X is:

• a global minimum for f in X if f(x*) ≤ f(x) for all x ∈ X,

• a strict global minimum for f in X if f(x*) < f(x) for all x ∈ X with x ≠ x*,

• a global maximum for f in X if f(x*) ≥ f(x) for all x ∈ X, and

• a strict global maximum for f in X if f(x*) > f(x) for all x ∈ X with x ≠ x*.

Global maxima and minima are often collectively referred to as global extrema.
Further, a point x* ∈ X is:

• a local minimum for f in X if there exists an ε > 0 such that f(x*) ≤ f(x) for all
x ∈ X such that ‖x − x*‖ ≤ ε,

• a strict local minimum for f in X if there exists an ε > 0 such that f(x*) < f(x)
for all x ∈ X with x ≠ x* such that ‖x − x*‖ ≤ ε,

• a local maximum for f in X if there exists an ε > 0 such that f(x*) ≥ f(x) for all
x ∈ X such that ‖x − x*‖ ≤ ε, and

• a strict local maximum for f in X if there exists an ε > 0 such that f(x*) > f(x)
for all x ∈ X with x ≠ x* such that ‖x − x*‖ ≤ ε,

where ‖ · ‖ denotes the ℓ2-norm (or Euclidean norm). In other words, a local minimum,
say, is a point x* in the set X such that no point x ∈ X within a ball of radius ε (some
neighbourhood) around x* has strictly smaller function value.
Consider for example the single-variable function illustrated in Figure 7.1. Observe
that the point x′ is a (strict) local minimum of the function f, since we can find a small
interval around x′ in which no point has strictly smaller function value than x′. Despite
this, note that the point x′ is not a global minimum, because there are points, including
x″, with lower objective value. This illustrates the important fact that not all local
extrema are global extrema.

Figure 7.1: The point x′ is a (strict) local minimum but not a global minimum.

Unfortunately, even if we were to determine that a certain point x* ∈ X is a local
minimum, we would in general be unable to determine whether it is a global minimum.
Note that nonlinear programming methods, including the celebrated gradient methods
and Newton methods (see e.g. [5, Chapter 1]), generate sequences of points that
converge to some local optimum. If there are multiple local optima, then the one
identified by the algorithm depends heavily on the starting point. There is in general
no guarantee that the local optimum identified by such algorithms is a global optimum.
The following simple example illustrates how AMPL can use the state-of-the-art
solver MINOS (Modular In-core Nonlinear Optimization System) to solve a nonlinear
problem; observe, however, that MINOS finds a local rather than a global minimum.

Example. (MINOS finds a local optimum) Consider the nonlinear optimisation problem

    minimise    x · sin(x + 4)
    subject to  −10 ≤ x ≤ 10.

Figure 7.2 illustrates the objective function values over [−10, 10].

In this case, AMPL outputs that the optimal solution is x = 1.34995 with corresponding
objective value −1.084752213, namely the red dot in Figure 7.2. Notice that this is a
local minimiser; however, it is clearly not the global minimiser of our function on
[−10, 10].

Figure 7.2: The nonlinear function values for x · sin(x + 4) over [−10, 10]. The red dot
at x = 1.34995 is the point that the solver MINOS outputs as the optimal solution to
the problem.
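The behaviour in this example is easy to reproduce without AMPL. The sketch below
uses a plain gradient descent as a stand-in for the local methods a solver such as MINOS
applies (the step size and iteration count are arbitrary illustrative choices), started at
x = 0, and compares the result with a coarse grid search over [−10, 10].

```python
import math

def f(x):
    return x * math.sin(x + 4)

def fprime(x):
    # derivative of x*sin(x+4) by the product rule
    return math.sin(x + 4) + x * math.cos(x + 4)

# Gradient descent from x0 = 0 converges to the *nearest* local minimum.
x = 0.0
for _ in range(20000):
    x -= 0.01 * fprime(x)

# A coarse grid search over [-10, 10] finds a much better point.
grid_value, grid_point = min(
    (f(-10 + 0.001 * k), -10 + 0.001 * k) for k in range(20001)
)
print(round(x, 3), round(grid_point, 2), round(grid_value, 3))
```

The descent settles at the local minimiser near x ≈ 1.35, the point reported by MINOS,
while the grid search locates a far better point near x ≈ −8.8; this is exactly the
dependence on the starting point discussed above.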

7.3 Convex Functions


A function f : R^n → R is convex if dom(f) is convex and, for every x, y ∈ dom(f) and
λ ∈ [0, 1],

    f(λx + (1 − λ)y) ≤ λ f(x) + (1 − λ) f(y).        (7.3)

In other words, a function f is convex if, for every two points in the domain of the
function and every λ ∈ [0, 1], the value of f at the weighted average of the two points
is less than or equal to the weighted average of the two function values. Figure 7.3
provides a geometric interpretation of this definition. Observe that the point

    (λx + (1 − λ)y, λ f(x) + (1 − λ) f(y)) ∈ R^{n+1}

lies on the line segment joining the points (x, f(x)) and (y, f(y)) of the graph of f,
and (7.3) states that this point belongs to the epigraph of f, where the epigraph of a
function is the set of all points in the Cartesian product dom(f) × R ⊆ R^{n+1} lying on
or above its graph. For example, the epigraph of the function g : R → R defined by
g(x) = x^2 is the set {(x, y)^T ∈ R^2 : y ≥ x^2}.

Further, observe that the set of all points

    (λx + (1 − λ)y, λ f(x) + (1 − λ) f(y))

for λ ∈ [0, 1] is the line segment joining (x, f(x)) to (y, f(y)). Hence, it follows that
(7.3) means that the epigraph of f contains the line segment joining any two points
in the graph of f.

Figure 7.3: A function f is convex if the line segment joining any two points in the graph
of f is contained in the epigraph of f.

Some examples of convex functions include:

• e^{ax} for any a ∈ R,

• |x|^p on R with p ≥ 1,

• x log(x) with x > 0,

• affine functions in R^n, and

• ℓp-norms in R^n, namely ‖x‖_p = (Σ_{i=1}^{n} |x_i|^p)^{1/p} for p ≥ 1.

It is possible to verify that these functions are convex using definition (7.3). Note that
the simplest examples of convex functions are affine functions.
It should be noted that the above definition of a convex function requires that the
domain of f , namely dom( f ), is convex. For this purpose, we next provide a definition
of a set being convex.

Given a set C ⊆ R^n, we say that C is convex if the line segment between any two
points in C lies in C. In other words, a set C is convex if for any x, y ∈ C, the line
segment {λx + (1 − λ)y : 0 ≤ λ ≤ 1} with endpoints x and y is contained in C. Figure
7.4 illustrates both a convex and a non-convex set.

Figure 7.4: The set on the left is not convex, while the one on the right is convex.

Some examples of convex sets include:

• any point x ∈ R^n,

• any interval [a, b] ⊂ R,

• the complete space R^n,

• any subspace of R^n,

• the empty set ∅,

• the intersection of a (possibly uncountable) number of convex sets,

• any (affine) hyperplane {x ∈ R^n : a^T x = b}, where a ∈ R^n with a_i ≠ 0 for at least
one i ∈ {1, 2, ..., n} and b ∈ R,

• any polyhedron P(A, b) = {x ∈ R^n : Ax ≤ b}, where A ∈ R^{m×n} and b ∈ R^m, and

• Euclidean balls B(x_0, r) = {x ∈ R^n : ‖x − x_0‖ ≤ r}, where x_0 ∈ R^n and r ≥ 0.

It should be noted that the definition of convex sets allows us to state an equivalent
definition to (7.3) for convex functions. In particular, a real-valued function f : R^n → R
is convex if both dom(f) and its epigraph are convex sets.
A simple yet important fact is that if f : R^n → R is convex, then all sublevel sets of f
are convex, where the sublevel sets are sets of the form

    {x ∈ dom(f) : f(x) ≤ α}

for some fixed α ∈ R. In other words, a sublevel set of a function f is the set of points
in the domain of f whose function values are no greater than the fixed value α. It
should be emphasised that this fact tells us that if a function f is convex, then all
sublevel sets of f are convex; the converse is not true in general. For example, every
sublevel set of x ↦ x^3 on R is convex (each is an interval), yet x^3 is not a convex
function.

7.4 Univariate Convex Functions

Recall that in the previous section we provided definitions of both convex functions
and convex sets. Given some function, it is possible to prove that it is or is not convex
using definition (7.3); however, this can be difficult and unintuitive, even for certain
seemingly simple functions. It turns out that there are often simpler conditions we
can check, which involve the first and second derivatives of the function.
Here we consider the familiar case of univariate functions f : R → R, namely
functions of a single variable.

Theorem. Suppose that f : R → R is differentiable on dom(f) ⊆ R. Then f is convex if
and only if

    f(x_2) ≥ f(x_1) + f′(x_1)(x_2 − x_1)

holds for all x_1, x_2 ∈ dom(f).

This result informally tells us that a univariate function f is convex if and only if the
graph of f lies above its tangent lines.

Theorem. Suppose that f : R → R is twice differentiable on dom(f) ⊆ R. Then f is
convex if and only if its second derivative f″ is nonnegative on dom(f).

This result informally tells us that a univariate function f is convex if and only if the
function f is always “curving upward”.
It should be noted that the second theorem yields the test that is most practical,
provided the function is indeed twice differentiable on its domain. Upon making use of
this criterion, we can verify the convexity of the following univariate functions:

• f(x) = −log x, where dom(f) = {x ∈ R : x > 0},

• f(x) = e^x, where dom(f) = R,

• f(x) = 1/x, where dom(f) = {x ∈ R : x > 0}, and

• f(x) = x log x, where dom(f) = {x ∈ R : x > 0}.
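The second-derivative test is easy to probe numerically. The following sketch
approximates f″ by a central difference and checks that it is nonnegative on a grid of
points inside each function’s domain; this is a numerical sanity check of the criterion,
not a proof of convexity.

```python
import math

def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# The four functions above, each paired with a grid inside its domain.
cases = [
    (lambda x: -math.log(x), [0.1 * k for k in range(1, 100)]),   # -log x, x > 0
    (math.exp, [-5 + 0.1 * k for k in range(101)]),               # e^x on [-5, 5]
    (lambda x: 1.0 / x, [0.1 * k for k in range(1, 100)]),        # 1/x, x > 0
    (lambda x: x * math.log(x), [0.1 * k for k in range(1, 100)]),  # x log x, x > 0
]

all_nonneg = all(
    second_derivative(f, x) >= -1e-6  # small tolerance for rounding error
    for f, grid in cases
    for x in grid
)
print(all_nonneg)
```

The same check correctly flags non-convex functions: for instance the approximation
of sin″(2) is negative, so sin would fail the test on a grid containing that point.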



7.5 Minima of Convex Functions

Recall that in general nonlinear optimisation problems, as illustrated in Figure 7.2, not
all local extrema are global extrema. Further, recall that nonlinear programming
methods, including gradient methods and Newton methods (see e.g. [5, Chapter 1]),
generate sequences of points that converge to some local optimum and not necessarily
a global optimum. The following theorem is the fundamental property of convex func-
tions that explains why convex optimisation problems are easier than general nonlinear
optimisation problems.

Theorem. Let f : R^n → R be convex and let X ⊆ dom(f) be a convex set. Then every local
minimum for f in X is a global minimum for f in X.

Proof. Suppose f is convex and let x* be a local minimum of f in X. Then for some
neighbourhood N ⊆ X about x*, we have f(x) ≥ f(x*) for all x ∈ N. Suppose, to
derive a contradiction, that there exists some x′ ∈ X such that f(x′) < f(x*), i.e. that
x* is not a global minimum of f in X.
Consider the line segment x(λ) = λx* + (1 − λ)x′ for λ ∈ [0, 1]. Observe that
x(λ) ∈ X for every λ by the convexity of X. Then by the convexity of f, we have

    f(x(λ)) ≤ λ f(x*) + (1 − λ) f(x′) < λ f(x*) + (1 − λ) f(x*) = f(x*)        (7.4)

for all λ ∈ (0, 1), where the strict inequality follows since f(x′) < f(x*) by assumption.
We can pick λ sufficiently close to 1 such that x(λ) ∈ N. Then by the definition
of the neighbourhood N, we have f(x(λ)) ≥ f(x*); however, f(x(λ)) < f(x*) by
(7.4), which yields the contradiction as required.

Note that in the above theorem we do indeed need that both the function f and the
set X are convex. Consider for example the problem min{x_1^2 + 2x_2^2 : f(x) ≤ 0}, where
f(x) = min{1 − x_1, 1 − x_2}. The objective function is convex; however, the feasible
region is non-convex, since it is the set X = {(x_1, x_2)^T : x_1 ≥ 1 or x_2 ≥ 1}. Note that
both (1, 0) and (0, 1) are local minima; however, (1, 0) is the global minimum, since it
has value 1, while (0, 1) has value 2. This is illustrated in Figure 7.5.

Figure 7.5: The dashed lines represent contours of the function x_1^2 + 2x_2^2 and the shaded
area represents the feasible region X.

7.6 Convex Optimisation Problems


A convex optimisation problem is of the form

    minimise    f_0(x)
    subject to  f_i(x) ≤ 0   for i ∈ {1, 2, ..., m},        (7.5)
                h_i(x) = 0   for i ∈ {1, 2, ..., k},

where f_0, f_1, ..., f_m are convex functions and h_1, h_2, ..., h_k are affine functions. Recall
that h_1, h_2, ..., h_k are affine if and only if there exist a_1, a_2, ..., a_k ∈ R^n and
b_1, b_2, ..., b_k ∈ R such that h_i(x) = a_i^T x − b_i for each i.
Note that because f_1, f_2, ..., f_m are convex functions, the sets {x ∈ R^n : f_i(x) ≤ 0}
for each i ∈ {1, 2, ..., m} are convex. Furthermore, the sets {x ∈ R^n : a_i^T x = b_i} for
i ∈ {1, 2, ..., k} are hyperplanes and therefore convex. It follows that the feasible region
X associated with (7.5) is convex, since it is the intersection of convex sets. Further, every
local optimum for f_0 in X is also a global optimum by the above theorem.
Recall that for a linear programming problem it makes no difference, from a
mathematical standpoint, whether we maximise or minimise. In convex optimisation,
by contrast, the direction of optimisation is important. In particular, minimising a
convex function over a convex region guarantees that all local optima are also global
optima; the same does not hold if we maximise a convex function over a convex region.
It is natural here to ask whether anything similar can be said when we instead maximise
some function over a convex region. It turns out that something similar can indeed be
said; however, it relies on the objective function being concave. A function f : R^n → R
is concave if −f is convex. Equivalently, a real-valued function f is concave if dom(f)
and its hypograph are convex sets, where the hypograph of a function is the set of all
points in the Cartesian product dom(f) × R ⊆ R^{n+1} lying on or below its graph. Some
examples of concave functions include:

• −x^2 on R,

• √x with x ≥ 0,

• affine functions in R^n, and

• the sine function on [0, π].

Note that affine functions are both convex and concave. In fact, affine functions are,
perhaps surprisingly, the only functions that are both convex and concave.
Recall that convex optimisation problems (7.5) have been defined as minimisation
problems. However, if in a maximisation problem of the form

    maximise    f_0(x)
    subject to  f_i(x) ≤ 0   for i ∈ {1, 2, ..., m},
                h_i(x) = 0   for i ∈ {1, 2, ..., k}

the objective function f_0 is concave, f_1, ..., f_m are convex and h_1, h_2, ..., h_k are affine
functions, we will also say that the problem is a convex optimisation problem. This
is justified by the fact that the equivalent minimisation problem obtained by replacing
“maximise f_0(x)” with “minimise −f_0(x)” is a convex optimisation problem, as −f_0 is
convex.
For completeness, note that in the univariate case we similarly have the following
two theorems, which allow us to verify that a function is concave using its first and
second derivatives.

Theorem. Suppose that f : R → R is differentiable on dom(f) ⊆ R. Then f is concave if
and only if

    f(x_2) ≤ f(x_1) + f′(x_1)(x_2 − x_1)

holds for all x_1, x_2 ∈ dom(f).

Theorem. Suppose that f : R → R is twice differentiable on dom(f) ⊆ R. Then f is
concave if and only if its second derivative f″ is nonpositive on dom(f).

7.7 Example: Chemical Equilibrium


The problem of determining the chemical composition of a complex mixture under
chemical equilibrium conditions arises in the analysis of the performance of fuels and
propellants and in the synthesis of complex organic compounds. A mixture of chemical
species held at a constant temperature and pressure reaches its chemical equilibrium
state concurrently with the reduction of the free energy of the mixture to a minimum;
this is a consequence of the second law of thermodynamics. The objective function to
be minimised in the chemical equilibrium model is the expression for the free energy of
the chemical mixture under study. The value of the free energy of the mixture is
minimised subject to the chemical reactions possible between the species of the mixture.
Consider a mixture of m chemical elements. It has been predetermined that the m
different types of atoms can combine chemically to produce n compounds.
Define

    x_j = the number of moles of compound j present in the mixture at equilibrium,

    s = the total number of moles in the mixture, i.e. s = Σ_{j=1}^{n} x_j,

    a_ij = the number of atoms of element i in a molecule of compound j, and

    b_i = the number of atomic weights of element i in the mixture.

The mass balance relationships that must hold for the m elements are

    Σ_{j=1}^{n} a_ij x_j = b_i   for i ∈ {1, 2, ..., m}        (7.6)

and

    x_j ≥ 0   for j ∈ {1, 2, ..., n}.        (7.7)

Determination of the composition of the mixture at equilibrium is equivalent to deter-
mination of the values of x_j for j ∈ {1, 2, ..., n} that satisfy the above constraints and
also minimise the total free energy of the mixture. The total free energy of the mixture
is given by

    Σ_{j=1}^{n} x_j ( c_j + log(x_j / s) ),        (7.8)

where

    c_j = (F^0 / RT)_j + log P

is the Gibbs free energy function for the j-th compound, which can be found in tables, P
denotes the total pressure in atmospheres and log denotes the natural logarithm.
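For concreteness, the objective (7.8) can be written as a short function. The composition
used below is a made-up sanity check (all c_j = 0 and one mole of each of ten
compounds), not the equilibrium composition of Table 7.1; with these values every term
equals log(1/10), so the total is 10 log(1/10) = −10 log 10.

```python
import math

def free_energy(x, c):
    """Total free energy (7.8): sum_j x_j * (c_j + log(x_j / s)), s = sum_j x_j."""
    s = sum(x)
    # terms with x_j = 0 are taken to be 0, the usual x log x convention
    return sum(xj * (cj + math.log(xj / s)) for xj, cj in zip(x, c) if xj > 0)

# Hypothetical composition: one mole of each of 10 compounds, all c_j = 0.
x = [1.0] * 10
c = [0.0] * 10
print(free_energy(x, c))  # -10 * log(10), approximately -23.026
```

Passing the c_j column of Table 7.1 together with a candidate composition would
evaluate the objective that MINOS minimises for the hydrazine example below.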

It follows that the nonlinear programming problem is as follows: choose x_j for
j ∈ {1, 2, ..., n} to minimise the nonlinear objective function (7.8) subject to the linear
constraints (7.6) and the nonnegativity restrictions (7.7). This is a convex optimisation
problem, since all constraints are linear and the objective is convex.
Consider for example the determination of the equilibrium composition resulting
from subjecting the compound (1/2)N2H4 + (1/2)O2 (half part hydrazine and half part
oxygen) to a temperature of T = 3500 K and a pressure of P = 51.0345 atm. The
following table gives a summary of the information necessary to solve the problem.

                                          a_ij
 j   Compound   (F^0/RT)_j      c_j   H (i = 1)   N (i = 2)   O (i = 3)
 1   H             -10.021   -6.089       1
 2   H2            -21.096  -17.164       2
 3   H2O           -37.986  -34.054       2                       1
 4   N              -9.846   -5.914                   1
 5   N2            -28.653  -24.721                   2
 6   NH            -18.918  -14.986       1           1
 7   NO            -28.032  -24.100                   1           1
 8   O             -14.640  -10.708                               1
 9   O2            -30.594  -26.662                               2
10   OH            -26.111  -22.179       1                       1

Table 7.1: Data on (1/2)N2H4 + (1/2)O2 at 3500 K and P = 51.0345 atm

The AMPL model for the general problem can be found on Moodle. To solve this
problem, we make use of the MINOS solver. It should be noted that if we chose CPLEX
or Gurobi as the solver we would receive an error message. This is because, as we will
discuss, CPLEX and Gurobi can handle quadratic objectives and constraints but not
general nonlinear functions, even those that are convex.

7.8 Quadratic Optimisation


In this section, we introduce an important class of convex optimisation problems known
as convex quadratic optimisation problems, before discussing a slightly more general
class. Informally, these are problems in which all functions are quadratic, namely
polynomial functions of degree two. Being a little more precise, a function f : R^n → R
is quadratic if it is a degree-two polynomial function. For example, the function

    f(x_1, x_2) = x_1^2 + 3x_1x_2 + 2x_2^2 − 5x_1 + 6x_2 + 3

is a quadratic function.
Let A ∈ R^{n×n} be a square matrix with n rows and columns. The square matrix
A = (a_ij)_{n×n} is symmetric if it is equal to its transpose, i.e. if A = A^T. In other words, a
matrix is symmetric if a_ij = a_ji for all i, j ∈ {1, 2, ..., n}.
Further, a symmetric matrix A is:

• positive semi-definite if x^T A x ≥ 0 for all x ∈ R^n,

• negative semi-definite if x^T A x ≤ 0 for all x ∈ R^n, and

• indefinite if there exist x, y ∈ R^n such that x^T A x > 0 and y^T A y < 0.

The identity matrix is for example positive semi-definite. Further, a diagonal matrix
with all nonnegative entries is positive semi-definite. The matrix

    A = ( 1    0
          0   −2 )

is indefinite because x^T A x = 1 > 0 for x = (1, 0)^T and y^T A y = −2 < 0 for y = (0, 1)^T.


Note for completeness that we could instead have defined positive and negative
(semi-)definiteness in terms of the eigenvalues of the symmetric matrix A. We know
that all eigenvalues of the matrix A are real, since a_ij ∈ R for all i, j ∈ {1, 2, ..., n} and A
is symmetric. In particular, a symmetric matrix is:

• positive semi-definite if and only if all its eigenvalues are nonnegative,

• negative semi-definite if and only if all its eigenvalues are nonpositive, and

• indefinite if and only if there is at least one eigenvalue that is positive and at least
one eigenvalue that is negative.
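For a symmetric 2 × 2 matrix [[a, b], [b, d]] the eigenvalues have the closed form
λ = (trace ± √(trace² − 4 det))/2, and the discriminant equals (a − d)² + 4b², so it is
never negative. This makes the eigenvalue test easy to apply directly; the helper below
is an illustrative sketch that classifies the matrices discussed above.

```python
import math

def classify_2x2(a, b, d):
    """Classify the symmetric matrix [[a, b], [b, d]] via its eigenvalues."""
    trace, det = a + d, a * d - b * b
    root = math.sqrt(trace * trace - 4.0 * det)  # real: (a-d)^2 + 4b^2 >= 0
    lam1, lam2 = (trace + root) / 2.0, (trace - root) / 2.0
    if lam1 >= 0 and lam2 >= 0:
        return "positive semi-definite"
    if lam1 <= 0 and lam2 <= 0:
        return "negative semi-definite"
    return "indefinite"

print(classify_2x2(1, 0, 1))   # the 2x2 identity matrix
print(classify_2x2(1, 0, -2))  # the indefinite example above
```

For larger matrices one would compute eigenvalues numerically rather than by a
closed form; the two-by-two case is enough to check the examples in this section.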

Observe that any quadratic function f : R^n → R can be written in the form

    f(x) = x^T Q x + p^T x + r,

where Q ∈ R^{n×n} is a symmetric matrix, p ∈ R^n and r ∈ R. For the particular initial
example, namely f(x_1, x_2) = x_1^2 + 3x_1x_2 + 2x_2^2 − 5x_1 + 6x_2 + 3, the representation is

    Q = ( 1     1.5        p = ( −5        and   r = 3.
          1.5   2   ),           6  )

The following theorem asserts when a quadratic function is convex.



Theorem. Let Q ∈ R^{n×n} be a symmetric matrix, p ∈ R^n, and r ∈ R. The quadratic
function f(x) = x^T Q x + p^T x + r is convex if and only if Q is positive semi-definite.

Notice that in the special case when n = 1, the above theorem yields the well-known
fact that f(x) = ax^2 + bx + c is convex if and only if a ≥ 0. The following result will be
useful in the remainder.

Theorem. Let Q ∈ R^{n×n} be a symmetric matrix. Then Q is positive semi-definite if and only
if it can be written as Q = A^T A for some m × n matrix A with m ≤ n.

Note that if indeed Q = A^T A for some A, then

    x^T Q x = x^T A^T A x = (Ax)^T (Ax) = ‖Ax‖^2 ≥ 0 for all x ∈ R^n,

where ‖ · ‖ denotes the ℓ2-norm (or Euclidean norm). In particular, this shows that if
we have Q = A^T A for some A, then the symmetric matrix Q is positive semi-definite.
A convex quadratic optimisation problem is of the form

    minimise    x^T Q_0 x + p_0^T x + r_0
    subject to  x^T Q_i x + p_i^T x + r_i ≤ 0   for i ∈ {1, 2, ..., k},        (7.9)

where Q_i is a symmetric positive semi-definite n × n matrix, p_i ∈ R^n and r_i ∈ R for
i ∈ {0, 1, 2, ..., k}. Note that since the Q_i’s in the above are positive semi-definite, it
follows that all quadratic functions in the problem are indeed convex. Recall that this
implies that any local minimum is a global minimum.
The following example illustrates how convex quadratic optimisation is used within
financial settings for portfolio optimisation.

Example. (Portfolio selection) Markowitz’ theory of mean-variance optimisation provides
a mechanism for the selection of portfolios of securities (or asset classes) in a manner that
trades off the expected returns and the risk of potential portfolios.
Consider n assets i ∈ {1, 2, ..., n} with random returns r_1, r_2, ..., r_n. Let µ_i denote the
expected return of asset i, namely µ_i = E[r_i], and let

    Σ = (σ_ij)_{i,j = 1, ..., n}

be the n × n symmetric covariance matrix, where

    σ_ij = E[(r_i − µ_i)(r_j − µ_j)]

is the covariance of r_i and r_j. Note in particular that σ_ii is the variance of r_i.


Let x i denote the proportion of the total funds invested in security i. This allows us to
represent the expected return and the variance of the resulting portfolio x = (x 1 , x 2 , . . . , x n )
as
    Expected return = E[ Σ_{i=1}^n r_i x_i ] = Σ_{i=1}^n μ_i x_i

and

    Portfolio variance = E[ ( Σ_{i=1}^n r_i x_i − Σ_{i=1}^n μ_i x_i )^2 ]
                       = E[ Σ_{i=1}^n Σ_{j=1}^n (r_i − μ_i)(r_j − μ_j) x_i x_j ]
                       = Σ_{i=1}^n Σ_{j=1}^n E[(r_i − μ_i)(r_j − μ_j)] x_i x_j
                       = Σ_{i=1}^n Σ_{j=1}^n σ_ij x_i x_j = x^T Σ x.

Markowitz' mean-variance optimisation problem is to find the minimum variance portfolio of the n securities that yields at least a target value of expected return. Let R denote this target. This gives rise to the following optimisation problem

    minimise    x^T Σ x
    subject to  μ^T x ≥ R,
                Σ_{i=1}^n x_i = 1,
                x ≥ 0.

Markowitz's model is therefore a convex quadratic optimisation problem. Indeed, in this case the objective function x^T Σ x is a convex quadratic function as the matrix Σ is positive semi-definite. In order to verify this is indeed the case, we need to show that x^T Σ x ≥ 0 for all x ∈ R^n. Recall that x^T Σ x is the variance of the random variable r^T x and as such

    x^T Σ x = E[(r^T x − μ^T x)^2],

which is nonnegative because the variance of a random variable is always nonnegative.
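As a small illustration, the Markowitz problem above can be solved with a general-purpose solver. The sketch below uses `scipy.optimize.minimize`; the returns, covariance matrix and target R are made-up example data, not from the notes.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: three assets with assumed expected returns and covariances.
mu = np.array([0.10, 0.07, 0.03])
Sigma = np.array([[0.09, 0.01, 0.00],
                  [0.01, 0.04, 0.00],
                  [0.00, 0.00, 0.01]])
R = 0.05                                   # target expected return

res = minimize(
    fun=lambda x: x @ Sigma @ x,           # portfolio variance x^T Sigma x
    x0=np.full(3, 1 / 3),
    constraints=[
        {"type": "ineq", "fun": lambda x: mu @ x - R},   # mu^T x >= R
        {"type": "eq", "fun": lambda x: x.sum() - 1},    # proportions sum to 1
    ],
    bounds=[(0, None)] * 3,                # x >= 0
    method="SLSQP",
)
x = res.x
assert res.success
assert mu @ x >= R - 1e-6 and abs(x.sum() - 1) < 1e-6
```

A dedicated quadratic or SOCP solver would be preferred for large instances; the point here is only that the model is an ordinary constrained minimisation.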



7.9 Second Order Cone Programming


A second order cone programming (SOCP) problem, which is sometimes known as a cone quadratic programming problem, is of the form

    minimise    p^T x
    subject to  ‖A_i x + b_i‖ ≤ c_i^T x + d_i    for i ∈ {1, 2, . . . , m},

where p ∈ R^n and, for i ∈ {1, 2, . . . , m}, A_i is some m_i × n matrix (with m_i rows), b_i ∈ R^{m_i}, c_i ∈ R^n, d_i ∈ R and ‖·‖ denotes the ℓ_2-norm (or Euclidean norm).
It is possible to prove that functions of the form

    x ↦ ‖A_i x + b_i‖ − c_i^T x − d_i

are convex and, in consequence, it follows that SOCPs are convex optimisation problems.
It should be noted that the name comes from the fact that each constraint defines an affine transformation of the standard second order cone (or the quadratic, Lorentz or ice-cream cone), which, in R^n, is the set of points of the form

    { (u, t) ∈ R^{n−1} × R : ‖u‖ ≤ t }.

The standard second order cone is illustrated in Figure 7.6.

Figure 7.6: The standard second order cones in R^2 and R^3.

General Nonlinear Solvers vs SOCP Solvers

Note that while there are general nonlinear solvers, such as MINOS, KNITRO, Ipopt,
SNOPT, CONOPT, the main commercial solvers only accept problems in more restricted
forms. For example, CPLEX, Gurobi, Xpress and Mosek solve SOCP problems.
On the one hand, the advantages of general nonlinear solvers are clear. For example, they accept any form of problem and they tolerate non-convexities. However, there are many disadvantages, including their reliance on smoothness of the functions and feasible region, their use of complex mechanisms and the fact that they report only local optimality.
On the other hand, SOCP solvers only accept problems in conic convex quadratic form, rejecting problems not in the required form. However, SOCP solvers allow for non-smooth functions (the functions defining the SOCP constraints are not smooth) and guarantee global optimality. Furthermore, the algorithms to solve SOCP
problems are derived from the algorithms for linear programming. In particular, linear
programming problems can be solved by a class of fast algorithms known as interior
point methods (see e.g. [16]). These methods can be naturally adapted to solve SOCPs,
providing fast algorithms that guarantee global optimality and tolerate non-smooth fea-
sible regions.
In particular, all the above solvers can also accept SOCPs with integer variables, since
the Branch-and-Bound method naturally extends to this setting. It should be emphasised
here that the distinction between SOCPs and SOCPs with integer variables is similar to
the case of Linear Programming problems versus Mixed Integer Linear Programming
problems. SOCPs, like LPs, can be solved reliably and efficiently. Introducing integer
variables makes the problems potentially much harder to solve.

7.10 Second Order Cone Programming Representable Sets

Linear Programming

Linear programming problems are very special types of SOCP. This follows since SOCPs have a linear objective and any linear constraint, say a^T x ≤ b, can be equivalently written as ‖0‖ ≤ b − a^T x, where 0 is the n-dimensional zero vector. Hence, we can always write linear constraints as part of an SOCP.

Hyperbolic Constraints

Hyperbolic constraints are constraints of the form

    x^2 ≤ yz,    y, z ≥ 0,

where x, y, z are three variables. Such constraints can be transformed into SOCP constraints as follows. Observe that

    yz = ((y + z)/2)^2 − ((y − z)/2)^2,

which implies that the hyperbolic constraint can be written as

    x^2 + ((y − z)/2)^2 ≤ ((y + z)/2)^2.

Further, since y, z ≥ 0, the above is equivalent to

    ‖ [ 1  0  0 ; 0  1/2  −1/2 ] (x, y, z)^T ‖ ≤ (y + z)/2,

which is an SOCP constraint.


This can be generalised in a straightforward way to the case that x ∈ R^n. In such case the constraint ‖x‖^2 ≤ yz can be written as

    ‖x‖^2 + ((y − z)/2)^2 ≤ ((y + z)/2)^2,

which is equivalent to the SOCP constraint

    ‖ [ I_n  0  0 ; 0  1/2  −1/2 ] (x, y, z)^T ‖ ≤ (y + z)/2,

where I_n is the n-dimensional identity matrix.
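The algebraic identity behind this reformulation is easy to check numerically. The following quick sanity check (not part of the notes) verifies, over random samples, both the identity yz = ((y+z)/2)^2 − ((y−z)/2)^2 and the equivalence of the scalar hyperbolic constraint with its second order cone form.

```python
import math
import random

random.seed(1)
for _ in range(1000):
    x = random.uniform(-5, 5)
    y = random.uniform(0, 5)
    z = random.uniform(0, 5)

    # The difference-of-squares identity used in the reformulation.
    assert abs(y * z - (((y + z) / 2) ** 2 - ((y - z) / 2) ** 2)) < 1e-9

    # x^2 <= yz  iff  sqrt(x^2 + ((y-z)/2)^2) <= (y+z)/2  (for y, z >= 0).
    hyperbolic = x * x <= y * z
    socp = math.hypot(x, (y - z) / 2) <= (y + z) / 2
    if abs(x * x - y * z) > 1e-6:      # skip points too close to the boundary
        assert hyperbolic == socp
```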


Note that solvers such as CPLEX or Gurobi will accept hyperbolic constraints and will
automatically perform the above transformation to turn them into SOCP constraints.
The following example illustrates how we can maximise the product of linear func-
tions.

Example. (Maximising the product of linear functions) Consider a problem in the variables x_1, . . . , x_n, y_1, . . . , y_k, where the constraints are linear but the objective has the form

    maximise (c^T x)(d^T y),

where c ∈ R^n and d ∈ R^k and we require both c^T x, d^T y ≥ 0. Note that the above objective is quadratic since it is a polynomial of degree two, however, it is not concave. Hence, the above is not a convex quadratic problem. Nevertheless, upon introducing a new variable t, the problem can be equivalently written as maximising t subject to the additional constraint t^2 ≤ (c^T x)(d^T y). This is a hyperbolic constraint, with y and z replaced by the nonnegative linear expressions c^T x and d^T y, and can therefore be expressed as an SOCP constraint.

Quadratic Programs

Quadratic programming is a special case of SOCP. Indeed, recall from an earlier theorem that any positive semi-definite matrix Q ∈ R^{n×n} can be factorised as Q = A^T A for some m × n matrix A with m ≤ n. In particular, consider the quadratic program (7.9) and matrices A_i such that Q_i = A_i^T A_i for i ∈ {0, 1, 2, . . . , k}.
Then, upon introducing new variables z_i for i ∈ {0, 1, . . . , k}, the quadratic program (7.9) can be written as

    minimise    z_0 + p_0^T x + r_0
    subject to  ‖A_i x‖^2 ≤ z_i             for i ∈ {0, 1, 2, . . . , k},
                z_i + p_i^T x + r_i ≤ 0     for i ∈ {1, 2, . . . , k}.

Note that the constraints ‖A_i x‖^2 ≤ z_i are not SOCP constraints. However, they can be written as hyperbolic constraints of the form ‖u‖^2 ≤ wz_i, where u = A_i x and w = 1. As we have seen previously, these can be expressed as second order cone constraints. We can do this explicitly using the fact that

    z_i = ((z_i + 1)/2)^2 − ((z_i − 1)/2)^2

and therefore the constraint ‖A_i x‖^2 ≤ z_i can be written as

    ‖ [ A_i  0 ; 0  1/2 ] (x, z_i)^T + (0, −1/2)^T ‖ ≤ (z_i + 1)/2.

7.11 Applications of Second Order Cone Programming

Robust Optimisation

In this section we show how SOCP can be used to solve some simple robust convex optimisation problems, in which uncertainty in the data is explicitly accounted for. We consider an LP

    minimise    c^T x
    subject to  a_i^T x ≤ b_i    for i ∈ {1, 2, . . . , m},

in which there is some uncertainty (or variation) in the parameters c ∈ R^n, a_i ∈ R^n or b_i ∈ R. In order to simplify the exposition, we assume that c and b_i are fixed, and the a_i's are known to lie in some ellipsoids, a_i ∈ E_i. Note that ellipsoids are simply affine transformations of the unit sphere and therefore we can write each ellipsoid E_i as

    E_i = { ā_i + P_i u : ‖u‖ ≤ 1 },

where P_i is some symmetric positive semi-definite n × n matrix and ā_i denotes the centre of the ellipsoid.
In a worst-case framework, we require that the constraints be satisfied for all possible values of the parameters a_i, which leads us to the robust LP

    minimise    c^T x
    subject to  a_i^T x ≤ b_i    for all a_i ∈ E_i and i ∈ {1, 2, . . . , m}.          (7.10)

The robust linear constraint a_i^T x ≤ b_i for all a_i ∈ E_i can be expressed as

    max{ a_i^T x : a_i ∈ E_i } ≤ b_i.

Lemma. Given x ∈ R^n, we have max{ a_i^T x : a_i ∈ E_i } = ā_i^T x + ‖P_i x‖.

Proof. Recall that E_i = { ā_i + P_i u : ‖u‖ ≤ 1 } and P_i is symmetric, hence

    max{ a_i^T x : a_i ∈ E_i } = ā_i^T x + max{ u^T P_i x : ‖u‖ ≤ 1 }.

For every u such that ‖u‖ ≤ 1, we have

    u^T P_i x ≤ ‖u‖ ‖P_i x‖ ≤ ‖P_i x‖

by the Cauchy-Schwarz inequality (see e.g. [2, Section 10.1]), which shows

    max{ u^T P_i x : ‖u‖ ≤ 1 } ≤ ‖P_i x‖.

On the other hand, if P_i x ≠ 0 (when P_i x = 0 the claim is trivial) and we choose u = P_i x / ‖P_i x‖, then ‖u‖ = 1 and

    u^T P_i x = (P_i x)^T P_i x / ‖P_i x‖ = ‖P_i x‖,

which shows

    max{ u^T P_i x : ‖u‖ ≤ 1 } = ‖P_i x‖

and the result follows.
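The lemma can also be illustrated numerically. The sketch below (made-up data, not from the notes) samples many points of an ellipsoid E = { ā + Pu : ‖u‖ ≤ 1 }, checks that a^T x never exceeds ā^T x + ‖Px‖, and that the maximiser u* = Px/‖Px‖ attains the bound.

```python
import numpy as np

rng = np.random.default_rng(0)
a_bar = np.array([1.0, 2.0])
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])            # a symmetric positive definite choice
x = np.array([0.7, -0.3])

bound = a_bar @ x + np.linalg.norm(P @ x)

# Sample points u in the unit ball and evaluate a^T x over the ellipsoid.
samples = []
for _ in range(5000):
    u = rng.standard_normal(2)
    u /= max(np.linalg.norm(u), 1.0)  # scale down any u outside the ball
    samples.append((a_bar + P @ u) @ x)

assert max(samples) <= bound + 1e-9

# The maximiser from the proof attains the bound exactly.
u_star = P @ x / np.linalg.norm(P @ x)
assert abs((a_bar + P @ u_star) @ x - bound) < 1e-9
```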

Hence, the robust LP (7.10) can be expressed as the SOCP

    minimise    c^T x
    subject to  ‖P_i x‖ ≤ b_i − ā_i^T x    for i ∈ {1, 2, . . . , m}.          (7.11)

Note that the additional norm terms act as "regularisation terms", discouraging large x in directions with considerable uncertainty in the parameters a_i.
There is another, equivalent, statistical interpretation of robust optimisation. In this interpretation, we assume that each vector a_i is independently drawn at random from some Gaussian distribution. Further, we know the corresponding mean ā_i and the covariance matrix. Our objective is to find the vector x minimising the linear function c^T x such that the probability that x violates a constraint a_i^T x ≤ b_i is less than some tolerance η. Such a framework gives rise to an SOCP of the same form as (7.11), however, we do not give a formal derivation here.

Maximising the Sharpe Ratio

The Sharpe ratio in finance defines an efficiency metric of a portfolio as the expected return per unit risk, where the risk is measured as the standard deviation of the portfolio. As in the Markowitz model, we have n assets i ∈ {1, 2, . . . , n} with random returns r_1, r_2, . . . , r_n and we are given the vector of expected returns μ and covariance matrix Σ. Given an allocation x ∈ R^n of the n assets, the Sharpe ratio is defined as

    S(x) = (μ^T x − r_f) / (x^T Σ x)^{1/2},

where r_f denotes the return of a risk-free asset. The Sharpe ratio compares the projected returns relative to an investment benchmark (such as government bonds) with the historical or expected variability of such returns.
Suppose there is a portfolio with μ^T x > r_f (if not, then it would be better to simply invest in the risk-free asset). Since such a portfolio exists by assumption, note that maximising the Sharpe ratio is equivalent to minimising 1/S(x). Note that since Σ is positive semi-definite, it can be factorised as Σ = A^T A for some m × n matrix A (for some m ≤ n). Hence, the standard deviation of the portfolio return can be written as

    (x^T Σ x)^{1/2} = ‖Ax‖.

In other words, we have the following problem

    minimise    ‖Ax‖ / (μ^T x − r_f)
    subject to  Σ_{i=1}^n x_i = 1,
                x ≥ 0.

We can introduce a new variable z, where we will maintain μ^T x − r_f = 1/z. In particular, if we let y = xz, then our problem becomes

    minimise    t
    subject to  ‖Ay‖ ≤ t,
                μ^T y − r_f z = 1,
                Σ_{i=1}^n y_i = z,
                y ≥ 0,

which is an SOCP. It should be noted that we yield the optimal portfolio x = y/z from the optimal solution (y, z, t).
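The change of variables above can be verified for any fixed feasible portfolio. The sketch below (the returns and factor matrix A are made-up illustrative data, not from the notes) constructs (y, z, t) from a portfolio x and checks that t = ‖Ay‖ equals 1/S(x), the reciprocal of the Sharpe ratio.

```python
import numpy as np

mu = np.array([0.10, 0.06, 0.04])         # assumed expected returns
A = np.array([[0.20, 0.05, 0.00],         # assumed factorisation, Sigma = A^T A
              [0.00, 0.10, 0.02]])
Sigma = A.T @ A
r_f = 0.02                                # assumed risk-free return

x = np.array([0.5, 0.3, 0.2])             # some feasible portfolio (sums to 1)
S = (mu @ x - r_f) / np.sqrt(x @ Sigma @ x)   # Sharpe ratio of x

# Change of variables: z = 1/(mu^T x - r_f), y = x z, t = ||A y||.
z = 1.0 / (mu @ x - r_f)
y = x * z
t = np.linalg.norm(A @ y)

assert abs(mu @ y - r_f * z - 1.0) < 1e-9     # transformed equality constraint
assert abs(y.sum() - z) < 1e-9                # sum_i y_i = z
assert abs(t - 1.0 / S) < 1e-9                # objective equals 1/S(x)
```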

7.12 Exercises for Self-Study


1. Recall that the intersection of a (possibly) uncountable number of convex sets is
convex. In contrast, the union of convex sets is not necessarily convex. Provide a
counterexample to show this is indeed the case.

2. Show that if f : R^n → R is convex, then all sublevel sets of f are convex, where the sublevel sets are sets of the form

    { x ∈ dom(f) : f(x) ≤ α }

for some fixed α ∈ R.

3. For each of the following functions, show whether the function is convex, concave
or neither.

a) f(x) = 10x − x^2,
b) f(x) = x^4 + 6x^2 + 12x,
c) f(x) = 2x^3 − 3x^2,
d) f(x) = x^4 + x^2, and
e) f(x) = x^3 + x^4.

4. Consider the function f : R → R defined by f(x) = x^3. Over what region is f convex? Over what region is f concave?

5. For each of the following functions, show whether the function is convex, concave
or neither.

a) f(x) = x_1 x_2 − x_1^2 − x_2^2,

b) f(x) = 3x_1 + 2x_1^2 + 4x_2 + x_2^2 − 2x_1 x_2,

c) f(x) = x_1^2 + 3x_1 x_2 + 2x_2^2,

d) f(x) = 20x_1 + 10x_2, and

e) f(x) = x_1 x_2.

6. Consider the nonlinear programming problem

    minimise    x_1^4 + 2x_2^2
    subject to  x_1^2 + x_2^2 ≤ 2.

Show that this problem is a convex programming problem both geometrically and algebraically.

7. Solve the nonlinear programming problem from the previous exercise using the
solver via AMPL.

8. Answer the questions below for the following problem, where in each case you must justify your answer.

    minimise    f(x) = (1/4)x_1^4 − (1/2)x_1^2 − x_2
    subject to  x_1^2 + x_2^2 ≤ 4,
                x_1 − x_2 ≤ 2.

a) Is the problem a convex programming problem?

b) Is the point x = (1, 1) T a feasible solution?

c) Is the point x = (1, 1) T an optimal solution?

d) Is the point x = (2, 2) T a feasible solution?

e) Is the point x = (2, 2) T an optimal solution?

9. Let C ⊆ R^n be a convex set. Suppose x_1, x_2, . . . , x_k ∈ C and let θ_1, θ_2, . . . , θ_k ∈ R satisfy θ_i ≥ 0 with θ_1 + θ_2 + · · · + θ_k = 1. Show using induction that

    θ_1 x_1 + θ_2 x_2 + · · · + θ_k x_k ∈ C.

Note that the definition of convexity is that this holds for k = 2.


Part II

Simulation


Chapter 8

Statistics and Probability Background

8.1 Sample Space and Events


Consider an experiment where the outcome is not known in advance. Let S denote the
set of all possible outcomes, which is known as the sample space of the experiment. For
example, if the experiment consists of tossing a single coin, the sample space is

S = {H, T },

where the outcome H means the coin is heads and the outcome T means the coin is
tails. For tossing a coin twice, the sample space becomes

S = {H H, H T, T H, T T }, (8.1)

where the outcomes are defined in a similar fashion. For a less standard example, consider the experiment where eight runners who are numbered 1 through 8 run a race. Then (assuming all runners complete the race) the sample space is

    S = {all orderings of (1, 2, 3, 4, 5, 6, 7, 8)},

where the outcome (2, 7, 1, 8, 3, 4, 5, 6), for example, means that runner number 2 finished first, runner number 7 finished second, and so on.
Any subset A ⊆ S of the sample space is known as an event. In other words, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in A, we will say that A has occurred. For example, in the experiment where one flips a coin twice, if A = {TH, TT}, then A is the event that the first flip is a tails. In a similar light, in the example about runners, if A = {all outcomes in S starting with a 3}, then A is the event that runner number 3 finishes first.
For any two events A and B, we define the new event A ∪ B, called the union of A and B, to consist of all outcomes that are in either A or B or in both A and B. In a similar fashion, we define the event A ∩ B, called the intersection of A and B, to consist of all outcomes that are in both A and B. Note if A ∩ B = ∅, such that A and B cannot both occur, we say that A and B are mutually exclusive.
It is also possible to define unions and intersections for more than two events. For this purpose, let A_1, A_2, . . . , A_n denote n events. The union of these n events is

    ⋃_{i=1}^n A_i := A_1 ∪ A_2 ∪ · · · ∪ A_n,

which consists of all outcomes that are in at least one A_i. The intersection of these n events is

    ⋂_{i=1}^n A_i := A_1 ∩ A_2 ∩ · · · ∩ A_n,

which consists of all outcomes that are in all of the A_i.

8.2 Axioms of Probability


Suppose that for each event A of an experiment having sample space S there is some number, denoted by P(A), called the probability of the event A, which satisfies the following three axioms:

Axiom 1: 0 ≤ P(A) ≤ 1,

Axiom 2: P(S) = 1, and

Axiom 3: for any sequence of mutually exclusive events A_1, A_2, . . .,

    P( ⋃_{i=1}^n A_i ) = Σ_{i=1}^n P(A_i),    n = 1, 2, . . . , ∞.

In words, the first axiom states that the probability that the outcome of the experiment
lies within A is some number between 0 and 1 inclusive. The second axiom states that
with probability 1 this outcome will be a member of the sample space S. Finally, the
third axiom states that for any set of mutually exclusive events, the probability that at
least one of the events A1 , A2 , . . . occurs is precisely equal to the sum of their respective
probabilities. These three axioms may be used in order to prove a wide variety of results
about probabilities.

8.3 Conditional Probability and Independence


Recall the experiment that consists of flipping a coin twice, where we note whether the
result was heads or tails after each flip. Note that the sample space for this experiment
is (8.1). Suppose each of the four possible outcomes are equally likely to occur, namely that each element of the sample space occurs with probability 1/4. Suppose further that we know that the first flip lands on heads. In light of this information (regarding the first flip), what is the probability that both flips land on heads? It is relatively straightforward to argue that because the first flip is a head, there are now at most two possible outcomes, namely HH or HT, both of which are equally likely to occur and hence each of these outcomes has (conditional) probability 1/2. It is worth noting that the (conditional) probabilities of the other two outcomes, namely TH and TT, given that the first flip is a head are unsurprisingly both 0.
If we respectively denote by A and B the event that both flips land on heads and the event that the first flip lands on heads, then the probability obtained above, called the conditional probability of A given that B has occurred, is denoted P(A|B). We can similarly deduce the following general formula for P(A|B), which is valid for all experiments and events A and B, namely

    P(A|B) = P(A ∩ B) / P(B).
It is worth emphasising that the conditional probability P(A|B) is defined only when
P(B) > 0, namely in the scenario that the event B can occur.
As illustrated in the coin flipping example, P(A|B), namely the conditional probabil-
ity of A given B occurred, does not in general necessarily equal P(A), the unconditional
probability of A. In the special case that P(A|B) = P(A) holds, we say that A and B are in-
dependent. Using the aforementioned general formula for P(A|B), we can equivalently
state that A and B are independent if

    P(A ∩ B) = P(A) · P(B)

holds. It is worth noting that by symmetry, it follows that if A is independent of B, then


B must be independent of A.
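The coin-flipping calculation above is easy to reproduce by simulation. The following small sketch (illustrative, not from the notes) estimates P(A|B) as the fraction of "first flip heads" experiments in which both flips are heads, and checks it is close to 1/2.

```python
import random

random.seed(42)
n = 200_000
count_B = 0          # experiments where the first flip is heads (event B)
count_AB = 0         # experiments where both flips are heads (event A ∩ B)

for _ in range(n):
    first = random.random() < 0.5
    second = random.random() < 0.5
    if first:
        count_B += 1
        if second:
            count_AB += 1

# Estimate of P(A|B) = P(A ∩ B) / P(B).
p_given = count_AB / count_B
assert abs(p_given - 0.5) < 0.01
```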

8.4 Random Variables

When an experiment is performed we are sometimes concerned with the value of some
numerical quantity determined by the result. These quantities of interest that are de-
termined by the results of the experiment are known as random variables. Being a little
more precise, a random variable is a mathematical formalisation of some quantity or
object that depends on random events. It is a mapping or a function from possible out-
comes within a sample space to some “measurable space”, which for our purpose will
be the real numbers R.

The cumulative distribution function (or simply the distribution function) F of the random variable X is defined for any real number x by

    F(x) = P{X ≤ x},

where the right-hand side represents the probability that the random variable X takes on a value less than or equal to x.
A random variable that can take either a finite or at most a countable number of
possible values is said to be discrete. For a discrete random variable X we define its
probability mass function p(x) by

p(x) = P{X = x},

where the right-hand side represents the probability that the discrete random variable X
is exactly equal to x. If X is a discrete random variable that takes on one of the countable number of possible values x_1, x_2, . . ., then because X must take one of these values, by the Axioms of Probability we have

    Σ_{i=1}^∞ p(x_i) = 1.
i=1

In contrast, a random variable X is continuous if there is a nonnegative function f(x) defined for all real numbers x with the property that for any set C of real numbers,

    P{X ∈ C} = ∫_C f(x) dx.

In particular, the outcomes lie in a set which is formally continuous and can be intuitively thought of as a set without gaps.
The relationship between the cumulative distribution F(·) and its probability density function f(·) is expressed by

    F(a) = P{X ∈ (−∞, a]} = ∫_{−∞}^{a} f(x) dx.

In other words, the area under the curve of the probability density function f(x) from negative infinity to a is equal to the probability that the random variable X takes value at most a. Upon differentiation we note that the density is the derivative of the cumulative distribution function.

8.5 Expectation
One of the most fundamental and useful concepts in probability is that of the expectation of a random variable. If X is a discrete random variable which takes one of the possible values x_1, x_2, . . ., then the expected value of X (or the mean of X) is denoted by E[X] and defined by

    E[X] = Σ_i x_i P{X = x_i} = Σ_i x_i p_i,          (8.2)

where p_i denotes the probability corresponding to the value x_i. In other words, the expected value of X is a weighted average of the possible values X can take, where the weights are simply the probabilities that X assumes those values.
If X is instead a continuous random variable with probability density function f(·), then similarly to (8.2), the expected value of X is

    E[X] = ∫_{−∞}^{∞} x f(x) dx.

Suppose now that we do not want to determine the expected value of a random variable X but instead of the random variable g(X), where g(·) denotes some given function. Note that g(X) takes on the value g(x) when the random variable X takes on the value x. Intuitively, it seems that E[g(X)] should be a weighted average of the possible values g(x) where, for a given x, the weight for g(x) equals the probability (or probability density) that X equals x. Indeed, this turns out to be the case and thus the following result holds.

Proposition. Let g(·) denote any function. If X is a discrete random variable with probability mass function p(x), then

    E[g(X)] = Σ_x g(x) p(x).

Instead, if X is continuous with probability density function f(x), then

    E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx.

It can be shown in consequence that if a and b are constants, then

    E[aX + b] = aE[X] + b.

Further, it can be shown that expectation is a linear operation in the sense that for any two random variables X_1 and X_2, we have

    E[X_1 + X_2] = E[X_1] + E[X_2],

or more generally that

    E[ Σ_{i=1}^n X_i ] = Σ_{i=1}^n E[X_i].

8.6 Variance

Despite the clear uses of the expected value E[X], it does not yield any information about the spread of the possible values of X. There are several ways to measure this variation, however, one important approach is to consider the average value of the square of the difference between X and E[X]. This inspires the following definition.

Definition. If X is a random variable with mean μ, then the variance of X, denoted by Var(X), is defined by

    Var(X) = E[(X − μ)^2].

One particularly useful alternative formula for Var(X) is

    Var(X) = E[X^2] − μ^2
           = E[X^2] − (E[X])^2,

which can be derived by expanding and simplifying. A useful identity, the proof of which is left as an exercise, is that for all real constants a and b, we have

    Var(aX + b) = a^2 · Var(X).
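This identity can be checked empirically against a simulated sample; the transformation acts on the sample variance in exactly the same way. A quick sketch (not part of the notes):

```python
import random
import statistics

random.seed(7)
a, b = 3.0, -2.0
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]
ys = [a * x + b for x in xs]            # the transformed variable aX + b

# pvariance is the population variance (mean squared deviation).
var_x = statistics.pvariance(xs)
var_y = statistics.pvariance(ys)
assert abs(var_y - a * a * var_x) < 1e-6   # Var(aX + b) = a^2 Var(X)
```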

In contrast to the fact that the expected value of a sum of random variables is equal to the sum of the expectations, the corresponding result does not in general hold for variances. Despite this, it does in fact turn out to be true in the special case where the random variables are independent.
We now define the useful concept of covariance between two random variables. It should be noted that the following definition enables us to prove the claim that the variance of the sum of independent random variables is equal to the sum of their variances.

Definition. The covariance of two random variables X and Y, denoted by Cov(X, Y), is defined by

    Cov(X, Y) = E[(X − μ_x)(Y − μ_y)],

where μ_x = E[X] and μ_y = E[Y].



One particularly useful alternative expression for Cov(X, Y) is

    Cov(X, Y) = E[XY] − E[X] · E[Y],

which can be obtained by expanding and then using the linearity of expectation. In addition, it will be useful to have an expression for Var(X + Y) in terms of the individual variances and the covariance between X and Y. In particular, we deduce that

    Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y),

which follows by expanding, simplifying and using the aforementioned linearity of expectation, i.e. E[X + Y] = μ_x + μ_y.
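The variance-of-a-sum formula holds exactly for the empirical (population) moments of any sample, which gives a quick numeric check. The sketch below (illustrative, not from the notes) builds correlated pairs by letting X and Y share a common component.

```python
import random
import statistics

random.seed(11)
n = 50_000
xs, ys = [], []
for _ in range(n):
    common = random.gauss(0, 1)
    xs.append(common + random.gauss(0, 1))   # X and Y share `common`,
    ys.append(common + random.gauss(0, 1))   # so Cov(X, Y) > 0 here

sums = [x + y for x, y in zip(xs, ys)]
mx, my = statistics.fmean(xs), statistics.fmean(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

lhs = statistics.pvariance(sums)
rhs = statistics.pvariance(xs) + statistics.pvariance(ys) + 2 * cov
assert abs(lhs - rhs) < 1e-6     # Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
```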
The above allows us to define the correlation between two random variables, which
informally provides a normalised measure of their covariance. It is worth noting that
the correlation is known by several names including the Pearson correlation coefficient
(PCC) or the Pearson product-moment correlation coefficient (PPMCC), named after
the English mathematician Karl Pearson who presented the coefficient in 1895 after a
related idea was previously suggested by Galton (see e.g. [18]).

Definition. The correlation between two random variables X and Y, which is denoted by Corr(X, Y), is defined by

    Corr(X, Y) = Cov(X, Y) / √(Var(X) · Var(Y)).          (8.3)
Note that the random variables X and Y are said to be positively correlated if their covariance is strictly positive, while these random variables are said to be negatively correlated if their covariance is strictly negative. For completeness, note that

    0 ≤ |Corr(X, Y)| ≤ 1

holds, where | · | denotes the absolute value. The lower bound follows upon recalling that Cov(X, Y) = 0 holds if X and Y are independent, while the upper bound follows in light of the Cauchy-Schwarz inequality (see e.g. [2, Section 10.1]). Further, upon rearranging the equality (8.3) we observe that the covariance between two random variables is equal to the product of their correlation and the square root of the product of their corresponding variances.

8.7 Chebyshev's Inequality and the Laws of Large Numbers

As a consequence of Markov's inequality, which states that if a random variable X takes on only nonnegative values then for any a > 0 we have P{X ≥ a} ≤ E[X]/a, we deduce as a corollary Chebyshev's inequality. This states that the probability that a random variable differs from its mean by at least k standard deviations is bounded by 1/k^2, where the standard deviation of a random variable is defined to be the nonnegative square root of its variance.

Corollary (Chebyshev's Inequality). If X is a random variable having mean μ and variance σ^2, then for any value k > 0, we have

    P{|X − μ| ≥ kσ} ≤ 1/k^2.
It is possible (see e.g. [21, pp. 17]) to use Chebyshev's inequality in order to prove the weak law of large numbers, which states that the probability that the average of the first n terms of a sequence of independent and identically distributed random variables differs from its mean by more than ε goes to 0 as the number of terms n goes to infinity.

Theorem (The Weak Law of Large Numbers). Let X_1, X_2, . . . be a sequence of independent and identically distributed random variables with mean μ. Then, for any ε > 0,

    P{ |(X_1 + · · · + X_n)/n − μ| > ε } → 0    as n → ∞.

A generalisation of the weak law is known as the strong law of large numbers, which states, with probability 1, that

    lim_{n→∞} (X_1 + · · · + X_n)/n = μ.

In other words, with certainty the long-run average of a sequence of independent and identically distributed random variables will converge to its mean.
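The law of large numbers is easy to see in simulation. The sketch below (illustrative, not from the notes) averages a large number of i.i.d. uniform(0, 1) draws, whose mean is 0.5, and checks that the sample average is close to it.

```python
import random
import statistics

random.seed(3)
n = 200_000
draws = [random.random() for _ in range(n)]   # i.i.d. uniform(0, 1), mean 0.5

avg = statistics.fmean(draws)
# With n = 200,000 the standard error is about 0.00065, so 0.005 is generous.
assert abs(avg - 0.5) < 0.005
```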

8.8 Some Discrete Random Variables


There are certain types of random variables that frequently appear in applications. In
this section we survey some of the most widely used discrete ones.

Binomial Random Variables

Suppose that n independent trials are to be performed, where with probability p the result of each trial is a "success". If X represents the number of successes which occur within the n trials, then X is called a binomial random variable with parameters (n, p). Its probability mass function is

    P_i ≡ P{X = i} = C(n, i) p^i (1 − p)^{n−i},    i = 0, 1, . . . , n,          (8.4)

where

    C(n, i) = n! / (i!(n − i)!)

is the binomial coefficient that equals the number of different subsets of i elements that can be chosen from a set of n elements.
A binomial (1, p) random variable is known as a Bernoulli random variable, named after the Swiss mathematician Jacob Bernoulli. Note that since a binomial (n, p) random variable X represents the number of successes within n independent trials, each of which has success probability p, we can perhaps unsurprisingly represent it as

    X = Σ_{i=1}^n X_i,          (8.5)

where X_i = 1 if the i-th trial is a success, and X_i = 0 otherwise. Now

    E[X_i] = P{X_i = 1} = p,
    Var(X_i) = E[X_i^2] − (E[X_i])^2 = E[X_i] − (E[X_i])^2 = p − p^2 = p(1 − p),

where X_i^2 = X_i since X_i = 1 or 0. In particular, upon recalling (8.5) we note, for a binomial (n, p) random variable X, that

    E[X] = Σ_{i=1}^n E[X_i] = np,
    Var(X) = Σ_{i=1}^n Var(X_i) = np(1 − p),

where the variance computation uses the fact that the X_i's are independent.
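The moments just derived can be confirmed directly from the pmf (8.4). A quick sanity check (not part of the notes), with an arbitrary choice of n and p:

```python
import math

n, p = 12, 0.3
# Binomial pmf (8.4): C(n, i) p^i (1-p)^(n-i) for i = 0, ..., n.
pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]

mean = sum(i * q for i, q in enumerate(pmf))
var = sum(i * i * q for i, q in enumerate(pmf)) - mean**2

assert abs(sum(pmf) - 1) < 1e-12          # probabilities sum to 1
assert abs(mean - n * p) < 1e-12          # E[X] = np
assert abs(var - n * p * (1 - p)) < 1e-12 # Var(X) = np(1 - p)
```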

It is worth noting for completeness that one can generalise the binomial distribution such that for our n independent trials, each trial now leads to a "success" for exactly one of k possible categories, where each category has a given success probability. Such a distribution is called the multinomial distribution and gives the probability of any particular combination of numbers of successes for the various categories. Note if k = 2 and n = 1, we obtain the Bernoulli distribution, while, if k = 2 and n ≥ 2, we obtain the binomial distribution.

Poisson Random Variables

A random variable X that takes on one of the values 0, 1, 2, . . . is said to be a Poisson random variable, named after the French mathematician Siméon Denis Poisson, with parameter λ, where λ > 0, if its probability mass function is

    p_i = P{X = i} = e^{−λ} λ^i / i!,    i = 0, 1, . . . ,

where the symbol e denotes the famous mathematical constant (with rough value 2.7183) defined by e = lim_{n→∞} (1 + 1/n)^n.
Poisson random variables have a wide array of applications. One reason for this is that such random variables may be used to approximate the distribution of the number of successes in a large number of trials (which are either independent or at most "weakly dependent") when each trial has a small probability of success. In order to see why this is the case, suppose that X is a binomial (n, p) random variable, i.e. that X represents the number of successes in n independent trials, where each has success probability equal to p, and let λ = np. Then (8.4) becomes

    P{X = i} = n!/(i!(n − i)!) p^i (1 − p)^{n−i}
             = n!/((n − i)! i!) (λ/n)^i (1 − λ/n)^{n−i}          (8.6)
             = [n(n − 1) · · · (n − i + 1)/n^i] (λ^i/i!) (1 − λ/n)^n / (1 − λ/n)^i.

Note that if n is large and p is small, then

    (1 − λ/n)^n ≈ e^{−λ},    n(n − 1) · · · (n − i + 1)/n^i ≈ 1,    (1 − λ/n)^i ≈ 1,

which means that (8.6) becomes

    P{X = i} ≈ e^{−λ} λ^i / i!.
Upon recalling that the mean and variance of a binomial random variable Y are

    E[Y] = np,   Var(Y) = np(1 − p) ≈ np for small p,

it is perhaps unsurprising, given the relationship between binomial and Poisson random variables, that for a Poisson random variable X with parameter λ,

    E[X] = Var(X) = λ.
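As a quick numerical illustration of this approximation (a Python sketch that is not part of the original notes; the helper names are ours), we can compare the binomial (n, p) probability mass function with the Poisson probability mass function with λ = np for large n and small p:

```python
from math import comb, exp, factorial

def binomial_pmf(i, n, p):
    # P{X = i} for a binomial (n, p) random variable, as in (8.4)
    return comb(n, i) * p**i * (1 - p)**(n - i)

def poisson_pmf(i, lam):
    # P{X = i} for a Poisson random variable with parameter lam
    return exp(-lam) * lam**i / factorial(i)

# Large n, small p: the two pmfs should be close, with lam = n * p.
n, p = 1000, 0.005
lam = n * p  # lam = 5
for i in range(4):
    print(i, round(binomial_pmf(i, n, p), 5), round(poisson_pmf(i, lam), 5))
```

With n = 1000 and p = 0.005, the two mass functions agree closely for every value of i.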

Geometric Random Variables

Consider independent trials, where each has success probability equal to p. If X represents the number of the first trial that is a success, then

    P{X = n} = p(1 − p)^{n−1},   n ≥ 1. (8.7)

Note (8.7) is obtained via independence and by observing that in order for the first success to occur on the n-th trial, the first n − 1 trials must all be failures (where each failure occurs with probability 1 − p) and the n-th trial a success.
A random variable with probability mass function (8.7) is called a geometric random
variable with parameter p. The mean is

    E[X] = Σ_{n=1}^{∞} n p (1 − p)^{n−1} = 1/p,

where the final equality follows from the algebraic identity Σ_{n=1}^{∞} n x^{n−1} = 1/(1 − x)² for 0 < x < 1. Further, in this case

    Var(X) = (1 − p)/p².
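Since a geometric random variable simply counts independent Bernoulli(p) trials until the first success, the formula E[X] = 1/p is easy to check by simulation. A Python sketch (not part of the original notes; names ours):

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility

def geometric(p):
    # Count independent Bernoulli(p) trials until the first success.
    n = 1
    while rng.random() >= p:
        n += 1
    return n

p = 0.25
samples = [geometric(p) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 1/p = 4
```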

The Negative Binomial Random Variable

If X denotes the number of trials needed to amass a total of r successes when each trial
has independent success probability p, then X is said to be a negative binomial random
variable (or a Pascal random variable) with parameters p and r. The probability mass
function of such a random variable is given by
    P{X = n} = C(n − 1, r − 1) p^r (1 − p)^{n−r},   n ≥ r, (8.8)

where C(a, b) = a!/(b!(a − b)!) denotes the binomial coefficient. Note that (8.8) is valid since, in order for it to take exactly n trials to amass r successes, the first n − 1 trials must result in exactly r − 1 successes, which occurs with probability C(n − 1, r − 1) p^{r−1} (1 − p)^{n−r}, and then the n-th trial must be a success.
If we denote by X_i, with i = 1, . . . , r, the number of trials needed after the (i − 1)-th success in order to obtain the i-th success, then each X_i is an independent geometric random variable with common parameter p. Since

    X = Σ_{i=1}^{r} X_i,

it follows that

    E[X] = E[Σ_{i=1}^{r} X_i] = Σ_{i=1}^{r} E[X_i] = r/p,

    Var(X) = Σ_{i=1}^{r} Var(X_i) = r(1 − p)/p².
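The representation X = Σ X_i also gives a direct way to simulate a negative binomial random variable: add up r independent geometric variables. A Python sketch (not from the notes; helper names ours):

```python
import random

rng = random.Random(2024)

def geometric(p):
    # Trials until the first success in independent Bernoulli(p) trials.
    n = 1
    while rng.random() >= p:
        n += 1
    return n

def negative_binomial(r, p):
    # Sum of r independent geometric(p) variables: trials to reach r successes.
    return sum(geometric(p) for _ in range(r))

r, p = 5, 0.3
samples = [negative_binomial(r, p) for _ in range(50_000)]
mean = sum(samples) / len(samples)
print(mean)  # close to r/p = 16.67
```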

8.9 Some Continuous Random Variables

There are certain types of random variables that frequently appear in applications. In
this section we survey some of the most widely used continuous ones.

Uniformly Distributed Random Variables

A random variable X is said to be uniformly distributed over the interval (a, b), a < b, if
its probability density function is
    f(x) = 1/(b − a) if a < x < b, and f(x) = 0 otherwise.

In particular, X is uniformly distributed over (a, b) if it places all of its mass on that
interval and it is equally likely to be “near” any point on that interval.
The mean of a uniform (a, b) random variable is

    E[X] = ∫_a^b x/(b − a) dx = (b² − a²)/(2(b − a)) = (b + a)/2.

It is worth emphasising that the expected value is, perhaps unsurprisingly, the midpoint of the interval (a, b). Further, upon noting that

    E[X²] = ∫_a^b x²/(b − a) dx = (b³ − a³)/(3(b − a)) = (a² + b² + ab)/3,

we deduce

    Var(X) = (a² + b² + ab)/3 − (a² + b² + 2ab)/4 = (b − a)²/12.

The distribution function of X is given, for a < x < b, by

    F(x) = P{X ≤ x} = ∫_a^x 1/(b − a) dy = (x − a)/(b − a).

Normal Random Variables


A random variable X is said to be normally distributed with mean µ and variance σ² if its probability density function is given by

    f(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)},   −∞ < x < ∞.

The normal (or Gaussian) density is a bell-shaped curve that is symmetric about µ, as shown in Figure 8.1.

Figure 8.1: This figure illustrates the normal density function.

The parameters µ and σ² equal the expectation and variance of the normal distribution, respectively. In other words, we have

    E[X] = µ   and   Var(X) = σ².

An important fact about normal random variables is that if X is normal with mean µ and variance σ², then for any constants a, b ∈ ℝ, it follows that aX + b is normally distributed (since it is a linear transformation of the normally distributed random variable X) with mean aµ + b and variance a²σ². It follows that if X is normal with mean µ and variance σ², then

    Z = (X − µ)/σ

is normal with mean 0 and variance 1. Such a random variable Z is said to have a standard (or unit) normal distribution. Let Φ denote the distribution function of a standard normal random variable. This function is given by

    Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−y²/2} dy,   −∞ < x < ∞.

It should be noted for completeness that no elementary antiderivative for e^{−y²/2} exists; however, it is possible to evaluate the definite Gaussian integral through methods from multivariate calculus.
The observation that Z = (X − µ)/σ has a standard normal distribution provided that X is normal with mean µ and variance σ² turns out to be very useful, since it enables us to evaluate all probabilities related to X in terms of Φ(x). To demonstrate, observe that the distribution function of X can be expressed as

    F(x) = P{X ≤ x}
         = P{(X − µ)/σ ≤ (x − µ)/σ}
         = P{Z ≤ (x − µ)/σ}
         = Φ((x − µ)/σ).
The value Φ(x) can be determined simply by either looking it up in a table or by writing a computer program to approximate it.

Given any a ∈ (0, 1), let z_a be such that a standard normal variable will exceed z_a with probability a, namely

    P{Z > z_a} = 1 − Φ(z_a) = a.

The value of z_a can be obtained simply using a table of values of Φ. For example, since Φ(2.33) ≈ 0.99, we see that z_{0.01} ≈ 2.33. In light of the symmetry of the standard normal density about zero, it follows that

    P{|Z| > z_a} = 2(1 − Φ(z_a)) = 2a,

where | · | denotes the absolute value.
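For completeness, Φ can be computed without tables via the error function available in most standard libraries, and z_a can then be recovered by inverting Φ numerically. A Python sketch (the function names are ours, not from the notes):

```python
from math import erf, sqrt

def Phi(x):
    # Standard normal cdf via the error function: Phi(x) = (1 + erf(x/sqrt(2))) / 2.
    return (1 + erf(x / sqrt(2))) / 2

def z(a, lo=-10.0, hi=10.0):
    # Find z_a with P{Z > z_a} = a, i.e. Phi(z_a) = 1 - a, by bisection.
    for _ in range(100):
        mid = (lo + hi) / 2
        if Phi(mid) < 1 - a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(Phi(2.33), 4))  # about 0.9901
print(round(z(0.01), 2))    # about 2.33
```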


The importance and wide applicability of normal random variables is partly due
to one of the most important theorems in probability theory, namely the central limit
theorem. This theorem asserts that the sum of a large number of independent random
variables has approximately a normal distribution. The simplest form of this rather
remarkable theorem is as follows.

Theorem (The Central Limit Theorem). Let X_1, X_2, . . . be a sequence of independent and identically distributed random variables having finite mean µ and finite variance σ². Then

    lim_{n→∞} P{(X_1 + · · · + X_n − nµ)/(σ√n) < x} = Φ(x).
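The theorem can be seen empirically: standardised sums of even quite non-normal variables, here uniforms, distribute themselves approximately according to Φ. A Python sketch (not part of the original notes):

```python
import random

rng = random.Random(0)

# Sum n uniform(0,1) variables (mean 1/2, variance 1/12) and standardise;
# by the CLT the standardised sums should look standard normal.
n, reps = 30, 20_000
mu, var = 0.5, 1 / 12
sums = [(sum(rng.random() for _ in range(n)) - n * mu) / (n * var) ** 0.5
        for _ in range(reps)]

# Fraction of standardised sums below 0 should be close to Phi(0) = 0.5,
# and below 1.96 close to Phi(1.96), roughly 0.975.
print(sum(s < 0 for s in sums) / reps)
print(sum(s < 1.96 for s in sums) / reps)
```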

Exponential Random Variables

A continuous random variable with probability density function

    f(x) = λe^{−λx},   0 < x < ∞,

for some λ > 0 is said to be an exponential random variable with parameter (or rate) λ. Its cumulative distribution is

    F(x) = ∫_0^x λe^{−λy} dy = 1 − e^{−λx},   0 < x < ∞.

The expected value and variance of an exponential random variable are

    E[X] = 1/λ   and   Var(X) = 1/λ².

The most important property of exponential random variables is that they possess
the “memoryless property”, which informally means that the probability distribution is
independent of its history. Being more precise, we say that the nonnegative random
variable X is memoryless if

P{X > s + t | X > s} = P{X > t} for all s, t 0. (8.9)

In order to understand why the above is called the memoryless property, imagine that X
represents the lifetime of some unit and consider the probability that a unit of age s will
survive an additional time t. This example demonstrates that (8.9) is simply a statement
that expresses that the remaining life of some unit with age s does not depend on s.
Another useful property of exponential random variables is that they remain exponential after multiplication by a positive constant. In order to show that this is indeed the case, suppose that X is an exponential random variable with parameter λ and let c be a positive number. Then

    P{cX ≤ x} = P{X ≤ x/c} = 1 − e^{−λx/c},

which shows that cX is an exponential random variable with parameter λ/c.

Let X_1, . . . , X_n be independent exponential random variables with rate parameters λ_1, . . . , λ_n respectively. A useful (and perhaps surprising) result is that min{X_1, . . . , X_n} is exponentially distributed with rate Σ_i λ_i. It is worth noting that max{X_1, . . . , X_n} is not in general exponential.
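This closure property of the minimum is easy to verify by simulation. The Python sketch below (not from the notes) generates exponentials using the standard inverse-transform recipe −log(1 − U)/λ and checks that the minimum has mean 1/Σ λ_i:

```python
import random
from math import log

rng = random.Random(7)

def exponential(lam):
    # Inverse transform: if U ~ uniform(0,1), then -log(1 - U)/lam is
    # exponential with rate lam.
    return -log(1 - rng.random()) / lam

rates = [1.0, 2.0, 3.0]
samples = [min(exponential(lam) for lam in rates) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # close to 1/(1 + 2 + 3), roughly 0.1667
```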

The Poisson Process and Gamma Random Variables

Suppose that “events” occur at random time points and denote by N(t) the number of events that occur in the time interval [0, t]. These events are said to constitute a Poisson process with rate λ, where λ > 0, if

(a) N(0) = 0,

(b) the numbers of events occurring in disjoint time intervals are independent,

(c) the distribution of the number of events that occur in a given interval depends only on the length of the interval (and not on its location),

(d) lim_{h→0} P{N(h) = 1}/h = λ, and

(e) lim_{h→0} P{N(h) ≥ 2}/h = 0.

In particular, Condition (a) states that the process begins at time 0. Condition (b), known as the independent increment assumption, tells us that the number of events that occur by time t, i.e. N(t), is independent of the number of events that occur between times t and t + s, i.e. N(t + s) − N(t). Condition (c), which is known as the stationary increment assumption, states that the probability distribution of N(t + s) − N(t) is the same for all values of t. Condition (d) states that in a small interval of length h, the probability of one event occurring is approximately λh, while Condition (e) tells us that the probability that two or more events occur in such an interval is approximately 0.

These assumptions imply that the number of events occurring in an interval of length t is a Poisson random variable with mean λt. In order to show this, one could consider the interval [0, t], break it into n nonoverlapping (disjoint) subintervals of length t/n and then consider the number of these which contain an event (see e.g. [21, pp. 29]).
For a Poisson process, let the time of the first event be denoted by X_1. Further, for n > 1, let X_n denote the time that elapsed between the (n − 1)-th and n-th events. The sequence {X_n : n = 1, 2, . . .} is called the sequence of interarrival times. The interarrival times X_i are independent and identically distributed exponential random variables with (common) parameter λ.

Let

    S_n = Σ_{i=1}^{n} X_i = X_1 + X_2 + · · · + X_n (8.10)

denote the time of the n-th event. Observe that S_n will be less than or equal to t if and only if there have been at least n events by time t, hence

    P{S_n ≤ t} = P{N(t) ≥ n}
               = P{N(t) = n} + P{N(t) = n + 1} + · · ·
               = Σ_{j=n}^{∞} e^{−λt} (λt)^j / j!,

where the final equality follows since the number of events occurring in an interval of length t is a Poisson random variable with mean λt. Because the left-hand side of the above equality is the cumulative distribution of S_n, upon differentiation we yield the density function for S_n, denoted here by f_n(t), which is

    f_n(t) = λe^{−λt} (λt)^{n−1} / (n − 1)!.
This inspires the following definition.
A random variable with probability density function

    f(t) = λe^{−λt} (λt)^{n−1} / (n − 1)!,   t > 0,

is called a gamma random variable with parameters (n, λ).

In particular, it follows that S_n, namely the time of the n-th event of a Poisson process with rate λ, is a gamma random variable with parameters (n, λ). Further, in light of (8.10) and since the interarrival times form a sequence of independent and identically distributed exponential random variables with parameter λ, we deduce the following.

Corollary. The sum of n independent exponential random variables, each of which has parameter λ, is a gamma random variable with parameters (n, λ).
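The corollary suggests a simple generator: summing n independent exponentials yields a gamma (n, λ) variable. A Python sketch (not from the notes; exponentials generated via the inverse-transform recipe −log(1 − U)/λ):

```python
import random
from math import log

rng = random.Random(11)

def gamma_rv(n, lam):
    # Sum of n independent exponential(lam) variables, each generated by
    # the inverse transform -log(1 - U)/lam.
    return sum(-log(1 - rng.random()) / lam for _ in range(n))

n, lam = 4, 2.0
samples = [gamma_rv(n, lam) for _ in range(100_000)]
mean = sum(samples) / len(samples)
print(mean)  # close to n/lam = 2.0
```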

The Nonhomogeneous Poisson Process

From a modelling viewpoint, a major weakness of the Poisson process is the rather strong stationary increment assumption, which tells us that events are just as likely to occur in all intervals of equal size. A generalisation of the (standard) Poisson process that relaxes this assumption leads to the nonhomogeneous (or nonstationary) Poisson process.
If “events” occur randomly in time and N(t) denotes the number of events that occur by time t, then we say that {N(t) : t ≥ 0} constitutes a nonhomogeneous Poisson process with rate (or intensity) function λ(t), where t ≥ 0, if

(a) N(0) = 0,

(b) the numbers of events that occur in disjoint time intervals are independent,

(c) lim_{h→0} P{exactly 1 event occurs between t and t + h}/h = λ(t), and

(d) lim_{h→0} P{2 or more events occur between t and t + h}/h = 0.

The function m(t) defined by

    m(t) = ∫_0^t λ(s) ds,   t ≥ 0,

is called the mean-value function. This function allows us to state that the number of events that occur between times t and t + s, namely N(t + s) − N(t), is a Poisson random variable with mean m(t + s) − m(t).
It should be noted that the intensity λ(t) at time t indicates how likely it is that an event will occur around time t. Further, if we set λ(t) = λ for all t, then the nonhomogeneous process simply reverts to the usual Poisson process. The following proposition gives a useful way to interpret a nonhomogeneous Poisson process.

Proposition. Suppose that events are occurring according to a Poisson process with rate λ and suppose that, independently of anything that came before, an event that occurs at time t is counted with probability p(t). Then the process of counted events constitutes a nonhomogeneous Poisson process with intensity function λ(t) = λ · p(t).
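This proposition underlies the so-called thinning method for simulating a nonhomogeneous Poisson process: run an ordinary Poisson process at a rate that dominates λ(t) and keep each event at time t with probability λ(t) divided by that rate. A Python sketch (not from the notes; the intensity function here is a hypothetical choice of ours):

```python
import random
from math import log, sin, pi

rng = random.Random(3)

LAM = 10.0  # an upper bound on the intensity over [0, T]

def intensity(t):
    # A hypothetical intensity with intensity(t) <= LAM on [0, 2].
    return 5.0 * (1 + sin(pi * t))

def nonhomogeneous_poisson(T):
    # Thinning: simulate a rate-LAM Poisson process and keep an event at
    # time t with probability intensity(t) / LAM.
    times, t = [], 0.0
    while True:
        t += -log(1 - rng.random()) / LAM  # exponential interarrival time
        if t > T:
            return times
        if rng.random() < intensity(t) / LAM:
            times.append(t)

events = [len(nonhomogeneous_poisson(2.0)) for _ in range(20_000)]
print(sum(events) / len(events))  # close to m(2), the integral of the intensity, = 10
```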

8.10 Exercises for Self-Study


1. The random variable X takes on one of the values {1, 2, 3, 4} with probabilities

P{X = i} = ic, i = 1, 2, 3, 4

for some value c. Find P{2 ≤ X ≤ 3}.

2. Let A and B be events. The difference B − A is defined to be the set of elements of B that are not in A. Using the axioms of probability, show that if A ⊆ B, then

       P(B − A) = P(B) − P(A).

3. Suppose that X has probability density function

       f(x) = ce^{−x},   0 < x < ∞.

   Determine Var(X).
8.10. Exercises for Self-Study 157

4. The continuous random variable X has probability density function


       f(x) = k(4 − x²) for 0 ≤ x ≤ 2, and f(x) = 0 otherwise,

where k is a constant. Find the values of k, E[X ] and Var(X ) .

5. Show that for all real constants a and b, we have Var(aX + b) = a² · Var(X).

6. Suppose X is a binomial random variable with parameters (n, p). Show that the probability P{X = i} first increases and then decreases, reaching its maximum value when i is the largest integer less than or equal to (n + 1)p.

7. Show that if X and Y are independent binomial random variables with respective parameters (n, p) and (m, p), then X + Y is binomial with parameters (n + m, p).

8. Show that if X is a Poisson random variable with parameter λ, then

       E[X] = Var(X) = λ.

9. For a normal random variable X with parameters µ and σ², show that

   a) E[X] = µ, and

   b) Var(X) = σ².

10. Show that a linear combination of a countable number of independent normal


random variables has a normal distribution.

11. Consider a Poisson process in which events occur at a rate of 0.3 per hour. What
is the probability that no events occur between 10 AM and 2 PM?

Chapter 9

Random Numbers

One of the most fundamental building blocks of a simulation study is the ability to gener-
ate random numbers. People who think about the topic of random number generation
frequently get into philosophical discussions about what the word “random” actually
means. In some sense, there is no such thing as a random number since, for example,
would you say that 11 is a random number? In light of this, we instead will talk about a
sequence of independent random numbers with a specified distribution. Informally, this
means that each number is obtained by chance, having no relation to the other numbers
in the sequence and that each number has a specified probability of falling in any given
range of values.
The construction of a random number generator may initially appear to be the kind
of thing that any good programmer can do easily. Despite this, it turns out that gen-
erating truly random numbers is not such a simple task. Historically, some options for
generating random numbers in scientific work include:

• rolling dice,

• coin flipping,

• drawing shuffled cards,

• drawing balls from a “well-stirred urn”,

• drawing yarrow stalks,

• selecting random digits from census reports,

• measuring atmospheric noise, and

• measuring thermal noise.

Because of the rather mechanical nature of some of the aforementioned techniques, it


should not be surprising that generating large quantities of sufficiently random numbers

via these approaches requires a great deal of both time and effort. Hence, in this section
we will discuss how such numbers are computationally generated and illustrate a small
number of their uses. For our purpose, we say a random number represents the value
of a random variable that is uniformly distributed on (0, 1).

9.1 Pseudorandom Number Generation


In contrast to the previously mentioned manual or mechanical approaches to random
number generation, the modern approach is to make use of a computer to successively
generate pseudorandom numbers. These pseudorandom numbers constitute a sequence
of values, which, although they are deterministically generated (i.e. that the same input
will always yield the same sequence as the output), they have the appearance of being
independent uniform (0, 1) random variables.
Von Neumann, who developed an early approach for generating pseudorandom
numbers in the 1940s, famously stated in 1951 [22] that “anyone who considers arith-
metical methods of producing random digits is, of course, in a state of sin”. This quote
highlights that a sequence of numbers generated using the following techniques is not
truly random, but instead it simply appears to be random. It turns out that random num-
bers generated deterministically have worked well in nearly all scientific applications,
provided that a suitable method has been carefully selected.
One of the most common approaches to generating pseudorandom numbers was
introduced by Lehmer in 1949 [19]. The method starts with some initial value x 0 ,
known as the seed, and then recursively computes successive values x_n, for n ≥ 1, using

    x_n = a x_{n−1} mod m, (9.1)

where a and m are given positive integers and where the above means that x_n takes the value of the remainder of a x_{n−1} upon division by m. Note that for all values of n, each x_n is one of the integers {0, 1, . . . , m − 1}, and the quantity x_n/m, the pseudorandom number, is taken as an approximation to the value of a uniform (0, 1) random variable.

Example. (Multiplicative congruential method)



The approach specified by (9.1) in order to generate random numbers is known


as the multiplicative congruential method. Because each x_n assumes one of the values {0, 1, . . . , m − 1}, it follows that after some finite number (which is at most m) of generated values, a value must repeat and, in consequence, the whole sequence will then begin to repeat. For this reason, we want to choose the constants a and m such that, given any initial seed x_0, the number of values that can be generated before repetition occurs is large.
It turns out, the constants a and m in general should be chosen to satisfy:

1.

2.

3.

As a guideline, it turns out that m should be chosen to be a large prime number that can be fitted to the computer word size. For a 32-bit word machine (where the first bit is a sign bit), it has been shown that m = 2³¹ − 1 and a = 7⁵ = 16,807 result in desirable properties. For a 36-bit word machine, the choices m = 2³⁵ − 31 and a = 5⁵ = 3,125 appear to work well.
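A minimal implementation of the multiplicative congruential method with these constants might look as follows (a Python sketch; not part of the original notes):

```python
def lehmer(seed, a=16807, m=2**31 - 1):
    # Multiplicative congruential generator (9.1): x_n = a * x_{n-1} mod m.
    # Yields the pseudorandom numbers x_n / m, each in (0, 1).
    x = seed
    while True:
        x = (a * x) % m
        yield x / m

gen = lehmer(seed=12345)
print([round(next(gen), 6) for _ in range(5)])
```

Note that, being deterministic, the same seed always reproduces the same sequence, which is often useful for debugging simulations.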
Another generator of pseudorandom numbers uses recursions of the type

    x_n = (a x_{n−1} + c) mod m,

where c is a nonnegative integer. These types of generators are called mixed congruential generators since they involve both an additive and a multiplicative term. It is worth noting that if c = 0, then we recover the multiplicative congruential generator (9.1). When making use of mixed congruential generators, one may choose m equal to the computer's word length, as this makes computing the division of a x_{n−1} + c by m quite efficient.
As our starting point for computer simulation, we suppose that we can generate a
sequence of pseudorandom numbers that can be taken as an approximation to the values
of a sequence of independent uniform (0, 1) random variables.

9.2 Using Random Numbers to Evaluate Integrals


One of the earliest applications of random numbers was in the computation of integrals. Let g(x) be a function and suppose we want to compute the definite integral

    θ = ∫_0^1 g(x) dx.

In order to compute the value of θ, observe that if U is a random variable uniformly distributed over (0, 1), then we can express θ equivalently as

    θ = ∫_0^1 g(x) dx = ∫_0^1 g(x) · 1 dx = ∫_{−∞}^{∞} g(x) f(x) dx = E[g(U)],

where f(x) is used to denote the probability density function for a uniform (0, 1) random variable and where the final equality follows by the proposition from Section 8.5 (entitled Expectation).
If U_1, . . . , U_k are independent uniform (0, 1) random variables, it follows that the corresponding random variables g(U_1), . . . , g(U_k) are independent and identically distributed random variables with mean θ. Further, it follows by the strong law of large numbers, with probability 1, that

    (1/k) Σ_{i=1}^{k} g(U_i) → E[g(U)] = θ   as k → ∞.

In particular, this means that we can approximate θ, i.e. the definite integral, by generating a large number of random numbers u_i and taking as our approximation the average value of g(u_i). This approach for approximating integrals is called the Monte Carlo approach.
It is worth noting that in previous calculus courses you will have seen that the difficulty of evaluating θ using standard techniques from calculus depends significantly on the given integrand g(x). For example, if the integrand were a “simple” polynomial, then calculating θ would be a relatively straightforward task. In contrast, from the standard normal distribution, it is known that no elementary antiderivative for ∫ e^{−x²} dx exists and, in such a case, one would need to make use of other techniques (such as the approach outlined above) in order to evaluate θ.
Note that in the above, the limits of integration were 0 and 1, respectively. If we instead wanted to compute

    θ = ∫_a^b g(x) dx

then, using the substitution y = (x − a)/(b − a), and hence dy = dx/(b − a), we obtain

    θ = ∫_0^1 g(a + (b − a)y)(b − a) dy = ∫_0^1 h(y) dy,

where h(y) = (b − a) g(a + (b − a)y). This means that we again approximate θ by generating random numbers and then taking the average value of h evaluated at these random numbers.
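The recipe above translates directly into code: average (b − a)g(a + (b − a)U) over many uniform random numbers. A Python sketch (function names ours, not from the notes):

```python
import random
from math import exp

rng = random.Random(42)

def monte_carlo(g, a, b, k=100_000):
    # Estimate the integral of g over (a, b) by averaging
    # h(y) = (b - a) * g(a + (b - a) * y) at k uniform (0,1) numbers.
    total = 0.0
    for _ in range(k):
        y = rng.random()
        total += (b - a) * g(a + (b - a) * y)
    return total / k

# Example: the integral of e^x over (0, 2) is e^2 - 1, roughly 6.389.
print(monte_carlo(exp, 0, 2))
```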

In a similar fashion, if we wanted to evaluate

    θ = ∫_0^∞ g(x) dx,

we could use the substitution y = 1/(x + 1), and hence dy = −dx/(x + 1)² = −y² dx, in order to obtain

    θ = ∫_0^1 h(y) dy,

where

    h(y) = g(1/y − 1)/y².
The advantage of using random numbers to approximate integrals becomes clearer in the case of multidimensional integrals. Suppose now that g is a function with an n-dimensional argument and that we want to compute

    θ = ∫_0^1 ∫_0^1 · · · ∫_0^1 g(x_1, x_2, . . . , x_n) dx_1 dx_2 · · · dx_n.

The key to estimating θ via the Monte Carlo approach is that we can similarly express θ as

    θ = E[g(U_1, . . . , U_n)],

where U_1, . . . , U_n are independent uniform (0, 1) random variables. Hence, if we generate k independent sets, each consisting of n independent uniform (0, 1) random variables,

    U_1^(1), . . . , U_n^(1)
    U_1^(2), . . . , U_n^(2)
    ⋮
    U_1^(k), . . . , U_n^(k),

then, because the random variables g(U_1^(i), . . . , U_n^(i)), where i = 1, . . . , k, are all independent and identically distributed random variables with mean θ, we can estimate θ using

    (1/k) Σ_{i=1}^{k} g(U_1^(i), . . . , U_n^(i)).

Example. (Estimating π)
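The worked example is completed in lectures; a common version estimates π by noting that a uniform point on the square (−1, 1) × (−1, 1) lands in the inscribed unit circle with probability π/4. A hedged Python sketch (not part of the original notes):

```python
import random

rng = random.Random(0)

# If (X, Y) is uniform on the square (-1, 1) x (-1, 1), then
# P{X^2 + Y^2 <= 1} = (area of unit circle) / (area of square) = pi / 4.
k = 1_000_000
inside = 0
for _ in range(k):
    x, y = 2 * rng.random() - 1, 2 * rng.random() - 1
    if x * x + y * y <= 1:
        inside += 1
print(4 * inside / k)  # close to pi
```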

9.3 Exercises for Self-Study


1. Suppose x_0 = 5 and x_n ≡ 3x_{n−1} (mod 150). Find x_1, x_2, . . . , x_10.

2. Suppose x_0 = 3 and x_n ≡ 5x_{n−1} + 7 (mod 200). Find x_1, x_2, . . . , x_10.


3. Use simulation to approximate ∫_0^1 e^{e^x} dx.

4. Use simulation to approximate ∫_0^1 (1 − x²)^{3/2} dx.

5. Use simulation to approximate ∫_{−∞}^{∞} e^{−x²} dx.

6. Use simulation to approximate ∫_0^1 ∫_0^1 e^{(x+y)²} dy dx.

7. Let U be a uniform random variable on (0, 1). Use simulation to approximate the correlation Corr(U, 1 − U).

8. Let U be a uniform random variable on (0, 1). Use simulation to approximate the correlation Corr(U, √(1 − U²)).

9. Let U be a uniform random variable on (0, 1). Use simulation to approximate the correlation Corr(U², √(1 − U²)).

Chapter 10

Generating Discrete Random Variables

Recall that a random variable that can take either a finite or at most a countable number
of possible values is said to be discrete. For a discrete random variable X , its probability
mass function p(x) is p(x) = P{X = x}, where the right-hand side represents the prob-
ability that X is exactly equal to x. Within this chapter, we introduce several approaches
to generating discrete random variables.

10.1 The Inverse Transform Method

Suppose we want to generate the value of some discrete random variable X with probability mass function

    P{X = x_j} = p_j,   j ∈ {0, 1, . . .},   where Σ_j p_j = 1. (10.1)

To demonstrate how we can generate the value of such a random variable, let us
consider firstly the following example.

Example. (The discrete inverse transform method)

Hence, in order to generate a discrete random variable X with probability mass


function (10.1), one approach is to first generate a random number U, where

U is uniformly distributed over (0, 1), and then set

    X = x_0   if U < p_0,
    X = x_1   if p_0 ≤ U < p_0 + p_1,
    ⋮
    X = x_j   if Σ_{i=0}^{j−1} p_i ≤ U < Σ_{i=0}^{j} p_i,
    ⋮

Since, for 0 < a < b < 1, we have

    P{a ≤ U < b} = P{U < b} − P{U ≤ a} = b − a,

it follows that

    P{X = x_j} = P{Σ_{i=0}^{j−1} p_i ≤ U < Σ_{i=0}^{j} p_i} = p_j

holds, which implies that X has the desired distribution described by (10.1).

Remarks.

1. the preceding approach can be written algorithmically, namely:

Generate a random number U, then

if U < p0 , set X = x 0 and stop,

if U < p_0 + p_1, set X = x_1 and stop,

if U < p_0 + p_1 + p_2, set X = x_2 and stop,

⋮

2. if the x_i's are ordered such that x_0 < x_1 < x_2 < · · · and if we let F(·) denote the cumulative distribution function for X, then

       F(x_k) = P{X ≤ x_k} = P{X = x_0} + P{X = x_1} + · · · + P{X = x_k} = Σ_{i=0}^{k} p_i

   and therefore the random variable X will equal x_j if F(x_{j−1}) ≤ U < F(x_j) holds.

In other words, after we generate a random number U, we can determine the value of some discrete random variable X by finding the half-open interval [F(x_{j−1}), F(x_j)) that contains U. We could equivalently determine X by finding F⁻¹(U), namely the value of the inverse of F at U. It is for this reason that this approach is known as the discrete inverse transform method for generating X.
The first remark demonstrates that the amount of time it takes to generate a discrete random variable using this approach is proportional to the number of intervals one must search. In light of this, it is sometimes worthwhile to order the x_i's so that the corresponding probabilities p_i appear in decreasing order.
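The method of Remark 1 can be written compactly in code: accumulate the p_i's until the running total exceeds U. A Python sketch (not part of the original notes; names ours):

```python
import random

rng = random.Random(5)

def discrete_inverse_transform(values, probs):
    # Generate X with P{X = values[j]} = probs[j] by locating the
    # half-open interval [F(x_{j-1}), F(x_j)) that contains U.
    u = rng.random()
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p
        if u < cumulative:
            return x
    return values[-1]  # guard against floating-point round-off

values, probs = [1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4]
samples = [discrete_inverse_transform(values, probs) for _ in range(100_000)]
print(samples.count(4) / len(samples))  # close to 0.4
```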

Example. (The discrete inverse transform method with ordering)

Example. (Generating permutations)

It should be noted that the ability to generate a random subset is particularly useful
when conducting medical trials. For example, suppose that a medical centre wishes to
test a new drug that is designed to reduce the user’s symptoms of long COVID after ex-
posure to the coronavirus infection. In order to test the effectiveness, suppose that the
medical centre has recruited 2000 volunteers to be subjects in the test. In order to take
account of the possibility that the subjects’ response to the infection could be impacted
by factors that are external to the test (such as a change in behaviour or weather condi-
tions), it has been decided to split the volunteers into two groups of size 1000, namely
a treatment group that are given the drug and a control group that will instead be given
a placebo. Further, neither the volunteers nor the administrators of the drug will be told who is in each group during the trial (and for this reason the approach is known as a double-blind trial).

It now remains to decide which of the 2000 volunteers should be chosen for the treatment group. Clearly, we would want the treatment and control groups to be as similar as possible in all respects, with the exception that the members of one group receive the drug while those in the other receive a placebo. If this is achieved, then it would indeed be possible to conclude that any difference in response is due to the drug. There is general agreement that the most effective approach to accomplish this is to choose the 1000 volunteers for the treatment group completely at random. In particular, the choice should be made such that each of the C(2000, 1000) subsets of 1000 volunteers is equally likely to constitute the treatment group.

Example. (Approximating)

Example. (Generating geometric random variables)

Example. (Generating Bernoulli random variables)

10.2 Generating a Poisson Random Variable


Recall that a random variable X that takes on one of the values 0, 1, 2, . . . is said to be a Poisson random variable with mean λ, where λ > 0, if its probability mass function is

    p_i = P{X = i} = e^{−λ} λ^i / i!,   i = 0, 1, . . . ,

where e denotes the famous constant defined by e = lim_{n→∞} (1 + 1/n)^n.

The key to using the inverse transform method to generate such a Poisson random variable is to make use of the identity

    p_{i+1} = λ p_i/(i + 1),   i ≥ 0, (10.2)

which follows upon rearranging

    p_{i+1}/p_i = (e^{−λ} λ^{i+1}/(i + 1)!)/(e^{−λ} λ^i/i!) = λ/(i + 1).

Upon using the above recursion (10.2) to compute the Poisson probabilities as they become needed, the inverse transform algorithm to generate a Poisson random variable with mean λ can be expressed as follows. Here we use i ∈ {0, 1, 2, . . .} to denote the value currently under consideration, p = p_i is the probability that X equals i and F = F(i) is the probability that X is less than or equal to i. The algorithm is:

Step 1: Generate a random number U.

Step 2: Let i = 0, p = e^{−λ} and F = p.

Step 3: If U < F, set X = i and stop.

Step 4: Set p = λp/(i + 1), F = F + p and i = i + 1.

Step 5: Go to Step 3.

It should be emphasised that in the above when we write, for example, i = i + 1, we do not mean that i is equal to i + 1, but rather that the value of i should be increased by 1. Further, in order to see why the above does indeed generate a Poisson random variable with mean λ (which, recall, takes on one of the values 0, 1, 2, . . .), observe that we first generate the random number U and then check if U < e^{−λ} = p_0. If this is indeed the case, we set X = 0. If not, in Step 4 we compute p_1 using (10.2). Then, we check if U < p_0 + p_1, where the right-hand side is the updated value of F. If this is the case, we now set X = 1. If not, the process continues, computing the values 2, 3, . . . for as long as necessary.

The algorithm outlined above checks first whether the Poisson value is 0, then whether it is 1, then 2, and so on. Observe that the number of comparisons needed will be one more than the value generated for the Poisson. If, for example, we generated the value X = 0, then we would have compared U with F (in Step 3) once, while, if instead we generated the value X = 1, then we would have completed the comparison (from Step 3) twice. In consequence, the algorithm that has been outlined will require on average 1 + λ searches. In particular, this is fine when λ is small; however, this approach can be greatly improved upon when λ is large.
Because a Poisson random variable is most likely to take on one of the two integer values closest to λ, a more efficient approach is first to check one of these values rather than starting at 0. For example, let I = ⌊λ⌋, namely the largest integer that is less than or equal to λ, and use (10.2) to recursively determine F(I). In order to generate a Poisson random variable X with mean λ, we first generate a random number U and see whether or not X ≤ I holds by checking if U ≤ F(I) holds. We then search downwards from I if X ≤ I holds and search upwards starting from I + 1 otherwise. The number of searches needed by this algorithm is approximately one more than the absolute difference between X and its mean, which is around 1 + 0.798√λ on average.

10.3 Generating Binomial Random Variables


Suppose that n independent trials are to be performed, where with probability p the re-
sult of each trial is a “success”. Recall that if X represents the number of successes which
occur within the n trials, then X is called a binomial random variable with parameters
(n, p) and its probability mass function is
p_i ≡ P{X = i} = C(n, i) p^i (1 − p)^{n−i},   i = 0, 1, . . . , n

where

C(n, i) = n! / (i! (n − i)!)
is the binomial coefficient.
In order to use the inverse transform method to generate such a binomial random
variable, we similarly make use of the recursive identity

P{X = i + 1} = ((n − i)/(i + 1)) · (p/(1 − p)) · P{X = i},

which follows since


p_{i+1} = [n! / ((n − i − 1)! (i + 1)!)] p^{i+1} (1 − p)^{n−i−1}
        = [n! / ((n − i)! i!)] · ((n − i)/(i + 1)) · p^i (1 − p)^{n−i} · (p/(1 − p))
        = ((n − i)/(i + 1)) · (p/(1 − p)) · p_i

holds. Let i denote the value currently under consideration, pr = P{X = i} be the
probability that X is equal to i and F = F(i) be the probability that X is less than or
equal to i. The inverse transform algorithm for generating a binomial random variable is:

Step 1: Generate a random number U.

Step 2: Let c = p/(1 − p), i = 0, pr = (1 − p)^n and F = pr.

Step 3: If U < F , set X = i and stop.

Step 4: Set pr = pr · c(n − i)/(i + 1), F = F + pr and i = i + 1.

Step 5: Go to Step 3.

It should be noted that the algorithm outlined checks firstly if X = 0, then whether
it is 1, then 2, and so on. The number of searches needed will similarly be one more
than the generated value of X and hence it will take 1 + np searches to generate X
on average. Observe that because a binomial (n, p) random variable represents the
number of successes that occur within n independent trials, where each trial has success
probability p, it follows that we can also generate this random variable by subtracting
from n the value of a binomial (n, 1 − p) random variable. This follows since each trial
can be either a success (with probability p) or a failure (with probability 1 − p). For
this reason, if p > 1/2, then a more efficient approach would be to use the outlined
approach to generate a binomial (n, 1 − p) random variable before subtracting its value
from n to obtain the desired value.
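A sketch of the algorithm in Python, including the p > 1/2 refinement just described (the naming is ours):

```python
import random

def binomial_inverse_transform(n, p):
    """Inverse transform method for a binomial (n, p) random variable."""
    if p > 0.5:
        # Generate a binomial (n, 1 - p) value and subtract it from n.
        return n - binomial_inverse_transform(n, 1.0 - p)
    U = random.random()                  # Step 1
    c = p / (1.0 - p)                    # Step 2
    i = 0
    pr = (1.0 - p) ** n                  # pr = P{X = 0}
    F = pr
    while U >= F:                        # Step 3 fails, so move on
        pr = pr * c * (n - i) / (i + 1)  # Step 4: the recursive identity
        F = F + pr
        i = i + 1
    return i
```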

Remarks.

1. Recall that a binomial (n, p) random variable X can be interpreted as the number
of successes in n independent Bernoulli trials, where each Bernoulli trial has suc-
cess probability p. In light of this, another way to simulate X is to instead generate
the outcomes of these n Bernoulli trials.

2. Similarly to the Poisson case, when the mean np is large it will be more efficient to
determine if the generated value is less than or equal to (or greater than) I = ⌊np⌋.
In the first case, one should start the search with I and then successively search
downwards, while, in the second case, start from I + 1 and move upwards.

10.4 The Acceptance-Rejection Technique


Suppose that we have an efficient method for simulating a random variable with probability
mass function {q_j : j ≥ 0}, where q_j denotes the probability that the discrete
random variable is exactly equal to j. It is possible to use this as the basis for simulating
from a distribution with probability mass function {p_j : j ≥ 0} by firstly simulating a
random variable Y whose probability mass function is {q_j} before then accepting this
simulated value with a probability that is proportional to p_Y /q_Y.
Being a little more precise, let c be a strictly positive constant such that

p_j / q_j ≤ c   for all j such that p_j > 0

holds, i.e. that p_j ≤ c · q_j for all j such that p_j > 0. The following technique, which is
called the acceptance-rejection method or the rejection method, allows us to generate a
discrete random variable X with probability mass function p_j = P{X = j} for each j. In
particular, the algorithm is:

Step 1: Simulate the value of Y, with probability mass function {q_j : j ≥ 0}.

Step 2: Generate a random number U.

Step 3: If U < p_Y /(c · q_Y), set X = Y and stop.

Step 4: Go to Step 1.

Informally, this algorithm simulates a random variable X with probability mass function
p_j by instead generating another random variable whose mass function is q_j, where
the mass function q_j is "close" to p_j, in the sense that the ratio p_j /q_j is bounded by a
constant value. In practice, we would like this constant to be as close to 1 as possible, i.e.
for the two mass functions q_j and p_j to be as similar as possible.
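The method can be sketched generically in Python; the target mass function p and the uniform proposal q below are a hypothetical illustration of ours, not an example from the notes:

```python
import random

def discrete_rejection(p, q, sample_q, c):
    """Generate X with pmf p, given a sampler for pmf q and a constant c
    such that p[j] <= c * q[j] whenever p[j] > 0."""
    while True:
        Y = sample_q()                   # Step 1: simulate Y with pmf q
        U = random.random()              # Step 2
        if U < p[Y] / (c * q[Y]):        # Step 3: accept
            return Y
        # Step 4: otherwise return to Step 1

# Hypothetical target pmf on {0, ..., 9}, with a uniform proposal.
p = [0.11, 0.12, 0.09, 0.08, 0.12, 0.10, 0.09, 0.09, 0.10, 0.10]
q = [0.1] * 10
c = max(pj / qj for pj, qj in zip(p, q))    # here c = 1.2
X = discrete_rejection(p, q, lambda: random.randrange(10), c)
```

Since c = 1.2 here, on average 1.2 proposals are needed per accepted value, in line with the theorem below.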
The power of the rejection method, an early version of which was proposed by von
Neumann, will become even clearer when we consider its analogue for generating
continuous random variables. We now show that the rejection method works.

Theorem. The acceptance-rejection method generates a discrete random variable X such
that P{X = j} = p_j, with j = 0, 1, . . ., and the number of iterations needed is a geometric
random variable with mean c.

Proof.

Example. (The discrete acceptance-rejection method)

10.5 Exercises for Self-Study


1. Write a computer program to generate n values from the probability mass function
p1 = 1/3 and p2 = 2/3.

a) Let n = 100 and run your computer program to determine the proportion of
values that are equal to 1.

b) Repeat the above with n = 1000.

c) Repeat the above with n = 10, 000.

2. Write a computer program that, when given a probability mass function {p_j : j =
1, 2, . . . , n} as an input, gives as an output the value of a random variable having
this mass function.

3. Give an efficient algorithm to simulate the value of a random variable X such that

P{X = 1} = 0.3, P{X = 2} = 0.2, P{X = 3} = 0.35, P{X = 4} = 0.15.

4. A deck of 100 cards that are numbered 1, 2, . . . , 100 is shuffled and then turned
over one card at a time. We say that a “match” occurs whenever card i is the i-th
card to be turned over, where i = 1, 2, . . . , 100. Write a simulation program to
estimate the expectation and variance of the total number of matches. Run your
computer program to find estimates for the desired values and then compare these
with exact answers.

5. A pair of fair dice are continually rolled until all possible outcomes 2, 3, . . . , 12
have occurred at least once. Develop a simulation study to estimate the expected
number of dice rolls that are needed.

Chapter 11

Generating Continuous Random Variables

Recall that a random variable X is continuous if there is a nonnegative function f (x)


defined for all real x with the property that for any set C of real numbers,
P{X ∈ C} = ∫_C f(x) dx.

Further, the relationship between the cumulative distribution function F(·) and its probability
density function f(·) is expressed by

F(a) = P{X ∈ (−∞, a]} = ∫_{−∞}^{a} f(x) dx.

Within this chapter, we discuss several approaches to generating continuous random


variables. It turns out that each of the techniques that we saw in the previous chapter for
generating a discrete random variable has an analogue in the continuous case.

11.1 The Inverse Transform Method

Consider a continuous random variable with cumulative distribution function F (·). The
inverse transformation method provides a general method for generating continuous ran-
dom variables and is based on the following proposition.

Proposition. Let U be a uniform (0, 1) random variable. Given any continuous cumulative
distribution function F(·), the random variable X defined by

X = F^{−1}(U)

has distribution F, where F^{−1}(u) takes the value x such that F(x) = u holds.

It is worth emphasising that this proposition demonstrates that we can generate a
continuous random variable X with (continuous) cumulative distribution function F(·)
by firstly generating a random number U and then by setting X = F^{−1}(U). Let us prove
that this proposition holds.

Proof.

Observe that F(·) is a cumulative distribution function and hence it follows that it is a
monotonically increasing (i.e. a nondecreasing) continuous function. Further, it follows
by the continuous inverse theorem (see e.g. [3, Section 5.6]) that if F(·) is a strictly
increasing continuous function, then F^{−1}(·) exists and is a strictly increasing continuous
function. Despite this, the existence of an inverse function does not necessarily mean
that one can obtain an explicit closed formula for the inverse. For example, the cumulative
distribution function for a normal distribution is continuous and strictly increasing,
however, to invert it we would naturally look to integrate the corresponding
probability density function, and recall (from Chapter 8) that this function does
not have an elementary antiderivative. It turns out that the inverse transform method
can be used in practice provided that we are able to find an explicit formula for the
inverse function F^{−1}(·) in closed-form.
In order to demonstrate how we can generate continuous random variables using
this method, let us consider several examples.

Example. (Generating a random variable with a “polynomial” distribution function)

Example. (Generating an exponential random variable)
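The exponential example is left blank above (it is completed in lectures), but the construction it relies on is used repeatedly below: if F(x) = 1 − e^{−λx} then F^{−1}(u) = −(1/λ) log(1 − u), and since 1 − U is itself uniform on (0, 1) one may take X = −(1/λ) log(U). A minimal Python sketch (the naming is ours):

```python
import math
import random

def exponential_inverse_transform(lam):
    """Inverse transform method for an exponential random variable with rate lam."""
    U = 1.0 - random.random()    # lies in (0, 1], so log(U) is always defined
    return -math.log(U) / lam
```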



Remark. The example outlined provides us with an additional algorithm for generating
a Poisson random variable. Recall that a Poisson process with rate λ results when the
times between successive events are independent exponential random variables with
rate λ. For such a process, the number of events that occur by time 1, denoted by N(1),
is Poisson distributed with mean λ. Further, we can alternatively express the number
of events by time 1 in terms of the successive interarrival times of these events. In
particular, if we let X_i with i ∈ {1, 2, . . .} denote the successive interarrival times, then the
n-th event will occur at time Σ_{i=1}^{n} X_i. Therefore, N(1) can be expressed as

N(1) = max{n : Σ_{i=1}^{n} X_i ≤ 1},

i.e. the number of events that occur by time 1 is equal to the maximal n such that
the n-th event occurs by time 1. Upon using the techniques from the example, we
can generate a Poisson random variable with mean λ, denoted by N = N(1), by firstly
generating random numbers U_1, U_2, . . . , U_n and then by setting

N = max{n : Σ_{i=1}^{n} −(1/λ) log(U_i) ≤ 1}
  = max{n : Σ_{i=1}^{n} log(U_i) ≥ −λ}
  = max{n : log(U_1 · · · U_n) ≥ −λ}
  = max{n : U_1 · · · U_n ≥ e^{−λ}}.

In particular, this shows one can generate a Poisson variable N with mean λ by successively
generating random numbers until their product falls below e^{−λ} and then setting N
equal to one less than the number of random numbers required, i.e.

N = min{n : U_1 · · · U_n < e^{−λ}} − 1.
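This product-based scheme can be sketched as follows (Python; the naming is ours):

```python
import math
import random

def poisson_by_products(lam):
    """Generate a Poisson(lam) value as one less than the number of random
    numbers needed for their running product to first fall below e^{-lam}."""
    threshold = math.exp(-lam)
    n, prod = 0, 1.0
    while prod >= threshold:
        prod = prod * random.random()
        n = n + 1
    return n - 1
```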

Recall (from Chapter 8) that the sum of n independent exponential random variables,
each with parameter λ, is a gamma random variable with parameters (n, λ). It
follows that the example above allows us to further generate a gamma (n, λ) random
variable efficiently. The following example demonstrates how we do this.

Example. (Generating a gamma random variable)

This example provides an efficient way of generating a set of exponential random
variables by firstly generating their sum and then making use of this to generate the
individual values conditional on the sum. In particular, it is possible to generate a pair
of independent and identically distributed exponentials with mean 1, say X and Y, by
firstly generating X + Y and then, given that X + Y = t, using the fact that the conditional
distribution of X is here uniform on (0, t). The following algorithm can be used
to generate a pair of exponential random variables with mean 1:

Step 1: Generate random numbers U1 and U2 .

Step 2: Set t = −log(U_1 U_2).

Step 3: Generate a random number U3 .

Step 4: Set X = t · U_3 and Y = t − X.

It is worth noting this algorithm saves a logarithmic computation, at the cost of two
multiplications and the generation of an extra random number, when compared with the more
direct approach of generating two random numbers U_1 and U_2 and setting X = −log(U_1)
and Y = −log(U_2). Similarly, in order to generate k independent exponential random
variables with mean 1 we can generate firstly their sum, say t = −log(U_1 · · · U_k), and
then generate k − 1 additional random numbers U'_1, . . . , U'_{k−1} which are then ordered.
If U_(1) < · · · < U_(k−1) denote the corresponding ordered values, then the k
exponentials are

t [U_(i) − U_(i−1)],   i = 1, 2, . . . , k,   where U_(0) = 0 and U_(k) = 1.
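A sketch of this construction for k unit-mean exponentials (Python; the naming is ours):

```python
import math
import random

def k_unit_exponentials(k):
    """Generate k i.i.d. exponential mean-1 values by first generating their
    sum t = -log(U_1 ... U_k) and then splitting (0, t) at k - 1 ordered
    uniform points; the successive gaps are the k exponentials."""
    t = -sum(math.log(1.0 - random.random()) for _ in range(k))
    cuts = sorted(random.random() for _ in range(k - 1))
    points = [0.0] + cuts + [1.0]
    return [t * (points[i + 1] - points[i]) for i in range(k)]
```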

11.2 The Acceptance-Rejection Method


Suppose that we have an efficient method for generating a continuous random variable with
probability density function g(x). It is possible to use this as the basis for generating
from a continuous distribution with density function f(x) by firstly generating a
random variable Y from g before then accepting this generated value with a probability
proportional to f(Y)/g(Y).

More precisely, let c be a strictly positive constant satisfying

f(y) / g(y) ≤ c   for all y,

i.e. c must satisfy f(y) ≤ c · g(y) for all y. In particular, the algorithm for generating a
random variable with probability density function f (x) using this approach is:

Step 1: Generate the value of Y , with probability density function g(x).

Step 2: Generate a random number U.

Step 3: If U ≤ f(Y)/(c · g(Y)), set X = Y and stop.

Step 4: Go to Step 1.
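The algorithm can be written generically in Python; the concrete target below, f(x) = 20x(1 − x)^3 on (0, 1) with a uniform proposal, is our own illustrative choice (f is maximised at x = 1/4, so c = f(1/4) = 135/64 suffices):

```python
import random

def rejection_sample(f, sample_g, g, c):
    """Generate X with density f, given a sampler for density g and a
    constant c with f(y) <= c * g(y) for all y."""
    while True:
        Y = sample_g()                   # Step 1: generate Y with density g
        U = random.random()              # Step 2
        if U <= f(Y) / (c * g(Y)):       # Step 3: accept
            return Y
        # Step 4: otherwise return to Step 1

f = lambda x: 20.0 * x * (1.0 - x) ** 3
X = rejection_sample(f, random.random, lambda x: 1.0, 135.0 / 64.0)
```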

Recall that this algorithm informally simulates a random variable X with probability
density function f(x) by instead generating another random variable whose density
function is g(x), where the density function g(x) is "close" to f(x), in the sense that their
ratio is bounded by a constant value. In practice, we would like the constant to be
as close to 1 as possible, meaning that the two density functions are as similar as
possible; note, however, that the constant cannot equal 1 unless the two
densities coincide. It is worth emphasising that the algorithm is the same as in the
previously discussed discrete case, where the only difference is that we have replaced
mass functions by densities. In the same way as in the discrete setting, we can prove
the following.

Theorem. The acceptance-rejection method generates a continuous random variable with
probability density function f. Further, the number of iterations of the algorithm needed
to obtain the random variable is a geometric random variable with mean c.

Example. (The continuous acceptance-rejection method)



Example. (Generating a gamma random variable)

Note that during this example we generated a gamma random variable using the
acceptance-rejection approach by making use of an exponential distribution with the
same mean as the gamma. It turns out that generating a gamma random variable in
this way is always the most efficient approach (see e.g. [21, Section 5.2]), i.e. that this
approach minimises the mean number of iterations needed.
The following example demonstrates how the acceptance-rejection method allows
us to generate normal random variables.

Example. (Generating a normal random variable)

Hence, this demonstrates that the following algorithm generates an exponential with
rate 1 and an independent standard normal random variable. The algorithm is:

Step 1: Generate Y1 , an exponential random variable with rate 1.

Step 2: Generate Y2 , an exponential random variable with rate 1.

Step 3: If Y_2 − (Y_1 − 1)^2/2 > 0, set Y = Y_2 − (Y_1 − 1)^2/2 and go to Step 5.

Step 4: Return to Step 1.



Step 5: Generate a random number U and set

Z = Y_1 if U ≤ 1/2,   Z = −Y_1 if U > 1/2.

The random variables Z and Y generated are independent, where Z is normal with mean
0 and variance 1, while Y is exponential with rate 1. It is worth noting that if we wish
to generate a normal random variable with mean µ and variance σ^2, we use µ + σZ.
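The five steps, together with the µ + σZ transformation, can be sketched as (Python; the naming is ours):

```python
import math
import random

def standard_normal_via_exponentials():
    """Generate a standard normal Z by the rejection scheme above."""
    while True:
        Y1 = -math.log(1.0 - random.random())    # Step 1: exponential, rate 1
        Y2 = -math.log(1.0 - random.random())    # Step 2: exponential, rate 1
        if Y2 - (Y1 - 1.0) ** 2 / 2.0 > 0.0:     # Step 3: accept |Z| = Y1
            break                                # Step 4: otherwise retry
    U = random.random()                          # Step 5: attach a random sign
    return Y1 if U <= 0.5 else -Y1

def normal(mu, sigma):
    """A normal with mean mu and variance sigma^2 is mu + sigma * Z."""
    return mu + sigma * standard_normal_via_exponentials()
```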
Remarks.
1. Because c = √(2e/π) ≈ 1.32, the number of steps via the above approach is geometrically
distributed with mean 1.32.

2. If we want to generate a sequence of standard normal random variables, it is more


efficient to use the exponential random variable Y obtained in Step 3 as the initial
exponential needed in Step 1 for the next normal that is generated

3. The sign of the standard normal can be determined without the need to generate
a new random number (as in Step 5). It is possible to instead use the first digit of
an earlier random number in order to decide the sign.

The acceptance-rejection method is particularly useful when we need to generate a


random variable conditional on it being in some region. The following example demon-
strates how we can achieve this.

Example. (Generating a random variable with conditions)

Note that just as how we earlier generated a normal random variable by using the
acceptance-rejection method based on an exponential random variable, we could al-
ternatively simulate a normal random variable that is conditioned to lie within some
interval using this method based on an exponential random variable.

11.3 Generating a Poisson Process


Suppose we want to generate the first n event times of a Poisson process with rate λ. In
order to do this we will make use of the result that the sequence of interarrival times are
independent and identically distributed exponential random variables with (common)
parameter λ. Hence, one way in order to generate the process is to instead generate
these interarrival times. In particular, if we generate n random numbers U_1, U_2, . . . , U_n
and set X_i = −(1/λ) log(U_i), then X_i can be regarded as the time between the (i − 1)-th and
i-th event of the Poisson process. Because the actual time of the j-th event will equal
the sum of the first j interarrival times, it follows that the generated values of the first
n event times are simply Σ_{i=1}^{j} X_i for j = 1, 2, . . . , n.
Further, by modifying this procedure, we could instead generate the first T time
units of a Poisson process, i.e. we could generate all event times of a Poisson process
with rate λ that occur in (0, T). For the following algorithm, let t denote time, I denote
the number of events that have occurred by time t and S(I) denote the most recent
event time. The algorithm is:

Step 1: Let t = 0 and I = 0.

Step 2: Generate a random number U.


Step 3: Set t = t − (1/λ) log(U). If t > T, stop.

Step 4: Let I = I + 1 and S(I) = t.

Step 5: Go to Step 2.

It is worth emphasising that the final value of I via this approach will be the number of
events that occur by time T and the values S(1), S(2), . . . , S(I) will be the event times
of those events in increasing order.
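The algorithm above can be sketched as (Python; the naming is ours, and the event times are returned as a list rather than stored in S(1), . . . , S(I)):

```python
import math
import random

def poisson_process_times(lam, T):
    """Event times in (0, T) of a Poisson process with rate lam, generated
    from exponential interarrival times -(1/lam) * log(U)."""
    times = []
    t = 0.0
    while True:
        t = t - math.log(1.0 - random.random()) / lam
        if t > T:
            return times
        times.append(t)
```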
Alternatively, we could simulate the first T time units of a Poisson process with
parameter λ by firstly simulating N(T), i.e. the total number of events that occur by
time T. Recall (from Chapter 8) that N(T) is a Poisson random variable with mean
λT and we can therefore use the techniques from the previous chapter to generate
this value. Finally, if n denotes the simulated value of N(T), then n random numbers
U_1, U_2, . . . , U_n are generated and

{T·U_1, T·U_2, . . . , T·U_n}

is the set of event times by time T of the Poisson process. The preceding approach
works because, conditional on N(T) = n, the unordered set of event times is distributed
as a set of n independent uniform (0, T) random variables (see e.g. [21, pp. 84]). If
we only desired to simulate the set of event times of the Poisson process, then the
preceding approach would be more efficient than generating the exponentially distributed
interarrival times. It should be noted that we would normally desire the event times to
be presented in increasing order and therefore we would additionally need to sort the
values T·U_i for i = 1, 2, . . . , n.
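This alternative construction can be sketched as (Python; the naming is ours, with the Poisson count N(T) generated by the inverse transform method of the previous chapter):

```python
import math
import random

def poisson_process_via_uniforms(lam, T):
    """Event times in (0, T): first draw N(T) ~ Poisson(lam * T), then
    scatter that many uniform points over (0, T) and sort them."""
    U = random.random()                  # inverse transform for N(T)
    i, p = 0, math.exp(-lam * T)
    F = p
    while U >= F:
        p = lam * T * p / (i + 1)
        F = F + p
        i = i + 1
    return sorted(T * random.random() for _ in range(i))
```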

11.4 Generating a Nonhomogeneous Poisson Process


One particularly important counting process for mathematical modelling is the nonhomogeneous
Poisson process. Recall that this is a generalisation of the (standard) Poisson
process, which relaxes the strong stationary increment assumption. This relaxation
allows for the possibility that the arrival rate is not constant and can therefore vary with
time. It is worth noting that obtaining analytical results for a mathematical model that
assumes a nonhomogeneous Poisson process can be a very difficult task, however, because
simulation can be used to analyse such models, one can still feasibly use this
counting process for mathematical modelling.
Suppose we want to generate the first T time units of a nonhomogeneous Poisson
process with intensity function λ(t). The first method we present, called the thinning
or random sampling approach, is very popular and can be intuitively thought of as the
"process analogue" of the acceptance-rejection method. The intuitive idea behind thinning
is to firstly find a constant rate λ which dominates the intensity function
λ(t) and then to reject an appropriate fraction of the generated events such that the
desired rate λ(t) is achieved.
More precisely, this method begins by selecting a value λ satisfying

λ(t) ≤ λ for all t ≤ T, (11.1)

and then, by the proposition from Section 8.9 (entitled Some Continuous Random Variables),
such a nonhomogeneous Poisson process can be generated by a random selection
of the event times of a Poisson process with rate λ. More precisely, if an event of a Poisson
process with rate λ that occurs at time t is counted (independently of anything that came
before) with probability λ(t)/λ, then the process of counted events is a nonhomogeneous
Poisson process with intensity function λ(t), where 0 ≤ t ≤ T. It is worth noting that
since λ(t)/λ is here a probability, our assumption (11.1) is needed in light of the first
axiom of probability (from Section 8.2). In other words, upon simulating a Poisson process
and then randomly counting its events, we can generate the desired nonhomogeneous
Poisson process. The algorithm is:

Step 1: Let t = 0 and I = 0.

Step 2: Generate a random number U.

Step 3: Set t = t − (1/λ) log(U). If t > T, stop.

Step 4: Generate a random number U.

Step 5: If U ≤ λ(t)/λ, let I = I + 1 and S(I) = t.

Step 6: Go to Step 2.

Note that here λ(t) is the intensity function and λ denotes a value satisfying (11.1).
The final value of I denotes the number of events by time T and S(1), S(2), . . . , S(I) are
the corresponding event times.
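The thinning algorithm can be sketched as (Python; the naming is ours, with the intensity function passed in as a callable):

```python
import math
import random

def thinned_poisson_process(intensity, lam, T):
    """Event times in (0, T) of a nonhomogeneous Poisson process with
    intensity function intensity(t), where intensity(t) <= lam for all t."""
    times = []
    t = 0.0
    while True:
        t = t - math.log(1.0 - random.random()) / lam   # rate-lam candidate
        if t > T:
            return times
        if random.random() <= intensity(t) / lam:       # keep with prob intensity(t)/lam
            times.append(t)
```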
This procedure, referred to as the thinning algorithm (because it "thins" the
homogeneous Poisson points), becomes more efficient when λ is close to λ(t) throughout
the interval, as in this case we will reject a minimal number of event times. The
approach can become inefficient when the intensity function λ(t) exhibits heavy fluctuation
in time.
It is possible to easily modify the thinning algorithm with the objective of mitigating
excessive rejection when λ(t) heavily fluctuates. The intuitive idea behind this extension,
called piecewise thinning, is to break the interval up into k + 1 subintervals and then
perform standard thinning within each subinterval. Being a little more precise, the
extension determines appropriate values k, 0 = t_0 < t_1 < t_2 < · · · < t_k < t_{k+1} = T and
λ_1, λ_2, . . . , λ_{k+1} such that

λ(s) ≤ λ_i   if t_{i−1} ≤ s < t_i,   i = 1, 2, . . . , k + 1. (11.2)

In order to generate the nonhomogeneous Poisson process over the interval (t_{i−1}, t_i) for
i ∈ {1, 2, . . . , k + 1}, we firstly generate exponential random variables with corresponding
rate λ_i and then accept a generated event that occurs at time s ∈ (t_{i−1}, t_i) with
probability λ(s)/λ_i.

11.5 Exercises for Self-Study

1. Give a method for generating a random variable with probability density function

f(x) = e^x / (e − 1),   0 ≤ x ≤ 1.

2. Use the inverse transform method to generate a random variable having distribution
function

F(x) = (x^2 + x)/2,   0 ≤ x ≤ 1.
3. Show how to generate a random variable whose distribution function is

F(x) = (1/2)(x + x^2),   0 ≤ x ≤ 1,
using:

a) the inverse transform method, and

b) the acceptance-rejection method.

Which method do you think is best for this example? Justify your answer.

4. Use the acceptance-rejection method to find an efficient way to generate a random
variable with probability density function

f(x) = (1/2)(1 + x) e^{−x},   0 < x < ∞.

5. Write a computer program to generate normal variables by following the method


presented earlier in this chapter.

6. Write a computer program to generate the first T time units of a Poisson process
with (common) rate λ.

Chapter 12

Discrete Event Simulation

In order to simulate a probabilistic model, we generate the (stochastic) mechanisms
of the model, each described by an appropriate probability distribution, and
then observe the flow of the model over time. There will unsurprisingly be certain quantities
that we will be interested in determining, where these will depend on the specific
reasons for study. Despite this, because a model's evolution may depend on a complex
underlying structure, it may not always be immediately obvious how one should
determine these quantities of interest. Thankfully, a general framework, built around
the idea of "discrete events", has been developed for this purpose. In particular, this
approach to simulation, known as the discrete event simulation approach, allows one to
follow a model over time and determine the relevant quantities of interest.
Informally, the discrete event simulation approach models the operation of a system
as a discrete sequence of events in time. Each event occurs at a particular instant in time
and marks a change of state in the system. Between consecutive events, no change in
the system is assumed to occur, which means that the simulation time can jump directly
to the time at which the next event occurs. Within this chapter, we outline this approach
before discussing several situations where the approach is useful.

12.1 Simulation via Discrete Events

The two key components in discrete event simulation are variables and events. In order
to complete a simulation, we need to continually keep track of certain variables. In
particular, in general three types of variables are often used, these are:

1.

2.

3.

When an “event” occurs, the values of the aforementioned variables are updated and
then we collect any relevant data that is of interest as output. To determine when the
next event occurs, it will be useful to maintain an “event list”, that lists the nearest future
events and when they are scheduled to occur. Upon an event “occurring”, we then reset
the time and all state and counter variables and collect the relevant data. Through this
approach, we are able to “follow” the system as it evolves over time.
It is worth noting that the above is only supposed to provide a very high-level idea
of the elements of discrete event simulation. In particular, it will be useful to look at
some examples. In Section 12.2 we consider the simulation of a single-server queuing (or
waiting line) system. In Sections 12.3 and 12.4 we consider the simulation of multiple-server
queuing systems, where the first section supposes that the servers are arranged in
series, while the second supposes that the servers are arranged in parallel. Finally, in
Section 12.5 we consider an inventory stocking model.
In all the queuing models, we will suppose that the customers arrive in accordance
with a nonhomogeneous Poisson process with bounded intensity function λ(t), where
t > 0. Recall the nonhomogeneous Poisson process is a generalisation of the (standard)
Poisson process, where the strong stationary increment assumption is relaxed, which
means that the average rate of arrivals is allowed to vary with time. While simulating
these queuing models, we will frequently make use of the following subroutine (or function)
in order to generate the value of a random variable T_s, defined to equal the time
of the first arrival after time s.
Let λ be chosen such that λ(t) ≤ λ for all t. Suppose that the intensity function λ(t)
for t > 0 and λ are both specified, then the following subroutine generates the value of
the random variable T_s. The subroutine is:

Step 1: Let t = s.

Step 2: Generate a random number U.


Step 3: Let t = t − (1/λ) log(U).

Step 4: Generate a random number U.

Step 5: If U ≤ λ(t)/λ, set T_s = t and stop.

Step 6: Go to Step 2.

It is worth emphasising that this is essentially the algorithm demonstrated for simulating the
first T time units of a nonhomogeneous Poisson process. The difference is that in this
case we run the process until an event occurs and there is now no time limit.

It should be noted for completeness that in this chapter we will be using discrete
event simulation in order to understand the behaviour of queues rather than using more
classical queuing theory techniques. In particular, classical queuing theory studies the
long run behaviour of the queue under simple models (such as when arrivals follow
a homogeneous Poisson process) in order to derive analytical formulae that often rely
on some steady state behaviour. In contrast, using discrete event simulation instead
allows us to simulate queuing systems both in the short and long term where there is
no guarantee of steady state (such as when arrivals occur following a nonhomogeneous
Poisson process).

12.2 A Queuing System with a Single Server


Consider a fuel station in the UK in which customers arrive in accordance with a nonhomogeneous
Poisson process with intensity function λ(t), where t ≥ 0. Note that because the average rate of
arrivals is allowed to vary with time, the underlying process allows for increases in de-
mand around commuting times. Further, suppose that the fuel station employs a single
server and that upon arrival a customer either gets served by this server or they join the
waiting queue if the server is busy. When the server has finished serving a customer,
they then begin serving the customer who has been waiting the longest (called the “first
come, first served” or the “first in, first out” (FIFO) queuing discipline) if there are any
waiting customers or if instead there are no waiting customers, they remain free until
the next customer’s arrival. The amount of time it takes to serve a customer is a random
variable, which is independent of both the arrival process and all other service times,
with probability distribution G. Further, there is a fixed time T after which no addi-
tional arrivals can enter the system (when the fuel station closes), however, the server
completes servicing all those who are already in the system.
It should be noted for completeness that FIFO is not the only queue discipline. In
particular, a few other common examples include:

• “last in, first out” (LIFO),

• random,

• priority queuing, and

• rule-based queuing.

The LIFO queue discipline could be used to model the usage of plates in a cafeteria,
where when new clean plates are available they are added to the top of an existing
stack and customers take the top one from the stack. The random queue discipline
could be used to model the usage of screws by a builder, where they reach into a box
full of parts and select one screw at random. The priority queuing discipline is used
by the National Health Service (NHS) within Accident & Emergency departments. In
particular, when patients arrive they go through a preliminary assessment that evalu-
ates their symptoms and the urgency of their medical needs. The patients with more
life-threatening conditions are given the highest priority and are then attended to im-
mediately, those with less critical but still urgent issues are placed in a second priority
group, while those with non-urgent conditions are given the lowest priority. The rule-
based queuing discipline is supposedly used by Tesla for delivering pre-ordered vehicles,
where the company reportedly prioritise deliveries based on the customer’s proximity
to the factory irrespective of when an order was placed during the pre-order period.
In this single-server queuing scenario, we are interested in determining different
quantities such as:

(a) the average amount of time that a customer spends in the system, and

(b) the average amount of time past T that the last customer departs, i.e. the server overtime.

Recall (from Section 12.1) that the two key components in discrete event simulation
are variables and events. In particular, in order to do a simulation of the preceding
system we can use the following variables:

1. the time variable t,

2. the counter variables NA and ND , which count the number of arrivals and the number of departures, respectively, by time t, and

3. the system state variable n, the number of customers in the system at time t.

Further, because we update the values of these variables and collect any relevant data
upon an “event” occurring, it is natural to take both arrivals and departures as these
events. Hence the event list contains the time of the next arrival and the time of the
departure of the customer currently being served. In other words, the events list EL is

EL = {t_A , t_D } ,

where t_A is the time of the next arrival after time t and t_D is the service completion
time of the customer currently being served. If no customer is being served at present,
then we set t_D equal to ∞. In this scenario, the output variables that we collect are the
arrival time A(i) of customer i, the departure time D(i) of customer i and the time Tp
past time T that the last customer departs the system. It is worth emphasising that A(i)

and D(i) will provide us with information about average waiting times, while, Tp tells
us about the server overtime. To begin the simulation, we initialise the variables and
the event times as:

1. Set t = NA = ND = 0.

2. Set n = 0.

3. Generate T_0 and set t_A = T_0 and t_D = ∞.

In order to update the system, we need to increase time (move along the time axis)
until we encounter the next event. In order to see how this is accomplished, we consider
different cases that depend upon how the members of the events list EL = {t_A , t_D } compare.
In particular, the cases that we distinguish are:

Case 1: t_A = min {t_A , t_D , T }, which is an arrival,

Case 2: t_D = min {t_A , t_D , T }, which is a departure,

Case 3: T = min {t_A , t_D , T } and n > 0, which is a departure occurring after the closing
time T while customers still remain in the system, and

Case 4: T = min {t_A , t_D , T } and n = 0, which means the closing time has passed and
no customers remain, so the simulation ends.

In the following, let Y be the random variable with probability distribution G that
gives the service time of the server for one customer. We have a subroutine for each
of the above cases.

Case 1: t_A = min {t_A , t_D , T }

Step 1: Set t = t_A .

Step 2: Set NA = NA + 1.

Step 3: Set n = n + 1.

Step 4: Generate T_t and reset t_A = T_t .

Step 5: If n = 1, generate the random variable Y and set t_D = t + Y .

Step 6: Collect the output data A(NA ) = t.

Case 2: t_D = min {t_A , t_D , T }

Step 1: Set t = t_D .

Step 2: Set ND = ND + 1.
Step 3: Set n = n − 1.
Step 4: If n = 0, set t_D = ∞ and go to Step 6.
Step 5: If n > 0, generate the random variable Y and set t_D = t + Y .
Step 6: Collect the output data D(ND ) = t.

Case 3: T = min {t_A , t_D , T } and n > 0

Step 1: Set t = t_D .
Step 2: Set ND = ND + 1.
Step 3: Set n = n − 1.
Step 4: If n > 0, generate the random variable Y and set t_D = t + Y .
Step 5: Collect the output data D(ND ) = t.

Case 4: T = min {t_A , t_D , T } and n = 0

Step 1: Collect output data T_p = max(t − T, 0) and stop.

The above is illustrated in the flow diagram, namely Figure 12.1.
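The four cases above can also be sketched directly in code. The following is a minimal sketch rather than the full script: for simplicity it assumes homogeneous Poisson arrivals with rate arr_rate and exponential service times with rate serve_rate (both hypothetical parameters) in place of the general λ(t) and G, and it returns the arrival times, the departure times and the overtime.

```python
import math
import random

def single_server_sim(arr_rate, serve_rate, T, rng=random.Random(1)):
    """One run of the single-server queue, under the simplifying assumption
    of exponential interarrival and service times."""
    t, n = 0.0, 0                                  # time and number in system
    tA, tD = rng.expovariate(arr_rate), math.inf   # event list EL = {tA, tD}
    A, D = [], []
    while True:
        if tA <= tD and tA <= T:                   # Case 1: an arrival
            t, n = tA, n + 1
            tA = t + rng.expovariate(arr_rate)
            if n == 1:                             # server was idle: start service
                tD = t + rng.expovariate(serve_rate)
            A.append(t)
        elif (tD <= tA and tD <= T) or (min(tA, tD) > T and n > 0):
            t, n = tD, n - 1                       # Cases 2 and 3: a departure
            tD = t + rng.expovariate(serve_rate) if n > 0 else math.inf
            D.append(t)
        else:                                      # Case 4: close with empty system
            return A, D, max(t - T, 0.0)

A, D, Tp = single_server_sim(arr_rate=10.0, serve_rate=20.0, T=9.0)
```

Since the server is single and the discipline is FIFO, D(i) matches A(i) customer by customer, so the average of the differences D(i) − A(i) estimates the mean time in the system for this run.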


It is natural here to ask what one does with the collected output data, i.e. the server
overtime T_p and the arrival and departure times A(i) and D(i), respectively. Because for
every i, where i = 1, 2, . . . , NA , the respective arrival and departure times are A(i) and
D(i), we therefore see that D(i) − A(i) represents the amount of time that customer i
spends in the system. The average amount of time that customers spend in the system
during this simulation run is therefore

    (1/NA) Σ_{i=1}^{NA} [D(i) − A(i)].

Further, in order to estimate the average time that a customer spends in the system, we
run the simulation K times and take averages. In a similar fashion, to estimate the mean
time past T that the last customer departs, we simply run the simulation K times and
then take averages over all values Tp .

12.3 A Queuing System with Two Servers in Series


Consider a two-server system in which customers arrive in accordance with a nonhomogeneous
Poisson process. Further, suppose that each arrival must first be served by the

Figure 12.1: Simulating the Single Server Queue [21, Chapter 7].

first server and upon completion of service the customer goes to server 2. This type of
system is called a sequential or tandem queuing system. Upon arrival a customer either
enters service with the first server if that server is free, or else joins a queue if the server
is busy. In a similar fashion, when the customer has been served by the first server they
either enter service with the second server if that server is free, or else join its queue.
After being served by the second server, the customer then departs the system. If there
are customers in the queue, they are served in order of which customer has been waiting
the longest. The service times for server i, where i ∈ {1, 2}, have corresponding distribution G_i . This is
illustrated in Figure 12.2.

Figure 12.2: A Sequential / Tandem Queue [21, Chapter 7].



Analogously to the previous section, suppose now that we are interested in using simulation
in order to study the distribution of times that a customer would spend at both
servers 1 and 2. In particular, in order to do a simulation of the preceding system we will
use the following variables:

1. the time variable t,

2. the counter variables NA and ND , the numbers of arrivals and departures by time t, and

3. the system state variable (n_1 , n_2 ), where n_i denotes the number of customers at server i (including any customer currently in service), for i ∈ {1, 2}.

In this case, since the system now features two servers we must therefore amend our
previous event list to include the corresponding completion times for each server.
In particular, the event list contains the time of the next arrival, the time of service
completion for the first server and the time of service completion for the second server,
i.e. the time of departure from the system. In other words, the events list EL is

EL = {t_A , t_1 , t_2 } ,

where t_A denotes the time of the next arrival after time t and t_i denotes the service
completion time of the customer presently being served by server i, where i ∈ {1, 2}. If
no customer is presently with server i, then we set t_i equal to ∞. In this scenario, the
output variables collected are the arrival time A_1 (n) of customer n, where n ≥ 1, the
arrival time A_2 (n) of customer n at the second server and the departure time D(n) of
customer n. It is worth emphasising that these variables give us information about the
time spent with each server and additionally the total time spent in the system.
To begin the simulation, we initialise the variables and the event times as:

1. Set t = NA = ND = 0.

2. Set (n1 , n2 ) = (0, 0).

3. Generate T_0 and set t_A = T_0 and t_1 = t_2 = ∞.

Similarly, in order to update the system we increase time until we encounter the next
event. We consider different cases that depend upon which member of the events list
EL = {t_A , t_1 , t_2 } is smallest. In particular, the first case t_A = min {t_A , t_1 , t_2 } is an arrival,
the second case t_1 = min {t_A , t_1 , t_2 } is a departure from the first server and the third case
t_2 = min {t_A , t_1 , t_2 } is a departure from the second server (and hence a departure from

the system). It is worth noting that we do not specify a stopping rule in the following
pseudocode, however, when we write the script we will make use of the same rule as in
the single server case.
In the following, denote by Yi the random variable with corresponding probability
distribution Gi that gives the service time of the i-th server, where i = 1, 2. We have the
following subroutines for each case.

Case 1: t_A = min {t_A , t_1 , t_2 }

Step 1: Set t = t_A .

Step 2: Set NA = NA + 1.

Step 3: Set n_1 = n_1 + 1.

Step 4: Generate T_t and reset t_A = T_t .

Step 5: If n_1 = 1, generate the random variable Y_1 and set t_1 = t + Y_1 .

Step 6: Collect the output data A_1 (NA ) = t.

Case 2: t_1 = min {t_A , t_1 , t_2 }

Step 1: Set t = t_1 .

Step 2: Set n_1 = n_1 − 1 and n_2 = n_2 + 1.

Step 3: If n_1 = 0, set t_1 = ∞ and go to Step 5.

Step 4: If n_1 > 0, generate the random variable Y_1 and set t_1 = t + Y_1 .

Step 5: If n_2 = 1, generate the random variable Y_2 and set t_2 = t + Y_2 .

Step 6: Collect the output data A_2 (NA − n_1 ) = t.

Case 3: t_2 = min {t_A , t_1 , t_2 }

Step 1: Set t = t_2 .

Step 2: Set ND = ND + 1.

Step 3: Set n_2 = n_2 − 1.

Step 4: If n_2 = 0, set t_2 = ∞ and go to Step 6.

Step 5: If n_2 > 0, generate the random variable Y_2 and set t_2 = t + Y_2 .

Step 6: Collect the output data D(ND ) = t.

The above allows us to update the system during the simulation process and collect the
relevant data as explained in the previous section.
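The three subroutines above can be sketched as follows. Again this is a simplified sketch, not the full script: it assumes homogeneous Poisson arrivals and exponential service times at both servers (all rates hypothetical), and it stops after a fixed number of customers have passed through rather than at a closing time T.

```python
import math
import random

def tandem_sim(arr_rate, rate1, rate2, n_cust, rng=random.Random(2)):
    """One run of the two-server tandem queue, under the simplifying
    assumption of exponential interarrival and service times."""
    t, n1, n2, NA = 0.0, 0, 0, 0
    tA = rng.expovariate(arr_rate)
    t1 = t2 = math.inf                  # event list EL = {tA, t1, t2}
    A1, A2, D = [], [], []
    while len(D) < n_cust:
        if tA <= t1 and tA <= t2 and NA < n_cust:   # Case 1: an arrival
            t, n1, NA = tA, n1 + 1, NA + 1
            tA = t + rng.expovariate(arr_rate) if NA < n_cust else math.inf
            if n1 == 1:
                t1 = t + rng.expovariate(rate1)
            A1.append(t)
        elif t1 <= t2:                  # Case 2: service completion at server 1
            t, n1, n2 = t1, n1 - 1, n2 + 1
            t1 = t + rng.expovariate(rate1) if n1 > 0 else math.inf
            if n2 == 1:                 # server 2 was idle: start its service
                t2 = t + rng.expovariate(rate2)
            A2.append(t)
        else:                           # Case 3: departure from the system
            t, n2 = t2, n2 - 1
            t2 = t + rng.expovariate(rate2) if n2 > 0 else math.inf
            D.append(t)
    return A1, A2, D

A1, A2, D = tandem_sim(arr_rate=6.0, rate1=10.0, rate2=10.0, n_cust=200)
```

Because both queues are FIFO, the n-th entries of A1, A2 and D all refer to customer n, so A2(n) − A1(n) and D(n) − A2(n) give the times spent at servers 1 and 2.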

12.4 A Queuing System with Two Servers in Parallel

Consider a two-server system in which customers arrive in accordance with a nonhomogeneous
Poisson process. Further, suppose upon arrival the customer will join the queue
if both servers are busy, enter service with the first server if the first server is free, other-
wise they enter service with the second server. After the customer has completed service
with either server, that customer then departs the system. If there are customers in the
queue, they are served in accordance with the “first in, first out” rule, namely in order
of which customer has been waiting the longest. The service distribution for server i is
denoted by G_i , where i ∈ {1, 2}. This is illustrated in Figure 12.3.

Figure 12.3: A Queue with Two Parallel Servers [21, Chapter 7].

It is worth emphasising that we assume that if both servers are idle and there is a
new arrival, then that customer goes to the first server. Suppose similarly that we are
interested in using simulation in order to study the distribution of times that a customer
would spend in the system and the number of services performed by each server. An im-
portant observation is that since there are multiple servers, the order in which customers
depart the system will not necessarily coincide with the order of arrivals. This means
that customers cannot be labelled as before and additionally in order to know which
customer is departing the system, we must formally keep track of which customers are
in the system.
Because customers arrive and join a single queue if both servers are busy, the natural
choice is simply to label the customers as they arrive. In particular, let the first arrival be
customer number one, the next be customer number two, and so on. In order to identify
which customers are waiting it is sufficient to know which customers are currently being
served and also the number that are waiting in the queue. More formally, let us suppose
that customers i and j are being served, that customer i arrived first, i.e. that i < j

and that the queue is nonempty, i.e. that n − 2 > 0, where n denotes the number of
customers in the system. Notice that all customers with numbers strictly less than j
would have entered service before j and all customers with numbers strictly greater
than j could not have completed service. In light of this, it follows immediately that
customers j + 1, j + 2, . . . , j + n − 2 are currently waiting in the queue.
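The labelling argument above is easy to mechanise. The helper below (a hypothetical name, written purely for illustration) returns the list of waiting customers given the customers i1 and i2 in service and the number n in the system.

```python
def waiting_customers(n, i1, i2):
    # With customers i1 and i2 in service and j = max(i1, i2), the customers
    # waiting in the queue are j + 1, j + 2, ..., j + n - 2.
    j = max(i1, i2)
    return list(range(j + 1, j + n - 1))
```

For example, with n = 5 customers in the system and customers 3 and 4 in service, the waiting customers are 5, 6 and 7.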
Recall that here we are interested in using simulation in order to study the distribution
of times that a customer would spend in the system and the number of services per-
formed by each server. In order to analyse the preceding system we will make use of
the following variables:

1. the time variable t,

2. the counter variables NA , C_1 and C_2 , where NA is the number of arrivals by time t and C_i is the number of customers served by server i by time t, for i ∈ {1, 2}, and

3. the system state variable SS = (n, i_1 , i_2 ), where n is the number of customers in the system and i_1 and i_2 are the customers presently being served by servers 1 and 2, respectively (with i_j = 0 if server j is idle).

It is worth emphasising that if the system state triple is (0, 0, 0) , then the whole system
is empty. If instead the triple is (1, j, 0) or (1, 0, j) , then the only customer is j and they
are being served by the first or second server, respectively.
Similarly to the previous section, the system features two servers and as such the
event list is defined as before. In particular, the events list EL is

EL = {t_A , t_1 , t_2 } ,

where t_A denotes the time of the next arrival after time t and t_i denotes the service
completion time of the customer presently being served by server i, where i ∈ {1, 2}. If
no customer is presently with server i, then we set t_i equal to ∞. In this scenario, the
output variables are the arrival time A(n) of customer n, where n ≥ 1, and the departure
time D(n) of customer n. It is worth noting that the output variables are different to
those from the previous section since here we only have one arrival time.
To begin the simulation, we initialise the variables and the event times as:

1. Set t = NA = C1 = C2 = 0.

2. Set SS = (n, i1 , i2 ) = (0, 0, 0).

3. Generate T_0 and set t_A = T_0 and t_1 = t_2 = ∞.



Similarly, we increase time until we encounter the next event and consider different cases
that depend upon which member of EL = {t_A , t_1 , t_2 } is smallest. In particular, the first
case is an arrival, the second case is a departure from the first server and the third case
is a departure from the second server. Let Yi be the random variable with corresponding
probability distribution Gi that gives the service time of server i, where i = 1, 2. We
have the following subroutines for each case.

Case 1: t_A = min {t_A , t_1 , t_2 }, where SS = (n, i_1 , i_2 )

Step 1: Set t = t_A .

Step 2: Set NA = NA + 1.

Step 3: Generate T_t and reset t_A = T_t .

Step 4: Collect the output data A(NA ) = t.

Step 5: If SS = (0, 0, 0), reset SS = (1, NA , 0), generate Y_1 and set t_1 = t + Y_1 .

Step 6: If SS = (1, j, 0), reset SS = (2, j, NA ), generate Y_2 and set t_2 = t + Y_2 .

Step 7: If SS = (1, 0, j), reset SS = (2, NA , j), generate Y_1 and set t_1 = t + Y_1 .

Step 8: If n > 1, set SS = (n + 1, i_1 , i_2 ).

Case 2: t_1 = min {t_A , t_1 , t_2 }, where SS = (n, i_1 , i_2 )

Step 1: Set t = t_1 .

Step 2: Set C_1 = C_1 + 1.

Step 3: Collect the output data D(i_1 ) = t.

Step 4: If n = 1, reset SS = (0, 0, 0) and set t_1 = ∞.

Step 5: If n = 2, reset SS = (1, 0, i_2 ) and set t_1 = ∞.

Step 6: If n > 2, let m = max(i_1 , i_2 ), reset SS = (n − 1, m + 1, i_2 ), generate the
random variable Y_1 and set t_1 = t + Y_1 .

Case 3: t_2 = min {t_A , t_1 , t_2 }, where SS = (n, i_1 , i_2 )

The procedure for the third case is left as an exercise, namely Exercise 4. The above
allows us to update the system during the simulation process, where we stop this process
at some predetermined termination point. Then using the output variables A(n) and
D(n) and the counting variables C1 and C2 enable us to obtain data on the arrival and
departure times of the customers and the number of services performed by each server.

12.5 An Inventory Model


Consider a store which stocks a particular type of product that it sells for a fixed price
of r per unit. Suppose that customers who demand this product arrive in accordance
with a Poisson process with rate λ and that the quantity demanded by each customer is a
random variable with probability distribution G. The store manager must maintain the
inventory levels of this product in order to meet demand. In particular, whenever the on
hand inventory becomes too low, additional units of this product are ordered from the
supplier. It is worth emphasising that the store needs to maintain the inventory levels
in order to balance excessive holding costs (as holding stock is not generally free) and
loss of demand.
In this situation the shopkeeper uses the (s, S) ordering policy, which means that if
the on hand inventory is less than s and there is no present outstanding order, then an
amount is ordered in order to bring the stock level back to S, where s < S. Suppose that
the cost of ordering y units of the product from the supplier is a specified function c( y)
and it takes L units of time until the order is delivered, where payment is made upon
delivery. The shop must additionally pay an inventory storage (holding) cost of h per
unit item per unit time. Suppose that if a customer demands more of this product than
is currently available, then the amount on hand is sold to that customer and that the
remaining demand is lost.
In the following, we outline how we can use simulation in order to estimate the
expected profit for the store up to some (perhaps large) fixed time T . In order to analyse
the preceding system we will make use of the following variables:

1. time variable t,

2. counter variables: the total amount C of ordering costs by time t, the total amount
H of inventory holding costs by time t and the total amount R of revenue earned
by time t, and

3. system state variable: the pair (x, y) , where x denotes the inventory on hand and
y is the amount on order from the supplier.

In this case, the events will either be a customer arriving or an order being deliv-
ered/completed. Hence, our events lists EL is

EL = {t_0 , t_1 },

where t_0 denotes the time of the next customer arrival and t_1 is the time at which the
order that is being filled by the supplier will be delivered. If no orders from the supplier
are outstanding (i.e. that are yet to be delivered), then we set t_1 = ∞.

We can here run the simulation until the first event occurs after some large preas-
signed time T and use the expression

    (R − C − H) / T

in order to estimate the average profit per unit time. It should be noted that doing
this while varying the values of s and S would allow us to determine a good inventory
ordering policy for the store.
To begin the simulation, we suppose that there is an initial inventory of size I and
initialise the variables and the event times as:

1. Set t = C = H = R = 0.

2. Set (x, y) = (I, 0).


3. Generate a random number U and set t_0 = t − (1/λ) log(U).

4. Set t_1 = ∞.

Note that we set t_0 = t − (1/λ) log(U) since we assume that customers arrive in accordance
with a Poisson process with rate λ. We once more increase time until we encounter the
next event and then consider different cases. Ignoring the predetermined time T for
simplicity, we have only two cases, where the first case t_0 < t_1 is a customer arrival and
the second case t_0 ≥ t_1 is a supplier order completion. If we are at time t, then we move
along in time using the following subroutines for each case.

Case 1: t_0 < t_1

Step 1: Set H = H + (t_0 − t)xh.

Step 2: Set t = t_0 .

Step 3: Generate the random variable D, the demand of the arriving customer
following probability distribution G.

Step 4: Let w = min(D, x).

Step 5: Set R = R + wr.

Step 6: Set x = x − w.

Step 7: If x < s and y = 0, set y = S − x and t_1 = t + L.

Step 8: Generate a random number U and set t_0 = t − (1/λ) log(U).

Case 2: t_0 ≥ t_1

Step 1: Set H = H + (t_1 − t)xh.

Step 2: Set t = t_1 .

Step 3: Set C = C + c( y).

Step 4: Set x = x + y.

Step 5: Set y = 0 and t_1 = ∞.

It is worth noting that in Step 5 of the second case we assumed that when an order of
size y is delivered, the resulting inventory level is no less than s, which means that
no additional order is then placed. It is possible to guarantee this is the case by simply
assuming that y > s holds, which can be ensured by assuming that S ≥ 2s.
The above allows us to update the system during the simulation process, which
enables us to provide useful information to the store owner about balancing costs under
the aforementioned assumptions.
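Putting the two cases together gives the following minimal sketch of one run of the inventory simulation. The demand distribution G, the cost function c and all numerical parameters in the example call are hypothetical, and, slightly differently from the description above, the run simply stops once the next event would occur after time T.

```python
import math
import random

def inventory_sim(lam, G, c, r, h, L, s, S, I, T, rng=random.Random(3)):
    """One run of the (s, S) inventory model; returns the estimated
    average profit per unit time (R - C - H) / T."""
    t, C, H, R = 0.0, 0.0, 0.0, 0.0
    x, y = I, 0                        # inventory on hand / amount on order
    t0 = rng.expovariate(lam)          # next customer arrival
    t1 = math.inf                      # next delivery from the supplier
    while min(t0, t1) <= T:
        if t0 < t1:                    # Case 1: a customer arrives
            H += (t0 - t) * x * h      # holding cost accrued since last event
            t = t0
            w = min(G(rng), x)         # amount sold; excess demand is lost
            R += w * r
            x -= w
            if x < s and y == 0:       # (s, S) policy: order up to S
                y, t1 = S - x, t + L
            t0 = t + rng.expovariate(lam)
        else:                          # Case 2: an order is delivered
            H += (t1 - t) * x * h
            t = t1
            C += c(y)                  # payment is made upon delivery
            x += y
            y, t1 = 0, math.inf
    return (R - C - H) / T

# Hypothetical parameters: demand uniform on {1, ..., 4}, affine ordering cost.
profit = inventory_sim(lam=2.0, G=lambda rng: rng.randint(1, 4),
                       c=lambda q: 5 + 2 * q, r=4.0, h=0.1,
                       L=0.5, s=5, S=15, I=10, T=100.0)
```

Rerunning this for a grid of (s, S) pairs, and averaging over many runs per pair, is one way to search for a good ordering policy as suggested above.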

12.6 Exercises for Self-Study


1. Write a computer program to generate the desired output for the model presented
in Section 12.2. Use this to estimate the average time that a customer spends in
the system and the average amount of overtime put in by the single server in the
case where the arrival process is a Poisson process with rate 10, the service time
probability density is

    g(x) = 20 e^{−40x} (40x)² ,  x > 0,

and T = 9. Perform 100 runs and then 1000 runs.

2. Suppose in the model presented in Section 12.2 that we are additionally interested
in obtaining information about the amount of time a server would be idle in a day.
Explain how this could be accomplished.

3. Suppose that jobs arrive at a single server queuing system according to a nonho-
mogeneous Poisson process, whose rate is initially 4 per hour, increases steadily
until it hits 19 per hour after 5 hours, before then decreasing steadily until it hits
4 per hour after an additional 5 hours. The rate then repeats indefinitely in this
fashion, i.e. λ(t + 10) = λ(t) holds for all t ≥ 0. Suppose that the service dis-
tribution is exponential with rate 25 per hour. Suppose also that whenever the
server completes a service and finds no jobs waiting they go on a break for a time
that is uniformly distributed on (0, 0.3). If upon returning from their break there
are no jobs waiting, then they go on another break.

Use simulation to estimate the amount of time that the server is on break during
the first 100 hours of operation. Perform 500 simulation runs.

4. Complete the updating scheme for Case 3 in the model presented in Section 12.4.

5. In the model presented in Section 12.4, suppose that G1 is the exponential distri-
bution with rate 4 and that G2 is exponential with rate 3. Suppose further that
the arrivals occur in accordance to a Poisson process with rate 6. Write a simula-
tion program to generate data corresponding to the first 1000 arrivals. Use this
to estimate

a) the average time spent in the system by these customers, and

b) the proportion of services performed by the first server.

Perform now a second simulation of the first 1000 arrivals and use this to once
more answer parts a) and b). Compare your answers to the ones you obtained
previously.

6. Suppose in the two-server parallel model presented in Section 12.4 that each server
now has their own queue and that upon arrival a customer joins the shortest queue.
If an arrival finds that both queues are of the same size (or finds that both servers
are empty), then they go to server 1.

a) Determine the appropriate variables and events to analyse this model and
give the updating procedure.

b) Using the same distributions and parameters as in Exercise 5, find the aver-
age time spent in the system by the first 1000 customers.

c) Using the same distributions and parameters as in Exercise 5, find the pro-
portion of the first 1000 services that are performed by the first server.

Chapter 13

Statistical Analysis of Simulated Data

Usually one is motivated to undertake a simulation study in order to determine the
value of some quantity, denoted here by θ, that is inherently connected
with some underlying probabilistic model. Being a little more precise, a simulation of
some given system results in output data X , whose expected value is the aforementioned
quantity of interest θ. We then undertake a second simulation run which provides a
new and independent random variable with mean θ. This is repeated until we have
amassed n total runs and, in particular, the n independent and identically distributed
random variables X_1 , X_2 , . . . , X_n which all have mean θ. It is then possible to take the
average of these values, namely calculate

    X̄ = (1/n) Σ_{i=1}^{n} X_i

and use this as an estimator of the quantity of interest θ.


Recall from the previous chapter that in the single server queuing model we esti-
mated variables including the time the server works on a single day. In that case, we
denoted by T the (fixed) time the doors close for arrivals and by T_p the (variable)
amount of time after T that the server stays in order to finish serving all customers. In
order to determine T + T_p , we need to estimate the value of T_p . In order to do this we
run the simulation n times, obtaining an observation T_p^i for each i ∈ {1, 2, . . . , n} and
then averaged over these observations, namely we calculated

    (1/n) Σ_{i=1}^{n} T_p^i ,

which was used as an estimator for the expected value of T_p . The following two natural
questions arise out of what is outlined above:

1. How good are estimates obtained via this approach?

2. What value of n should be chosen? In other words, how many times should one
run the simulation?

13.1 The Sample Mean and Sample Variance

Suppose that X_1 , X_2 , . . . , X_n are n independent and identically distributed random
variables. Let

    E[X_i] = θ and Var(X_i) = σ²

denote the population mean and population variance of the X_i 's. The quantity

    X̄ = (1/n) Σ_{i=1}^{n} X_i ,

namely the arithmetic mean of the n values, is called the sample mean. It is worth
emphasising that the sample mean is simply the average value of a sample (i.e. a subset
with possible duplicates) of numbers which are taken from some larger population of
numbers. In the scenario when the population mean θ is unknown, we make use of the
sample mean to estimate it.
Observe that

    E[X̄] = E[(1/n) Σ_{i=1}^{n} X_i] = (1/n) Σ_{i=1}^{n} E[X_i] = θ,    (13.1)

where the second equality follows since expectation is a linear operation (as shown in
Chapter 8). In particular, (13.1) demonstrates that the sample mean X̄ is an unbiased
estimator of the population mean θ, where an estimator is said to be unbiased if the differ-
ence between the estimator's expected value and the true value of the parameter being
estimated is zero.
In order to determine the "worth" of the sample mean X̄ as an estimator of the popula-
tion mean θ, we consider its mean squared error, which is defined as the expected value
of the squared difference between X̄ and θ, namely E[(X̄ − θ)²].
It is worth noting that squaring the differences eliminates negative values for the
differences and hence ensures that the mean squared error is always greater than or
equal to zero. Further, the mean squared error is almost always strictly positive (and
not exactly zero) because of randomness. In addition, squaring increases the impact of
larger errors (differences), which in fact turns out to be a favourable property.

Notice that

    E[(X̄ − θ)²] = Var(X̄) = (1/n²) Σ_{i=1}^{n} Var(X_i) = σ²/n.    (13.2)

In particular, the sample mean X̄ of the n data points X_1 , X_2 , . . . , X_n is a random variable
with mean θ and variance σ²/n. Recall that the standard deviation is defined to be the
nonnegative square root of the variance. Random variables are known to be "unlikely" to
be too many standard deviations from their mean and as such it follows that the sample
mean X̄ is a good estimator of θ when σ/√n is small.
In order to formalise the above statement regarding the unlikeliness that a random
variable is too many standard deviations from their mean, we make use of two results
introduced in Chapter 8, namely the Chebyshev inequality and, more importantly for
simulation studies, the central limit theorem. For any c > 0, Chebyshev's inequality
yields the bound

    P{ |X̄ − θ| > cσ/√n } ≤ 1/c² ,

which means that the probability that the sample mean is more than c standard deviations
from the population mean is no greater than 1/c². For example, this bound tells us that the
probability that the sample mean differs from the population mean θ by more than 1.96
standard deviations, i.e. when c = 1.96, is no more than 1/(1.96)² ≈ 0.2603.
This rather conservative bound can be drastically improved upon when the value of
n is large, which usually is the case when running simulations. In particular, if n is large,
then since the X_i 's are independent and identically distributed random variables by
assumption, we can apply the central limit theorem, which tells us that (X̄ − θ)/(σ/√n)
is approximately distributed as a standard normal variable and therefore

    P{ |X̄ − θ| > cσ/√n } ≈ P{ |Z| > c }    where Z is a standard normal
                          = 2 (1 − Φ(c))    by symmetry of the standard normal,

where Φ denotes the standard normal distribution function (from Chapter 8). For example,
the probability that the sample mean differs from θ by more than 1.96 standard
deviations is approximately 0.05, which follows as Φ(1.96) = 0.975. This means we can

be approximately 95% certain that the sample mean does not differ from the popula-
tion mean by more than 1.96 standard deviations. It is worth emphasising that 0.05 is
indeed much stronger than the conservative bound of 0.2603 that was deduced using
Chebyshev’s inequality.
It is likely that this sounds very promising, since the above argument suggests that
provided the quantity σ/√n (or σ²/n) is small, the sample mean will be a good
estimator for the population mean. The natural difficulty with using this value as an
indicator of how accurately the sample mean X̄ of n values estimates the population
mean is that the population variance σ² is not usually known in advance. Hence, we
need to estimate its value. Recall by definition that

    σ² = E[(X − θ)²]

is the average of the squared difference between the random variable X and its (un-
known) mean. In light of this, it is perhaps natural that when we make use of
the sample mean X̄ as the estimator of the population mean, a natural estimator for
σ² would instead be to take the average of the squared distances between the X_i 's and the
estimated mean X̄ , i.e. to use Σ_{i=1}^{n} (X_i − X̄)² / n. For technical reasons, and in order
to make the estimator unbiased, we instead prefer to divide the sum of squares by n − 1
rather than n. Informally, observing that the sum of deviations Σ_{i=1}^{n} (X_i − X̄) equals
zero as shown by the equalities

    Σ_{i=1}^{n} (X_i − X̄) = Σ_{i=1}^{n} X_i − Σ_{i=1}^{n} X̄
                        = Σ_{i=1}^{n} X_i − nX̄                        (13.3)
                        = Σ_{i=1}^{n} X_i − n ( Σ_{i=1}^{n} X_i / n ) = 0

implies that only n − 1 of these deviations are needed in order to determine all the
deviations (since they have the property that they must sum to zero). In particular, this
argument means that there are only n − 1 "degrees of freedom" in our sample variance
sum. This informal argument inspires the following definition.

Definition. The quantity S² defined by

    S² = (1/(n − 1)) Σ_{i=1}^{n} (X_i − X̄)²

is called the sample variance.



Upon making use of the identity

    Σ_{i=1}^{n} (X_i − X̄)² = Σ_{i=1}^{n} X_i² − nX̄² ,    (13.4)

which follows in light of the equality (13.3), we show that the sample variance is an
unbiased estimator of the population variance σ². In particular, we now prove the
following proposition.

Proposition. Suppose that X_1 , X_2 , . . . , X_n are independent and identically distributed
random variables with population variance Var(X_i) = σ². Then the sample variance S²
is an unbiased estimator of σ², namely

    E[S²] = σ².

To prove this proposition it is useful to firstly recall (from Chapter 8) that for all
random variables Y , we have

    E[Y²] = Var(Y) + (E[Y])².

Proof.

The above tells us that we can use the sample variance S² as our estimator of the
population variance σ². The so-called sample standard deviation, S = √(S²), is used as
our estimator of σ. Further, we use S/√n as an estimator for σ/√n, namely for the
standard deviation of X̄ .
Consider now the second natural question, namely when should we stop generating
extra data values? Suppose for this purpose that, as in a simulation, we have the option to
continually generate additional data values X_i as needed. Further, suppose the quantity
that we are interested in estimating is the population mean θ = E[X_i]. Intuitively, we
will require a sufficiently large number of data values to allow the central limit theorem
to apply, however, when our estimate is “good enough”, in the sense that it is not too
far away from the quantity of interest, we can stop generating additional values.

Being a little more precise, we firstly choose an acceptable value $d$ for the standard
deviation of our estimator. If $d$ is the standard deviation of the estimator $\bar{X}$, then recall
that we can, for example, be 95% certain that $\bar{X}$ will not differ from $\theta$ by more than $1.96d$,
provided the central limit theorem applies. We should then continue to generate new
data until we have generated $n$ data values for which our estimate of the standard
deviation of $\bar{X}$, namely $S/\sqrt{n}$, is less than our accepted value $d$. It is worth emphasising
that in order for the sample standard deviation $S$ to be a good estimator of the population
standard deviation $\sigma$ we in general require the sample to be sufficiently large. In
light of this, the following procedure can be used to determine when to stop generating
additional data values:

Step 1: Choose an acceptable value $d$ for the standard deviation of the estimator.

Step 2: Generate at least 100 data values.

Step 3: Continue to generate additional data values, stopping after we have generated $k$
values such that $S/\sqrt{k} < d$ holds, where $S$ is the sample standard deviation
based on those $k$ values.

Step 4: The estimate of $\theta$ is given by $\bar{X} = \sum_{i=1}^{k} X_i / k$.
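The four steps above can be sketched in code. The data source below is an illustrative assumption (Exponential(1) values generated by the inverse transform method, so the true mean is 1), as is the tolerance $d$.

```python
import math
import random
import statistics

# Steps 1-4 of the stopping rule, applied to an assumed data source:
# Exponential(1) values generated by inverse transform (true mean 1).
random.seed(1)
d = 0.05                                    # Step 1: chosen tolerance

def generate() -> float:
    # Inverse transform for an Exponential(1) random variable.
    return -math.log(1.0 - random.random())

data = [generate() for _ in range(100)]     # Step 2: at least 100 values
while statistics.stdev(data) / math.sqrt(len(data)) >= d:
    data.append(generate())                 # Step 3: keep generating

estimate = statistics.mean(data)            # Step 4: the sample mean
print(len(data), estimate)
```

Since $\sigma = 1$ here, the rule stops at roughly $k \approx (1/d)^2 = 400$ values, and the final estimate lands close to the true mean 1.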

The following example demonstrates how one would decide when to stop generating
values in the setting where we once more work with a single-server queuing model in
order to estimate the time at which the last customer departs the system.

Example. (Estimating the expected time the last customer leaves the system)

Notice that in the previous procedure we need to compute the sample standard
deviation S at each iteration. In order to calculate S, one may naïvely recompute S
from scratch each time a new value is generated. In order to improve the efficiency of
the approach, it would be favourable if we found a method for recursively computing
successive sample means and sample variances. For this purpose, consider the sequence
of data values $X_1, X_2, \ldots$ and denote by $\bar{X}_j$ and $S_j^2$ the sample mean and sample
variance of the first $j$ observations, respectively. In other words, let
\[
\bar{X}_j = \frac{1}{j} \sum_{i=1}^{j} X_i
\]
and
\[
S_j^2 = \frac{1}{j-1} \sum_{i=1}^{j} \big(X_i - \bar{X}_j\big)^2, \quad \text{where } j \ge 2.
\]
These expressions allow us to deduce the following recursions via simple algebraic
manipulation. Let $S_1^2 = 0$ and $\bar{X}_0 = 0$; then
\[
\bar{X}_{j+1} = \bar{X}_j + \frac{X_{j+1} - \bar{X}_j}{j+1}, \quad \text{and} \quad
S_{j+1}^2 = \left(1 - \frac{1}{j}\right) S_j^2 + (j+1)\big(\bar{X}_{j+1} - \bar{X}_j\big)^2. \tag{13.5}
\]

Example. (Recursion)
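A minimal sketch of the recursions (13.5), with the result checked against the direct formulas; the sample data are arbitrary.

```python
import statistics

# The recursions (13.5): starting from S_1^2 = 0 and mean 0, each new
# value updates the running mean and sample variance in O(1) time,
# with no need to revisit earlier data.
def running_stats(values):
    mean, s2 = 0.0, 0.0
    for j, x in enumerate(values):   # iteration j handles X_{j+1}
        old_mean = mean
        mean = old_mean + (x - old_mean) / (j + 1)
        if j >= 1:
            s2 = (1 - 1 / j) * s2 + (j + 1) * (mean - old_mean) ** 2
    return mean, s2

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # arbitrary sample data
mean, s2 = running_stats(data)
# The recursive values agree with the one-shot computations.
print(mean, s2, statistics.mean(data), statistics.variance(data))
```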

This analysis is modified in the scenario when the $X_i$'s are Bernoulli (or 0, 1) random
variables, as would be the case when we are estimating some probability. Suppose we
can generate Bernoulli random variables $X_i$ such that
\[
X_i = \begin{cases} 1, & \text{with probability } p, \\ 0, & \text{with probability } 1-p. \end{cases}
\]
Suppose further that we wish to estimate the expected value of $X_i$, which (from Chapter
8) we know is given by
\[
E[X_i] = P\{X_i = 1\} = p.
\]

In this case, since (from Chapter 8) we know that
\[
\mathrm{Var}(X_i) = p(1-p),
\]
it follows that there is no need to use the sample variance to estimate $\mathrm{Var}(X_i)$. Being more
precise, observe that if we have generated the $n$ values $X_1, X_2, \ldots, X_n$, then since the
estimate of $E[X_i] = p$ is once more given by the sample mean
\[
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i,
\]
we notice that a natural estimate of $\mathrm{Var}(X_i) = p(1-p)$ is $\bar{X}_n\big(1 - \bar{X}_n\big)$.

Furthermore, observing that
\begin{align*}
\mathrm{Var}(\bar{X}_n) &= \mathrm{Var}\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right) \\
&= \frac{1}{n^2} \mathrm{Var}\left( \sum_{i=1}^{n} X_i \right) && \text{by the identity from Chapter 8} \\
&= \frac{1}{n^2} \cdot n \cdot \mathrm{Var}(X_i) && \text{since } \mathrm{Var}(X_i) = \mathrm{Var}(X_j) \text{ for all } i, j \in \{1, 2, \ldots, n\} \\
&= \frac{1}{n} \mathrm{Var}(X_i),
\end{align*}
it follows that the estimator of $\mathrm{Var}(\bar{X}_n)$ is
\[
\frac{1}{n} \bar{X}_n \big(1 - \bar{X}_n\big),
\]
where taking the nonnegative square root yields the estimator of its standard deviation.
In light of this, the following procedure can be used to determine when to stop
generating additional Bernoulli random variables:

Step 1: Choose an acceptable value $d$ for the standard deviation of the estimator.

Step 2: Generate at least 100 data values.

Step 3: Continue to generate additional data values, stopping after we have generated $k$
values such that $\big[\tfrac{1}{k}\bar{X}_k(1 - \bar{X}_k)\big]^{1/2} < d$ holds.

Step 4: The estimate of $p$ is given by $\bar{X}_k$.

Example. (Estimating at closing that the queue is nonempty)
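The Bernoulli stopping rule above can be sketched as follows; the success probability $p = 0.3$ and the tolerance $d = 0.02$ are illustrative assumptions.

```python
import math
import random

# The Bernoulli stopping rule, with assumed parameters: each X_i is
# Bernoulli(p) with p = 0.3, and the tolerance is d = 0.02.
random.seed(2)
d = 0.02                                    # Step 1

total, k = 0, 0
while True:
    total += 1 if random.random() < 0.3 else 0
    k += 1
    if k >= 100:                            # Step 2
        mean = total / k
        if math.sqrt(mean * (1 - mean) / k) < d:
            break                           # Step 3

print(k, mean)                              # Step 4: estimate of p
```

With $p = 0.3$ the rule needs roughly $k \approx p(1-p)/d^2 \approx 525$ values; no sample variance is ever computed, since $\bar{X}_k(1-\bar{X}_k)$ plays that role.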



13.2 Interval Estimates of a Population Mean

Suppose once more that $X_1, X_2, \ldots, X_n$ are independent and identically distributed random
variables with mean $\theta$ and variance $\sigma^2$. The previous section argues that the sample
mean $\bar{X} = \sum_{i=1}^{n} X_i / n$ is an effective estimator of the population mean $\theta$. Despite
this, it should be emphasised that we should not expect $\bar{X}$ to equal $\theta$ but rather that
they are in some sense "close". It is sometimes valuable to be able to formally quantify
this notion of "closeness", by which we explicitly specify an interval for which we have
a certain degree of confidence that the population mean $\theta$ lies within.

To find such a confidence interval we require the approximate distribution of the
estimator $\bar{X}$. Recall, for this purpose, that (13.1) and (13.2) show that
\[
E[\bar{X}] = \theta \quad \text{and} \quad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}
\]
and, in light of the central limit theorem, for large $n$ we deduce that
\[
\sqrt{n}\, \frac{(\bar{X} - \theta)}{\sigma} \mathrel{\dot\sim} N(0, 1),
\]
where $\mathrel{\dot\sim} N(0, 1)$ means "is approximately distributed as a standard normal". If we
additionally replace the (unknown) population standard deviation $\sigma$ by its estimator
the sample standard deviation $S$, then the resulting quantity remains approximately a
standard normal by Slutsky's theorem (see e.g. [15, Chapter 3]). In other words, if $n$ is
large, then
\[
\sqrt{n}\, \frac{(\bar{X} - \theta)}{S} \mathrel{\dot\sim} N(0, 1). \tag{13.6}
\]
For any $\alpha \in (0, 1)$, let $z_\alpha$ be such that a standard normal random variable $Z$ will exceed $z_\alpha$
with probability $\alpha$, namely
\[
P\{Z > z_\alpha\} = \alpha.
\]
Recall (from Chapter 8) that the value $z_\alpha$ can be obtained using, for example, a table
of values for the distribution function of a standard normal random variable. In light of
the symmetry of the standard normal density function about zero, it follows that
\[
z_{1-\alpha} = -z_\alpha,
\]
where $z_{1-\alpha}$ is the point to whose right the area under the standard normal density
is equal to $1 - \alpha$. Further, it follows that
\[
P\big\{ -z_{\alpha/2} < Z < z_{\alpha/2} \big\} = 1 - \alpha
\]
and therefore by (13.6) we obtain
\[
P\left\{ -z_{\alpha/2} < \sqrt{n}\, \frac{(\bar{X} - \theta)}{S} < z_{\alpha/2} \right\} \approx 1 - \alpha,
\]
which, upon algebraic manipulation, is equivalent to
\[
P\left\{ \bar{X} - z_{\alpha/2} \frac{S}{\sqrt{n}} < \theta < \bar{X} + z_{\alpha/2} \frac{S}{\sqrt{n}} \right\} \approx 1 - \alpha. \tag{13.7}
\]
In other words, (13.7) tells us that with probability $1 - \alpha$, the population mean $\theta$ will
lie within the region $\bar{X} \pm z_{\alpha/2}\, S/\sqrt{n}$ about the sample mean $\bar{X}$.

The above inspires the following definition of an approximate $100(1-\alpha)$ percent
confidence interval estimate of the population mean $\theta$.

Definition. If the observed values of the sample mean and sample standard deviation are
$\bar{X} = \bar{x}$ and $S = s$, then we call the interval $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$ an approximate $100(1-\alpha)$
percent confidence interval estimate of $\theta$.

Suppose again that, as in a simulation, we have the option to continually generate
additional data values as needed, and suppose that we are interested in determining
when to stop generating data values. Intuitively, we require a sufficiently large number
of data values to allow the central limit theorem to apply; however, once the $100(1-\alpha)$
percent confidence interval is also "small enough", we can stop generating additional
values. Being a little more precise, a natural approach is to choose values $\alpha$ and $l$, where
$l$ denotes the desired interval length, and to continue generating data until the length of
the approximate $100(1-\alpha)$ percent confidence interval estimate of the population mean
$\theta$ is less than $l$. The length of such an interval is $2 z_{\alpha/2}\, S/\sqrt{n}$, and hence the following
procedure can be used to decide when to stop generating new data for a confidence interval:

Step 1: Choose acceptable values for $\alpha$ and $l$.

Step 2: Generate at least 100 data values.

Step 3: Continue to generate additional data values, stopping after we have generated $k$
values such that $2 z_{\alpha/2}\, S/\sqrt{k} < l$ holds.

Step 4: If $\bar{x}$ and $s$ are the observed values of $\bar{X}$ and $S$, then the $100(1-\alpha)$ confidence
interval estimate of $\theta$, whose length is less than $l$, is $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{k}$.
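The confidence-interval stopping rule can be sketched as follows, using a Uniform(0, 1) data source (true mean $1/2$) and the illustrative choices $\alpha = 0.05$ (so $z_{\alpha/2} \approx 1.96$) and $l = 0.05$; the identity (13.4) is used to update $S$ from running sums rather than rescanning the data.

```python
import math
import random

# The confidence-interval stopping rule for a Uniform(0, 1) data source
# (true mean 1/2).  alpha = 0.05 gives z ≈ 1.96; the target length
# l = 0.05 is an arbitrary choice.  S is computed from running sums via
# the identity (13.4): sum of squared deviations = sum(x^2) - k*mean^2.
random.seed(3)
z, l = 1.96, 0.05                           # Step 1

total, total_sq, k = 0.0, 0.0, 0
while True:
    u = random.random()
    total += u
    total_sq += u * u
    k += 1
    if k >= 100:                            # Steps 2 and 3
        x_bar = total / k
        s = math.sqrt((total_sq - k * x_bar ** 2) / (k - 1))
        if 2 * z * s / math.sqrt(k) < l:
            break

half = z * s / math.sqrt(k)                 # Step 4
print(k, (x_bar - half, x_bar + half))
```

With $\sigma = 1/\sqrt{12}$ the rule stops at roughly $k \approx (2 \cdot 1.96\,\sigma/l)^2 \approx 510$ values.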

As was previously noted, in the case when the $X_i$'s are Bernoulli (or 0, 1) random
variables the analysis is modified. More precisely, suppose $X_1, X_2, \ldots, X_n$ are Bernoulli
random variables such that
\[
X_i = \begin{cases} 1, & \text{with probability } p, \\ 0, & \text{with probability } 1-p. \end{cases}
\]
Recall that $E[X_i] = p$ and that $\mathrm{Var}(X_i)$ can be approximated using $\bar{X}(1 - \bar{X})$. It follows
that when $n$ is large, the analogous statement to (13.6) is
\[
\sqrt{n}\, \frac{(\bar{X} - p)}{\sqrt{\bar{X}(1 - \bar{X})}} \mathrel{\dot\sim} N(0, 1). \tag{13.8}
\]
For any $\alpha \in (0, 1)$, we therefore have
\[
P\left\{ \bar{X} - z_{\alpha/2} \sqrt{\frac{\bar{X}(1-\bar{X})}{n}} < p < \bar{X} + z_{\alpha/2} \sqrt{\frac{\bar{X}(1-\bar{X})}{n}} \right\} \approx 1 - \alpha.
\]
In particular, if the observed value of the sample mean $\bar{X}$ is denoted by $p_n$, we say that
the $100(1-\alpha)$ percent confidence interval estimate of the expected value $p$ is
\[
p_n \pm z_{\alpha/2} \sqrt{\frac{p_n(1 - p_n)}{n}}.
\]

13.3 Exercises for Self-Study


1. For any (finite) set of numbers $x_1, x_2, \ldots, x_n$, show that
\[
\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2,
\]
where $\bar{x} = \sum_{i=1}^{n} x_i / n$.

2. Give a probabilistic proof of the result of the previous exercise. This can be
achieved by letting X denote a random variable that is equally likely to take any
of the n values before applying suitable algebraic identities from Chapter 8.

3. Write a computer program that uses the recursions (13.5) in order to calculate the
sample mean and sample variance of a data set.

4. Continually generate $n$ standard normal random variables until both $n \ge 100$ and
$S/\sqrt{n} < 0.1$ hold, where $S$ is the sample standard deviation of the $n$ values.

a) How many normals do you think will be generated?

b) How many normals did you generate?

c) What is the sample mean of all the generated normals?

d) What is the sample variance?

e) Comment on the results c) and d). Were these surprising?

5. Repeat the previous exercise with the exception that you now continue generating
standard normals until $S/\sqrt{n} < 0.01$.
6. Estimate $\int_0^1 e^{x^2} \, dx$ by generating random numbers. Generate at least 100 values
and stop when the standard deviation of your estimator is less than 0.01.

Chapter 14

Variance Reduction Techniques

Recall (from Chapter 13) that we are typically interested in determining
some parameter $\theta$ that is connected with some stochastic model when undertaking a
simulation study. To estimate this parameter, a simulation of the model results in output
data $X$ with the property that $E[X] = \theta$. Repeated runs of the simulation are performed,
where the $i$-th run yields the output variable $X_i$. Let $\sigma^2 = \mathrm{Var}(X_i)$ denote the variance
of the $X_i$'s. As explained in Chapter 13, we terminate the simulation study after $n$ runs
and the estimate of $\theta$ is given by calculating the sample mean $\bar{X}$ of the $X_i$'s, namely
\[
\bar{X} = \sum_{i=1}^{n} \frac{X_i}{n}.
\]
Further, recall that the sample mean $\bar{X}$ is an unbiased estimator of $\theta$ and it therefore follows
by (13.2) (from Chapter 13) that its mean squared error equals its variance, namely that
\[
E\big[(\bar{X} - \theta)^2\big] = \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}.
\]

To this point we have reduced the variance of our estimator X̄ by increasing the value
of n. The issue with this approach is that sometimes we would have to simulate a very
large number of observations n in order to get the variance within some predetermined
acceptable range. It turns out that we could face such an issue even when working with
a seemingly quite simple model.
In the following we present other, more efficient, methods that one can use to reduce
the variance of the simulation estimator $\bar{X}$, namely $\mathrm{Var}(\bar{X})$. In particular, we will outline
two such approaches for variance reduction, called antithetic variates and control
variates, respectively. Informally, the method of antithetic variates makes use of pairs of variables
that are highly negatively correlated to reduce variance. In contrast, the method of control variates
informally reduces variance by making use of linear combinations of random variables
with high (positive or negative) correlation, where the population mean of one of the
random variables is known.

14.1 The Use of Antithetic Variables


Suppose once more that we are interested in using simulation to estimate $\theta = E[X]$ and
additionally assume that we have generated two identically distributed random variables
$X_1$ and $X_2$ with mean $\theta$. Upon making use of an expression from Chapter 8, notice that
\[
\mathrm{Var}\left( \frac{X_1 + X_2}{2} \right) = \frac{1}{4} \big[ \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + 2\,\mathrm{Cov}(X_1, X_2) \big].
\]
Recall that if the random variables $X_1$ and $X_2$ are independent, then their covariance
satisfies $\mathrm{Cov}(X_1, X_2) = 0$. If in contrast $X_1$ and $X_2$ are negatively correlated, i.e.
$\mathrm{Cov}(X_1, X_2) < 0$, then the variance of the estimator $(X_1 + X_2)/2$ decreases, which is
advantageous. We later show this a little more formally. It should be noted that the
word antithetic means "opposite" or directly opposed/contrasted, which should provide
some appreciation for why that word is used in this context.
A natural question to ask here is how we would arrange for $X_1$ and $X_2$ to be
negatively correlated. For this purpose, let us suppose that $X_1$ is a function of $m$ random
numbers, namely
\[
X_1 = h(U_1, U_2, \ldots, U_m),
\]
where $U_1, U_2, \ldots, U_m$ are $m$ independent uniform random numbers and $h$ denotes some
function. Note that if $U$ denotes some random number, in the sense that it is uniformly
distributed on $(0, 1)$, then so is $1 - U$. In light of this, the random variable
\[
X_2 = h(1 - U_1, 1 - U_2, \ldots, 1 - U_m)
\]
has the same distribution as $X_1$. Further, since we note that $1 - U$ is negatively correlated
with $U$, our hope is that $X_2$ is therefore negatively correlated with $X_1$. Recall that our
random variables $X_1$ and $X_2$ depend on some underlying function $h$ and, perhaps unsurprisingly,
deciding whether the random variables are negatively correlated will depend on this
function. It turns out that the random variables $X_1$ and $X_2$ are negatively
correlated when the underlying function $h$ is monotone. Note that using $X_1$ and $X_2$ as
described above has a double benefit: not only do we reduce the variance of our
estimator provided the function $h$ is monotone, we additionally save some computation
time as we do not need to generate a second set of random numbers.
To show that the use of antithetic variables leads to a reduction in variance whenever
the function $h$ is monotone, we make use of the following theorem before deducing
the result of interest as a corollary. It should be noted that we state the initial result
without proof in order to simplify the presentation of the material; the complete proof
can be found in [21, Section 9.9].

Theorem. If $X_1, X_2, \ldots, X_n$ are independent, then for any increasing functions $f$ and $g$ of
$n$ variables,
\[
E[f(\mathbf{X}) \cdot g(\mathbf{X})] \ge E[f(\mathbf{X})] \cdot E[g(\mathbf{X})],
\]
where $\mathbf{X} = (X_1, X_2, \ldots, X_n)$.

Using an expression from Chapter 8, we note that the above is equivalent to
\[
\mathrm{Cov}\big(f(\mathbf{X}), g(\mathbf{X})\big) \ge 0.
\]

The following corollary proves that, provided the function $h$ is monotone
in each of its arguments, the random variables $X_1$ and $X_2$ cannot be positively
correlated, which as described above is advantageous for variance reduction.

Corollary. If $h$ denotes a function that is monotone in each of its arguments, then for a set
$U_1, U_2, \ldots, U_m$ of independent uniform random numbers we have
\[
\mathrm{Cov}\big( h(U_1, U_2, \ldots, U_m),\; h(1 - U_1, 1 - U_2, \ldots, 1 - U_m) \big) \le 0. \tag{14.1}
\]

Proof.
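Although the proof is left blank here, the corollary is easy to illustrate empirically. The monotone function $h(u) = u^2$ below is an arbitrary choice; for it, the exact covariance can be computed by hand to be $-7/90 \approx -0.078$.

```python
import random

# Empirical illustration of the corollary for the (increasing, hence
# monotone) function h(u) = u**2: the sample covariance between h(U)
# and h(1 - U) should be negative, close to the exact value -7/90.
random.seed(4)
n = 100_000
us = [random.random() for _ in range(n)]
xs = [u ** 2 for u in us]
ys = [(1 - u) ** 2 for u in us]

mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
print(cov)   # negative, near -7/90 ≈ -0.0778
```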

It should be emphasised that we have shown that if the random variables are
\[
X_1 = h(U_1, U_2, \ldots, U_m) \quad \text{and} \quad X_2 = h(1 - U_1, 1 - U_2, \ldots, 1 - U_m)
\]
and $h$ is a monotone function, then (14.1) yields that $\mathrm{Cov}(X_1, X_2) \le 0$ holds, namely
that we have a reduction in variance as desired. A natural question to ask here is how
large this reduction in variance is.

For this purpose, we now formally compare the variance of the estimator between
two independent and two antithetic variables. Let $X_1$ and $X_2$ be an antithetic
pair of random variables as defined above. Let $Y_1$ and $Y_2$ be independent with the same
distribution as the $X_i$'s. Suppose further that the aforementioned random variables all
have the same variance $\sigma^2$.

Recall (from Chapter 8) that the correlation coefficient of two random variables $X$
and $Y$, denoted here by $\rho_{XY}$, is defined as
\[
\rho_{XY} = \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X) \cdot \mathrm{Var}(Y)}}. \tag{14.2}
\]
Upon making use of expressions from Chapter 8, notice that
\begin{align*}
B = \mathrm{Var}\left( \frac{X_1 + X_2}{2} \right)
&= \mathrm{Var}\left( \frac{X_1}{2} \right) + \mathrm{Var}\left( \frac{X_2}{2} \right) + 2\,\mathrm{Cov}\left( \frac{X_1}{2}, \frac{X_2}{2} \right) \\
&= \frac{1}{4} \big[ \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + 2\,\mathrm{Cov}(X_1, X_2) \big] \\
&= \frac{1}{4} \big[ 2\sigma^2 + 2\,\mathrm{Cov}(X_1, X_2) \big] \\
&= \frac{\sigma^2 + \rho_{X_1X_2}\sigma^2}{2} = \frac{\sigma^2}{2}\big(1 + \rho_{X_1X_2}\big),
\end{align*}
where the penultimate equality follows by rearranging (14.2) and upon recalling that
the random variables $X_1$ and $X_2$ have the same variance $\sigma^2$ by assumption.

In a similar fashion, when we have the independent random variables $Y_1$ and $Y_2$, we
deduce by independence that
\[
C = \mathrm{Var}\left( \frac{Y_1 + Y_2}{2} \right) = \frac{1}{4} \big[ \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) \big] = \frac{\sigma^2}{2}.
\]
In consequence, we deduce that the relative reduction in variance from making use of a pair of
antithetic variables is
\[
\frac{C - B}{C} = -\rho_{X_1X_2}.
\]
In particular, this demonstrates that one reduces variance if $\rho_{X_1X_2} < 0$ holds, i.e. if the
random variables $X_1$ and $X_2$ are negatively correlated. Further, we observe that the variance
reduction increases as the negative correlation between $X_1$ and $X_2$ becomes stronger.
The following example shows how one can make use of antithetic variates for
variance reduction when simulating the reliability function. This is one of the most
widely used functions in both applied data analysis and reliability engineering, as it gives
the probability of an item or some component operating for a certain amount of time
without failure.

Example. (Variance reduction for the reliability function)



Recall that our input variables have to date been uniform and we have made use of
the negative correlation between the uniform input $U$ and $1 - U$ to reduce variance. In
some scenarios, the relevant output of a simulation study is some function of the input
variables $Y_1, Y_2, \ldots, Y_m$. In other words, sometimes the relevant output is
\[
X = h(Y_1, Y_2, \ldots, Y_m),
\]
where $h$ once more denotes some function. Similarly, the approach we take is to generate
two random variables $X_1$ and $X_2$ that estimate the relevant output $X$ by making use of the
estimator $(X_1 + X_2)/2$. Further, we simultaneously reduce the variance of this estimator
by making use of underlying antithetic variables.

Let us suppose that the input variable $Y_i$ has corresponding cumulative distribution
function $F_i$ for each $i = 1, 2, \ldots, m$. Note that if we generate the input variables
$Y_1, Y_2, \ldots, Y_m$ using the inverse transform method, then
\[
X = h\big( F_1^{-1}(U_1), F_2^{-1}(U_2), \ldots, F_m^{-1}(U_m) \big).
\]
Recall that each distribution function $F_i$ is a monotonically increasing (i.e. a nondecreasing)
function and, in consequence, it follows that its inverse $F_i^{-1}$ is also monotonically
increasing. Hence, if $h$ is a monotone function in each of its arguments, then
\[
h\big( F_1^{-1}(U_1), F_2^{-1}(U_2), \ldots, F_m^{-1}(U_m) \big)
\]
is a monotone function of the underlying $U_i$'s. In particular, if we use $U_1, U_2, \ldots, U_m$ to
generate $X_1$ and $1 - U_1, 1 - U_2, \ldots, 1 - U_m$ to generate $X_2$, then by using the antithetic
variate method, the variance of the estimator $(X_1 + X_2)/2$ is reduced when compared
to simply generating a new set of random numbers for $X_2$.
The following example shows what kind of improvement could be gained when
using antithetic variables. We will once more compare the variance reduction via this
approach against simply using two independent random numbers.

Example. (Using antithetic variables for definite integration)
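As an illustration of the kind of comparison made in such an example (the omitted worked example itself may use a different integrand), the sketch below estimates $\int_0^1 e^x \, dx = e - 1$ with $n$ antithetic pairs versus $n$ pairs built from independent uniforms and compares the two sample variances.

```python
import math
import random

# Estimate the integral of e^x over (0, 1), i.e. theta = e - 1, using
# n antithetic pairs versus n pairs built from independent uniforms.
# The integrand e^x (monotone, so the corollary applies) is an
# illustrative choice.
random.seed(5)
n = 100_000

indep = [(math.exp(random.random()) + math.exp(random.random())) / 2
         for _ in range(n)]
anti = []
for _ in range(n):
    u = random.random()
    anti.append((math.exp(u) + math.exp(1 - u)) / 2)

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

print(sum(anti) / n)                           # close to e - 1 ≈ 1.7183
print(sample_var(anti) / sample_var(indep))    # far below 1
```

For this integrand the exact variance ratio works out to about 0.03, so the antithetic pairs remove roughly 97% of the pair estimator's variance while using only half as many random numbers.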

The following example demonstrates how one can use antithetic variables to estimate
the value of some quantity. In particular, we outline how we may estimate the
famous mathematical constant $e$ (which was introduced in Chapter 8), which is defined
by $e = \lim_{n \to \infty} (1 + 1/n)^n$ and takes the approximate value 2.7183.

Example. (Variance reduction while estimating e)
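One well-known construction for estimating $e$ by simulation, not necessarily the one used in the omitted example, relies on the fact that if $N = \min\{n : U_1 + \cdots + U_n > 1\}$ for independent uniform random numbers, then $E[N] = e$.

```python
import random

# If N = min{n : U_1 + ... + U_n > 1} for independent uniform random
# numbers, then E[N] = e, so averaging many copies of N estimates e.
random.seed(6)

def draws_to_exceed_one() -> int:
    running, count = 0.0, 0
    while running <= 1.0:
        running += random.random()
        count += 1
    return count

trials = 200_000
estimate = sum(draws_to_exceed_one() for _ in range(trials)) / trials
print(estimate)   # close to 2.7183
```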

To this point we have reduced variance by generating antithetic variables using the
relation between the uniform random numbers $U$ and $1 - U$. When working with normally
distributed random variables, we can apply similar antithetic ideas in order to reduce
variance. For this purpose, let us suppose that we are working with normal random
variables with mean $\mu$ and variance $\sigma^2$. Suppose we have generated such a random
variable $Y$ and then consider the variable $Y' = 2\mu - Y$. Upon making use of several
expressions from Chapter 8, we notice that
\[
E[Y'] = E[2\mu - Y] = 2\mu - E[Y] = \mu
\]
and
\[
\mathrm{Var}(Y') = \mathrm{Var}(2\mu - Y) = \mathrm{Var}(Y) = \sigma^2,
\]
which shows that $Y$ and $Y'$ are both normal random variables with the same mean
and variance. Further, observe that
\begin{align*}
\mathrm{Cov}(Y, Y') &= \mathrm{Cov}(Y, 2\mu - Y) \\
&= \frac{1}{2} \big[ \mathrm{Var}(Y + 2\mu - Y) - \mathrm{Var}(Y) - \mathrm{Var}(Y') \big] \\
&= \frac{1}{2} \big[ \mathrm{Var}(2\mu) - 2\sigma^2 \big] = -\sigma^2,
\end{align*}
which shows that $Y$ and $Y'$ are negatively correlated and suggests that utilising such
random variables will yield the desired variance reduction.
In particular, suppose we are interested in using simulation to compute
\[
E\big[ h(Y_1, Y_2, \ldots, Y_m) \big],
\]
where the $Y_i$'s are independent normal random variables with corresponding means $\mu_i$
for $i = 1, 2, \ldots, m$ and $h$ denotes some function. Recall inequality (14.1) and note that
the result was proven using not the density of the uniform random variables but rather the
fact that they are independent and identically distributed. In light of this, it turns out that now if $h$ is
once more a monotone function in each of its coordinates, then (14.1) holds upon replacing $U_i$
and $1 - U_i$ by $Y_i$ and $2\mu_i - Y_i$ for each $i = 1, 2, \ldots, m$, respectively.

Being a little more precise, in this setting the antithetic approach is to generate
$m$ normal random variables $Y_1, Y_2, \ldots, Y_m$ with corresponding means $\mu_i$ for each $i$ to
compute $h(Y_1, Y_2, \ldots, Y_m)$, before using the corresponding antithetic variables $2\mu_i - Y_i$
to compute the next simulated value of $h$. In particular, if $h$ is monotone, we obtain that
\[
\mathrm{Cov}\big( h(Y_1, Y_2, \ldots, Y_m),\; h(2\mu_1 - Y_1, 2\mu_2 - Y_2, \ldots, 2\mu_m - Y_m) \big) \le 0
\]
holds, showing that we obtain a reduction in variance when compared with simply generating
a second set of $m$ normal random variables.
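The normal antithetic pairing can be sketched as follows; the monotone function $h(y) = e^y$ and the parameters $\mu = 0$, $\sigma = 1$ are illustrative assumptions.

```python
import math
import random

# Antithetic normals: pair Y ~ N(mu, 1) with Y' = 2*mu - Y and apply
# the monotone function h(y) = exp(y).  Here mu = 0; all choices are
# illustrative.  The paired estimator's variance is compared with that
# of a pair built from two independent normals.
random.seed(7)
mu, n = 0.0, 50_000

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

anti, indep = [], []
for _ in range(n):
    y = random.gauss(mu, 1.0)
    anti.append((math.exp(y) + math.exp(2 * mu - y)) / 2)
    indep.append((math.exp(random.gauss(mu, 1.0)) +
                  math.exp(random.gauss(mu, 1.0))) / 2)

ratio = sample_var(anti) / sample_var(indep)
print(ratio)   # below 1: the antithetic pair has smaller variance
```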

14.2 The Use of Control Variates


Suppose again that we are interested in estimating the value of $\theta = E[X]$, where $X$ is
the output of a simulation. Further, suppose that for some other output variable $Y$, the
expected value of $Y$ is known. Denote the known expected value of this other output
variable by $E[Y] = \mu_Y$.

Notice that for any constant $c$, the quantity
\[
X + c(Y - \mu_Y),
\]
known as the controlled estimator, is an unbiased estimator of $\theta$ since
\[
E\big[ X + c(Y - \mu_Y) \big] = E[X] + c\big( E[Y] - \mu_Y \big) = E[X] = \theta,
\]
which follows upon making use of several algebraic expressions for expectation (from
Chapter 8). Further, upon using similar expressions for variance (from Chapter 8), we
observe that
\begin{align*}
\mathrm{Var}\big( X + c(Y - \mu_Y) \big) &= \mathrm{Var}(X + cY) \\
&= \mathrm{Var}(X) + \mathrm{Var}(cY) + 2\,\mathrm{Cov}(X, cY) \\
&= \mathrm{Var}(X) + c^2\,\mathrm{Var}(Y) + 2c\,\mathrm{Cov}(X, Y).
\end{align*}
Motivated by the task of determining the best value of $c$, denoted here by $c^*$, that
minimises this variance, we use standard techniques from calculus to find that
\[
c^* = -\frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(Y)}
\]
and, in consequence, for this value the variance of the controlled estimator is
\[
\mathrm{Var}\big( X + c^*(Y - \mu_Y) \big) = \mathrm{Var}(X) - \frac{\big[ \mathrm{Cov}(X, Y) \big]^2}{\mathrm{Var}(Y)}. \tag{14.3}
\]

Recall that the quantity $Y$ is by assumption an output variable with already known
expected value $\mu_Y$. In light of this, the quantity $Y$ is called a control variate for the
simulation estimator $X$, where we have intuitively "assumed some control" over this
output variable $Y$. In order to reduce variance we want $X$ and $Y$ to be either highly
positively or highly negatively correlated.

Upon dividing (14.3) by $\mathrm{Var}(X)$, we yield that
\[
\frac{\mathrm{Var}\big( X + c^*(Y - \mu_Y) \big)}{\mathrm{Var}(X)} = 1 - \mathrm{Corr}^2(X, Y) = 1 - \rho_{XY}^2,
\]
where recall from (14.2) that $\rho_{XY} = \mathrm{Corr}(X, Y)$ denotes the correlation between the
outputs $X$ and $Y$. Further, in light of this equality, we deduce that the variance reduction
obtained using the control variate $Y$ is a percentage reduction of $100\,\rho_{XY}^2$.
It should be emphasised that the quantities $\mathrm{Cov}(X, Y)$, $\mathrm{Var}(X)$ and $\mathrm{Var}(Y)$ would
not generally be known in advance. Hence, we must once more estimate
their values using the simulated data. For this purpose, let us suppose that $n$ simulation
runs have been performed, where we have obtained the outputs $X_i$ and $Y_i$ for each
$i = 1, 2, \ldots, n$. Using this output data, we then calculate the corresponding sample
means $\bar{X}$ and $\bar{Y}$ (from Chapter 13). Then (from Chapter 13) we make use of the estimators
\begin{align*}
\widehat{\mathrm{Cov}}(X, Y) &= \frac{1}{n-1} \sum_{i=1}^{n} \big( X_i - \bar{X} \big)\big( Y_i - \bar{Y} \big), \\
\widehat{\mathrm{Var}}(X) &= \frac{1}{n-1} \sum_{i=1}^{n} \big( X_i - \bar{X} \big)^2, \\
\widehat{\mathrm{Var}}(Y) &= \frac{1}{n-1} \sum_{i=1}^{n} \big( Y_i - \bar{Y} \big)^2
\end{align*}
to approximate $c^*$, where note that $\widehat{\mathrm{Var}}(\cdot)$ denotes the sample variance. Let us denote
the approximate value of $c^*$ by $\hat{c}^*$, where
\[
\hat{c}^* = -\frac{\sum_{i=1}^{n} \big( X_i - \bar{X} \big)\big( Y_i - \bar{Y} \big)}{\sum_{i=1}^{n} \big( Y_i - \bar{Y} \big)^2}.
\]

Further, following a similar argument to that presented in Chapter 13, the variance of
the controlled estimator satisfies
\begin{align*}
\mathrm{Var}\big( \bar{X} + c^*(\bar{Y} - \mu_Y) \big) &= \mathrm{Var}\left( \frac{1}{n} \sum_{i=1}^{n} \big[ X_i + c^*(Y_i - \mu_Y) \big] \right) \\
&= \frac{1}{n^2} \mathrm{Var}\left( \sum_{i=1}^{n} \big[ X_i + c^*(Y_i - \mu_Y) \big] \right) \\
&= \frac{1}{n^2} \cdot n \cdot \mathrm{Var}\big( X + c^*(Y - \mu_Y) \big) \\
&= \frac{1}{n} \left( \mathrm{Var}(X) - \frac{\big[ \mathrm{Cov}(X, Y) \big]^2}{\mathrm{Var}(Y)} \right),
\end{align*}
where the final equality follows by (14.3). In particular, this shows that the variance of
the controlled estimator can be estimated using the estimator $\widehat{\mathrm{Cov}}(X, Y)$ for the covariance
and the sample variance estimators $\widehat{\mathrm{Var}}(X)$ and $\widehat{\mathrm{Var}}(Y)$, respectively.
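A sketch of the control-variate method with $\hat{c}^*$ estimated from data: we estimate $\theta = E[e^U]$ (true value $e - 1 \approx 1.7183$) using the control variate $Y = U$, whose mean $\mu_Y = 1/2$ is known; the integrand and the number of runs are illustrative choices.

```python
import math
import random

# Control variate with estimated c*: estimate theta = E[e^U] = e - 1
# using the control Y = U with known mean mu_Y = 1/2.
random.seed(8)
n, mu_y = 10_000, 0.5

us = [random.random() for _ in range(n)]
xs = [math.exp(u) for u in us]

x_bar = sum(xs) / n
y_bar = sum(us) / n
cov_xy = sum((x - x_bar) * (u - y_bar) for x, u in zip(xs, us)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
var_y = sum((u - y_bar) ** 2 for u in us) / (n - 1)

c_star = -cov_xy / var_y                     # estimated c*
controlled = x_bar + c_star * (y_bar - mu_y)
rho_sq = cov_xy ** 2 / (var_x * var_y)       # fraction of variance removed

print(controlled)   # close to e - 1 ≈ 1.7183
print(rho_sq)       # near 1 here: a large variance reduction
```

Here $X = e^U$ and $Y = U$ are very strongly positively correlated, so $\rho_{XY}^2$ is close to 1 and the control removes almost all of the raw estimator's variance.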
The following example makes use of the previously discussed reliability function to
demonstrate how control variates can be used to reduce variance.

Example. (Reducing variance for the reliability function)



The next example considers a queuing system where customers arrive in accordance
with a nonhomogeneous Poisson process with intensity function $\lambda(s)$, where $s \ge 0$.

Example. (Reducing variance in a queuing based setting)

In our next example, we consider how control variates may be used to reduce
variance when we are interested in estimating the value of some definite integral. This
integral was introduced previously when we demonstrated how antithetic variables can
lead to significant variance reduction.

Example. (Control variates for definite integration)

The following example introduces a list recording problem. Suppose for this purpose
that we are given a (finite) set of n elements, that are arranged in an ordered list. A
request is made at each unit time to retrieve one of these elements with some probability,
where the selected element is then put back into the list but not necessarily in the same
position. Note that when we place the selected element back into the list we normally
would make use of some reordering rule (such as it is interchanged with its preceding
element). The problem starts with an initial ordering (where any of the possible n!
orderings of the n elements are equally likely), before we determine the expected sum
of the positions of the first N elements requested. The following example demonstrates
how we may use simulation to accomplish this task efficiently.

Example. (A list recording problem)

Recall that for any constant $c$ the controlled estimator $X + c(Y - \mu_Y)$ is an unbiased
estimator of $\theta = E[X]$, where the expected value of $Y$ is assumed known. It is perhaps
unsurprising that we could use more than a single variable as a control if needed. If,
for example, a simulation study results in output variables $Y_i$ for $i = 1, 2, \ldots, k$ and the
values $E[Y_i] = \mu_i$ are known for each $i$, then for any constants $c_i$ we can use
\[
X + \sum_{i=1}^{k} c_i (Y_i - \mu_i)
\]
as an unbiased estimator of $E[X]$.


The final example demonstrates how control variates can be used to reduce variance
when using simulation to estimate a player’s expected winnings per round when playing
blackjack. This is a card game that is often played with the dealer shuffling multiple
decks of cards, putting aside used cards and finally reshuffling when the number of
remaining cards is below some limit.

Example. (Blackjack)

To conclude this chapter, we make a number of remarks regarding both control variates
and antithetic variables.

Remarks.

1. One particularly valuable way of interpreting the control variates approach is that
it combines unbiased estimators of $\theta$. In particular, suppose that $X$ and $W$ are
determined by the simulation with the property that $E[X] = E[W] = \theta$. We may
then consider any estimator of the form
\[
\alpha X + (1 - \alpha)W,
\]
which is unbiased for all $\alpha$. The best such estimator, obtained by
choosing the value of $\alpha$ that minimises the variance, denoted here by $\alpha^*$, is given by
\[
\alpha^* = \frac{\mathrm{Var}(W) - \mathrm{Cov}(X, W)}{\mathrm{Var}(X) + \mathrm{Var}(W) - 2\,\mathrm{Cov}(X, W)}, \tag{14.4}
\]
which follows using expressions for variance (from Chapter 8) and standard
techniques from calculus.

Suppose once more that for some other output variable $Y$ the expected value
$E[Y] = \mu_Y$ is known. Note that we then have two unbiased estimators, namely $X$ and
$X + Y - \mu_Y$. Further, these can be combined to yield the combined estimator
\[
\alpha X + (1 - \alpha)(X + Y - \mu_Y) = X + (1 - \alpha)(Y - \mu_Y),
\]
which is exactly the controlled estimator with $c = 1 - \alpha$.

In the other direction, if we have the raw estimator $X$ and make use of the control
variate $X - W$, which is known to have mean 0, then we obtain an estimator of
the form
\[
X + c(X - W) = (1 + c)X - cW,
\]
which is exactly the combined estimator with $\alpha = 1 + c$. In particular, this argument
demonstrates an equivalence between combined unbiased estimators and
control variates.

2. The above remark suggests that the antithetic variable approach can be thought
of as a special case of combined unbiased estimators and thus of control variates.
In particular, if $E[X] = \theta$, where $X = h(U_1, U_2, \ldots, U_n)$, then $E[W] = \theta$, where
$W = h(1 - U_1, 1 - U_2, \ldots, 1 - U_n)$. The estimators $X$ and $W$ are both unbiased and
we combine them to yield
\[
\alpha X + (1 - \alpha)W.
\]
Since $X$ and $W$ have the same distribution, we see that $\mathrm{Var}(X) = \mathrm{Var}(W)$. It follows
by (14.4) that the best value of $\alpha$ is $\alpha = 1/2$ and as such
\[
\alpha X + (1 - \alpha)W = \frac{X + W}{2},
\]
i.e. the combined unbiased estimator becomes the antithetic variable estimator.

3. The above remark further indicates why it is not usually possible to effectively
combine antithetic variables with a control variate. In particular, if a control
variate $Y$ has large positive correlation with $X = h(U_1, U_2, \ldots, U_n)$, then $Y$ likely
has large negative correlation with $W = h(1 - U_1, 1 - U_2, \ldots, 1 - U_n)$. It follows
that $Y$ is unlikely to have a large correlation with the antithetic variate
\[
\frac{1}{2}\big[ h(U_1, U_2, \ldots, U_n) + h(1 - U_1, 1 - U_2, \ldots, 1 - U_n) \big].
\]

14.3 Exercises for Self-Study


1. Suppose we want to estimate $\theta$, where
\[
\theta = \int_0^1 e^{x^2} \, dx.
\]

a) Show that
\[
\frac{e^{U^2}\big(1 + e^{1-2U}\big)}{2} \tag{14.5}
\]
is an unbiased estimator of $\theta$, where $U$ denotes a random number.

b) Show that using the unbiased estimator (14.5) is better than generating two
random numbers $U_1$ and $U_2$ and using the estimator $\big(e^{U_1^2} + e^{U_2^2}\big)/2$.

2. Explain how antithetic variables can be used in obtaining a simulation estimate of the
quantity
\[
\theta = \int_0^1 \int_0^1 e^{(x+y)^2} \, dy \, dx.
\]
Is it clear in this case that using antithetic variables is more efficient than generating
a new pair of random numbers?

3. Let X_i, where i = 1, 2, . . . , 5, be independent exponential random variables,
   each with mean 1. Consider the quantity θ defined by

       θ = P{ Σ_{i=1}^5 i X_i ≥ 21.6 }.

a) Explain how we can use simulation to estimate ✓ .

b) Give the antithetic variable estimator.

c) Is the use of antithetic variables efficient in this case?
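One possible sketch covering parts a)-c), using the inverse transform X_i = -ln(U_i). Since the indicator of {Σ i X_i ≥ 21.6} is monotone (decreasing) in each U_i, antithetic variables are guaranteed not to hurt here, which answers part c). The comparison value 0.168 in the comment is the analytic hypoexponential tail probability, quoted only for orientation.

```python
import math, random

random.seed(3)

def indicator(us):
    # part a): X_i = -ln(U_i) is Exponential(1) by the inverse transform
    # method; return 1 exactly when sum_{i=1}^5 i * X_i >= 21.6
    s = sum(i * -math.log(u) for i, u in enumerate(us, start=1))
    return 1.0 if s >= 21.6 else 0.0

# part b): the antithetic estimator averages the indicator at U and 1 - U
n = 20_000
total = 0.0
for _ in range(n):
    us = [random.random() for _ in range(5)]
    total += (indicator(us) + indicator([1 - u for u in us])) / 2

theta_hat = total / n
print(round(theta_hat, 3))  # the exact tail probability is about 0.168
```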


4. Show that if X and Y have the same distribution then Var((X + Y)/2) ≤ Var(X),
   and conclude that the use of antithetic variables can never increase variance. It
   should be noted that this does not imply that using antithetic variables is always
   as efficient as generating an independent set of random numbers.
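A hint for one possible route, sketched here for orientation: by the Cauchy-Schwarz inequality, Cov(X, Y) ≤ √(Var(X) Var(Y)), and since X and Y have the same distribution the right-hand side equals Var(X). Hence

```latex
\operatorname{Var}\!\left(\frac{X+Y}{2}\right)
  = \frac{\operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y)}{4}
  = \frac{\operatorname{Var}(X) + \operatorname{Cov}(X,Y)}{2}
  \le \operatorname{Var}(X).
```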

5. If Z is a standard normal random variable, design a simulation study to estimate
   the value

       θ = E[Z^3 e^Z].

   Perform this simulation to obtain an interval of length no greater than 0.1 that
   you can assert with 95% confidence contains the value of θ.
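One possible sketch of such a study. To keep the run quick it stops at interval length 0.5 rather than the 0.1 the exercise asks for; the same loop with target_len = 0.1 needs roughly 25 times as many runs, since Var(Z^3 e^Z) is about 3.6 × 10^3. The exact value, for comparison only, is E[Z^3 e^Z] = 4√e ≈ 6.595.

```python
import math, random

random.seed(5)

def estimate(target_len, batch=10_000, min_n=50_000):
    # keep sampling Z^3 e^Z until the 95% CI for its mean
    # has total length at most target_len
    n, s1, s2 = 0, 0.0, 0.0
    while True:
        for _ in range(batch):
            z = random.gauss(0, 1)
            x = z ** 3 * math.exp(z)
            s1 += x
            s2 += x * x
        n += batch
        mean = s1 / n
        svar = (s2 - n * mean * mean) / (n - 1)  # sample variance
        half = 1.96 * math.sqrt(svar / n)        # CI half-width
        if n >= min_n and 2 * half <= target_len:
            return n, mean, half

n, mean, half = estimate(0.5)
print(n, round(mean, 2))  # mean should be near 4*sqrt(e) ≈ 6.595
```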

6. Show that Var(αX + (1 - α)W) is minimised when α equals the value given
   by (14.4), and determine the resulting variance.
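A hint, assuming (14.4) is the usual expression α* = (Var(W) - Cov(X, W)) / (Var(X) + Var(W) - 2 Cov(X, W)) (consistent with Remark 2, where Var(X) = Var(W) gives α* = 1/2): expand the variance as a quadratic in α and set its derivative to zero,

```latex
g(\alpha) = \alpha^2 \operatorname{Var}(X) + (1-\alpha)^2 \operatorname{Var}(W)
            + 2\alpha(1-\alpha)\operatorname{Cov}(X,W), \qquad
g'(\alpha^*) = 0,
```

then substitute α* back into g to obtain the minimal variance (Var(X)Var(W) - Cov(X,W)^2) / (Var(X) + Var(W) - 2 Cov(X,W)).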

7. Recall from Exercise 1 that we wish to estimate θ, where

       θ = ∫_0^1 e^{x^2} dx.

a) Explain how control variables may be used to estimate θ.

b) Perform 100 simulation runs, using the control given in a), in order to
   estimate firstly c* and then the variance of the estimator.

c) Using the same data as in b), determine the variance of the antithetic variable
estimator.

d) Which of the two types of variance reduction worked better in this scenario?
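One possible control-variate sketch for parts a) and b), taking Y = U with known mean 1/2 as the control — a natural choice, though not necessarily the one intended. It runs the 100 simulations the exercise asks for, estimates c* = -Cov(X, Y)/Var(Y) from the data, and compares the controlled variance with the raw one.

```python
import math, random

random.seed(7)

n = 100  # the exercise asks for 100 simulation runs
us = [random.random() for _ in range(n)]
xs = [math.exp(u * u) for u in us]  # raw estimator X = e^{U^2}
ys = us                             # control variate Y = U, known E[Y] = 1/2

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
vx = sum((x - mx) ** 2 for x in xs) / (n - 1)
vy = sum((y - my) ** 2 for y in ys) / (n - 1)

c_star = -cov / vy  # estimated optimal control coefficient c*
controlled = [x + c_star * (y - 0.5) for x, y in zip(xs, ys)]
mc = sum(controlled) / n
vc = sum((v - mc) ** 2 for v in controlled) / (n - 1)

print(round(mc, 3))  # estimate of theta ≈ 1.463
print(vc < vx)       # controlling on Y removed much of the variance
```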

Chapter 15

A Brief Introduction to Markov Chains

For a brief introduction to Markov chains you should read Sections 11.1, 11.2, 11.3
and 11.4 from [6]. These sections are highlighted and can be found on Moodle. It is
relatively easy reading. You may additionally look at Sections 11.5 and 11.6 from the
same textbook, which can be found on Moodle.

Chapter 16

Markov Chain Monte Carlo methods

For an overview of the Markov chain Monte Carlo method you should read Sections 12.1
and 12.2 from [6]. These sections are highlighted and can be found on Moodle.

Bibliography

[1] Ravindra K Ahuja, Thomas L Magnanti, James B Orlin, and MR Reddy. “Applica-
tions of network optimization”. In: Handbooks in Operations Research and Man-
agement Science 7 (1995), pp. 1–83.
[2] Martin Anthony and Michele Harvey. Linear algebra: concepts and methods. Cam-
bridge University Press, 2012.
[3] Robert G Bartle and Donald R Sherbert. Introduction to real analysis. 4th ed. John
Wiley & Sons, Inc., 2011.
[4] Richard Bellman. “On a routing problem”. In: Quarterly of applied mathematics
16.1 (1958), pp. 87–90.
[5] Dimitri P Bertsekas. Nonlinear Programming. 3rd ed. Athena Scientific, 2016.
[6] Joseph K Blitzstein and Jessica Hwang. Introduction to probability. CRC Press,
Taylor & Francis Group, 2015.
[7] George Dantzig, Ray Fulkerson, and Selmer Johnson. “Solution of a large-scale
traveling-salesman problem”. In: Journal of the operations research society of Amer-
ica 2.4 (1954), pp. 393–410.
[8] George B Dantzig. “Maximization of a linear function of variables subject to linear
inequalities”. In: Activity analysis of production and allocation 13 (1951), pp. 339–
347.
[9] George B Dantzig. “Origins of the simplex method”. In: A history of scientific com-
puting. 1990, pp. 141–151.
[10] George B Dantzig. “Reminiscences about the origins of linear programming”. In:
Mathematical Programming The State of the Art. Springer, 1983, pp. 78–86.
[11] Lester R Ford and Delbert R Fulkerson. Flows in Networks. Princeton University
Press, 1962.
[12] Lester R Ford Jr. Network flow theory. Tech. rep. Rand Corp Santa Monica Ca,
1956.

[13] Michael R Garey and David S Johnson. Computers and Intractability: A Guide to
the Theory of NP-Completeness. Vol. 174. Freeman San Francisco, 1979.

[14] Bezalel Gavish and Stephen C Graves. “The travelling salesman problem and re-
lated problems”. In: (1978).

[15] Arthur S Goldberger. Econometric Theory. New York: John Wiley & Sons, Inc.,
1964.

[16] Narendra Karmarkar. “A new polynomial-time algorithm for linear programming”.


In: Proceedings of the sixteenth annual ACM symposium on Theory of computing.
1984, pp. 302–311.

[17] Leonid G Khachiyan. “A polynomial algorithm in linear programming”. In: Dok-


lady Akademii Nauk. Vol. 244. 5. Russian Academy of Sciences. 1979, pp. 1093–
1096.

[18] Joseph Lee Rodgers and W Alan Nicewander. “Thirteen ways to look at the cor-
relation coefficient”. In: The American Statistician 42.1 (1988), pp. 59–66.

[19] Derrick H Lehmer. “Mathematical methods in large-scale computing units”. In:


Annals of the computation laboratory of Harvard University 26 (1951), pp. 141–
146.

[20] Clair E Miller, Albert W Tucker, and Richard A Zemlin. “Integer programming
formulation of traveling salesman problems”. In: Journal of the ACM (JACM) 7.4
(1960), pp. 326–329.

[21] Sheldon M Ross. Simulation. 5th ed. Academic Press, 2013.

[22] John Von Neumann. “Various techniques used in connection with random digits”.
In: National Bureau of Standards Applied Mathematics Series 12 (1951), pp. 36–
38.
