26 Making Decisions

This document gives an overview of making decisions with probabilistic machine learning models. Probabilistic models provide predictions conditional on actions, and decision theory chooses the action that minimizes expected loss. When the relevant probabilities are unknown, experimental design and learning-by-doing approaches estimate returns while acting, balancing exploration of uncertain options against exploitation of options believed to yield high returns.


Probabilistic Machine Learning

Lecture 26
Making Decisions

Philipp Hennig
20 July 2021

Faculty of Science
Department of Computer Science
Chair for the Methods of Machine Learning
 #   date    content                         Ex
 1   20.04.  Introduction                     1
 2   21.04.  Reasoning under Uncertainty
 3   27.04.  Continuous Variables             2
 4   28.04.  Monte Carlo
 5   04.05.  Markov Chain Monte Carlo         3
 6   05.05.  Gaussian Distributions
 7   11.05.  Parametric Regression            4
 8   12.05.  Learning Representations
 9   18.05.  Gaussian Processes               5
10   19.05.  Understanding Kernels
11   26.05.  Gauss-Markov Models
12   25.05.  An Example for GP Regression     6
13   08.06.  GP Classification                7
14   09.06.  Generalized Linear Models
15   15.06.  Exponential Families             8
16   16.06.  Graphical Models
17   22.06.  Factor Graphs                    9
18   23.06.  The Sum-Product Algorithm
19   29.06.  Example: Modelling Topics       10
20   30.06.  Mixture Models
21   06.07.  EM                              11
22   07.07.  Variational Inference
23   13.07.  Tuning Inference Algorithms     12
24   14.07.  Kernel Topic Models
25   20.07.  Outlook
26   21.07.  Revision

The Toolbox

Framework:
  ∫ p(x₁, x₂) dx₂ = p(x₁)        p(x₁, x₂) = p(x₁ | x₂) p(x₂)        p(x | y) = p(y | x) p(x) / p(y)
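As a quick numerical illustration of these three rules (a minimal numpy sketch, not part of the slides; the joint distribution below is made up):

import numpy as np

# sum rule, product rule and Bayes' theorem on a small discrete joint p(x1, x2)
p_joint = np.array([[0.10, 0.30],   # rows: x1 in {0, 1}
                    [0.20, 0.40]])  # cols: x2 in {0, 1}

p_x1 = p_joint.sum(axis=1)                      # sum rule: marginalize x2
p_x2 = p_joint.sum(axis=0)                      # marginal of x2
p_x1_given_x2 = p_joint / p_x2                  # columns are p(x1 | x2)
assert np.allclose(p_x1_given_x2 * p_x2, p_joint)   # product rule

p_x2_given_x1 = (p_x1_given_x2 * p_x2) / p_x1[:, None]   # Bayes' theorem
assert np.allclose(p_x2_given_x1.sum(axis=1), 1.0)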

Modelling:
▶ graphical models
▶ Gaussian distributions
▶ (deep) learnt representations
▶ Kernels
▶ Markov Chains
▶ Exponential Families / Conjugate Priors
▶ Factor Graphs & Message Passing

Computation:
▶ Monte Carlo
▶ Linear algebra / Gaussian inference
▶ maximum likelihood / MAP
▶ Laplace approximations
▶ EM / variational approximations

So you’ve got yourself a posterior …now what?
Taking a decision means conditioning on a variable you control
[Figure: body mass [kg] over 2010–2013 under a sequence of actions (running, gorging, dieting, gym, veg), with predictive distributions p(w′ | run) and p(w′ | diet) for the next period.]

Decision Theory
The limit of probabilistic reasoning?

▶ probabilistic models can provide predictions p(x | a) for a variable x conditional on an action a
▶ given the choice, which value of a do you prefer?

▶ assign a loss or utility ℓ(x)


▶ choose a such that it minimizes the expected loss

  a* = arg min_a ∫ ℓ(x) p(x | a) dx
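A minimal sketch of this rule with a discrete set of actions, approximating the integral by Monte Carlo (the actions, loss, and predictive distributions below are hypothetical, loosely following the body-mass example above):

import numpy as np

rng = np.random.default_rng(0)
actions = ["run", "diet", "gym"]            # hypothetical actions
loss = lambda x: (x - 75.0) ** 2            # hypothetical loss: squared deviation from a target mass

# hypothetical predictive models p(x | a): Gaussians over next-period body mass
predict = {"run":  lambda n: rng.normal(78.0, 2.0, n),
           "diet": lambda n: rng.normal(76.0, 4.0, n),
           "gym":  lambda n: rng.normal(77.0, 1.5, n)}

# a* = arg min_a ∫ ℓ(x) p(x | a) dx, approximated by sample averages
expected_loss = {a: loss(predict[a](100_000)).mean() for a in actions}
a_star = min(expected_loss, key=expected_loss.get)
print(expected_loss, "-> choose", a_star)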

Expected Regret/utility
if you keep having to take the same decision, optimise the sum of its returns

▶ consider independent draws x_i with x_i ∼ p(x | a_i)
▶ choose all a_i = a* to minimize the accumulated loss

  L(n) = E_p[ Σ_i x_i ]

▶ but what if you don’t know p?

Motivating (Historical) Example
Experimental Design

[Figure: payout (0 to 0.8) as a function of the number of trials N, N = 10⁰ … 10³ on a log scale.]
Learning by Doing
Estimating return while taking actions

Perhaps we shouldn't rule out an option yet if the posterior over its expected return overlaps with that
of our current guess for the best option?
▶ Assume K choices.
▶ Taking choice k ∈ [1, …, K] at time i yields a binary (Bernoulli) reward/loss x_i with probability π_k ∈ [0, 1], iid.
▶ conjugate priors p(π_k) = B(π_k; a, b) = B(a, b)⁻¹ π_k^{a−1} (1 − π_k)^{b−1}
▶ posteriors from n_k tries of choice k with m_k successes:
  p(π_k | n_k, m_k) = B(π_k; a + m_k, b + (n_k − m_k))
▶ for a, b → 0, the posterior has mean and variance

  π̄_k := E_{p(π_k | n_k, m_k)}[π_k] = m_k / n_k        σ_k² := var_{p(π_k | n_k, m_k)}[π_k] = m_k (n_k − m_k) / (n_k² (n_k + 1)) = O(n_k⁻¹)

Choose the option k that maximizes π̄_k + c √(σ_k²) for some c. Which c?
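A minimal sketch of this Beta-posterior rule (not code from the lecture; the true payout probabilities, the Beta(1, 1) prior, and the constant c = 2 are assumptions):

import numpy as np

rng = np.random.default_rng(0)
K = 3
true_pi = np.array([0.45, 0.50, 0.55])   # assumed true Bernoulli payouts
a0, b0 = 1.0, 1.0                        # Beta prior parameters (assumption)
c = 2.0                                  # exploration weight (assumption)

n = np.zeros(K)   # number of pulls per option
m = np.zeros(K)   # number of successes per option

for t in range(2000):
    a, b = a0 + m, b0 + (n - m)
    mean = a / (a + b)                           # posterior mean of pi_k
    var = a * b / ((a + b) ** 2 * (a + b + 1))   # posterior variance of pi_k
    k = int(np.argmax(mean + c * np.sqrt(var)))  # optimistic choice
    x = rng.random() < true_pi[k]                # Bernoulli reward
    n[k] += 1
    m[k] += x

post_mean = (a0 + m) / (a0 + b0 + n)
print("pulls per option:", n, " posterior means:", np.round(post_mean, 3))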

Learning by Doing
Estimating return while taking actions

Perhaps we shouldn't rule out an option yet if the posterior over its expected return overlaps with that
of our current guess for the best option?
Choose the option k that maximizes π̄_k + c √(σ_k²) for some c. Which c?

▶ A large c ensures uncertain options are preferred. If we make it too large, we will only explore.
▶ A small c largely ignores uncertainty. We will only exploit.
▶ Idea: Let c grow slowly over time, at a rate less than O(n_k^{1/2}). Then the variance of chosen options will
  drop faster than c grows, so their exploration will stop unless their mean is good. But unexplored
  choices will eventually become dominant, and are thus always explored eventually.

Not just for Bernoulli variables!
posterior contraction rates are universal

Theorem (Chernoff-Hoeffding)
Let X_1, …, X_n be random variables with common range [0, 1] and such that E[X_t | X_1, …, X_{t−1}] = µ.
Let S_n = X_1 + ··· + X_n. Then for all a ≥ 0,

  p(S_n − nµ ≤ −a) ≤ e^{−2a²/n}    and    p(S_n − nµ ≥ a) ≤ e^{−2a²/n}
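A quick numerical sanity check of the bound (a sketch, not part of the slides; the Bernoulli example and the constants n, a are assumptions):

import numpy as np

rng = np.random.default_rng(0)
n, mu, a = 100, 0.5, 10.0
trials = 200_000

# X_t ~ Bernoulli(mu) as a simple example with range [0, 1]
S = (rng.random((trials, n)) < mu).sum(axis=1)

empirical = np.mean(S - n * mu >= a)
bound = np.exp(-2 * a**2 / n)
print(f"empirical tail {empirical:.4f} <= Hoeffding bound {bound:.4f}")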

The Multi-Armed Bandit Setting
Discrete-Choice Experimental Design [Auer, Cesa-Bianchi, Fischer, Machine Learning 47(2002), 235–256]

Definitions:
▶ A K-armed bandit is a collection X_{k,n} of random variables, 1 ≤ k ≤ K, n ≥ 1, where k is the arm of
  the bandit. Successive plays of arm k yield rewards X_{k,1}, X_{k,2}, …, which are independent and identically
  distributed according to an unknown p with E_p(X_{k,i}) = µ_k.
▶ A policy A chooses the next machine to play at time n, based on past plays and rewards.
▶ Let T_k(n) be the number of times machine k was played by A during the first n plays. The regret of A is

  R_A(n) = µ* · n − Σ_j µ_j · E_p[T_j(n)]    with    µ* := max_{1≤k≤K} µ_k

The Multi-Armed Bandit Setting
Discrete-Choice Experimental Design [Auer, Cesa-Bianchi, Fischer, Machine Learning 47(2002), 235–256]

Algorithm: let x̄_j be the empirical average of rewards from arm j, and n_j the number of plays of arm j within the first n plays.

procedure UCB(K)                                  ▷ Upper Confidence Bound
  play each machine once
  while true do
    play j = arg max_j ( x̄_j + √(2 log n / n_j) )
  end while
end procedure
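A minimal runnable version of this procedure (a sketch; the Bernoulli arms below are chosen to match the visualization two slides ahead, everything else is an assumption):

import numpy as np

def ucb1(pull, K, n_rounds, rng):
    """UCB (Auer et al., 2002): play each arm once, then choose optimistically."""
    counts = np.zeros(K)          # n_j: plays of arm j
    sums = np.zeros(K)            # accumulated rewards of arm j
    for j in range(K):            # play each machine once
        sums[j] += pull(j, rng); counts[j] += 1
    for n in range(K + 1, n_rounds + 1):
        means = sums / counts
        bonus = np.sqrt(2 * np.log(n) / counts)
        j = int(np.argmax(means + bonus))
        sums[j] += pull(j, rng); counts[j] += 1
    return counts

# Example: K = 3 Bernoulli arms with success probabilities 45%, 50%, 55%
p = np.array([0.45, 0.50, 0.55])
rng = np.random.default_rng(0)
pull = lambda j, rng: float(rng.random() < p[j])
counts = ucb1(pull, K=3, n_rounds=3000, rng=rng)
regret = (p.max() - p) @ counts     # expected regret given the realized play counts
print("plays per arm:", counts, " expected regret:", round(float(regret), 1))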

The Multi-Armed Bandit Setting
Discrete-Choice Experimental Design [Auer, Cesa-Bianchi, Fischer, Machine Learning 47(2002), 235–256]

Theorem (Auer, Cesa-Bianchi, Fischer)


Consider K machines (K > 1) having arbitrary reward distributions P1 , . . . , PK with support in [0, 1] and
expected values µ_i = E_P(X_i). Let ∆_i := µ* − µ_i. Then the expected regret of UCB after any number n
of plays is at most

  E_P[R_A(n)] ≤ 8 · ( Σ_{i: µ_i < µ*} log n / ∆_i ) + (1 + π²/3) · ( Σ_j ∆_j )

Nb: The sums are over K, not n. So the regret is O(K log n). UCB plays a sub-optimal arm at most
logarithmically often.

Visualization
K = 3, binary rewards

[Figure: UCB on K = 3 Bernoulli arms with success probabilities p = 45%, 50%, 55%. Left: number of plays n_k(t) of each arm versus N (up to 3,000 plays). Right: regret bound, expected regret, and sampled regret versus N on log-log axes, N = 10⁰ … 10⁴.]

Multi-Armed Bandit Algorithms
▶ apply to independent, discrete choice problems with stochastic pay-off
▶ algorithms based on upper confidence bounds incur regret bounded by O(log n)
▶ this even applies for the adversarial setting (Auer, Cesa-Bianchi, Freund, Schapire, 1995)

Unfortunately…
▶ No problem is ever discrete, finite and independent
▶ in a continuous problem, no “arm” can or should ever be played twice
▶ in many prototyping settings, early exploration is free

Continuous-Armed Bandits
example application: parameter optimization

[Figure: an objective function f over x ∈ [−5, 5].]

  p(y | x) = N(y; f_x, σ²)        x* = arg min_{x∈D} f(x) = ?        R(T) := Σ_{t=1}^{T} ( f(x_t) − f(x*) )

Continuous-Armed Bandits
example application: parameter optimization

[Figure: the same objective f over x ∈ [−5, 5], with a GP posterior over f.]

  p(y | x) = N(y; f_x, σ²),    p(f) = GP(f; µ, k)    ⇒    p_min(x* = x) = ∫ I( f(x) < f(x̃) ∀ x̃ ∈ D ) dp(f | y)
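One way to approximate p_min numerically (a brute-force sketch under assumed choices: a squared-exponential kernel, a fixed grid for D, and made-up data; not the lecture's implementation) is to draw posterior samples and count where each sample attains its minimum:

import numpy as np

def se_kernel(a, b, ell=1.0, theta=1.0):
    return theta**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

rng = np.random.default_rng(0)
X = np.array([-3.0, -1.0, 0.5, 2.0])          # observed locations (assumed)
y = np.array([0.2, -1.0, -0.3, 0.5])          # noisy observations (assumed)
sigma = 0.1
grid = np.linspace(-5, 5, 200)                # D, discretized

# GP posterior on the grid (zero prior mean)
Kxx = se_kernel(X, X) + sigma**2 * np.eye(len(X))
Kgx = se_kernel(grid, X)
L = np.linalg.cholesky(Kxx)
mu = Kgx @ np.linalg.solve(L.T, np.linalg.solve(L, y))
V = np.linalg.solve(L, Kgx.T)
cov = se_kernel(grid, grid) - V.T @ V

# Monte Carlo estimate of p_min: fraction of posterior samples whose minimum lies at x
samples = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(grid)), size=2000)
p_min = np.bincount(samples.argmin(axis=1), minlength=len(grid)) / len(samples)
print("most probable minimizer:", grid[p_min.argmax()])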

GP Upper Confidence Bound
Evaluate optimistically, where the function may be low [Srinivas, Krause, Kakade, Seeger, ICML 2009]

▶ utility under p(f | y) = GP(f; µ_{t−1}, σ²_{t−1}):

  u_t(x) = µ_{t−1}(x) − √β_t · σ_{t−1}(x)

▶ choose x_t = arg min_{x∈D} u_t(x)

Theorem (Srinivas et al., 2009)
Let δ ∈ (0, 1) and β_t = 2 log(|D| t² π² / 6δ). Running GP-UCB with β_t for a sample f ∼ GP(µ, k),

  p( R_T ≤ √( 8 T β_T γ_T / log(1 + σ⁻²) )  for all T ≥ 1 ) ≥ 1 − δ,

thus lim_{T→∞} R_T / T = 0 (“no regret”).

[Figure: GP posterior over f on x ∈ [−4, 4], with the lower confidence bound guiding where to evaluate next.]
GP Upper Confidence Bound
Evaluate optimistically, where the function may be low [Srinivas, Krause, Kakade, Seeger, ICML 2009]

▶ utility under p(f | y) = GP(f; µ_{t−1}, σ²_{t−1}):

  u_t(x) = µ_{t−1}(x) − √β_t · σ_{t−1}(x)

▶ choose x_t = arg min_{x∈D} u_t(x)

Theorem (Srinivas et al., 2009)
Assume that f ∈ H_k with ∥f∥²_k ≤ B, and that the noise is zero-mean and σ-bounded almost surely. Let
δ ∈ (0, 1) and β_t = 2B + 300 γ_t log³(t/δ). Running GP-UCB with β_t and p(f) = GP(f; 0, k),

  p( R_T ≤ √( 8 T β_T γ_T / log(1 + σ⁻²) )  for all T ≥ 1 ) ≥ 1 − δ,

thus lim_{T→∞} R_T / T = 0 (“no regret”).

[Figure: GP posterior over f on x ∈ [−4, 4], with the lower confidence bound guiding where to evaluate next.]
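A compact sketch of the GP-LCB acquisition step (assumptions: a squared-exponential kernel, a fixed candidate grid, δ = 0.1, and a made-up objective; not the authors' reference code):

import numpy as np

def se_kernel(a, b, ell=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def gp_posterior(X, y, grid, sigma=0.1):
    K = se_kernel(X, X) + sigma**2 * np.eye(len(X))
    k_star = se_kernel(grid, X)
    L = np.linalg.cholesky(K)
    mu = k_star @ np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, k_star.T)
    var = 1.0 - np.sum(v**2, axis=0)          # prior variance is 1 for this kernel
    return mu, np.clip(var, 1e-12, None)

rng = np.random.default_rng(1)
f = lambda x: np.sin(3 * x) + 0.5 * x         # unknown objective (assumed)
grid = np.linspace(-4, 4, 400)
X = np.array([0.0]); y = f(X) + 0.1 * rng.standard_normal(1)

for t in range(1, 16):
    mu, var = gp_posterior(X, y, grid)
    beta_t = 2 * np.log(len(grid) * t**2 * np.pi**2 / (6 * 0.1))   # delta = 0.1
    u = mu - np.sqrt(beta_t) * np.sqrt(var)   # lower confidence bound
    x_t = grid[np.argmin(u)]                  # evaluate where f may be low
    X = np.append(X, x_t); y = np.append(y, f(x_t) + 0.1 * rng.standard_normal())

print("best observed x:", X[np.argmin(y)])
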
What if you have budget for several experiments?

Entropy Search
evaluate where you expect to learn most about the minimum [Villemonteix et al., 2009; Hennig & Schuler, 2012]

▶ p(f) = GP(f; m, k) and p(y | f) = N(y; f_x, σ²) give a Gaussian posterior on f, with

  µ̄_a = µ_a + κ_{a*} κ_{**}⁻¹ (y_* − µ_*) = µ_a + L_{a*} · u,    where L_{a*} := κ_{a*} κ_{**}^{−1/2} and u := κ_{**}^{−1/2} (y_* − µ_*) ∼ N(0, I)
  κ̄_{ab} = κ_{ab} − κ_{a*} κ_{**}⁻¹ κ_{*b} = κ_{ab} − L_{a*} L_{*b}

▶ use this to predict p̂_min(x) under p(f | y, y_{t+1}) (requires nontrivial numerics)

[Figure: GP posterior over f, with a candidate next evaluation at x_n.]
Entropy Search
evaluate where you expect to learn most about the minimum [Villemonteix et al., 2009; Hennig & Schuler, 2012]

▶ don't evaluate where you think the minimum lies!
▶ instead, evaluate where you expect to learn most about the minimum! Use the relative entropy

  H(p) := − ∫ p(x) log( p(x) / b(x) ) dx

with base measure b, and the utility

  u(x) = H_t(p_min) − E_{y_{t+1}}[ H_{t+1}(p_min) ]

[Figure: GP posterior over f, with a candidate next evaluation at x_n.]
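A brute-force Monte Carlo sketch of this utility (assumptions: a discretized domain, a squared-exponential kernel, made-up data, and small sample counts; the Entropy Search papers use far more careful approximations of these quantities):

import numpy as np

def se(a, b, ell=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def posterior(X, y, grid, sigma=0.1):
    K = se(X, X) + sigma**2 * np.eye(len(X))
    ks = se(grid, X)
    mu = ks @ np.linalg.solve(K, y)
    cov = se(grid, grid) - ks @ np.linalg.solve(K, ks.T)
    return mu, cov

def entropy_of_pmin(mu, cov, rng, n_samples=500):
    S = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(mu)), size=n_samples)
    p = np.bincount(S.argmin(axis=1), minlength=len(mu)) / n_samples
    p = np.clip(p, 1e-12, None)
    # discrete entropy; a uniform base measure only shifts this by a constant,
    # which cancels in the utility below
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(2)
grid = np.linspace(-4, 4, 100)
X = np.array([-2.0, 1.0]); y = np.array([0.3, -0.8]); sigma = 0.1

mu, cov = posterior(X, y, grid, sigma)
H_t = entropy_of_pmin(mu, cov, rng)

def utility(x_cand, n_fantasies=20):
    """Expected drop in entropy of p_min after observing y_{t+1} at x_cand."""
    i = int(np.argmin(np.abs(grid - x_cand)))
    drops = []
    for _ in range(n_fantasies):   # fantasize y_{t+1} ~ N(mu_i, var_i + sigma^2)
        y_f = mu[i] + np.sqrt(max(cov[i, i], 0.0) + sigma**2) * rng.standard_normal()
        mu2, cov2 = posterior(np.append(X, grid[i]), np.append(y, y_f), grid, sigma)
        drops.append(H_t - entropy_of_pmin(mu2, cov2, rng, n_samples=200))
    return np.mean(drops)

print("u(0.0) ≈", round(float(utility(0.0)), 3))
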
Information vs. Regret
Entropy Search is qualitatively different from regret-based formulations

Settings in which information-based search is preferable


▶ “prototyping-phase” followed by “product release”
▶ structured uncertainty with variable signal-to-noise ratio
▶ “multi-fidelity”: Several experimental channels of different cost and quality, e.g.
▶ simulations vs. physical experiments
▶ training a learning model for a variable time
▶ using variable-size datasets
Regret-based optimization is easy to implement and works well on standard problems. But it is a strong
simplification of reality, in which many practical complications cannot be phrased.

Bayesian Optimization in Practice
recent (and not so recent) libraries

▶ https://amzn.github.io/emukit/
▶ https://github.com/HIPS/Spearmint
▶ https://github.com/hyperopt
▶ https://hpolib.readthedocs.io/en/development/
▶ https://github.com/automl
▶ https://sigopt.com/product/

Summary — Experimental Design
▶ the bandit setting formalizes iid. sequential decision making under uncertainty
▶ bandit algorithms can achieve “no regret” performance, even without explicit probabilistic priors
▶ Bayesian optimization extends to continuous domains
  ▶ it lies right at the intersection of computational and physical learning
  ▶ requires significant computational resources to run a numerical optimizer inside the loop
  ▶ allows rich formulation of global, stochastic, continuous, structured, multi-channel design problems
  ▶ is currently the state of the art in the solution of challenging optimization problems

Probabilistic ML — P. Hennig, SS 2020 — Lecture 26: Making Decisions — © Philipp Hennig, 2020 CC BY-NC-SA 3.0 21
