26 Making Decisions

This document gives an overview of making decisions with probabilistic machine learning models. Probabilistic models provide predictions conditional on actions, and decision theory chooses the action that minimizes expected loss. When the relevant probabilities are unknown, experimental design and learning-by-doing approaches estimate returns while acting, balancing exploration of uncertain options against exploitation of options believed to yield high returns.


Probabilistic Machine Learning

Lecture 26
Making Decisions

Philipp Hennig
20 July 2021

Faculty of Science
Department of Computer Science
Chair for the Methods of Machine Learning
 #   date    content                         Ex
 1   20.04.  Introduction                     1
 2   21.04.  Reasoning under Uncertainty
 3   27.04.  Continuous Variables             2
 4   28.04.  Monte Carlo
 5   04.05.  Markov Chain Monte Carlo         3
 6   05.05.  Gaussian Distributions
 7   11.05.  Parametric Regression            4
 8   12.05.  Learning Representations
 9   18.05.  Gaussian Processes               5
10   19.05.  Understanding Kernels
11   26.05.  Gauss-Markov Models
12   25.05.  An Example for GP Regression     6
13   08.06.  GP Classification                7
14   09.06.  Generalized Linear Models
15   15.06.  Exponential Families             8
16   16.06.  Graphical Models
17   22.06.  Factor Graphs                    9
18   23.06.  The Sum-Product Algorithm
19   29.06.  Example: Modelling Topics       10
20   30.06.  Mixture Models
21   06.07.  EM                              11
22   07.07.  Variational Inference
23   13.07.  Tuning Inference Algorithms     12
24   14.07.  Kernel Topic Models
25   20.07.  Outlook
26   21.07.  Revision

The Toolbox

Framework:
  ∫ p(x₁, x₂) dx₂ = p(x₁)        p(x₁, x₂) = p(x₁ | x₂) p(x₂)        p(x | y) = p(y | x) p(x) / p(y)
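As a quick numerical illustration of these three rules (a minimal numpy sketch, not part of the slides; the joint distribution below is made up):

import numpy as np

# sum rule, product rule and Bayes' theorem on a small discrete joint p(x1, x2)
p_joint = np.array([[0.10, 0.30],   # rows: x1 in {0, 1}
                    [0.20, 0.40]])  # cols: x2 in {0, 1}

p_x1 = p_joint.sum(axis=1)                      # sum rule: marginalize x2
p_x2 = p_joint.sum(axis=0)                      # marginal of x2
p_x1_given_x2 = p_joint / p_x2                  # columns are p(x1 | x2)
assert np.allclose(p_x1_given_x2 * p_x2, p_joint)   # product rule

p_x2_given_x1 = (p_x1_given_x2 * p_x2) / p_x1[:, None]   # Bayes' theorem
assert np.allclose(p_x2_given_x1.sum(axis=1), 1.0)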

Modelling:
▶ graphical models
▶ Gaussian distributions
▶ (deep) learnt representations
▶ Kernels
▶ Markov Chains
▶ Exponential Families / Conjugate Priors
▶ Factor Graphs & Message Passing

Computation:
▶ Monte Carlo
▶ Linear algebra / Gaussian inference
▶ maximum likelihood / MAP
▶ Laplace approximations
▶ EM / variational approximations

So you’ve got yourself a posterior …now what?
Taking a decision means conditioning on a variable you control
[Figure: body mass [kg] over 2010–2013 under a sequence of actions (running, gorging, dieting, gym, veg), with predictive distributions p(w′ | run) and p(w′ | diet) for the next period.]

Decision Theory
The limit of probabilistic reasoning?

▶ probabilistic models can provide predictions p(x | a) for a variable x conditional on an action a
▶ given the choice, which value of a do you prefer?

▶ assign a loss or utility ℓ(x)


▶ choose a such that it minimizes the expected loss

  a* = arg min_a ∫ ℓ(x) p(x | a) dx
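A minimal sketch of this rule with a discrete set of actions, approximating the integral by Monte Carlo (the actions, loss, and predictive distributions below are hypothetical, loosely following the body-mass example above):

import numpy as np

rng = np.random.default_rng(0)
actions = ["run", "diet", "gym"]            # hypothetical actions
loss = lambda x: (x - 75.0) ** 2            # hypothetical loss: squared deviation from a target mass

# hypothetical predictive models p(x | a): Gaussians over next-period body mass
predict = {"run":  lambda n: rng.normal(78.0, 2.0, n),
           "diet": lambda n: rng.normal(76.0, 4.0, n),
           "gym":  lambda n: rng.normal(77.0, 1.5, n)}

# a* = arg min_a ∫ ℓ(x) p(x | a) dx, approximated by sample averages
expected_loss = {a: loss(predict[a](100_000)).mean() for a in actions}
a_star = min(expected_loss, key=expected_loss.get)
print(expected_loss, "-> choose", a_star)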

Expected Regret/utility
if you keep having to take the same decision, optimise the sum of its returns

▶ consider independent draws x_i with x_i ∼ p(x | a_i)
▶ choose all a_i = a* to minimize the accumulated loss

  L(n) = E_p[ Σ_i x_i ]

▶ but what if you don’t know p?

Motivating (Historical) Example
Experimental Design

[Figure: payout (0 to 0.8) as a function of the number of trials N, N = 10⁰ … 10³ on a log scale.]
Learning by Doing
Estimating return while taking actions

Perhaps we shouldn't rule out an option yet if the posterior over its expected return overlaps with that
of our current guess for the best option?
▶ Assume K choices.
▶ Taking choice k ∈ [1, …, K] at time i yields a binary (Bernoulli) reward/loss x_i with probability π_k ∈ [0, 1], iid.
▶ conjugate priors p(π_k) = B(π_k; a, b) = B(a, b)⁻¹ π_k^{a−1} (1 − π_k)^{b−1}
▶ posteriors from n_k tries of choice k with m_k successes:
  p(π_k | n_k, m_k) = B(π_k; a + m_k, b + (n_k − m_k))
▶ for a, b → 0, the posterior has mean and variance

  π̄_k := E_{p(π_k | n_k, m_k)}[π_k] = m_k / n_k        σ_k² := var_{p(π_k | n_k, m_k)}[π_k] = m_k (n_k − m_k) / (n_k² (n_k + 1)) = O(n_k⁻¹)

Choose the option k that maximizes π̄_k + c √(σ_k²) for some c. Which c?
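A minimal sketch of this Beta-posterior rule (not code from the lecture; the true payout probabilities, the Beta(1, 1) prior, and the constant c = 2 are assumptions):

import numpy as np

rng = np.random.default_rng(0)
K = 3
true_pi = np.array([0.45, 0.50, 0.55])   # assumed true Bernoulli payouts
a0, b0 = 1.0, 1.0                        # Beta prior parameters (assumption)
c = 2.0                                  # exploration weight (assumption)

n = np.zeros(K)   # number of pulls per option
m = np.zeros(K)   # number of successes per option

for t in range(2000):
    a, b = a0 + m, b0 + (n - m)
    mean = a / (a + b)                           # posterior mean of pi_k
    var = a * b / ((a + b) ** 2 * (a + b + 1))   # posterior variance of pi_k
    k = int(np.argmax(mean + c * np.sqrt(var)))  # optimistic choice
    x = rng.random() < true_pi[k]                # Bernoulli reward
    n[k] += 1
    m[k] += x

post_mean = (a0 + m) / (a0 + b0 + n)
print("pulls per option:", n, " posterior means:", np.round(post_mean, 3))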

Learning by Doing
Estimating return while taking actions

Perhaps we shouldn't rule out an option yet if the posterior over its expected return overlaps with that
of our current guess for the best option?
Choose the option k that maximizes π̄_k + c √(σ_k²) for some c. Which c?

▶ A large c ensures uncertain options are preferred. If we make it too large, we will only explore.
▶ A small c largely ignores uncertainty. We will only exploit.
▶ Idea: Let c grow slowly over time, at a rate less than O(n_k^{1/2}). Then the variance of chosen options will
  drop faster than c grows, so their exploration will stop unless their mean is good. But unexplored
  choices will eventually become dominant, and are thus always explored eventually.

Not just for Bernoulli variables!
posterior contraction rates are universal

Theorem (Chernoff-Hoeffding)
Let X_1, …, X_n be random variables with common range [0, 1] and such that E[X_t | X_1, …, X_{t−1}] = µ.
Let S_n = X_1 + ··· + X_n. Then for all a ≥ 0,

  p(S_n − nµ ≤ −a) ≤ e^{−2a²/n}    and    p(S_n − nµ ≥ a) ≤ e^{−2a²/n}
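A quick numerical sanity check of the bound (a sketch, not part of the slides; the Bernoulli example and the constants n, a are assumptions):

import numpy as np

rng = np.random.default_rng(0)
n, mu, a = 100, 0.5, 10.0
trials = 200_000

# X_t ~ Bernoulli(mu) as a simple example with range [0, 1]
S = (rng.random((trials, n)) < mu).sum(axis=1)

empirical = np.mean(S - n * mu >= a)
bound = np.exp(-2 * a**2 / n)
print(f"empirical tail {empirical:.4f} <= Hoeffding bound {bound:.4f}")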

The Multi-Armed Bandit Setting
Discrete-Choice Experimental Design [Auer, Cesa-Bianchi, Fischer, Machine Learning 47(2002), 235–256]

Definitions:
▶ A K-armed bandit is a collection X_{k,n} of random variables, 1 ≤ k ≤ K, n ≥ 1, where k is the arm of
  the bandit. Successive plays of arm k yield rewards X_{k,1}, X_{k,2}, …, which are independent and identically
  distributed according to an unknown p with E_p(X_{k,i}) = µ_k.
▶ A policy A chooses the next machine to play at time n, based on past plays and rewards.
▶ Let T_k(n) be the number of times machine k was played by A during the first n plays. The regret of A is

  R_A(n) = µ* · n − Σ_j µ_j · E_p[T_j(n)]    with    µ* := max_{1≤k≤K} µ_k

The Multi-Armed Bandit Setting
Discrete-Choice Experimental Design [Auer, Cesa-Bianchi, Fischer, Machine Learning 47(2002), 235–256]

Algorithm: let x̄_j be the empirical average of rewards from arm j, and n_j the number of plays of arm j within the first n plays.

procedure UCB(K)                                  ▷ Upper Confidence Bound
  play each machine once
  while true do
    play j = arg max_j ( x̄_j + √(2 log n / n_j) )
  end while
end procedure
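A minimal runnable version of this procedure (a sketch; the Bernoulli arms below are chosen to match the visualization two slides ahead, everything else is an assumption):

import numpy as np

def ucb1(pull, K, n_rounds, rng):
    """UCB (Auer et al., 2002): play each arm once, then choose optimistically."""
    counts = np.zeros(K)          # n_j: plays of arm j
    sums = np.zeros(K)            # accumulated rewards of arm j
    for j in range(K):            # play each machine once
        sums[j] += pull(j, rng); counts[j] += 1
    for n in range(K + 1, n_rounds + 1):
        means = sums / counts
        bonus = np.sqrt(2 * np.log(n) / counts)
        j = int(np.argmax(means + bonus))
        sums[j] += pull(j, rng); counts[j] += 1
    return counts

# Example: K = 3 Bernoulli arms with success probabilities 45%, 50%, 55%
p = np.array([0.45, 0.50, 0.55])
rng = np.random.default_rng(0)
pull = lambda j, rng: float(rng.random() < p[j])
counts = ucb1(pull, K=3, n_rounds=3000, rng=rng)
regret = (p.max() - p) @ counts     # expected regret given the realized play counts
print("plays per arm:", counts, " expected regret:", round(float(regret), 1))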

The Multi-Armed Bandit Setting
Discrete-Choice Experimental Design [Auer, Cesa-Bianchi, Fischer, Machine Learning 47(2002), 235–256]

Theorem (Auer, Cesa-Bianchi, Fischer)


Consider K machines (K > 1) having arbitrary reward distributions P1 , . . . , PK with support in [0, 1] and
expected values µ_i = E_P(X_i). Let ∆_i := µ* − µ_i. Then the expected regret of UCB after any number n
of plays is at most

  E_P[R_A(n)] ≤ 8 · ( Σ_{i: µ_i < µ*} log n / ∆_i ) + (1 + π²/3) · ( Σ_j ∆_j )

Nb: The sums are over K, not n. So the regret is O(K log n). UCB plays a sub-optimal arm at most
logarithmically often.

Visualization
K = 3, binary rewards

[Figure: UCB on K = 3 Bernoulli arms with success probabilities p = 45%, 50%, 55%. Left: number of plays n_k(t) of each arm versus N (up to 3,000 plays). Right: regret bound, expected regret, and sampled regret versus N on log-log axes, N = 10⁰ … 10⁴.]

Multi-Armed Bandit Algorithms
▶ apply to independent, discrete choice problems with stochastic pay-off
▶ algorithms based on upper confidence bounds incur regret bounded by O(log n)
▶ this even applies for the adversarial setting (Auer, Cesa-Bianchi, Freund, Schapire, 1995)

Unfortunately…
▶ No problem is ever discrete, finite and independent
▶ in a continuous problem, no “arm” can or should ever be played twice
▶ in many prototyping settings, early exploration is free

Continuous-Armed Bandits
example application: parameter optimization

[Figure: an objective function f over x ∈ [−5, 5].]

  p(y | x) = N(y; f_x, σ²)        x* = arg min_{x∈D} f(x) = ?        R(T) := Σ_{t=1}^{T} ( f(x_t) − f(x*) )

Continuous-Armed Bandits
example application: parameter optimization

[Figure: the same objective f over x ∈ [−5, 5], with a GP posterior over f.]

  p(y | x) = N(y; f_x, σ²),    p(f) = GP(f; µ, k)    ⇒    p_min(x* = x) = ∫ I( f(x) < f(x̃) ∀ x̃ ∈ D ) dp(f | y)
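One way to approximate p_min numerically (a brute-force sketch under assumed choices: a squared-exponential kernel, a fixed grid for D, and made-up data; not the lecture's implementation) is to draw posterior samples and count where each sample attains its minimum:

import numpy as np

def se_kernel(a, b, ell=1.0, theta=1.0):
    return theta**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

rng = np.random.default_rng(0)
X = np.array([-3.0, -1.0, 0.5, 2.0])          # observed locations (assumed)
y = np.array([0.2, -1.0, -0.3, 0.5])          # noisy observations (assumed)
sigma = 0.1
grid = np.linspace(-5, 5, 200)                # D, discretized

# GP posterior on the grid (zero prior mean)
Kxx = se_kernel(X, X) + sigma**2 * np.eye(len(X))
Kgx = se_kernel(grid, X)
L = np.linalg.cholesky(Kxx)
mu = Kgx @ np.linalg.solve(L.T, np.linalg.solve(L, y))
V = np.linalg.solve(L, Kgx.T)
cov = se_kernel(grid, grid) - V.T @ V

# Monte Carlo estimate of p_min: fraction of posterior samples whose minimum lies at x
samples = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(grid)), size=2000)
p_min = np.bincount(samples.argmin(axis=1), minlength=len(grid)) / len(samples)
print("most probable minimizer:", grid[p_min.argmax()])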

GP Upper Confidence Bound
Evaluate optimistically, where the function may be low [Srinivas, Krause, Kakade, Seeger, ICML 2009]

▶ utility under p(f | y) = GP(f; µ_{t−1}, σ²_{t−1}):

  u_t(x) = µ_{t−1}(x) − √β_t · σ_{t−1}(x)

▶ choose x_t = arg min_{x∈D} u_t(x)

Theorem (Srinivas et al., 2009)
Let δ ∈ (0, 1) and β_t = 2 log(|D| t² π² / 6δ). Running GP-UCB with β_t for a sample f ∼ GP(µ, k),

  p( R_T ≤ √( 8 T β_T γ_T / log(1 + σ⁻²) )  for all T ≥ 1 ) ≥ 1 − δ,

thus lim_{T→∞} R_T / T = 0 (“no regret”).

[Figure: GP posterior over f on x ∈ [−4, 4], with the lower confidence bound guiding where to evaluate next.]
GP Upper Confidence Bound
Evaluate optimistically, where the function may be low [Srinivas, Krause, Kakade, Seeger, ICML 2009]

▶ utility under p(f | y) = GP(f; µ_{t−1}, σ²_{t−1}):

  u_t(x) = µ_{t−1}(x) − √β_t · σ_{t−1}(x)

▶ choose x_t = arg min_{x∈D} u_t(x)

Theorem (Srinivas et al., 2009)
Assume that f ∈ H_k with ∥f∥²_k ≤ B, and that the noise is zero-mean and σ-bounded almost surely. Let
δ ∈ (0, 1) and β_t = 2B + 300 γ_t log³(t/δ). Running GP-UCB with β_t and p(f) = GP(f; 0, k),

  p( R_T ≤ √( 8 T β_T γ_T / log(1 + σ⁻²) )  for all T ≥ 1 ) ≥ 1 − δ,

thus lim_{T→∞} R_T / T = 0 (“no regret”).

[Figure: GP posterior over f on x ∈ [−4, 4], with the lower confidence bound guiding where to evaluate next.]
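A compact sketch of the GP-LCB acquisition step (assumptions: a squared-exponential kernel, a fixed candidate grid, δ = 0.1, and a made-up objective; not the authors' reference code):

import numpy as np

def se_kernel(a, b, ell=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def gp_posterior(X, y, grid, sigma=0.1):
    K = se_kernel(X, X) + sigma**2 * np.eye(len(X))
    k_star = se_kernel(grid, X)
    L = np.linalg.cholesky(K)
    mu = k_star @ np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, k_star.T)
    var = 1.0 - np.sum(v**2, axis=0)          # prior variance is 1 for this kernel
    return mu, np.clip(var, 1e-12, None)

rng = np.random.default_rng(1)
f = lambda x: np.sin(3 * x) + 0.5 * x         # unknown objective (assumed)
grid = np.linspace(-4, 4, 400)
X = np.array([0.0]); y = f(X) + 0.1 * rng.standard_normal(1)

for t in range(1, 16):
    mu, var = gp_posterior(X, y, grid)
    beta_t = 2 * np.log(len(grid) * t**2 * np.pi**2 / (6 * 0.1))   # delta = 0.1
    u = mu - np.sqrt(beta_t) * np.sqrt(var)   # lower confidence bound
    x_t = grid[np.argmin(u)]                  # evaluate where f may be low
    X = np.append(X, x_t); y = np.append(y, f(x_t) + 0.1 * rng.standard_normal())

print("best observed x:", X[np.argmin(y)])
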
What if you have budget for several experiments?

Entropy Search
evaluate where you expect to learn most about the minimum [Villemonteix et al., 2009; Hennig & Schuler, 2012]

▶ p(f) = GP(f; m, k) and p(y | f) = N(y; f_x, σ²) give a Gaussian posterior on f, with

  µ̄_a = µ_a + κ_{a*} κ_{**}⁻¹ (y_* − µ_*) = µ_a + L_{a*} · u,    where L_{a*} := κ_{a*} κ_{**}^{−1/2} and u := κ_{**}^{−1/2} (y_* − µ_*) ∼ N(0, I)
  κ̄_{ab} = κ_{ab} − κ_{a*} κ_{**}⁻¹ κ_{*b} = κ_{ab} − L_{a*} L_{*b}

▶ use this to predict p̂_min(x) under p(f | y, y_{t+1}) (requires nontrivial numerics)

[Figure: GP posterior over f, with a candidate next evaluation at x_n.]
Entropy Search
evaluate where you expect to learn most about the minimum [Villemonteix et al., 2009; Hennig & Schuler, 2012]

▶ don't evaluate where you think the minimum lies!
▶ instead, evaluate where you expect to learn most about the minimum! Use the relative entropy

  H(p) := − ∫ p(x) log( p(x) / b(x) ) dx

with base measure b, and the utility

  u(x) = H_t(p_min) − E_{y_{t+1}}[ H_{t+1}(p_min) ]

[Figure: GP posterior over f, with a candidate next evaluation at x_n.]
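A brute-force Monte Carlo sketch of this utility (assumptions: a discretized domain, a squared-exponential kernel, made-up data, and small sample counts; the Entropy Search papers use far more careful approximations of these quantities):

import numpy as np

def se(a, b, ell=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def posterior(X, y, grid, sigma=0.1):
    K = se(X, X) + sigma**2 * np.eye(len(X))
    ks = se(grid, X)
    mu = ks @ np.linalg.solve(K, y)
    cov = se(grid, grid) - ks @ np.linalg.solve(K, ks.T)
    return mu, cov

def entropy_of_pmin(mu, cov, rng, n_samples=500):
    S = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(mu)), size=n_samples)
    p = np.bincount(S.argmin(axis=1), minlength=len(mu)) / n_samples
    p = np.clip(p, 1e-12, None)
    # discrete entropy; a uniform base measure only shifts this by a constant,
    # which cancels in the utility below
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(2)
grid = np.linspace(-4, 4, 100)
X = np.array([-2.0, 1.0]); y = np.array([0.3, -0.8]); sigma = 0.1

mu, cov = posterior(X, y, grid, sigma)
H_t = entropy_of_pmin(mu, cov, rng)

def utility(x_cand, n_fantasies=20):
    """Expected drop in entropy of p_min after observing y_{t+1} at x_cand."""
    i = int(np.argmin(np.abs(grid - x_cand)))
    drops = []
    for _ in range(n_fantasies):   # fantasize y_{t+1} ~ N(mu_i, var_i + sigma^2)
        y_f = mu[i] + np.sqrt(max(cov[i, i], 0.0) + sigma**2) * rng.standard_normal()
        mu2, cov2 = posterior(np.append(X, grid[i]), np.append(y, y_f), grid, sigma)
        drops.append(H_t - entropy_of_pmin(mu2, cov2, rng, n_samples=200))
    return np.mean(drops)

print("u(0.0) ≈", round(float(utility(0.0)), 3))
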
Information vs. Regret
Entropy Search is qualitatively different from regret-based formulations

Settings in which information-based search is preferable


▶ “prototyping-phase” followed by “product release”
▶ structured uncertainty with variable signal-to-noise ratio
▶ “multi-fidelity”: Several experimental channels of different cost and quality, e.g.
▶ simulations vs. physical experiments
▶ training a learning model for a variable time
▶ using variable-size datasets
Regret-based optimization is easy to implement and works well on standard problems. But it is a strong
simplification of reality, in which many practical complications cannot be phrased.

Bayesian Optimization in Practice
recent (and not so recent) libraries

▶ https://amzn.github.io/emukit/
▶ https://github.com/HIPS/Spearmint
▶ https://github.com/hyperopt
▶ https://hpolib.readthedocs.io/en/development/
▶ https://github.com/automl
▶ https://sigopt.com/product/

Summary — Experimental Design
▶ the bandit setting formalizes iid. sequential decision making under uncertainty
▶ bandit algorithms can achieve “no regret” performance, even without explicit probabilistic priors
▶ Bayesian optimization extends to continuous domains
  ▶ it lies right at the intersection of computational and physical learning
  ▶ requires significant computational resources to run a numerical optimizer inside the loop
  ▶ allows rich formulation of global, stochastic, continuous, structured, multi-channel design problems
  ▶ is currently the state of the art in the solution of challenging optimization problems

Probabilistic ML — P. Hennig, SS 2020 — Lecture 26: Making Decisions — © Philipp Hennig, 2020 CC BY-NC-SA 3.0 21
