Bayesian Optimization
Theory and Practice Using Python
Peng Liu
Singapore, Singapore
Apress
The publisher, the authors and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Figure 1-1 The overall Bayesian optimization process. The policy digests the
historical observations and proposes the new sampling location. The environment
governs how the (possibly noise-corrupted) observation at the newly proposed
location is revealed to the policy. Our goal is to learn an efficient and effective policy
that could navigate toward the global optimum as quickly as possible
Global Optimization
Optimization aims to locate the optimal set of parameters of interest
across the whole domain through carefully allocating limited resources.
For example, when searching for the car key at home before leaving for
work in two minutes, we would naturally start with the most promising
place where we usually put the key. If it is not there, we think for a
moment about other possible locations and move to the next most
promising place. This process iterates until the key is found. In this
example, the policy digests the available information from previous
searches and proposes the next promising location. The
environment is the house itself, revealing if the key is placed at the
proposed location upon each sampling.
This is considered an easy example since we are familiar with the
environment in terms of its structural design. However, imagine
locating an item in a totally new environment. The policy would need to
account for the uncertainty due to unfamiliarity with the environment
while sequentially determining the next sampling location. When the
sampling budget is limited, as is often the case in real-life searches
constrained by time and resources, the policy needs to reason carefully
about the utility of each candidate sampling location.
Let us formalize the sequential global optimization using
mathematical terms. We are dealing with an unknown scalar-valued
objective function f defined on a domain A. In other words, the
unknown subject of interest f is a function that maps a sample in A to a
real number in ℝ, that is, f : A → ℝ. We typically place no specific
assumption on the nature of the domain A other than that it be a
bounded, compact, and convex set.
Unless otherwise specified, we focus on the maximization setting
instead of minimization since maximizing the objective function is
equivalent to minimizing the negated objective, and vice versa. The
optimization procedure thus aims at locating the global maximum f∗ or
its corresponding location x∗ in a principled and systematic manner.
Mathematically, we wish to locate f∗ where

f∗ = max_{x ∈ A} f(x) = f(x∗),  with x∗ = argmax_{x ∈ A} f(x)
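As a minimal illustration (a sketch, not code from the book), the following hypothetical objective is maximized by brute force over a dense grid; the function f and the domain [0, 1] are assumptions made purely for demonstration:

```python
import numpy as np

# Hypothetical one-dimensional objective over the domain A = [0, 1].
def f(x):
    return np.sin(6 * x) + 0.5 * np.cos(2 * x)

# Approximate the global maximum f* and its location x* on a dense grid.
x_grid = np.linspace(0.0, 1.0, 10_001)
y_grid = f(x_grid)
x_star = x_grid[np.argmax(y_grid)]
f_star = y_grid.max()
print(f"x* ~ {x_star:.4f}, f* ~ {f_star:.4f}")
```

Such exhaustive evaluation is, of course, exactly what Bayesian optimization seeks to avoid when each evaluation of f is expensive.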
Figure 1-6 Assuming a normal probability distribution for the actual observation as
a random variable. The Gaussian distribution is centered around the objective
function f value evaluated at a given location x and spread by the variance of the
noise term
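To make this observation model concrete, here is a small sketch (an illustrative assumption, not code from the book) that draws a noise-corrupted observation y ~ N(f(x), σ²); the toy objective f and the noise level are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical true objective (unknown to the policy in practice).
    return np.sin(6 * x) + 0.5 * np.cos(2 * x)

def observe(x, noise_std=0.1):
    # Noisy observation model: y ~ N(f(x), noise_std**2).
    return f(x) + rng.normal(0.0, noise_std)

y = observe(0.3)
print(y)
```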
The following section introduces Bayesian statistics to lay the
theoretical foundation as we work with probability distributions along
the way.
Bayesian Statistics
Bayesian optimization is not a particular algorithm for global
optimization; it is a suite of algorithms based on the principles of
Bayesian inference. As the optimization proceeds, in each iteration the
policy needs to determine the next sampling decision or whether the
current search should be terminated. Due to uncertainty in the objective
function and the observation model, the policy needs to account for such
uncertainty when deciding the next sampling location, which bears
both an immediate impact on the follow-up decision and a long-term
effect on all future decisions. The samples selected thus need to
reasonably contribute to the ultimate goal of global optimization and
justify the cost incurred due to sampling.
Using Bayesian statistics in optimization paves the way for us to
systematically and quantitatively reason about these uncertainties
using probabilities. For example, we would place a prior belief about
the characteristics of the objective function and quantify its
uncertainties by assigning high probability to specific ranges of values
and low probability to others. As more observations are collected, the
prior belief is gradually updated and calibrated toward the true
underlying distribution of the objective function in the form of a
posterior distribution.
We now cover the fundamental concepts and tools of Bayesian
statistics. Understanding these sections is essential to appreciate the
inner workings of Bayesian optimization.
Bayesian Inference
Bayesian inference essentially relies on the Bayesian formula (also
called Bayes’ rule) to reason about the interactions among three
components: the prior distribution p(θ) where θ represents the
parameter of interest, the likelihood p(data| θ) given a specific
parameter θ, and the posterior distribution p(θ| data). There is one
more component, the evidence of the data p(data), which is often not
computable. The Bayesian formula is as follows:

p(θ| data) = p(data| θ) p(θ) / p(data)
Let us look closely at this widely used and arguably most important
formula in Bayesian statistics. Remember that any Bayesian inference
procedure aims to derive the posterior distribution p(θ| data) (or
calculate its marginal expectation) for the parameter of interest θ, in
the form of a probability density function. For example, we might end
up with a continuous posterior distribution as in Figure 1-7, where θ
varies from 0 to 1, and the total probability (i.e., the area under the
curve) integrates to 1.
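As a hedged illustration of this machinery (not taken from the book), we can carry out Bayes' rule numerically on a grid of θ values in [0, 1]; the coin-flip likelihood below is an assumed example:

```python
import numpy as np

# Discretize theta over [0, 1] and place a uniform prior on the grid.
theta = np.linspace(0.0, 1.0, 1001)
prior = np.ones_like(theta)

# Example likelihood: 7 heads out of 10 Bernoulli trials.
heads, flips = 7, 10
likelihood = theta**heads * (1 - theta)**(flips - heads)

# Bayes' rule: posterior is proportional to likelihood times prior,
# normalized so that the area under the curve equals one.
unnormalized = likelihood * prior
posterior = unnormalized / np.trapz(unnormalized, theta)
print(np.trapz(posterior, theta))  # ~ 1.0
```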
Figure 1-8 Updating the prior uniform distribution toward a posterior normal
distribution as more data is collected. The role of the prior distribution decreases as
more data is collected to support the approximation to the true underlying
distribution
If you look more closely at the definition of the conditional
probability, p(θ| data) = p(θ, data)/p(data), it is not difficult to see
that expanding the joint probability as p(θ, data) = p(data| θ)p(θ)
leads to the Bayesian formula we introduced earlier, namely:

p(θ| data) = p(data| θ) p(θ) / p(data)
Independence
A special case that would impact the calculation of the three
probabilities mentioned earlier is independence, where the random
variables are now independent of each other. Let us look at the joint,
conditional, and marginal probabilities with independent random
variables.
When two random variables are independent of each other, the
event x = X would have nothing to do with the event y = Y, that is, the
conditional probability for x = X given y = Y becomes p(X| Y) = p(X). The
conditional probability distribution for two independent random
variables thus becomes p(x| y) = p(x). Their joint probability becomes
the product of the individual probabilities: p(X ∩ Y) = p(X|
Y)p(Y) = p(X)p(Y), and the joint probability distribution becomes a
product of the individual probability distributions: p(x, y) = p(x)p(y). The
marginal probability of x is just its own probability distribution:

p(x) = ∫ p(x, y)dy = ∫ p(x)p(y)dy = p(x) ∫ p(y)dy = p(x)

where we have used the fact that p(x) can be moved out of the
integration operation due to its independence from y, and that the total
area under a probability distribution is one, that is, ∫ p(y)dy = 1.
We can also extend to conditional independence, where the random
variable x could be independent from y given another random variable
z. In other words, we have p(x, y| z) = p(x| z)p(y| z).
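A quick numerical sanity check (an illustrative sketch, not from the book) confirms that the joint probability of two independent events factorizes into the product of the marginals; the event probabilities are assumed values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Two independent Bernoulli events with P(X) = 0.3 and P(Y) = 0.6.
x = rng.random(n) < 0.3
y = rng.random(n) < 0.6

# For independent events, P(X and Y) ~ P(X) * P(Y).
print((x & y).mean())        # ~ 0.18
print(x.mean() * y.mean())   # ~ 0.18
```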
Figure 1-11 Definition of the prior and posterior predictive distributions. Both are
calculated based on the same pattern of a weighted sum between the prior and the
likelihood
Writing y = θ + ε with θ ~ N(θ0, σ0²) and ε ~ N(0, σ²), we have used
the fact that the sum of two independent normally distributed random
variables is also normally distributed, with the mean and variance given
by the sums of the individual means and variances.
Therefore, the marginal probability distribution of y becomes
p(y) = N(θ0, σ0² + σ²). Intuitively, this form also makes sense. Before
we start to collect any observation about y, our best guess for its mean
would be θ0, the expected value of the underlying random variable θ. Its
variance is the sum of individual variances since we are considering
uncertainties due to both the prior and the likelihood; the marginal
distribution needs to absorb both variances, thus compounding the
resulting uncertainty. Figure 1-12 summarizes the derivation of the
prior predictive distribution under the normality assumption for both
the likelihood and the prior, for a continuous θ.
Figure 1-12 Derivation process of the prior predictive distribution for a new data
point before collecting any observations, assuming a normal distribution for both the
likelihood and the prior
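A small Monte Carlo sketch (illustrative, with assumed parameter values θ0 = 1.0, σ0 = 0.5, and σ = 0.3) confirms that sampling θ from the prior and then y from the likelihood yields exactly this prior predictive distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
theta_0, sigma_0 = 1.0, 0.5  # assumed prior: theta ~ N(theta_0, sigma_0**2)
sigma = 0.3                  # assumed noise level: y | theta ~ N(theta, sigma**2)
n = 1_000_000

# Ancestral sampling: draw theta from the prior, then y from the likelihood.
theta = rng.normal(theta_0, sigma_0, size=n)
y = rng.normal(theta, sigma)

# The sample mean and variance match N(theta_0, sigma_0**2 + sigma**2).
print(y.mean(), y.var())               # ~ 1.0, ~ 0.34
print(theta_0, sigma_0**2 + sigma**2)  # 1.0, 0.34
```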
We can follow the same line of reasoning for the case of the posterior
predictive distribution for a new observation y′ after collecting some
data points, under the normality assumption for the likelihood p(y′|
θ) and the posterior p(θ| data), where p(y′| θ) = N(θ, σ²) and
p(θ| data) = N(θ′, σ′²). We can see that the posterior distribution for θ
has an updated set of parameters θ′ and σ′² obtained using Bayes' rule
as more data is collected.
Now recall the definition of the posterior predictive distribution
with a continuous underlying parameter θ:

p(y′| data) = ∫ p(y′| θ)p(θ| data)dθ = N(θ′, σ′² + σ²)
Figure 1-13 Derivation process of the posterior predictive distribution for a new
data point after collecting some observations, assuming a normal distribution for
both the likelihood and the prior
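The closed-form update can be sketched in a few lines; the parameter values and observations below are assumptions, and the formulas are the standard normal-normal conjugate results:

```python
import numpy as np

theta_0, sigma_0 = 1.0, 0.5        # assumed prior: theta ~ N(theta_0, sigma_0**2)
sigma = 0.3                        # assumed noise std of the likelihood
data = np.array([1.3, 1.1, 1.4])   # assumed collected observations

# Standard normal-normal conjugate update for the mean theta.
n = len(data)
post_var = 1.0 / (1.0 / sigma_0**2 + n / sigma**2)                      # sigma'^2
post_mean = post_var * (theta_0 / sigma_0**2 + data.sum() / sigma**2)   # theta'

# Posterior predictive for a new observation y': N(theta', sigma'^2 + sigma^2).
pred_mean, pred_var = post_mean, post_var + sigma**2
print(post_mean, post_var, pred_mean, pred_var)
```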
Gaussian Process
A prevalent choice of stochastic process in Bayesian optimization is the
Gaussian process, which requires that every finite collection of its
random variables follows a multivariate Gaussian distribution, even
though the continuous domain contains an infinite number of such
variables. It is a flexible framework to model a broad family of
functions and quantify their uncertainties, making it a powerful
surrogate model for approximating the true underlying function. We will delve into the details
of the Gaussian process in the next chapter, but for now, let us look at a
few visual examples to see what it offers.
Figure 1-17 illustrates an example of a “flipped” prior probability
distribution for a single random variable selected from the prior belief
of the Gaussian process. Each point follows a normal distribution.
Plotting the mean (solid line) and 95% credible interval (dashed lines)
of all these prior distributions gives us the prior process for the
objective function regarding each location in the domain. The Gaussian
process thus employs an infinite number of normally distributed
random variables within a bounded range to model the underlying
objective function and quantify the associated uncertainty via a
probabilistic approach.
Figure 1-17 A sample prior belief of the Gaussian process represented by the mean
and 95% credible interval for each location in the domain. Every objective value is
modeled by a random variable that follows a normal prior predictive distribution.
Collecting the distributions of all random variables could help us quantify the
potential shape of the true underlying function and its probability
Figure 1-18 Three example functions sampled from the prior process, where the
majority of the functions fall within the 95% credible interval
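To make this concrete, here is a minimal sketch (illustrative; the RBF kernel and the length scale of 0.2 are assumptions) that draws sample functions from a zero-mean GP prior:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=0.2):
    # Squared-exponential (RBF) covariance between two sets of 1-D points.
    sq_dists = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dists / length_scale**2)

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 100)

# GP prior: zero mean, RBF covariance; small jitter for numerical stability.
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)

# 95% credible interval of the prior at each location: 0 +/- 1.96 * sqrt(diag(K)).
band = 1.96 * np.sqrt(np.diag(K))
```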
In the Gaussian process, the uncertainty on the objective value of
each location is quantified using the credible interval. As we start to
collect observations, and assuming a noise-free and exact observation
model, the uncertainty at the sampled locations is resolved, leading
to zero variance and direct interpolation at these locations. In
addition, the variance increases as we move farther away from the
observations, a consequence of combining the prior process with the
information provided by the actual observations. Figure 1-19 illustrates
the updated posterior process after collecting two observations. The
posterior process with updated knowledge based on the observations
will thus make a more accurate surrogate model and better estimate
the objective function.
Figure 1-19 Updated posterior process after incorporating two exact observations
in the Gaussian process. The posterior mean interpolates through the observations,
and the associated variance shrinks as we move nearer to the observations
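A compact sketch of this noise-free conditioning (illustrative; the kernel, length scale, and the two observations are assumed) follows the standard GP posterior formulas:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=0.2):
    sq_dists = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dists / length_scale**2)

# Two assumed noise-free observations.
X_obs = np.array([0.25, 0.75])
y_obs = np.array([0.5, -0.3])
x = np.linspace(0.0, 1.0, 100)

# Standard GP conditioning: the posterior mean interpolates the data,
# and the posterior variance drops to zero at the observed locations.
K_oo = rbf_kernel(X_obs, X_obs) + 1e-8 * np.eye(len(X_obs))
K_xo = rbf_kernel(x, X_obs)
post_mean = K_xo @ np.linalg.solve(K_oo, y_obs)
post_cov = rbf_kernel(x, x) - K_xo @ np.linalg.solve(K_oo, K_xo.T)
post_std = np.sqrt(np.clip(np.diag(post_cov), 0.0, None))
```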
Acquisition Function
The tools from Bayesian inference and the extension to the Gaussian
process provide principled reasoning on the distribution of the
objective function. However, we would still need to incorporate such
probabilistic information in our decision-making to search for the
global maximum. We need to build a policy that absorbs the most
updated information on the objective function and recommends the
next most promising sampling location in the face of uncertainty
across the domain. The optimization policy thus plays an essential role
in connecting the Gaussian process to the eventual goal of Bayesian
optimization. In particular, the posterior predictive distribution
provides an outlook on the objective value and associated uncertainty
for locations not explored yet, which could be used by the optimization
policy to quantify the utility of any alternative location within the
domain.
When converting the posterior knowledge about candidate
locations, that is, posterior parameters such as the mean and the
variance, to a single utility score, the acquisition function comes into
play. An acquisition function is a manually designed mechanism that
evaluates the relative potential of each candidate location in the form of
a scalar score, and the location with the maximum score will be used as
the recommendation for the next round of sampling. It is a function that
assesses how valuable a candidate location would be if we were to acquire/sample it.
The acquisition function needs to be cheap to evaluate as a side
computation, since we must evaluate it at every candidate location
and then locate the maximum utility score, which is itself another
(inner) optimization problem.
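As one concrete example (an illustrative sketch; the book covers specific acquisition functions such as EI and KG later), the upper confidence bound (UCB) rule scores each candidate by the posterior mean plus a multiple of the posterior standard deviation; the posterior summaries below are assumed values:

```python
import numpy as np

def ucb(mean, std, beta=2.0):
    # Upper confidence bound: posterior mean plus beta times posterior std.
    # Larger beta favors exploration; smaller beta favors exploitation.
    return mean + beta * std

# Illustrative posterior summaries over a grid of candidate locations.
x = np.linspace(0.0, 1.0, 100)
post_mean = np.sin(6 * x)                # assumed posterior mean
post_std = 0.05 + 0.3 * np.abs(x - 0.5)  # assumed posterior std

# The recommendation maximizes the utility score -- itself a cheap
# inner optimization problem over the candidate locations.
scores = ucb(post_mean, post_std)
x_next = x[np.argmax(scores)]
print(x_next)
```

The parameter beta in this sketch directly controls the exploration-exploitation trade-off discussed next.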
Many choices of acquisition function have been proposed in the
literature. In a later part of the book, we will cover the popular ones,
such as expected improvement (EI) and knowledge gradient (KG). Still,
it suffices, for now, to understand that it is a predesigned function that
needs to balance two opposing forces: exploration and exploitation.
Exploration encourages resolving the uncertainty across the domain by
sampling at unfamiliar and distant locations, since these areas may
hold a big surprise due to their high uncertainty. Exploitation recommends a
greedy move at promising regions where we expect the observation
value to be high. The exploration-exploitation trade-off is a common
topic in many optimization settings.
Another distinguishing feature is the short-term and long-term
trade-off. A short-term acquisition function only focuses on one step
ahead and assumes this is the last chance to sample from the
environment; thus, the recommendation is to maximize the immediate
utility. A long-term acquisition function employs a multi-step lookahead
approach by simulating potential evolutions/paths in the future and
making a final recommendation by maximizing the long-run utility. We
will cover both types of policies in the book.
There are many other emerging variations in the design of the
acquisition function, such as adding safety constraints to the system
under study. In any case, we would judge the quality of the policy using
a specific acquisition function based on how close we are to the location
of the global maximum upon exhausting our budget. The distance
between the current and optimal locations is often called instant regret
or simple regret. Alternatively, the cumulative regret (cumulative
distances between historical locations and the optimum location)
incurred throughout the sampling process can also be used.
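The two notions can be computed in a few lines; in this illustrative sketch, regret is measured in objective values (one common convention), and the optimum f∗ is assumed known for benchmarking purposes only:

```python
import numpy as np

f_star = 1.0                                     # assumed known optimum (benchmarking only)
history = np.array([0.2, 0.6, 0.85, 0.8, 0.97])  # assumed observed objective values

# Simple (instant) regret: gap between the optimum and the best value so far.
simple_regret = f_star - np.maximum.accumulate(history)

# Cumulative regret: sum of per-step gaps over the whole sampling history.
cumulative_regret = np.cumsum(f_star - history)

print(simple_regret[-1], cumulative_regret[-1])
```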