Lecture 1: Machine Learning Paradigms



Advanced Topics in Machine Learning



Dr. Tom Rainforth


January 22nd, 2020
[email protected]
Course Outline

• Slightly unusual course covering different topics in machine learning
• Aim is to get you interacting with actual research
• Fully assessed by coursework
• There are no examples sheets: you are instead expected to
take the initiative to investigate areas you find interesting and
familiarize yourself with software tools (we will suggest
resources and the practicals are there to help with software
familiarity)

1
Course Structure

• 6 lectures on Bayesian Machine Learning from me


• 8 lectures on Natural Language Processing from Dr Alejo
Nevado-Holgado
• A few guest lectures at the end
• Many of the lectures will be delivered back-to-back (e.g. I will
effectively give 2x1 hour lectures and 2x2 hour lectures)

2
Course Assessment

• Team project working in groups of 4


• Based on reproducing a research paper
• Each team has a different paper
• Produce a group report + statement of individual
contributions + poster
• Individual oral vivas
• Groups will be assigned by the department; details are still being
sorted
• Check the online materials—there may end up being some tweaks
before you start

3
Bayesian Machine Learning—Course Outline

Lectures
• Machine Learning Paradigms (1 hour)
• Bayesian Modeling (2 hours)
• Foundations of Bayesian Inference (1 hour)
• Advanced Inference Methods (1 hour)
• Variational Auto-Encoders (1 hour)—key lecture for
assessments!

I will upload notes after each lecture. These will not perfectly
overlap with the lectures/slides so you will need to separately
digest each

4
What is Machine Learning?

Arthur Samuel, 1959


Field of study that gives computers the ability to learn without
being explicitly programmed.

Tom Mitchell, 1997


Any computer program that improves its performance at some task
through experience.

Kevin Murphy, 2012


To develop methods that can automatically detect patterns in
data, and then to use the uncovered patterns to predict future
data or other outcomes of interest.

5
Motivation: Why Should we Take a Bayesian Approach?

Bayesian Reasoning is the Language of Uncertainty


Medical Diagnostics
[Figure: medical diagnostics example]

• Bayesian reasoning is the basis for how to make decisions with
incomplete information
• Bayesian methods allow us to construct models that return
principled uncertainty estimates rather than just point estimates
• Bayesian models are often interpretable, such that they can be
easily queried, criticized, and built on by humans

6
Motivation: Why Should we Take a Bayesian Approach?

Bayesian Modeling Lets us Utilize Domain Expertise

• Bayesian modeling allows us to combine information from data
with that from prior expertise
• This means we can exploit
existing knowledge, rather than
purely relying on black-box
processing of data
• Models make clear assumptions
and are explainable
• We can easily update our
beliefs as new information
becomes available
7
Motivation: Why Should we Take a Bayesian Approach?

Bayesian Modeling is Powerful

• Bayesian models are state-of-the-art for a huge variety of
prediction and decision making tasks
• They make use of all the data
and can still be highly effective
when data is scarce
• By averaging over possible
parameters, they can form rich
model classes for explaining
how data is generated.
Image Credit: PyMC3 Documentation

8
Learning From Data

8
Learning from Data

• Machine learning is all about learning from data


• There is generally a focus on making predictions at unseen
datapoints
• Starting point is typically a dataset—we can delineate
approaches depending on type of dataset

9
Supervised Learning

• We have access to a labeled dataset of input–output pairs:
D = {x_n, y_n}_{n=1}^N.
• Aim is to learn a predictive model f that takes an input
x ∈ X and aims to predict its corresponding output y ∈ Y.
• The hope is that these example pairs can be used to “teach”
f how to accurately make predictions.

10
Supervised Learning—Classification

[Figure: example images passed through a predictor to give class labels
such as Cat, Dog, and Flying Spaghetti Monster]

Input x → Predictor f(x) → Class label y


11
Supervised Learning—Regression

12
Supervised Learning

Training data: each row (datapoint) has input features x1, …, xM and an output y.

Index | x1    | x2    | x3    | … | xM   | y
1     | 0.24  | 0.12  | -0.34 | … | 0.98 | 3
2     | 0.56  | 1.22  | 0.20  | … | 1.03 | 2
3     | -3.20 | -0.01 | 0.21  | … | 0.93 | 1
…     | …     | …     | …     | … | …    | …
N     | 2.24  | 1.76  | -0.47 | … | 1.16 | 2
• Use this data to learn a predictive model fθ : X → Y (e.g. by
optimizing θ)
• Once learned, we can use this to predict outputs for new input
points, e.g. fθ([0.48 1.18 0.34 … 1.13]) = 2 (see the sketch below)
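To make this concrete, here is a minimal sketch of the fit-then-predict workflow in Python. The data is randomly generated and scikit-learn's logistic regression is just one possible choice for fθ; none of this is prescribed by the course.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-in for the training data: N datapoints, M input features,
# and an integer class label y for each datapoint.
N, M = 100, 5
X = rng.normal(size=(N, M))            # inputs x_1, ..., x_N
y = rng.integers(1, 4, size=N)         # outputs y_n in {1, 2, 3}

# Learn a predictive model f_theta by optimizing its parameters theta.
f_theta = LogisticRegression(max_iter=1000).fit(X, y)

# Predict the output for a new, unseen input point.
x_new = rng.normal(size=(1, M))
print(f_theta.predict(x_new))          # e.g. array([2])
```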
13
Unsupervised Learning

• In unsupervised learning we have no clear output variable
that we are attempting to predict: D = {x_n}_{n=1}^N
• This is sometimes referred to as unlabeled data
• Aim is to extract some salient features of the dataset, such as
underlying structure, patterns, or characteristics
• Examples: clustering, feature extraction, density estimation,
representation learning, data visualization, data compression
(see the clustering sketch below)
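As an illustrative sketch of the clustering example (the two-blob data below is synthetic and k-means is just one possible method):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Unlabeled data: two synthetic groups of points, with no output variable.
X = np.concatenate([rng.normal(-2.0, 1.0, size=(50, 2)),
                    rng.normal(+2.0, 1.0, size=(50, 2))])

# Group the datapoints into clusters based purely on their structure.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(clusters[:10])   # cluster assignment for the first ten datapoints
```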

14
Unsupervised Learning—Clustering

[Figure: unlabeled datapoints are grouped into clusters]

Unlabeled Data → Group into Clusters

15
Unsupervised Learning—Deep Generative Models

Learn powerful models for generating new datapoints

[Figure: synthetic celebrities sampled from a learned model]
These are not real faces: they are samples from a learned model!

1
D P Kingma and P Dhariwal. “Glow: Generative flow with invertible 1x1 convolutions”. In: NeurIPS. 2018.
16
Discriminative vs Generative Machine Learning

16
Discriminative vs Generative Machine Learning

• Discriminative methods try to directly predict outputs (they
are primarily used for supervised tasks)
• Generative methods try to explain how the data was
generated

17
Image credit: Jason Martuscello, medium.com
Discriminative Machine Learning

• Given data D = {x_n, y_n}_{n=1}^N, discriminative methods directly
learn a mapping fθ from inputs x to outputs y
• Training uses D to estimate optimal values of the parameters
θ∗. This is typically done by minimizing an empirical risk over
the training data (see the sketch below):

θ∗ = arg min_θ (1/N) Σ_{n=1}^N L(y_n, fθ(x_n))    (1)

where L(y, ŷ) is a loss function for prediction ŷ and truth y.
• Prediction at a new input x involves simply applying fθ̂ (x),
where θ̂ is our estimate of θ∗
• Note we often do not predict y directly, e.g. in a classification
task we might predict the class probabilities instead
• For non-parametric approaches, the dimensionality of θ
increases with the dataset size
18
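A minimal sketch of equation (1) in action, assuming a linear model fθ(x) = θᵀx and a squared-error loss; both are illustrative choices rather than the only option.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: N pairs (x_n, y_n) from a noisy linear rule.
N = 200
X = rng.normal(size=(N, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.1 * rng.normal(size=N)

def f(theta, X):
    """A simple linear predictive model f_theta(x) = theta^T x."""
    return X @ theta

def empirical_risk(theta, X, y):
    """(1/N) sum_n L(y_n, f_theta(x_n)) with a squared-error loss."""
    return np.mean((y - f(theta, X)) ** 2)

# Minimize the empirical risk over theta by plain gradient descent.
theta = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = -2.0 / N * X.T @ (y - f(theta, X))   # gradient of the risk
    theta -= lr * grad

print(theta)                          # estimate of theta*, close to true_theta
print(empirical_risk(theta, X, y))    # final training risk
```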
Discriminative Machine Learning

Common approaches: neural networks, support vector machines,
random forests, linear/logistic regression
Pros
• Simpler to directly solve prediction problem than model the
whole data generation process
• Few assumptions
• Often very effective for large datasets
• Some methods can be used effectively in a black-box manner

Cons
• Can be difficult to impart prior information
• Typically lack interpretability
• Do not usually provide natural uncertainty estimates
19
Generative Machine Learning

• Generative approaches construct a probabilistic model to
explain how the data is generated
• For example, with labeled data D = {x_n, y_n}_{n=1}^N, we might
construct a model p(x, y; θ) of the form x_n ∼ p(x; θ),
y_n | x_n ∼ p(y | x = x_n; θ), where θ are the model parameters
• This in turn implies a predictive model
• Can also be generative about the model parameters θ:
e.g. with unsupervised data D = {x_n}_{n=1}^N, we can construct a
generative model p(θ, x), such that θ ∼ p(θ), x_n | θ ∼ p(x | θ)
(see the sampling sketch below).
• This is the foundation for Bayesian machine learning
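To make the generative viewpoint concrete, here is a minimal sketch of ancestral sampling from a model of the form θ ∼ p(θ), x_n | θ ∼ p(x|θ); the Gaussian prior and likelihood are illustrative assumptions, not part of any model in the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dataset(N):
    """Ancestral sampling from p(theta, x_1, ..., x_N) = p(theta) prod_n p(x_n | theta)."""
    theta = rng.normal(0.0, 2.0)           # theta ~ p(theta), a Gaussian prior
    x = rng.normal(theta, 1.0, size=N)     # x_n | theta ~ N(theta, 1)
    return theta, x

theta, data = sample_dataset(10)
print(theta)    # the latent parameter that generated this dataset
print(data)     # the generated datapoints
```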

20
Generative Machine Learning

Common approaches: Bayesian approaches, deep generative
models, mixture models
Pros
• Allow us to make stronger modeling assumptions and thus
incorporate more problem–specific expertise
• Provide explanation for how data was generated
• More interpretable
• Can provide additional information other than just prediction
• Many methods naturally provide uncertainty estimates
• Allow us to use Bayesian methods

21
Generative Machine Learning

Cons
• Can be difficult to construct—typically require problem
specific expertise
• Can impart unwanted assumptions—often less effective for
huge datasets
• Tackling an inherently more difficult problem than straight
prediction

22
The Bayesian Paradigm

22
Bayesian Probability is All About Belief

Frequentist Probability
The frequentist interpretation of probability is that it is the average
proportion of the time an event will occur if a trial is repeated
infinitely many times.

Bayesian Probability
The Bayesian interpretation of probability is that it is the
subjective belief that an event will occur in the presence of
incomplete information

23
Bayesianism vs Frequentism

https://xkcd.com/1132/ 24
Bayesianism vs Frequentism

Warning
Bayesianism has its shortfalls too—see the course notes

24
The Basic Laws of Probability

We can derive most of Bayesian statistics from two rules:

The Product Rule


The probability of two events occurring is the probability of one of
the events occurring times the conditional probability of the other
event happening given the first event happened:

P(A, B) = P(A|B)P(B) = P(B|A)P(A) (2)

The Sum Rule


The probability that either A or B occurs, P(A ∪ B), is given by

P(A ∪ B) = P(A) + P(B) − P(A, B). (3)
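Both rules can be checked numerically on a small discrete joint distribution; the probability table below is made up purely for illustration.

```python
import numpy as np

# A made-up joint distribution over two binary events A and B:
# p_joint[a, b] = P(A = a, B = b).
p_joint = np.array([[0.10, 0.20],
                    [0.30, 0.40]])

p_A = p_joint.sum(axis=1)          # P(A), marginalizing out B
p_B = p_joint.sum(axis=0)          # P(B), marginalizing out A

# Product rule: P(A, B) = P(A | B) P(B).
p_A_given_B = p_joint / p_B        # column b gives P(A | B = b)
assert np.allclose(p_A_given_B * p_B, p_joint)

# Sum rule: P(A=1 or B=1) = P(A=1) + P(B=1) - P(A=1, B=1).
p_union = p_A[1] + p_B[1] - p_joint[1, 1]
assert np.isclose(p_union, 1.0 - p_joint[0, 0])   # complement of neither occurring
```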

25
Bayes’ Rule

p(B|A) = p(A|B) p(B) / p(A)

This follows directly from rearranging the two forms of the product rule (2).

26
Using Bayes’ Rule

• Encode initial belief about parameters θ using a prior p(θ)


• Characterize how likely different values of θ are to have given
rise to observed data D using a likelihood function p(D|θ)
• Combine these to give the posterior, p(θ|D), using Bayes’ rule
(a numerical sketch follows this slide):

p(θ|D) = p(D|θ) p(θ) / p(D)    (4)

• This represents our updated belief about θ once the
information from the data has been incorporated
• Finding the posterior is known as Bayesian inference
• p(D) = ∫ p(D|θ) p(θ) dθ is a normalization constant known as
the marginal likelihood or model evidence
• This does not depend on θ so we have

p(θ|D) ∝ p(D|θ) p(θ)    (5)
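A minimal numerical sketch of Bayes' rule on a grid, using a biased coin (the course notes contain a proper worked example of this model; the uniform prior and the data below are made-up illustrative choices):

```python
import numpy as np

# Grid of candidate values for theta, the probability of heads.
theta = np.linspace(0.0, 1.0, 1001)

# Prior p(theta): uniform over [0, 1] (an arbitrary illustrative choice).
prior = np.ones_like(theta)
prior /= prior.sum()

# Data D: say we observed 7 heads out of 10 flips (made-up numbers).
heads, flips = 7, 10
likelihood = theta**heads * (1.0 - theta)**(flips - heads)   # p(D | theta)

# Bayes' rule: posterior proportional to likelihood times prior.
unnormalised = likelihood * prior
posterior = unnormalised / unnormalised.sum()   # dividing by a grid estimate of p(D)

print(theta[np.argmax(posterior)])              # posterior mode, approx 0.7
print(np.sum(theta * posterior))                # posterior mean, approx 0.67
```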
27
Multiple Observations: Using the Posterior as the Prior

• One of the key characteristics of Bayes’ rule is that it is
self-similar under multiple observations
• We can use the posterior after our first observation as the
prior when considering the next:

p(θ|D1, D2) = p(D2|θ, D1) p(θ|D1) / p(D2|D1)    (6)
            = p(D2|θ, D1) p(D1|θ) p(θ) / (p(D2|D1) p(D1))    (7)
            = p(D1, D2|θ) p(θ) / p(D1, D2)    (8)

• We can think of this as continuous updating of beliefs as
we receive more information (see the sketch below)
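Continuing the biased-coin grid sketch (again with made-up data), we can check numerically that sequential updating matches conditioning on all the data at once:

```python
import numpy as np

theta = np.linspace(0.0, 1.0, 1001)
prior = np.ones_like(theta) / theta.size          # uniform prior p(theta)

def bernoulli_lik(heads, flips):
    """Likelihood p(D | theta) for observing `heads` heads in `flips` flips."""
    return theta**heads * (1.0 - theta)**(flips - heads)

def update(prior, likelihood):
    """One application of Bayes' rule on the grid."""
    post = likelihood * prior
    return post / post.sum()

# Sequential: the posterior after D1 becomes the prior for D2.
post_1 = update(prior, bernoulli_lik(3, 5))        # D1: 3 heads in 5 flips
post_12_seq = update(post_1, bernoulli_lik(4, 5))  # D2: 4 heads in 5 flips

# Batch: condition on D1 and D2 together (7 heads in 10 flips).
post_12_batch = update(prior, bernoulli_lik(7, 10))

assert np.allclose(post_12_seq, post_12_batch)
```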
28
Example: Positive Cancer Test

We have just had a result back from the doctor for a cancer
screen and it is positive. How worried should we be given
the test isn’t perfect?

29
Example: Positive Cancer Test (2)

Before these results came in, the chance of us having this type of
cancer was quite low: 1/1000. Letting θ = 1 represent us having
cancer, our prior is p(θ = 1) = 1/1000.
For people who do have cancer, the test is 99.9% accurate.
Denoting the event of the test returning positive as D = 1, we
thus have p(D = 1|θ = 1) = 999/1000.
For people who do not have cancer, the test is 99% accurate. We
thus have p(D = 1|θ = 0) = 1/100.
Our prospects might seem quite grim at this point given how
accurate the test is.

30
Example: Positive Cancer Test (3)

To figure out the chance we have cancer properly though, we now
need to apply Bayes’ rule:

p(θ = 1|D = 1) = p(D = 1|θ = 1) p(θ = 1) / p(D = 1)
               = p(D = 1|θ = 1) p(θ = 1) / [p(D = 1|θ = 1) p(θ = 1) + p(D = 1|θ = 0) p(θ = 0)]
               = (0.999 × 0.001) / (0.999 × 0.001 + 0.01 × 0.999)
               = 1/11

So the chances are that we actually don’t have cancer!
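A quick check of this arithmetic in Python, using the numbers from the previous slides:

```python
prior = 0.001              # p(theta = 1): prior probability of having cancer
lik_pos_cancer = 0.999     # p(D = 1 | theta = 1)
lik_pos_healthy = 0.01     # p(D = 1 | theta = 0)

evidence = lik_pos_cancer * prior + lik_pos_healthy * (1 - prior)   # p(D = 1)
posterior = lik_pos_cancer * prior / evidence                       # p(theta = 1 | D = 1)
print(posterior)           # approx 0.0909, i.e. 1/11
```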

31
Alternative Viewpoint

An alternative (equivalent) viewpoint for Bayesian reasoning is that
we first define a joint model over parameters and data: p(θ, D).
We then condition this model on the data taking its observed
value, i.e. we fix D.
This produces the posterior p(θ|D) by simply normalizing the joint to
be a valid probability distribution, i.e. the posterior is proportional
to the joint for a fixed D:

p(θ|D) ∝ p(θ, D) (9)

32
How Might we Write a System to Break Captchas?

[Figure: pseudo-algorithm and a sample captcha, showing latent variables
such as letter identities, positions, and noise (displacement field, stroke)]

33
Simulating Captchas is Much Easier

[Figure: a captcha image (“gxs2rRj”) together with its latent variables;
generation maps latent variables to the image, while inference maps the
image back to latent variables]

[Le, Baydin, and Wood. Inference Compilation and Universal Probabilistic Programming]
34
The Bayesian Pipeline

Prior p(θ) + Likelihood p(D|θ) + Data D → Inference Method → Posterior p(θ|D) ∝ p(D|θ) p(θ)

35
Breaking Captchas with Bayesian Models

https://youtu.be/ZTKx4TaqNrQ?t=9

2
TA Le, A G Baydin, and F Wood. “Inference Compilation and Universal Probabilistic Programming”. In:
AISTATS. 2017.

36
Making Predictions

• Prediction in Bayesian models is done using the posterior
predictive distribution
• This is defined by taking the expectation of a predictive model
for new data, p(D∗|θ, D), with respect to the posterior:

p(D∗|D) = ∫ p(D∗, θ|D) dθ    (10)
        = ∫ p(D∗|θ, D) p(θ|D) dθ    (11)
        = E_{p(θ|D)}[p(D∗|θ, D)].    (12)

• This is often done dependent on an input point, i.e. we actually
calculate p(y|D, x) = E_{p(θ|D)}[p(y|θ, D, x)]
(a Monte Carlo sketch follows below)
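A minimal sketch of approximating this expectation by Monte Carlo. The posterior samples and the Gaussian likelihood below are stand-ins chosen purely for illustration; in practice the samples would come from an inference method.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Pretend these are S samples theta_s ~ p(theta | D), e.g. from an inference method.
posterior_samples = rng.normal(loc=1.0, scale=0.3, size=5000)

def likelihood_new(d_star, theta):
    """p(D* | theta), here taken to be N(D*; theta, 1) purely for illustration."""
    return norm.pdf(d_star, loc=theta, scale=1.0)

# Posterior predictive density at a new datapoint D*:
# p(D* | D) = E_{p(theta|D)}[ p(D* | theta) ], approximated by an average over samples.
d_star = 0.5
p_pred = np.mean(likelihood_new(d_star, posterior_samples))
print(p_pred)
```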

37
Making Predictions (2)

Points of Note
• We usually assume that p(D∗ |θ, D) = p(D∗ |θ), i.e. data is
conditionally independent given θ
• p(D∗ |θ) is equivalent to the likelihood model of the new data:
in almost all cases we just use the likelihood from the original
model
• Calculating the posterior predictive can be computationally
challenging: sometimes we resort to approximations,
e.g. taking a point estimate for θ (see Lecture 4)
• There are lots of things we might use the posterior for other
than just calculating the posterior predictive, e.g. making
decisions (see course notes) and calculating expectations

38
Recap

• Supervised learning has access to outputs, unsupervised
learning does not
• Discriminative methods try to directly make predictions, while
generative methods try to explain how the data is generated
• Bayesian machine learning is a generative approach that
allows us to incorporate uncertainty and information from
prior expertise
• Bayes’ rule: p(θ|D) ∝ p(D|θ)p(θ)
• Posterior predictive: p(D∗|D) = E_{p(θ|D)}[p(D∗|θ, D)]

39
Further Reading

• Look at the course notes! For this lecture there is a discussion
of Bayesian vs frequentist approaches, and a worked example
of Bayesian modeling for a biased coin.
• Chapter 1 of K P Murphy. Machine learning: a probabilistic
perspective. 2012. https://www.cs.ubc.ca/~murphyk/MLbook/pml-intro-22may12.pdf.
• L Breiman. “Statistical modeling: The two cultures”. In:
Statistical science (2001)
• Chapter 1 of C Robert. The Bayesian choice: from
decision-theoretic foundations to computational
implementation. 2007. https://www.researchgate.net/publication/41222434_The_Bayesian_Choice_From_Decision_Theoretic_Foundations_to_Computational_Implementation.

• Michael I Jordan. Are you a Bayesian or a frequentist? Video
lecture, 2009. http://videolectures.net/mlss09uk_jordan_bfway/
40
