Lecture 6

MACHINE LEARNING (CS 403/603)

Probabilistic modelling: Introduction

Dr. Puneet Gupta


Univariate Gaussian Distribution
[Figure: Gaussian density curves over x ∈ [-5, 5] for (μ = 0, σ = 1), (μ = 0, σ = 2), (μ = 2, σ = 2), and (μ = 2, σ = 0.5).]

● Distribution over a real-valued scalar random variable x.
● Defined by a scalar mean μ and a scalar variance σ².
● Useful in regression with a single input.
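To make the density concrete, here is a minimal sketch that evaluates the univariate Gaussian pdf directly from its formula (the function name `gaussian_pdf` is illustrative, not from the slides):

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The peak height at the mean scales as 1/sigma: a smaller sigma gives
# a taller, narrower bell (cf. the sigma = 0.5 vs sigma = 2 curves).
print(gaussian_pdf(0.0, mu=0.0, sigma=1.0))   # ~0.3989
print(gaussian_pdf(2.0, mu=2.0, sigma=0.5))   # ~0.7979
```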
Linear Regression: Probabilistic View

[Figure: two panels plotting yn against xn, each showing the model prediction wᵀxn and the conditional distribution p(y|w,x) around it.]

Previously, we saw that linear (or ridge) regression tries to minimize the sum of squared errors. Let's instead maximize the likelihood; the resulting estimates will eventually reduce the same error.
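The link between the two views can be sketched numerically. Assuming Gaussian noise, yn ~ N(wᵀxn, σ²), the negative log-likelihood is the sum of squared errors divided by 2σ² plus a constant, so maximizing the likelihood and minimizing the squared error pick the same w (names below are illustrative):

```python
import math

def neg_log_likelihood(y, y_hat, sigma=1.0):
    """NLL of the data under y_n ~ N(y_hat_n, sigma^2):
    SSE / (2 sigma^2) plus a w-independent constant."""
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return sse / (2 * sigma ** 2) + len(y) * math.log(sigma * math.sqrt(2 * math.pi))

y     = [1.0, 2.0, 3.0]
pred1 = [1.1, 1.9, 3.2]   # lower sum of squared errors ...
pred2 = [0.5, 2.5, 2.0]   # ... higher sum of squared errors

# The NLL ranks predictions exactly as the squared error does.
assert neg_log_likelihood(y, pred1) < neg_log_likelihood(y, pred2)
```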
Multivariate Gaussian Distribution

[Figure: heatmaps of two bivariate Gaussian densities, p(x1) and p(x2); exercise: identify which density corresponds to which heatmap.]
Probabilistic Modeling of Data
Why probabilistic modeling?
● Parameter Estimation: Find θ using the input data.
● Prediction: Make predictions on new test data by formulating the task as supervised or unsupervised learning.

Assumption:
● Training data is given by y = {y1, y2, ..., yN} and, for simplicity, there is no x.
● It is obtained from a probability model: yn ∼ p(y|θ), ∀n, where θ is an unknown parameter.
● The data is independently and identically distributed (i.i.d.).
● One such case is a sequence of N coin-toss outcomes. Each outcome is a binary random variable: Head = 1, Tail = 0.

The likelihood is a function of θ, but how can the best θ be estimated?


Parameter Estimation
Maximum Likelihood Estimation (MLE)
How to choose the best θ? Find the θ that fits the data most probably, i.e., the θ that maximizes the likelihood p(y|θ) w.r.t. θ.

[Figure: likelihood p(y|θ) as a function of θ, peaking at θMLE.]

In MLE, we maximize the log-likelihood rather than the likelihood because: i) it reduces numerical instabilities by converting a product into a sum; and ii) log is monotonic and thus does not affect the estimate.

● It is an optimization problem w.r.t. θ: differentiate the log-likelihood w.r.t. θ and set it to 0.
● The negative log-likelihood can be thought of as a loss function; MLE is then minimizing the training-data loss. Since MLE uses the training data without any regularization, it can overfit in the absence of abundant data.

Example: What probability distribution can model the N coin-toss outcomes?
● The Bernoulli distribution can model it.
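For the coin-toss example, setting the derivative of the Bernoulli log-likelihood to zero gives the familiar closed form θMLE = N1/N. A minimal sketch (the toss data and function names are illustrative):

```python
import math

def bernoulli_mle(tosses):
    """MLE for the Bernoulli parameter: theta_MLE = N1 / N."""
    return sum(tosses) / len(tosses)

def log_likelihood(theta, tosses):
    """Bernoulli log-likelihood: N1*log(theta) + N0*log(1 - theta)."""
    n1 = sum(tosses)
    n0 = len(tosses) - n1
    return n1 * math.log(theta) + n0 * math.log(1 - theta)

tosses = [1, 0, 1, 1, 0, 1, 1, 0]   # 5 heads, 3 tails
theta_mle = bernoulli_mle(tosses)   # 0.625 = N1 / N

# Sanity check: no grid point attains a higher log-likelihood than N1/N.
grid = [i / 100 for i in range(1, 100)]
assert all(log_likelihood(theta_mle, tosses) >= log_likelihood(t, tosses) for t in grid)
```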
Incorporating Prior Belief in Parameter Estimation
In probabilistic models, we can incorporate our prior belief by specifying a prior distribution p(θ) on the unknown θ.

[Figure: example of a prior distribution p(θ) over θ ∈ [0, 1].]

Why is it required?
● For specifying which values of θ are more likely than others.
● It also aids in regularization for θ (more on this soon).

The prior p(θ) can be combined with the likelihood p(y|θ) using Bayes' rule to get the posterior distribution p(θ|y) over θ.

What happens if the prior is a uniform distribution?
● It is the same as using no prior.

[Figure: likelihood p(y|θ), prior p(θ), and posterior p(θ|y) over θ, with θMAP and θMLE marked.]

θ can be estimated by MAP: maximizing the posterior probability p(θ|y) (i.e., finding the θ that is most likely given the data and our prior belief) rather than by MLE, which maximizes the likelihood.

θMLE and the prior p(θ) sort of attract each other to reach a final consensus.
Maximum-a-Posteriori (MAP) Estimation
● MAP is the same as MLE except for an additional log-prior term, which provides regularization on θ.
● When p(θ) is a uniform prior, MAP reduces to MLE.

Our Example
● The hyperparameters of p(θ), α and β, are the expected numbers of heads and tails, respectively, before tossing the coin.
● If α = β = 1, p(θ) is a uniform prior; hence there is no regularization and θMAP = θMLE.
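Concretely, for a Beta(α, β) prior the MAP estimate is the mode of the Beta(α + N1, β + N0) posterior, θMAP = (N1 + α − 1)/(N + α + β − 2). A minimal sketch (toss data and names are illustrative):

```python
def bernoulli_map(tosses, alpha=1.0, beta=1.0):
    """MAP estimate of theta under a Beta(alpha, beta) prior:
    the mode of the Beta(alpha + N1, beta + N0) posterior."""
    n1 = sum(tosses)
    n0 = len(tosses) - n1
    return (n1 + alpha - 1) / (n1 + n0 + alpha + beta - 2)

tosses = [1, 0, 1, 1, 0, 1, 1, 0]      # 5 heads, 3 tails
print(bernoulli_map(tosses))            # alpha = beta = 1 (uniform prior): 0.625, same as MLE
print(bernoulli_map(tosses, 3, 3))      # informative prior pulls the estimate toward 0.5
```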
How to choose hyperparameters?
These rules of thumb follow directly from the nature of the Bayesian analysis
procedure:
● If the prior is uninformative, the posterior is very much determined by the data
(the posterior is data-driven)

● If the prior is informative, the posterior is a mixture of the prior and the data

● The more informative the prior, the more data you need to "change" your
beliefs, so to speak because the posterior is very much driven by the prior
information

● If you have a lot of data, the data will dominate the posterior distribution (they
will overwhelm the prior)
How do we choose a prior that models an equi-probable, high-confidence belief about heads and tails?
● Observation: MAP estimation "pulls" the estimate toward the prior.
● The more focused our prior belief, the larger the pull toward the prior.
● By using larger values for α and β (but keeping them equal), we can narrow the peak of the Beta distribution around the value p = 0.5. This causes the MAP estimate to move closer to the prior.
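The "pull" can be seen directly from the MAP formula with a symmetric Beta(a, a) prior: as a grows, the estimate marches from N1/N toward 0.5. A small sketch under the same coin-toss counts as before (names are illustrative):

```python
def beta_map(n1, n0, a):
    """MAP of theta under a symmetric Beta(a, a) prior."""
    return (n1 + a - 1) / (n1 + n0 + 2 * a - 2)

n1, n0 = 5, 3                       # observed heads / tails
for a in [1, 5, 50, 500]:
    # As a increases, the prior narrows around 0.5 and the
    # estimate moves monotonically from 0.625 toward 0.5.
    print(a, beta_map(n1, n0, a))
```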
Fully Bayesian Inference
● Both MLE and MAP provide a single value of θ.
● Bayesian estimation calculates the full posterior distribution p(θ|y). We can then select a value that we consider best in some sense, as in MAP.

[Figure (MLE/MAP): likelihood p(y|θ) and prior p(θ) with the point estimates θMLE and θMAP marked. Figure (fully Bayesian): the full posterior p(θ|y), whose large variance indicates low confidence.]

Importance:
● The variance calculated for θ from its posterior distribution gives us some confidence in our prediction. If the variance is too large, we may declare that no good estimate of θ exists.
● Online learning: the old posterior becomes the new prior, so our belief about θ keeps getting updated as we see more and more data.

[Figure: two-step online update; at each step the previous posterior p(θ|y) serves as the prior p(θ) and is combined with the likelihood p(y|θ) of the new data.]

Bayesian estimation is made complex because now the denominator in Bayes' rule, a.k.a. the evidence, cannot be ignored.
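For the coin-toss model, online learning is particularly simple: each Beta posterior is the next step's prior, and streaming the tosses one at a time lands on exactly the same posterior as a single batch update. A minimal sketch (prior values and names are illustrative):

```python
def beta_update(alpha, beta, toss):
    """One Bayesian update: a Beta(alpha, beta) prior combined with a
    single Bernoulli outcome yields a Beta posterior, which then
    serves as the prior for the next observation."""
    return (alpha + toss, beta + (1 - toss))

alpha, beta = 2.0, 2.0             # prior belief
tosses = [1, 0, 1, 1, 0]
for t in tosses:                   # stream the data one toss at a time
    alpha, beta = beta_update(alpha, beta, t)

# Sequential updating matches the batch posterior Beta(a + N1, b + N0).
n1 = sum(tosses)
n0 = len(tosses) - n1
assert (alpha, beta) == (2.0 + n1, 2.0 + n0)
```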
Fully Bayesian Inference
For a given likelihood function, if we have a choice in how we express our prior beliefs, we should use the form that allows us to carry out the integration in the numerator of Bayes' rule. This is known as choosing a conjugate prior.

Example of coin toss

With a Bernoulli likelihood for the parameter θ and a Beta(α, β) prior (a conjugate pair), the posterior is Beta(α + N1, β + N0), where N1 and N0 are the numbers of heads and tails, respectively.
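The conjugacy claim can be checked numerically: prior × likelihood is proportional to the Beta(α + N1, β + N0) kernel, so their ratio is the same constant at every θ. A short sketch (the sample counts and names are illustrative):

```python
def beta_kernel(theta, a, b):
    """Unnormalized Beta(a, b) density: theta^(a-1) (1-theta)^(b-1)."""
    return theta ** (a - 1) * (1 - theta) ** (b - 1)

def bernoulli_lik(theta, n1, n0):
    """Bernoulli likelihood of n1 heads and n0 tails."""
    return theta ** n1 * (1 - theta) ** n0

a, b, n1, n0 = 2.0, 3.0, 5, 3
# prior * likelihood / posterior-kernel is constant in theta,
# i.e., the posterior really is Beta(a + n1, b + n0).
ratios = [bernoulli_lik(t, n1, n0) * beta_kernel(t, a, b)
          / beta_kernel(t, a + n1, b + n0)
          for t in (0.2, 0.5, 0.8)]
assert max(ratios) - min(ratios) < 1e-12
```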

Important point to remember for the Beta distribution: its normalizing constant B(α, β) is known as the Beta function, and p(x) is a Beta distribution with parameters α and β.

Details of the Beta and Gamma functions can be found at: http://math.feld.cvut.cz/mt/txtd/5/txe3da5h.htm


How to make predictions?
● Once θ is learned from the training data, it can be used to make predictions about the test set.
● E.g., predict the probability of the next toss being a head by analyzing the previous coin tosses.
● This can be accomplished using point estimates (MLE/MAP) or the full posterior.
● Hence, predictions in our coin-toss example are:
  • When doing MLE/MAP, we approximate the posterior p(θ|y) by a single point.
  • For fully Bayesian prediction, we compute the predictive distribution by averaging over the full posterior: calculate p(yN+1|θ) for each possible θ, weight it by how likely that θ is under the posterior p(θ|y), and sum all such posterior-weighted predictions. Note that not every value of θ is given equal importance in the averaging.
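For the Beta-Bernoulli pair this posterior-weighted average has a closed form: the probability that the next toss is a head equals the posterior mean, (α + N1)/(α + β + N). A minimal sketch comparing it with the plug-in point estimate (counts and names are illustrative):

```python
def predictive_heads(alpha, beta, n1, n0):
    """Fully Bayesian probability that the next toss is a head:
    the mean of the Beta(alpha + n1, beta + n0) posterior."""
    return (alpha + n1) / (alpha + beta + n1 + n0)

n1, n0 = 5, 3
theta_mle = n1 / (n1 + n0)                   # plug-in point estimate: 0.625
bayes = predictive_heads(2.0, 2.0, n1, n0)   # posterior-averaged: 7/12

# The Bayesian predictive is shrunk toward the prior mean of 0.5,
# whereas the plug-in estimate ignores posterior uncertainty.
print(theta_mle, bayes)
```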

Recall: How to make predictions?

Why is the delta function required? When using a point estimate (MLE/MAP), the posterior p(θ|y) is replaced by a delta function centered at the single estimated θ, so the predictive average collapses to plugging in that one value.

Here, the fully Bayesian approach to prediction averages over all possible values of θ, weighted by their respective posterior probabilities (easy in this example, but a hard problem in general).
● A probabilistic model is an intuitive and flexible way to model data, where the likelihood (corresponding to a loss function) and the prior (corresponding to a regularizer) are chosen based on the properties of the data.
● MLE and MAP estimation can be viewed as unregularized and regularized loss-function minimization, respectively.
