
Course Logistics and Introduction to Probabilistic Machine Learning


CS772A: Probabilistic Machine Learning
Piyush Rai
Course Logistics
▪ Course Name: Probabilistic Machine Learning – CS772A
▪ 2 classes each week
▪ Mon/Thur 18:00-19:15
▪ Venue: RM-101

▪ Attendance policy: None, but biometric attendance will be taken

▪ All material (readings, etc.) will be posted on the course webpage (internal access)
▪ URL: https://web.cse.iitk.ac.in/users/piyush/courses/pml_spring25/pml.html

▪ Q/A and announcements on Piazza. Please sign up
▪ URL: https://piazza.com/iitk.ac.in/secondsemester2025/cs772

▪ If you need to contact me by email ([email protected]), prefix the subject line with “CS772”

▪ Unofficial auditors are welcome


Workload and Grading Policy
▪ 3 quizzes: 30%
▪ In class, closed-book

▪ Mid-sem exam: 20% (date as per DOAA schedule). Closed book

▪ End-sem exam: 30% (date as per DOAA schedule). Closed book

▪ Research project (to be done in groups of 4-5): 20%


▪ Some topics will be suggested (research papers)
▪ You can propose your own topic (but must be related to probabilistic ML)
▪ More details will be shared soon

▪ Proration: If you miss any quiz/mid-sem, we can prorate it using end-sem marks
▪ Proration only allowed on limited grounds (e.g., health related)
Textbooks and Readings
▪ Some books that you may use as references (freely available online)
▪ Kevin P. Murphy, Probabilistic Machine Learning: An Introduction (PML-1), The MIT Press, 2022.
▪ Kevin P. Murphy, Probabilistic Machine Learning: Advanced Topics (PML-2), The MIT Press, 2023.
▪ Chris Bishop, Pattern Recognition and Machine Learning (PRML), Springer, 2007.
▪ Chris Bishop and Hugh Bishop, Deep Learning: Foundations and Concepts (DLFC), Springer, 2023.

▪ Follow the suggested readings for each lecture (which may also include portions from these books), rather than trying to read the books in a linear fashion
Probabilistic Machine Learning
▪ Machine Learning primarily deals with
▪ Predicting the output 𝑦∗ for a new (test) input 𝒙∗ , given training data 𝑿, 𝒚 = {(𝒙𝑖 , 𝑦𝑖 )}_{𝑖=1}^𝑁
▪ Generating new (synthetic) data, given some training data 𝑿 = {𝒙𝑖 }_{𝑖=1}^𝑁
▪ Probabilistic ML gives a natural way to solve both these tasks (with some advantages)
▪ Prediction: Learning the predictive distribution 𝑝(𝑦∗ | 𝒙∗ , 𝑿, 𝒚)
(Using this, we can get not only the mean but also the variance, i.e., uncertainty, of the predicted output 𝑦∗ )
▪ Generation: Learning a generative model of data 𝑝(𝒙∗ | 𝑿)
(Can “sample” (simulate) from this distribution to generate new data)
▪ Note: Both are conditional distributions. PML is about estimating these distributions accurately and efficiently; estimating them exactly is hard in general, but we can use approximations
▪ At its core, both problems require estimating the underlying distribution of the data (a small numeric sketch of the predictive case follows below)
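To make the predictive distribution 𝑝(𝑦∗ | 𝒙∗ , 𝑿, 𝒚) concrete, here is a minimal Python sketch (not from the lecture; the linear model, the posterior samples, and all numbers are assumptions) of how averaging predictions over many plausible values of 𝜃 yields both a predictive mean and a predictive variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: pretend we already have S samples of a scalar slope
# parameter theta drawn from p(theta | X, y) for the model y = theta * x + noise.
theta_samples = rng.normal(loc=2.0, scale=0.3, size=1000)
noise_std = 0.5          # assumed (known) observation noise std-dev
x_star = 1.5             # a new test input

# Each theta gives a Gaussian p(y* | x*, theta) = N(theta * x*, noise_std^2);
# averaging over the theta samples approximates p(y* | x*, X, y).
means_per_theta = theta_samples * x_star
pred_mean = means_per_theta.mean()
# Law of total variance: parameter uncertainty + inherent observation noise
pred_var = means_per_theta.var() + noise_std**2

print("predictive mean:", pred_mean)            # about 3.0
print("predictive std :", np.sqrt(pred_var))    # larger than noise_std alone
```

The extra variance beyond the noise term is the contribution of parameter uncertainty, which a single point estimate of 𝜃 would miss.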
Probabilistic Machine Learning
▪ With a probabilistic approach to ML, we can also easily incorporate “domain knowledge”

▪ Can specify our assumptions about the data using suitable probability distributions over inputs/outputs, usually in one of these forms:
▪ 𝑝(𝑦𝑛 | 𝑥𝑛 , 𝜃): probability distribution of the output as a function of the input
▪ 𝑝(𝑥𝑛 | 𝑦𝑛 , 𝜃): distribution of the input conditioned on its “label/output”
▪ 𝑝(𝑥𝑛 | 𝜃): distribution of the inputs
Here 𝜃 denotes the unknown parameters of the distribution

▪ Can specify our assumptions about the unknowns 𝜃 using a “prior distribution” 𝑝(𝜃), which represents our belief about the unknown parameters before we see the data

▪ After seeing some data 𝒟, we can update the prior into a posterior distribution 𝑝(𝜃|𝒟) (a minimal sketch of such an update follows below)
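As a concrete, hypothetical illustration of updating a prior 𝑝(𝜃) into a posterior 𝑝(𝜃|𝒟), here is a minimal sketch using the conjugate Beta-Bernoulli model; the model choice and the numbers are assumptions for illustration, not part of the lecture:

```python
from scipy import stats

# Prior p(theta): Beta(2, 2) belief about a coin's head probability theta,
# held before seeing any data.
a0, b0 = 2.0, 2.0

# Observed data D: 7 heads and 3 tails in 10 tosses.
heads, tails = 7, 3

# For a Bernoulli likelihood with a Beta prior, the posterior p(theta | D)
# is again a Beta, with the observed counts added to the prior's parameters.
posterior = stats.beta(a0 + heads, b0 + tails)

print("posterior mean:", posterior.mean())              # (2+7)/(2+2+10) ~ 0.64
print("95% credible interval:", posterior.interval(0.95))
```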
The Core of PML: Two Basic Rules of Probability
▪ Sum Rule (marginalization): distribution of 𝑎, accounting for all possibilities of 𝑏
▪ If 𝑏 is a discrete r.v.: 𝑝(𝑎) = Σ_𝑏 𝑝(𝑎, 𝑏)
▪ If 𝑏 is a continuous r.v.: 𝑝(𝑎) = ∫ 𝑝(𝑎, 𝑏) 𝑑𝑏
▪ Product Rule: 𝑝(𝑎, 𝑏) = 𝑝(𝑎) 𝑝(𝑏|𝑎) = 𝑝(𝑏) 𝑝(𝑎|𝑏)
▪ These two rules are the core of most of probabilistic/Bayesian ML (a small numeric check appears below)
▪ Bayes rule is easily derived from the sum and product rules (written here assuming 𝑏 is a continuous r.v.):
𝑝(𝑏|𝑎) = 𝑝(𝑏) 𝑝(𝑎|𝑏) / 𝑝(𝑎) = 𝑝(𝑏) 𝑝(𝑎|𝑏) / ∫ 𝑝(𝑎, 𝑏) 𝑑𝑏
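A minimal numeric sanity check of these rules for a toy discrete case (all probabilities below are made up for illustration):

```python
import numpy as np

p_b = np.array([0.3, 0.7])            # p(b) for two states of b
p_a_given_b = np.array([[0.9, 0.1],   # p(a | b=0) for two states of a
                        [0.2, 0.8]])  # p(a | b=1)

# Product rule: p(a, b) = p(b) * p(a | b)
p_ab = p_b[:, None] * p_a_given_b     # rows index b, columns index a

# Sum rule (marginalization): p(a) = sum_b p(a, b)
p_a = p_ab.sum(axis=0)

# Bayes rule: p(b | a) = p(b) p(a | b) / p(a)
p_b_given_a = p_ab / p_a[None, :]

print("p(a)         :", p_a)                   # [0.41, 0.59]
print("p(b | a = 0) :", p_b_given_a[:, 0])     # sums to 1
```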

ML and Uncertainty
(and how PML handles uncertainty)

Uncertainty due to Limited Training Data
▪ Model/parameter uncertainty is due to not having enough training data
[Figure: left panel shows a single model class (linear models) with uncertainty about the weights; right panel shows uncertainty not just about the weights but also about the model class itself, with 3 different model classes considered (linear, polynomial, and circular decision boundaries), each of which also has weight uncertainty (like the left figure) since there isn't enough training data. Image credit: Balaji L, Dustin T, Jasper N. (NeurIPS 2020 tutorial)]

▪ Also called epistemic uncertainty. Usually reducible
▪ Vanishes with “sufficient” training data
Uncertainty due to Inherent Noise in Training Data
▪ Data uncertainty can be due to various reasons, e.g.,
▪ Intrinsic hardness of labeling, class overlap
▪ Labeling errors/disagreements (for difficult training inputs)
▪ Noisy or missing features
[Image credits: Eric Nalisnick; “Improving machine classification using human uncertainty measurements” (Battleday et al., 2021)]

▪ Also called aleatoric uncertainty. Usually irreducible
▪ Won't vanish even with infinite training data
▪ Note: Can sometimes vanish by adding more features (figure on the right) or switching to a more complex model
[Image source: “Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods” (H&W, 2021)]
How to Estimate Uncertainty?
(In this course, we will mostly focus on the Bayesian approach, but the other two approaches below are also popular and will also be discussed)

▪ Uncertainty in parameters: This can be estimated/quantified in mainly three ways:
▪ Bayesian way: Treat the params as random variables and estimate their distribution conditioned on the given training data, i.e., the posterior distribution 𝑝(𝜃|𝒟) (illustrated in the slide for a 2-dim 𝜃 with coordinates 𝜃1 , 𝜃2 )
▪ Frequentist way: Treat the params as fixed unknowns and estimate them using multiple datasets, i.e., sample multiple training sets and estimate the parameters 𝜃 (1) , 𝜃 (2) , … , 𝜃 (𝑆) from each training set. This yields a set/distribution over the params (not a “posterior”, but a distribution nevertheless!)
▪ Ensemble: Train the same model with 𝑆 different initializations or different subsets of the training data. Each run will give a different estimate, so we get a set of param estimates (see the sketch below)

▪ Uncertainty in predictions: Usually estimated by computing and reporting the mean and variance of predictions made using many possible values of 𝜃. Commonly reported as the predictive distribution 𝑝(𝑦∗ |𝑥∗ , 𝒟) (from which we can get both the mean and the variance/quantiles of the prediction), or as sets/intervals of possible predictions
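A minimal sketch of the ensemble route (the toy data, the polynomial model, and the ensemble size are all assumptions made for illustration): fit the same model on bootstrap subsets of the training data, then report the mean and spread of the resulting predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-d regression data
X = rng.uniform(-3, 3, size=50)
y = np.sin(X) + 0.1 * rng.normal(size=50)

x_test = np.array([0.0, 2.5, 5.0])               # 5.0 lies outside the training range
S = 20                                           # number of ensemble members
preds = []
for _ in range(S):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample of the data
    coeffs = np.polyfit(X[idx], y[idx], deg=3)   # refit a cubic on this subset
    preds.append(np.polyval(coeffs, x_test))

preds = np.array(preds)                          # shape (S, len(x_test))
print("predictive mean:", preds.mean(axis=0))
print("predictive std :", preds.std(axis=0))     # grows away from the training data
```

Here the spread across ensemble members plays the role of parameter uncertainty; the Bayesian posterior 𝑝(𝜃|𝒟) would deliver the same kind of information from a single training set.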
Predictive Uncertainty
▪ Information about uncertainty gives an idea of how much to trust a prediction
▪ It can also “guide” us in sequential decision-making. For a test input 𝑥∗ and test output 𝑦∗ , the predictive distribution may look like
𝑝(𝑦∗ | 𝑥∗ , 𝒟) = 𝒩(𝑦∗ | 𝜇∗ , 𝜎∗²)
[Figure: a regression function fit to training data; the blue curve is the mean of the function learned so far using the available data, and the shaded region denotes the current predictive uncertainty]
▪ Given our current estimate of the regression function, which training input(s) should we add next to improve the estimate the most? Uncertainty can help here: acquire training inputs from regions where the function is most uncertain about its current predictions (a minimal sketch of this idea follows below)

▪ Applications in active learning, reinforcement learning, Bayesian optimization, etc.
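A minimal sketch of that acquisition idea, with made-up candidate inputs and predictive standard deviations (in practice these would come from the model's predictive distribution):

```python
import numpy as np

candidate_inputs = np.array([0.5, 1.5, 3.0, 4.5])
predictive_std = np.array([0.10, 0.40, 0.90, 0.30])   # sigma* at each candidate input

# Acquire the label of the input where the current predictions are most uncertain
next_query = candidate_inputs[np.argmax(predictive_std)]
print("next input to label:", next_query)              # 3.0
```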
Generative Models
▪ PML is not just about parameter/predictive uncertainty

▪ Generative models are invariably also probabilistic models

▪ Learning such models will also be a topic of study in this course (a tiny illustrative sketch follows below)

[Figure credit: Lilian Weng]
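As a toy illustration (assumed numbers; a deliberately simple stand-in for the deep generative models the figure refers to), “learning a generative model” can be as simple as fitting 𝑝(𝒙|𝜃) to training data and then sampling from the fitted distribution to generate new data:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training data": 200 points from an unknown 1-d distribution
X_train = rng.normal(loc=5.0, scale=2.0, size=200)

# Fit a Gaussian p(x | theta) by maximum likelihood: theta = (mu, sigma)
mu_hat, sigma_hat = X_train.mean(), X_train.std()

# Generation: sample new synthetic data from the fitted model
X_new = rng.normal(loc=mu_hat, scale=sigma_hat, size=5)
print("generated samples:", X_new)
```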
Probabilistic Modeling of Data: The Setup
▪ We are given some training data 𝒟
▪ For supervised learning, 𝒟 contains 𝑁 input-label pairs {(𝒙𝑖 , 𝑦𝑖 )}_{𝑖=1}^𝑁
▪ For unsupervised learning, 𝒟 contains 𝑁 inputs {𝒙𝑖 }_{𝑖=1}^𝑁
▪ Other settings are also possible (e.g., semi-supervised learning, reinforcement learning, etc.)
▪ Assume that the observations are generated by a probability distribution
▪ For now, assume the form of the distribution to be known (e.g., a Gaussian)
▪ The parameters of this distribution, collectively denoted by 𝜃, are unknown
▪ Our goal is to estimate the distribution (and thus 𝜃) using the training data
▪ Once the distribution is estimated, we can do things such as
▪ Predict labels of new inputs, along with our confidence in these predictions
▪ Generate new data with similar properties as the training data
▪ … and many other useful tasks, e.g., detecting outliers
Probabilistic Modeling of Data: The Setup
▪ We will denote the data distribution as 𝑝𝜃 (𝒟) or 𝑝(𝒟|𝜃)
▪ Assume that, conditioned on 𝜃, the observations are independently and identically distributed (the i.i.d. assumption). Depending on the problem, this may look like one of the following (a numeric sketch of the unsupervised case follows below):
▪ Supervised generative model (both inputs and outputs are modeled using a distribution): (𝒙𝑛 , 𝑦𝑛 ) ~ 𝑝(𝒙, 𝑦|𝜃) i.i.d., so 𝑝(𝒟|𝜃) = ∏_{𝑖=1}^𝑁 𝑝(𝒙𝑖 , 𝑦𝑖 |𝜃)
▪ Supervised discriminative model (only the output is modeled using a distribution; the input is assumed “given” and not modeled): 𝑦𝑛 ~ 𝑝(𝑦|𝒙, 𝜃) i.i.d., so 𝑝(𝒟|𝜃) = ∏_{𝑖=1}^𝑁 𝑝(𝑦𝑖 |𝒙𝑖 , 𝜃)
▪ Unsupervised generative model (there are only inputs; no labels): 𝒙𝑛 ~ 𝑝(𝒙|𝜃) i.i.d., so 𝑝(𝒟|𝜃) = ∏_{𝑖=1}^𝑁 𝑝(𝒙𝑖 |𝜃)

▪ Assume that both training and test data come from the same distribution
▪ This assumption, although standard, may be violated in real-world applications of ML, and there are “adaptation” methods to handle that
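A tiny numeric sketch of the factorized likelihood for the unsupervised case (the Gaussian model and the numbers are assumptions): under the i.i.d. assumption, the log of 𝑝(𝒟|𝜃) is just a sum of per-observation log densities.

```python
import numpy as np
from scipy import stats

data = np.array([1.2, 0.7, 1.9, 1.4, 0.9])   # training observations D
mu, sigma = 1.0, 0.5                          # one candidate value of theta

# log p(D | theta) = sum_i log p(x_i | theta), by the i.i.d. assumption
log_lik = stats.norm(loc=mu, scale=sigma).logpdf(data).sum()
print("log p(D | theta) =", log_lik)
```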
