Introduction
E-mail: [email protected]
Website: https://www.csccm.in/
Important dates
So, if you don’t want to attend extra classes, prepare for the exam. Syllabus – second
chapter of Bishop.
• Machine learning is divided into two main types. In the supervised learning approach, the goal is to learn a mapping from inputs $\boldsymbol{x}$ to outputs $y$, given a labelled set of data $\mathcal{D} = \{(\boldsymbol{x}_i, y_i)\}_{i=1}^{N}$.
• In the unsupervised learning approach, we work with unlabelled data; in other words, we are not told what kind of pattern to look for.
• This is a more realistic scenario (from an AI point of view) and is more challenging.
Kaelbling, L., M. Littman, and A. Moore (1996). Reinforcement learning: A survey. Journal of AI Research 4, 237–285.
Sutton, R. and A. Barto (1998). Reinforcement Learning: An Introduction. MIT Press.
Russell, S. and P. Norvig (1995). Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice Hall.
Szepesvari, C. (2010). Algorithms for Reinforcement Learning. Morgan Claypool.
Wiering, M. and M. van Otterlo (Eds.) (2012). Reinforcement Learning: State of the Art.
• This corresponds to the most probable class label and is the mode of the distribution $p(y^* \mid \boldsymbol{x}^*, \mathcal{D})$. This is known as the maximum a posteriori (MAP) estimate.
• Point estimates are often not the best solution: what if $p(y^* = 1 \mid \boldsymbol{x}^*, \mathcal{D})$ is far from 1?
• IBM Watson beat the top human Jeopardy champion thanks to a module that estimates how confident it is in its answer.
• Google's SmartASS (ad selection system) predicts the probability (click-through rate, CTR) that you will click on an ad, based on your search history and other user- and ad-specific features. The CTR can be used to maximize expected profit.
• Ferrucci, D., E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. Prager, N. Schlaefer, and C. Welty (2010). Building Watson: An overview of the DeepQA project. AI Magazine, 59–79.
• Metz, C. (2010). Google behavioral ad targeter is a Smart Ass. The Register.
• Cheeseman, P., J. Kelly, M. Self, J. Stutz, W. Taylor, and D. Freeman (1988). AutoClass: A Bayesian classification system. In Proc. of the Fifth Intl. Workshop on Machine Learning.
• Lo, C. H. (2009). Statistical methods for high throughput genomics. Ph.D. thesis, UBC.
• Berkhin, P. (2006). A survey of clustering data mining techniques. In J. Kogan, C. Nicholas, and M. Teboulle (Eds.), Grouping Multidimensional Data: Recent Advances in Clustering, pp. 25–71. Springer.
• Our first goal is to estimate the distribution over the number of clusters, $p(K \mid \mathcal{D})$; this tells us whether there are subpopulations within the data.
• The second objective is to assign each data point to the corresponding cluster (hidden or latent variables),
$$z_i^* = \operatorname{argmax}_k \, p(z_i = k \mid \boldsymbol{x}_i, \mathcal{D})$$
• Picking a model of the right complexity (here, the number of clusters) is called model selection; a minimal sketch is given below.
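The slides reference MatLab/PMTK demos that are not reproduced here. As a rough illustration (not the course code), the following Python sketch uses scikit-learn's GaussianMixture on synthetic data, scoring candidate cluster counts with BIC as a simple stand-in for reasoning about $p(K \mid \mathcal{D})$, and then assigning each point to its most probable cluster.

```python
# Hypothetical sketch (not from the slides): cluster-count selection and
# cluster assignment with a Gaussian mixture model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data with two subpopulations.
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(5.0, 1.0, size=(100, 2))])

# Score candidate numbers of clusters K with BIC (lower is better);
# this is a point estimate of model complexity, not the full p(K | D).
models = {k: GaussianMixture(n_components=k, random_state=0).fit(X) for k in range(1, 6)}
bic = {k: m.bic(X) for k, m in models.items()}
best_k = min(bic, key=bic.get)

# z_i* = argmax_k p(z_i = k | x_i, D): most probable cluster for each point.
z_star = models[best_k].predict(X)
print("BIC per K:", bic)
print("chosen K:", best_k, "first assignments:", z_star[:5])
```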
Dimensionality Reduction
• Reduce the dimensionality by projecting the data to a lower-dimensional
subspace which captures the essence of the data.
• Latent factors: although the data may appear high-dimensional, there may only
be a small number of degrees of variability.
• Principal Components Analysis (PCA): a common approach to dimensionality reduction. Useful for visualization, nearest-neighbour searches, etc.; a minimal sketch follows below.
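A minimal PCA sketch via the singular value decomposition (a Python/numpy illustration on synthetic data, not the course's MatLab code):

```python
# Project data onto its first k principal components via the SVD.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))          # 200 samples, 5 features
Xc = X - X.mean(axis=0)                # centre the data

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
Z = Xc @ Vt[:k].T                      # low-dimensional projection (scores)
explained = S[:k] ** 2 / np.sum(S ** 2)
print("shape of projection:", Z.shape, "variance explained:", explained)
```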
• Let $E$ be a space of elementary events. Consider the power set $2^E$, and let $\mathcal{F} \subseteq 2^E$. Elements of $\mathcal{F}$ are called random events. If $\mathcal{F}$ satisfies the following properties, it is called a $\sigma$-algebra:
• $E \in \mathcal{F}$
• $A, B \in \mathcal{F} \Rightarrow A \setminus B \in \mathcal{F}$
• $A_1, A_2, \ldots \in \mathcal{F} \Rightarrow \bigcup_{i=1}^{\infty} A_i \in \mathcal{F} \ \wedge \ \bigcap_{i=1}^{\infty} A_i \in \mathcal{F}$
• If $\mathcal{F}$ is a $\sigma$-algebra, then its elements are called measurable sets and $(E, \mathcal{F})$ is called a measurable space or Borel space.
• Second axiom: this is the assumption of unit measure: the probability that at least one of the elementary events in the entire sample space occurs is 1,
$$\mathbb{P}(\Omega) = 1$$
• Third axiom (countable additivity): for any countable collection of pairwise disjoint events $E_1, E_2, \ldots$,
$$\mathbb{P}\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(E_i)$$
• $E \cup E^c = \Omega$
• $\mathbb{P}(E \cup F) = \mathbb{P}(E) + \mathbb{P}(F) - \mathbb{P}(E \cap F)$
• $\emptyset$ is called the empty set.
• A discrete random variable $X$ can take any value from a finite or countably infinite set $\mathcal{X}$.
• For example, on $\mathcal{X} = \{1, 2, 3, 4\}$, the uniform distribution has $f(x = k) = \frac{1}{4}$, whereas the degenerate distribution $f(x = k) = \mathbb{I}(x = 1)$ puts all its mass on $x = 1$.
• The probability that $X$ takes the value $x_i$ and $Y$ takes the value $y_j$ is written $\mathbb{P}(X = x_i, Y = y_j)$; this is the joint probability of $X = x_i$ and $Y = y_j$,
$$\mathbb{P}(X = x_i, Y = y_j) = \frac{n_{ij}}{N}$$
where $n_{ij}$ is the number of trials (out of $N$) in which $X = x_i$ and $Y = y_j$, with column and row totals $c_i = \sum_j n_{ij}$ and $r_j = \sum_i n_{ij}$.
• Sum rule:
$$\mathbb{P}(X = x_i) = \sum_j \mathbb{P}(X = x_i, Y = y_j) = \frac{\sum_j n_{ij}}{N} = \frac{c_i}{N}$$
For continuous variables, $\mathbb{P}(x) = \int \mathbb{P}(x, y)\, dy$.
• Product rule:
$$\mathbb{P}(X = x_i, Y = y_j) = \frac{n_{ij}}{N} = \frac{n_{ij}}{c_i} \cdot \frac{c_i}{N} = \mathbb{P}(Y = y_j \mid X = x_i)\, \mathbb{P}(X = x_i)$$
• Chain rule: repeated application of the product rule gives the chain rule,
$$\mathbb{P}(X_1, \ldots, X_N) = \mathbb{P}(X_1)\, \mathbb{P}(X_2 \mid X_1) \cdots \mathbb{P}(X_N \mid X_1, \ldots, X_{N-1})$$
• The most complex calculations in probability are nothing but repeated applications of the sum and product rules (see the sketch below).
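As a quick numerical illustration (a hypothetical joint table, not from the slides), the sum and product rules can be checked directly on a small discrete joint distribution:

```python
# Verify the sum and product rules on a small joint probability table.
import numpy as np

# P[i, j] = P(X = x_i, Y = y_j); rows index X, columns index Y.
P = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.05, 0.30]])
assert np.isclose(P.sum(), 1.0)

P_x = P.sum(axis=1)                        # sum rule: P(X = x_i) = sum_j P(x_i, y_j)
P_y_given_x = P / P_x[:, None]             # conditional: P(Y = y_j | X = x_i)
reconstructed = P_y_given_x * P_x[:, None]  # product rule: P(x_i, y_j) = P(y_j | x_i) P(x_i)

print("P(X):", P_x)
print("product rule recovers joint:", np.allclose(reconstructed, P))
```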
Conditional probability and Bayes’ rule
$$\mathbb{P}(X = x \mid Y = y) = \frac{\mathbb{P}(X = x, Y = y)}{\mathbb{P}(Y = y)} = \frac{\mathbb{P}(X = x)\,\mathbb{P}(Y = y \mid X = x)}{\sum_{x'} \mathbb{P}(X = x', Y = y)}$$
$$\mathbb{P}(X = x \mid Y = y) = \frac{\mathbb{P}(X = x)\,\mathbb{P}(Y = y \mid X = x)}{\sum_{x'} \mathbb{P}(X = x')\,\mathbb{P}(Y = y \mid X = x')}$$
• An example: a fruit ($F$, apple or orange) is drawn from one of two boxes ($B$, red or blue), with
$$\mathbb{P}(B = r) = 0.4, \qquad \mathbb{P}(B = b) = 0.6$$
$$\mathbb{P}(B = r \mid F = o) = \frac{\mathbb{P}(B = r)\,\mathbb{P}(F = o \mid B = r)}{\mathbb{P}(F = o)}$$
$$\mathbb{P}(F = o) = \mathbb{P}(B = r)\,\mathbb{P}(F = o \mid B = r) + \mathbb{P}(B = b)\,\mathbb{P}(F = o \mid B = b) = 0.4 \times \frac{6}{8} + 0.6 \times \frac{1}{4} = 0.45$$
$$\mathbb{P}(B = r \mid F = o) = \frac{0.4 \times 6/8}{0.45} = \frac{2}{3}$$
• Once it is observed that the selected fruit is an orange, the probability that it came from the red box increases from 0.4 to $2/3 \approx 0.67$.
• A test is available but not perfect: if a tested patient has the disease, 80% of the time the test will be positive, $\mathbb{P}(\text{Positive} \mid \text{TB}) = 0.80$. On the contrary, if a tested patient does not have the disease, 90% of the time the result is negative, $\mathbb{P}(\text{Negative} \mid \text{TB}^c) = 0.9$, i.e., $\mathbb{P}(\text{Positive} \mid \text{TB}^c) = 0.1$.
• Base rate fallacy: people assume that they have an 80% chance of having the disease, but they ignore the PRIOR knowledge (here, a prevalence of $\mathbb{P}(\text{TB}) = 0.004$).
$$\mathbb{P}(\text{TB} \mid \text{Positive}) = \frac{\mathbb{P}(\text{TB})\,\mathbb{P}(\text{Positive} \mid \text{TB})}{\mathbb{P}(\text{Positive})}$$
$$\mathbb{P}(\text{Positive}) = \mathbb{P}(\text{TB})\,\mathbb{P}(\text{Positive} \mid \text{TB}) + \mathbb{P}(\text{TB}^c)\,\mathbb{P}(\text{Positive} \mid \text{TB}^c) = 0.004 \times 0.8 + 0.996 \times 0.1 = 0.1028$$
$$\mathbb{P}(\text{TB} \mid \text{Positive}) = \frac{0.004 \times 0.8}{0.1028} \approx 0.031 = 3.1\%$$
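The same base-rate calculation, written out as a short Python check (the numbers are those from the slide):

```python
# Posterior probability of disease given a positive test (base rate fallacy).
p_tb = 0.004                 # prior prevalence
p_pos_given_tb = 0.80        # test sensitivity
p_pos_given_no_tb = 0.10     # false positive rate

p_pos = p_tb * p_pos_given_tb + (1 - p_tb) * p_pos_given_no_tb
p_tb_given_pos = p_tb * p_pos_given_tb / p_pos
print(f"P(Positive) = {p_pos:.4f}, P(TB | Positive) = {p_tb_given_pos:.3f}")
# Prints roughly P(Positive) = 0.1028, P(TB | Positive) = 0.031
```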
• Consider a box containing four balls numbered 1, 2, 3, 4, from which one ball is drawn at random. Now consider the following events:
• Event 1: ball 1 or 2 is drawn
• Event 2: ball 2 or 3 is drawn
• Event 3: ball 1 or 3 is drawn.
• Note that each event has probability 1/2, and
$$\mathbb{P}(E_1, E_2) = \frac{1}{4} = \mathbb{P}(E_1)\,\mathbb{P}(E_2) = \frac{1}{2} \cdot \frac{1}{2}, \qquad \mathbb{P}(E_2, E_3) = \frac{1}{4} = \mathbb{P}(E_2)\,\mathbb{P}(E_3) = \frac{1}{2} \cdot \frac{1}{2}$$
$$\mathbb{P}(E_1, E_3) = \frac{1}{4} = \mathbb{P}(E_1)\,\mathbb{P}(E_3) = \frac{1}{2} \cdot \frac{1}{2}$$
• However,
$$\mathbb{P}(E_1, E_2, E_3) = 0 \neq \mathbb{P}(E_1)\,\mathbb{P}(E_2)\,\mathbb{P}(E_3) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8}$$
• Pairwise independence does not ensure mutual independence (a numerical check is sketched below).
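The four-ball example can be verified by enumeration (a small Python sketch):

```python
# Pairwise independence without mutual independence: enumerate the four equally
# likely outcomes and compute event probabilities directly.
from itertools import combinations

outcomes = {1, 2, 3, 4}                  # each ball drawn with probability 1/4
E = {1: {1, 2}, 2: {2, 3}, 3: {1, 3}}    # the three events

def prob(event):
    return len(event) / len(outcomes)

for i, j in combinations(E, 2):
    joint = prob(E[i] & E[j])
    print(f"P(E{i}, E{j}) = {joint:.2f}  vs  P(E{i})P(E{j}) = {prob(E[i]) * prob(E[j]):.2f}")

triple = prob(E[1] & E[2] & E[3])
print(f"P(E1, E2, E3) = {triple:.2f}  vs  product = {prob(E[1]) * prob(E[2]) * prob(E[3]):.3f}")
```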
[Figure: modelling the joint distribution $\mathbb{P}(x, y)$ directly requires $30 - 1 = 29$ parameters, whereas the factored form $\mathbb{P}(x)\,\mathbb{P}(y)$ under independence requires only 9.]
• Independence is key to EFFICIENT probabilistic modelling (naïve Bayes, Markov models, probabilistic graphical models, etc.).
• We call $x = X(\omega)$, $\omega \in \Omega$, a realization of $X$.
• Probability density:
$$\mathbb{P}_X(B) = \int_B p_X(x)\, dx$$
• Often we write $p_X(x) = p(x)$.
• The CDF of a random variable $X$ is the function $F(z)$ that returns the probability that $X$ is less than or equal to $z$, $F(z) = \mathbb{P}(X \le z)$.
• $\mathbb{P}(x \in (a, b]) = \int_a^b p_X(x)\, dx = F(b) - F(a)$
• Discrete:
$$E[f(X)] = \sum_x f(x)\, p_X(x)$$
• Continuous:
$$E[f(X)] = \int_{-\infty}^{\infty} f(x)\, p_X(x)\, dx$$
• Conditional expectation:
$$E[f(X) \mid Y = y] = \sum_x f(x)\, p(x \mid y)$$
• The expectation of a random variable is not necessarily the value that we should
expect a realization to have.
• The expectation of $X$ (a fair die roll) is
$$E[X] = \sum_x x\, p(x) = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \frac{21}{6} = 3.5$$
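A quick sanity check in Python: the sample mean of many simulated die rolls approaches 3.5, even though no single roll can equal 3.5.

```python
# Monte Carlo check that the expected value of a fair die roll is 3.5.
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1_000_000)   # uniform on {1, ..., 6}
print("exact E[X] =", np.mean([1, 2, 3, 4, 5, 6]), " simulated mean =", rolls.mean())
```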
$$\mathbb{P}_U(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases}$$
• Mean of $\mathcal{U}(x \mid a, b)$:
$$E[x] = \frac{a + b}{2}$$
• Note that it is possible to have $p(x) > 1$, although the density must integrate to 1. For example,
$$\mathcal{U}(x \mid 0, 1/2) = 2, \quad \forall x \in \left[0, \tfrac{1}{2}\right]$$
• We often work with the precision of a Gaussian, 𝜆 = 1/𝜎 2 . The higher the 𝜆, the
narrower the distribution is.
• Let $I \equiv \int_{-\infty}^{\infty} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right) dx$.
• Then, $I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right) \exp\left(-\frac{1}{2\sigma^2}(y - \mu)^2\right) dx\, dy$.
• Set $r^2 = (x - \mu)^2 + (y - \mu)^2$ and transform to polar coordinates:
$$I^2 = \int_0^{\infty}\int_0^{2\pi} \exp\left(-\frac{r^2}{2\sigma^2}\right) r\, d\theta\, dr = 2\pi \int_0^{\infty} \exp\left(-\frac{r^2}{2\sigma^2}\right) r\, dr = 2\pi\sigma^2$$
so $I = \sqrt{2\pi\sigma^2}$, which is the normalizing constant of the Gaussian.
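The result $I = \sqrt{2\pi\sigma^2}$ can also be verified numerically (a sketch using scipy's quadrature; the choice of $\mu$ and $\sigma$ is arbitrary):

```python
# Numerically verify the Gaussian normalizing constant I = sqrt(2*pi*sigma^2).
import numpy as np
from scipy.integrate import quad

mu, sigma = 1.0, 2.0
I, _ = quad(lambda x: np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)), -np.inf, np.inf)
print(I, np.sqrt(2 * np.pi * sigma ** 2))   # both are approximately 5.01326
```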
• Sample $X_1, \ldots, X_N$ from $\mathcal{N}(\mu, \sigma^2)$.
• normaldata: relative changes in reported larcenies between 1991 and 1995 (relative to 1991) for the 90 most populous US counties (FBI data).
MatLab implementation
From Bayesian Core, J.-M. Marin and C. P. Robert, Chapter 2 (available online)
• The central limit theorem shows that the sum of i.i.d. random variables has approximately a Gaussian distribution, making it an appropriate choice for modelling noise (the limit of many small additive effects); a small numerical demonstration follows below.
• The Gaussian distribution makes the fewest assumptions (maximum entropy) of all possible distributions with a given mean and variance.
• Closed-form solutions and interesting properties that we will encounter later.
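A small numerical illustration of the central limit theorem (a generic sketch, not the course demo): standardized sums of i.i.d. uniform variables have tail probabilities close to those of a standard normal.

```python
# Central limit theorem demo: sums of i.i.d. uniforms look Gaussian.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, reps = 50, 100_000
u = rng.uniform(size=(reps, n))                       # Uniform(0, 1) variables
s = (u.sum(axis=1) - n * 0.5) / np.sqrt(n / 12.0)     # standardized sums

for z in (1.0, 2.0):
    print(f"P(S > {z}): empirical {np.mean(s > z):.4f}  vs normal {1 - norm.cdf(z):.4f}")
```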
• Consider a coin-flipping experiment with heads = 1 and tails = 0, and $\mu \in [0, 1]$:
$$\mathbb{P}(x = 1 \mid \mu) = \mu, \qquad \mathbb{P}(x = 0 \mid \mu) = 1 - \mu$$
• The likelihood is formed from the joint probability distribution of the sample, but viewed and used as a function of the parameters only, thus treating the random variables as fixed at the observed values:
$$\mathbb{P}(\mathcal{D} \mid \mu) = \prod_{i=1}^{N} \mathbb{P}(x_i \mid \mu) = \prod_{i=1}^{N} \mu^{x_i} (1 - \mu)^{1 - x_i} = \mu^m (1 - \mu)^{N - m}$$
where $m = \sum_{i=1}^{N} x_i$ is the number of heads.
• The Binomial distribution for $N = 10$ and $\mu = 0.25, 0.9$ is shown below using the MatLab function binomDistPlot from Kevin Murphy's PMTK.
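The PMTK MatLab demo is not reproduced here; a rough Python equivalent of the plot (using scipy and matplotlib) is sketched below.

```python
# Plot Binomial(N = 10, mu) PMFs for mu = 0.25 and mu = 0.9.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

N = 10
ks = np.arange(N + 1)
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, mu in zip(axes, (0.25, 0.9)):
    ax.bar(ks, binom.pmf(ks, N, mu))
    ax.set_title(f"$\\mu$ = {mu}")
    ax.set_xlabel("number of heads m")
plt.tight_layout()
plt.show()
```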
• We now look at discrete variables that can take on one of $K$ possible mutually exclusive states.
• $m_k = \sum_{i=1}^{N} x_{ik}$ is known as the sufficient statistic of the distribution and is the number of observations with $x_k = 1$.
• MLE estimate of $\boldsymbol{\mu}$:
$$\boldsymbol{\mu}^* = \operatorname{argmax}_{\boldsymbol{\mu}} \log \mathbb{P}(\mathcal{D} \mid \boldsymbol{\mu}) \quad \text{subject to} \quad \sum_{k=1}^{K} \mu_k = 1$$
• This yields $\mu_k = \frac{m_k}{N}$ (a derivation sketch follows below).
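A sketch of the constrained maximization behind this result (a standard Lagrange-multiplier argument, not reproduced from the slides): introduce a multiplier $\lambda$ for the constraint,
$$\mathcal{L}(\boldsymbol{\mu}, \lambda) = \sum_{k=1}^{K} m_k \log \mu_k + \lambda \left(\sum_{k=1}^{K} \mu_k - 1\right), \qquad \frac{\partial \mathcal{L}}{\partial \mu_k} = \frac{m_k}{\mu_k} + \lambda = 0 \;\Rightarrow\; \mu_k = -\frac{m_k}{\lambda}$$
Enforcing $\sum_k \mu_k = 1$ gives $\lambda = -\sum_k m_k = -N$, and hence $\mu_k = m_k / N$.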
• The parameter $\lambda$ is known as the precision of the t-distribution, even though it is not, in general, equal to the inverse of the variance.
• Proof:
$$p(x \mid \mu, \lambda, \nu) \propto \left[1 + \frac{\lambda (x - \mu)^2}{\nu}\right]^{-\frac{\nu}{2} - \frac{1}{2}} = \exp\left(-\frac{\nu + 1}{2} \log\left[1 + \frac{\lambda (x - \mu)^2}{\nu}\right]\right)$$
• Using the Taylor series $\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \approx x$ for small $x$, as $\nu \to \infty$ this tends to $\exp\left(-\frac{\lambda (x - \mu)^2}{2}\right)$, i.e., a Gaussian with mean $\mu$ and precision $\lambda$.
MatLab code
Mean: $\mu$, for $\nu > 1$
Mode: $\mu$
Variance: $\begin{cases} \dfrac{\nu}{\lambda(\nu - 2)}, & \nu > 2 \\ \infty, & 1 < \nu \le 2 \\ \text{undefined}, & \text{otherwise} \end{cases}$
• Consider the substitution $z = \left[b + \frac{1}{2}(x - \mu)^2\right]\tau = A\tau$, with $A \equiv b + \frac{1}{2}(x - \mu)^2$, in the marginalization of a Gaussian over a $\text{Gamma}(a, b)$ prior on its precision $\tau$:
$$p(x \mid \mu, a, b) = \frac{b^a}{\Gamma(a)} \left(\frac{1}{2\pi}\right)^{\frac{1}{2}} \int_0^{\infty} \tau^{0.5} \exp(-A\tau)\, \tau^{a-1}\, d\tau = \frac{b^a}{\Gamma(a)} \left(\frac{1}{2\pi}\right)^{\frac{1}{2}} \frac{1}{A^{\frac{1}{2} + a}} \int_0^{\infty} z^{0.5} \exp(-z)\, z^{a-1}\, dz$$
• By definition, $\Gamma(a) = \int_0^{\infty} \exp(-z)\, z^{a-1}\, dz$, so the remaining integral equals $\Gamma\left(a + \frac{1}{2}\right)$. Therefore,
$$p(x \mid \mu, a, b) = \frac{b^a}{\Gamma(a)} \left(\frac{1}{2\pi}\right)^{\frac{1}{2}} \left[b + \frac{1}{2}(x - \mu)^2\right]^{-\frac{1}{2} - a} \Gamma\left(a + \frac{1}{2}\right)$$
• Finally, redefining $\nu = 2a$ and $\lambda = \frac{a}{b}$, we have
$$p(x \mid \mu, a, b) = \frac{\Gamma\left(\frac{\nu}{2} + \frac{1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \left(\frac{\lambda}{\pi\nu}\right)^{\frac{1}{2}} \left[1 + \frac{\lambda (x - \mu)^2}{\nu}\right]^{-\frac{\nu}{2} - \frac{1}{2}}$$
• The effect of a small number of outliers (figure on the right) is less significant for the t-distribution than for the Gaussian; a rough comparison is sketched below.
Matlab Code
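The MatLab demo is not included; as a rough Python stand-in (on hypothetical data), one can fit both distributions to data containing a few outliers and compare the estimated locations.

```python
# Robustness of the Student-t vs the Gaussian: fit both to data with outliers.
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 200), [8.0, 9.0, 10.0]])  # 3 outliers

mu_g, sigma_g = norm.fit(data)            # Gaussian MLE (mean is pulled by outliers)
nu, mu_t, scale_t = t.fit(data)           # Student-t MLE (heavier tails absorb outliers)
print(f"Gaussian location: {mu_g:.3f}   t location: {mu_t:.3f} (nu = {nu:.1f})")
```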
• The operations are the same as for any other probability density function.
• Covariance:
$$\text{cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]\, E[Y]$$
• It expresses the extent to which 𝑋 and 𝑌 vary (linearly) together.
• If $X$ and $Y$ are independent, $P(X, Y) = P(X)\,P(Y)$ and $\text{cov}(X, Y) = 0$; however, the converse need not hold.
• 𝑋 and 𝑌 are said to be orthogonal if, 𝐸 𝑋𝑌 = 0.
• The correlation reflects the noisiness and direction of a linear relationship (top
row), but not the slope of that relationship (middle), nor nonlinear relationships
(bottom).
• The diagonal of the covariance matrix gives the variances of the individual
components.
• A normalized version of this is the correlation matrix where all elements are
between −1,1 (diagonal elements = 1).
$$\mathbf{R} = \begin{bmatrix} 1 & \text{corr}(\boldsymbol{X}_1, \boldsymbol{X}_2) & \cdots & \text{corr}(\boldsymbol{X}_1, \boldsymbol{X}_N) \\ \text{corr}(\boldsymbol{X}_2, \boldsymbol{X}_1) & 1 & \cdots & \text{corr}(\boldsymbol{X}_2, \boldsymbol{X}_N) \\ \vdots & \vdots & \ddots & \vdots \\ \text{corr}(\boldsymbol{X}_N, \boldsymbol{X}_1) & \text{corr}(\boldsymbol{X}_N, \boldsymbol{X}_2) & \cdots & 1 \end{bmatrix}$$
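In Python, the sample covariance and correlation matrices can be computed with numpy (a small sketch on synthetic data):

```python
# Sample covariance and correlation matrices.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = 2.0 * x1 + rng.normal(size=1000)     # correlated with x1
x3 = rng.normal(size=1000)                # independent of the others

X = np.vstack([x1, x2, x3])               # rows are variables, columns are observations
print("covariance matrix:\n", np.cov(X))
print("correlation matrix (diagonal = 1, entries in [-1, 1]):\n", np.corrcoef(X))
```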
$$p(\boldsymbol{x}) = \frac{1}{(2\pi)^{N/2}\, \det(\Sigma)^{1/2}} \exp\left(-\frac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\boldsymbol{x} - \boldsymbol{\mu})\right)$$
where $\boldsymbol{\mu} \in \mathbb{R}^N$ is the mean vector and $\Sigma \in \mathbb{R}^{N \times N}$ is the covariance matrix.
MATLAB code
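The MATLAB code is not reproduced; a brief Python sketch (scipy) that evaluates and samples the multivariate normal density above:

```python
# Evaluate and sample a 2-D multivariate normal.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

mvn = multivariate_normal(mean=mu, cov=Sigma)
print("density at the mean:", mvn.pdf(mu))     # 1 / (2*pi*sqrt(det(Sigma))) for N = 2
samples = mvn.rvs(size=5, random_state=0)      # draw a few samples
print("sample shape:", samples.shape)
```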
• Let $x = g(y)$; then
$$p_Y(y) = p_X(g(y)) \left|\frac{dx}{dy}\right| = p_X(g(y))\, \left|g'(y)\right|$$
• An example:
$$\text{Gamma}(x \mid a, b) = \frac{b^a}{\Gamma(a)}\, x^{a-1} \exp(-xb)$$
Define $Y = 1/X$. Then
$$p_Y(y) = \frac{b^a}{\Gamma(a)}\, y^{-(a-1)} \exp(-b/y) \left|-\frac{1}{y^2}\right| = \frac{b^a}{\Gamma(a)}\, y^{-(a+1)} \exp(-b/y)$$
→ This is the inverse Gamma distribution (a numerical check is sketched below).
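A quick numerical check of this change of variables (a sketch; note that scipy's gamma uses a scale parameter, so rate $b$ corresponds to scale $1/b$):

```python
# If X ~ Gamma(a, rate=b), then Y = 1/X follows an inverse-Gamma(a, b) distribution.
import numpy as np
from scipy.stats import gamma, invgamma

a, b = 3.0, 2.0
x = gamma(a, scale=1.0 / b).rvs(size=200_000, random_state=0)
y = 1.0 / x

# Compare the empirical mean of Y with the inverse-Gamma mean b / (a - 1).
print("empirical mean of 1/X:", y.mean(), " theory:", b / (a - 1))
print("invgamma pdf at y = 1:", invgamma(a, scale=b).pdf(1.0))
```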
• We define $\bar{X}_N$ as
$$\bar{X}_N = \frac{1}{N} \sum_{i=1}^{N} X_i$$
• Taking the expectation of both sides, we see
$$E[\bar{X}_N] = \frac{1}{N} \sum_{i=1}^{N} E[X_i] = \mu, \qquad \text{Var}[\bar{X}_N] = \frac{1}{N^2} \sum_{i=1}^{N} \text{var}(X_i) = \frac{\sigma^2}{N}$$
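A brief simulation (a sketch with arbitrary $\mu$ and $\sigma$) illustrating that the variance of the sample mean shrinks like $\sigma^2/N$:

```python
# Variance of the sample mean: Var[X_bar] ≈ sigma^2 / N.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, reps = 2.0, 3.0, 100, 20_000
means = rng.normal(mu, sigma, size=(reps, N)).mean(axis=1)
print("mean of X_bar:", means.mean(), " (theory:", mu, ")")
print("variance of X_bar:", means.var(), " (theory:", sigma**2 / N, ")")
```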
• The problem is to infer the underlying probability distribution that gives rise to the
data 𝒮.
• Parametric model: assume a model and then try to infer its parameters (e.g., fitting a normal distribution).
Matlab code
$$I = \int_{-r}^{r}\int_{-r}^{r} \mathbb{I}(x^2 + y^2 \le r^2)\, dx\, dy = \pi r^2$$
$$\pi = \frac{1}{r^2}\, 4r^2 \int_{-r}^{r}\int_{-r}^{r} \mathbb{I}(x^2 + y^2 \le r^2)\, p_X(x)\, p_Y(y)\, dx\, dy \approx \frac{4}{N} \sum_{i=1}^{N} \mathbb{I}(x_i^2 + y_i^2 \le r^2)$$
where $p_X$ and $p_Y$ are uniform densities on $[-r, r]$ and $(x_i, y_i)$ are samples drawn from them.
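The corresponding Monte Carlo estimator in Python (a minimal sketch; the MatLab version referenced in the slides is not shown):

```python
# Monte Carlo estimate of pi: draw (x, y) uniformly on [-r, r]^2 and count hits.
import numpy as np

rng = np.random.default_rng(0)
r, N = 1.0, 1_000_000
x = rng.uniform(-r, r, N)
y = rng.uniform(-r, r, N)
pi_hat = 4.0 * np.mean(x**2 + y**2 <= r**2)
print("estimate:", pi_hat, " error:", abs(pi_hat - np.pi))
```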
• Consider some unknown distribution 𝑝 𝑥 and suppose we have modelled this using
an approximate distribution 𝑞 𝑥 .
• Jensen's inequality (for a convex function $f$):
$$f\left(\sum_{i=1}^{M} \lambda_i x_i\right) \le \sum_{i=1}^{M} \lambda_i f(x_i), \quad \lambda_i \ge 0 \ \text{and} \ \sum_i \lambda_i = 1$$
which can be used to show that $KL(p \| q) \ge 0$.
• Expanding the KL divergence, $KL(p \| q) = -\int p(\boldsymbol{x}) \log q(\boldsymbol{x} \mid \theta)\, d\boldsymbol{x} + \int p(\boldsymbol{x}) \log p(\boldsymbol{x})\, d\boldsymbol{x}$; only the first term involves $q$. Therefore, approximating $p$ by the empirical distribution of the data, minimizing $KL(p \| q)$ is equivalent to maximizing the data likelihood under $q(\boldsymbol{x} \mid \theta)$.
• Using $p(x, y) = p(x)\, p(y \mid x)$, we have
$$\mathbb{H}[y \mid x] = -\iint p(y, x) \log \frac{p(x, y)}{p(x)}\, dx\, dy = -\iint p(y, x) \log p(y \mid x)\, dx\, dy$$
• Sum rule: $\mathbb{P}(X = x_i) = \sum_j \mathbb{P}(X = x_i, Y = y_j)$, and for continuous variables $\mathbb{P}(x) = \int \mathbb{P}(x, y)\, dy$
• Product rule: $\mathbb{P}(X = x_i, Y = y_j) = \mathbb{P}(Y = y_j \mid X = x_i)\, \mathbb{P}(X = x_i)$
• Bayes' rule: $\underbrace{p(\theta \mid \mathcal{D})}_{\text{posterior}} \propto \underbrace{p(\theta)}_{\text{prior}}\, \underbrace{p(\mathcal{D} \mid \theta)}_{\text{likelihood}}$
• KL divergence:
$$KL(p \| q) = -\int p(\boldsymbol{x}) \log \frac{q(\boldsymbol{x} \mid \theta)}{p(\boldsymbol{x})}\, d\boldsymbol{x}$$
• Jensen's inequality (for a convex function $f$):
$$f\left(\sum_{i=1}^{M} \lambda_i x_i\right) \le \sum_{i=1}^{M} \lambda_i f(x_i), \quad \lambda_i \ge 0 \ \text{and} \ \sum_i \lambda_i = 1$$