
M3. Density Estimation
Manikandan Narayanan
Week 6 (Sep 4-, 2023)
PRML Jul-Nov 2023 (Grads section)
Acknowledgment of Sources
• Slides based on content from related sources:
• Courses:
• IITM – Profs. Arun/Harish/Chandra’s PRML offerings (slides, quizzes, notes, etc.), Prof.
Ravi’s “Intro to ML” slides – cited respectively as [AR], [HR], [CC], [BR] in the bottom right
of a slide.
• India – NPTEL PR course by IISc Prof. P.S. Sastry (slides, etc.) – cited as [PSS] in the bottom
right of a slide.

• Books:
• PRML by Bishop. (content, figures, slides, etc.) – cited as [CMB]
• Pattern Classification by Duda, Hart and Stork. (content, figures, etc.) – [DHS]
• Mathematics for ML by Deisenroth, Faisal and Ong. (content, figures, etc.) – [DFO]
• Information Theory, Inference and Learning Algorithms by David JC MacKay – [DJM]
Outline for Module M3
• M3. Density Estimation
• M3.0 Introduction/Warmup
• M3.1 Parametric methods
• M3.2 Nonparametric methods (only brief mention)
Outline for Module M3 (detailed)
• M3. Density Estimation
• M3.0 Introduction/Warmup
• M3.0.0 What does it mean to “learn” from data?
• M3.0.1 Intuitive warmup to ML (Estimation)
• M3.1 Parametric methods
(aka parameter learning of probabilistic models)
• M3.1.1 Maximum Likelihood Estimation (MLE)
(for continuous/discrete densities, & mixture densities (later))
• M3.1.2 Bayesian Inference(/estimation)
• M3.2 Nonparametric methods (only brief mention)
• M3.2.0 General idea
• M3.2.1 K-nearest neighbors
Outline for Module M3
• M3. Density Estimation
• M3.0 Introduction/Warmup
• M3.0.0 What does it mean to “learn” from data?
• M3.0.1 Intuitive warmup to ML (Estimation)
• M3.1 Parametric methods
• M3.2 Nonparametric methods
Introduction to Density estimation
• So far: Bayesian decision theory (incl. Bayes classifiers)
• Two steps in a generative or discriminative model setting: Inference vs. Decision steps
• But how to do inference, i.e., how to “learn” a Bayes classifier from data?
• estimate the joint (class prior and class conditional) p(x,t) or posterior density p(t|x).
• So density estimation needed in both generative/discriminative model settings.

• Density estimation (informally aka learning the data distbn.):


• Addresses a fundamental question of what it means to learn from data.
• be it supervised (p(x,t) or p(t|x)) or unsupervised (p(x)) learning!
• Relies heavily on assumptions made in the model selection step – otherwise, an
ill-posed problem!

[CMB]
Inference & Decision (steps in detail for a generative model):

Setup (training data):

--------------------------------------------------------------------------------------------------------------
Inference (density est.):

--------------------------------------------------------------------------------------------------------------
Decision:
[HR]
Density Estimation: Problem Statement & Notations
• Problem: “Learn a model from data” == “Estimate a density/distribution 𝔻
from independent observations (i.e., iid samples drawn from 𝔻)”

• Input: N data points x_1, …, x_N assumed to be iid samples from an
unknown probability distribution 𝔻
• x_n ∼_iid 𝔻 for all n = 1, …, N.
• x_n ∈ ℝ^d

• Output: Probability density/distribution 𝔻 that “best fits” the data

• Univariate distbn. if d=1, and Multivariate/Joint distbn. if multiple (d>1) r.v.s are to be
modelled (e.g., fish length, width and color).
• Family/Form of distributions fixed at the “model selection” step to get a well-posed
problem.
Density estimation (intuitively in pictures)

[DJM,CMB]
Approaches to Density estimation
• Parametric approach:
• some functional form of probability distribution 𝐷 assumed for the data points
• family of models parameterized by θ, i.e., p(x|θ) or f(x|θ), with each family
member specified by a particular value of the parameter vector θ.
• Distribution could be simple (e.g., unimodal density) or complex (e.g., multi-modal
density, incl. mixture density for mixture models)

• Nonparametric approach:
• distribution not assumed to be of a functional form specified by a few parameters;
instead the form of the distribution typically depends on the size of the dataset.
• Still have some “parameters” but they control model complexity (more so than
specifying the exact functional form of the distribution)

[CMB]
Warmup: Intuitive depiction of density
estimation example
Warmup: Parametric approach on a toy
dataset

[DJM]
Recap: The (1D) Gaussian/Normal Distribution

[CMB]
Warmup: How to fit a 1D Gaussian to this
data? – Intuition

[DJM]
Warmup: How to fit a 1D Gaussian to this data?
– “Visual” MLE

[DJM]
Warmup: How to fit a 1D Gaussian to this data?
(contd.)

[DJM]
Warmup: MLE for 1D Gaussian (the need for
“continuous optimization”)

[DJM]
MLE for one 1D Gaussian (closed-form
solution)
• Log likelihood:

• MLE estimates:

[DJM]
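For reference, the log likelihood and the resulting closed-form MLE for fitting a 1D Gaussian N(x | μ, σ²) to data x_1, …, x_N can be written as:

```latex
\ln \mathcal{L}(\mu,\sigma^2) = -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=1}^{N}(x_n-\mu)^2,
\qquad
\hat{\mu}_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} x_n,
\qquad
\hat{\sigma}^2_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(x_n-\hat{\mu}_{\mathrm{ML}})^2.
```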
Outline for Module M3
• M3. Density Estimation
• M3.0 Introduction/Background
• M3.1 Parametric methods
• M3.1.1 Maximum Likelihood Estimation (MLE)
(for continuous/discrete densities; mixture densities (later))
• M3.1.2 Bayesian Inference(/estimation)
• M3.2 Nonparametric methods
ML approach
• Dataset D or D_N = {x_1, …, x_N} (iid samples from p(x|θ); p denotes pmf or pdf)
• Likelihood (function of parameters, given the data, is used as the score function):
ℒ(θ; D_N) = P({x_1, …, x_N} | θ) = ∏_{n=1,…,N} p(x_n | θ)
• ML Estimate (opt. problem, solved analytically or numerically):

• Has desirable properties, mainly consistency (for “most” densities):
MLE converges in probab. to the true parameter(s).
[PSS]
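Spelled out, the ML estimate maximizes the (log) likelihood, and consistency means convergence in probability to the true parameter as N grows (under standard regularity conditions):

```latex
\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta}\ \mathcal{L}(\theta; D_N)
                           = \arg\max_{\theta}\ \sum_{n=1}^{N} \ln p(x_n \mid \theta),
\qquad
\hat{\theta}_{\mathrm{ML}}^{(N)} \xrightarrow{\;p\;} \theta_{\mathrm{true}} \ \text{ as } N \to \infty.
```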
Examples we will see:
1) Gaussian (uni- and multi-variate)
2) Bernoulli
3) Categorical/Multinoulli
Example 1 (Continuous density): MLE for 1D
Gaussian (toy dataset)
• What is the likelihood?

[DJM]
MLE for 1D Gaussian (general N datapoints)
Rough space for illustrations
MLE for one 1D Gaussian
• Log likelihood:

• MLE estimates:

[DJM]
Bias of the estimator: 𝔼_{D_N = {x_1, …, x_N} | θ}[θ̂_N] − θ
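A minimal NumPy sketch of these estimators on synthetic data (the true parameters and sample size are arbitrary illustrative choices), comparing the biased ML variance with its bias-corrected version:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma_true, N = 5.0, 2.0, 20          # arbitrary toy settings
x = rng.normal(mu_true, sigma_true, size=N)    # iid samples from the "unknown" distribution

# Closed-form MLE for a 1D Gaussian
mu_ml = x.mean()                               # (1/N) * sum x_n
var_ml = ((x - mu_ml) ** 2).mean()             # (1/N) * sum (x_n - mu_ml)^2  (biased)
var_unbiased = var_ml * N / (N - 1)            # bias-corrected estimate

print(f"mu_ML = {mu_ml:.3f}, var_ML = {var_ml:.3f}, var_unbiased = {var_unbiased:.3f}")
```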
From uni- to multi-variate Gaussian

[CMB]
Maximum Likelihood for the Gaussian (1)
• Given i.i.d. data {x_1, …, x_N}, the log likelihood function is given by

• Sufficient statistics

[CMB]
Maximum Likelihood for the Gaussian (2)
• Set the gradient of the log likelihood function to zero,

• and solve to obtain

• Similarly

[CMB]
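Solving those stationarity conditions yields the standard estimators:

```latex
\mu_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}_n,
\qquad
\Sigma_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} (\mathbf{x}_n-\mu_{\mathrm{ML}})(\mathbf{x}_n-\mu_{\mathrm{ML}})^{T}.
```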
Maximum Likelihood for the Gaussian (3)
Under the true distribution, there is a bias in the 2nd estimate (Σ_ML):

Hence define

[CMB]
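Concretely (a standard result): the ML mean is unbiased, the ML covariance is biased by a factor (N−1)/N, and the corrected estimator is

```latex
\mathbb{E}[\mu_{\mathrm{ML}}] = \mu,
\qquad
\mathbb{E}[\Sigma_{\mathrm{ML}}] = \frac{N-1}{N}\,\Sigma,
\qquad
\widetilde{\Sigma} = \frac{N}{N-1}\,\Sigma_{\mathrm{ML}} = \frac{1}{N-1}\sum_{n=1}^{N}(\mathbf{x}_n-\mu_{\mathrm{ML}})(\mathbf{x}_n-\mu_{\mathrm{ML}})^{T}.
```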
Derivation of MLE of Multi-variate Gaussian
• Facts on gradients (wrt vector or matrix of parameters):
• ∂/∂x (xᵀ A x) = Aᵀx + Ax (or 2Ax if A is symmetric)
• ∂/∂A (xᵀ A x) = x xᵀ (outer-product)
• ∂/∂A log |A| = A⁻ᵀ

• Gradient of LL(μ, Λ) := LL(μ, Σ⁻¹)
[From Secn. 13.5 of The Multivariate Gaussian, https://people.eecs.berkeley.edu/~jordan/courses/260-spring10/other-readings/chapter13.pdf]
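As a sketch of how the facts above are used (treating Λ as an unconstrained matrix; a symmetrized derivative gives the same stationary point), setting the two gradients to zero recovers the estimators stated earlier:

```latex
\frac{\partial LL}{\partial \mu} = \Lambda \sum_{n=1}^{N}(\mathbf{x}_n - \mu) = 0
\;\Rightarrow\; \mu_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n,
\qquad
\frac{\partial LL}{\partial \Lambda} = \frac{N}{2}\,\Lambda^{-1} - \frac{1}{2}\sum_{n=1}^{N}(\mathbf{x}_n - \mu)(\mathbf{x}_n - \mu)^{T} = 0
\;\Rightarrow\; \Sigma_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(\mathbf{x}_n - \mu_{\mathrm{ML}})(\mathbf{x}_n - \mu_{\mathrm{ML}})^{T}.
```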


Outline for Module M3
• M3. Density Estimation
• M3.0 Introduction/Background
• M3.1 Parametric methods
• M3.1.1 Maximum Likelihood Estimation (MLE)
(for continuous/discrete densities, mixture densities (later))
• M3.1.2 Bayesian Inference(/estimation)
• M3.2 Nonparametric methods
Example 2: Bernoulli/Binary RVs
• Coin flipping: heads=1, tails=0

• Bernoulli Distribution

[CMB]
(Parametric) Density Estimation / Parameter
Estimation / Parameter learning
• ML for Bernoulli
• Given:

[CMB]
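In compact form, with m denoting the number of heads among the N tosses (count notation assumed here for illustration):

```latex
\mathrm{Bern}(x \mid \mu) = \mu^{x}(1-\mu)^{1-x},
\qquad
\ln \mathcal{L}(\mu) = \sum_{n=1}^{N}\big[x_n \ln\mu + (1-x_n)\ln(1-\mu)\big],
\qquad
\mu_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} x_n = \frac{m}{N}.
```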
Example 3: From Bernoulli to Multinoulli
Categorical (Multinoulli) Variables
1-of-K coding scheme:

[CMB]
ML Parameter estimation
• Given:

• Ensure Σ_k μ_k = 1, use a Lagrange multiplier.

[CMB]
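With N_k denoting the number of observations having x_k = 1 (count notation assumed here), the Lagrangian and its solution work out to:

```latex
\max_{\mu}\ \sum_{k=1}^{K} N_k \ln \mu_k + \lambda\Big(\sum_{k=1}^{K} \mu_k - 1\Big)
\;\Rightarrow\; \frac{N_k}{\mu_k} + \lambda = 0
\;\Rightarrow\; \mu_k^{\mathrm{ML}} = \frac{N_k}{N} \quad (\lambda = -N).
```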
Aside in Appendix

• An Aside: Relation between Bernoulli and Binomial distribution


• An Aside: Relation between Categorical and Multinomial Distribution
Outline for Module M3
• M3. Density Estimation
• M3.0 Introduction/Background
• M3.1 Parametric methods
• M3.1.1 Maximum Likelihood Estimation (MLE)
• M3.1.2 Bayesian Inference(/estimation)
• M3.2 Nonparametric methods
Motivation: Why go from MLE to Bayesian
inference?
• Small sample sizes - overfitting to training data 𝒟

• ⇒ Prediction: all future tosses will land heads up

• Laplace’s sunrise problem: What is the probability that the sun will rise tomorrow? [https://en.wikipedia.org/wiki/Sunrise_problem]

• Prior information
• MLE cannot use additional information we may have about the parameter!

• Richer (compound or hierarchical) distbns. to fit the data, and robustness to outliers
• Treating parameters as r.v.s with their own distributions can offer a “natural” plug-and-play hierarchical modelling framework
to construct complex distbns. (marginal distbns. with heavy tails or overdispersion, etc.) that fit the data better.

[CMB]
Bayesian approach
• Bayesian approach in theory: View parameter θ as a r.v. and not as a fixed constant as in MLE
• Why Bayesian? ML (frequentist/Fisherian) approach gives useful/consistent estimators for many distbns., but
fails for small sample sizes and doesn’t permit incorporation of additional info. about the parameter!

• Information about the r.v. before seeing the data is encoded as a prior distribution P(θ)

• Use Bayes rule to get posterior that captures your degree of belief/uncertainty about θ after seeing the data:
P(θ | D_N) ∝ P(θ) P(D_N | θ)
posterior ∝ prior × likelihood

• Bayesian approach in practice:


• Conjugate priors make calcn./interpretn. easy by ensuring posterior & prior follow same distbn.
• But may not be applicable always (use approximate inference such as MCMC/Gibbs sampling for more complex priors)
• What about that pesky hyperparameter (i.e., pseudocounts for beta distbn.)?
• Full (posterior) distribution vs. a point estimate?
• Posterior mode (MAP) or Posterior mean – a practical resort
• an ideal Bayesian can integrate over uncertainty around the parameter - posterior predictive distbn.

[PSS,CMB]
Three examples again:
Bayesian inference for:
Example 2: Bernoulli
Example 3: Categorical/Multinoulli
Example 1: Gaussian (mostly 1D, multi-variate in Appendix)
Example 2: Bayesian inference for Bernoulli
What is a good prior?
What is a good prior? Beta Distribution
• Distribution over μ ∈ [0, 1]

(converges for z > 0) [CMB]


Beta Distribution

[CMB]
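For reference, the Beta density over μ ∈ [0, 1] and the Gamma function that the “converges for z > 0” remark refers to:

```latex
\mathrm{Beta}(\mu \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,\mu^{a-1}(1-\mu)^{b-1},
\qquad
\Gamma(z) = \int_{0}^{\infty} t^{z-1} e^{-t}\,dt \quad (z > 0),
\qquad
\mathbb{E}[\mu] = \frac{a}{a+b}.
```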
Bayesian Bernoulli

Beta distribution is the conjugate prior for the parameter μ of the Bernoulli distribution (or Bernoulli likelihood fn.).

[CMB]
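The conjugacy can be seen by multiplying a Beta(a, b) prior with the Bernoulli likelihood for m heads and l = N − m tails (a, b, m, l are the usual hyperparameter/count notation, assumed here):

```latex
p(\mu \mid D) \;\propto\; \underbrace{\mu^{a-1}(1-\mu)^{b-1}}_{\text{Beta prior}}\;\underbrace{\mu^{m}(1-\mu)^{l}}_{\text{likelihood}}
\;\Rightarrow\;
p(\mu \mid D) = \mathrm{Beta}(\mu \mid a+m,\; b+l).
```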
Bayesian inference in action: Beta-Bernoulli
(Prior ∙ Likelihood = Posterior)

[CMB]
Pseudocounts, and updating these counts
with new data - example

[CMB]
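A minimal sketch of this pseudocount bookkeeping in plain Python (the function name and the prior values a0 = b0 = 2 are illustrative choices, not from the slides):

```python
def beta_bernoulli_update(a, b, tosses):
    """Sequentially update Beta(a, b) pseudocounts with 0/1 observations."""
    for x in tosses:
        a, b = a + x, b + (1 - x)          # a head adds to a, a tail adds to b
    return a, b

a0, b0 = 2, 2                              # prior pseudocounts (illustrative)
a_post, b_post = beta_bernoulli_update(a0, b0, tosses=[1, 1, 0, 1])
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, round(posterior_mean, 3))   # 5 3 0.625
```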
Properties of the Posterior
As the size of the data set, N, increases, the posterior becomes more sharply peaked around the ML estimate.

[CMB]
Point estimate vs. using the full posterior:
Prediction under the (full) posterior
What is the probability that the next coin toss will land
heads up?

[CMB]
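Averaging over the full Beta posterior (rather than plugging in a point estimate) gives the posterior predictive, with m heads and l tails observed (notation as above):

```latex
p(x = 1 \mid D) = \int_{0}^{1} p(x = 1 \mid \mu)\, p(\mu \mid D)\, d\mu
                = \mathbb{E}[\mu \mid D]
                = \frac{m + a}{m + a + l + b}.
```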
Example 3: Bayesian inference for
Categorical/Multinoulli
Dirichlet Distribution for the Prior

Conjugate prior for the categorical likelihood fn., i.e., the Dirichlet distbn. is the conjugate prior for the parameters of the Categorical distbn.

[CMB]
Bayesian Categorical

[CMB]
Bayesian Categorical

[CMB]
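The update mirrors the Beta-Bernoulli case; with Dirichlet hyperparameters α_1, …, α_K and category counts N_1, …, N_K (notation assumed here):

```latex
\mathrm{Dir}(\mu \mid \alpha) \;\propto\; \prod_{k=1}^{K} \mu_k^{\alpha_k - 1},
\qquad
p(\mu \mid D) = \mathrm{Dir}(\mu \mid \alpha_1 + N_1, \ldots, \alpha_K + N_K).
```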
Example 1: Bayesian inference for 1D
Gaussian?
Bayesian Inference for the Gaussian (1)
• Assume σ² is known. Given i.i.d. data {x_1, …, x_N}, the likelihood function for μ is given by

• This has a Gaussian shape as a function of μ (but it is not a distribution over μ).
Bayesian Inference for the Gaussian (2)
• Combined with a Gaussian prior over μ,

• this gives the posterior

• Completing the square over μ, we see that


Bayesian Inference for the Gaussian (3)
• … where

• Note:
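Writing the prior as N(μ | μ_0, σ_0²) (notation assumed here), the posterior parameters take the standard form:

```latex
p(\mu \mid D) = \mathcal{N}(\mu \mid \mu_N, \sigma_N^2),
\qquad
\frac{1}{\sigma_N^{2}} = \frac{1}{\sigma_0^{2}} + \frac{N}{\sigma^{2}},
\qquad
\mu_N = \sigma_N^{2}\left(\frac{\mu_0}{\sigma_0^{2}} + \frac{\sum_{n=1}^{N} x_n}{\sigma^{2}}\right).
```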
Bayesian Inference for the Gaussian (4)
• Example: for N = 0, 1, 2 and 10.
Bayesian Inference for the Gaussian (5)
• Sequential Estimation

• The posterior obtained after observing N-1 data points becomes the prior when we observe the Nth data point.
Bayesian Inference for the Gaussian (6)
• Now assume μ is known. The likelihood function for λ = 1/σ² is given by

• This has a Gamma shape as a function of λ.

• (cf. Appendix for more on Bayesian inference of Gaussian)
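With a conjugate Gamma(a_0, b_0) prior on the precision λ (hyperparameter notation assumed here), the posterior is again a Gamma distribution:

```latex
\mathrm{Gam}(\lambda \mid a, b) \;\propto\; \lambda^{a-1} e^{-b\lambda},
\qquad
p(\lambda \mid D) = \mathrm{Gam}\!\left(\lambda \;\Big|\; a_0 + \frac{N}{2},\; b_0 + \frac{1}{2}\sum_{n=1}^{N}(x_n - \mu)^2\right).
```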
