
Lecture Pattern Analysis

Part 01: Introduction and First Sampling

Christian Riess
IT Security Infrastructures Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg
April 18, 2024
Pattern Recognition Recap and Unsupervised Learning

• Remember the steps of the classical pattern recognition pipeline:

  (Data) → Sampling → Preprocessing → Feature Extraction → Classification → (Class)
  where feature extraction produces the features x, and the classifier f(x) outputs the class y

• Fundamental ML assumption: good feature representations map similar objects to similar features
• Classifier training is almost always supervised,
  i.e., a training sample is a tuple (xi, yi) (cf. lecture “Pattern Recognition”)
• Unsupervised ML works without labels, i.e., it only operates on inputs (xi )
• Unsup. ML can be seen as representation or summary of a distribution
• So, “classification versus representation” could be a jingle to further distinguish
PR from PA (cf. our discussion in the joint meeting)

Further Aspects of Interest: Parameters and Hyperparameters

• Every machine learning model has parameters


• For example, linear regression predicts y from a d-dimensional input x̃ = (1, x1, . . . , xd−1)⊤ using d parameters βi,

  y = β⊤x̃ = Σ_{i=0}^{d−1} βi · x̃i    (1)

• Fewer parameters make the model more robust; more parameters make the
  model more flexible
• To continue the example, consider linear regression on a basis expansion of
  a scalar input x, e.g., fitting a degree-d polynomial to the vector
  (1, x, x², . . . , x^d): larger d enables more complex polynomials
• The degree d is a hyperparameter, i.e., a parameter that somehow
  parameterizes the choice of parameters (see the sketch below)
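To make this concrete, here is a minimal numpy sketch (the data and the chosen degree are made up for illustration) of linear regression on a polynomial basis expansion, where the degree d acts as the hyperparameter:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = np.sin(2 * x) + 0.1 * rng.normal(size=x.size)   # noisy 1-D toy data

d = 3                                         # hyperparameter: polynomial degree
X = np.vander(x, d + 1, increasing=True)      # basis expansion (1, x, x^2, ..., x^d)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimate of the parameters beta_i
y_hat = X @ beta                              # predictions of the degree-d polynomial
print(np.round(beta, 3))
```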
Further Aspects of Interest: Local Operators and High-Dimensional Spaces

• Thinking about model flexibility: more “local” models are more flexible, but
require more parameters and are less robust
• How can we find a good trade-off? This is the model selection problem

• Another issue: all local models perform poorly in high-dimensional spaces
• A perhaps surprising consequence is that high-dimensional methods must
  be non-local along some direction

• Summarization methods (e.g., clustering) also perform poorly in high-dimensional spaces

• All these points motivate us to also look into dimensionality reduction

A Study of Distributions

• In PA, we look at data in feature spaces


• To understand and manipulate these data points, they are mathematically
  commonly represented as probability density functions (PDFs)
• Additionally, inference allows us to draw conclusions from distributions

• Common operations on distributions:


• Fitting a distribution model to the data (parametric or non-parametric)
represents the data as a distribution
• Sampling from a distribution creates new data points that follow the
distribution (i.e., they are plausible)
• Factorizing a distribution is a key technique for reducing the complexity

Recap on Probability Vocabulary

• Let X , Y denote two random variables


• Important vocabulary and equations are:
  Joint distribution                         p(X, Y)
  Conditional distribution of X given Y      p(X|Y)
  Sum rule / marginalization over Y          p(X) = Σ_Y p(X, Y)
  Product rule                               p(X, Y) = p(Y|X) · p(X)
  Bayes' rule                                p(Y|X) = p(X|Y) · p(Y) / p(X)
  Bayes' rule in the language of ML          posterior = likelihood · prior / evidence

• Please browse the book by Bishop, Sec. 1.2.3, to refresh your mind if
necessary!
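To make the rules above concrete, here is a minimal numpy sketch with a made-up 2×2 joint distribution that checks the sum rule, the product rule, and Bayes' rule numerically:

```python
import numpy as np

# Hypothetical joint distribution p(X, Y) of two binary variables
# (rows index X, columns index Y); the numbers are made up for illustration
p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])

p_x = p_xy.sum(axis=1)                 # sum rule: p(X) = sum_Y p(X, Y)
p_y = p_xy.sum(axis=0)                 # sum rule: p(Y) = sum_X p(X, Y)
p_y_given_x = p_xy / p_x[:, None]      # product rule rearranged: p(Y|X) = p(X, Y) / p(X)
p_x_given_y = p_xy / p_y[None, :]      # p(X|Y) = p(X, Y) / p(Y)

# Bayes' rule: p(Y|X) = p(X|Y) * p(Y) / p(X)
bayes = p_x_given_y * p_y[None, :] / p_x[:, None]
assert np.allclose(bayes, p_y_given_x)
```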

Sampling from a PDF

• Oftentimes, it is necessary to draw samples from a PDF


• Example:
• Logistic Regression fits a single regression curve to the data (cf. PR)
• Bayesian Logistic Regression fits a distribution of curves

The distribution is narrow near observed data points and wider elsewhere


• Sample curves from the distribution to obtain its spread (“uncertainty”)

• Special PDFs like Gaussians have closed-form solutions for sampling


• We look now at a sampling method that works on arbitrary PDFs
Idea of the Sampling Algorithm

• The key idea is to use the cumulative distribution function (CDF) P(z) of p(X),

  P(z) = ∫_{−∞}^{z} p(X) dX    (2)

• A value u drawn uniformly from the CDF's vertical axis (i.e., from [0, 1]) intersects P(z) at a location z
• This position z is our random draw from p(X):

[Figure: a PDF p(X) and its CDF P(z); a uniform draw on the vertical axis of the CDF maps to a sample z on the horizontal axis]

Sampling Algorithm

• Discretize the domain of the PDF p(X )


• Linearize p(X ) if it is multi-dimensional
• Calculate the cumulative distribution function P(z) of p(X);
  its range must span 0 to 1 (i.e., p(X) must be normalized)
• Draw a uniformly distributed number u between 0 and 1
• The sample from the PDF is

  z∗ = min { z : P(z) ≥ u },    (3)

  i.e., the smallest z whose cumulative probability reaches u (see the sketch below)

• This method is not used in high-dim. spaces. Can you find the reason?
• We will later look at more advanced sampling strategies
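As an illustration, here is a minimal numpy sketch of this discretized inverse-CDF sampling (function and variable names are my own, and a standard Gaussian serves as the example PDF):

```python
import numpy as np

def sample_from_pdf(pdf_values, grid, n_samples=1000, rng=None):
    """Draw samples from a PDF given by its values on a discretized 1-D grid."""
    rng = np.random.default_rng() if rng is None else rng
    probs = pdf_values / pdf_values.sum()   # normalize to per-bin probability masses
    cdf = np.cumsum(probs)                  # CDF over the grid, ends at 1
    u = rng.random(n_samples)               # uniform draws in [0, 1)
    idx = np.searchsorted(cdf, u)           # smallest index with cdf[idx] >= u, cf. Eq. (3)
    return grid[idx]

grid = np.linspace(-5, 5, 2001)
pdf = np.exp(-0.5 * grid**2)                # unnormalized standard Gaussian
samples = sample_from_pdf(pdf, grid, n_samples=10_000)
print(samples.mean(), samples.std())        # approximately 0 and 1
```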

Lecture Pattern Analysis

Part 02: Non-Parametric Density Estimation

Christian Riess
IT Security Infrastructures Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg
April 18, 2024
Introduction

• Density Estimation = create a PDF from a set of samples


• The lecture Pattern Recognition introduces parametric density estimation:
• There, a parametric model (e.g., a Gaussian) is fitted to the data
• Maximum Likelihood (ML) estimator:

  θ∗ = argmax_θ p(x1, . . . , xN | θ)    (1)

• Maximum a Posteriori (MAP) estimator:

  θ∗ = argmax_θ p(θ | x1, . . . , xN), where by Bayes' rule
  p(θ | x1, . . . , xN) = p(x1, . . . , xN | θ) · p(θ) / p(x1, . . . , xN)    (2)

• Browse the PR slides if you would like to know more

• Parametric density estimators require a good function representation


• Non-parametric density estimators can operate on arbitrary distributions
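As a small illustration of the ML estimator (1), here is a numpy sketch that fits a 1-D Gaussian to synthetic data; for this model the ML solution is the sample mean and the (biased) sample variance:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # synthetic data from a known Gaussian

mu_ml = x.mean()                     # ML estimate of the mean
var_ml = ((x - mu_ml) ** 2).mean()   # ML estimate of the variance (biased, divides by N)
print(mu_ml, np.sqrt(var_ml))        # approximately 2.0 and 1.5
```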
Non-Parametric Density Estimation: Histograms

• Non-parametric estimators do not use functions with a limited set of parameters
• A simple non-parametric baseline is to create a histogram of samples1
• The number of bins is important to obtain a good fit

• Pro: Good for a quick visualization


• Pro: “Cheap” for many samples in low-dimensional space
• Con: Discontinuities at bin boundaries
• Con: Scales poorly to high dimensions (cf. curse of dimensionality later)
1 See introduction of Bishop Sec. 2.5
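A minimal numpy sketch of histogram density estimation (data and bin count are made up; density=True normalizes the bin heights so that the histogram integrates to 1):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)

counts, edges = np.histogram(x, bins=20, density=True)   # bins is the crucial hyperparameter

# Evaluate the estimated density at a query point x0
x0 = 0.3
bin_idx = np.clip(np.digitize(x0, edges) - 1, 0, len(counts) - 1)
print(counts[bin_idx])   # roughly the true N(0,1) density of about 0.38
```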



Improving on the Histogram Approach

• A kernel-based method and a nearest-neighbor method are slightly better


• Both variants share their mathematical framework:
• Let p(x) be a PDF in D-dim. space, and R a small region around x
  → The probability mass in R is p = ∫_R p(x) dx
• Assumption 1: R contains many points → p is a relative frequency,

  p = K / N = (# points in R) / (total # of points)    (3)

• Assumption 2: R is small enough s.t. p(x) is approximately constant,

  p = ∫_R p(x′) dx′ ≈ p(x) · ∫_R dx′ = p(x) · V    (4)

• Both assumptions together are slightly contradictory, but they yield

  p(x) = K / (N · V) = (# points in R) / (total # of points · Volume of R)    (5)
Kernel-based DE: Parzen Window Estimator (1/2)

• The Parzen window estimator fixes V and leaves K /N variable2


• D-dimensional Parzen window kernel function (a.k.a. “box kernel”):

  k(u) = 1 if |ui| ≤ 1/2 for all i = 1, . . . , D, and 0 otherwise    (6)

• Calculate K with this kernel function:


  K(x) = Σ_{i=1}^{N} k( (x − xi) / h )    (7)

where h is a scaling factor that adjusts the box size


• Hence, the whole density is
  p(x) = (1/N) · Σ_{i=1}^{N} (1/h^D) · k( (x − xi) / h )    (8)

2 See Bishop Sec. 2.5.1



Kernel-based DE: Parzen Window Estimator (2/2)

• The kernel removes much of the discretization error of the fixed-distance histogram bins, but it still leads to blocky estimates
• Replacing the box kernel by a Gaussian kernel further smooths the result,

  p(x) = (1/N) · Σ_{i=1}^{N} ( 1 / (2π h²) )^{D/2} · exp( − ∥x − xi∥₂² / (2h²) ),    (9)

where h is the standard deviation of the Gaussian


• Mathematically, any other kernel is also possible if these conditions hold (see the sketch below):

  k(u) ≥ 0    (10)

  ∫ k(u) du = 1    (11)
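As an illustration of Eq. (9), here is a minimal numpy sketch of a Gaussian Parzen window estimator (function and variable names are my own):

```python
import numpy as np

def parzen_gaussian(x_query, samples, h):
    """Gaussian kernel density estimate p(x) at the query points, cf. Eq. (9).

    x_query: (M, D) points at which to evaluate the density
    samples: (N, D) training samples x_i
    h:       bandwidth (standard deviation of the Gaussian kernel)
    """
    M, D = x_query.shape
    diffs = x_query[:, None, :] - samples[None, :, :]    # (M, N, D) differences x - x_i
    sq_dists = (diffs ** 2).sum(axis=-1)                 # squared Euclidean distances
    norm = (2 * np.pi * h ** 2) ** (-D / 2)              # Gaussian normalization constant
    return norm * np.exp(-sq_dists / (2 * h ** 2)).mean(axis=1)   # average over all N kernels

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 1))                      # 1-D standard normal samples
grid = np.linspace(-3, 3, 7).reshape(-1, 1)
print(parzen_gaussian(grid, samples, h=0.3))             # roughly matches the N(0, 1) density
```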



K-Nearest Neighbors (k-NN) Density Estimation

• Recall our derived equation for estimating the density

  p(x) = K / (N · V) = (# points in R) / (total # of points · Volume of R)    (12)

• The Parzen window estimator fixes V , and K varies


• The k-Nearest Neighbors estimator fixes K , and V varies
• k-NN calculates V from the distance of the K nearest neighbors3

• Note that both the Parzen window estimator and the k-NN estimator are
“non-parametric”, but they are not free of parameters
• The kernel scaling h and the number of neighbors k are hyper-parameters,
i.e., some form of prior knowledge to guide the model creation
• The model parameters are the samples themselves. Both estimators need to
store all samples, which is why they are also called memory methods
3 See Bishop Sec. 5.2.2
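A minimal numpy sketch of the k-NN density estimate in Eq. (12) for 1-D data (names are my own; a D-dimensional version would use the volume of a D-ball instead of an interval length):

```python
import numpy as np

def knn_density_1d(x_query, samples, k):
    """k-NN density estimate p(x) = K / (N * V) for 1-D samples."""
    samples = np.asarray(samples)
    n = len(samples)
    p = np.empty(len(x_query))
    for j, x in enumerate(x_query):
        dists = np.sort(np.abs(samples - x))
        r_k = dists[k - 1]        # distance to the k-th nearest neighbor
        volume = 2 * r_k          # "volume" of the interval [x - r_k, x + r_k]
        p[j] = k / (n * volume)
    return p

rng = np.random.default_rng(0)
samples = rng.normal(size=500)
print(knn_density_1d(np.array([0.0, 1.0, 2.0]), samples, k=20))   # roughly the N(0, 1) density
```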



First Glance at the Model Selection Problem
• Optimizing the hyperparameters is also called the model selection problem
• Hyperparameters must be optimized on a held-out part of the training data,
the validation set:
train on training data with different hyperparameter sets hi , evaluate on
validation data to get the best performing set h∗ via maximum likelihood (ML)
• What if hyperparameters are optimized directly on the training data?
Then the most complex (largest, most flexible) model wins, because it
achieves the lowest training error

• When training data is limited, it can be better exploited with cross validation
• In this case, the data is subdivided into k folds (partitions). Do k training/eval.
  runs (using each fold once for validation and the rest for training), and select
  the h∗ with the best ML score across all folds
• The choice of k is a hyper-hyperparameter that affects the quality of the
predicted error (check Hastie/Tibshirani/Friedman Chap. 7 if curious)
Cross Validation (CV) for Unsupervised Methods?

• CV requires an objective function for ML, hence it is almost exclusively used on supervised tasks, where labels make performance measurement trivial
• Density estimation is unsupervised, hence we need an additional trick to
measure its performance
• The trick is to optimize the DE hyperparameters by using the prediction of
held-out samples as objective function:
• Split the data into J folds:

  S_train^j = S \ { x_{⌊N/J⌋·j}, . . . , x_{⌊N/J⌋·(j+1)−1} },   S_test^j = S \ S_train^j

• Let α be the unknown hyperparameters, and
  let p_j(x|α) be the density estimate for samples S_train^j with hyperparameters α
• Then, the ML estimate is

  α∗ = argmax_α ∏_{j=0}^{J−1} ∏_{x ∈ S_test^j} p_j(x|α)    (13)

• In practice, take the logarithm ("log likelihood") to mitigate numerical issues
  → the product becomes a sum (see the sketch below)
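A minimal numpy sketch of this procedure for a 1-D Gaussian kernel density estimator, selecting the bandwidth h by held-out log-likelihood (the fold layout and names are my own simplification of Eq. (13)):

```python
import numpy as np

def gauss_kde_logpdf(x_query, samples, h):
    """Log of a 1-D Gaussian kernel density estimate at the query points."""
    diffs = x_query[:, None] - samples[None, :]
    log_k = -0.5 * (diffs / h) ** 2 - np.log(h * np.sqrt(2 * np.pi))
    m = log_k.max(axis=1, keepdims=True)                 # log-sum-exp for numerical stability
    log_sum = m.squeeze(1) + np.log(np.exp(log_k - m).sum(axis=1))
    return log_sum - np.log(len(samples))                # log( (1/N) sum_i k_h(x - x_i) )

rng = np.random.default_rng(0)
data = rng.normal(size=300)
J = 5
folds = np.array_split(rng.permutation(data), J)

candidates = [0.05, 0.1, 0.2, 0.4, 0.8]                  # hyperparameter grid for h
scores = []
for h in candidates:
    loglik = 0.0
    for j in range(J):
        test = folds[j]
        train = np.concatenate([folds[i] for i in range(J) if i != j])
        loglik += gauss_kde_logpdf(test, train, h).sum() # held-out log-likelihood of fold j
    scores.append(loglik)

best_h = candidates[int(np.argmax(scores))]
print(best_h)                                            # bandwidth with highest held-out log-likelihood
```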
Lecture Pattern Analysis

Part 03: Bias and Variance

Christian Riess
IT Security Infrastructures Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg
April 18, 2024
Introduction

• The motivation behind the hyperparameter optimization is to aim for generalization to new data
• For kernel density estimation, the pitfalls are:
• Too large kernel: covers all space with some probability mass, but the density
is too uniform (does not represent the structure)
• Too small kernel: closely represents the training data, but might assign too low
probabilities in areas without training data
• In contrast, the “optimal”1 kernel size: represents the structure of the training
data and also covers unobserved areas to some extent
• This is an instance of the bias-variance tradeoff2

1 This may sound as if there were a unique minimum, maybe even of a convex function; in practice, there is not one single best solution, so read this as a somewhat hypothetical statement
2 See the PR lecture or Hastie/Tibshirani/Friedman Sec. 7-7.3 if more details are desired



Bias and Variance in Regression

• Bias is the square of the average deviation of an estimator from the ground
truth
• Variance is the variance of the estimates, i.e., the expected squared
  deviation from the estimator's mean3
• Informal interpretation:
• High bias indicates model undercomplexity: we obtain a poor fit to the data
• High variance indicates model overcomplexity: the fit models not just
  the structure of the data, but also its noise
• Higher model complexity (= more model parameters) tends to lower the bias and
  increase the variance
• We will usually not be able to get bias and variance simultaneously to 0
• Regularization increases bias and lowers variance

3 See Hastie/Tibshirani/Friedman Sec. 7.3 Eqn. (7.9) for a detailed derivation



Sketches for Model Undercomplexity and Overcomplexity

• Note that this example implicitly contains a smoothness assumption


• It does not claim that there is a universally best fit on arbitrary input
distributions (because of the No-Free-Lunch Theorem)
Transferring Bias and Variance to our Density Estimators

• Our kernel framework can directly replicate these investigations by retargeting our kernels to regression or classification:
• Regression:
  • Estimate f(x) at position x as a kernel-weighted average of the neighboring values (see the sketch below), or
  • as the mean over the k nearest neighbors
• Classification:
• Estimate for classes c1 and c2 individual densities, evaluate pc1 (x) and pc2 (x),
and select the class with higher probability or
• Select the majority class within k nearest neighbors
• We will then observe that
• Larger kernel support / larger k increases bias and lowers variance
• Smaller kernel support / smaller k lowers bias and increases variance

• Analogously, we can use the notion of bias/variance also on our initial unsupervised density estimation task
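As an illustration (function names and data are my own), a short numpy sketch of kernel-weighted regression; the bandwidth h plays the same role as the kernel support above, trading bias against variance:

```python
import numpy as np

def kernel_regression(x_query, x_train, y_train, h):
    """Estimate f(x) as a Gaussian-kernel-weighted average of the training targets."""
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / h) ** 2)
    return (w * y_train[None, :]).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 2 * np.pi, 60))
y_train = np.sin(x_train) + 0.3 * rng.normal(size=x_train.size)
x_query = np.linspace(0, 2 * np.pi, 5)

# Large h: smooth, high-bias fit; small h: flexible, high-variance fit
print(kernel_regression(x_query, x_train, y_train, h=2.0))
print(kernel_regression(x_query, x_train, y_train, h=0.1))
print(np.sin(x_query))   # ground truth for comparison
```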

