
LECTURE 3: INTRODUCTION TO DATA ANALYSIS AND MACHINE LEARNING


• Goal of data analysis: to determine some parameters from the data
• We want to combine new data with previous information on the
parameters (prior: theoretical or empirical)
• We multiply the likelihood of parameters given the data with the
prior to get the posterior
• Goal of machine learning: we want to predict parameters of new data given some existing labeled data (supervised learning), or to search for patterns and perform dimensionality reduction (unsupervised learning)

• The goals of statistics and ML are often similar or related


• Methodologies and language are often very different
Goals of Data Analysis

• Data analysis: summarizing the posterior information: mean or mode, variance… Typically we are interested in more than the mean and variance (skewness, kurtosis, full PDF)
• Posterior intervals: e.g. a 95% credible interval can be constructed as central (relative to the median) or highest posterior density. Typically these agree, but:

Posterior PDF p(λ|D,H) contains all information on λ

λ* = maximum a posteriori (MAP) estimate

p(\lambda|D,H) \propto p(D|\lambda,H) \quad \text{if } p(\lambda) \propto \text{constant}

If p(λ) ∝ constant (uniform prior) → λ* = maximum likelihood estimator (MLE) and MLE = MAP

\left.\frac{d}{d\lambda} p(\lambda|D,H)\right|_{\lambda=\lambda^*} = 0 \qquad \text{Approximate } p(\lambda|D) \text{ as a Gaussian around } \lambda^*

• Error estimate: \left.\frac{d^2}{d\lambda^2} \ln p(\lambda|D,H)\right|_{\lambda=\lambda^*} = -\frac{1}{\sigma^2}

• Laplace approximation: λ = λ* ± σ
Posterior predictive distribution
• Predicting future observation conditional on current data y and model
posterior: we marginalize over all models at fixed current data y
• We have seen it in example 2.6 (lecture 2 slide 28)
• Example: we measure a quantity but each measurement has some error σ. After N measurements we get mean µ1 and error on the mean σ1 = σ/N^{1/2}. The next measurement will be located around µ1, combining the two errors.

Two sources of uncertainty! Will be discussed further when we cover hierarchical models.
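As a sketch of how the two errors combine (assuming a Gaussian likelihood with known σ and an effectively flat prior, so the posterior for the mean µ is N(µ1, σ1²)), the posterior predictive distribution for the next measurement is

p(y_{N+1}|D) = \int p(y_{N+1}|\mu)\, p(\mu|D)\, d\mu = \mathcal{N}\!\left(\mu_1,\; \sigma^2 + \sigma_1^2\right), \qquad \sigma_1 = \sigma/\sqrt{N}

i.e. the predictive variance adds the measurement error σ² and the uncertainty σ1² of the inferred mean.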

Modern statistical methods (Bayesian or not)
Gelman et al., Bayesian Data Analysis, 3rd edition

INTRODUCTION TO MODELING OF DATA

• We are given N data measurements (x_i, y_i)
• Each measurement comes with an error estimate σ_i
• We have a parametrized model for the data y = y(x_i)
• We think the error probability is Gaussian and the measurements are uncorrelated:

p(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-(y(x_i)-y_i)^2/2\sigma_i^2}

p(\vec{y}) = \prod_i p(y_i)

We can parametrize the model in terms of M free parameters
y(xi|a1,a2,a3,…,aM)

Bayesian formalism gives us the full posterior information on the parameters of the model:

p(\vec{y}\,|\,\vec{a}) = \prod_i p(y_i|\vec{a}) = L(\vec{a})

p(a_1, ..., a_M|\vec{y}) = \frac{\prod_i p(y_i|\vec{a})\, p(\vec{a})}{p(\vec{y})}

We can assume a flat prior p(a_1, a_2, a_3, …, a_M) = constant.

In this case the posterior is proportional to the likelihood L.

The normalization (evidence, marginal) p(\vec{y}) is not needed if we just need the relative posterior density.
Maximum likelihood estimator (MLE)

• Instead of the full posterior we can ask what is the best fit
value of parameters a1,a2,a3,…,aM

• We can define this in different ways: mean, median, mode

• Choosing the mode (peak posterior or peak likelihood) means we want to maximize the likelihood: maximum likelihood estimator (or MAP for a non-uniform prior)

\mathrm{MLE}: \quad \frac{\partial L}{\partial \vec{a}} = 0 \quad \text{or} \quad \frac{\partial \ln L}{\partial \vec{a}} = 0

Maximum likelihood estimator for Gaussian errors

p(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-(y(x_i)-y_i)^2/2\sigma_i^2}, \qquad p(\vec{y}) = \prod_i p(y_i)

-2\ln L = \sum_i \Big\{ \underbrace{\frac{(y_i - y(x_i|a_1, ..., a_M))^2}{\sigma_i^2}}_{\chi^2} + \ln \sigma_i^2 \Big\}

Since σ_i does not depend on a_i, MLE means minimizing χ² with respect to a_k:

\frac{\partial \chi^2}{\partial a_k} = 0 \quad\rightarrow\quad \sum_i \frac{y_i - y(x_i)}{\sigma_i^2}\, \frac{\partial y(x_i)}{\partial a_k} = 0

This is a system of M nonlinear equations for M unknowns.
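In practice the minimization can be done numerically. A minimal sketch with scipy (the exponential model, data, and parameter names are hypothetical, chosen only to illustrate the χ² fit):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical nonlinear model y(x | A, k) = A * exp(-k x)
def model(x, A, k):
    return A * np.exp(-k * x)

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)
sigma = 0.1 * np.ones_like(x)                       # per-point Gaussian errors sigma_i
y = model(x, 2.0, 0.7) + sigma * rng.normal(size=x.size)

# With absolute_sigma=True, curve_fit minimizes chi^2 = sum_i (y_i - y(x_i|a))^2 / sigma_i^2
# and returns the best-fit parameters together with their covariance matrix.
a_hat, C = curve_fit(model, x, y, p0=[1.0, 1.0], sigma=sigma, absolute_sigma=True)
print("best fit:", a_hat)
print("parameter errors:", np.sqrt(np.diag(C)))
```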


Fitting data to a straight line: model is a line

Linear Regression

y(x) = a + bx

χ² measures how well the model agrees with the data. Minimize χ²:

\chi^2(a, b) = \sum_i \frac{(y_i - a - b x_i)^2}{\sigma_i^2}

Define:

S = \sum_i \frac{1}{\sigma_i^2}, \quad S_x = \sum_i \frac{x_i}{\sigma_i^2}, \quad S_y = \sum_i \frac{y_i}{\sigma_i^2}, \quad S_{xx} = \sum_i \frac{x_i^2}{\sigma_i^2}, \quad S_{xy} = \sum_i \frac{x_i y_i}{\sigma_i^2}

Matrix form:

\begin{pmatrix} S & S_x \\ S_x & S_{xx} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} S_y \\ S_{xy} \end{pmatrix}

Solve this with linear algebra.

C^{-1} = \begin{pmatrix} S & S_x \\ S_x & S_{xx} \end{pmatrix}, \qquad C = \frac{1}{S S_{xx} - S_x^2} \begin{pmatrix} S_{xx} & -S_x \\ -S_x & S \end{pmatrix}

Solution: define \Delta = S S_{xx} - S_x^2, then

\hat{a} = \frac{S_{xx} S_y - S_x S_{xy}}{\Delta}, \qquad \hat{b} = \frac{S S_{xy} - S_x S_y}{\Delta}

This gives the best fit â and b̂.

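A minimal Python sketch of this straight-line χ² fit via the S-sums and the 2×2 normal equations (the synthetic data and true parameter values are made up for illustration):

```python
import numpy as np

# Synthetic data: true line y = 1.0 + 2.0 x with Gaussian errors (illustrative values)
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 20)
sigma = 0.5 * np.ones_like(x)
y = 1.0 + 2.0 * x + sigma * rng.normal(size=x.size)

# The S-sums defined above
w = 1.0 / sigma**2
S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
Sxx, Sxy = (w * x**2).sum(), (w * x * y).sum()

# Solve the 2x2 normal equations for (a, b)
Cinv = np.array([[S, Sx], [Sx, Sxx]])        # precision matrix C^-1
a_hat, b_hat = np.linalg.solve(Cinv, np.array([Sy, Sxy]))

# Covariance matrix: marginal errors are the square roots of its diagonal
C = np.linalg.inv(Cinv)
print("a =", a_hat, "+/-", np.sqrt(C[0, 0]))
print("b =", b_hat, "+/-", np.sqrt(C[1, 1]))
```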
What about the errors?
• We approximate the log posterior around its peak with a
quadratic function
• The posterior is thus approximated as a Gaussian
• This goes under the name Laplace approximation
• Note that the errors need to be described as a matrix
• It is exact for linear parameters (such as a and b)

MLE/MAP + Laplace

-2\ln p(a, b|y_i) = -2\ln L(a, b)

Taylor expansion around the peak (â, b̂): the first derivative is 0. Let a = x_1, b = x_2:

-2\ln L(x_1, x_2) = -2\ln L(\hat{x}_1, \hat{x}_2) - \sum_{i,j=1,2} \left.\frac{\partial^2 \ln L}{\partial x_i \partial x_j}\right|_{x_i=\hat{x}_i} \delta x_i\, \delta x_j = -2\ln L(\hat{x}_1, \hat{x}_2) + \sum_{ij} \delta x_i\, C^{-1}_{ij}\, \delta x_j

where \delta x_i = x_i - \hat{x}_i.

Note: \langle \delta x_i\, \delta x_j \rangle = C_{ij}

Gaussian posterior approximation: we are dropping terms beyond 2nd order.

-\frac{\partial^2 \ln L}{\partial x_i \partial x_j} \equiv C^{-1}_{ij} \qquad (C^{-1} = \alpha \text{ is called the precision matrix, also called the Hessian matrix})

L \propto e^{-\frac{1}{2} \sum_{ij} \delta x_i\, C^{-1}_{ij}\, \delta x_j}
-2\ln L = \chi^2

\frac{\partial^2 \chi^2}{\partial a^2} = 2 \sum_i \frac{1}{\sigma_i^2} = 2S

\frac{\partial^2 \chi^2}{\partial b^2} = 2 \sum_i \frac{x_i^2}{\sigma_i^2} = 2S_{xx}

\frac{\partial^2 \chi^2}{\partial a \partial b} = 2 \sum_i \frac{x_i}{\sigma_i^2} = 2S_x

C^{-1} = \begin{pmatrix} S & S_x \\ S_x & S_{xx} \end{pmatrix}, \qquad C = \frac{1}{S S_{xx} - S_x^2} \begin{pmatrix} S_{xx} & -S_x \\ -S_x & S \end{pmatrix}

S^{-1} is the error (variance) on a at a fixed b.

Define σ_a² ≡ C_aa: the marginalized error on a (integrate out b).

Marginal errors are larger: σ_a² > S^{-1}
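Continuing the numerical sketch above, a quick check of this statement with an illustrative precision matrix (numbers made up):

```python
import numpy as np

# Illustrative precision matrix C^-1 = [[S, Sx], [Sx, Sxx]]
Cinv = np.array([[80.0, 400.0],
                 [400.0, 2680.0]])
C = np.linalg.inv(Cinv)

sigma_a_fixed_b = np.sqrt(1.0 / Cinv[0, 0])   # error on a at fixed b: sqrt(S^-1)
sigma_a_marginal = np.sqrt(C[0, 0])           # marginal error on a: sqrt(C_aa)
print(sigma_a_fixed_b, sigma_a_marginal)      # the marginal error is larger
```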
Show:

\int da\; e^{-\frac{1}{2}\left[(a-\hat{a})^2 C^{-1}_{aa} + 2(a-\hat{a})(b-\hat{b}) C^{-1}_{ab} + (b-\hat{b})^2 C^{-1}_{bb}\right]} \;\propto\; e^{-\frac{1}{2} \frac{(b-\hat{b})^2}{C_{bb}}}

(Complete the square in a.)

Solution: with δa = a − â, δb = b − b̂, the bracket in the exponent becomes

C^{-1}_{aa}\left(\delta a + \frac{C^{-1}_{ab}}{C^{-1}_{aa}}\, \delta b\right)^2 - \frac{(C^{-1}_{ab})^2}{C^{-1}_{aa}}\, \delta b^2 + C^{-1}_{bb}\, \delta b^2

\int da\; e^{-\frac{1}{2} C^{-1}_{aa} \left(\delta a + \frac{C^{-1}_{ab}}{C^{-1}_{aa}} \delta b\right)^2} = \sqrt{\frac{2\pi}{C^{-1}_{aa}}}

so the integral over a leaves

\propto\; e^{-\frac{1}{2} \delta b^2 \left[C^{-1}_{bb} - \frac{(C^{-1}_{ab})^2}{C^{-1}_{aa}}\right]} = e^{-\frac{1}{2} \frac{\delta b^2}{C_{bb}}}

where the last step uses the 2×2 inverse: C^{-1}_{bb} - (C^{-1}_{ab})^2 / C^{-1}_{aa} = \det(C^{-1}) / C^{-1}_{aa} = 1/C_{bb}.

Bayesian Posterior and Marginals
• The posterior distribution p(a,b|y_i) is described by a 2-d ellipse, given by C^{-1}, in the (a,b) plane

• At any fixed value of a (or b) the posterior of b (or a) is a Gaussian with variance [C^{-1}_{bb(aa)}]^{-1}

• If we want to know the error on b (or a) independent of a (or b) we need to marginalize over a (or b)

• This marginalization can be done analytically (completion of squares), and leads to C_{bb(aa)} as the variance of b (or a)

• This will increase the error: C_{bb(aa)} > [C^{-1}_{bb(aa)}]^{-1}

Asymptotics theorems
(Le Cam 1953, adapted to Bayesian posteriors)

• At a fixed number of parameters, posteriors approach a multivariate Gaussian in the large-N limit (N: number of data points): this is because the 2nd order Taylor expansion of ln L is more and more accurate in this limit, i.e. we can drop 3rd and higher order terms, by the central limit theorem

• The marginalized means approach the true values and the variance approaches the inverse of the Fisher matrix, where the Fisher matrix is defined as the ensemble average of the precision matrix ⟨C^{-1}⟩

• The likelihood dominates over the prior in the large-N limit

Asymptotics theorems
(Le Cam 1953, adapted to Bayesian posteriors)
• There are caveats when this does not apply, e.g. when data are
not informative about a parameter or some linear combination of
them, when number of parameters M is comparable to N, when
posteriors are improper or likelihoods are unbounded… Always
exercise care!

• In practice the asymptotic limit is often not achieved for


nonlinear models, i.e. we cannot linearize the model across the
region of non-zero posterior: this is why we will use advanced
Bayesian methods to evaluate the posteriors instead of Gaussian
• It is useful to know that this limit exists, but since we cannot know ahead of time whether we are in this limit, in practice we cannot assume it: we will be doing full Bayesian posteriors in this course, but we will also sometimes compare to the Gaussian limit
Multivariate linear least squares

• We can generalize the model to a generic functional form

y_i = a_0 X_0(x_i) + a_1 X_1(x_i) + … + a_{M-1} X_{M-1}(x_i)

• The problem is linear in a_j and can be nonlinear in x_i, e.g. X_j(x_i) = x_i^j

• We can define the design matrix A_{ij} = X_j(x_i)/σ_i and b_i = y_i/σ_i

Design matrix

Credit: NR, Press et al.


Solution by normal equations:

\frac{d\chi^2}{da_k} = 0, \qquad \alpha_{kj} = \frac{d^2 \chi^2}{da_k\, da_j}

To solve the normal equations to obtain the best fit values and the precision matrix we need to learn linear algebra numerical methods: the topic of the next lecture.
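A minimal sketch of the multivariate linear least squares solution via the design matrix (the basis functions X_j(x) = x^j, the synthetic data, and the true parameters are illustrative choices, not from the slides):

```python
import numpy as np

# Synthetic data for an illustrative cubic model (M = 4 basis functions X_j(x) = x^j)
rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 40)
sigma = 0.05 * np.ones_like(x)
a_true = np.array([0.5, -1.0, 2.0, 0.3])
y = sum(a_true[j] * x**j for j in range(4)) + sigma * rng.normal(size=x.size)

# Design matrix A_ij = X_j(x_i)/sigma_i and data vector b_i = y_i/sigma_i
A = np.vstack([x**j for j in range(4)]).T / sigma[:, None]
b = y / sigma

# Normal equations: (A^T A) a = A^T b; alpha = A^T A is the precision matrix
alpha = A.T @ A
a_hat = np.linalg.solve(alpha, A.T @ b)

# Covariance of the parameters and marginal errors
C = np.linalg.inv(alpha)
print("best fit:", a_hat)
print("marginal errors:", np.sqrt(np.diag(C)))
```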
Gaussian posterior

Marginalization over nuisance parameters

• If we want to know the error on the j-th parameter we need to marginalize over all other parameters

• In analogy to the 2-d case this leads to σ_j² = C_jj

• So we need to invert the precision matrix α = C^{-1} to get C

• Analytic marginalization is only possible for a multivariate Gaussian distribution: a great advantage of using a Gaussian

• If the posterior is not Gaussian it may be made more Gaussian by a nonlinear transformation of the variable
What about multi-dimensional projections?
• Suppose we are interested in ν components of a, marginalizing over the remaining M − ν components

• We take the components of C corresponding to the ν parameters to create the ν × ν matrix C_proj

• Invert the matrix to get the precision matrix C_proj^{-1}

• The posterior distribution is proportional to exp(−δa_proj^T C_proj^{-1} δa_proj / 2), which is distributed as exp(−Δχ²/2), i.e. χ² with ν degrees of freedom

Credible intervals under Gaussian posterior approx.
• We like to quote posteriors in terms of X% credible intervals
• For Gaussian likelihoods the most compact posteriors correspond to a constant change Δχ² relative to MAP/MLE
• The intervals depend on the dimension: example for X = 68
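As a sketch of how these Δχ² thresholds can be computed (using scipy; the confidence levels and dimensions below are illustrative):

```python
from scipy.stats import chi2

# Delta chi^2 threshold that encloses a fraction X of a Gaussian posterior,
# as a function of the number of projected dimensions nu
for X in (0.683, 0.954):
    for nu in (1, 2, 3):
        print(f"X={X:.3f}, nu={nu}: delta chi^2 = {chi2.ppf(X, df=nu):.2f}")
```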

We rarely go above ν = 2 dimensions in projections
(difficult to visualize)

Introduction to Machine Learning
• From some input x, the output can be:
  • Summary z: unsupervised learning (descriptive, hindsight)
  • Prediction y: supervised learning (predictive, insight)
  • Action a to maximize reward r: reinforcement learning (prescriptive, foresight)
• Value vs difficulty (although this view is subjective)
• Supervised learning: classification and regression
• Unsupervised learning: e.g. dimensionality reduction

Chris Wiggins taxonomy, Gartner/Recht graph
Data Analysis versus Machine Learning
• In physical sciences we usually compare data to a physics-based model to infer parameters of the model. This is often an analytic model as a function of physical parameters (e.g. linear regression). This is the Bayesian Data Analysis component of this course. We need a likelihood and a prior.
• In machine learning we usually do not have a model, all we have is data. If the data is labeled, we can also do inference on new unlabeled data: we can learn that data with a certain value of the label have certain properties, so that when we evaluate new data we can assign the value of the label to it. This works both for regression (continuous values for labels) and classification (discrete label values). ML is a fancy version of interpolation.
• Hybrid: Likelihood-free inference (LFI), i.e. inference using ML methods. Instead of doing a prior + likelihood analysis we make labeled synthetic data realizations using simulations, and use ML methods to infer the parameter values given the actual data realization. We pay the price of sampling noise, in that we may not have sufficient simulations for ML methods to learn the labels well.
• For very complicated, high dimensional problems, full Bayesian analysis may not be feasible and LFI can be an attractive alternative. We will be learning both approaches in this course.

Supervised Learning (SL)
• Answering a specific question: e.g. regression or classification
• Supervised learning is essentially interpolation
• General approach: frame the problem, collect the data
• Choose the SL algorithm
• Choose the objective function (decide what to optimize)
• Train the algorithm, test (cross-validate)

Classes of problems: regression


Basic machine learning procedure
• We have some data x and some labels y, such that Y = (x, y). We wish to find some model g(a) and some cost or loss function C(Y, g(a)) that we wish to minimize, such that the model g(a) explains the data Y.
• E.g. Y = (x, y), C = χ²
• g = a_0 X_0(x_i) + a_1 X_1(x_i) + … + a_{M-1} X_{M-1}(x_i)

• In ML we divide the data into training data Ytrain (e.g. 90%) and test data Ytest (e.g. 10%)
• We fit the model to the training data: the value of the minimum loss function at amin is called the in-sample error Ein = C(Ytrain, g(amin))
• We test the results on the test data, getting the out-of-sample error Eout = C(Ytest, g(amin)) > Ein
• This is called the cross-validation technique
• If we have different models we use a 3-way split (e.g. 60%, 30%, 10%): each model is trained on the training data, validation data are used to compare the different models, and test data are used for the final test
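A minimal sketch of this procedure on a hypothetical polynomial-regression problem (the data, split fractions, and degrees are illustrative; this is not the actual HW problem):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 100)
y = 2 * x + 0.2 * rng.normal(size=x.size)       # noisy linear "truth" (illustrative)

# 90% / 10% train-test split
n_train = 90
x_train, y_train = x[:n_train], y[:n_train]
x_test, y_test = x[n_train:], y[n_train:]

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred)**2)

# Fit polynomial models of increasing complexity to the training data
for degree in (1, 3, 10):
    coeffs = np.polyfit(x_train, y_train, degree)        # minimizes the in-sample loss
    E_in = mse(y_train, np.polyval(coeffs, x_train))     # in-sample error
    E_out = mse(y_test, np.polyval(coeffs, x_test))      # out-of-sample error
    print(f"degree {degree}: E_in = {E_in:.4f}, E_out = {E_out:.4f}")
```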
Data analysis versus machine learning
• Data analysis: fitting existing data to a physics-based model to obtain model parameters y. Parameters are fixed: we know the physics up to the parameter values. Parameter posteriors are the goal.
• ML: use a model derived from existing data to predict regression or classification parameters y for new data.
• Example: polynomial regression. This will be a HW 4 problem.
• We can fit the training data to a simple model or a complex model
• In the absence of noise a complex model (many fitting parameters a) is always better
• In the presence of noise a complex model is often worse
• Note that the parameters a have no meaning on their own, they are just a means to reach the goal of predicting y
f(x) = 2x, no noise

f(x) = 2x - 10x^5 + 15x^{10}
Over-fitting noise with too complex models (bias-variance trade-off)

Bias-variance trade-off

Another example: k-nearest neighbors
How do predictions change as we average over more nearest neighbours?
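A small sketch using scikit-learn's k-nearest-neighbours regressor (an assumed dependency; the data are made up) to see how the prediction smooths as k grows:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 10, 200)).reshape(-1, 1)
y = np.sin(x).ravel() + 0.3 * rng.normal(size=x.shape[0])

x_grid = np.linspace(0, 10, 500).reshape(-1, 1)
for k in (1, 5, 50):
    knn = KNeighborsRegressor(n_neighbors=k).fit(x, y)
    y_pred = knn.predict(x_grid)
    # small k follows the noise (high variance); large k over-smooths (high bias)
    print(f"k={k}: prediction std = {y_pred.std():.3f}")
```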

Statistical learning theory
• We have data and we can change the number of data points
• We have models and we can change complexity (the number of model parameters in simple versions)
• Trade-off at fixed model complexity:
  • small data size suffers from a large variance (we are overfitting noise)
  • large data size suffers from model bias
• Variance is quantified by Ein vs Eout
• Ein and Eout approach the bias for large data
• To reduce bias, increase complexity
Bias-variance trade-off vs complexity
• Low complexity: large bias
• Large complexity: large variance
• Optimum when the two are balanced
• Complexity can be controlled by regularization (we will discuss it further)
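As a sketch of controlling complexity with regularization (using scikit-learn's Ridge regression as an assumed example; the penalty strengths, degree, and data are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 30).reshape(-1, 1)
y = 2 * x.ravel() + 0.2 * rng.normal(size=30)

# High-complexity basis: degree-10 polynomial features
X = PolynomialFeatures(degree=10).fit_transform(x)

for alpha in (1e-6, 1e-2, 10.0):
    model = Ridge(alpha=alpha).fit(X, y)
    # larger alpha shrinks the coefficients, reducing variance at the cost of some bias
    print(f"alpha={alpha}: sum |coeffs| = {np.abs(model.coef_).sum():.2f}")
```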

Representational power
• We are learning a manifold M: f: X → Y

• To learn complex manifolds we need high representational power
• We need a universal approximator with good generalization properties (from in-sample to out-of-sample, i.e. not over-fitting)
• This is where neural networks excel: they can fit anything (literally, including pure noise), yet can also generalize
Unsupervised machine learning
• Discovering structure in unlabeled data
• Examples: clustering, dimensionality reduction
• The promise: easier to do regression, classification
• Easier visualization

Dimensionality reduction
• PCA (lecture 4), ICA (lecture 5)
• Manifold projection: we want to reduce the dimensionality while preserving pairwise distances between data points (e.g. t-SNE, ISOMAP, UMAP)
• If reduced too much we get the crowding problem

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction

• Tries to connect nearby points using a locally varying metric
• Best on the market at the moment
• You will try it in HW 3
• Example: MNIST digits separate in the 2d UMAP plane
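A minimal sketch using the umap-learn package (an assumed dependency) on the small scikit-learn digits set as a stand-in for MNIST:

```python
import umap                               # pip install umap-learn
from sklearn.datasets import load_digits

digits = load_digits()                    # 8x8 digit images as a stand-in for MNIST
X, labels = digits.data, digits.target

# Project the 64-dimensional pixel space down to a 2d plane
embedding = umap.UMAP(n_components=2, n_neighbors=15, random_state=42).fit_transform(X)
print(embedding.shape)                    # (n_samples, 2); same-label digits cluster together
```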

Clustering algorithms

• For unsupervised learning (no labels available) we also need to identify distinct classes
• Clustering algorithms look at clusters of data in the original space or in a reduced-dimensionality space
• We will look at k-means and the Gaussian mixture model later
• Clustering algorithms such as HDBSCAN connect close particles together: friends-of-friends algorithms
• HW 3: UMAP + clustering
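A short sketch of clustering a 2d embedding, here with scikit-learn's KMeans and the separate hdbscan package (both assumed dependencies; the toy data stand in for a UMAP output):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import hdbscan                            # pip install hdbscan

# Toy 2d "embedding" with three clusters (illustrative stand-in for a UMAP projection)
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k-means needs the number of clusters up front
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# HDBSCAN links nearby points (friends-of-friends style) and finds the number of clusters itself
hdb_labels = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(X)

print("k-means clusters:", np.unique(kmeans_labels))
print("HDBSCAN clusters:", np.unique(hdb_labels))   # -1 marks noise points
```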

Literature
• Numerical Recipes, Press et al., Chapter 15
(http://apps.nrbook.com/c/index.html)
• Bayesian Data Analysis, Gelman et al., Chapters 1-4
• https://umap-learn.readthedocs.io/en/latest/how_umap_works.html
• A high bias, low variance introduction to machine learning
for physicists, https://arxiv.org/pdf/1803.08823.pdf (pictures on slides
34-42 taken from this review)
