
CS 229 – Machine Learning    https://stanford.edu/~shervine

VIP Cheatsheet: Unsupervised Learning

Afshine Amidi and Shervine Amidi

September 9, 2018

Introduction to Unsupervised Learning


❒ Motivation – The goal of unsupervised learning is to find hidden patterns in unlabeled data {x^{(1)}, ..., x^{(m)}}.

❒ Jensen's inequality – Let f be a convex function and X a random variable. We have the following inequality:

    E[f(X)] \geq f(E[X])
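As a quick numerical sanity check (our own toy example, not part of the cheatsheet), the inequality can be verified for the convex function f(x) = x^2:

```python
import numpy as np

# Numerical check of Jensen's inequality with the convex function f(x) = x^2.
rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=2.0, size=100_000)  # samples of a random variable X

lhs = np.mean(X ** 2)      # E[f(X)]
rhs = np.mean(X) ** 2      # f(E[X])
print(lhs >= rhs)          # True: E[X^2] >= (E[X])^2
```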
Expectation-Maximization

❒ Latent variables – Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:

    Setting                   Latent variable z    x|z               Comments
    Mixture of k Gaussians    Multinomial(φ)       N(µ_j, Σ_j)       µ_j ∈ R^n, φ ∈ R^k
    Factor analysis           N(0, I)              N(µ + Λz, ψ)      µ_j ∈ R^n

❒ Algorithm – The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:

• E-step: Evaluate the posterior probability Q_i(z^{(i)}) that each data point x^{(i)} came from a particular cluster z^{(i)}, as follows:

    Q_i(z^{(i)}) = P(z^{(i)} \mid x^{(i)}; \theta)

• M-step: Use the posterior probabilities Q_i(z^{(i)}) as cluster-specific weights on data points x^{(i)} to separately re-estimate each cluster model, as follows:

    \theta_i = \underset{\theta}{\operatorname{argmax}} \sum_i \int_{z^{(i)}} Q_i(z^{(i)}) \log\left(\frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}\right) dz^{(i)}
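For illustration, here is a minimal NumPy/SciPy sketch of these two steps for a mixture of k Gaussians; the function and variable names (em_gmm, phi, mu, sigma) are ours, and the small regularization term is an assumption for numerical stability, not part of the cheatsheet.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, seed=0):
    """Illustrative EM for a mixture of k Gaussians (not the authors' code)."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    phi = np.full(k, 1.0 / k)                        # mixture weights
    mu = X[rng.choice(m, k, replace=False)]          # random data points as initial means
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(n) for _ in range(k)])

    for _ in range(n_iter):
        # E-step: Q_i(z) = P(z | x ; theta), posterior over clusters for each point.
        q = np.column_stack([
            phi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=sigma[j])
            for j in range(k)
        ])
        q /= q.sum(axis=1, keepdims=True)            # shape (m, k)

        # M-step: re-estimate each cluster with the posteriors as weights.
        nk = q.sum(axis=0)                           # effective number of points per cluster
        phi = nk / m
        mu = (q.T @ X) / nk[:, None]
        for j in range(k):
            d = X - mu[j]
            sigma[j] = (q[:, j, None] * d).T @ d / nk[j] + 1e-6 * np.eye(n)
    return phi, mu, sigma

# Toy usage on made-up 2-D data:
X = np.vstack([np.random.default_rng(1).normal(0, 1, (100, 2)),
               np.random.default_rng(2).normal(4, 1, (100, 2))])
phi, mu, sigma = em_gmm(X, k=2)
print(phi, mu)
```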

k-means clustering

We note c^{(i)} the cluster of data point i and µ_j the center of cluster j.

❒ Algorithm – After randomly initializing the cluster centroids µ_1, µ_2, ..., µ_k ∈ R^n, the k-means algorithm repeats the following step until convergence:

    c^{(i)} = \underset{j}{\operatorname{argmin}} \|x^{(i)} - \mu_j\|^2 \quad\text{and}\quad \mu_j = \frac{\sum_{i=1}^m 1_{\{c^{(i)}=j\}} \, x^{(i)}}{\sum_{i=1}^m 1_{\{c^{(i)}=j\}}}

❒ Distortion function – In order to see if the algorithm converges, we look at the distortion function defined as follows:

    J(c, \mu) = \sum_{i=1}^m \|x^{(i)} - \mu_{c^{(i)}}\|^2
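A short NumPy sketch of the alternating assignment/update steps and of the distortion J is given below; the kmeans function and its defaults are illustrative, not the authors' code.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Illustrative k-means (names and defaults are ours, not from the cheatsheet)."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(m, k, replace=False)]          # random initial centroids

    for _ in range(n_iter):
        # Assignment step: c_i = argmin_j ||x_i - mu_j||^2.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (m, k) squared distances
        c = d2.argmin(axis=1)
        # Update step: mu_j = mean of the points assigned to cluster j.
        new_mu = np.array([X[c == j].mean(axis=0) if np.any(c == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):                  # converged
            break
        mu = new_mu

    distortion = ((X - mu[c]) ** 2).sum()            # J(c, mu)
    return c, mu, distortion

# Toy usage:
c, mu, J = kmeans(np.random.default_rng(0).normal(size=(300, 2)), k=3)
print(J)
```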


Hierarchical clustering

❒ Algorithm – It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.

❒ Types – There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:

    Ward linkage                        Average linkage                                     Complete linkage
    Minimize within-cluster distance    Minimize average distance between cluster pairs    Minimize maximum distance between cluster pairs
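These linkages are implemented in standard libraries; for instance, a brief SciPy sketch under toy data (the data matrix X and the choice of 3 clusters are placeholders):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(100, 2))    # toy data, illustrative only

# Agglomerative clustering with the three linkages from the table above.
for method in ("ward", "average", "complete"):
    Z = linkage(X, method=method)                      # encodes the successive merges
    labels = fcluster(Z, t=3, criterion="maxclust")    # cut the hierarchy into 3 clusters
    print(method, np.bincount(labels))
```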
Clustering assessment metrics
In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels, as was the case in the supervised learning setting.

❒ Silhouette coefficient – By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:

    s = \frac{b - a}{\max(a, b)}
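For reference, scikit-learn exposes this metric directly; a minimal usage sketch with toy data and k-means labels (both placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.default_rng(0).normal(size=(200, 2))    # toy data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Mean silhouette coefficient s = (b - a) / max(a, b), averaged over all samples.
print(silhouette_score(X, labels))
```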

❒ Calinski-Harabaz index – By noting k the number of clusters, B_k and W_k the between- and within-clustering dispersion matrices respectively defined as

    B_k = \sum_{j=1}^k n_j (\mu_j - \mu)(\mu_j - \mu)^T, \qquad W_k = \sum_{i=1}^m (x^{(i)} - \mu_{c^{(i)}})(x^{(i)} - \mu_{c^{(i)}})^T

the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:

    s(k) = \frac{\operatorname{Tr}(B_k)}{\operatorname{Tr}(W_k)} \times \frac{N - k}{k - 1}

where n_j is the number of points in cluster j and N is the total number of data points.
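The index can be computed directly from this definition; the sketch below does so and, assuming a recent scikit-learn where the metric is exposed as calinski_harabasz_score, compares against the library value:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def calinski_harabasz(X, labels):
    """Direct computation of s(k) = Tr(B_k)/Tr(W_k) * (N - k)/(k - 1)."""
    N = X.shape[0]
    mu = X.mean(axis=0)
    clusters = np.unique(labels)
    k = len(clusters)
    # Tr(B_k) = sum_j n_j ||mu_j - mu||^2 ; Tr(W_k) = sum_i ||x_i - mu_{c_i}||^2.
    tr_b = sum((labels == j).sum() * np.sum((X[labels == j].mean(axis=0) - mu) ** 2)
               for j in clusters)
    tr_w = sum(np.sum((X[labels == j] - X[labels == j].mean(axis=0)) ** 2)
               for j in clusters)
    return tr_b / tr_w * (N - k) / (k - 1)

X = np.random.default_rng(0).normal(size=(200, 2))    # toy data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(calinski_harabasz(X, labels), calinski_harabasz_score(X, labels))  # should agree
```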
Principal component analysis

It is a dimension reduction technique that finds the variance-maximizing directions onto which to project the data.

❒ Eigenvalue, eigenvector – Given a matrix A ∈ R^{n×n}, λ is said to be an eigenvalue of A if there exists a vector z ∈ R^n \ {0}, called eigenvector, such that we have:

    Az = \lambda z

❒ Spectral theorem – Let A ∈ R^{n×n}. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U ∈ R^{n×n}. By noting Λ = diag(λ_1, ..., λ_n), we have:

    \exists \Lambda \text{ diagonal}, \quad A = U \Lambda U^T

Remark: the eigenvector associated with the largest eigenvalue is called the principal eigenvector of matrix A.
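As a quick NumPy illustration of the spectral theorem and of the principal eigenvector (the symmetric matrix below is arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                       # symmetric matrix

lam, U = np.linalg.eigh(A)                       # eigenvalues (ascending) and orthogonal eigenvectors
print(np.allclose(U @ np.diag(lam) @ U.T, A))    # A = U Lambda U^T
principal_eigenvector = U[:, -1]                 # eigenvector of the largest eigenvalue
print(lam[-1], principal_eigenvector)
```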
❒ Algorithm – The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:

• Step 1: Normalize the data to have a mean of 0 and a standard deviation of 1:

    x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{\sigma_j} \quad\text{where}\quad \mu_j = \frac{1}{m}\sum_{i=1}^m x_j^{(i)} \quad\text{and}\quad \sigma_j^2 = \frac{1}{m}\sum_{i=1}^m \left(x_j^{(i)} - \mu_j\right)^2

• Step 2: Compute \Sigma = \frac{1}{m}\sum_{i=1}^m x^{(i)} x^{(i)T} ∈ R^{n×n}, which is symmetric with real eigenvalues.

• Step 3: Compute u_1, ..., u_k ∈ R^n, the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.

• Step 4: Project the data on span_R(u_1, ..., u_k). This procedure maximizes the variance among all k-dimensional spaces.
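A compact NumPy sketch of these four steps (the pca function, toy data, and k = 2 are illustrative choices, not the authors' implementation; it assumes every feature has non-zero variance):

```python
import numpy as np

def pca(X, k):
    """Project X (shape (m, n)) onto its top-k principal directions."""
    # Step 1: normalize each feature to zero mean and unit standard deviation.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xn = (X - mu) / sigma
    # Step 2: Sigma = (1/m) sum_i x_i x_i^T, symmetric with real eigenvalues.
    m = Xn.shape[0]
    S = Xn.T @ Xn / m
    # Step 3: the k orthogonal eigenvectors with the largest eigenvalues.
    lam, U = np.linalg.eigh(S)                   # ascending eigenvalue order
    Uk = U[:, ::-1][:, :k]
    # Step 4: project the data on span(u_1, ..., u_k).
    return Xn @ Uk

# Toy usage:
Z = pca(np.random.default_rng(0).normal(size=(500, 5)), k=2)
print(Z.shape)                                   # (500, 2)
```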


Independent component analysis

It is a technique meant to find the underlying generating sources.

❒ Assumptions – We assume that our data x has been generated by the n-dimensional source vector s = (s_1, ..., s_n), where the s_i are independent random variables, via a mixing and non-singular matrix A as follows:

    x = As

The goal is to find the unmixing matrix W = A^{-1} by an update rule.

❒ Bell and Sejnowski ICA algorithm – This algorithm finds the unmixing matrix W by following the steps below:

• Write the probability of x = As = W^{-1}s as:

    p(x) = \prod_{i=1}^n p_s(w_i^T x) \cdot |W|

• Write the log likelihood given our training data {x^{(i)}, i ∈ [[1, m]]} and by noting g the sigmoid function as:

    l(W) = \sum_{i=1}^m \left( \sum_{j=1}^n \log\left( g'(w_j^T x^{(i)}) \right) + \log |W| \right)

• Therefore, the stochastic gradient ascent learning rule is such that for each training example x^{(i)}, we update W as follows:

    W \longleftarrow W + \alpha \left( \begin{pmatrix} 1 - 2g(w_1^T x^{(i)}) \\ 1 - 2g(w_2^T x^{(i)}) \\ \vdots \\ 1 - 2g(w_n^T x^{(i)}) \end{pmatrix} x^{(i)T} + (W^T)^{-1} \right)
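A minimal NumPy sketch of this update rule on made-up mixed sources; the learning rate, epoch count, Laplace sources, and mixing matrix are all illustrative assumptions, not values from the cheatsheet.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def bell_sejnowski_ica(X, alpha=0.01, n_epochs=20, seed=0):
    """Stochastic gradient ascent on l(W) for the unmixing matrix W (illustrative)."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    W = np.eye(n)
    for _ in range(n_epochs):
        for i in rng.permutation(m):
            x = X[i]                              # one training example x^(i)
            g = sigmoid(W @ x)                    # g(w_j^T x) for every row w_j of W
            # W <- W + alpha * ( (1 - 2 g(Wx)) x^T + (W^T)^{-1} )
            W += alpha * (np.outer(1.0 - 2.0 * g, x) + np.linalg.inv(W.T))
    return W

# Toy usage: unmix two artificially mixed sources.
rng = np.random.default_rng(0)
S = rng.laplace(size=(2000, 2))                   # independent non-Gaussian sources
A = np.array([[1.0, 0.5], [0.5, 1.0]])            # made-up mixing matrix
X = S @ A.T                                       # observed data x = A s
W = bell_sejnowski_ica(X)
print(W @ A)  # up to scaling and permutation of the sources, close to a (permuted) diagonal matrix
```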

