
CSC 311: Introduction to Machine Learning

Tutorial - Matrix Decomposition & Probabilistic Models

TA: Vahid Balazadeh


Instructors: Michael Zhang and Chandra Gummaluru

University of Toronto

Based on slides by Haonan Duan


Matrix Decomposition

We can decompose an integer into its prime factors, e.g.,


12 = 2 × 2 × 3.
Similarly, matrices can be decomposed into products of other matrices.
Examples include eigendecomposition, SVD, Schur decomposition, LU decomposition, and so on.
Here, we focus on eigendecomposition and SVD.



Eigenvector

An eigenvector of a square matrix A is a nonzero vector v such


that multiplication by A only changes the scale of v:

Av = λv

The scalar λ is known as the eigenvalue.


If v is an eigenvector of A, so is any rescaled vector αv (α ≠ 0), and αv has the
same eigenvalue as v. Thus, we constrain eigenvectors to have unit length.



Compute eigenvalues - characteristic polynomial

Eigenvalue equation of matrix A:

Av = λv
λv − Av = 0
(λI − A)v = 0

If a nonzero solution for v exists, then it must be the case that:

det(λI − A) = 0

Unpacking the determinant as a function of λ, we get a polynomial, called the
characteristic polynomial:

P_A(λ) = det(λI − A) = λ^n + c_{n−1} λ^{n−1} + · · · + c_1 λ + c_0

To compute the eigenvalues of A, solve P_A(λ) = 0.
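As a quick aside (not from the slides), NumPy can recover the characteristic-polynomial coefficients and the eigenvalues directly; the 2×2 matrix below is just an arbitrary example:

```python
import numpy as np

# Arbitrary example matrix (assumed for illustration, not from the tutorial)
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Coefficients of the characteristic polynomial det(lambda*I - A): lambda^2 - 7*lambda + 10
coeffs = np.poly(A)
print(coeffs)                 # [  1.  -7.  10.]

# Eigenvalues are the roots of the characteristic polynomial
print(np.roots(coeffs))       # [5. 2.]
print(np.linalg.eigvals(A))   # same values, computed directly
```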



Exercise

Consider the matrix:


 
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}

What are the eigenvalues and eigenvectors of A?



Solution

We first need to calculate the eigenvalues:

det(A − λI) = 0 ⟹ det \begin{pmatrix} 2−λ & 1 \\ 1 & 2−λ \end{pmatrix} = 0
⟹ (2 − λ)² − 1 = 0 ⟹ λ₁ = 3, λ₂ = 1

Then, we solve (A − λᵢ I)vᵢ = 0 to find the eigenvectors:

(A − λ₁ I)v₁ = 0 ⟹ \begin{pmatrix} −1 & 1 \\ 1 & −1 \end{pmatrix} v₁ = 0
⟹ v₁ = (1, 1)⊤, which after normalization gives v₁ = (1/√2) (1, 1)⊤

Similarly,

(A − λ₂ I)v₂ = 0 ⟹ v₂ = (1/√2) (−1, 1)⊤
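As a sanity check (not part of the original slides), the same eigenpairs can be obtained with NumPy:

```python
import numpy as np

# Verify the worked example numerically
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eigh(A)   # eigh is for symmetric matrices
print(eigenvalues)    # [1. 3.]
print(eigenvectors)   # columns are unit-length eigenvectors:
                      # (1/sqrt(2))(-1, 1) for eigenvalue 1 and (1/sqrt(2))(1, 1) for eigenvalue 3, up to sign
```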




Eigendecomposition
Spectral Theorem - Every symmetric matrix A ∈ Rn×n has a
set of n orthonormal eigenvectors forming a basis. Furthermore,
all eigenvalues are real.
Therefore, A can be decomposed into the following form:

A = P D P⁻¹

where P is an orthogonal matrix whose columns are the eigenvectors of A (so P⁻¹ = P⊤),
and D is a diagonal matrix of the corresponding eigenvalues.
A [v₁, …, vₙ] = [Av₁, …, Avₙ] = [λ₁v₁, …, λₙvₙ]
             = [v₁, …, vₙ] \begin{pmatrix} λ₁ & ⋯ & 0 \\ ⋮ & ⋱ & ⋮ \\ 0 & ⋯ & λₙ \end{pmatrix}

That is, AP = PD with P = [v₁, …, vₙ], which gives A = P D P⁻¹.
Intuitions of Eigendecomposition

Diagonal matrices allow fast computation of determinants, powers, and inverses.
Eigendecomposition transforms a matrix into a diagonal form by changing the basis.

det(A) = det(P D P⁻¹) = det(P) det(D) det(P)⁻¹ = det(D) = ∏_{i=1}^{n} λᵢ

A⁻¹ = P D⁻¹ P⁻¹
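A small numerical illustration of both identities (the matrix below is an arbitrary example, not from the slides):

```python
import numpy as np

# Arbitrary symmetric matrix, assumed for illustration
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals, P = np.linalg.eigh(A)        # for symmetric A: A = P D P^T with P orthogonal
D_inv = np.diag(1.0 / eigvals)

print(np.isclose(np.linalg.det(A), np.prod(eigvals)))      # det(A) = product of eigenvalues
print(np.allclose(P @ D_inv @ P.T, np.linalg.inv(A)))      # A^{-1} = P D^{-1} P^{-1}
```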



Geometric intuitions of eigendecomposition

Top-left to bottom-left: P⁻¹ performs a basis change.
Bottom-left to bottom-right: D performs a scaling.
Bottom-right to top-right: P undoes the basis change.



Singular Value Decomposition (SVD)

If A ∈ ℝ^{m×n} is not square, the eigendecomposition is undefined.
SVD is a decomposition of the form A = U Σ V⊤.
SVD is more general than eigendecomposition: every real matrix has an SVD.



SVD - Terminology

U and V are orthogonal matrices, and Σ is a diagonal matrix (not necessarily square).
The diagonal entries of Σ are called the singular values of A.
Columns of U are the left singular vectors, and columns of V are the right singular vectors.



SVD and eigendecomposition

SVD can be interpreted in terms of eigendecomposition.
The left singular vectors of A are the eigenvectors of AA⊤.
The right singular vectors of A are the eigenvectors of A⊤A.
The nonzero singular values of A are the square roots of the eigenvalues of A⊤A and
AA⊤. A⊤A and AA⊤ are positive semi-definite (PSD), so their eigenvalues are nonnegative.



Informal Proof

Since B = AA⊤ ∈ ℝ^{m×m} is symmetric, the eigendecomposition holds:

B = P D P⁻¹

Now, assume SVD exists, i.e., A = U ΣV ⊤ . Therefore,

B = AA⊤ = (U ΣV ⊤ )(V Σ⊤ U ⊤ ) = U ΣΣ⊤ U ⊤

Matching those two:

P D P⁻¹ = U ΣΣ⊤ U⊤

Therefore, U = P and Σ ≡ D^{1/2}, i.e., σᵢ = √dᵢ.

A similar approach on C = A⊤ A ∈ Rn×n leads to V .
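A rough numerical check of this relationship (the random matrix below is an assumed example):

```python
import numpy as np

# Assumed example: a random 4x3 matrix
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))

U, S, Vt = np.linalg.svd(A)
eigvals_B, eigvecs_B = np.linalg.eigh(A @ A.T)   # B = A A^T

# Nonzero singular values are the square roots of the largest eigenvalues of A A^T
print(np.allclose(np.sort(S ** 2), np.sort(eigvals_B)[-3:]))   # True
```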



Exercise

Compute SVD of the matrix:


 
A = \begin{pmatrix} 3 & 2 & 2 \\ 2 & 3 & −2 \end{pmatrix}



Solution
Here, we calculate U and Σ. First, define B = AA⊤:

B = \begin{pmatrix} 3 & 2 & 2 \\ 2 & 3 & −2 \end{pmatrix} \begin{pmatrix} 3 & 2 \\ 2 & 3 \\ 2 & −2 \end{pmatrix} = \begin{pmatrix} 17 & 8 \\ 8 & 17 \end{pmatrix}

Then, we can calculate the eigenvalues and eigenvectors (using the characteristic
polynomial): λ₁ = 25, λ₂ = 9, with v₁ = (1/√2)(1, 1)⊤ and v₂ = (1/√2)(1, −1)⊤.
Therefore, B = P D P⁻¹ where

P = (1/√2) \begin{pmatrix} 1 & 1 \\ 1 & −1 \end{pmatrix},   D = \begin{pmatrix} 25 & 0 \\ 0 & 9 \end{pmatrix}

We had U = P and Σ ≡ D^{1/2}, padded with a zero column to match the shape of A:

Σ = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 3 & 0 \end{pmatrix}

Finding V is left as an exercise.
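As a cross-check (not part of the slides), NumPy's SVD reproduces these values:

```python
import numpy as np

# Cross-check the worked example
A = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])

U, S, Vt = np.linalg.svd(A)
print(S)    # [5. 3.]
print(U)    # columns match (1/sqrt(2))(1, 1) and (1/sqrt(2))(1, -1), up to sign
print(Vt)   # rows are the right singular vectors, i.e. V^T (the part left as an exercise)
```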


Rank-r approximation

Given a matrix A, SVD allows us to find its “best” rank-r approximation Aᵣ (r < n).
Why? To store fewer parameters.
We can write A = U Σ V⊤ as A = ∑_{i=1}^{n} σᵢ uᵢ vᵢ⊤, where the σᵢ are sorted from
the largest to the smallest.



Rank-r approximation

The rank-r approximation Aᵣ is defined as:

Aᵣ = ∑_{i=1}^{r} σᵢ uᵢ vᵢ⊤

Aᵣ is the best rank-r approximation under many norms, such as the spectral norm

∥A∥₂ := sup_{x ≠ 0} ∥Ax∥₂ / ∥x∥₂

That is, ∥A − Aᵣ∥₂ ≤ ∥A − B∥₂ for any rank-r matrix B.
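A minimal sketch of computing Aᵣ via a truncated SVD in NumPy (random data, assumed for illustration); it also checks the known fact that the spectral-norm error equals the (r+1)-th singular value:

```python
import numpy as np

# Random matrix, assumed for illustration
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 10))
r = 3

U, S, Vt = np.linalg.svd(A, full_matrices=False)
A_r = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]   # A_r = sum of the top-r terms sigma_i u_i v_i^T

# Spectral-norm error of the best rank-r approximation is the (r+1)-th singular value
print(np.linalg.norm(A - A_r, ord=2), S[r])   # the two numbers agree
```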





Maximum Likelihood Estimation (MLE)

Goal: estimate parameters θ from observed data {x1 , · · · , xN }


Main idea: We should choose parameters that assign high
probability to the observed data:

θ̂ = argmax_θ L(θ; x1, · · · , xN)



Three steps for computing MLE

1. Write down the likelihood objective:

   L(θ; x1, · · · , xN) = ∏_{i=1}^{N} L(θ; xi)

2. Transform to the log likelihood:

   l(θ; x1, · · · , xN) = ∑_{i=1}^{N} log L(θ; xi)

3. Compute the critical point:

   ∂l/∂θ = 0
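As an illustration of the recipe (not from the slides), here is a minimal numerical sketch for an assumed exponential model p(x | θ) = θ e^{−θx}, whose closed-form MLE is 1/x̄; SciPy is assumed to be available:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Assumed example: exponential model p(x | theta) = theta * exp(-theta * x)
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=1000)     # true theta = 0.5

def neg_log_likelihood(theta):
    # Steps 1-2: likelihood of i.i.d. data, written directly in log space
    return -np.sum(np.log(theta) - theta * data)

# Step 3: find the critical point, here by numerical minimization
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print(result.x, 1.0 / data.mean())               # both are close to 0.5
```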



Example - categorical distribution

X is a discrete random variable with the following probability mass function
(0 ≤ θ ≤ 1 is an unknown parameter):

X        0       1      2            3
P(X)     2θ/3    θ/3    2(1 − θ)/3   (1 − θ)/3

The following 10 independent observations were taken from X:


{3, 0, 2, 1, 3, 2, 1, 0, 2, 1}.
What is the MLE for θ?



Step 1: Likelihood objective

L(θ) = P(X = 3) P(X = 0) P(X = 2) P(X = 1) P(X = 3)
       × P(X = 2) P(X = 1) P(X = 0) P(X = 2) P(X = 1)
     = (2θ/3)² (θ/3)³ (2(1 − θ)/3)³ ((1 − θ)/3)²



Step 2: Log likelihood

l(θ) = log L(θ)
     = 2(log(2/3) + log θ) + 3(log(1/3) + log θ)
       + 3(log(2/3) + log(1 − θ)) + 2(log(1/3) + log(1 − θ))
     = C + 5(log θ + log(1 − θ))



Step 3: critical points

∂l/∂θ = 0
⟹ 5(1/θ − 1/(1 − θ)) = 0
⟹ θ̂ = 0.5
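A quick numerical check (not from the slides): evaluate l(θ) = C + 5(log θ + log(1 − θ)) on a grid, dropping the constant C since it does not affect the argmax:

```python
import numpy as np

# Grid search over theta for l(theta) = 5 * (log(theta) + log(1 - theta)), constant dropped
thetas = np.linspace(0.01, 0.99, 9801)
log_lik = 5 * (np.log(thetas) + np.log(1 - thetas))
print(thetas[np.argmax(log_lik)])   # 0.5
```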



Exercise

Suppose that X1, · · · , Xn form a random sample from a uniform distribution on the
interval (0, θ), where the parameter θ > 0 is unknown. Find the MLE of θ.



Solution

- Calculate the likelihood:

L(X1, . . . , Xn; θ) = ∏ᵢ P_θ(Xᵢ) = ∏ᵢ I(Xᵢ ∈ (0, θ)) / θ

- Calculate the log-likelihood:

l(θ) = log ∏ᵢ P_θ(Xᵢ) = ∑ᵢ log ( I(Xᵢ ∈ (0, θ)) / θ )

If Xᵢ ∉ (0, θ), then log 0 is undefined. Therefore, θ ∈ [maxᵢ Xᵢ, ∞).

- What value of θ maximizes l(θ)?
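One way to build intuition is to evaluate the likelihood for simulated data (an assumed example, not from the slides); it is zero below maxᵢ Xᵢ and proportional to θ⁻ⁿ above it:

```python
import numpy as np

# Assumed simulated data from Uniform(0, 3); the estimator does not know theta = 3
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 3.0, size=50)

thetas = np.linspace(0.01, 5.0, 5000)
likelihood = np.where(thetas >= X.max(), thetas ** (-len(X)), 0.0)
print(thetas[np.argmax(likelihood)], X.max())   # the maximizer sits right at max(X_i)
```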



Bayesian Inference - Philosophy

Bayesians interpret probability as a degree of belief.
Bayesians treat parameters as random variables.
Bayesian learning means updating our beliefs (probability distributions) based on
observations.



Bayesian versus Frequentist

MLE is the standard frequentist inference method.
The Bayesian and frequentist perspectives are the two main approaches in statistical
machine learning. Some of their ideological differences can be summarized as:

                     Frequentist          Bayesian
  Probability is     relative frequency   degree of belief
  Parameter θ is     unknown constant     random variable

Han Liu and Larry Wasserman, Statistical Machine Learning, 2014


The Bayesian approach to machine learning

1. We define a model that expresses qualitative aspects of our knowledge (e.g., forms of
   distributions, independence assumptions). The model will have some unknown parameters.
2. We specify a prior probability distribution for these unknown parameters that expresses
   our beliefs about which values are more or less likely, before seeing the data.
3. We gather data.
4. We compute the posterior probability distribution for the parameters, given the
   observed data.
5. We use this posterior distribution to draw scientific conclusions and make predictions.

Radford M. Neal, Bayesian Methods for Machine Learning, NIPS 2004 tutorial
Computing the posterior

The posterior distribution is computed by the Bayes’ rule:

P(parameter | data) = P(parameter) P(data | parameter) / P(data)

The denominator is just the required normalizing constant, so up to proportionality
we can write:

posterior ∝ prior × likelihood
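A minimal grid-based sketch of this rule for an assumed coin-flip model with a Beta(2, 2) prior and 7 heads in 10 flips (example values, not from the slides; SciPy assumed available):

```python
import numpy as np
from scipy.stats import beta, binom

thetas = np.linspace(0.001, 0.999, 999)
prior = beta.pdf(thetas, 2, 2)                   # prior over theta
likelihood = binom.pmf(7, 10, thetas)            # probability of the observed 7 heads in 10 flips

unnormalized_posterior = prior * likelihood      # posterior up to the normalizing constant P(data)
print(thetas[np.argmax(unnormalized_posterior)]) # ~0.667, the mode of the conjugate Beta(9, 5) posterior
```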



Exercise

Suppose you have a Beta(4, 4) prior distribution on the probability


θ that a coin will yield a ‘head’ when spun in a specified manner.
The coin is independently spun ten times, and ‘heads’ appear
fewer than 3 times. You are not told how many heads were seen,
only that the number is less than 3.
Calculate your exact posterior density (up to a proportionality
constant) for θ and sketch it.

