Chapter 4: ML Parametric Classification

BLG 527E Machine Learning

FALL 2021-2022
Assoc. Prof. Yusuf Yaslan & Assist. Prof. Ayşe Tosun

Parametric Methods
Parametric Estimation
• X = { xt }t where xt ~ p(x)
• Parametric estimation:
Assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X
e.g., N(μ, σ²) where θ = {μ, σ²}

Slides based on Lecture Notes for E. Alpaydın, Introduction to Machine Learning 2e, © The MIT Press.
Maximum Likelihood Estimation
• Likelihood of θ given the sample X
l (θ|X) = p (X |θ) = ∏t p (xt|θ)

• Log likelihood
L(θ|X) = log l (θ|X) = ∑t log p (xt|θ)

• Maximum likelihood estimator (MLE)


θ* = argmaxθ L(θ|X)
Examples: Bernoulli/Multinomial
• Bernoulli: Two states, failure/success, x in {0,1}
P(x) = p₀^x (1 – p₀)^(1 – x)
L(p₀|X) = log ∏t p₀^(xt) (1 – p₀)^(1 – xt)
MLE: p₀ = ∑t xt / N

• Multinomial: K > 2 states, xi in {0,1}
P(x1, x2, ..., xK) = ∏i pi^(xi)
L(p1, p2, ..., pK|X) = log ∏t ∏i pi^(xit)
MLE: pi = ∑t xit / N

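As an illustration (a sketch added to these notes, not from the original slides), the closed-form MLEs above in NumPy, assuming x is an array of 0/1 outcomes and X a matrix of one-hot indicator rows:

import numpy as np

def bernoulli_mle(x):
    # x: array of 0/1 outcomes; the MLE of p0 is the sample proportion of 1's
    return x.mean()

def multinomial_mle(X):
    # X: N x K array of one-hot indicator rows; the MLE of p_i is the column mean
    return X.mean(axis=0)

x = np.array([1, 0, 1, 1, 0, 1])
print(bernoulli_mle(x))           # 0.666...
X = np.eye(3)[[0, 2, 2, 1, 2]]    # 5 one-hot draws from a 3-state multinomial
print(multinomial_mle(X))         # [0.2 0.2 0.6]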
Examples: Bernoulli (Derivation)
• Bernoulli: Two states, failure/success, x in {0,1}

P(x) = p₀^x (1 – p₀)^(1 – x)
L(p₀|X) = log ∏t p₀^(xt) (1 – p₀)^(1 – xt) = ∑t [ xt log p₀ + (1 – xt) log(1 – p₀) ]

dL(p₀|X)/dp₀ = (d/dp₀) ∑t xt log p₀ + (d/dp₀) ∑t (1 – xt) log(1 – p₀)
             = (1/p₀) ∑t xt – (1/(1 – p₀)) ∑t (1 – xt) = 0
Bernoulli (Derivation)
(1 – p₀) ∑t xt – p₀ ∑t (1 – xt) = 0   ⇒   ∑t xt – p₀ N = 0   ⇒   p₀ = (1/N) ∑t xt

MLE: p₀ = ∑t xt / N
Gaussian (Normal) Distribution
• p(x) = N(μ, σ²)

p(x) = 1/(√(2π) σ) · exp[ –(x – μ)² / (2σ²) ]

[Figure: Gaussian density centered at μ with standard deviation σ]
Gaussian (Normal) Distribution
• Given that X = { xt }t with xt ~ N(μ, σ²)

Log likelihood: L(μ, σ|X) = –(N/2) log(2π) – N log σ – ∑t (xt – μ)² / (2σ²)

MLE for μ and σ²:

m = ∑t xt / N

s² = ∑t (xt – m)² / N
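A quick numerical check of these estimates (an added sketch, not from the slides); note that the ML variance divides by N, which is what np.var computes by default (ddof=0):

import numpy as np

x = np.random.default_rng(0).normal(loc=2.0, scale=1.5, size=1000)

m = x.sum() / len(x)                 # MLE of the mean
s2 = ((x - m) ** 2).sum() / len(x)   # MLE of the variance (divides by N, not N - 1)

print(m, s2)
print(np.mean(x), np.var(x))         # np.var uses ddof=0 by default, i.e. the ML estimate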
Bias and Variance
Let X be a sample from a population specified up to a parameter θ, and let d = d(X) be an
estimator of θ.

To evaluate the quality of this estimator, we can measure how much it differs from θ,
that is, (d(X) – θ)².

But since d(X) is a random variable (it depends on the sample), we need to average over all
possible X and consider the mean square error of the estimator.

Remember the properties of expectation.
Bias and Variance
Unknown parameter θ
Estimator di = d(Xi) on sample Xi

Bias: bθ(d) = E[d] – θ

Variance: E[(d – E[d])²]

Mean square error:

r(d, θ) = E[(d – θ)²] = E[(d – E[d] + E[d] – θ)²]

        = E[(E[d] – θ)² + (d – E[d])² + 2 (d – E[d])(E[d] – θ)]

Remember the properties of expectation:

        = E[(E[d] – θ)²] + E[(d – E[d])²] + 2 E[(d – E[d])(E[d] – θ)]

        = (E[d] – θ)² + E[(d – E[d])²] + 2 (E[d] – E[d])(E[d] – θ)

        = (E[d] – θ)² + E[(d – E[d])²]

        = Bias² + Variance
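To make the decomposition concrete, here is a small simulation sketch (an addition to the notes) that estimates the bias, variance, and mean square error of the ML variance estimator s², whose bias is known to be –σ²/N:

import numpy as np

rng = np.random.default_rng(1)
N, reps, true_var = 10, 100_000, 4.0

# d(X) = ML variance estimate computed on many independent samples X of size N
d = np.array([np.var(rng.normal(0.0, np.sqrt(true_var), size=N)) for _ in range(reps)])

bias = d.mean() - true_var               # E[d] - theta, approximately -true_var / N = -0.4
variance = ((d - d.mean()) ** 2).mean()  # E[(d - E[d])^2]
mse = ((d - true_var) ** 2).mean()       # approximately bias^2 + variance

print(bias, variance, mse, bias ** 2 + variance)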
Bayes’ Estimator
• Sometimes, before looking at a sample, we may have prior information on the possible range of values that a parameter θ may take.
• This information is quite useful especially when the sample is small.
• Treat θ as a random variable with prior p (θ)

• Bayes’ rule: p (θ|X) = p(X|θ) p(θ) / p(X)


• Density at x: p(x|X) = ∫ p(x|θ, X) p(θ|X) dθ= ∫ p(x|θ) p(θ|X) dθ

Bayes’ Estimator
• Evaluating the p(x|X) integrals may be quite difficult except in cases
where the posterior has a nice form
• Maximum a Posteriori (MAP): θMAP = argmaxθ p(θ|X)
• Maximum Likelihood (ML): θML = argmaxθ p(X|θ)
• Bayes’: θBayes’ = E[θ|X] = ∫ θ p(θ|X) dθ
• If we have no prior reason to favor some values of θ then the prior
density is flat and the posterior will have the same form as the
likelihood p(X|θ)

Bayes’ Estimator: Example
• xt ~ N(θ, σ₀²) and θ ~ N(μ, σ²), where μ, σ², and σ₀² are known
• θML = m

• p(θ|X) ∝ p(X| θ)p(θ)

• Take the derivative with respect to θ

p(xt|θ) = 1/(√(2π) σ₀) · exp[ –(xt – θ)² / (2σ₀²) ]

p(θ) = 1/(√(2π) σ) · exp[ –(θ – μ)² / (2σ²) ]

Likelihood × prior:

p(X|θ) p(θ) = [ ∏t=1..N 1/(√(2π) σ₀) exp( –(xt – θ)² / (2σ₀²) ) ] · 1/(√(2π) σ) exp( –(θ – μ)² / (2σ²) )

Log posterior (up to an additive constant):

L(θ) = – ∑t (xt – θ)² / (2σ₀²) – (θ – μ)² / (2σ²) + const

Setting the derivative to zero:

dL(θ)/dθ = ∑t (xt – θ) / σ₀² + (μ – θ) / σ² = 0

⇒ (N/σ₀²) m + (1/σ²) μ = θ (N/σ₀² + 1/σ²),   where m = ∑t xt / N

E[θ|X] = [ (N/σ₀²) m + (1/σ²) μ ] / [ N/σ₀² + 1/σ² ]
Bayes’ Estimator: Example
• xt ~ N(θ, σ₀²) and θ ~ N(μ, σ²)
• θML = m
• θMAP = θBayes' = E[θ|X] = [ (N/σ₀²) m + (1/σ²) μ ] / [ N/σ₀² + 1/σ² ]
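A minimal sketch of this estimator in code (added for illustration; the function name and arguments are ours): the posterior mean is a precision-weighted combination of the sample mean m and the prior mean μ, and it approaches m as N grows:

import numpy as np

def bayes_gaussian_mean(x, sigma0_sq, mu, sigma_sq):
    # x^t ~ N(theta, sigma0^2) with prior theta ~ N(mu, sigma^2); returns E[theta | X]
    N, m = len(x), np.mean(x)
    w_data, w_prior = N / sigma0_sq, 1.0 / sigma_sq
    return (w_data * m + w_prior * mu) / (w_data + w_prior)

x = np.random.default_rng(2).normal(3.0, 1.0, size=20)
print(np.mean(x))                              # theta_ML = m
print(bayes_gaussian_mean(x, 1.0, 0.0, 0.5))   # pulled toward the prior mean 0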
Bayesian Learning for Coin Model
• Bayesian Learning procedure:
• Given data x1, x2,…, xN write down expression for likelihood p(X| θ)

Specify a prior P (θ )

Compute posterior p(θ |X) = p(X| θ)p(θ)/ p(X)

p(θ |X) ∝ p(X| θ)p(θ)

• This example is adapted from Nando de Freitas' lecture notes


Bayesian Learning for Coin Model
• For the coin model, the likelihood of the data (i.i.d. in our case) is

P(x1, x2, …, xN|θ) = ∏t θ^(xt) (1 – θ)^(1 – xt) = θ^m (1 – θ)^(N – m)

where xt ∈ {0,1} and m is the number of 1's

• Specify a prior on θ. For this we introduce the Beta distribution.


Beta Distribution
(   )   1  ,  are hyperparameters
p ( )   (1   )   1
( )(  )

( z ) e x x z  1dx
0

p( )d 1
(   )   1
( )( )
1
 (1   ) d 1

( )(  ) 
 (1   ) d  (   )
1 1
E[ ] 
 
• The figure is obtained from wikipedia
Bayesian Learning for Coin Model
Compute Posterior: p(θ |X) ∝ p(X| θ)p(θ)

P(x1, x2, …, xN|θ) = ∏t θ^(xt) (1 – θ)^(1 – xt) = θ^m (1 – θ)^(N – m)

p(θ) = Γ(α + β) / (Γ(α) Γ(β)) · θ^(α – 1) (1 – θ)^(β – 1) = (1/const) · θ^(α – 1) (1 – θ)^(β – 1)

p(θ|X) ∝ θ^(α – 1) (1 – θ)^(β – 1) · θ^m (1 – θ)^(N – m)

⇒ p(θ|X) = Γ(α' + β') / (Γ(α') Γ(β')) · θ^(α' – 1) (1 – θ)^(β' – 1),   with α' = m + α, β' = N – m + β
Conjugate Prior
• Conjugate priors: A likelihood-prior pair is said to be conjugate if they
result in a posterior which is of the same form as the prior.
• This enables us to compute the posterior density analytically without
having to worry about computing the denominator in Bayes' rule, the
marginal likelihood.

Prior       Likelihood
Gaussian    Gaussian
Beta        Binomial
Dirichlet   Multinomial
Gamma       Gaussian
Example
• Suppose that we observe X = {1,1,1,1,1,1}, where each xt comes from a Bernoulli distribution: θML = 1
• We can compute the posterior and use its mean as the estimate

p(θ|X) ∝ (1/const) · θ^(α – 1) (1 – θ)^(β – 1) · θ^m (1 – θ)^(N – m)

⇒ p(θ|X) = Γ(α' + β') / (Γ(α') Γ(β')) · θ^(α' – 1) (1 – θ)^(β' – 1),   E[θ] = α' / (α' + β')

with α' = m + α,  β' = N – m + β

• Using a Beta(2,2) prior: θBayes = E[θ|X] = 8/10
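A minimal sketch of this conjugate update (an addition to the notes): with a Beta(α, β) prior and m ones in N Bernoulli trials, the posterior is Beta(m + α, N – m + β) and its mean is the Bayes estimate:

def beta_bernoulli_posterior(x, alpha=2.0, beta=2.0):
    # x: list of 0/1 outcomes; returns the posterior hyperparameters and the posterior mean
    N, m = len(x), sum(x)
    a_post, b_post = m + alpha, N - m + beta
    return a_post, b_post, a_post / (a_post + b_post)

print(beta_bernoulli_posterior([1, 1, 1, 1, 1, 1]))   # (8.0, 2.0, 0.8), versus theta_ML = 1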
Parametric Classification
gi x  px |C i P C i 
or
gi x  log px |C i  log P C i 

px |C i  
1 
exp  
x  i  
2

2 
2  i  2  i 
1
gi x   log 2  log  i 
x   i 2
 log P C i 
2
2 2 i

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 24
• Given the sample X = {xt, rt}, t = 1, …, N, where

rit = 1 if xt ∈ Ci, and rit = 0 if xt ∈ Cj, j ≠ i

• ML estimates are

P̂(Ci) = ∑t rit / N,   mi = ∑t xt rit / ∑t rit,   si² = ∑t (xt – mi)² rit / ∑t rit

• Discriminant becomes

gi(x) = –(1/2) log 2π – log si – (x – mi)² / (2si²) + log P̂(Ci)
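As an illustration (an added sketch, not from the slides), a plug-in version of this classifier for one-dimensional inputs: fit computes the ML estimates above and predict picks the class with the largest discriminant gi(x):

import numpy as np

def fit(x, r):
    # x: (N,) inputs, r: (N, K) one-hot labels; returns ML estimates P(C_i), m_i, s_i^2
    priors = r.mean(axis=0)
    means = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / r.sum(axis=0)
    return priors, means, variances

def predict(x, priors, means, variances):
    # g_i(x) = -log s_i - (x - m_i)^2 / (2 s_i^2) + log P(C_i); the common constant is dropped
    g = -0.5 * np.log(variances) - (x[:, None] - means) ** 2 / (2 * variances) + np.log(priors)
    return g.argmax(axis=1)

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 50)])
r = np.eye(2)[np.repeat([0, 1], 50)]
print(predict(np.array([-0.5, 2.5, 4.0]), *fit(x, r)))   # e.g. [0 1 1]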
Equal variances

Single boundary at halfway between the means
Variances are different

Two boundaries

Probabilistic Interpretation of Linear Regression

r = f(x) + ε
estimator: g(x|θ)
ε ~ N(0, σ²)
p(r|x) ~ N(g(x|θ), σ²)

L(θ|X) = log ∏t=1..N p(xt, rt)
       = log ∏t=1..N p(rt|xt) + log ∏t=1..N p(xt)
Regression: From LogL to Error

Under the Gaussian noise model, maximizing the log likelihood L(θ|X) is equivalent to
minimizing the sum-of-squares error E(θ|X) = (1/2) ∑t [rt – g(xt|θ)]².
Linear Regression
g x t |w1 , w 0 w1 x t  w 0 Take derivative of E

 r
t
t
Nw 0  w1 x t

t
…wrto w0

r x t t
w 0  x  w1  xt

t 2 …wrto w1
t t t

 N t   w0 
x t
   r t

A  w  y  t 
  t x    w1   t x 
t 2    t t
x t
r
t

w A  1y
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 30
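A sketch of solving these normal equations directly (added here; in practice np.linalg.lstsq is preferable to inverting A explicitly):

import numpy as np

def fit_line(x, r):
    # Build A and y from the normal equations above and solve A w = y for w = [w0, w1]
    A = np.array([[len(x), x.sum()],
                  [x.sum(), (x ** 2).sum()]])
    y = np.array([r.sum(), (r * x).sum()])
    return np.linalg.solve(A, y)

rng = np.random.default_rng(4)
x = rng.uniform(0, 5, 30)
r = 2.0 * x + 1.0 + rng.normal(0, 0.3, 30)
print(fit_line(x, r))   # approximately [1.0, 2.0], i.e. w0 ~ 1 and w1 ~ 2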
Polynomial Regression
g x |w k , , w 2 , w1 , w 0 w k x
t
    w x   w x
t k
2
t 2
1
t
 w0

1 x 1

x 1 2
 x  

 1 k
r 1

 2
D 
1 x 2
x 2 2
 x  
2 k
r r 
   
   N
 1 x N
x N 2
 x  
N 2
 r 

w D D DT r T 1

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 31
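The same recipe for polynomial regression, as an added sketch: build the design matrix D and solve the least-squares problem; np.linalg.lstsq avoids forming (DᵀD)⁻¹ explicitly:

import numpy as np

def fit_poly(x, r, k):
    # D has columns 1, x, x^2, ..., x^k (np.vander with increasing=True)
    D = np.vander(x, k + 1, increasing=True)
    w, *_ = np.linalg.lstsq(D, r, rcond=None)   # solves min_w ||D w - r||^2
    return w

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 40)
r = 1.0 - 2.0 * x + 0.5 * x ** 2 + rng.normal(0, 0.05, 40)
print(fit_poly(x, r, 2))   # approximately [1.0, -2.0, 0.5]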
Other Error Measures
• Square Error:
E(θ|X) = (1/2) ∑t=1..N [rt – g(xt|θ)]²

• Relative Square Error:
E(θ|X) = ∑t=1..N [rt – g(xt|θ)]² / ∑t=1..N [rt – r̄]²

• Absolute Error: E(θ|X) = ∑t |rt – g(xt|θ)|

• ε-sensitive Error:
E(θ|X) = ∑t 1(|rt – g(xt|θ)| > ε) (|rt – g(xt|θ)| – ε)
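These error measures are direct to code up; a sketch (our addition) assuming r holds the targets rt and g the predictions g(xt|θ):

import numpy as np

def square_error(r, g):
    return 0.5 * np.sum((r - g) ** 2)

def relative_square_error(r, g):
    return np.sum((r - g) ** 2) / np.sum((r - r.mean()) ** 2)

def absolute_error(r, g):
    return np.sum(np.abs(r - g))

def eps_sensitive_error(r, g, eps):
    # only deviations larger than eps are penalized, and only by the amount exceeding eps
    excess = np.abs(r - g) - eps
    return np.sum(np.where(excess > 0, excess, 0.0))

r = np.array([1.0, 2.0, 3.0])
g = np.array([1.1, 1.8, 3.4])
print(square_error(r, g), relative_square_error(r, g), absolute_error(r, g), eps_sensitive_error(r, g, 0.2))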
Bias and Variance

   
E r  gx 2 | x E r  E r | x 2 | x  E r | x   gx 2
noise squared error

 
E X E r | x   g x  | x  E r | x   E X g x   E X g x   E X g x 
2 2
 2

bias variance

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 33
Estimating Bias and Variance

• M samples Xi = {xti, rti}, i = 1, …, M, t = 1, …, N,
are used to fit gi(x), i = 1, …, M

Bias²(g) = (1/N) ∑t [ḡ(xt) – f(xt)]²

Variance(g) = (1/(N M)) ∑t ∑i [gi(xt) – ḡ(xt)]²

ḡ(x) = (1/M) ∑i gi(x)
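An added sketch of this procedure for a known target f(x) = sin(x): fit a fixed-degree polynomial on M independently drawn samples and measure bias² and variance of the fits on a grid of evaluation points (np.polyfit/np.polyval are used for the individual fits):

import numpy as np

rng = np.random.default_rng(6)
f = np.sin
M, N, deg = 100, 25, 3
x_eval = np.linspace(0.0, 5.0, 50)   # the points x^t at which bias and variance are measured

# Fit g_i on M independently drawn samples and evaluate every fit on x_eval
fits = []
for _ in range(M):
    x = rng.uniform(0.0, 5.0, N)
    r = f(x) + rng.normal(0.0, 0.3, N)
    fits.append(np.polyval(np.polyfit(x, r, deg), x_eval))
G = np.array(fits)                   # G[i, t] = g_i(x^t)

g_bar = G.mean(axis=0)                           # average fit g_bar(x^t)
bias_sq = np.mean((g_bar - f(x_eval)) ** 2)      # (1/N) sum_t [g_bar(x^t) - f(x^t)]^2
variance = np.mean((G - g_bar) ** 2)             # (1/(N M)) sum_t sum_i [g_i(x^t) - g_bar(x^t)]^2
print(bias_sq, variance)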
Bias/Variance Dilemma
• Example: gi(x)=2 has no variance and high bias

gi(x)= ∑t rti/N has lower bias with variance

• As we increase complexity,
bias decreases (a better fit to data) and
variance increases (fit varies more with data)
• Bias/Variance dilemma: (Geman et al., 1992)

[Figure: the target f, the fitted gi's and their average ḡ, illustrating bias and variance.
Bull's-eye diagram from https://www.researchgate.net/figure/Visualizing-bias-and-variance-tradeoff-using-a-bulls-eye-diagram_fig3_318432363]


Polynomial Regression

Best fit “min error”

Best fit, “elbow”

Regression example
Coefficients increase in magnitude as order
increases:

1: [-0.0769, 0.0016]
2: [0.1682, -0.6657, 0.0080]
3: [0.4238, -2.5778, 3.4675, -0.0002]
4: [-0.1093, 1.4356, -5.5007, 6.0454, -0.0019]

Idea: Penalize large coefficients

Regularization
• New cost function

E(w|X) = (1/2) ∑t=1..N [rt – g(xt|w)]² + (λ/2) ∑i wi²

• Ridge Regression: R(w) = ‖w‖² = ∑i wi²
• LASSO: R(w) = ‖w‖₁ = ∑i |wi|

• In matrix form (ridge): E(w) = (1/2) (r – D w)ᵀ (r – D w) + (λ/2) wᵀ w

• ∂E/∂w = –Dᵀ (r – D w) + λ w = 0

• (Dᵀ D + λ I) w = Dᵀ r

• w = (Dᵀ D + λ I)⁻¹ Dᵀ r

• The derivation is adapted from Nando de Freitas' lecture notes
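A sketch of the closed-form ridge solution (our addition), reusing a polynomial design matrix; increasing λ shrinks the coefficients toward zero, which counteracts the exploding coefficients seen in the regression example above:

import numpy as np

def ridge_fit(x, r, k, lam):
    # w = (D^T D + lambda I)^(-1) D^T r with D the polynomial design matrix
    D = np.vander(x, k + 1, increasing=True)
    return np.linalg.solve(D.T @ D + lam * np.eye(k + 1), D.T @ r)

rng = np.random.default_rng(7)
x = rng.uniform(-1, 1, 20)
r = 1.0 - 2.0 * x + rng.normal(0, 0.1, 20)

for lam in (0.0, 0.1, 10.0):
    print(lam, np.round(ridge_fit(x, r, 6, lam), 3))   # coefficients shrink as lambda grows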
