
Gaussian Discriminant Analysis

S. Sumitra
Department of Mathematics
Indian Institute of Space Science and Technology

MA613 Data Mining


Introduction

Binary Classification
Let {(x1 , y1 ), (x2 , y2 ), . . . , (xN , yN )} be the given data, where
xi ∈ Rn and yi ∈ {1, 0}.

Data    Soleus   Gastrocnemius   yi : 1/0
x1^T    x11      x12             1
x2^T    x21      x22             0
x3^T    x31      x32             0
x4^T    x41      x42             1
x5^T    x51      x52             1

Sum Rule and Product rule

Consider two random variables X and Y. The values taken
by X are {x1 , x2 , . . . , xM } and those of Y are {y1 , y2 , . . . , yL }. Let the
experiment be conducted N times. Let the number of times in
which X = xi and Y = yj be nij , the number of times X = xi be ci ,
and the number of times Y = yj be rj . Then

p(xi , yj ) = nij / N,   i = 1, 2, . . . , M, j = 1, 2, . . . , L

p(xi ) = ci / N,   i = 1, 2, . . . , M

p(yj ) = rj / N,   j = 1, 2, . . . , L
Sum Rule and Product rule

Sum rule:

p(xi ) = Σ_{j=1}^{L} p(xi , yj )

Conditional probability:

p(yj | xi ) = nij / ci = p(xi , yj ) / p(xi )

Product rule:

p(xi , yj ) = nij / N
            = (nij / ci )(ci / N)
            = p(yj | xi ) p(xi )
            = p(xi | yj ) p(yj )
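The counting argument above can be checked numerically. A small sketch (not from the lecture; the counts nij below are made up for illustration):

```python
# A quick numerical check of the sum and product rules.
import numpy as np

n = np.array([[10, 20],     # n[i, j] = number of trials with X = x_i, Y = y_j
              [30, 40]])
N = n.sum()                 # total number of trials

p_xy = n / N                                     # p(x_i, y_j) = n_ij / N
p_x = n.sum(axis=1) / N                          # p(x_i) = c_i / N (row totals)
p_y_given_x = n / n.sum(axis=1, keepdims=True)   # p(y_j | x_i) = n_ij / c_i

# Sum rule: p(x_i) = sum_j p(x_i, y_j)
assert np.allclose(p_x, p_xy.sum(axis=1))
# Product rule: p(x_i, y_j) = p(y_j | x_i) p(x_i)
assert np.allclose(p_xy, p_y_given_x * p_x[:, None])
```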
Bayes Theorem

p(yj | xi ) = p(xi | yj ) p(yj ) / p(xi )

In general,

p(y | x) = p(y) p(x | y) / p(x)

p(y | x) is the posterior, p(x | y) is the likelihood function, p(y) is
the prior, and p(x) is the marginal likelihood.
Approaches of Probability

Frequentist:

P(x) = lim_{nt → ∞} nx / nt

Parameters are estimated by the maximum likelihood estimator.
Bayesian approach:

p(y | x) = p(y) p(x | y) / p(x)
Bayes theorem relates the posterior probability (what we
know about the parameter after seeing the data) to the
likelihood (derived from a statistical model for the observed
data) and the prior (what we knew about the parameter
before we saw the data).
Bayes Theorem: Two class Classification Problem

p(y = 1 | x) = p(y = 1) p(x | y = 1) / p(x)

where

p(x) = p(y = 1) p(x | y = 1) + p(y = 0) p(x | y = 0)


Conditional Probability

Consider two boxes: one red and one blue. Let there be 2
apples and 6 oranges in the red box and 3 apples and 1
orange in the blue box. A box is randomly picked and a
fruit is selected. After observing the fruit it is replaced in
the box. This process is repeated many times. In doing so,
let the red box be picked 40% of the time and blue box
60% of the time. What is the overall probability that the
selection procedure will pick an apple? Given that we
have chosen an orange, what is the probability that the box
we chose was the blue one?
Let B denote the identity of the box and F the identity of the fruit.
P(B = r) = 0.4, P(B = b) = 0.6
p(F = a | B = r) = 1/4, p(F = O | B = r) = 3/4
p(F = a | B = b) = 3/4, p(F = O | B = b) = 1/4

p(F = a) = p(B = r) p(F = a | B = r) + p(B = b) p(F = a | B = b)
         = 1/4 · 4/10 + 3/4 · 6/10
         = 11/20

p(F = O) = 1 − 11/20 = 9/20


p(B = r | F = O) = p(F = O | B = r) p(B = r) / p(F = O)
                 = 2/3

p(B = b | F = O) = 1/3

Therefore, if the observed fruit is an orange, it is more probable to
have come from the red box than from the blue box.
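The calculation above can be sketched in a few lines of Python (the dictionary layout is my own choice, not the lecture's):

```python
# Box-and-fruit example: red box has 2 apples, 6 oranges;
# blue box has 3 apples, 1 orange.
p_B = {"r": 0.4, "b": 0.6}                         # prior over boxes
p_F_given_B = {("a", "r"): 2 / 8, ("O", "r"): 6 / 8,
               ("a", "b"): 3 / 4, ("O", "b"): 1 / 4}

# Sum rule: overall probability of picking an apple
p_a = sum(p_B[b] * p_F_given_B[("a", b)] for b in p_B)   # 11/20
p_O = 1 - p_a                                            # 9/20

# Bayes theorem: posterior probability the box was red, given an orange
p_r_given_O = p_F_given_B[("O", "r")] * p_B["r"] / p_O   # 2/3
print(p_a, p_r_given_O)
```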
Random Vector

A random vector is a random variable with multiple dimensions.
Each element of the vector is a scalar random variable.
Each element has either a finite number of observed empirical
values or a finite or infinite number of potential values.
The potential values are specified by a theoretical joint
probability distribution.
X = (X1 , X2 , . . . , Xn )^T
Linear Relationship

y = mx + c (linear)
y = sin x (nonlinear)
Covariance

A1 A2
1 3
-2 1
5 7
4 5

Covariance(Aj , Aj ) = Variance(Aj ) = Σ_{i=1}^{N} (Aij − Āj )² / (N − 1),   j = 1, 2
Covariance
Covariance(A1 , A2 ) = Σ_{i=1}^{N} (Ai1 − Ā1 )(Ai2 − Ā2 ) / (N − 1)
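These formulas can be checked against NumPy on the small A1/A2 table above; `np.cov` uses the same N − 1 denominator (Bessel's correction):

```python
# Verify the hand formulas for variance and covariance with NumPy.
import numpy as np

A = np.array([[1, 3],      # columns are A1 and A2
              [-2, 1],
              [5, 7],
              [4, 5]], dtype=float)

S = np.cov(A, rowvar=False)     # sample covariance, divides by N - 1

mean = A.mean(axis=0)           # column means (2, 4)
var_A1 = ((A[:, 0] - mean[0]) ** 2).sum() / (len(A) - 1)
cov_A1A2 = ((A[:, 0] - mean[0]) * (A[:, 1] - mean[1])).sum() / (len(A) - 1)

assert np.isclose(S[0, 0], var_A1)      # 10.0
assert np.isclose(S[0, 1], cov_A1A2)    # 8.0
```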
Covariance

Covariance is a measure of the joint variability of two
random variables.
Variables are positively related if they move in the same
direction.
Variables are inversely related if they move in opposite
directions.
It can take any value in (−∞, +∞).
It captures only the linear relationship between variables.
Data: Matrix

 
X = [ x11  x12  . . .  x1n
      x21  x22  . . .  x2n
       .    .           .
      xN1  xN2  . . .  xNn ]
Covariance Matrix

 
Σ = [ cov(A1 , A1 )  cov(A1 , A2 )  . . .  cov(A1 , An )
      cov(A2 , A1 )  cov(A2 , A2 )  . . .  cov(A2 , An )
            .              .                    .
      cov(An , A1 )  cov(An , A2 )  . . .  cov(An , An ) ]

Σ is called the covariance matrix of X


Cov(Aj , Aj ) = Var(Aj ) = Σ_{i=1}^{N} (xij − Āj )² / (N − 1)

Cov(Aj , Ak ) = Σ_{i=1}^{N} (xij − Āj )(xik − Āk ) / (N − 1)
Covariance matrix: Formula

Xc = [ x11 − Ā1   x12 − Ā2   . . .   x1n − Ān
       x21 − Ā1   x22 − Ā2   . . .   x2n − Ān
           .          .                  .
       xN1 − Ā1   xN2 − Ā2   . . .   xNn − Ān ]

Find Xc^T Xc .
Covariance matrix: Formula

Σ = Xc^T Xc / (N − 1)
Unbiased Estimator

An unbiased estimator of a parameter is an estimator
whose expected value is equal to the parameter.
In statistics, Bessel's correction is the use of n − 1 instead
of n in the formula for the sample variance and sample
standard deviation, where n is the number of observations
in a sample. This correction removes the bias in the
estimation of the population variance.
One can understand Bessel's correction through the degrees of
freedom of the residual vector (residuals, not errors,
because the population mean is unknown):
(x1 − x̄, . . . , xn − x̄), where x̄ is the sample mean. While
there are n independent observations in the sample, there
are only n − 1 independent residuals, as they sum to 0.
Example: {x1 , x2 }, xi ∈ R²
A1 = (x11 , x21 )^T , A2 = (x12 , x22 )^T , µ = (µ1 , µ2 )^T
Find (x1 − µ)(x1 − µ)^T + (x2 − µ)(x2 − µ)^T .

cov(A1 , A1 ) = [(x11 − µ1 )² + (x21 − µ1 )²] / (N − 1)

cov(A1 , A2 ) = [(x11 − µ1 )(x12 − µ2 ) + (x21 − µ1 )(x22 − µ2 )] / (N − 1)

cov(A2 , A2 ) = [(x12 − µ2 )² + (x22 − µ2 )²] / (N − 1)

(x1 − µ)(x1 − µ)^T = [ (x11 − µ1 )²            (x11 − µ1 )(x12 − µ2 )
                       (x11 − µ1 )(x12 − µ2 )  (x12 − µ2 )²           ]

(x2 − µ)(x2 − µ)^T = [ (x21 − µ1 )²            (x21 − µ1 )(x22 − µ2 )
                       (x21 − µ1 )(x22 − µ2 )  (x22 − µ2 )²           ]

Hence

Σ = [(x1 − µ)(x1 − µ)^T + (x2 − µ)(x2 − µ)^T ] / (N − 1)
Covariance Matrix: Formula

Σ = Σ_{i=1}^{N} (xi − µ)(xi − µ)^T / (N − 1)
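The outer-product form of Σ above agrees with both the centered-matrix form Xc^T Xc / (N − 1) and NumPy's built-in estimator; a small sketch on random data:

```python
# Check that the outer-product formula for the covariance matrix
# matches the centered-matrix form and np.cov.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))        # N = 5 points in R^2
N = X.shape[0]
mu = X.mean(axis=0)

# Σ = sum_i (x_i - µ)(x_i - µ)^T / (N - 1)
S = sum(np.outer(x - mu, x - mu) for x in X) / (N - 1)

Xc = X - mu                        # centered data matrix
assert np.allclose(S, Xc.T @ Xc / (N - 1))   # Σ = Xc^T Xc / (N - 1)
assert np.allclose(S, np.cov(X, rowvar=False))
```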
Covariance and Independence

The value of the covariance between two random variables lies
in (−∞, +∞).
A positive value of covariance means that the two random
variables tend to vary in the same direction, a negative
value means that they vary in opposite directions, and 0
means that they do not vary together.
Covariance and Independence

Cov(X , Y ) = 0 means there is no linear correlation.
If X and Y are independent, then Cov(X , Y ) = 0.
Cov(X , Y ) = 0 does not imply that X and Y are independent.
Mahalanobis Distance

Euclidean distance does not consider the distribution of the
data points, so it cannot be used to measure the deviation
of a point from a data distribution.
The Mahalanobis distance is the distance between a point and
a distribution:

D² = (x − µ)^T Σ^{-1} (x − µ)

If Σ = I, the Mahalanobis distance reduces to the
Euclidean distance.
If the variables in the dataset are strongly correlated, the
covariance entries will be large; multiplying by Σ^{-1} rescales
the deviation, effectively reducing the distance along directions
of high variance.
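A minimal sketch of the distance with NumPy (the function name is mine); with Σ = I it coincides with the Euclidean distance, and inflating Σ shrinks the distance:

```python
# Mahalanobis distance D = sqrt((x - µ)^T Σ^{-1} (x - µ)).
import numpy as np

def mahalanobis(x, mu, Sigma):
    d = x - mu
    # Solve Σ z = d instead of forming the explicit inverse
    return np.sqrt(d @ np.linalg.solve(Sigma, d))

mu = np.zeros(2)
x = np.array([3.0, 4.0])

# With Σ = I the Mahalanobis distance equals the Euclidean distance
assert np.isclose(mahalanobis(x, mu, np.eye(2)), 5.0)

# With larger variance the same point is effectively closer to the mean
assert np.isclose(mahalanobis(x, mu, 4.0 * np.eye(2)), 2.5)
```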
Normal Distribution
X is a continuous real-valued random variable.
Probability density function (pdf):

p(x) = (1 / (√(2π) σ)) exp(−(x − µ)² / (2σ²))

P(a < x < b) = ∫_a^b p(x) dx
Multivariate Gaussian (Normal) Distribution

The multivariate Gaussian distribution is a generalization of
the one-dimensional (univariate) normal distribution to
higher dimensions.
One definition is that a random vector is said to be
k-variate normally distributed if every linear combination of
its k components has a univariate normal distribution.
X = (X1 , X2 , . . . , Xn ) ∼ N (µ, Σ)
X1 ∼ N (µ1 , σ1 ), X2 ∼ N (µ2 , σ2 ), . . . , Xn ∼ N (µn , σn )
Multivariate Gaussian (Normal) Distribution

The probability density function is

p(x) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−½ (x − µ)^T Σ^{-1} (x − µ))

where |Σ| is the determinant of the covariance matrix Σ.
The attributes are continuous valued.
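The density can be sketched directly from the formula. As a sanity check (my own, not the lecture's), for a diagonal Σ the joint density must factor into a product of univariate normal densities:

```python
# Multivariate Gaussian pdf, checked against the univariate product
# in the diagonal-covariance case.
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """p(x) = exp(-1/2 (x-µ)^T Σ^{-1} (x-µ)) / ((2π)^{n/2} |Σ|^{1/2})"""
    n = len(mu)
    d = x - mu
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) / norm

mu = np.array([0.0, 1.0])
sig = np.array([1.5, 0.5])          # per-coordinate standard deviations
x = np.array([1.0, -1.0])

# For diagonal Σ the joint density factorizes into univariate normals
uni = np.prod(np.exp(-((x - mu) ** 2) / (2 * sig ** 2))
              / (np.sqrt(2 * np.pi) * sig))
assert np.isclose(mvn_pdf(x, mu, np.diag(sig ** 2)), uni)
```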
Linear Discriminant Analysis (LDA): Binary
Classification

Bayes Theorem
Data: {(xi , yi ), i = 1, 2, . . . , N}, xi ∈ Rn , yi ∈ {1, 0}

p(y = 1 | x) = p(y = 1) p(x | y = 1) / p(x)

p(y): prior probability of y
p(x | y): the distribution of x given y
Parameters

y ∼ Bernoulli(φ)

x | (y = 0) ∼ N (µ0 , Σ)

x | (y = 1) ∼ N (µ1 , Σ)

The MLE of the Bernoulli parameter φ is the sample mean of the
labels; the class-conditional densities are multivariate Gaussians.
Determination of Parameters

φ = (number of times y = 1 appears) / (total number of data)
  = Σ_{i=1}^{N} 1(yi = 1) / N

µ1 = (sum of positive data) / (total number of positive data)
   = Σ_{i=1}^{N} xi 1(yi = 1) / Σ_{i=1}^{N} 1(yi = 1)

µ0 = (sum of negative data) / (total number of negative data)
   = Σ_{i=1}^{N} xi 1(yi = 0) / Σ_{i=1}^{N} 1(yi = 0)
Determination of Parameters

Positive class: {xp1 , xp2 , . . . , xpk }
Negative class: {xn1 , xn2 , . . . , xnl }

Σ1 = Σ_{i=1}^{k} (xpi − µ1 )(xpi − µ1 )^T / (k − 1)

Σ0 = Σ_{i=1}^{l} (xni − µ0 )(xni − µ0 )^T / (l − 1)

Σ = ((k − 1)Σ1 + (l − 1)Σ0 ) / (k + l − 2)
  = ((k − 1)Σ1 + (l − 1)Σ0 ) / (N − 2)
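The parameter estimates above can be sketched in NumPy; the function name `lda_fit` and the synthetic data are my own choices, not part of the notes:

```python
# Binary LDA parameter estimation: φ, class means, pooled covariance
# Σ = ((k-1)Σ1 + (l-1)Σ0) / (k + l - 2), as in the notes.
import numpy as np

def lda_fit(X, y):
    X1, X0 = X[y == 1], X[y == 0]
    phi = y.mean()                           # φ = fraction of positives
    mu1, mu0 = X1.mean(axis=0), X0.mean(axis=0)
    k, l = len(X1), len(X0)
    S1 = np.cov(X1, rowvar=False)            # Σ1, divides by k - 1
    S0 = np.cov(X0, rowvar=False)            # Σ0, divides by l - 1
    Sigma = ((k - 1) * S1 + (l - 1) * S0) / (k + l - 2)   # pooled Σ
    return phi, mu1, mu0, Sigma

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 1, (40, 2)),
               rng.normal([3, 3], 1, (60, 2))])
y = np.array([0] * 40 + [1] * 60)
phi, mu1, mu0, Sigma = lda_fit(X, y)
assert np.isclose(phi, 0.6)
assert np.allclose(Sigma, Sigma.T)           # covariance is symmetric
```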
Algorithms

Discriminative learning algorithms
  Learn p(y | x) directly (such as logistic regression), i.e.,
  learn a mapping directly from the space of inputs X to the labels.
  Example: logistic regression
Generative learning algorithms
  Model p(x | y) (and the prior p(y)).
  Example: LDA
Output: Sigmoid Function

p(y = 1 | x) = p(y = 1) p(x | y = 1) / p(x)
             = p(y = 1) p(x | y = 1) / [p(y = 1) p(x | y = 1) + p(y = 0) p(x | y = 0)]
             = 1 / [1 + p(y = 0) p(x | y = 0) / (p(y = 1) p(x | y = 1))]
             = 1 / (1 + exp(−a))

where a = log [p(y = 1) p(x | y = 1) / (p(y = 0) p(x | y = 0))].

If a ≥ 0, then p(y = 1 | x) ≥ 0.5; if a < 0, then p(y = 1 | x) < 0.5.
Decision Boundary: LDA
Let l1 = log p(x | y = 1), l0 = log p(x | y = 0),
π1 = p(y = 1), π0 = p(y = 0). Then

a = log π1 + l1 − log π0 − l0

l1 − l0 = −½ (x − µ1 )^T Σ^{-1} (x − µ1 ) + ½ (x − µ0 )^T Σ^{-1} (x − µ0 )

        = ½ [x^T Σ^{-1} µ1 + µ1^T Σ^{-1} x − µ1^T Σ^{-1} µ1
             − x^T Σ^{-1} µ0 − µ0^T Σ^{-1} x + µ0^T Σ^{-1} µ0 ]
          (the x^T Σ^{-1} x terms cancel)

        = ½ [x^T Σ^{-1} (µ1 − µ0 ) + (µ1^T − µ0^T ) Σ^{-1} x]
          − ½ (µ1^T Σ^{-1} µ1 − µ0^T Σ^{-1} µ0 )

        = (µ1^T − µ0^T ) Σ^{-1} x − ½ (µ1^T Σ^{-1} µ1 − µ0^T Σ^{-1} µ0 )

using the symmetry of Σ^{-1}, so that x^T Σ^{-1} µ = µ^T Σ^{-1} x.
Decision Boundary: LDA

a = log(π1 /π0 ) + (µ1^T − µ0^T ) Σ^{-1} x − ½ (µ1^T Σ^{-1} µ1 − µ0^T Σ^{-1} µ0 )
  = w^T x + w0 ,

where w = Σ^{-1} (µ1 − µ0 ) and

w0 = −½ µ1^T Σ^{-1} µ1 + ½ µ0^T Σ^{-1} µ0 + log(π1 /π0 )

The decision boundary a = 0 is therefore linear in x.
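The closed-form w and w0 can be sketched directly (function names and the symmetric example means are my own choices):

```python
# LDA decision rule: a = w^T x + w0, predict class 1 when a >= 0.
import numpy as np

def lda_weights(mu1, mu0, Sigma, pi1):
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu0)                  # w = Σ^{-1}(µ1 - µ0)
    w0 = (-0.5 * mu1 @ Sinv @ mu1 + 0.5 * mu0 @ Sinv @ mu0
          + np.log(pi1 / (1 - pi1)))        # log(π1/π0)
    return w, w0

mu1, mu0 = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
w, w0 = lda_weights(mu1, mu0, np.eye(2), pi1=0.5)

# a >= 0  <=>  p(y = 1 | x) >= 0.5
assert w @ np.array([1.0, 1.0]) + w0 >= 0    # classified as class 1
assert w @ np.array([-1.0, -1.0]) + w0 < 0   # classified as class 0
```

With equal priors and symmetric means, w0 = 0 and the boundary passes through the midpoint between the two class means.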
Decision Boundary: LDA
Determine the class

y = 0, 1. Find µk for each class. Find the common
covariance matrix Σ as the weighted average of the Σk .
x | (y = k) ∼ N (µk , Σ).

Method 1:

Ĝ(x) = arg max_k p(y = k) p(x | y = k)

Method 2:

p(y = 1 | x) = p(y = 1) p(x | y = 1) / p(x)

p(y = 0 | x) = 1 − p(y = 1 | x)
Multiclass
C = 1, 2, . . . , m. Find µk for each class k. Find the
common covariance matrix Σ as the weighted average of the Σk .
x | (C = k) ∼ N (µk , Σ).

Ĝ(x) = arg max_k p(C = k | X = x)
     = arg max_k p(C = k) p(x | C = k)
     = arg max_k log [p(C = k) p(x | C = k)]
     = arg max_k (−½ (x − µk )^T Σ^{-1} (x − µk ) + log πk )
     = arg max_k (½ (−x^T Σ^{-1} x + x^T Σ^{-1} µk + µk^T Σ^{-1} x − µk^T Σ^{-1} µk ) + log πk )
     = arg max_k (µk^T Σ^{-1} x − ½ µk^T Σ^{-1} µk + log πk )

since the term −½ x^T Σ^{-1} x is the same for every class.
Linear discriminant function

Define the linear discriminant function

δk (x) = µk^T Σ^{-1} x − ½ µk^T Σ^{-1} µk + log πk

where p(C = k) = πk .
Then Ĝ(x) = arg max_k δk (x).
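The discriminant δk can be sketched directly from the formula; the three class means and uniform priors below are made-up illustration values:

```python
# δ_k(x) = µ_k^T Σ^{-1} x - 1/2 µ_k^T Σ^{-1} µ_k + log π_k for each class.
import numpy as np

def lda_discriminants(x, mus, Sigma, pis):
    Sinv = np.linalg.inv(Sigma)
    return np.array([m @ Sinv @ x - 0.5 * m @ Sinv @ m + np.log(p)
                     for m, p in zip(mus, pis)])

mus = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
pis = [1 / 3, 1 / 3, 1 / 3]
x = np.array([2.5, 0.2])

# Ĝ(x) = arg max_k δ_k(x): x lies closest to the second mean
assert int(np.argmax(lda_discriminants(x, mus, np.eye(2), pis))) == 1
```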
Decision Boundary:Multi Class LDA

The decision boundary between classes k and l is
{x : δk (x) = δl (x)}, or equivalently,

µk^T Σ^{-1} x − ½ µk^T Σ^{-1} µk + log πk = µl^T Σ^{-1} x − ½ µl^T Σ^{-1} µl + log πl

log(πk /πl ) + (µk^T − µl^T ) Σ^{-1} x − ½ µk^T Σ^{-1} µk + ½ µl^T Σ^{-1} µl = 0

That is,

log(πk /πl ) − ½ (µk + µl )^T Σ^{-1} (µk − µl ) + (µk^T − µl^T ) Σ^{-1} x = 0
Quadratic Discriminant Analysis

Estimate a mean µ̂k and a covariance matrix Σ̂k for each class
separately:
x | (C = k) ∼ N (µk , Σk ), k = 1, 2, . . . , m.
Therefore

p(x | C = k) = (1 / ((2π)^{n/2} |Σk |^{1/2})) exp(−½ (x − µk )^T Σk^{-1} (x − µk ))
Quadratic Discriminant Analysis: Maximum A
Posteriori (MAP) Estimation

Ĝ(x) = arg max_k p(C = k | X = x)
     = arg max_k p(C = k) p(x | C = k)
     = arg max_k log [p(C = k) p(x | C = k)]
     = arg max_k (−½ log |Σk | − ½ (x − µk )^T Σk^{-1} (x − µk ) + log πk )

where p(C = k) = πk .
Quadratic Discriminant Analysis: Discriminant
Function

Quadratic discriminant function:

δk (x) = −½ log |Σk | − ½ (x − µk )^T Σk^{-1} (x − µk ) + log πk
       = µk^T Σk^{-1} x − ½ µk^T Σk^{-1} µk + log πk − ½ log |Σk | − ½ x^T Σk^{-1} x

This objective is now quadratic in x, and so are the decision
boundaries.
Classification rule:

Ĝ(x) = arg max_k δk (x)
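The quadratic discriminant can be sketched as follows; the two classes below are illustrative, with different covariances so that QDA genuinely differs from LDA:

```python
# δ_k(x) = -1/2 log|Σ_k| - 1/2 (x-µ_k)^T Σ_k^{-1} (x-µ_k) + log π_k
import numpy as np

def qda_discriminant(x, mu, Sigma, pi):
    d = x - mu
    return (-0.5 * np.linalg.slogdet(Sigma)[1]    # -1/2 log|Σ_k|
            - 0.5 * d @ np.linalg.solve(Sigma, d)
            + np.log(pi))

params = [(np.array([0.0, 0.0]), np.eye(2), 0.5),        # class 0
          (np.array([4.0, 0.0]), 4.0 * np.eye(2), 0.5)]  # class 1

x = np.array([0.5, 0.0])
deltas = [qda_discriminant(x, mu, S, pi) for mu, S, pi in params]
assert int(np.argmax(deltas)) == 0    # Ĝ(x): x is assigned to class 0
```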
Decision Boundary: QDA

The decision boundary between classes k and l is
{x : δk (x) = δl (x)},
which is a quadratic decision boundary.
GDA and Logistic Regression

GDA makes stronger modeling assumptions and is more
data efficient (i.e., requires less training data to learn well)
when the modeling assumptions are correct or at least
approximately correct.
In GDA, the attributes are continuous valued. In logistic
regression, attributes can be discrete or continuous.
Logistic regression makes weaker assumptions and is
significantly more robust to deviations from the modeling
assumptions.
Specifically, when the data is indeed non-Gaussian, then in
the limit of large datasets, logistic regression will almost
always do better than GDA. For this reason, in practice
logistic regression is used more often than GDA.
