
IG.3510 - Machine Learning
Lecture 2: Supervised learning: Classification
Part I

Dr. Patricia CONDE-CESPEDES

[email protected]

September 16th, 2024

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 1 / 62


Plan

1 Introduction

2 K-Nearest Neighbors

3 Logistic Regression

4 Linear Discriminant Analysis

5 Other forms of Discriminant Analysis

6 Evaluating the quality of the predictions

7 A Comparison of Classification Methods

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 2 / 62



Introduction

Introduction to classification

In classification, the response Y is a qualitative or categorical variable. Some examples:
In the spam detection problem, the target can take only two values: {"Spam", "mail"}.
If the variable is "Origin", the target takes more than two labels: {"American", "Asian", "African", "European"}.
The goal is:

Classification goal
Given feature vectors X and a qualitative response Y, the goal is to build a classifier function C(X) that takes X as input and predicts a value for Y.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 4 / 62


Introduction

Example
Goal: predict whether an individual will default on his/her credit card payment, on the basis of annual income and monthly credit card balance.

Figure: individuals who defaulted (orange) and those who did not (blue).
Y: default on credit card payment, based on balance X1 and income X2.
P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 5 / 62
Introduction

Training error and test error in classification

Training error rate: the proportion of misclassified observations in the training set:

$$\frac{1}{n}\sum_{i=1}^{n} 1(y_i \neq \hat{y}_i)$$

where $\hat{y}_i$ is the class predicted by our classifier for the i-th observation and $y_i$ is the true value.

Test error rate: for a given test observation $(x_0, y_0)$, a good classifier has the smallest estimated test error:

$$\text{average}\big(1(y_0 \neq \hat{y}_0)\big)$$
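As a quick illustration, a minimal sketch of this computation in Python (the label vectors here are made up):

```python
import numpy as np

# Hypothetical true labels and classifier predictions for 8 observations
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])

# (1/n) * sum of 1(y_i != yhat_i): the proportion of misclassified observations
error_rate = np.mean(y_true != y_pred)
print(f"error rate = {error_rate:.3f}")  # 2 mistakes out of 8 -> 0.250
```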

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 6 / 62


Introduction

The Bayes Classifier

In practice, we estimate the conditional probability of Y given X.

Suppose Y has κ categories numbered {1, 2, ..., κ}. Then we want to estimate

$$p_k(x) = P(Y = k \mid X = x), \quad k = 1, 2, \ldots, \kappa.$$

$p_k(x)$ is the conditional probability of class k at value x.
The test error rate is minimized, on average, by assigning each observation to its most likely class.
Such a classifier is called the Bayes classifier:

$$C(x) = j \quad \text{if} \quad p_j(x) = \max\{p_1(x), p_2(x), \ldots, p_\kappa(x)\}$$

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 7 / 62


Introduction

Bayes error rate in a two-class problem

If there are only 2 classes, the Bayes classifier will choose the class j for which

$$P(Y = j \mid X = x_0) > 0.5$$

Then, the Bayes error rate will be

$$1 - E\Big(\max_j P(Y = j \mid X = x_0)\Big)$$

In practical applications, we do not know the conditional distribution of Y given X. Then, we will have to estimate it!

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 8 / 62



K-Nearest Neighbors

The K-nearest neighbors (KNN) classifier

Given a positive integer K and a test observation x0, the KNN classifier proceeds as follows:
1 It first identifies the K points in the training data that are closest to x0, represented by N0.
2 It then estimates the conditional probability for class j as

$$\hat{p}_j(x_0) = \frac{1}{K}\sum_{i \in N_0} 1(y_i = j),$$

that is, the fraction of points in N0 whose response value is j.
Finally, KNN applies the Bayes rule to classify the test observation x0 (a minimal sketch follows).
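A minimal from-scratch sketch of this procedure (the toy coordinates are assumptions, merely echoing the six-blue/six-orange example on the next slide):

```python
import numpy as np

def knn_predict(X_train, y_train, x0, K=3):
    """Classify a single test point x0 with the KNN rule described above."""
    # 1) identify the K training points closest to x0 (Euclidean distance), i.e. N0
    dist = np.linalg.norm(X_train - x0, axis=1)
    N0 = np.argsort(dist)[:K]
    # 2) estimate p_j(x0) as the fraction of neighbours belonging to each class j
    classes, counts = np.unique(y_train[N0], return_counts=True)
    probs = counts / K
    # 3) Bayes rule: assign x0 to the class with the largest estimated probability
    return classes[np.argmax(probs)], dict(zip(classes, probs))

# Toy training set: six "blue" (0) and six "orange" (1) points with made-up coordinates
X_train = np.array([[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [1, 3],
                    [4, 4], [4, 5], [5, 4], [5, 5], [6, 4], [4, 6]], dtype=float)
y_train = np.array([0] * 6 + [1] * 6)
print(knn_predict(X_train, y_train, np.array([3.0, 3.0]), K=3))
```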

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 10 / 62


K-Nearest Neighbors

KNN small example


Training data set consisting of six blue and six orange observations.
Goal: make a prediction for the point labeled by the black cross.
On the left, KNN with K = 3 applied to that point; on the right, KNN with K = 3 applied at every point (the test set) and the corresponding KNN decision boundary.
P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 11 / 62
K-Nearest Neighbors

KNN classifier: simple but good!


KNN can often produce classifiers that are surprisingly close to the optimal Bayes classifier. The purple dashed line represents the Bayes decision boundary.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 12 / 62


K-Nearest Neighbors

KNN: the value of K and the flexibility of the model

The flexibility of KNN decreases as K increases.


P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 13 / 62

Logistic Regression

Motivation with binary classification

Let us suppose we want to predict the marital status. Then we have two levels and we can use the binary coding:

$$Y = \begin{cases} 0: \text{No} \\ 1: \text{Yes} \end{cases}$$

We want to estimate a probability E(Y | X = x) = P(Y = 1 | X = x) (because Y is an indicator variable).
What if we perform linear regression?

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 15 / 62


Logistic Regression

Linear regression with a two-level output

Let us suppose we have only one predictor X; then we want to estimate p(X) = P(Y = 1 | X) using a linear regression model:

$$p(X) = \beta_0 + \beta_1 X$$

and classify as "1: Yes" if p̂ > 0.5.

However, this model has some drawbacks:
Why not reverse the coding to {0: Yes, 1: No}? The fit would be different!
Linear regression might produce probability estimates falling outside the interval [0, 1].
The predictions provide an ordering.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 16 / 62


Logistic Regression

Solution: the logistic function

The logistic function gives outputs between 0 and 1:

$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}},$$

(e ≈ 2.71828 is Euler's number.)
No matter what values β0, β1 or X take, p(X) will always lie between 0 and 1.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 17 / 62


Logistic Regression

Linear versus Logistic Regression

Observations in orange; the fitted curve for each model in blue.
For logistic regression, when y = 0, p(X) takes low values, whereas for y = 1, p(X) takes high values.
P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 18 / 62
Logistic Regression

log odds or logit transformation of p(X )


Rewriting, the logistic regression function is equivalent to

$$\frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X}.$$

The quantity p(X)/(1 − p(X)) is called the odds.
Interpretation: the odds is the ratio between P(Y = 1 | X) and P(Y = 0 | X).
By taking the logarithm of both sides, we get

$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X.$$

The left-hand side is called the log-odds or logit.

Interpretation: increasing X by one unit changes the log-odds by β1, or equivalently it multiplies the odds by $e^{\beta_1}$:
if β1 > 0, then increasing X will be associated with increasing p(X), and
if β1 < 0, then increasing X will be associated with decreasing p(X).

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 19 / 62


Logistic Regression

Estimating the coefficients in logistic regression

Y is a Bernoulli random variable, as it can take only two values:

P(Y = 1 | X) = p(x) and P(Y = 0 | X) = 1 − p(x).

Suppose we have a random (training) sample of size n: (y1, x1), ..., (yn, xn). We suppose the observations are independently distributed, so the joint probability of observing the n values of Y is given as

$$\prod_{i=1}^{n} p(x_i)^{y_i}\,(1 - p(x_i))^{1 - y_i} \qquad (1)$$

The joint probability distribution is known in statistics as the likelihood function and will be denoted ℓ(·):

$$\ell(\beta_0, \beta_1) = \prod_{i: y_i = 1} p(x_i) \prod_{i: y_i = 0} (1 - p(x_i)) = \prod_{i: y_i = 1} \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} \prod_{i: y_i = 0} \left(1 - \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}\right)$$

We estimate β0 and β1 as the values that maximize the likelihood function (a sketch follows).
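A minimal sketch of this maximum-likelihood fit, obtained by minimizing the negative log-likelihood with SciPy on simulated data (the simulated sample and the true coefficients are assumptions, not the lecture's example):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(size=200)
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))          # true beta0 = 0.5, beta1 = 2.0
y = (rng.uniform(size=200) < p_true).astype(float)   # Bernoulli responses

def neg_log_likelihood(beta):
    b0, b1 = beta
    p = 1 / (1 + np.exp(-(b0 + b1 * x)))
    p = np.clip(p, 1e-12, 1 - 1e-12)                 # avoid log(0)
    # -log l(b0, b1) = -sum_i [ y_i*log p(x_i) + (1 - y_i)*log(1 - p(x_i)) ]
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(res.x)  # estimated (beta0, beta1), close to (0.5, 2.0)
```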

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 20 / 62


Logistic Regression

Example with the Credit data

Consider the Credit dataset: we want to predict the default of a customer (pay or not) according to the balance.
The parameter estimates are β̂0 and β̂1.
Their standard errors measure the accuracy of the coefficient estimates.
The z-statistic plays the same role as the t-statistic in the linear regression output.
A small p-value implies that there is an association between balance and the probability of default.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 21 / 62


Logistic Regression

Making predictions

What is our estimated probability of default for someone with a balance of $1,000?

$$\hat{p}(X) = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 X}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 X}} = \frac{e^{-10.6513 + 0.0055 \times 1{,}000}}{1 + e^{-10.6513 + 0.0055 \times 1{,}000}} = 0.006$$

With a balance of $2,000?

$$\hat{p}(X) = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 X}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 X}} = \frac{e^{-10.6513 + 0.0055 \times 2{,}000}}{1 + e^{-10.6513 + 0.0055 \times 2{,}000}} = 0.586$$
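These two numbers can be reproduced directly; a small sketch plugging in the estimates β̂0 = −10.6513 and β̂1 = 0.0055 quoted above:

```python
import math

def p_hat(balance, b0=-10.6513, b1=0.0055):
    """Estimated P(default = Yes | balance) from the fitted logistic model."""
    z = b0 + b1 * balance
    return math.exp(z) / (1 + math.exp(z))

print(round(p_hat(1000), 3))  # 0.006
print(round(p_hat(2000), 3))  # 0.586
```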

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 22 / 62


Logistic Regression

Logistic regression with qualitative predictors


We can be interested in predicting default based on a categorical variable, for instance the student status.

What is the estimated probability of defaulting for a student (1: Yes, 0: No)?

$$\hat{p}(X) = \hat{p}(\text{default=Yes} \mid x = \text{student}) = \frac{e^{-3.5041 + 0.4049 \times 1}}{1 + e^{-3.5041 + 0.4049 \times 1}} = 0.0431$$

What about for an individual who is not a student?

$$\hat{p}(X) = \hat{p}(\text{default=Yes} \mid x = \text{non-student}) = \frac{e^{-3.5041 + 0.4049 \times 0}}{1 + e^{-3.5041 + 0.4049 \times 0}} = 0.0292$$

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 23 / 62


Logistic Regression

Multiple Logistic Regression

Suppose there is more than one regressor. Analogously to the extension from simple to multiple linear regression, we have

$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p,$$

then

$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p}}$$

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 24 / 62


Logistic Regression

Multiple Logistic Regression example


Consider the Credit data: we want to predict default based on 3 predictors: balance, income and student.

Then, we can make predictions:

For example, a student with a credit card balance of $1,500 and an income of $40K has an estimated probability of default of

$$\hat{p}(X) = \frac{e^{-10.869 + 0.00574 \times 1{,}500 + 0.003 \times 40 - 0.6468 \times 1}}{1 + e^{-10.869 + 0.00574 \times 1{,}500 + 0.003 \times 40 - 0.6468 \times 1}} = 0.058$$

A non-student with the same balance and income has an estimated probability of default of

$$\hat{p}(X) = \frac{e^{-10.869 + 0.00574 \times 1{,}500 + 0.003 \times 40 - 0.6468 \times 0}}{1 + e^{-10.869 + 0.00574 \times 1{,}500 + 0.003 \times 40 - 0.6468 \times 0}} = 0.105$$
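In practice such a model would be fitted with a statistical package; a hedged sketch with scikit-learn, where the synthetic data and the column ordering (balance, income in K$, student) are assumptions rather than the lecture's Credit file:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
balance = rng.uniform(0, 2500, n)
income = rng.uniform(10, 80, n)                 # in K$
student = rng.integers(0, 2, n)                 # 1 = student, 0 = non-student
logit = -10.869 + 0.00574 * balance + 0.003 * income - 0.6468 * student
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([balance, income, student])
# Large C ~ (almost) no regularization, i.e. plain maximum likelihood
model = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
print(model.intercept_, model.coef_)
print(model.predict_proba([[1500, 40, 1]])[:, 1])  # estimated P(default = Yes) for the student example
```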
P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 25 / 62
Logistic Regression

Logistic regression with more than two classes

What happens if the target variable has more than 2 categories?

We can generalize logistic regression to a κ-level output variable as follows:

$$P(Y = k \mid X) = \frac{e^{\beta_{0k} + \beta_{1k} X_1 + \ldots + \beta_{pk} X_p}}{\sum_{\ell=1}^{\kappa} e^{\beta_{0\ell} + \beta_{1\ell} X_1 + \ldots + \beta_{p\ell} X_p}}, \quad k \in \{1, \ldots, \kappa\}$$

This is also called the softmax function.
In this case there is a linear function for each class except the last one, since all the probabilities sum up to 1. So only κ − 1 linear functions are fitted.
Logistic regression with more than two classes is also referred to as multinomial logistic regression.
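A small sketch of the softmax mapping from the κ linear scores to class probabilities (the score values are made up):

```python
import numpy as np

def softmax(scores):
    """Map linear scores beta_0k + beta_1k*x1 + ... + beta_pk*xp to probabilities summing to 1."""
    z = scores - np.max(scores)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 0.5, -1.0])              # hypothetical scores for kappa = 3 classes
print(softmax(scores), softmax(scores).sum())    # class probabilities; they sum to 1
```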

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 26 / 62



Linear Discriminant Analysis Introduction

Introduction to Discriminant Analysis

In Discriminant Analysis:
We treat the predictors X as random continuous variables and model
the distribution of X in each of the classes separately.
Next, use Bayes theorem to obtain P(Y = k|X = x).

Bayes’ theorem
Let A1 , A2 , . . . , Aκ be a collection of κ mutually exclusive and exhaustive
events with prior probabilities P(Ak ) ∀k ∈ {1, . . . , κ}. Then, given an
event B for which P(B) > 0, the posterior probability of Ak given that B
has occurred is:

$$P(A_k \mid B) = \frac{P(A_k \cap B)}{P(B)} = \frac{P(B \mid A_k)\,P(A_k)}{\sum_{\ell=1}^{\kappa} P(B \mid A_\ell)\,P(A_\ell)}$$

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 28 / 62


Linear Discriminant Analysis Introduction

Bayes’ theorem scheme

$$P(A_k \mid B) = \frac{P(A_k \cap B)}{P(B)} = \frac{P(B \mid A_k)\,P(A_k)}{\sum_{\ell=1}^{5} P(B \mid A_\ell)\,P(A_\ell)}$$

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 29 / 62


Linear Discriminant Analysis Introduction

Bayes theorem for LDA (Linear Discriminant Analysis)


Using the Bayes’ theorem for the classification problem, the probability of
class k given an observation x is:

P(X = x|Y = k)P(Y = k)


P(Y = k|X = x) =
P(X = x)

We will use the following notation:

πk fk (x)
P(Y = k|X = x) = pk (X = x) = Pκ
`=1 π` f` (x)

where:
πk = P(Y = k) represent the overall or prior probability that a
randomly chosen observation comes from the kth class;
fk (x) = P(X = x|Y = k) is the density of X for an observation that
belongs to class k.
P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 30 / 62
Linear Discriminant Analysis Introduction

Visual example: LDA with κ = 2 and p = 1


To simplify, we assume fk (X ) is a normal distribution.
Example: In the case of 2 classes, we classify a new point according to
which density is higher and one explanatory variable X .

On the left, π1 = π2, then compare f1(x) and f2(x).

On the right, π1 ≠ π2, then compare π1 f1(x) and π2 f2(x).
P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 31 / 62
Linear Discriminant Analysis Introduction

Linear Discriminant Analysis when p = 1


We assume the density of X in class k follows a Gaussian density N(μk, σk²):

$$f_k(x) = \frac{1}{\sqrt{2\pi}\,\sigma_k}\, e^{-\frac{1}{2}\left(\frac{x - \mu_k}{\sigma_k}\right)^2}$$

where:
μk is the mean of X in class k, and
σk² is the variance of X in class k. For now, we assume σ1 = ... = σκ = σ are the same among all the classes.
Plugging this into the Bayes formula, we get for pk(x) = P(Y = k | X = x):

$$p_k(x) = \frac{\pi_k \frac{1}{\sqrt{2\pi}\,\sigma_k}\, e^{-\frac{1}{2}\left(\frac{x - \mu_k}{\sigma_k}\right)^2}}{\sum_{\ell=1}^{\kappa} \pi_\ell \frac{1}{\sqrt{2\pi}\,\sigma_\ell}\, e^{-\frac{1}{2}\left(\frac{x - \mu_\ell}{\sigma_\ell}\right)^2}} \qquad (2)$$

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 32 / 62


Linear Discriminant Analysis Introduction

Discriminant functions

The Bayes classifier involves assigning an observation X = x to the class for which pk(x) is largest. Taking logs, and discarding terms that do not depend on k, this is equivalent to assigning x to the class with the largest discriminant score:

$$\delta_k(x) = x\,\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)$$

Note that δk is a linear function of x. That is where the name Linear Discriminant Analysis (LDA) comes from. (A small sketch follows.)
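A small sketch of this one-dimensional rule, using the parameter values of the example on the following slides (μ1 = −1.25, μ2 = 1.25, σ² = 1, π1 = π2 = 0.5):

```python
import numpy as np

def delta(x, mu_k, sigma2, pi_k):
    """delta_k(x) = x*mu_k/sigma^2 - mu_k^2/(2*sigma^2) + log(pi_k)"""
    return x * mu_k / sigma2 - mu_k**2 / (2 * sigma2) + np.log(pi_k)

mus, pis, sigma2 = [-1.25, 1.25], [0.5, 0.5], 1.0

x0 = 0.4                                               # a test observation
scores = [delta(x0, mu, sigma2, pi) for mu, pi in zip(mus, pis)]
print(np.argmax(scores) + 1)   # predicted class (1 or 2); the boundary is at x = 0 here
```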

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 33 / 62


Linear Discriminant Analysis Introduction

Discriminant functions, example (1/2)

If there are κ = 2 classes and π1 = π2 = 0.5, then the decision boundary is at

$$x = \frac{\mu_1 + \mu_2}{2}$$

That is, δ1(x) = δ2(x) ⇔ x = (μ1² − μ2²)/(2(μ1 − μ2)) = (μ1 + μ2)/2. Then, a test observation x0 will be classified to class 1 if x0 > (μ1 + μ2)/2 (here we suppose μ1 > μ2) and to class 2 otherwise.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 34 / 62


Linear Discriminant Analysis Introduction

Discriminant functions, example (2/2)


In this example an observation is equally likely to come from either class,
that is, π1 = π2 = 0.5.

The mean and variance parameters for the two density functions are μ1 = −1.25, μ2 = 1.25, and σ1² = σ2² = 1.
The Bayes classifier assigns the observation to class 1 if x < 0 and class 2 otherwise.
P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 35 / 62
Linear Discriminant Analysis Introduction

Estimation of the parameters

In practice, even if we know X is drawn from a Gaussian distribution, the parameters are unknown, therefore we have to estimate them:

$$\hat{\pi}_k = \frac{n_k}{n}$$

$$\hat{\mu}_k = \frac{1}{n_k} \sum_{i: y_i = k} x_i$$

$$\hat{\sigma}^2 = \frac{1}{n - \kappa} \sum_{k=1}^{\kappa} \sum_{i: y_i = k} (x_i - \hat{\mu}_k)^2 = \sum_{k=1}^{\kappa} \frac{n_k - 1}{n - \kappa}\, \hat{\sigma}_k^2$$

where $\hat{\sigma}_k^2 = \frac{1}{n_k - 1} \sum_{i: y_i = k} (x_i - \hat{\mu}_k)^2$ is the usual formula for the estimated variance within the kth class, n is the total number of training observations, and nk is the number of training observations in the kth class. σ̂² can be seen as a weighted average of the sample variances of the κ classes.
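A sketch of these estimates computed from a small training sample (the toy values are made up):

```python
import numpy as np

x = np.array([1.2, 0.8, 1.5, 3.9, 4.2, 3.6, 4.0])   # one predictor
y = np.array([1,   1,   1,   2,   2,   2,   2])      # class labels
n, classes = len(x), np.unique(y)
kappa = len(classes)

pi_hat = {k: np.mean(y == k) for k in classes}       # n_k / n
mu_hat = {k: x[y == k].mean() for k in classes}      # class-specific sample means
# pooled variance: (1/(n - kappa)) * sum over classes of sum_{i: y_i = k} (x_i - mu_hat_k)^2
sigma2_hat = sum(((x[y == k] - mu_hat[k]) ** 2).sum() for k in classes) / (n - kappa)
print(pi_hat, mu_hat, sigma2_hat)
```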

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 36 / 62


Linear Discriminant Analysis Introduction

Discriminant functions, example with estimated parameters


Simulated data and the corresponding histogram for 20 observations from
each class.

On the left, the theoretical Bayes boundary (dashed line); on the right, the decision boundary calculated with the estimates (black solid line).
Since π̂1 = π̂2, the estimated decision boundary corresponds to the midpoint between the sample means for the two classes, (μ̂1 + μ̂2)/2.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 37 / 62


Linear Discriminant Analysis Introduction

Linear Discriminant Analysis for p > 1

We assume that X = (X1, X2, ..., Xp) is drawn from a multivariate Gaussian or multinormal distribution.

Left: equal variances and zero correlation. Right: different variances and non-zero correlation.

The density can be written

$$f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}$$

where μ ∈ ℝ^p is the mean vector and Σ is the covariance matrix.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 38 / 62


Linear Discriminant Analysis Introduction

LDA with p > 1 predictors

The LDA classifier assumes that the observations in the kth class are drawn from a multivariate Gaussian distribution N(μk, Σ), where:
μk is the mean vector of X specific to class k, and
Σ is a covariance matrix that is supposed common to all κ classes.
Plugging the density function for the kth class, fk(X = x), into the Bayes formula and a little algebra reveals that the Bayes classifier assigns an observation X = x to the class for which

$$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2}\,\mu_k^T \Sigma^{-1} \mu_k + \log(\pi_k)$$

is largest.
Notice that δk(x) = ck0 + ck1 x1 + ck2 x2 + ... + ckp xp is a linear function. That is the reason for the name LDA (Linear Discriminant Analysis).
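In practice these quantities are rarely coded by hand; a hedged sketch with scikit-learn's LinearDiscriminantAnalysis on synthetic Gaussian classes sharing one covariance matrix (the data are an assumption):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
cov = [[1.0, 0.3], [0.3, 1.0]]                        # common covariance matrix
X = np.vstack([rng.multivariate_normal([0, 0], cov, 100),
               rng.multivariate_normal([2, 2], cov, 100)])
y = np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis().fit(X, y)          # estimates pi_k, mu_k and Sigma internally
print(lda.predict([[1.0, 1.0]]), lda.predict_proba([[1.0, 1.0]]))
```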

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 39 / 62


Linear Discriminant Analysis Introduction

Example for p = 2 and κ = 3


Three equally-sized Gaussian classes are shown with class-specific mean
vectors and a common covariance matrix.
The dashed lines represent the theoretical Bayes decision boundaries. So, they represent the set of values x for which δk(x) = δℓ(x) for k ≠ ℓ. There is one line for each pair of classes.

These three Bayes decision boundaries divide the predictor space into three
regions. The Bayes classifier will classify an observation according to the
region in which it is located.
P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 40 / 62
Linear Discriminant Analysis Introduction

Example of estimation for p = 2 and κ = 3


Once more we estimate the unknown parameters μ1, ..., μκ, π1, ..., πκ, and Σ. Given a new observation X = x, LDA calculates δ̂k(x) and classifies to the class for which it is largest.

On the right, the estimated LDA decision boundaries are shown as solid black lines.
Here, n = 60 observations, 20 per class.
P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 41 / 62
Linear Discriminant Analysis Introduction

From δk (x) to probabilities

Once we have the estimates δ̂k(x), we can turn these into estimates for the class probabilities:

$$\hat{P}(Y = k \mid X = x) = \frac{e^{\hat{\delta}_k(x)}}{\sum_{\ell=1}^{\kappa} e^{\hat{\delta}_\ell(x)}}.$$

So classifying to the largest δ̂k(x) amounts to classifying to the class for which P(Y = k | X = x) is the largest.
When κ = 2, classify to class 2 if P̂(Y = 2 | X = x) > 0.5, else to class 1.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 42 / 62



Other forms of Discriminant Analysis

Quadratic Discriminant Analysis (QDA)


QDA, like LDA, assumes the X are drawn from a multivariate Gaussian distribution. However, unlike LDA, QDA assumes that each class has its own covariance matrix:

If X comes from the kth class, then X ∼ N(μk, Σk).

According to the Bayes classifier, an observation x will be assigned to the class k for which

$$\delta_k(x) = -\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1}(x - \mu_k) + \log \pi_k - \frac{1}{2}\log|\Sigma_k| \qquad (3)$$
$$= -\frac{1}{2}\, x^T \Sigma_k^{-1} x + x^T \Sigma_k^{-1}\mu_k - \frac{1}{2}\,\mu_k^T \Sigma_k^{-1}\mu_k + \log \pi_k - \frac{1}{2}\log|\Sigma_k|$$

is largest.

Notice that this is a quadratic function of x. That is where the name QDA comes from!
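The scikit-learn counterpart with class-specific covariance matrices, sketched on synthetic data whose correlations (0.7 and −0.7) mimic the QDA example later in this section:

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(1)
X = np.vstack([rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], 200),
               rng.multivariate_normal([1, 1], [[1, -0.7], [-0.7, 1]], 200)])
y = np.array([0] * 200 + [1] * 200)

qda = QuadraticDiscriminantAnalysis().fit(X, y)   # one covariance matrix estimated per class
print(qda.predict([[0.5, 0.5]]))                  # the fitted decision boundary is quadratic in x
```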
P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 44 / 62
Other forms of Discriminant Analysis

Why to use QDA instead of LDA?

The answer lies in the bias-variance trade-off:

Estimating a covariance matrix implies estimating p(p + 1)/2 parameters for each class, whereas LDA implies estimating only one covariance matrix.
LDA is a much less flexible classifier than QDA, and so has substantially lower variance.
However, if LDA's assumption that the κ classes share a common covariance matrix is wrong, then LDA can suffer from high bias. In that case, QDA would be a better choice.
If n is small, so that reducing variance is crucial, LDA tends to be better than QDA. In contrast, QDA is recommended if the training set is very large, so that the variance of the classifier is not a major concern.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 45 / 62


Other forms of Discriminant Analysis

QDA, example
Observations drawn from two Gaussian classes.

Bayes’ classifier (purple dashed), LDA (black dotted), and QDA (green solid).
Left: common correlation of 0.7 among the two classes.
Right: Orange class has 0.7 correlation, whereas blue class has -0.7
correlation.
P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 46 / 62
Other forms of Discriminant Analysis

Naive Bayes

Bayes' theorem implies:

$$P(Y = k \mid X = x) = p_k(x) = \frac{\pi_k f_k(x)}{\sum_{\ell=1}^{\kappa} \pi_\ell f_\ell(x)}$$

The naive Bayes classifier assumes conditional independence between the feature variables, $f_k(x) = \prod_{j=1}^{p} f_{kj}(x_j)$. For a Gaussian distribution, this means that the Σk are diagonal:

$$\delta_k(x) \propto \log\left(\pi_k \prod_{j=1}^{p} f_{kj}(x_j)\right) = -\frac{1}{2}\sum_{j=1}^{p} \frac{(x_j - \mu_{kj})^2}{\sigma_{kj}^2} + \log \pi_k$$

It is useful when p is large. Despite strong assumptions, naive Bayes often produces good classification results.
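A brief sketch of the Gaussian naive Bayes classifier (diagonal Σk) with scikit-learn; the synthetic features are an assumption:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
# p = 4 features, modelled as independent Gaussians within each class (means shifted by class)
X = np.vstack([rng.normal(loc=0.0, size=(150, 4)), rng.normal(loc=1.0, size=(150, 4))])
y = np.array([0] * 150 + [1] * 150)

nb = GaussianNB().fit(X, y)
print(nb.predict(X[:3]), nb.predict_proba(X[:3]).round(3))
```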

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 47 / 62



Evaluating the quality of the predictions

LDA on the Credit data, the confusion matrix (1/2)

We want to predict whether or not an individual will default on the basis of credit card balance and income. The confusion matrix (reconstructed from the counts quoted on this and the next slide):

                  Predicted No   Predicted Yes   Total
    True No           9644             23         9667
    True Yes           252             81          333

There are (23 + 252)/10000 errors, so a 2.75% training error rate. In contrast, the quantity (81 + 9644)/10000 = 97.25% is called the accuracy!
However:
Only 3.33% of the individuals defaulted. So a trivial classifier that always predicts "not default" will result in an error rate of 3.33%.
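The rates quoted on this and the following slides can be recomputed from the four cells above; a small sketch:

```python
# Cells of the confusion matrix for the Default example (rows = true class)
TN, FP = 9644, 23      # true No:  predicted No / predicted Yes
FN, TP = 252, 81       # true Yes: predicted No / predicted Yes
n = TN + FP + FN + TP

error_rate  = (FP + FN) / n     # (23 + 252)/10000  = 0.0275
accuracy    = (TP + TN) / n     # (81 + 9644)/10000 = 0.9725
sensitivity = TP / (TP + FN)    # true positive rate: 81/333    ~ 0.243
specificity = TN / (TN + FP)    # true negative rate: 9644/9667 ~ 0.998
print(error_rate, accuracy, round(sensitivity, 3), round(specificity, 3))
```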

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 49 / 62


Evaluating the quality of the predictions

LDA on the Credit data, the confusion matrix (2/2)

Of the true No’s, we make 23/9667 = 0.2% errors; of the true Yes’s,
we make 252/333 = 75.7% errors!

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 50 / 62


Evaluating the quality of the predictions

Types of errors

A binary classifier can make two types of errors:

False positive rate: the fraction of negative examples that are classified as positive, 0.2% in the example.
False negative rate: the fraction of positive examples that are classified as negative, 75.7% in the example.

It is often of interest to evaluate class-specific performance.
The bank may be more interested in detecting people who default than people who do not default.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 51 / 62


Evaluating the quality of the predictions

Confusion matrix

Sensitivity or recall (true positive rate): percentage of true defaulters that are identified, 24.3% in the example.
Specificity (true negative rate): percentage of non-defaulters that are correctly identified, 99.8% in the example.
P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 52 / 62
Evaluating the quality of the predictions

Changing the threshold

LDA produces a low sensitivity because it tries to approximate the Bayes classifier. The Bayes classifier yields the smallest possible total number of misclassified observations, irrespective of which class the errors come from.
LDA assigns an observation to class Yes if:

p̂(Y = Yes | Balance, Income) ≥ 0.5

In contrast, the bank might particularly wish to avoid incorrectly classifying an individual who will default. Why not change this threshold and classify any customer with a posterior probability of default above 20% to the default class?

p̂(Y = Yes | Balance, Income) ≥ 0.2
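A sketch of how such a threshold change looks with any probabilistic classifier; the logistic model and data below are simulated stand-ins, not the lecture's Default fit:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))                                   # two synthetic predictors
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-(X[:, 0] + X[:, 1])))).astype(int)

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)[:, 1]          # posterior probability of class "Yes"

pred_050 = (proba >= 0.5).astype(int)       # default threshold: assign to the most likely class
pred_020 = (proba >= 0.2).astype(int)       # lowered threshold: more observations flagged as "Yes"
print(pred_050.sum(), pred_020.sum())       # the 0.2 rule flags at least as many positives
```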

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 53 / 62


Evaluating the quality of the predictions

LDA for Credit data with threshold 0.2

Now the false negative rate has decreased to 41.4%.

However, the false positive rate has increased. As a result, the overall error rate has increased slightly to 3.73%.
We can try different threshold values.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 54 / 62


Evaluating the quality of the predictions

Varying the threshold


The figure shows the trade-off that results from modifying the threshold value for the posterior probability of default.

As the threshold is reduced, the error rate among individuals who default decreases, but the error rate among the individuals who do not default increases.
P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 55 / 62
Evaluating the quality of the predictions

ROC (Receiver Operating Characteristics) curve


The overall performance of a classifier, summarized over all possible
thresholds, is given by the area under the (ROC) curve (AUC).
The ROC curve plots the sensitivity versus (1-specificity )

Ideally AUC = 1. A classifier that performs no better than chance has an AUC = 0.5.
ROC curves are useful for comparing different classifiers, since they take into account all possible thresholds.
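A sketch of the ROC curve and AUC computation with scikit-learn, on synthetic scores (the data are assumptions):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(4)
y_true = rng.integers(0, 2, 300)
scores = np.where(y_true == 1, rng.normal(1.0, 1.0, 300), rng.normal(0.0, 1.0, 300))

fpr, tpr, thresholds = roc_curve(y_true, scores)          # (1 - specificity) and sensitivity, one point per threshold
print("AUC =", round(roc_auc_score(y_true, scores), 3))   # 0.5 = no better than chance, 1.0 = perfect
```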

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 56 / 62


Evaluating the quality of the predictions

Confusion matrix: F1 score
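The F1 score combines the precision and the recall obtained from the confusion matrix; a brief sketch, assuming the counts of the Default example shown earlier:

```python
TP, FP, FN = 81, 23, 252            # counts from the Default confusion matrix above

precision = TP / (TP + FP)          # fraction of predicted "Yes" that are truly "Yes"
recall    = TP / (TP + FN)          # sensitivity: fraction of true "Yes" that are recovered
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
print(round(precision, 3), round(recall, 3), round(f1, 3))   # ~ 0.779, 0.243, 0.371
```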

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 57 / 62



A Comparison of Classification Methods

Logistic Regression versus LDA

For a two-class problem, one can show that for LDA:

$$\log\left(\frac{p_1(x)}{1 - p_1(x)}\right) = \log\left(\frac{p_1(x)}{p_2(x)}\right) = c_0 + c_1 x_1 + \ldots + c_p x_p$$

Hence both LDA and logistic regression have a linear boundary.

The difference is in how the parameters are estimated.
LDA assumes that the observations are drawn from a Gaussian distribution with a common covariance matrix in each class, so it is preferable over logistic regression when this assumption approximately holds. Conversely, logistic regression can outperform LDA if these Gaussian assumptions are not met.
Logistic regression can also fit quadratic boundaries like QDA, by explicitly including quadratic terms in the model.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 59 / 62


A Comparison of Classification Methods

Summary

When the true decision boundaries are linear, then the LDA and
logistic regression approaches will tend to perform well.
When the boundaries are moderately non-linear, QDA may give better
results.
For much more complicated decision boundaries, a non-parametric
approach such as KNN can be superior. But the level of smoothness
for a non-parametric approach must be chosen carefully.
LDA is useful when n is small, or the classes are well separated, and
Gaussian assumptions are reasonable. Also when κ > 2.
Naive Bayes is useful when p is very large.

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 60 / 62



References

References

James, Gareth; Witten, Daniela; Hastie, Trevor and Tibshirani, Robert. "An Introduction to Statistical Learning with Applications in R", 2nd edition. New York: Springer Texts in Statistics, 2021. Website: https://hastie.su.domains/ISLR2/ISLRv2_website.pdf
Hastie, Trevor; Tibshirani, Robert and Friedman, Jerome (2009). "The Elements of Statistical Learning (Data Mining, Inference, and Prediction)", 2nd edition. New York: Springer Texts in Statistics. Website: http://statweb.stanford.edu/~tibs/ElemStatLearn/

P. Conde-Céspedes Lecture 2: Classification (Part I) September 16th, 2024 62 / 62
