
Classification

Prof. Asim Tewari


IIT Bombay

What is classification?
• Linear regression models assume that the response variable Y is quantitative. But in many situations, the response variable is instead qualitative.
• For example, eye color is qualitative, taking on values blue, brown, or green. Qualitative variables are often referred to as categorical.
• Approaches for predicting qualitative responses are known as classification methods.
The Default data set. Left: The annual incomes and monthly credit card balances of a
number of individuals. The individuals who defaulted on their credit card payments are
shown in orange, and those who did not are shown in blue. Center: Boxplots of balance as a
function of default status. Right: Boxplots of income as a function of default status.
The Logistic Model
• A linear regression model to represent these probabilities:

p(X) = β0 + β1X

• In logistic regression, we use the logistic function:

p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))
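As an illustrative sketch (not taken from the slides), the logistic function can be evaluated directly in Python; the coefficient values passed in below are placeholders:

import numpy as np

def logistic(x, beta0, beta1):
    # p(X) = e^(beta0 + beta1*x) / (1 + e^(beta0 + beta1*x)),
    # written in the numerically equivalent form 1 / (1 + e^(-z))
    z = beta0 + beta1 * x
    return 1.0 / (1.0 + np.exp(-z))

print(logistic(1000.0, -10.65, 0.0055))  # output always lies between 0 and 1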
For a binary response variable Y:

• Classification using the Default data. Left: estimated probability of default using linear regression; some estimated probabilities are negative! The orange ticks indicate the 0/1 values coded for default (No or Yes). Right: predicted probabilities of default using logistic regression; all probabilities lie between 0 and 1.
The Logistic Model
• The quantity p(X)/[1 − p(X)] is called the odds, and can take on any value between 0 and ∞.
• Values of the odds close to 0 and ∞ indicate very low and very high probabilities of default, respectively. For example, on average 1 in 5 people with an odds of 1/4 will default, since p(X) = 0.2 implies an odds of 0.2/(1 − 0.2) = 1/4. Likewise, on average nine out of every ten people with an odds of 9 will default, since p(X) = 0.9 implies an odds of 0.9/(1 − 0.9) = 9.
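A minimal sketch of the probability-to-odds correspondence described above:

def odds(p):
    # odds = p / (1 - p); takes values in (0, infinity)
    return p / (1.0 - p)

print(odds(0.2))  # 0.25, i.e. odds of 1/4
print(odds(0.9))  # 9.0,  i.e. odds of 9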
The Logistic Model
• We can define the log-odds or logit as

log[p(X) / (1 − p(X))] = β0 + β1X
The Logistic Model
• Estimating the Regression Coefficients:
• Likelihood function:

ℓ(β0, β1) = ∏_{i: yi = 1} p(xi) · ∏_{i′: yi′ = 0} (1 − p(xi′))

• The basic intuition behind using maximum likelihood to fit a logistic regression model is as follows: we seek estimates for β0 and β1 such that the predicted probability p̂(xi) of default for each individual corresponds as closely as possible to the individual’s observed default status.
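A brief sketch of maximum-likelihood fitting with scikit-learn. The Default data are not reproduced here, so the balance and defaulted arrays below are synthetic stand-ins, and penalty=None (unregularized maximum likelihood) assumes a recent scikit-learn version:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
balance = rng.uniform(0, 2500, size=1000)              # stand-in predictor
p = 1 / (1 + np.exp(-(-10.65 + 0.0055 * balance)))     # assumed true model
defaulted = (rng.uniform(size=1000) < p).astype(int)   # stand-in 0/1 response

model = LogisticRegression(penalty=None).fit(balance.reshape(-1, 1), defaulted)
print(model.intercept_, model.coef_)  # maximum-likelihood estimates of beta0, beta1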
The Logistic Model

• For the Default data, the estimated coefficients of the logistic regression model that predicts the probability of default using balance are β̂0 = −10.6513 and β̂1 = 0.0055. A one-unit increase in balance is associated with an increase in the log odds of default by 0.0055 units.
The Logistic Model
Making Predictions
• We predict that the default probability for an individual with a balance of $1,000 is

p̂(X) = e^(−10.6513 + 0.0055 × 1000) / (1 + e^(−10.6513 + 0.0055 × 1000)) = 0.00576,

which is below 1%. In contrast, the predicted probability of default for an individual with a balance of $2,000 is much higher, and equals 0.586, or 58.6%.
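A worked check of these two numbers, using the coefficient estimates quoted above:

import math

beta0, beta1 = -10.6513, 0.0055  # estimates from the Default data

def p_hat(balance):
    z = beta0 + beta1 * balance
    return math.exp(z) / (1 + math.exp(z))

print(round(p_hat(1000), 5))  # ~0.00576, below 1%
print(round(p_hat(2000), 3))  # ~0.586, i.e. 58.6%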
The Logistic Model
• Predictors with more than two levels: one can use qualitative predictors (with more than two levels) with the logistic regression model using the dummy variable approach.
• Multiple logistic regression:

log[p(X) / (1 − p(X))] = β0 + β1X1 + · · · + βpXp
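A short sketch of the dummy variable approach with pandas; the data frame and its column names are hypothetical:

import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "balance": [500, 1200, 1900, 800, 2100, 300],
    "region": ["north", "south", "east", "south", "east", "north"],
    "default": [0, 0, 1, 0, 1, 0],
})
# get_dummies converts the qualitative predictor into 0/1 dummy variables
X = pd.get_dummies(df[["balance", "region"]], drop_first=True)
model = LogisticRegression().fit(X, df["default"])
print(X.columns.tolist())  # ['balance', 'region_north', 'region_south']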
Classification of response variable
with more than two classes
• Why Not Linear Regression?
• We could consider encoding these values (e.g., stroke, drug overdose, epileptic seizure) as a quantitative response variable Y, as follows:

Y = 1 if stroke; 2 if drug overdose; 3 if epileptic seizure

• Unfortunately, this coding implies an ordering on the outcomes, putting drug overdose in between stroke and epileptic seizure, and insisting that the difference between stroke and drug overdose is the same as the difference between drug overdose and epileptic seizure.
Classification of response variable
with more than two classes
• The Logistic Model:

log[Pr(Y = k|X = x) / Pr(Y = K|X = x)] = βk0 + βk1 x1 + · · · + βkp xp,   k = 1, . . . , K − 1

– The model is specified in terms of K − 1 log-odds or logit transformations (reflecting the constraint that the probabilities sum to one).
– Although the model uses the last class as the denominator in the odds ratios, the choice of denominator is arbitrary in that the estimates are equivariant under this choice.
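A sketch of the K-class model using scikit-learn, which fits the softmax (multinomial) form of these K − 1 logit equations; the data below are synthetic:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = rng.integers(0, 3, size=300)  # three unordered classes

clf = LogisticRegression().fit(X, y)          # multinomial softmax for K > 2
print(clf.predict_proba(X[:2]).sum(axis=1))   # each row of probabilities sums to one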
Bayes’ Theorem
You are planning a picnic today, but the morning
is cloudy
• Oh no! 50% of all rainy days start off cloudy!
• But cloudy mornings are common (about 40%
of days start cloudy)
• And this is usually a dry month (only 3 of 30
days tend to be rainy, or 10%)
• What is the chance of rain during the day?

Bayes’ Theorem
Bayes’ Theorem is a way of finding a probability when we know certain other probabilities. The formula is:

P(A|B) = P(A) P(B|A) / P(B)

which tells us how often A happens given that B happens, written P(A|B), when we know:
• how often B happens given that A happens, written P(B|A),
• how likely A is on its own, written P(A), and
• how likely B is on its own, written P(B).
Bayes’ Theorem
We will use Rain to mean rain during the day, and Cloud to mean cloudy morning. The chance of Rain given Cloud is written P(Rain|Cloud). So let's put that in the formula:

P(Rain|Cloud) = P(Rain) P(Cloud|Rain) / P(Cloud)

• P(Rain) is the probability of Rain = 10%
• P(Cloud|Rain) is the probability of Cloud, given that Rain happens = 50%
• P(Cloud) is the probability of Cloud = 40%
• P(Rain|Cloud) = 0.1 × 0.5 / 0.4 = 0.125

Or a 12.5% chance of rain. Not too bad, let's have a picnic!
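The same calculation as a two-line check:

p_rain, p_cloud_given_rain, p_cloud = 0.10, 0.50, 0.40
print(p_rain * p_cloud_given_rain / p_cloud)  # 0.125, i.e. a 12.5% chance of rain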
Naive Bayes Classifier
• The naive Bayes classifier is a probabilistic classifier based on Bayes’ theorem. Let A and B denote two random events; then

p(A|B) = p(B|A) p(A) / p(B)

• If the event A can be decomposed into the disjoint events A1, . . . , Ac, with p(Ai) > 0, i = 1, 2, . . . , c, then

p(Ai|B) = p(B|Ai) p(Ai) / Σ_{j=1}^{c} p(B|Aj) p(Aj)
Naive Bayes Classifier
• The relevant events considered in classification are “object belongs to class i”, or more briefly just “i”, and “object has the feature vector x”, or “x”. Replacing Ai, Aj and B by these events yields

p(i|x) = p(x|i) p(i) / Σ_{j=1}^{c} p(x|j) p(j)
Naive Bayes Classifier
If the p features in x are stochastically independent, then

p(x|i) = p(x1|i) p(x2|i) · · · p(xp|i)

Inserting this into the expression above yields the classification probabilities of the naive Bayes classifier.
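A minimal sketch that implements these formulas by counting. The slide's student table is not reproduced in this text, so the (attended, studied, passed) rows below are hypothetical:

import numpy as np

# Hypothetical rows: (attended class regularly, studied the material, passed)
data = [(1, 1, 1), (1, 0, 1), (0, 1, 0), (0, 0, 0), (1, 1, 1), (0, 1, 1)]
classes = (0, 1)
prior = {c: sum(1 for *_, y in data if y == c) / len(data) for c in classes}

def cond(k, v, c):
    # p(x_k = v | class c), estimated from the data by counting
    rows = [r for r in data if r[-1] == c]
    return sum(1 for r in rows if r[k] == v) / len(rows)

def posterior(x):
    # p(i | x) via Bayes' theorem with the independence assumption
    joint = {c: prior[c] * np.prod([cond(k, v, c) for k, v in enumerate(x)])
             for c in classes}
    z = sum(joint.values())
    return {c: joint[c] / z for c in classes}

print(posterior((1, 1)))  # posterior for a student who attended and studied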
Naive Bayes Classifier
• Naive Bayes classifier: data of the student
example

Naive Bayes Classifier

So, a student who went to class regularly and studied the material will pass with a probability of 93%, and a deterministic naive Bayes classifier will yield “passed” for this student and for all other students with the same history.
Naive Bayes Classifier
Naive Bayes classifier: classifier function for the
student example

Linear Discriminant Analysis
• Suppose that we wish to classify an observation into
one of K classes, where K ≥ 2. In other words, the
qualitative response variable Y can take on K possible
distinct and unordered values.
• Let πk represent the overall or prior probability that a
randomly chosen observation comes from the kth
class; this is the probability that a given observation is
associated with the kth category of the response
variable Y .
• Let fk(x) ≡ Pr(X = x|Y = k) denote the density function of X for an observation that comes from the kth class.
Linear Discriminant Analysis
• Then Bayes’ theorem states that

Pr(Y = k|X = x) = πk fk(x) / Σ_{l=1}^{K} πl fl(x)
Linear Discriminant Analysis
• Linear Discriminant Analysis for p = 1: assume that fk(x) is normal (Gaussian),

fk(x) = (1/(√(2π) σk)) exp(−(x − μk)² / (2σk²))

• Let us further assume that σ1² = · · · = σK²: that is, there is a shared variance σ² common to all K classes.
Linear Discriminant Analysis
If all the variances σk² are equal to a shared σ², then:

pk(x) = πk (1/(√(2π)σ)) exp(−(x − μk)²/(2σ²)) / Σ_{l=1}^{K} πl (1/(√(2π)σ)) exp(−(x − μl)²/(2σ²))
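A sketch of this computation; the means, priors and shared σ below are made up:

import numpy as np

def posteriors_1d(x, mus, pis, sigma):
    # p_k(x) with a shared variance; the common 1/(sqrt(2*pi)*sigma)
    # factor cancels between numerator and denominator
    mus, pis = np.asarray(mus, float), np.asarray(pis, float)
    w = pis * np.exp(-(x - mus) ** 2 / (2 * sigma ** 2))
    return w / w.sum()

print(posteriors_1d(0.0, mus=[-1.25, 1.25], pis=[0.5, 0.5], sigma=1.0))  # [0.5 0.5]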
Linear Discriminant Analysis
• Taking the log and rearranging, this is equivalent to assigning the observation to the class for which

δk(x) = x · (μk/σ²) − μk²/(2σ²) + log(πk)

is largest. For instance, if K = 2 and π1 = π2, then the Bayes classifier assigns an observation to class 1 if 2x(μ1 − μ2) > μ1² − μ2², and to class 2 otherwise. In this case, the Bayes decision boundary corresponds to the point

x = (μ1² − μ2²) / (2(μ1 − μ2)) = (μ1 + μ2)/2
Linear Discriminant Analysis

• An example with three classes. The observations from each class are drawn from a
multivariate Gaussian distribution with p = 2, with a class-specific mean vector and a
common covariance matrix. Left: Ellipses that contain 95% of the probability for each of the
three classes are shown. The dashed lines are the Bayes decision boundaries. Right: 20
observations were generated from each class, and the corresponding LDA decision
boundaries are indicated using solid black lines. The Bayes decision boundaries are once
again shown as dashed lines.

Linear Discriminant Analysis
Linear Discriminant Analysis for p >1
To do this, we will assume that X = (X1, X2, . . . , Xp) is drawn from a multivariate Gaussian (or multivariate normal) distribution, with a class-specific multivariate mean vector and a common covariance matrix.
Linear Discriminant Analysis
Linear Discriminant Analysis for p >1
To indicate that a p-dimensional random variable X has a multivariate Gaussian distribution, we write X ∼ N(μ, Σ). Here E(X) = μ is the mean of X (a vector with p components), and Cov(X) = Σ is the p × p covariance matrix of X. Formally, the multivariate Gaussian density is defined as

f(x) = (1/((2π)^(p/2) |Σ|^(1/2))) exp(−(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ))

The Bayes classifier assigns an observation X = x to the class for which

δk(x) = xᵀ Σ⁻¹ μk − (1/2) μkᵀ Σ⁻¹ μk + log πk

is largest.
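A short sketch fitting LDA on synthetic data generated exactly this way (class-specific means, common covariance); all numbers are illustrative:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
means = np.array([[0.0, 0.0], [2.0, 2.0], [-2.0, 2.0]])  # class-specific mean vectors
cov = np.array([[1.0, 0.3], [0.3, 1.0]])                 # common covariance matrix
X = np.vstack([rng.multivariate_normal(m, cov, size=20) for m in means])
y = np.repeat([0, 1, 2], 20)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict([[0.5, 0.5]]))  # class with the largest discriminant value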
Linear Discriminant Analysis

• A confusion matrix compares the LDA predictions to the true default statuses for the 10,000 training observations in the Default data set. Elements on the diagonal of the matrix represent individuals whose default statuses were correctly predicted, while off-diagonal elements represent individuals that were misclassified. LDA made incorrect predictions for 23 individuals who did not default and for 252 individuals who did default.
Linear Discriminant Analysis

Since only 3.33% of the individuals in the training sample defaulted, a simple but useless classifier that always predicts that each individual will not default, regardless of his or her credit card balance and student status, will result in an error rate of 3.33%.
Linear Discriminant Analysis
The Bayes classifier works by assigning an observation to the class for which the posterior probability pk(X) is greatest. In the two-class case, this amounts to assigning an observation to the default class if

Pr(default = Yes | X = x) > 0.5
Linear Discriminant Analysis

A confusion matrix compares the LDA predictions to the true default statuses for the 10,000 training observations in the Default data set, using a modified threshold value that predicts default for any individual whose posterior default probability exceeds 20%.
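A sketch of lowering the decision threshold from 0.5 to 0.2 on synthetic two-class data; all names and numbers are hypothetical:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0.5).astype(int)  # "default" labels

lda = LinearDiscriminantAnalysis().fit(X, y)
post = lda.predict_proba(X)[:, 1]              # posterior probability of default
print((post > 0.5).sum(), (post > 0.2).sum())  # the 0.2 threshold flags more people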
Classification Confusion Matrix

                 Class 1 Predicted     Class 2 Predicted
Class 1 Actual   True Positive (TP)    False Negative (FN)
Class 2 Actual   False Positive (FP)   True Negative (TN)

• Here,
• Class 1 : Positive
• Class 2 : Negative
• Definition of the Terms:
• Positive (P) : Observation is positive (for example: You have Covid).
• Negative (N) : Observation is not positive (for example: You do not have Covid).
• True Positive (TP) : Observation is positive, and is predicted to be positive.
• False Negative (FN) : Observation is positive, but is predicted negative.
• True Negative (TN) : Observation is negative, and is predicted to be negative.
• False Positive (FP) : Observation is negative, but is predicted positive.

Classification Rate/Accuracy

• Classification rate or accuracy is given by the relation:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

• However, there are problems with accuracy. It assumes equal costs for both kinds of errors. A 99.9% accuracy can be excellent, good, mediocre, poor or terrible depending upon the problem.
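A small numeric sketch with hypothetical confusion matrix counts:

tp, fn, fp, tn = 50, 10, 5, 935  # hypothetical counts
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.985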
Recall
• Recall is the total number of correctly classified positive examples divided by the total number of actual positive examples:

Recall = TP / (TP + FN)

• High recall indicates the class is correctly recognized (small number of FN).

Precision
• Precision is the total number of correctly classified positive examples divided by the total number of predicted positive examples:

Precision = TP / (TP + FP)

• High precision indicates an example labeled as positive is indeed positive (small number of FP).

• High recall, low precision: most of the positive examples are correctly recognized (low FN), but there are a lot of false positives.
• Low recall, high precision: we miss a lot of positive examples (high FN), but those we predict as positive are indeed positive (low FP).
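Continuing the same hypothetical counts:

tp, fn, fp, tn = 50, 10, 5, 935
recall = tp / (tp + fn)        # TP / (TP + FN)
precision = tp / (tp + fp)     # TP / (TP + FP)
print(round(recall, 3), round(precision, 3))  # 0.833 0.909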
F-measure
• Since we have two measures (precision and recall), it helps to have a measurement that represents both of them. We calculate an F-measure, which uses the harmonic mean in place of the arithmetic mean, as it punishes the extreme values more:

F-measure = 2 × Precision × Recall / (Precision + Recall)

• The F-measure will always be nearer to the smaller value of precision or recall.
• Precision is also known as the positive predictive value (PPV).
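And the F-measure for the same hypothetical counts:

precision, recall = 50 / 55, 50 / 60   # values from the sketch above
f = 2 * precision * recall / (precision + recall)
print(round(f, 3))  # ~0.87, nearer to the smaller of the two values (recall)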
