Probabilistic Models For Classification

The document discusses probabilistic models for classification, including generative models that model the joint probability distribution and discriminative models that directly model the posterior probability. Generative models include Gaussian discriminant analysis and naive Bayes classifiers, while discriminative models include logistic regression. The document also covers evaluating classification performance using metrics like accuracy, precision, recall, ROC curves, estimating generalization error through cross-validation, and other related topics.


Probabilistic Models

for Classification

Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya 1
Binary Classification Problem
• N i.i.d. training samples: {(x_n, t_n)}, n = 1, ..., N
• Class label: t ∈ {0, 1}
• Feature vector: x ∈ R^D

• Focus on modeling the conditional probabilities p(C_k | x)

• Needs to be followed by a decision step
Generative models for classification
• Model the joint probability p(x, C_k)

• Class posterior probabilities via Bayes' rule:
  p(C_k | x) = p(x | C_k) p(C_k) / p(x)

• Prior probability of a class: p(C_k)

• Class conditional probabilities: p(x | C_k)
Generative Process for Data
• Enables generation of new data points
• Repeat N times:
  • Sample class t_n ~ p(C_k)
  • Sample feature value x_n ~ p(x | t_n)
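The two-step sampling procedure above can be sketched in Python. The concrete parameter values (π = 0.4, the means, the identity covariance) are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a two-class generative model
pi = 0.4                              # prior p(C1)
mu = [np.array([0.0, 0.0]),           # class-conditional mean for class 0
      np.array([2.0, 2.0])]           # class-conditional mean for class 1
Sigma = np.eye(2)                     # shared covariance

N = 1000
t = rng.binomial(1, pi, size=N)       # step 1: sample each class label
X = np.stack([rng.multivariate_normal(mu[k], Sigma) for k in t])  # step 2: sample features
```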
Conditional Probability in a Generative Model

• p(C_1 | x) = σ(a), where
  a = ln [ p(x | C_1) p(C_1) / ( p(x | C_2) p(C_2) ) ]

• Logistic function: σ(a) = 1 / (1 + e^{−a})

• This form is independent of the specific form of the class
  conditional probabilities
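A minimal sketch of these two definitions; the function names are ours, and the densities passed in are assumed to be already-evaluated scalars:

```python
import numpy as np

def sigmoid(a):
    """Logistic function sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def posterior_c1(px_c1, px_c2, p_c1):
    """p(C1|x) = sigma(a), where a is the log-odds of the two joint probabilities."""
    a = np.log(px_c1 * p_c1) - np.log(px_c2 * (1.0 - p_c1))
    return sigmoid(a)
```

When the two joint probabilities are equal the log-odds a is 0 and the posterior is exactly 0.5, as expected.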
Case: Binary classification with Gaussians

• Prior class probability: p(C_1) = π, p(C_2) = 1 − π

• Gaussian class densities: p(x | C_k) = N(x | μ_k, Σ)

• Parameters: π, μ_1, μ_2, Σ
• Note: the covariance parameter Σ is shared between the classes
Case: Binary classification with Gaussians

• p(C_1 | x) = σ(w^T x + w_0), where
  w = Σ^{−1}(μ_1 − μ_2)
  w_0 = −½ μ_1^T Σ^{−1} μ_1 + ½ μ_2^T Σ^{−1} μ_2 + ln(π / (1 − π))

• The quadratic term x^T Σ^{−1} x cancels out because Σ is shared

• Linear classification model

• Class boundary: w^T x + w_0 = 0
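The expressions for w and w_0 can be computed directly from the Gaussian parameters; this sketch (function name ours) follows the formulas above:

```python
import numpy as np

def gda_linear_params(mu1, mu2, Sigma, pi):
    """w and w0 such that p(C1|x) = sigma(w^T x + w0), for shared-covariance Gaussians."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu2)
    w0 = (-0.5 * mu1 @ Sinv @ mu1
          + 0.5 * mu2 @ Sinv @ mu2
          + np.log(pi / (1.0 - pi)))
    return w, w0
```

With symmetric means and equal priors, w_0 = 0 and the class boundary passes through the midpoint between the two means.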
Special Cases
• Σ = σ²I:
  • Class boundary: orthogonal to the line joining the two means
  • Changing the priors shifts the class boundary by the ln(π / (1 − π)) term
• Arbitrary shared Σ:
  • Decision boundary is still linear, but
    not orthogonal to the line
    joining the two means

Image from Michael Jordan’s book
MLE for Binary Gaussian
• Formulate the log-likelihood in terms of the parameters π, μ_1, μ_2, Σ

• Maximize the log-likelihood w.r.t. the parameters:
  π = N_1 / N
  μ_k = mean of the training points in class k
  Σ = pooled within-class covariance
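The closed-form MLE solutions above are simple per-class averages; a minimal sketch (function name ours, binary labels t ∈ {0, 1} assumed):

```python
import numpy as np

def gda_mle(X, t):
    """Closed-form MLE: class prior, per-class means, pooled shared covariance."""
    N = len(t)
    X1, X0 = X[t == 1], X[t == 0]
    pi = len(X1) / N                              # pi = N1 / N
    mu1, mu0 = X1.mean(axis=0), X0.mean(axis=0)   # per-class sample means
    # Pooled covariance: sum of within-class scatter matrices over N
    S = ((X1 - mu1).T @ (X1 - mu1) + (X0 - mu0).T @ (X0 - mu0)) / N
    return pi, mu1, mu0, S
```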
Case: Gaussian Multi-class Classification
• K classes
• Prior: p(C_k)
• Class conditional densities: p(x | C_k)

• p(C_k | x) = exp(a_k) / Σ_j exp(a_j),
  where a_k = ln [ p(x | C_k) p(C_k) ]
• Soft-max / normalized exponential function
• For Gaussian class conditionals with shared Σ, a_k is linear in x

• The decision boundaries are still linear in the feature space
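The normalized exponential is a one-liner; subtracting the maximum before exponentiating is a standard trick to avoid overflow and does not change the result:

```python
import numpy as np

def softmax(a):
    """Normalized exponential p_k = exp(a_k) / sum_j exp(a_j)."""
    e = np.exp(a - np.max(a))   # shift by max for numerical stability
    return e / e.sum()
```

Equal activations give a uniform posterior, and the output always sums to 1.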
MLE for Gaussian Multi-class
• Similar to the Binary case

Case: Naïve Bayes
• Similar to the Gaussian setting, only the features are
  discrete (binary, for simplicity)

• “Naïve” assumption: feature dimensions are
  conditionally independent given the class label
• Conditional independence is very different from (marginal) independence
Case: Naïve Bayes
• Class conditional probability: p(x | C_k) = Π_d p(x_d | C_k)

• Posterior probability: p(C_k | x) ∝ exp(a_k),

  where a_k = ln p(C_k) + Σ_d ln p(x_d | C_k)
MLE for Naïve Bayes
• Formulate the log-likelihood in terms of the parameters

• Maximize the likelihood w.r.t. the parameters

• MLE overfits
• Susceptible to 0 frequencies in the training data: a feature value never
  seen with a class during training gets probability 0 at test time
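The zero-frequency problem is usually patched with Laplace (add-α) smoothing, which is also the point-estimate view of the Bayesian treatment on the next slide. A minimal sketch for binary features and binary labels (function name ours):

```python
import numpy as np

def nb_fit(X, t, alpha=1.0):
    """MLE for Bernoulli naive Bayes with Laplace (add-alpha) smoothing.

    Without smoothing (alpha=0), a feature never active in a class
    would get probability exactly 0 at test time."""
    prior, theta = {}, {}
    for c in (0, 1):
        Xc = X[t == c]
        prior[c] = len(Xc) / len(t)
        # Smoothed estimate of p(x_d = 1 | C_c)
        theta[c] = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
    return prior, theta
```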
Bayesian Estimation for Naïve Bayes
• Model the parameters as random variables and
  analyze their posterior distributions
• Take point estimates if necessary
Discriminative Models for Classification
• Familiar form for the posterior class
  distribution: p(C_1 | x) = σ(w^T x)

• Model the posterior distribution directly

• Advantages as a classification model:

  • Fewer assumptions, fewer parameters
Image from Michael Jordan’s book
Logistic Regression for Binary Classification

• Apply the model in the binary setting: p(C_1 | x) = σ(w^T x)

• Formulate the likelihood with the weights as parameters:
  p(t | w) = Π_n y_n^{t_n} (1 − y_n)^{1 − t_n},

  where y_n = σ(w^T x_n)
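Taking the negative logarithm of this likelihood gives the cross-entropy error that the following slides minimize; a minimal sketch (function names ours):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def neg_log_likelihood(w, X, t):
    """E(w) = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ], y_n = sigma(w^T x_n)."""
    y = sigmoid(X @ w)
    eps = 1e-12                      # clip to avoid log(0)
    y = np.clip(y, eps, 1 - eps)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))
```

At w = 0 every y_n is 0.5, so the error is N ln 2 regardless of the labels.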
MLE for Binary Logistic Regression
• Maximize the likelihood w.r.t. the weights

• No closed-form solution
MLE for Binary Logistic Regression
• The negative log-likelihood is not quadratic, but still convex
• Iterative optimization using gradient descent (LMS
  algorithm)

• Gradient: ∇E(w) = Σ_n (y_n − t_n) x_n

• Batch gradient update: use all N samples per step

• Stochastic gradient descent update: use one sample per step

• Faster algorithm – Newton’s Method

  • Iterative Re-weighted Least Squares (IRLS)
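The batch gradient update above can be sketched in a few lines. This is a bare-bones illustration under assumed defaults (fixed learning rate, fixed iteration count, no regularization), not a production optimizer:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(X, t, lr=0.1, n_iters=2000):
    """Batch gradient descent on the cross-entropy error.

    Each step moves against the gradient sum_n (y_n - t_n) x_n."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        y = sigmoid(X @ w)          # current predictions on all N samples
        w -= lr * (X.T @ (y - t))   # batch gradient update
    return w
```

A stochastic variant would apply the same update with a single randomly chosen sample per step.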
Bayesian Binary Logistic Regression
• A Bayesian model exists, but exact inference is intractable
• Conjugacy breaks down because of the sigmoid function
• Laplace approximation for the posterior

• A major challenge for the Bayesian framework
Soft-max regression for Multi-class Classification

• Left as exercise

Choices for the activation function
• Probit function: CDF of the Gaussian distribution
• Complementary log-log model: CDF of the exponential distribution
Generative vs Discriminative: Summary
• Generative models
• Easy parameter estimation
• Require more parameters OR simplifying assumptions
• Models and “understands” each class
• Easy to accommodate unlabeled data
• Poorly calibrated probabilities

• Discriminative models
• Complicated estimation problem
• Fewer parameters and fewer assumptions
• No understanding of individual classes
• Difficult to accommodate unlabeled data
• Better calibrated probabilities

Decision Theory
• From posterior distributions to actions
• Loss functions measure extent of error
• Optimal action depends on loss function

• Reject option for classification problems

Loss functions
• 0-1 loss: L(t, a) = 0 if a = t, 1 otherwise

• Minimized by the MAP estimate (posterior mode)

• Squared (L2) loss: L(t, a) = (t − a)²

• Expected loss: minimum mean squared error

• Minimized by the Bayes estimate (posterior mean)

• Absolute (L1) loss: L(t, a) = |t − a|

• Minimized by the posterior median
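These minimizers can be checked numerically. The sketch below draws samples from a hypothetical right-skewed posterior (a Gamma, chosen only so that the mean and median differ) and scans a grid of candidate actions for each loss:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # hypothetical skewed posterior

grid = np.linspace(0.1, 5.0, 200)                        # candidate actions a
sq_loss = [np.mean((samples - a) ** 2) for a in grid]    # expected squared loss
abs_loss = [np.mean(np.abs(samples - a)) for a in grid]  # expected absolute loss

best_sq = grid[np.argmin(sq_loss)]    # lands near the sample mean
best_abs = grid[np.argmin(abs_loss)]  # lands near the sample median
```

Because the distribution is right-skewed, the absolute-loss minimizer (the median) sits below the squared-loss minimizer (the mean).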
Evaluation of Binary Classification Models

• Consider the class conditional distribution p(C_1 | x)
• Decision rule: predict class 1 if p(C_1 | x) > τ, for a threshold τ

• Confusion Matrix:

              Predicted 1           Predicted 0
  Actual 1    True Positive (TP)    False Negative (FN)
  Actual 0    False Positive (FP)   True Negative (TN)
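The four cells of the confusion matrix are simple counts; a minimal sketch (function name ours, binary labels assumed):

```python
import numpy as np

def confusion_matrix(t_true, t_pred):
    """2x2 counts for a binary classifier: rows = actual, cols = predicted."""
    tp = np.sum((t_true == 1) & (t_pred == 1))
    fn = np.sum((t_true == 1) & (t_pred == 0))
    fp = np.sum((t_true == 0) & (t_pred == 1))
    tn = np.sum((t_true == 0) & (t_pred == 0))
    return np.array([[tp, fn], [fp, tn]])
```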
ROC curves

ROC curves

• Plot the TPR against the FPR for
  different values of the
  decision threshold
  • TPR = TP / (TP + FN)
  • FPR = FP / (FP + TN)

• Quality of the classifier is
  measured by the area under
  the curve (AUC)

Image from wikipedia
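Sweeping the threshold to trace the curve can be sketched directly from the definitions (function name ours; `scores` are the model's values of p(C_1 | x)):

```python
import numpy as np

def roc_points(scores, t, thresholds):
    """(FPR, TPR) pairs obtained by sweeping the decision threshold."""
    P, N = np.sum(t == 1), np.sum(t == 0)
    pts = []
    for tau in thresholds:
        pred = scores >= tau
        tpr = np.sum(pred & (t == 1)) / P   # TP / (TP + FN)
        fpr = np.sum(pred & (t == 0)) / N   # FP / (FP + TN)
        pts.append((fpr, tpr))
    return pts
```

A perfect ranking yields a threshold where (FPR, TPR) = (0, 1), the top-left corner of the ROC plot; the AUC is then the area under the traced curve.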
Precision-recall curves

• In settings such as
  information retrieval:
  • Precision = TP / (TP + FP)
  • Recall = TP / (TP + FN)
• Plot precision vs recall for
  varying values of the threshold
• Quality of the classifier is
  measured by the area under the
  curve (AUC) or by specific
  values, e.g. P@k

Image from scikit-learn
F1-scores
• To evaluate at a single threshold, we need to combine
  precision and recall

• F1 = 2PR / (P + R)
• F_β score when P and R are not equally important

• Harmonic mean
• Why? The harmonic mean is dominated by the smaller of P and R,
  so a classifier cannot score well by sacrificing one for the other
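The general F_β score, of which F1 is the β = 1 case, can be sketched as:

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 weights precision more;
    beta = 1 recovers F1 = 2PR / (P + R)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```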
Estimating generalization error
• Training set performance is not a good indicator of
  generalization error
• A more complex model overfits, a less complex one underfits
• Which model do I select?

• Validation set
  • Typically an 80% / 20% train / validation split
  • Wastes valuable labeled data

• Cross validation
  • Split the training data into K folds
  • For the i-th iteration, train on the other K − 1 folds, test on the i-th fold
  • Average the generalization error over all folds
  • Leave-one-out cross validation: K = N
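The K-fold loop above can be sketched generically; `fit` and `err` are hypothetical callbacks standing in for any training procedure and error metric:

```python
import numpy as np

def kfold_error(X, t, K, fit, err):
    """Average held-out error over K folds.

    fit(X, t) -> model; err(model, X, t) -> float."""
    idx = np.arange(len(t))
    folds = np.array_split(idx, K)
    errors = []
    for i in range(K):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(K) if j != i])
        model = fit(X[train], t[train])          # train on the other K-1 folds
        errors.append(err(model, X[test], t[test]))  # test on the i-th fold
    return float(np.mean(errors))
```

Setting K = len(t) gives leave-one-out cross validation.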
Summary
• Generative models
  • Gaussian Discriminant Analysis
  • Naïve Bayes
• Discriminative models
  • Logistic regression
  • Iterative algorithms for training
• Binary vs Multiclass

• Evaluation of classification models

• Generalization performance
• Cross validation

