Chapter 4 Statistical Classification Methods
Overview
At the end of this chapter, students should be able to understand:
➢Logistic Regression
➢Naïve Bayesian
➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis
Machine Learning
Supervised Learning
Supervised learning is the machine learning task of learning a function that
maps an input to an output based on example input-output pairs. It infers a
function from labeled training data consisting of a set of training examples.
In supervised learning, a dataset is given in which each element has a set of
features X1, X2, …, Xp as well as a response or outcome variable Y. The goal is
then to build a model that predicts Y using X1, X2, …, Xp.
Example: regression and classification, where prior information (labels) is available.
Classification
Classification is the supervised-learning task of assigning items to predefined groups
(classes) based on certain common features learned from labeled examples. Putting similar
things into one group makes study easier and more systematic. (Contrast this with
clustering, described next, which groups items without labels.)
Clustering
Clustering: representation of input data in clusters/classes based on some inherent
similarity measure, with no training set (no labels).
Clustering segregates data based on the similarity between data instances and
typically involves an iterative process to find the clusters.
Supervised vs Unsupervised
Classification Performance - Confusion Matrix
What is confusion matrix?
A confusion matrix is a table that is often used to describe the performance of
a classification model (or "classifier") on a set of test data for which the true
values are known.
TP : True Positive, TN : True Negative
FP : False Positive, FN : False Negative
Classification Performance - Confusion Matrix
Example: In medical diagnosis,
test sensitivity is the ability of a test to
correctly identify those with the
disease (true positive rate), whereas test
specificity is the ability of the test to
correctly identify those without the
disease (true negative rate).
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN) = recall = r
Specificity = TN / (TN + FP)
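A small Python sketch (not from the slides) showing how these three metrics follow directly from the four confusion-matrix counts:

# Compute accuracy, sensitivity (recall) and specificity from confusion-matrix counts
def confusion_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)      # also called recall, r
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

# Example with made-up counts: TP = 40, TN = 45, FP = 5, FN = 10
print(confusion_metrics(tp=40, tn=45, fp=5, fn=10))   # (0.85, 0.8, 0.9)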
Overview
➢Logistic Regression
➢Naïve Bayesian
➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis
G. James, D. Witten, T. Hastie, R. Tibshirani, "An Introduction to Statistical Learning with Applications in R", Springer,
ISBN 978-1-4614-7137-0, ISBN 978-1-4614-7138-7 (eBook).
Overview – Logistic Regression
Logistic regression is a statistical model that uses a logistic function to model a
binary dependent variable. In regression analysis, logistic regression estimates the
parameters of a logistic model, a form of binary regression.
What is logistic regression in simple terms?
Logistic Regression, also known as Logit Regression or Logit Model, is a
mathematical model used in statistics to estimate (guess) the probability of an event
occurring having been given some previous data. Logistic Regression works with
binary data, where either the event happens (1), or the event does not happen (0).
Overview – Logistic Regression
What is difference between logistic regression and linear regression?
• Linear regression is used to predict a continuous dependent variable from a
given set of independent features, whereas logistic regression is used to predict
a categorical dependent variable.
Logistic Regression
Example: Credit Card Fraud
When a credit card transaction happens, the bank
makes a note of several factors. For instance, the
date of the transaction, amount, place, type of
purchase, etc. Based on these factors, they
develop a Logistic Regression model to predict
whether the transaction is fraudulent or not.
Logistic Regression
Why is logistic regression better?
Good accuracy for many simple data sets and it performs well when the dataset
is linearly separable.
Logistic Regression - Computation
The formula for one predictor x, and the formula for multiple predictors X1, …, Xp:
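The formula images are not in the extracted text; the standard logistic model (as given in the cited James et al. reference) is:

For one predictor x:  p(x) = e^(β0 + β1·x) / (1 + e^(β0 + β1·x))

For p predictors X1, …, Xp:  p(X) = e^(β0 + β1·X1 + … + βp·Xp) / (1 + e^(β0 + β1·X1 + … + βp·Xp))

Equivalently, the log-odds log(p(X) / (1 − p(X))) = β0 + β1·X1 + … + βp·Xp is linear in the predictors.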
Logistic Regression - Example
Using a software (e.g. Python, R) to find the Logistic Regression model
Example of output
A sample of 1000 people was selected to identify how their age, daily internet usage
and time spent on the site affect their intention to click on an advertisement.
Use the first 700 observations as the training dataset and the remaining 300 as the
test dataset.
Number of observations: 1000

Variable  Description
Y   "Clicked on Ad": indicates clicking on the Ad, 0 = NO, 1 = YES
X1  "Daily Time Spent on Site": consumer time spent on the site, in minutes
X2  "Age": consumer age
X3  "Daily Internet Usage": average time in minutes a day the consumer is on the internet (online)
Logistic Regression – Python Codes

# Logistic Regression
# The commands below import the important commands (libraries) needed for the codes
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Import data from computer
from google.colab import files
uploaded = files.upload()

# Set the values that will be used to train the model, about 70%, and test, 30%
x_for_train = x[:700]
y_for_train = y[:700]
x_for_test = x[700:1000]
y_for_test = y[700:1000]

# Fit the training data and print the results
model.fit(x_for_train, y_for_train)

Output:
beta0 : [19.66511428]
beta1 : [[-0.17887355  0.1258386  -0.06456599]]
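The slide code is fragmentary: the data frame and the model object are defined elsewhere. Below is a minimal runnable sketch of the same workflow using scikit-learn; the file name and column names are illustrative assumptions based on the variable table above, not part of the slides.

# Minimal sketch (assumed file/column names) of the logistic regression workflow above
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("advertising.csv")                      # hypothetical file name
x = df[["Daily Time Spent on Site", "Age", "Daily Internet Usage"]].values
y = df["Clicked on Ad"].values

x_for_train, y_for_train = x[:700], y[:700]              # first 700 observations for training
x_for_test, y_for_test = x[700:], y[700:]                # remaining 300 for testing

model = LogisticRegression()
model.fit(x_for_train, y_for_train)

print("beta0 :", model.intercept_)                       # intercept
print("beta1 :", model.coef_)                            # coefficients for X1, X2, X3
print("test accuracy:", model.score(x_for_test, y_for_test))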
Results and Interpretation:
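Assuming the reported coefficients follow the order of the variable table (X1 = daily time on site, X2 = age, X3 = daily internet usage), the fitted log-odds are approximately 19.67 − 0.179·X1 + 0.126·X2 − 0.065·X3: more time spent on the site and heavier daily internet usage lower the estimated probability of clicking the ad, while older age raises it.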
Logistic Regression - Example
Classifying your daily productivity
Lately you’ve been interested in gauging your productivity. You’ve been asking
yourself, at the end of each day, if the day was indeed productive. But that’s just a
potentially biased, qualitative data point. You want to find a more scientific way to go
about it.
You’ve observed the natural flows of your day, and realized that what impacts it the
most is:
•Sleep: you know that sleep, or lack thereof, has a big impact on your day.
•Coffee: doesn't the day start after coffee?
•Focus time: it's not always possible, but you try to have 3–4h of intently focused time to dive into projects.
•Lunch: you've noticed the day flows smoothly when you have time for a proper lunch, not just snacks.
•Walks: you've been taking short walks to get your steps in, relax a bit and muse about your projects.
https://towardsdatascience.com/logistic-regression-in-real-life-building-a-daily-productivity-classification-model-a0fc2c70584e
Logistic Regression - Example
To classify your day as productive or not with a
Logistic Regression model, the first step is to pick
an arbitrary threshold x and assign observations to
each class based on a simple criterion:
•Class Non-Productive: all outcomes that are less
than or equal to x.
•Class Productive: otherwise, i.e., all outcomes
greater than x.
Logistic Regression - Example
Observed Data for 20 days
•Outcomes less than or equal to zero are assigned to Class 0, i.e., a nonproductive day.
•Positive outcomes are assigned to Class 1, i.e., a productive day.
Logistic Regression
One of the most significant advantages of the logistic regression model is that it doesn't
just classify but also gives probabilities.
The following are some of the advantages of the logistic
regression algorithm.
•Simple to understand, easy to implement, and efficient to train
•Performs well when the dataset is linearly separable
•Good accuracy for smaller datasets
•Doesn't make any assumptions about the distribution of classes
•Useful to find relationships between features
•Provides well-calibrated probabilities
•Less prone to overfitting in low dimensional datasets
•Can be extended to multi-class classification
Logistic Regression
The following are some of the disadvantages of the logistic regression algorithm:
•Assumes a linear relationship between the features and the log-odds, so it struggles with complex, non-linear decision boundaries
•Can overfit when the number of features is large relative to the number of observations (high-dimensional data)
•Sensitive to strongly correlated (multicollinear) features
➢Naïve Bayes
➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis
Naïve Bayes
Learning objectives:
- Introduction: Deterministic vs Stochastic
- Laws of Probability
- Understand the Naïve Bayes Classifier
Some references
http://www3.cs.stonybrook.edu/~cse634/ch6book.pdf
https://www3.cs.stonybrook.edu/~cse634/T14.pdf
Introduction- Stochastic vs Deterministic
A deterministic system is a system in which no randomness is involved in the
development of future states of the system. A deterministic model will thus always
produce the same output from a given starting condition or initial state.

A stochastic model is a tool for estimating probability distributions of potential
outcomes by allowing for random variation in one or more inputs over time. The
random variation is usually based on fluctuations observed in historical data for a
selected period using standard time-series techniques.
Introduction- Stochastic vs Deterministic
Example - Stochastic vs Deterministic
Laws of Probability
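The body of this slide is not in the extracted text; the standard results that lead into Bayes' theorem are the definition of conditional probability and the multiplication rule:

P(A|B) = P(A ∩ B) / P(B), and hence P(A ∩ B) = P(A|B)·P(B) = P(B|A)·P(A)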
What is Bayes’ Theorem?
Example
A doctor knows that meningitis causes stiff neck 50% of the time (the likelihood).
The prior probability of any patient having meningitis is 1/50,000 (the prior).
The prior probability of any patient having a stiff neck is 1/20 (the prior).
Question: If a patient has a stiff neck, what is the probability he/she has meningitis?
Solution:
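A worked solution using Bayes' theorem with the values above:

P(meningitis | stiff neck) = P(stiff neck | meningitis) · P(meningitis) / P(stiff neck)
= (0.5 × 1/50,000) / (1/20) = 0.0002

So even for a patient with a stiff neck, the probability of meningitis is only 0.02%.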
Naïve Bayes - Example
P(H|X) = P(X|H) · P(H) / P(X)
Naïve Bayes - Example
Bayes' Theorem – Python Codes

# Naive Bayes
# The commands below import the important commands (libraries) needed for the codes
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Set the values that will be used to train the model, about 70%, and test, 30%
x_for_train = x[:700]
y_for_train = y[:700]
x_for_test = x[700:1000]
y_for_test = y[700:1000]
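As on the logistic regression slide, the model object is not defined in the extracted code. A short sketch of the model-specific lines, assuming the same advertising data and 700/300 split as before and using scikit-learn's Gaussian Naive Bayes (the specific Naive Bayes variant is an assumption; the slides do not name one):

from sklearn.naive_bayes import GaussianNB

model = GaussianNB()                          # assumed classifier
model.fit(x_for_train, y_for_train)           # same training split as the logistic example
print("test accuracy:", model.score(x_for_test, y_for_test))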
Naïve Bayes
• Advantages: simple and fast to train; works well even with relatively little training data; handles high-dimensional feature spaces well
https://www.quora.com/What-is-the-difference-between-logistic-regression-and-Naive-Bayes
Naïve Bayes vs Logistic Regression
3. Model assumptions
Naïve Bayes: the model assumes all the features are conditionally independent, so if some of the
features are dependent on each other (as in a large feature space), the prediction might be poor.
Logistic regression: it splits the feature space linearly, and it works OK even if some of the
variables are correlated.
4. Model limitations
Naïve Bayes: works well even with less training data, as the estimates are based on the joint
density function.
Logistic regression: with small training data, the model estimates may overfit the data.
➢Naïve Bayes
➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis
What is linear discriminant analysis?
Linear discriminant analysis is a technique used to analyze data when the criterion
or dependent variable is categorical and the predictor or independent variables are
interval in nature.
Why LDA is used
Discriminant analysis is a versatile
statistical method often used by market
researchers to classify observations into
two or more groups or categories. In other
words, discriminant analysis is used to
assign objects to one group among several
known groups.
Discriminant Analysis
• LDA makes predictions by
estimating the probability that a new
set of inputs belongs to each class.
The class that gets the highest
probability is the output class and a
prediction is made.
• Model the distribution of X in each of the classes separately, and then use
Bayes' theorem to obtain Pr(Y = k | X = x).
• Using normal (Gaussian) distributions for each class leads to linear or quadratic
discriminant analysis.
• Remark: it could be done with other distributions.
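Written out, Bayes' theorem for classification (as in the cited James et al. text), with prior class probabilities πk and class densities fk(x), is:

Pr(Y = k | X = x) = πk · fk(x) / Σl πl · fl(x)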
Discriminant Analysis
Linear Discriminant Analysis when there is only 1 predictor (p = 1)
Classify to the highest density.
Example of decision boundaries:
Discriminant Analysis
Discriminant functions
• To classify the value X = x, we need to find the k which gives the largest pk(x).
• After simplifications, this is equivalent to finding the largest discriminant score,
using the formula:
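The formula itself is not in the extracted text; for one predictor with class means μk, shared variance σ² and priors πk, the standard linear discriminant score (James et al.) is

δk(x) = x · μk/σ² − μk²/(2σ²) + log(πk)

and x is assigned to the class k with the largest δk(x).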
δ0 = - 0.12 δ1 = - 2.52
# Import additional items
from sklearn import linear_model
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.metrics import accuracy_score

Note: These codes are not complete; please attend tutorial/lab classes for full details of the codes.
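A minimal runnable completion of these imports, assuming the same x/y arrays and 700/300 train/test split used earlier in the chapter (the full lab code is not in the slides):

lda = LDA()                                        # LinearDiscriminantAnalysis imported above
lda.fit(x_for_train, y_for_train)                  # fit on the training split
y_pred = lda.predict(x_for_test)                   # predict on the test split
print("LDA test accuracy:", accuracy_score(y_for_test, y_pred))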
LDA Assumptions:
•LDA assumes normally distributed data and a class-
specific mean vector.
•LDA assumes a covariance matrix that is common to all classes in the data set.
When these assumptions hold, then LDA approximates the Bayes classifier very
closely and the discriminant function produces a linear decision boundary.
QDA vs LDA
QDA Assumptions:
•Observation of each class is drawn from a normal distribution (same as LDA).
•QDA assumes that each class has its own covariance matrix (different from LDA).
When these assumptions hold, QDA approximates the Bayes classifier very closely
and the discriminant function produces a quadratic decision boundary.
In conclusion, LDA is less flexible than QDA because it estimates fewer parameters.
This can be an advantage when there are only a few observations in the training
dataset, since it lowers the variance. On the other hand, when the K classes have very
different covariance matrices, LDA suffers from high bias and QDA might be a better
choice; what it comes down to is the bias-variance trade-off. Therefore, it is crucial to
test the underlying assumptions of LDA and QDA on the data set and then use both
methods to decide which one is more appropriate, as in the short sketch below.
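One way to act on this advice is to fit both models on the same split and compare test accuracy; a short sketch continuing the earlier (assumed) variable names:

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA

qda = QDA()
qda.fit(x_for_train, y_for_train)
print("QDA test accuracy:", accuracy_score(y_for_test, qda.predict(x_for_test)))
# Compare with the LDA accuracy above and prefer the model that generalises better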
Prediction Models in Healthcare
Machine learning applications in healthcare sector: An overview
Virendra Kumar Verma, Savita Verma
Materials Today: Proceedings 57 (2022) 2144–2147
This work proposes a smart home health monitoring system that helps to analyze the patient's
blood pressure and glucose readings at home and notifies the healthcare provider in case any
abnormality is detected. The goal is to predict hypertension and diabetes status from the
patient's glucose and blood pressure readings using supervised machine learning classification
algorithms.