
FEM 2063 - Data Analytics

CHAPTER 4: Classifications

Overview
At the end of this chapter students should be able to understand:

➢Supervised and Unsupervised Learning

➢Logistic Regression

➢Naïve Bayes

➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis

Machine Learning
Supervised Learning
Supervised learning is the machine learning task of learning a function that
maps an input to an output based on example input-output pairs. It infers a
function from labeled training data consisting of a set of training examples.
In supervised learning, a dataset of elements is given with a set of features X1, X2, …, Xp as well as a response or outcome variable Y for each element. The goal is then to build a model to predict Y using X1, X2, …, Xp.

Example: regression and classification, where prior information is available.
Classification
Supervised learning or classification means grouping things together based on certain common features: it is the method of putting similar things into one group, which makes study easier and more systematic.

There are four main types of classification tasks that you may encounter:
•Binary Classification
•Multi-Class Classification
•Multi-Label Classification
•Imbalanced Classification
Classification
Supervised learning or classification: attribution of a class or label to an observation by exploiting the availability of a training set (labeled data). In other words, classification is a subcategory of supervised learning where the goal is to predict the categorical class labels (discrete, unordered values, group membership) of new instances based on past observations.
Unsupervised Learning
What is unsupervised machine learning?
Unsupervised learning is a machine learning technique in which models are not supervised using a labeled training dataset. Instead, the model itself finds hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain when learning new things.

Example: Clustering
Clustering
Clustering: representation of input data in clusters/classes based on some inherent similarity measures (no training set). Clustering involves segregating data based on the similarity between data instances, using an iterative process to find clusters.
Supervised vs Unsupervised
Classification Performance - Confusion Matrix
What is confusion matrix?
A confusion matrix is a table that is often used to describe the performance of
a classification model (or "classifier") on a set of test data for which the true
values are known.
TP : True Positive, TN : True Negative
FP : False Positive, FN : False Negative
Classification Performance - Confusion Matrix
Example: In medical diagnosis,
test sensitivity is the ability of a test to
correctly identify those with the
disease (true positive rate), whereas test
specificity is the ability of the test to
correctly identify those without the
disease (true negative rate).

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Sensitivity = TP / (TP + FN) = recall = r

Specificity = TN / (TN + FP)
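As a quick check, these metrics can be computed directly from the four cells of a confusion matrix; the counts below are hypothetical, for illustration only:

# Hypothetical confusion-matrix counts (not from the slides)
TP, TN, FP, FN = 45, 40, 10, 5

accuracy = (TP + TN) / (TP + TN + FP + FN)   # overall fraction correct
sensitivity = TP / (TP + FN)                 # true positive rate (recall)
specificity = TN / (TN + FP)                 # true negative rate

print(accuracy, sensitivity, specificity)    # 0.85 0.9 0.8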
Overview
➢Logistic Regression

➢Naïve Bayes

➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis

G. James, D. Witten, T. Hastie, R. Tibshirani, “An Introduction to Statistical Learning with Applications in R”, Springer,
ISBN 978-1-4614-7137-0, ISBN 978-1-4614-7138-7 (eBook)
Overview – Logistic Regression
Logistic regression is a statistical model that uses a logistic function to model a binary dependent variable. In regression analysis, logistic regression estimates the parameters of a logistic model, a form of binary regression.
What is logistic regression in simple terms?
Logistic Regression, also known as Logit Regression or the Logit Model, is a mathematical model used in statistics to estimate the probability of an event occurring given some previous data. Logistic Regression works with binary data, where either the event happens (1) or the event does not happen (0).

Overview – Logistic Regression
What is the difference between logistic regression and linear regression?
• Linear regression is used to predict a continuous dependent variable from a given set of independent features, whereas logistic regression is used to predict a categorical dependent variable.

Logistic Regression
Example: Credit Card Fraud
When a credit card transaction happens, the bank makes a note of several factors: for instance, the date of the transaction, amount, place, type of purchase, etc. Based on these factors, they develop a Logistic Regression model to predict whether the transaction is fraudulent or not.
Logistic Regression
Why is logistic regression better?
Good accuracy for many simple data sets and it performs well when the dataset
is linearly separable.
Logistic Regression - Computation
The formula for one variable x:

p(x) = exp(β0 + β1·x) / (1 + exp(β0 + β1·x))

The formula for multiple variables x1, …, xp:

p(x) = exp(β0 + β1·x1 + … + βp·xp) / (1 + exp(β0 + β1·x1 + … + βp·xp))
Logistic Regression - Example
Using software (e.g. Python, R) to find the Logistic Regression model.
Example of output:

Making Predictions: predicting a good or bad creditor based on account balance.

• What is our estimated probability of default for someone with a balance of $1000? Please try using this formula.
• With a balance of $2000?
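A minimal sketch of this prediction, assuming the coefficient estimates reported for the Default data in the cited James et al. text (β0 ≈ −10.6513, β1 ≈ 0.0055); with different fitted coefficients the code is unchanged:

import math

# Assumed coefficients (from the ISLR Default example; an assumption here)
beta0, beta1 = -10.6513, 0.0055

def default_probability(balance):
    # logistic function: p(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))
    z = beta0 + beta1 * balance
    return math.exp(z) / (1 + math.exp(z))

print(default_probability(1000))  # about 0.006 -> default very unlikely
print(default_probability(2000))  # about 0.59  -> default more likely than not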
Logistic Regression - Example
A group of 20 students spends between 0 and 6 hours studying for an exam.
How does the number of hours spent studying affect the probability of the student
passing the exam?
The reason for using logistic regression for this problem is that the values of the
dependent variable, pass and fail, while represented by "1" and "0", are
not cardinal numbers.
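A sketch of fitting this model with scikit-learn; the 20 hours/pass observations below are illustrative stand-ins for the slide's data:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data: hours studied and pass (1) / fail (0) for 20 students
hours = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5,
                  2.75, 3.0, 3.25, 3.5, 4.0, 4.25, 4.5, 4.75, 5.0, 5.5]).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])

# Note: scikit-learn applies mild L2 regularization by default, so the fit
# differs slightly from a plain maximum-likelihood fit
model = LogisticRegression()
model.fit(hours, passed)

# Estimated probability of passing after 1, 3 and 5 hours of study
print(model.predict_proba([[1.0], [3.0], [5.0]])[:, 1])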
Logistic Regression
More than 2 independent variables
Example of output
Logistic Regression - Example
Example: more than 2 variables

A sample of 1000 people was selected to identify how their age, daily internet usage, and time spent on site affect their decision to click on an advertisement. Use the first 700 observations as the training dataset and the remaining 300 as the testing dataset.

Number of observations: 1000

Variable | Description
Y  | "Clicked on Ad": indicates clicking on the Ad, 0 = NO, 1 = YES
X1 | "Daily Time Spent on Site": consumer time spent on the site, in minutes
X2 | "Age": consumer age
X3 | "Daily Internet Usage": average time in minutes a day the consumer is on the internet (online)
Logistic Regression – Python Codes

#Logistic Regression
#commands below import the libraries needed for the codes
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

#import the Logistic Regression command as it will be used in the coding
from sklearn.linear_model import LogisticRegression

#import data from computer
from google.colab import files
uploaded = files.upload()

#read the file
df = pd.read_csv('XXX.csv')
#the data in the imported file will be displayed
print(df)

#set the independent variables and dependent variable as x and y respectively

#set the values that will be used to train the model, about 70%, and test, 30%
x_for_train = x[:700]
y_for_train = y[:700]
x_for_test = x[700:1000]
y_for_test = y[700:1000]

#define model to be used
model = LogisticRegression()

#fit training data and print the results
model.fit(x_for_train, y_for_train)

#define and print values of intercept and coefficient of the trained model
print('beta0 :', model.intercept_)
print('beta1 :', model.coef_)

#calculate accuracy
print(model.score(x_for_test, y_for_test))

These codes are not complete, please attend tutorial/lab classes for full details of the codes.
Logistic Regression – Python Outputs

beta0 : [19.66511428]
beta1 : [[-0.17887355  0.1258386  -0.06456599]]

Results and interpretation: the negative coefficients on X1 (daily time spent on site) and X3 (daily internet usage) indicate that heavier users are less likely to click on the Ad, while the positive coefficient on X2 (age) indicates that older consumers are more likely to click.
Logistic Regression - Example
Classifying your daily productivity
Lately you’ve been interested in gauging your productivity. You’ve been asking yourself, at the end of each day, if the day was indeed productive. But that’s just a potentially biased, qualitative data point. You want to find a more scientific way to go about it.
You’ve observed the natural flows of your day, and realized that what impacts it the most is:
•Sleep: you know that sleep, or lack thereof, has a big impact on your day.
•Coffee: doesn’t the day start after coffee?
•Focus time: it’s not always possible, but you try to have 3–4h of intently focused time to dive into projects.
•Lunch: you’ve noticed the day flows smoothly when you have time for a proper lunch, not just snacks.
•Walks: you’ve been taking short walks to get your steps in, relax a bit and muse about your projects.
https://towardsdatascience.com/logistic-regression-in-real-life-
building-a-daily-productivity-classification-model-a0fc2c70584e
Logistic Regression - Example
To classify your day as productive or not with a Logistic Regression model, the first step is to pick an arbitrary threshold x and assign observations to each class based on a simple criterion:
•Class Non-Productive: all outcomes that are less than or equal to x.
•Class Productive: otherwise, i.e., all outcomes greater than x.
Logistic Regression - Example
Observed Data for 20 days

•Outcomes less than or equal to zero are assigned to Class 0, i.e., a nonproductive day.
•Positive outcomes are assigned to Class 1, i.e., a productive day.
Logistic Regression
One of the most significant advantages of the logistic regression model is that it doesn't
just classify but also gives probabilities.
The following are some of the advantages of the logistic
regression algorithm.
•Simple to understand, easy to implement, and efficient to train
•Performs well when the dataset is linearly separable
•Good accuracy for smaller datasets
•Doesn't make any assumptions about the distribution of classes
•Useful to find relationships between features
•Provides well-calibrated probabilities
•Less prone to overfitting in low dimensional datasets
•Can be extended to multi-class classification
Logistic Regression
The following are some of the disadvantages of the logistic regression algorithm:

•Constructs linear boundaries
•Can lead to overfitting if the number of features is greater than the number of observations
•Predictors should have no multicollinearity
•Challenging to capture complex relationships; algorithms like neural networks are more suitable and powerful
•Can be used only to predict discrete functions
•Can't solve non-linear problems
•Sensitive to outliers
Overview
➢Logistic Regression

➢Naïve Bayes

➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis

Naïve Bayes
Learning objectives:

- Introduction: Deterministic vs Stochastic
- Laws of Probability
- Understand the Naïve Bayes Classifier

Some references
http://www3.cs.stonybrook.edu/~cse634/ch6book.pdf
https://www3.cs.stonybrook.edu/~cse634/T14.pdf
Introduction- Stochastic vs Deterministic
A deterministic system is a system in which no randomness is involved in the development of future states of the system. A deterministic model will thus always produce the same output from a given starting condition or initial state.

A stochastic model is a tool for estimating probability distributions of potential outcomes by allowing for random variation in one or more inputs over time. The random variation is usually based on fluctuations observed in historical data for a selected period using standard time-series techniques.

Introduction- Stochastic vs Deterministic

Example - Stochastic vs Deterministic

Laws of Probability

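As a brief recap, the basic laws of probability that underpin Bayes' theorem are:

P(A') = 1 − P(A) (complement rule)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) (addition rule)
P(A ∩ B) = P(A | B) · P(B) (multiplication rule)
P(A | B) = P(A ∩ B) / P(B) (conditional probability)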
What is Bayes’ Theorem?
Conditional probability is the likelihood of an
outcome occurring, based on a previous outcome
occurring.
Bayes' theorem provides a way to revise existing
predictions or theories (update probabilities) given
new or additional evidence.

What is Bayes’ Theorem?

Prior probability represents what is originally believed before new evidence is introduced, and posterior probability takes this new information into account. A posterior probability can subsequently become a prior for a new updated posterior probability as new information arises and is incorporated into the analysis. Likelihood is the probability of the evidence given the event that is expected to occur.
Example of Bayes’ Theorem

Example:
A doctor knows that meningitis causes stiff neck 50% of the time (likelihood).
The prior probability of any patient having meningitis is 1/50,000 (prior).
The prior probability of any patient having stiff neck is 1/20 (prior).
Question: if a patient has a stiff neck, what is the probability he/she has meningitis?
Solution: see the worked computation below.
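Applying Bayes' theorem with the quantities above:

P(meningitis | stiff neck) = P(stiff neck | meningitis) · P(meningitis) / P(stiff neck)
= (0.5 × 1/50,000) / (1/20)
= 0.0002

So even for a patient with a stiff neck, the probability of meningitis is only 0.02%.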

© 2019 Petroliam Nasional43


Berhad (PETRONAS) |
Example of Bayes’ Theorem

44
Naïve Bayes - Example

P(H|X) = P(X|H) · P(H) / P(X)
Naïve Bayes - Example

Is this officer Drew male or female?

Given the small database with names and sex, we can apply Bayes' theorem.

1 Attribute: Name
Naïve Bayes - Example

Officer Drew is a female!


Naïve Bayes Classifier
A classifier is an ML model segregating different objects based on certain features or variables. The Naïve Bayes classifier works on Bayes' theorem: prediction of membership probabilities is made for every class, such as the probability of a data point being associated with a particular class. The class having the maximum probability is appraised as the most suitable class. This is also referred to as Maximum A Posteriori (MAP).

The MAP for a hypothesis H given evidence E is:
• MAP(H) = max P(H|E)
• MAP(H) = max (P(E|H) · P(H) / P(E))
• MAP(H) = max (P(E|H) · P(H))
• P(E) is the evidence probability, and it is used to normalize the result. The result will not be affected by removing P(E).

For example, a fruit may be observed to be an apple if it is red, round, and about 4″ in diameter. Even if all the features are interrelated, a naïve Bayes classifier will observe all of them as independently contributing to the probability that the fruit is an apple.

Naïve Bayes classifiers assume that all the variables or features are independent of each other, and that the existence or absence of a variable does not impact the existence or absence of any other variable; a worked sketch of this MAP rule follows below.
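A minimal sketch of this MAP decision rule for the fruit example; the priors and per-feature likelihoods below are hypothetical numbers, not values from the slides:

import numpy as np

# Hypothetical priors and per-feature likelihoods P(feature | class)
priors = {"apple": 0.6, "orange": 0.4}
likelihoods = {
    "apple":  {"red": 0.8, "round": 0.9, "about_4in": 0.7},
    "orange": {"red": 0.1, "round": 0.9, "about_4in": 0.6},
}
observed = ["red", "round", "about_4in"]

# MAP(H) = argmax_H P(H) * product over features of P(f | H);
# P(E) is dropped because it is the same for every class
scores = {c: priors[c] * np.prod([likelihoods[c][f] for f in observed])
          for c in priors}
print(scores)                       # unnormalized posterior scores
print(max(scores, key=scores.get))  # -> 'apple'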
Naïve Bayes - Example
Test sample (unseen)

Naïve Bayes – Python Codes

#Naive Bayes
#commands below import the libraries needed for the codes
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

#import the Naïve Bayes command as it will be used in the coding
from sklearn.naive_bayes import GaussianNB

#import data from computer
from google.colab import files
uploaded = files.upload()

#read the file
df = pd.read_csv('XXX.csv')
#the data in the imported file will be displayed
print(df)

#set the independent variables and dependent variable as x and y respectively

#set the values that will be used to train the model, about 70%, and test, 30%
x_for_train = x[:700]
y_for_train = y[:700]
x_for_test = x[700:1000]
y_for_test = y[700:1000]

#define model to be used
model = GaussianNB()

#fit training data and print the results
model.fit(x_for_train, y_for_train)

#calculate accuracy
print(model.score(x_for_test, y_for_test))

These codes are not complete, please attend tutorial/lab classes for full details of the codes.
Naïve Bayes – Python Outputs
Naïve Bayes
• Advantages:
– Fast to train, fast to classify
– Not sensitive to irrelevant features
– Handles real and discrete data
– Handles streaming data well

• Disadvantage: assumes independence of features


Naïve bayes vs Logistic Regression
Below is the list of 5 major differences between Naïve Bayes and Logistic Regression.

1. Purpose, or what class of machine learning does it solve?
Both algorithms can be used for classification of data. Using these algorithms, you could predict whether a banker can offer a loan to a customer or not, or identify whether a given mail is spam or not.

2. Algorithm's learning mechanism
Naïve Bayes: for the given features (x) and the label (y), it estimates a joint probability from the training data. Hence this is a Generative model.
Logistic regression: estimates the probability P(y|x) directly from the training data by minimizing error. Hence this is a Discriminative model.

https://www.quora.com/What-is-the-difference-between-logistic-regression-and-Naive-Bayes
Naïve bayes vs Logistic Regression
3. Model assumptions
Naïve Bayes: the model assumes all the features are conditionally independent, so if some of the features are dependent on each other (in case of a large feature space), the prediction might be poor.
Logistic regression: it splits the feature space linearly; it works OK even if some of the variables are correlated.

4. Model limitations
Naïve Bayes: works well even with less training data, as the estimates are based on the joint density function.
Logistic regression: with small training data, model estimates may overfit the data.

5. Approach to be followed to improve the results
Naïve Bayes: when the training data size is small relative to the number of features, the information/data on prior probabilities helps in improving the results.
Logistic regression: when the training data size is small relative to the number of features, Lasso and Ridge regularization will help in improving the results; a sketch follows below.
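A minimal sketch of Lasso- and Ridge-regularized logistic regression in scikit-learn; the penalty strength C and the synthetic data are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))   # few observations, many features
y = (X[:, 0] + 0.1 * rng.normal(size=50) > 0).astype(int)

# Ridge (L2) is the scikit-learn default penalty; smaller C = stronger penalty
ridge = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

# Lasso (L1) needs a compatible solver such as liblinear or saga
lasso = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)

print((lasso.coef_ != 0).sum(), "non-zero coefficients after L1 shrinkage")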
Overview
➢Logistic Regression

➢Naïve Bayes

➢Discriminant Analysis
➢Linear Discriminant Analysis
➢Quadratic Discriminant Analysis

What is linear discriminant analysis
Linear discriminant analysis is a technique that is used by the researcher to
analyze the research data when the criterion or the dependent variable is
categorical and the predictor or the independent variable is interval in nature.
Why LDA is used
Discriminant analysis is a versatile
statistical method often used by market
researchers to classify observations into
two or more groups or categories. In other
words, discriminant analysis is used to
assign objects to one group among several
known groups.
Discriminant Analysis
• LDA makes predictions by
estimating the probability that a new
set of inputs belongs to each class.
The class that gets the highest
probability is the output class and a
prediction is made.

• Model the distribution of X in each of the classes separately, and then use Bayes' theorem to obtain

Pr(Y = k | X = x) = πk fk(x) / (π1 f1(x) + … + πK fK(x))

• Using normal (Gaussian) distributions for each class leads to linear or quadratic discriminant analysis.
• Remark: it could be done with other distributions.
Discriminant Analysis
Linear Discriminant Analysis when there is only 1 predictor (p = 1)

• The Gaussian (normal) density has the form

fk(x) = Pr(X = x | Y = k) = (1 / (√(2π)·σk)) · exp(−(x − μk)² / (2σk²))

which is the (normal) density for X in class k.
• Here μk is the mean, and σk² the variance (in class k).
• We will assume that all the σk = σ are the same.
• πk = Pr(Y = k) is the marginal or prior probability for class k.

Classify to the class with the highest density. Example of decision boundaries:
Discriminant Analysis
Discriminant functions
• To classify the value X = x, we need to find the k which gives the largest pk(x).
• After simplifications, it is equivalent to finding the largest discriminant score using the formula:

δk(x) = x·μk/σ² − μk²/(2σ²) + log(πk)

Note that δk(x) is a linear function of x. The quantities are estimated from the training data:

n = size of the training set
nk = size of class k in the training set
(so πk is estimated by nk/n)
Discriminant Analysis – Example

Default | Balance
   0    |   580
   0    |   245
   0    |  1970
   1    |  7390
   1    |  2845
   ?    |   900

Question: when x = 900, what is the predicted default class?

μ0 = (580 + 245 + 1970)/3 = 931.67,  π0 = 3/5
μ1 = (7390 + 2845)/2 = 5117.50,  π1 = 2/5
σ² = ((580 − 931.67)² + … + (2845 − 5117.50)²)/3 ≈ 4×10⁶
n = 5, K = 2

δ0 = −0.12,  δ1 = −2.52

Since δ0 > δ1, the predicted default for x = 900 is 0.
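As a check, a short sketch reproducing these discriminant scores; the slide's values −0.12 and −2.52 come out with base-10 logs for the prior term (with natural logs the scores differ, but the ranking, and hence the classification, is unchanged):

import numpy as np

balance = np.array([580, 245, 1970, 7390, 2845])
default = np.array([0, 0, 0, 1, 1])

mu0 = balance[default == 0].mean()   # 931.67
mu1 = balance[default == 1].mean()   # 5117.50
pi0, pi1 = 3/5, 2/5
# pooled variance with n - K = 5 - 2 = 3 degrees of freedom
var = (((balance[default == 0] - mu0) ** 2).sum()
       + ((balance[default == 1] - mu1) ** 2).sum()) / 3

def delta(x, mu, pi):
    return x * mu / var - mu ** 2 / (2 * var) + np.log10(pi)

print(delta(900, mu0, pi0))  # approx -0.12
print(delta(900, mu1, pi1))  # approx -2.52 -> classify x = 900 as class 0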


LDA – Python Codes

##Mount the drive
from google.colab import drive
drive.mount('/content/drive')

#Import Data
import pandas as pd
df = pd.read_csv('/content/drive/My Drive/XXX.csv')
print('\nData', df)

#Import additional items
from sklearn import linear_model
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.metrics import accuracy_score

#Define x and y variables

#Split data into training and testing

#Set the values that will be used to train the model, about 70%, and test, 30%

#Fit variables into model

#Find predicted classes of test sets

#Determine the accuracy

#Visualize the results by plotting figures

These codes are not complete, please attend tutorial/lab classes for full details of the codes.

LDA - Results

Example: the dataset chosen contains 3 attributes (diastolic blood pressure (diaBP), systolic blood pressure (sysBP), and the age of the patient) with more than 150 observations. These variables are used to predict the 10-year risk of coronary heart disease.
LDA - Interpretation
Example:
Plots of the Linear Discriminant Analysis (LDA) results visualize the relationship of the dependent variable, risk of getting coronary heart disease, to the independent variables: age, diastolic blood pressure (diaBP), and systolic blood pressure (sysBP). The blue dots on the scatter plots represent no coronary heart disease risk in the next ten years, while the orange dots represent a risk of getting coronary heart disease in the next 10 years. Based on the graphs, most patients across all three independent variables fall in the no-risk group, while only a very small number of patients are predicted to develop coronary heart disease. It is concluded that there is only a small amount of prediction inaccuracy, as the accuracy is 85.4%.
Discriminant Analysis
Other forms of Discriminant Analysis

• When the fk(x) are Gaussian densities with the same covariance matrix Σ in each class, this leads to linear discriminant analysis.
• With Gaussians but a different covariance matrix Σk in each class, we get quadratic discriminant analysis (QDA).
QDA vs LDA
A major difference between the two is that LDA
assumes the feature covariance matrices of both
classes are the same, which results in a linear
decision boundary. In contrast, QDA is less strict
and allows different feature covariance matrices for
different classes, which leads to a quadratic
decision boundary.

LDA Assumptions:
•LDA assumes normally distributed data and a class-
specific mean vector.
•LDA assumes a common covariance matrix, that is
common to all classes in a data set.
When these assumptions hold, then LDA approximates the Bayes classifier very
closely and the discriminant function produces a linear decision boundary.
QDA vs LDA
QDA Assumptions:
•Observation of each class is drawn from a normal distribution (same as LDA).
•QDA assumes that each class has its own covariance matrix (different from LDA).
When these assumptions hold, QDA approximates the Bayes classifier very closely
and the discriminant function produces a quadratic decision boundary.

In conclusion, LDA is less flexible than QDA because it can estimate fewer
parameters. This can be good when only a few observations in training dataset so
lower the variance. On the other hand, when the K classes have very different
covariance matrices then LDA suffers from high bias and QDA might be a better
choice, what comes down to is the bias-variance trade-off. Therefore, it is crucial to
test the underlying assumptions of LDA and QDA on the data set and then use
both methods to decide which one is more appropriate.
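A minimal sketch of this comparison with scikit-learn; the synthetic two-class data with unequal covariance matrices is an illustrative assumption:

import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# two Gaussian classes with clearly different covariance matrices
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=200)
X1 = rng.multivariate_normal([2, 2], [[3.0, 1.5], [1.5, 2.0]], size=200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, model.score(X_te, y_te))
# QDA is expected to do at least as well here because the true class
# covariances differ, matching its assumption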
Prediction Models in Healthcare
Machine learning applications in healthcare sector: An overview
Virendra Kumar Verma a, Savita Verma
Materials Today: Proceedings 57 (2022) 2144–2147

Machine learning (ML) applications are


everywhere and are used in many real-world
applications. It is essential in several areas,
such as healthcare and medical data protection.
ML is applied to analyze medical records and
disease forecasts. In our study, we review
several ML algorithms, applications, techniques,
opportunities, and challenges for the healthcare
sector. This paper fills a research gap for
efficient use of ML algorithms and applications
in the healthcare sector.
Prediction Models in Healthcare

Machine learning (ML) is essential in the healthcare sector, for example in medical imaging diagnostics, improved radiotherapy, personalized treatment, crowdsourced data gathering, smart health records, ML-based behavioral modification, clinical trials, and research. Healthcare is becoming more problematic and costly, and several ML techniques are used to address this. Various ML techniques and applications for disease prediction are presented in this paper. Using ML algorithms and techniques, we hope to improve the accuracy of many disease predictions in the future.
Prediction Models in Healthcare
S. P. Chatrati, G. Hossain, A. Goyal et al., Smart home health monitoring system for predicting type 2 diabetes and hypertension,
Journal of King Saud University – Computer and Information Sciences, https://doi.org/10.1016/j.jksuci.2020.01.010

This work proposes a smart home health monitoring system that helps to analyze the patient’s
blood pressure and glucose readings at home and notifies the healthcare provider in case of any
abnormality detected. The goal is to predict the hypertension and diabetes status using the
patient’s glucose and blood pressure readings using supervised machine learning classification
algorithms.

Proposed Block Diagram and Workflow


Prediction Models in Healthcare
Summary

Logistic Regression (LR): logistic function, maximum likelihood

Naïve Bayes: independence of attributes

Linear Discriminant Analysis (LDA): normal distribution, same covariance matrices

Quadratic Discriminant Analysis (QDA): normal distribution, different covariance matrices
