
COMP-377Week6 v1.1

This document covers logistic regression and support vector machines, focusing on their application in classification problems. It explains the logit function, the logistic function, and how logistic regression transforms linear regression outputs into probabilities for binary outcomes. Additionally, it discusses the cost function for logistic regression and introduces softmax regression for multiclass classification.

Uploaded by

Noveen Mirza

AI for Software Developers

Logistic Regression
and Support Vector
Machines
Lesson 6 Objectives

❑ Explain logistic regression and the logit function, and derive its model
❑ Differentiate between linear and logistic regression
❑ Use logistic regression in your applications for solving
classification problems
❑ Explain and use Support Vector Machines for solving
classification problems

2 7/18/2021 AI for Software Developers


Lecture 5 Review
❑ Regression
➢ predicting a real-valued label (often called a target) given an unlabeled example
➢ determine the relationship between one or more independent variables and a dependent variable that best fits the provided data
❑ Commonly used regressions
➢ Linear
➢ Polynomial
➢ Stepwise
➢ Ridge, etc.
❑ Linear Regression
➢ find a linear equation that minimizes the distance between the data points and the modeled line:
$y_i = \beta_1 x_i + \beta_0 + \epsilon_i$
➢ Loss function: squared error loss
▪ $(y_i - \beta_0 - \beta_1 x_i)^2 = \epsilon_i^2$
➢ The least squares cost function is the average loss, defined as:
$J(\beta_0, \beta_1) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$



Lecture 5 Review
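❑ Minimization of the cost function: find $\beta_0$ and $\beta_1$ that minimize the cost
➢ Analytical approach:
$\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = (X^T X)^{-1} X^T y$
➢ Using covariance function:
$\hat{\beta} = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}, \quad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$
➢ Using Pearson Correlation Coefficient ($r_{xy}$), with $S_x$ and $S_y$ the standard deviations of x and y:
$\hat{\beta} = r_{xy} \frac{S_y}{S_x}, \quad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$
➢ Using Gradient Descent Method (here $\alpha$ is the learning rate):
$\beta_k = \begin{bmatrix} (\beta_0)_k \\ (\beta_1)_k \end{bmatrix}, \quad \nabla J(\beta_k) = \begin{bmatrix} \partial J / \partial (\beta_0)_k \\ \partial J / \partial (\beta_1)_k \end{bmatrix}, \quad \beta_{k+1} \leftarrow \beta_k - \alpha \nabla J(\beta_k)$
❑ Overfitting
➢ the model does not generalize well from training data to unseen data
❑ Underfitting applies to situations where the model is too simple and performs poorly
❑ Using regularization to control parameters
➢ Ridge regression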
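The fitting approaches reviewed above can be compared on a toy problem. The sketch below is illustrative only: the five data points, the learning rate, and the iteration count are assumptions chosen for the example, not values from the lecture. It estimates the slope and intercept with the covariance formula, the Pearson formula, and gradient descent on the average squared error.

```python
import math

# Hypothetical data, roughly y = 2x (not from the lecture)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Covariance formula: beta = cov(x, y) / var(x), alpha = y_bar - beta * x_bar
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
var_x = sum((x - x_bar) ** 2 for x in xs) / n
var_y = sum((y - y_bar) ** 2 for y in ys) / n
beta_cov = cov_xy / var_x
alpha_cov = y_bar - beta_cov * x_bar

# Pearson formula: beta = r * S_y / S_x
s_x, s_y = math.sqrt(var_x), math.sqrt(var_y)
r = cov_xy / (s_x * s_y)
beta_r = r * s_y / s_x

# Gradient descent on J(b0, b1) = (1/n) * sum (y - b0 - b1 * x)^2
b0, b1 = 0.0, 0.0
learning_rate = 0.02  # assumed step size, small enough to converge here
for _ in range(5000):
    residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    grad_b0 = -2.0 / n * sum(residuals)
    grad_b1 = -2.0 / n * sum(res * x for res, x in zip(residuals, xs))
    b0 -= learning_rate * grad_b0
    b1 -= learning_rate * grad_b1

print(f"covariance: slope={beta_cov:.4f} intercept={alpha_cov:.4f}")
print(f"pearson:    slope={beta_r:.4f}")
print(f"gradient:   slope={b1:.4f} intercept={b0:.4f}")
```

All three routes should land on the same line; gradient descent only approaches it iteratively, which is why it needs a step size and enough iterations.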



Motivation for Logistic Regression

❑ In linear regression we predict the value of $y^{(i)}$ for the $i$-th example $x^{(i)}$ using a linear function:
$y = h_\beta(x) = \beta^\top x$
❑ The values of the dependent variable y are continuous.
❑ Linear regression is not a good solution for predicting binary-valued labels ($y^{(i)} \in \{0, 1\}$).
➢ For example, a loan approval app needs to predict whether the customer should get a loan or not, a student retention prediction app needs to predict whether a student will pass or not, etc.



Logistic Regression

❑ Logistic regression can be used for predicting binary-valued labels; it acts as a classification algorithm.
❑ The name comes from statistics and is due to the fact that the mathematical formulation of logistic regression is similar to that of linear regression.



Logistic regression

❑ Consider the simple linear regression model:
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where $y_i$ is binary, taking on the value of either 0 or 1.
❑ In logistic regression, we still want to model y as a linear function of x. With a binary y, however, this is not straightforward.
➢ A linear combination of features such as $\beta_0 + \beta_1 x$ is a function with range $(-\infty, +\infty)$, while y has only two possible values.
➢ However, if we define the ‘negative’ label as 0 and the ‘positive’ label as 1, we just need to find a simple continuous function whose codomain is (0, 1).
Logit function

❑ If p is a probability, then p/(1 − p) is the corresponding odds; the logit of the probability is the logarithm of the odds, i.e.:
$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right)$
❑ The odds for (or odds of) some event reflect the likelihood that the event will take place.
❑ We can also call the logit function the log-odds function, because we are calculating the log of the odds p/(1 − p) for a given probability p.
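To make the odds and log-odds concrete, here is a minimal sketch using only the Python standard library. The function names `odds` and `logit` and the sample probabilities are chosen for illustration.

```python
import math

def odds(p):
    """Odds corresponding to a probability p in (0, 1)."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds: the natural logarithm of p / (1 - p)."""
    return math.log(odds(p))

for p in (0.1, 0.5, 0.8):
    print(f"p={p:.1f}  odds={odds(p):.3f}  logit={logit(p):+.3f}")
```

A probability of 0.5 gives odds of 1 and a logit of 0; probabilities below 0.5 give negative logits, and probabilities above 0.5 give positive ones.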



Logistic function

❑ The "logistic" function of any number is given by the inverse-logit:
$\operatorname{logit}^{-1}(x) = \operatorname{logistic}(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{e^x + 1}$
❑ This function is a sigmoid function.
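The inverse relationship between the logit and the logistic function can be checked numerically. The sketch below is an illustration with hypothetical function names; it uses both algebraic forms of the logistic function, picking whichever avoids overflow in `math.exp` for the given sign of x.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def logistic(x):
    """Inverse of logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))   # form 1 / (1 + e^-x)
    z = math.exp(x)
    return z / (z + 1.0)                    # form e^x / (e^x + 1)

for p in (0.05, 0.3, 0.5, 0.9):
    print(f"p={p:.2f}  logistic(logit(p))={logistic(logit(p)):.2f}")
```

Applying `logistic` to `logit(p)` should return p up to rounding, which is what "inverse-logit" means.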



Logistic function

❑ The standard logistic function (also known as the sigmoid function), $f(x) = \frac{1}{1 + e^{-x}}$, where e is the base of the natural logarithm, can be used to model the binary response.
❑ This function is bounded between 0 and 1, has a characteristic sigmoidal (S-shaped) curve, and approaches 0 and 1 asymptotically.
❑ It arises naturally when the binary response variable results from a zero-one mapping of an underlying continuous response variable.



Logistic regression

❑ Let’s denote $g(x) = \beta_0 + \beta_1 x$.
❑ We need to define a new function F(g(x)) that transforms g(x) by squashing the output of linear regression to a value in the (0, 1) range.
❑ The sigmoid function, $\frac{1}{1 + e^{-x}}$, can do just that!
❑ We plug g(x) into the sigmoid function above, resulting in a function of our original function that outputs a probability between 0 and 1:
$P(y = 1 \mid x) = F(g(x)) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$
❑ We are calculating the probability that the training example belongs to a certain class: $P(y = 1 \mid x)$.
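As a small illustration of this squashing step, the sketch below evaluates the linear score g(x) and the resulting probability for a few inputs. The coefficients `beta0 = -5.0` and `beta1 = 0.1` are hypothetical values picked for the example, not fitted to any data.

```python
import math

def linear_score(x, beta0, beta1):
    """g(x) = beta0 + beta1 * x, unbounded in both directions."""
    return beta0 + beta1 * x

def probability(x, beta0, beta1):
    """P(y = 1 | x): the sigmoid applied to the linear score."""
    return 1.0 / (1.0 + math.exp(-linear_score(x, beta0, beta1)))

beta0, beta1 = -5.0, 0.1  # hypothetical coefficients
for x in (-100, 20, 50, 80, 200):
    g = linear_score(x, beta0, beta1)
    p = probability(x, beta0, beta1)
    print(f"x={x:5d}  g(x)={g:+6.1f}  P(y=1|x)={p:.6f}")
```

The score ranges from -15 to +15 over these inputs, while the probability stays strictly between 0 and 1 and equals 0.5 exactly where g(x) = 0.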



Logistic Regression

❑ In classification problems, we want to know the likelihood that an event will take place (pass/fail, etc.); this forces the output to assume only values between 0 and 1, so that it can be read as a probability.
❑ If we solve $P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$ for $\beta_0 + \beta_1 x$, we obtain the logistic regression model as a linear model for the log odds, with $p = P(y = 1 \mid x)$:
$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$
$\hat{p} = \frac{e^{\beta_0 + \beta_1 x}}{e^{\beta_0 + \beta_1 x} + 1}$, where $\hat{p}$ (p-hat) denotes an estimated probability.
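The algebra behind "solving for the linear part" is short enough to write out. The steps below are a worked sketch, writing $g = \beta_0 + \beta_1 x$ and $p = P(y = 1 \mid x)$ as shorthand:

```latex
\begin{aligned}
p &= \frac{1}{1 + e^{-g}} \\
\frac{1}{p} &= 1 + e^{-g} \\
\frac{1 - p}{p} &= e^{-g} \\
\frac{p}{1 - p} &= e^{g} \\
\log\left(\frac{p}{1 - p}\right) &= g = \beta_0 + \beta_1 x
\end{aligned}
```

Reading the steps from bottom to top recovers the sigmoid form of $p$ from the linear log-odds model.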
Logistic Regression

❑ The following image shows the mapping from an infinite domain of possible outcomes to the (0, 1) range, with p being the probability of the occurrence of the event being represented.
➢ It depicts the transformation from linear regression to logistic regression using the sigmoid function.



Logistic Regression

❑ Note that in the logit model, $\beta_1$ now represents the rate of change of the log-odds as x changes.
➢ In other words, it’s the “slope of the log-odds”, not the “slope of the probability”.
❑ The output of the logistic regression model looks like an S-curve showing P(y=1|x) based on the value of x.
❑ To predict the y label (say spam/not spam), you have to set a probability cutoff, or threshold, for a positive result.
➢ For example: “If our model thinks the probability of this email being spam is higher than 70%, label it spam. Otherwise, label it not spam.”
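The thresholding rule can be sketched in a few lines. The function name `classify` and the list of probabilities are hypothetical; in practice the probabilities would come from a fitted model.

```python
def classify(p_spam, threshold=0.7):
    """Label an email from its estimated spam probability."""
    return "spam" if p_spam > threshold else "not spam"

probabilities = [0.10, 0.55, 0.70, 0.71, 0.95]  # hypothetical model outputs
for threshold in (0.5, 0.7):
    labels = [classify(p, threshold) for p in probabilities]
    print(f"threshold={threshold}: {labels}")
```

Raising the threshold from 0.5 to 0.7 moves the email with probability 0.55 from "spam" to "not spam", so the choice of cutoff trades false positives against false negatives.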



Logistic Regression – Cost function

❑ Recall:
➢ In linear regression we tried to predict the value of $y^{(i)}$ for the $i$-th example $x^{(i)}$ using a linear function:
$y = h_\beta(x) = \beta^\top x$
❑ In logistic regression we use the sigmoid function to “squash” the value of $\beta^\top x$ into the range (0, 1) so that we may interpret $h_\beta(x)$ as a probability.
❑ The goal is to search for a value of $\beta$ so that the probability $P(y=1 \mid x) = h_\beta(x)$ is large when x belongs to the “1” class and small when x belongs to the “0” class (so that $P(y=0 \mid x)$ is large).



Logistic Regression – Cost function

❑ In logistic regression, the cost function measures how much probability the model assigns to the wrong label: it is large when you confidently predicted 1 and the true answer was 0, or vice versa.
❑ For a set of training examples with binary labels $\{(x^{(i)}, y^{(i)}) : i = 1, 2, \ldots, m\}$, the following cost function measures how well a given $h_\beta$ does this:
$J(\beta) = -\sum_i \left( y^{(i)} \log h_\beta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\beta(x^{(i)})\right) \right)$
❑ The derivative of $J(\beta)$ as given above with respect to $\beta_j$ is:
$\frac{\partial J(\beta)}{\partial \beta_j} = \sum_i x_j^{(i)} \left( h_\beta(x^{(i)}) - y^{(i)} \right)$
This is essentially the same as the gradient for linear regression, except that now $h_\beta(x) = \sigma(\beta^\top x)$.
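The cost and its gradient translate directly into code. The following is a minimal sketch under stated assumptions: the six training examples are hypothetical (deliberately not separable, so the minimum is finite), each x carries a leading constant 1 for the intercept, and the learning rate and iteration count are illustrative choices. It checks the analytic gradient against a finite difference and then runs plain gradient descent.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(beta, x):
    """Hypothesis h_beta(x) = sigma(beta^T x); x[0] is the constant 1."""
    return sigmoid(sum(b * xj for b, xj in zip(beta, x)))

def cost(beta, X, y):
    """J(beta): negative log-likelihood summed over the examples."""
    return -sum(
        yi * math.log(h(beta, xi)) + (1 - yi) * math.log(1 - h(beta, xi))
        for xi, yi in zip(X, y)
    )

def gradient(beta, X, y):
    """dJ/dbeta_j = sum_i x_j^(i) * (h_beta(x^(i)) - y^(i))."""
    return [
        sum(xi[j] * (h(beta, xi) - yi) for xi, yi in zip(X, y))
        for j in range(len(beta))
    ]

# Hypothetical, non-separable data
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0], [1.0, 3.5], [1.0, 4.5]]
y = [0, 0, 1, 0, 1, 1]

# Check the analytic gradient against a central finite difference
beta = [0.2, -0.1]
eps = 1e-6
numeric = []
for j in range(len(beta)):
    up = list(beta)
    up[j] += eps
    down = list(beta)
    down[j] -= eps
    numeric.append((cost(up, X, y) - cost(down, X, y)) / (2 * eps))
analytic = gradient(beta, X, y)
print("numeric :", numeric)
print("analytic:", analytic)

# Plain gradient descent from beta = 0
beta_fit = [0.0, 0.0]
learning_rate = 0.05  # assumed step size
for _ in range(5000):
    g = gradient(beta_fit, X, y)
    beta_fit = [b - learning_rate * gj for b, gj in zip(beta_fit, g)]
print("fitted beta:", beta_fit, "cost:", cost(beta_fit, X, y))
```

With all coefficients at zero every prediction is 0.5, so the starting cost is 6 log 2; descent should reduce it and drive the gradient toward zero.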
Logistic Regression

❑ Properties of logistic regression:
➢ Models the probability p of an event, depending on one or more independent variables.
▪ For example, the probability of being awarded a prize, given previous qualifications, etc.
➢ Estimates (this is the regression part) p for a given observation, relative to the probability of the event not occurring.
➢ Predicts the effect of a change in the independent variables on a binary response.
➢ Classifies observations by calculating the probability of an item belonging to a given class.



Multiclass application – softmax
regression

❑ Logistic regression can also be conveniently generalized to account for many classes.
❑ In logistic regression we assumed that the labels were binary ($y^{(i)} \in \{0, 1\}$), but softmax regression allows us to handle $y^{(i)} \in \{1, \ldots, K\}$, where K is the number of classes; the label y can take on K different values, rather than only two.
❑ Given a test input x, we want to estimate the probability $P(y = k \mid x)$ for each value of $k = 1, \ldots, K$.
❑ Softmax regression outputs a K-dimensional vector (whose elements sum to 1), giving us our K estimated probabilities.
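The formula from the slide is not reproduced in this text. A commonly used form, assumed here, is P(y = k | x) = exp(s_k) / Σ_j exp(s_j), where s_k is the linear score of class k. The sketch below implements that assumed form with the usual max-subtraction for numerical stability; the score values are hypothetical.

```python
import math

def softmax(scores):
    """Turn K real-valued scores into K probabilities that sum to 1."""
    m = max(scores)                      # subtract the max to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]                 # hypothetical scores for K = 3 classes
probs = softmax(scores)
print([round(p, 4) for p in probs], "sum =", round(sum(probs), 6))

# With K = 2 and the second score fixed at 0, softmax reduces to the sigmoid
z = 1.5
print(softmax([z, 0.0])[0], 1.0 / (1.0 + math.exp(-z)))
```

The two-class check shows why softmax regression is described as a generalization of logistic regression.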



Cardiac disease modeling with logistic
regression

❑ In this first exercise, we will work on predicting the probability of having coronary heart disease, based on the age of the population.
❑ It's a classic problem, which will be a good start for understanding this kind of regression analysis.
❑ We will use a very simple and often studied dataset, which was published in Applied Logistic Regression, by David W. Hosmer, Jr., Stanley Lemeshow, and Rodney X. Sturdivant.
❑ We list the age in years (AGE) and the presence or absence of
evidence of significant coronary heart disease (CHD) for 100
subjects in a hypothetical study of risk factors for heart disease.
❑ The outcome variable is CHD, which is coded with a value of 0 to
indicate that CHD is absent, or 1 to indicate that it is present in the
individual.



Applied Logistic Regression, Hosmer et al.
Cardiac disease modeling with logistic
regression

import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn import linear_model
import seaborn as sns
# Notebook magic: render plots inline (Jupyter/IPython only)
%matplotlib inline
import matplotlib.pyplot as plt

sns.set(style='whitegrid', context='notebook')
# Load the data; the CSV is expected to have 'age' and 'chd' columns
df = pd.read_csv("data/CHD.csv", header=0)
plt.figure()  # Create a new figure
plt.axis([0, 70, -0.2, 1.2])
plt.title('Original data')
plt.scatter(df['age'], df['chd'])  # Scatter plot of the original data points



Cardiac disease modeling with logistic
regression

❑ We create a logistic regression model, using the LogisticRegression object from sklearn, and then call the fit function, which will create a sigmoid optimized to minimize the prediction error for our training data.
# A large C means weak regularization
logistic = linear_model.LogisticRegression(C=1e5)
# X must be 2-D (n_samples, n_features); y must be 1-D (n_samples,)
logistic.fit(df['age'].values.reshape(-1, 1), df['chd'].values)
❑ To represent the results, we generate a linear space from 10 to 90 years, with 100 points.
❑ For each sample of the domain, we plot the probability for 1 and the probability for 0 (simply the complement of the previous one, 1 − p).
❑ Then we plot the predictions and the original data points, so we can match everything in a single graphic.



Cardiac disease modeling with logistic
regression

x_plot = np.linspace(10, 90, 100)
oneprob = []
zeroprob = []
predict = []
plt.figure(figsize=(10, 10))
for x in x_plot:
    sample = np.array([[x]])  # shape (1, 1): one sample, one feature
    # Returns the probability of the sample x for each class in the model
    proba = logistic.predict_proba(sample)[0]
    zeroprob.append(proba[0])
    oneprob.append(proba[1])
    # Returns predicted class label of the sample x.
    predict.append(logistic.predict(sample)[0])
plt.plot(x_plot, oneprob)
plt.plot(x_plot, zeroprob)
plt.plot(x_plot, predict)
plt.scatter(df['age'], df['chd'])


Support Vector Machines (SVM)

❑ The dataset below is linearly separable.


❑ There exist many different linear classifiers.
❑ Which linear classifier is the best?



Support Vector Machines
❑ The distance to the nearest point on either side of the line is called the margin.
❑ Support vectors are the data points that are closest to the hyperplane.
❑ If we want to learn a classifier that generalizes best, we need one that achieves the maximum margin.
❑ SVM tries to maximize the margin.
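To see what "margin" and "support vectors" mean numerically, the sketch below compares two candidate separating lines on a small hypothetical 2D dataset. The points, the two lines, and the helper names are all assumptions made for illustration; a real SVM would search for the best line rather than compare two fixed ones.

```python
import math

def distance(w, b, x):
    """Unsigned distance from point x to the hyperplane w.x - b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot - b) / math.sqrt(sum(wi * wi for wi in w))

def margin(w, b, points):
    """Distance from the hyperplane to its nearest data point."""
    return min(distance(w, b, x) for x in points)

# Hypothetical points: the first two belong to one class, the rest to the other
points = [(1.0, 1.0), (2.0, 1.0), (3.0, 2.0), (3.0, 3.0), (4.0, 4.0)]

line_a = ((1.0, 1.0), 4.0)   # x1 + x2 = 4
line_b = ((1.0, 0.0), 2.5)   # x1 = 2.5

margin_a = margin(*line_a, points)
margin_b = margin(*line_b, points)
print(f"margin of line a: {margin_a:.4f}")
print(f"margin of line b: {margin_b:.4f}")

# Support vectors of line a: the points that sit exactly at the margin
w, b = line_a
support = [x for x in points if abs(distance(w, b, x) - margin_a) < 1e-9]
print("support vectors of line a:", support)
```

Both lines separate the two groups, but line a keeps a larger distance to its nearest points, so it is the better candidate by the maximum-margin criterion.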
Support Vector Machines

❑ We want to find the optimal hyperplane (a line, in our 2D example).
❑ This hyperplane needs to:
1. separate the data cleanly, with pluses on one side
of the line and minuses on the other side
2. maximize the margin.
❑ This is an optimization problem.
❑ The solution has to respect constraint (1) while
maximizing the margin as is required in (2).



Support Vector Machines - Formulation

❑ We are given labeled data points $(x_i, y_i)$, $i = 1, \ldots, n$.
❑ We need to learn a hyperplane $w'x - b = 0$ such that:
1. all the points in the class with labels $y_i = +1$ lie above the margin, that is $w'x_i - b \ge 1$
2. all the points in the class with labels $y_i = -1$ lie below the margin, that is $w'x_i - b \le -1$
3. the margin $\gamma = \frac{2}{\lVert w \rVert}$ is maximized
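The three conditions can be checked for a candidate hyperplane. In the sketch below the dataset and the candidate values of w and b are hypothetical, and the helper names are chosen for illustration; the code only verifies the constraints and reports the margin, it does not solve the optimization.

```python
import math

def decision_value(w, b, x):
    """w'x - b for a single point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b

def satisfies_constraints(w, b, data):
    # y * (w'x - b) >= 1 covers both cases:
    # w'x - b >= 1 when y = +1, and w'x - b <= -1 when y = -1
    return all(y * decision_value(w, b, x) >= 1 for x, y in data)

def margin_width(w):
    """gamma = 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical labeled points
data = [((1.0, 1.0), -1), ((2.0, 1.0), -1),
        ((3.0, 2.0), +1), ((3.0, 3.0), +1), ((4.0, 4.0), +1)]

for w, b in [((1.0, 1.0), 4.0), ((0.5, 0.5), 2.0)]:
    print(w, b, "feasible:", satisfies_constraints(w, b, data),
          "gamma:", round(margin_width(w), 4))
```

The second candidate describes the same line with a smaller w, which would give a larger gamma, but it violates the constraints; this is why the constraints are needed to make the maximization meaningful.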



Support Vector Machines - Optimization
❑ Maximizing $\gamma = \frac{2}{\lVert w \rVert}$ is equivalent to minimizing $\frac{1}{2} \lVert w \rVert^2$ with the constraints that:
$w'x_i - b \ge 1$ when $y_i = +1$
$w'x_i - b \le -1$ when $y_i = -1$
❑ This is a convex, quadratic minimization problem and has a global minimum.
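The equivalence can be written out in one chain. This is a sketch of the standard argument: $2 / \lVert w \rVert$ is largest when $\lVert w \rVert$ is smallest, squaring preserves the ordering of non-negative values, and the factor $\tfrac{1}{2}$ only simplifies the derivative. Because $y_i \in \{+1, -1\}$, the two constraints also combine into a single inequality.

```latex
\begin{aligned}
\max_{w,\,b}\ \frac{2}{\lVert w \rVert}
\ &\Longleftrightarrow\ \min_{w,\,b}\ \lVert w \rVert
\ \Longleftrightarrow\ \min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^2 \\
\text{subject to}\quad & y_i\,(w'x_i - b) \ge 1, \qquad i = 1, \ldots, n
\end{aligned}
```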



Inseparable datasets

❑ Linearly inseparable datasets: we soften the margin by accepting misclassified examples.
❑ Nonlinear datasets: we create a classifier by increasing the number of dimensions until we can draw a hyperplane that separates the classes.
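Both ideas can be illustrated on toy data. The sketch below makes several assumptions that are not in the slides: the datasets are hypothetical, the lifting map x → (x, x²) is one possible choice of extra dimension, and the soft margin is expressed with the commonly used hinge-loss objective ½‖w‖² + C · Σ max(0, 1 − yᵢ(w'xᵢ − b)), where C is a penalty weight.

```python
# Part 1: adding a dimension can make inseparable data separable.
data_1d = [(-3.0, +1), (-2.0, +1), (-1.0, -1), (0.0, -1),
           (1.0, -1), (2.0, +1), (3.0, +1)]

def separable_by_threshold(data):
    """True if a single threshold on x splits the two classes."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    for t in candidates:
        for sign in (+1, -1):
            if all((sign if x > t else -sign) == y for x, y in data):
                return True
    return False

def lift(x):
    """Map a 1-D point to 2-D by appending its square."""
    return (x, x * x)

w_lift, b_lift = (0.0, 1.0), 2.5   # hyperplane x^2 = 2.5 in the lifted space
lifted_ok = all(
    y * (w_lift[0] * lift(x)[0] + w_lift[1] * lift(x)[1] - b_lift) > 0
    for x, y in data_1d
)
print("separable in 1-D:", separable_by_threshold(data_1d))
print("separable after lifting:", lifted_ok)

# Part 2: soft margin. Violations are allowed but penalized.
def soft_margin_objective(w, b, data, C):
    slack = sum(
        max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) - b))
        for x, y in data
    )
    return 0.5 * sum(wi * wi for wi in w) + C * slack

data_2d = [((1.0, 1.0), -1), ((2.0, 1.0), -1), ((3.0, 2.0), +1),
           ((3.0, 3.0), +1), ((4.0, 4.0), +1),
           ((3.0, 0.5), +1)]        # the last point is on the wrong side
objective = soft_margin_objective((1.0, 1.0), 4.0, data_2d, C=1.0)
print("soft-margin objective:", objective)
```

In the second part only the misplaced point contributes slack, so the objective is the margin term plus C times that single violation; a larger C would punish the violation more heavily.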



Machine Learning for Humans, 2017
SVM Examples

❑ SVM is well suited for classification of complex small- or medium-sized datasets.
❑ First example (SVMExample) uses cancer dataset from
sklearn.
❑ Second example (plot_digits_classification) uses digits
dataset from sklearn.



plot_digits_classification example

print(__doc__)

# Author: Gael Varoquaux <gael dot varoquaux at normalesup dot org>
# License: BSD 3 clause

# Standard scientific Python imports
import matplotlib.pyplot as plt

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split
plot_digits_classification example

digits = datasets.load_digits()
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
# Show the first four digit images with their labels
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title('Training: %i' % label)



plot_digits_classification example

# flatten the images
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))
# Create a classifier: a support vector classifier
clf = svm.SVC(gamma=0.001)
# Split data into 50% train and 50% test subsets
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False)
# Learn the digits on the train subset
clf.fit(X_train, y_train)
# Predict the value of the digit on the test subset
predicted = clf.predict(X_test)



plot_digits_classification example

_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, X_test, predicted):
    ax.set_axis_off()
    image = image.reshape(8, 8)
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title(f'Prediction: {prediction}')



plot_digits_classification example

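print(f"Classification report for classifier {clf}:\n"
      f"{metrics.classification_report(y_test, predicted)}\n")
# metrics.plot_confusion_matrix was removed in scikit-learn 1.2;
# ConfusionMatrixDisplay.from_estimator (scikit-learn 1.0+) replaces it
disp = metrics.ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test)
disp.figure_.suptitle("Confusion Matrix")
print(f"Confusion matrix:\n{disp.confusion_matrix}")
plt.show()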



References

❑ Textbook
❑ Andriy Burkov, The Hundred-Page Machine Learning Book, 2019
❑ https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
❑ Vishal Maini, Samer Sabri, Machine Learning for Humans, 2017
❑ http://ufldl.stanford.edu/tutorial/supervised/LinearRegression/
❑ http://ufldl.stanford.edu/tutorial/supervised/LogisticRegression/
❑ https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html
❑ https://www.learnpython.org/
