Logistic Regression


CLASSIFICATION

• So far, the methods we have looked at are aimed at predicting the value of a continuous variable.
• When the dependent variable is discrete/categorical, it is no longer a regression problem but a classification problem.
• The goal of classification is to place each observation into a category based on a set of predictor variables.
• The following are some common examples of where classification would be used:

– Deciding where to set the cut-off for some diagnostic test (pregnancy tests, prostate or breast cancer screening tests, etc.)
– Determining whether a cancer has gone into remission based on treatment and various other indicators
– Categorizing photos into distinct categories using a multinomial classification model.
Why can’t we use linear regression for classification?

• Let’s consider a simple binary classification problem where y takes only two values.
• Say we want to classify whether a tumor is malignant or benign.

• Let’s see what happens when we try to fit a regression line.
• Threshold the classifier output at 0.5:
If the output ≥ 0.5, predict y = 1
If the output < 0.5, predict y = 0

Does it work?
What happens to the regression line when we get more samples?
Logistic Regression
What??

• Sometimes in machine learning, the dependent variable is non-continuous, i.e. it falls within a finite set of categories.
• We may need a model that gives us a probability as output, e.g. a coin toss.
• A logistic regression model is an efficient mechanism for predicting the likelihood of an event or choice being made.
How does it work??

• The probability output of a logistic regression model can be used in two ways:
– ‘As is’
– Converted to a binary category
• For example, let’s say we want to create a logistic regression model to predict the probability of getting heads in a coin toss.
• Let’s call that probability p(heads). If the model predicts that p(heads) = 0.4, then over n = 20 trials we would expect heads about p(heads) × 20 = 8 times.
• In many cases, you’ll map the logistic regression output onto a binary classification problem, in which the goal is to correctly predict one of two possible labels, e.g. spam or not spam.
• Let’s say the model returns a probability of 0.9995 that an email is spam; this means it is highly likely that the email is spam.
• Likewise, a probability of 0.003 would suggest that the email is very likely not spam.
• However, how do we determine the category for an email with, say, a 0.45 probability of being spam?
• Ans: We have to set a cut-off value for the probability, a process known as thresholding.
• A logistic regression model ensures that its output always falls between 0 and 1 by making use of the sigmoid function:
y′ = 1 / (1 + e^(−z)),   where z = b + w1·x1 + w2·x2 + … + wN·xN

• y′ is the output of the logistic regression model for a particular example.
• The w values are the model’s learned weights and b is the bias.
• The x values are the feature values for that particular example.
• The sigmoid function yields the following plot

• z is also called the log odds, because it is the log of the probability of the 1 label (e.g. heads) divided by the probability of the 0 label (e.g. tails): z = log(y′ / (1 − y′)).
• Let’s say we have a logistic regression model that was trained using three features and that learned the following bias and weights:
b = 1
w1 = 2
w2 = −1
w3 = 5
Suppose the model is given the following feature values:
x1 = 0
x2 = 10
x3 = 2
• Then z = b + w1·x1 + w2·x2 + w3·x3 = 1 + (2)(0) + (−1)(10) + (5)(2) = 1
• And y′ = 1 / (1 + e^(−1)) ≈ 0.731
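
A minimal sketch of this calculation in Python/NumPy (the variable names are only illustrative):

import numpy as np

# Bias, weights and feature values from the worked example above
b = 1.0
w = np.array([2.0, -1.0, 5.0])
x = np.array([0.0, 10.0, 2.0])

# Linear part of the model (the log odds)
z = b + np.dot(w, x)               # 1 + 0 - 10 + 10 = 1

# Sigmoid squashes z into the range (0, 1)
y_prime = 1.0 / (1.0 + np.exp(-z))
print(z, round(y_prime, 3))        # 1.0 0.731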
Cost function

• Linear regression used MSE as the loss function.

• What happens when we try to use the MSE loss function for classification problems?
• Well, due to the non-linear nature of the sigmoid function, plotting the MSE loss against the weights for logistic regression gives a non-convex surface, and it would be hard for gradient descent to converge to the global minimum.
• For logistic regression, we instead use the following cost function for binary classification:
Cost(y′, y) = −log(y′)       if y = 1
Cost(y′, y) = −log(1 − y′)   if y = 0
• Combining the two functions we get:
Cost(y′, y) = −[y·log(y′) + (1 − y)·log(1 − y′)]
• So for n examples:
J(w, b) = −(1/n) Σᵢ [yᵢ·log(y′ᵢ) + (1 − yᵢ)·log(1 − y′ᵢ)]
• So how can we obtain the optimal parameters that give us the minimum value for our cost function?
• The gradient descent algorithm works the same way for logistic regression as it does for linear regression.
• Hence, we have to take the cost function and differentiate it w.r.t. each of the weights (H/W).
Logistic Regression using sklearn
• There are mainly two ways of implementing classification using sklearn:
– the SGDClassifier class
– the LogisticRegression class
• These two models aim to achieve the same goal but use different optimization techniques.
• For example, using SGDClassifier(loss='log_loss') will result in a model equivalent to LogisticRegression, but fitted via stochastic gradient descent.
• LogisticRegression uses a full-batch solver by default (lbfgs), processing the whole dataset at every step.
• So for larger datasets, it would be better to use SGDClassifier for two main reasons:
– too many full-batch steps would be required
– each full-batch step is computationally expensive
A short sketch comparing the two is shown below.
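
A minimal sketch (with illustrative synthetic data) of the two equivalent ways of fitting a logistic regression model in sklearn:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Full-batch solver (lbfgs by default)
log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The same model, fitted with stochastic gradient descent
sgd_clf = SGDClassifier(loss="log_loss", max_iter=1000).fit(X_train, y_train)

print(log_reg.score(X_test, y_test), sgd_clf.score(X_test, y_test))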

Class presentation: Describe the differences between gradient descent and stochastic gradient descent.
Implementation
We are going to train a classifier using the LogisticRegression class on the Titanic dataset. The main steps, followed by a rough sketch, are:

1. Import the required libraries
2. Load the data
3. Feature engineering (one-hot encoding)
4. Train the model
5. Make predictions
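
A rough sketch of these steps. The way the data is loaded (seaborn's bundled copy of the Titanic dataset) and the columns used are assumptions, so adapt them to the actual file used in class:

import pandas as pd
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 2. Load the data (assumed source: seaborn's built-in titanic dataset)
df = sns.load_dataset("titanic")[["survived", "pclass", "sex", "age", "fare"]].dropna()

# 3. Feature engineering: one-hot encode the categorical 'sex' column
X = pd.get_dummies(df.drop(columns="survived"), drop_first=True)
y = df["survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 4. Train the model
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 5. Make predictions and check test accuracy
print(model.score(X_test, y_test))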
pd.get_dummies()
• pandas.get_dummies() is used for data manipulation: it converts categorical data into dummy or indicator variables.
• Machine learning models cannot interpret raw categorical data, so it needs to be translated into numerical data.
• This process is referred to as dummy variable encoding.
• In our previous example, a random sample of five rows looks like this:

• Then, after dummy variable encoding…

• Looking at the dummy-encoded dataframe, we can see that there is a lot of redundancy.
• Our data can be represented using fewer columns by using the drop_first parameter, as in the sketch below.
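
A small illustrative example of dummy encoding (the dataframe below is made up):

import pandas as pd

df = pd.DataFrame({"sex": ["male", "female", "female"],
                   "embarked": ["S", "C", "S"]})

# Full dummy encoding: one indicator column per category value
print(pd.get_dummies(df))

# drop_first=True drops one redundant column per original feature
print(pd.get_dummies(df, drop_first=True))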
Regularization
• A common problem that one can encounter when training models is overfitting.
• Overfitting is a modeling error that occurs when our model fits exactly or too closely to the training data.
• As a result, an overfit model performs well on the training data but not as well on unseen examples.

[It’s like memorising past exam questions and failing the exam loool]

• Underfitting occurs when our model fails to capture the relationship between the input and output variables, resulting in a high error rate on both the training set and unseen examples.
• Regularization does not improve the performance of our model on the training dataset; however, it helps the model fit unseen examples better.
• The following three strategies are used to reduce model complexity:
– Early stopping, i.e. limiting the number of training iterations
– L1 regularization
– L2 regularization
• A linear regression model that implements the L1 norm for regularisation is called lasso regression, and one that implements the (squared) L2 norm for regularisation is called ridge regression. To implement these two, note that the linear regression model itself stays the same:

ŷ = b + w1·x1 + w2·x2 + … + wN·xN

• It is the calculation of the loss function that includes the regularisation term, i.e.

• Loss = Σᵢ (yᵢ − ŷᵢ)² + λ Σⱼ |wⱼ|   with L1 regularization.

• Loss = Σᵢ (yᵢ − ŷᵢ)² + λ Σⱼ wⱼ²   with L2 regularization.

• Therefore, apart from minimizing the error between y and ŷ, the optimization algorithm now also has to keep the regularization term small in order to minimize the cost function.
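
A quick numeric illustration (with made-up numbers) of how the L1 and L2 penalties are added to the squared-error loss:

import numpy as np

y = np.array([3.0, 5.0, 7.0])        # true values
y_hat = np.array([2.5, 5.5, 6.0])    # model predictions
w = np.array([0.8, -0.3, 2.0])       # model weights
lam = 0.1                            # regularization strength (lambda)

sse = np.sum((y - y_hat) ** 2)
l1_loss = sse + lam * np.sum(np.abs(w))  # lasso-style loss
l2_loss = sse + lam * np.sum(w ** 2)     # ridge-style loss
print(sse, l1_loss, l2_loss)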

• Let’s look at how L1 and L2 regularization work using a simple linear regression model.
• To demonstrate the effect of L1 and L2 regularization, let’s fit our model using three different loss functions:
– L (with no regularization)
– L1
– L2
• With no regularization:
L = (y − ŷ)²
NB: we are assuming that our model will be overfitted using this loss function.
• With L1 regularization:
L1 = (y − ŷ)² + λ|w|
• With L2 regularization:
L2 = (y − ŷ)² + λw²
• If you recall, using gradient descent the weight update is:
w_new = w − η · ∂L/∂w
• Substituting L, L1 and L2 we get:
L:   w_new = w − η · ∂L/∂w                        … (0)
L1:  w_new = w − η · (∂L/∂w + λ)   for w > 0      … (1.1)
     w_new = w − η · (∂L/∂w − λ)   for w < 0      … (1.2)
L2:  w_new = w − η · (∂L/∂w + 2λw)                … (2)

Let η = 1 and λ = 1; then:

L:   w_new = w − ∂L/∂w
L1:  w_new = w − ∂L/∂w − 1  (if w > 0),  or  w − ∂L/∂w + 1  (if w < 0)
L2:  w_new = w − ∂L/∂w − 2w
• Assuming that equation (0) gives a value of w that leads to overfitting, equations (1.1), (1.2) and (2) will reduce the chances of overfitting by shifting w away from that value.
• L1 regularization helps with feature selection by eliminating the features that are less important, i.e. we are left with a smaller number of features that explain most of the variance.
• Variance is the amount by which the estimate of the target function would change if it were fitted on different portions of the training dataset.
• L2 regularization reduces the chances of overfitting by forcing the weights to be very close to zero (but not exactly zero).
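
A tiny sketch of why L1 can drive a weight exactly to zero while L2 only shrinks it towards zero. It repeatedly applies just the penalty part of the update rules above (the starting weight, learning rate and lambda are made-up values):

eta, lam = 0.1, 1.0
w_l1, w_l2 = 2.0, 2.0

for _ in range(30):
    # L1 penalty gradient is lam * sign(w): constant-size steps towards zero
    if w_l1 != 0:
        step = eta * lam * (1 if w_l1 > 0 else -1)
        w_l1 = 0.0 if abs(step) >= abs(w_l1) else w_l1 - step
    # L2 penalty gradient is 2 * lam * w: the steps shrink as w shrinks
    w_l2 = w_l2 - eta * 2 * lam * w_l2

print(w_l1, w_l2)   # L1 reaches exactly 0.0; L2 is small but non-zero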
• Now let’s look at how to implement L1 and L2 regularization using sklearn.

sklearn.linear_model.Ridge
• This is an extension of LinearRegression() that adds a penalty term equivalent to the square of the magnitude of the coefficients, i.e.
Loss function = OLS + alpha * summation (squared coefficient values)

• Our job is to select alpha. A low alpha value can lead to overfitting, whereas a high alpha value can lead to underfitting.
1. Import libraries
2. Preprocess and load the data
3. Train the model and make a prediction
4. View the calculated weights and MSE (see the sketch below)
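
A rough sketch of ridge regression in sklearn. The dataset (sklearn's built-in diabetes toy data) and the alpha value are assumptions, not necessarily what was used in the slides:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# 2. Preprocess and load the data
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 3. Train the model and make a prediction
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
y_pred = ridge.predict(X_test)

# 4. View the calculated weights and MSE
print(ridge.coef_)
print(mean_squared_error(y_test, y_pred))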

Change the value of alpha and see how it affects the weights.
Now try Lasso regression ;-)
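
A matching sketch with Lasso (same assumed dataset; alpha is again an illustrative value):

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lasso = Lasso(alpha=1.0).fit(X_train, y_train)
print(lasso.coef_)                  # with a large enough alpha, some weights become exactly 0
print(lasso.score(X_test, y_test))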
