Logistic Regression

Intro to Machine Learning
by Anam Riaz
Introduction to Logistic Regression
• What is Logistic Regression?
– Logistic regression is a supervised learning algorithm used for classification problems.
– It predicts the probability of an outcome belonging to a particular class (e.g., Yes/No, Pass/Fail, 0/1).
• It is most commonly used for binary classification but can be extended to multiclass classification.
Why Not Use Linear Regression?
• Linear regression predicts continuous values (e.g., predicting house prices).
• If we apply linear regression to classification, it can produce values greater than 1 or less than 0, which is meaningless for probability estimation (see the sketch below).
• Logistic regression solves this problem by using the sigmoid function to keep predictions between 0 and 1.
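
To make this concrete, here is a minimal sketch (plain NumPy, with an illustrative toy dataset that is not from the slides) showing a least-squares line fit to 0/1 labels producing "probabilities" outside [0, 1]:

import numpy as np

# Toy binary dataset: feature x, labels 0/1 (illustrative values)
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)

# Fit a straight line y = theta1 * x + theta0 by least squares
theta1, theta0 = np.polyfit(x, y, deg=1)

# Linear "probabilities" escape the [0, 1] range at the extremes
print(theta0 + theta1 * 0.0)   # about -0.4, below 0
print(theta0 + theta1 * 10.0)  # about 2.2, above 1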
Difference
• Linear Regression Equation
  h(x) = θ0 + θ1x
– Produces continuous outputs.
– Not suitable for classification because it can predict values beyond 0 and 1.
• Logistic Regression Equation
  h(x) = 1 / (1 + e^(−(θ0 + θ1x)))
– Uses the sigmoid function to transform outputs to the range between 0 and 1.
• Thresholding: If h(x) > 0.5, classify as 1; otherwise, classify as 0.
The Sigmoid Function
• The sigmoid function is given by:
  g(z) = 1 / (1 + e^(−z))
• Why Sigmoid?
– The sigmoid function converts any real number into a probability between 0 and 1.
– It helps in classification by setting a decision boundary at 0.5.
Graph of the Sigmoid Function
• When z is large and positive, g(z) approaches 1.
• When z is large and negative, g(z) approaches 0.
• When z = 0, g(z) = 0.5.
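
A minimal NumPy sketch of the sigmoid (illustrative code, not from the slides) that checks the three behaviors above:

import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) function: g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(10))   # ~0.99995, approaches 1 for large positive z
print(sigmoid(-10))  # ~0.00005, approaches 0 for large negative z
print(sigmoid(0))    # exactly 0.5, the decision boundary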
Hypothesis Function in Logistic Regression
• Hypothesis function:
  h(x) = g(θ0 + θ1x) = 1 / (1 + e^(−(θ0 + θ1x)))
• Represents the probability that the output is 1 given input x: h(x) = P(y = 1 | x).
• Example: If h(x) = 0.8, there is an 80% chance that y = 1.
Cost Function in Logistic Regression
• Why Not Use Mean Squared Error (MSE)?
– In linear regression, we minimize the Mean Squared Error (MSE):
  J(θ) = (1/2m) Σ (h(x^(i)) − y^(i))²
– However, plugging the sigmoid into MSE yields a non-convex cost function in θ, so gradient descent can get stuck in local minima.
Log Loss (Logistic Regression Cost Function)
• The cost function for logistic regression is called log loss or cross-entropy loss:
  J(θ) = −(1/m) Σ [ y^(i) log(h(x^(i))) + (1 − y^(i)) log(1 − h(x^(i))) ]
• This function ensures that:
– If the predicted probability is close to the actual class, the loss is low.
– If the predicted probability is far from the actual class, the loss is high.
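
A short sketch of the log-loss computation (NumPy, function name assumed); the clipping guards against log(0):

import numpy as np

def log_loss(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy: -(1/m) * sum[y*log(h) + (1-y)*log(1-h)]."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Confident and correct -> low loss; confident and wrong -> high loss
print(log_loss(np.array([1.0]), np.array([0.9])))  # ~0.105
print(log_loss(np.array([1.0]), np.array([0.1])))  # ~2.303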
Optimization Using Gradient Descent
• Logistic regression uses gradient descent to minimize the cost function:
  θj := θj − α · ∂J(θ)/∂θj
• α is the learning rate and controls the step size.
• J(θ) is the cost function being minimized.
• With a suitable learning rate, gradient descent converges to the optimal values (a sketch follows below).
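
A minimal batch gradient-descent sketch (NumPy, all names assumed). For log loss, the gradient works out to ∂J/∂θj = (1/m) Σ (h(x^(i)) − y^(i)) · xj^(i):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, alpha):
    """One batch gradient-descent update for logistic regression.
    X has a leading column of ones so theta[0] is the intercept."""
    m = len(y)
    h = sigmoid(X @ theta)       # predicted probabilities
    grad = (X.T @ (h - y)) / m   # gradient of the log loss
    return theta - alpha * grad  # update: theta := theta - alpha * grad

# Example call on a tiny design matrix with a bias column
X = np.column_stack([np.ones(3), np.array([1.0, 2.0, 3.0])])
y = np.array([0.0, 0.0, 1.0])
print(gradient_step(np.zeros(2), X, y, alpha=0.1))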
Convex vs. Non-Convex Functions
• Linear regression has a convex cost function, meaning it has a single global minimum.
• With log loss, the logistic regression cost function is also convex, making optimization easier.
• Gradient descent therefore converges to the optimal parameter values.
Decision Boundary
• The decision boundary separates the different classes.
• The boundary is formed where h(x) = 0.5, which for the sigmoid means the input z = 0.
• Example: For a model h(x) = g(θ0 + θ1x1 + θ2x2), the decision boundary occurs when:
  θ0 + θ1x1 + θ2x2 = 0
• The boundary can be linear or non-linear depending on the dataset (see the sketch below).
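
A tiny sketch (parameter values assumed for illustration) showing that with one feature the boundary sits at x = −θ0/θ1, the point where h(x) = 0.5:

import numpy as np

theta0, theta1 = -3.0, 1.0     # assumed example parameters
boundary_x = -theta0 / theta1  # solve theta0 + theta1 * x = 0
print(boundary_x)              # 3.0: h(x) < 0.5 below, > 0.5 above

h = lambda x: 1.0 / (1.0 + np.exp(-(theta0 + theta1 * x)))
print(h(boundary_x))           # 0.5 exactly at the boundary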
Regularization in Logistic Regression
• Overfitting occurs when the model is too complex and learns noise in the data.
• Regularization helps prevent overfitting by adding a penalty term to the cost function (L2 penalty shown):
  J(θ) = −(1/m) Σ [ y^(i) log(h(x^(i))) + (1 − y^(i)) log(1 − h(x^(i))) ] + (λ/2m) Σ θj²
• The λ (lambda) parameter controls the strength of regularization:
– Large λ → stronger penalty; reduces overfitting but can cause underfitting.
– Small λ → weaker penalty; may leave overfitting unchecked.
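
A sketch of the L2-regularized cost (the L2 form is an assumption; the slides do not specify the penalty). Note the intercept θ0 is conventionally not penalized:

import numpy as np

def regularized_cost(theta, X, y, lam):
    """Log loss plus an L2 penalty (lambda/2m) * sum(theta_j^2), j >= 1."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    h = np.clip(h, 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # skip the intercept
    return loss + penalty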
Applications of Logistic Regression
• Medical Diagnosis (e.g., cancer detection).
• Spam Detection (classifying emails as spam or not).
• Credit Scoring (approving or rejecting loans).
• Churn Prediction (predicting customer retention).
• Fraud Detection (detecting fraudulent transactions).
Solved Example: Predicting Student Performance
• Dataset:

  Study Hours (x) | Pass/Fail (y)
  --------------- | -------------
  1               | 0
  2               | 0
  3               | 0
  4               | 1
  5               | 1

• Goal: Build a logistic regression model to predict whether a student will pass.
Step-by-Step Solution
1. Define the logistic regression model: h(x) = 1 / (1 + e^(−(θ0 + θ1x))).
2. Initialize parameters: θ0 = 0, θ1 = 0.
3. Compute the hypothesis function using the sigmoid.
4. Compute the cost function using log loss.
5. Apply gradient descent to update the parameters.
6. Make a prediction for a student who studied 3.5 hours.
Step 1: Define the Logistic Regression Model
The logistic regression equation is:
  h(x) = 1 / (1 + e^(−(θ0 + θ1x)))
• where:
– h(x) is the probability that the student will pass (y = 1).
– θ0 and θ1 are the parameters we need to estimate.
Step 2: Initialize Parameters
Step 3: Compute the Hypothesis Function
• Initial parameters: θ0 = 0, θ1 = 0.
• Using the sigmoid function with θ0 = 0 and θ1 = 0, the input is z = 0 for every x, so h(x) = g(0) = 0.5.
• For all inputs, the initial probability prediction is 0.5, which is uninformative. We need to optimize θ0 and θ1.
Step 4: Compute the Cost Function
• Let's compute the cost for our dataset using trial values θ0 = −1 and θ1 = 0.5, so h(x) = g(−1 + 0.5x):

  x = 1: h = g(−0.5) ≈ 0.378, y = 0 → loss ≈ 0.474
  x = 2: h = g(0)    = 0.500, y = 0 → loss ≈ 0.693
  x = 3: h = g(0.5)  ≈ 0.622, y = 0 → loss ≈ 0.974
  x = 4: h = g(1)    ≈ 0.731, y = 1 → loss ≈ 0.313
  x = 5: h = g(1.5)  ≈ 0.818, y = 1 → loss ≈ 0.201

• Average: J(θ) ≈ (0.474 + 0.693 + 0.974 + 0.313 + 0.201) / 5 ≈ 0.53
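
A quick sketch that reproduces this cost numerically (the ~0.53 figure is computed here, not taken from the slides):

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([0, 0, 0, 1, 1], dtype=float)

theta0, theta1 = -1.0, 0.5
h = 1.0 / (1.0 + np.exp(-(theta0 + theta1 * x)))

# Log loss averaged over the five examples
J = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
print(round(J, 3))  # ~0.531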
Step 5: Gradient Descent Update
• To minimize the cost function, we update the parameters using the gradients:

  For θ0: ∂J/∂θ0 = (1/m) Σ (h(x^(i)) − y^(i))
  For θ1: ∂J/∂θ1 = (1/m) Σ (h(x^(i)) − y^(i)) · x^(i)

• With a learning rate α = 0.1, we update:

  θ0 := θ0 − 0.1 × ∂J/∂θ0
  θ1 := θ1 − 0.1 × ∂J/∂θ1
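
A sketch of one such update starting from θ0 = θ1 = 0 (the numeric results are computed here, not shown on the slides):

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([0, 0, 0, 1, 1], dtype=float)
theta0, theta1, alpha = 0.0, 0.0, 0.1

h = 1.0 / (1.0 + np.exp(-(theta0 + theta1 * x)))  # all 0.5 initially
grad0 = np.mean(h - y)        # =  0.1
grad1 = np.mean((h - y) * x)  # = -0.3

theta0 -= alpha * grad0       # -> -0.01
theta1 -= alpha * grad1       # ->  0.03
print(theta0, theta1)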
Step 6: Making Predictions
• After multiple iterations, assume we arrive at:
– θ0 = −3, θ1 = 1
• Now, we predict for a student who studied 3.5 hours:
  h(3.5) = 1 / (1 + e^(−(−3 + 1 × 3.5))) = 1 / (1 + e^(−0.5)) ≈ 0.62
• Since h(3.5) > 0.5, we predict "Pass" (1).
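
Tying the steps together, a minimal end-to-end sketch (all code assumed, not from the slides). Note the trained parameters will not be exactly θ0 = −3, θ1 = 1, which the slides assume for illustration; 3.5 hours sits exactly between the last fail (3) and the first pass (4), so the learned probability hovers near 0.5:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([0, 0, 0, 1, 1], dtype=float)

# Train with batch gradient descent (Steps 2-5)
theta0, theta1, alpha = 0.0, 0.0, 0.1
for _ in range(10_000):  # fixed iteration budget
    h = 1.0 / (1.0 + np.exp(-(theta0 + theta1 * x)))
    theta0 -= alpha * np.mean(h - y)
    theta1 -= alpha * np.mean((h - y) * x)

# Step 6: predict for 3.5 study hours; with the slides' assumed
# parameters (theta0 = -3, theta1 = 1) the probability is ~0.62 -> "Pass"
p = 1.0 / (1.0 + np.exp(-(theta0 + theta1 * 3.5)))
print(round(p, 3), "Pass" if p > 0.5 else "Fail")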


Conclusion
• Logistic regression is simple yet effective for classification problems.
• It is interpretable and computationally efficient.
• However, it assumes a linear decision boundary in the given features, which may not be suitable for complex datasets.
