SMDS Unit 5

Copyright © All Rights Reserved
UNIT-5

Logistic Regression

Syllabus:
The classification problem
Logistic Regression Setup
Interpreting the results
Comparing models
Classification using logistic regression
The Classification Problem:
• Classification is a supervised machine learning technique used to predict categorical outcomes.
• Put simply, classification sorts things into different groups.
• It involves identifying which category (class) an observation belongs to.
• Example: sorting emails as "Spam" or "Not Spam."
• It helps in decision-making, for example in predicting diseases or detecting fraud.
• It is used widely in AI, machine learning, and everyday applications.

Types of Classification:

• Binary Classification: only two groups (e.g., Pass/Fail, Yes/No).

• Multi-Class Classification: more than two groups (e.g., Cat/Dog/Rabbit).

• Multi-Label Classification: one item can belong to multiple groups at once (e.g., a movie can be both Action and Comedy).
Working:
A computer looks at past data in which the correct group (label) of each example is already known.
It learns patterns from that data and applies them to new data.
Example: A bank can predict whether a loan will be repaid or not.

Real-Life Examples:
• Face Recognition (e.g., unlocking your phone).
• Medical Diagnosis (e.g., checking if a person has a disease).
• Online Shopping (e.g., recommending products based on your interests).

Logistic Regression Setup:


A method used to predict whether something belongs to one group or another.
Example: Predicting whether a student will pass or fail based on study hours.
It is used when the output is binary, such as "Yes or No" or "Spam or Not Spam."
It helps in making decisions based on data.
Mathematical Formula (Simple Version):
Instead of predicting a number directly, it predicts a probability between 0 and 1.
It uses the Sigmoid Function to convert a linear score into a probability:
p = 1 / (1 + e^(−z)), where z = β₀ + β₁X
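The sigmoid step can be sketched in a few lines of Python. This is a minimal illustration rather than a library implementation: the function names and the coefficients b0 = -4 and b1 = 1 (for a single feature such as study hours) are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a probability in the interval (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x: float, b0: float, b1: float) -> float:
    """Probability of Class 1 for one feature x, using z = b0 + b1 * x."""
    return sigmoid(b0 + b1 * x)

# Hypothetical coefficients for a "study hours -> pass" model (not fitted to real data).
b0, b1 = -4.0, 1.0
print(sigmoid(0))                  # 0.5
print(predict_proba(6, b0, b1))    # about 0.88
```

With these made-up coefficients, a student who studies 4 hours sits exactly on the 0.5 boundary, and more hours push the probability toward 1.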

Steps to Set Up Logistic Regression:

Step 1:
Collect data (e.g., students' study hours and exam results).

Step 2:
Clean and prepare the data (fix errors, handle missing values).

Step 3:
Split the data into two parts: a Training Set and a Testing Set.

Step 4:
Train the model (let the computer learn from the training data).

Step 5:
Test the model (check how well it predicts on new, unseen data).

Step 6:
Evaluate performance using accuracy, precision, recall, etc.
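The six steps can be tied together in a small, dependency-free Python sketch. Everything here is an assumption for illustration: the study-hours data is made up, the helper names (`clean`, `split_rows`, `fit`, `accuracy`) are hypothetical, and training uses plain batch gradient descent on the log-loss rather than any particular library's solver.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def clean(rows):
    """Step 2: drop rows that contain a missing value."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def split_rows(rows, test_ratio=0.2, seed=0):
    """Step 3: shuffle a copy of the rows, then hold out a fraction for testing."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]      # (training set, testing set)

def fit(rows, lr=0.05, epochs=5000):
    """Step 4: batch gradient descent on the average log-loss."""
    b0 = b1 = 0.0
    n = len(rows)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in rows:
            err = sigmoid(b0 + b1 * x) - y   # gradient of the log-loss w.r.t. the score z
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def accuracy(rows, b0, b1, threshold=0.5):
    """Steps 5-6: predict on held-out rows and return the fraction predicted correctly."""
    correct = sum((1 if sigmoid(b0 + b1 * x) > threshold else 0) == y for x, y in rows)
    return correct / len(rows)

# Step 1: hypothetical (study hours, passed) records; None marks a missing value.
raw = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 0), (3.5, 1),
       (4.0, 0), (4.5, 1), (5.0, 0), (5.5, 1), (6.0, 1), (None, 1), (6.5, 1),
       (7.0, 1), (7.5, 1), (8.0, 1), (8.5, 1), (9.0, 1), (9.5, 1), (10.0, 1)]

train, test = split_rows(clean(raw))
b0, b1 = fit(train)
print(f"b0={b0:.2f}  b1={b1:.2f}  test accuracy={accuracy(test, b0, b1):.2f}")
```

In practice a library would usually handle Steps 3 to 6; writing them out by hand only makes each step visible. The held-out set here is tiny (4 rows), so the printed accuracy is a rough indication at best.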

Real-Life Uses of Logistic Regression:

• Predicting if a customer will buy a product or not.
• Diagnosing diseases based on symptoms.
• Detecting fraud in credit card transactions.

Interpreting the Results of Logistic Regression:

Logistic regression gives a probability (a value between 0 and 1).
Example:
If the probability is 0.85, the model estimates an 85% chance that the observation belongs to Class 1 (e.g., "Yes," "Pass," "Spam").

Decision Making Using Probability:

• If probability > 0.5, predict Class 1 (e.g., "Yes," "Pass").
• If probability ≤ 0.5, predict Class 0 (e.g., "No," "Fail").
• The threshold (0.5) can be adjusted based on requirements.
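A minimal sketch of this decision rule, assuming the model has already produced probabilities. The probabilities below are made-up values, and `classify` is a hypothetical helper name.

```python
def classify(probabilities, threshold=0.5):
    """Predict Class 1 when p > threshold, otherwise Class 0."""
    return [1 if p > threshold else 0 for p in probabilities]

probs = [0.92, 0.30, 0.50, 0.65]
print(classify(probs))                 # [1, 0, 0, 1]
print(classify(probs, threshold=0.7))  # [1, 0, 0, 0]
```

Raising the threshold makes the model more cautious about predicting Class 1 (0.65 now becomes Class 0); lowering it does the opposite.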
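Key Performance Metrics:
Accuracy:
Measures how many predictions were correct: (TP + TN) / (TP + TN + FP + FN), using the confusion-matrix counts defined below.
Precision:
Out of all predicted "Yes" cases, how many were actually "Yes"? Precision = TP / (TP + FP).
Recall:
Out of all actual "Yes" cases, how many were correctly predicted? Recall = TP / (TP + FN).
F1 Score:
The harmonic mean of precision and recall, which balances the two: F1 = 2 × Precision × Recall / (Precision + Recall).
Confusion Matrix:
A table showing correct and incorrect predictions as counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).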
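The metrics can be computed directly from the four confusion-matrix counts. This is a minimal sketch on hypothetical labels; returning 0.0 when a denominator is zero is an assumed convention (libraries handle this case in different ways).

```python
def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from the confusion-matrix counts."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical actual vs. predicted labels for 8 students (1 = Pass).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_matrix(y_true, y_pred))  # (3, 1, 1, 3)
print(metrics(y_true, y_pred))
```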

Example Interpretation:
Suppose a model predicts if a student will pass an exam:
Probability = 0.92 → Predict "Pass"

Probability = 0.30 → Predict "Fail"

If the model makes many incorrect predictions, adjustments are needed.
Why Interpretation Matters:
• Helps in understanding how well the model is performing.
• Identifies areas where the model can improve.
• Supports correct decision-making based on reliable predictions.
Comparing Models:

| Model | Used for | Mathematical concept | Advantages | Disadvantages | Real-life example |
|---|---|---|---|---|---|
| Logistic Regression | Binary classification (Yes/No) | Uses the sigmoid function to output a probability between 0 and 1 | Simple, fast, good for linearly separable data | Struggles with complex, non-linear data | Spam detection (classify emails as spam or not spam) |
| Linear Regression | Predicting continuous values | Fits a straight line, y = β₀ + β₁X | Simple, good for linear relationships | Cannot handle classification problems | House price prediction (predict price based on area, bedrooms, etc.) |
| Decision Trees | Classification and regression | Split data into branches based on features | Work with non-linear data, easy to visualize | Sensitive to small changes in the data | Loan approval (decide if a customer qualifies for a loan) |
| K-NN | Classification and regression | Compares new data with the nearest existing data points | No training time, good for small datasets | Slow for large datasets | Movie recommendation (find movies similar to ones you like) |
| Support Vector Machine (SVM) | Classification | Finds a hyperplane that best separates the classes | Works well with complex, high-dimensional data | Computationally expensive | Face detection (classify whether an image contains a face or not) |
Classification using Logistic Regression:
Classification using logistic regression is a statistical method used for binary (and, with extensions, multiclass) classification tasks.
Despite the name "regression," it is actually used for predicting categorical outcomes.
Logistic Regression is used to classify data into two or more categories.
Purpose:
Logistic regression estimates the probability that a data point
belongs to a particular class. Based on this probability, it then
classifies the data point.
Example Use Cases:
Spam (1) or not spam (0)
Customer will churn (1) or not (0)
Disease present (1) or not (0)

Extensions:
Multinomial logistic regression for more than two classes
Regularized logistic regression (L1 or L2) to avoid overfitting; L1 can also shrink some coefficients to exactly zero, which acts as feature selection
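Both extensions can be sketched briefly. `softmax` turns one linear score per class into class probabilities (the core of multinomial logistic regression), and `l2_penalized_log_loss` adds an L2 penalty to the binary log-loss. The scores, labels, weights, and penalty strength `lam` are hypothetical; real libraries scale the penalty differently (for example, scikit-learn's `C` is an inverse regularization strength).

```python
import math

def softmax(scores):
    """Multinomial extension: map one linear score per class to probabilities."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def l2_penalized_log_loss(y_true, probs, weights, lam=0.1):
    """Average binary log-loss plus lam * sum(w**2); the intercept is left out of weights.

    An L1 version would use lam * sum(abs(w)) instead.
    """
    eps = 1e-12                              # keeps log() away from log(0)
    n = len(y_true)
    log_loss = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for y, p in zip(y_true, probs)
    ) / n
    return log_loss + lam * sum(w * w for w in weights)

# Hypothetical scores for three classes (e.g., cat, dog, rabbit).
print([round(p, 3) for p in softmax([2.0, 1.0, 0.1])])

# Same predictions, larger weights -> larger penalized loss.
print(l2_penalized_log_loss([1, 0], [0.9, 0.1], weights=[0.5]))
print(l2_penalized_log_loss([1, 0], [0.9, 0.1], weights=[3.0]))
```

The penalty grows with the size of the weights, so minimizing the penalized loss prefers smaller coefficients, which is how regularization discourages overfitting.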
It predicts the probability of an event occurring (e.g., "Spam"
or "Not Spam").
If probability > 0.5, classify as Class 1, else classify as Class 0.
Steps in Classification Using Logistic Regression:

Step 1:
Collect Data
Example: A bank wants to classify if a customer will repay a
loan (Yes/No).
Data includes income, credit score, loan amount, etc.

Step 2:
Preprocess Data
Handle missing values and remove unnecessary features.
Convert categorical data (e.g., "Male/Female") into numerical
format.
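Step 2 can be illustrated with three small helpers: mean imputation for a missing numeric value, 0/1 encoding for a two-category column such as "Male/Female", and one-hot encoding for a column with more categories. Mean imputation is only one of several ways to handle missing values, which category maps to 1 is an arbitrary choice, and the income and employment-type values are hypothetical.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def encode_binary(values, positive):
    """Map a two-category column to 1/0, e.g. 'Female' -> 1, 'Male' -> 0."""
    return [1 if v == positive else 0 for v in values]

def one_hot(values):
    """Map a column with several categories to one 0/1 column per category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

print(fill_missing_with_mean([50000, None, 70000]))    # [50000, 60000.0, 70000]
print(encode_binary(["Male", "Female", "Female", "Male"], positive="Female"))  # [0, 1, 1, 0]

categories, encoded = one_hot(["Salaried", "Self-employed", "Student", "Salaried"])
print(categories)   # ['Salaried', 'Self-employed', 'Student']
print(encoded)      # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```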

Step 3:
Split Data
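Divide data into a Training Set (80%) and a Testing Set (20%).
The model learns patterns from the training data.

Step 4:
Train the Model
Use the Sigmoid function to predict probabilities.
Adjust the model parameters (coefficients) to minimize prediction error, usually measured by the log-loss, with an optimization method such as gradient descent.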

Step 5:
Make Predictions
Apply the trained model to new data.
If probability > 0.5, classify as "Yes";
otherwise, classify as "No."

Step 6:
Evaluate Performance
Check accuracy, precision, recall, and F1-score.
Use a Confusion Matrix to see correct vs. incorrect
predictions.
Applications of Logistic Regression in Classification:
Logistic regression is widely used for classification tasks, particularly when the target variable is binary (e.g., yes/no, spam/not spam, disease/no disease).

Here are some common and important applications of logistic regression in classification:

1. Medical Diagnosis
Application: Predicting whether a patient has a disease (e.g.,
cancer, diabetes) based on symptoms, lab results, or other
medical parameters.
Example: Predicting the presence of heart disease using features like age, cholesterol, blood pressure, etc.

2. Email Spam Detection
Application: Classifying emails as “spam” or “not spam” based on the email’s content and metadata.
Example: Logistic regression can use word frequencies, presence of links, sender information, etc., as input features.

3. Credit Scoring and Risk Assessment
Application: Assessing the likelihood of a customer defaulting
on a loan or credit card.
Example: Input features can include income, credit history,
loan amount, and past repayment behavior.

4. Marketing and Customer Segmentation
Application: Predicting whether a customer will respond to a marketing campaign (e.g., click an ad, buy a product).
Example: Logistic regression can use demographic data and
browsing behavior to predict conversion.

5. Fraud Detection
Application: Classifying financial transactions as fraudulent or
legitimate.
Example: Features could include transaction amount,
location, time, and user behavior patterns.

6. Churn Prediction
Application: Predicting whether a customer will stop using a
service or product.
Example: Telecom companies use logistic regression to identify customers likely to cancel their plans.

7. Image Recognition (Binary Classification)
Application: Classifying simple images into two categories
(e.g., cat vs. not-cat).
Example: Flattened pixel values serve as features in a logistic
regression model.

8. Text Classification
Application: Classifying short texts, such as tweets, into
categories like positive/negative sentiment.
Example: Logistic regression can handle bag-of-words or TF-IDF features for this purpose.
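A bag-of-words representation turns each text into a vector of word counts that logistic regression can use as features. This is a minimal sketch, assuming simple lowercase whitespace tokenization (real pipelines usually also strip punctuation, and TF-IDF would further reweight these counts); the example texts are made up.

```python
from collections import Counter

def bag_of_words(texts):
    """Return a sorted vocabulary and one word-count vector per text."""
    tokenized = [text.lower().split() for text in texts]   # naive whitespace tokenizer
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

vocab, X = bag_of_words(["Win a free prize", "Meeting at noon", "free free offer"])
print(vocab)  # ['a', 'at', 'free', 'meeting', 'noon', 'offer', 'prize', 'win']
print(X[2])   # [0, 0, 2, 0, 0, 1, 0, 0]
```

Each row of `X` can then serve as the feature vector for one text, with a label such as positive/negative or spam/not spam.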

Medical Diagnosis (detecting diseases like cancer or diabetes).
Spam Detection (classifying emails as Spam or Not Spam).
Fraud Detection (identifying fraudulent credit card transactions).
Customer Churn Prediction (predicting if a customer will leave a service).
Summary:

The Classification Problem:

• Classification is a supervised machine learning technique used to predict categorical outcomes.
• It involves identifying which category (class) an observation belongs to.

Examples:
• Spam or Not Spam emails
• Disease detection (Positive or Negative)
• Loan approval (Approved or Not Approved)

Types of Classification:
• Binary Classification (Two Classes)
• Multi-Class Classification (More than Two Classes)
• Multi-Label Classification (Multiple Labels at Once)


Logistic Regression Setup:

Logistic Regression is used for binary classification problems.

It predicts the probability of an outcome belonging to a particular class.

Steps to Set Up:

Data Preprocessing
Splitting Data into Training and Testing Sets
Model Training
Model Evaluation

Interpreting the Results:

Logistic Regression outputs probabilities between 0 and 1.

Decision Threshold:
If probability > 0.5 → Class 1
If probability ≤ 0.5 → Class 0
Important Metrics:
Accuracy
Precision
Recall
F1 Score
Confusion Matrix

Comparing Models:

Logistic Regression vs Linear Regression:
Linear Regression predicts continuous values, while Logistic Regression predicts probabilities.
Logistic Regression uses the Sigmoid function, but Linear Regression does not.
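To see the difference concretely, the sketch below feeds the same linear score into both models. The coefficients b0 = -4 and b1 = 1 are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical coefficients shared by both models (not fitted to real data).
b0, b1 = -4.0, 1.0

def linear_output(hours):
    """Linear regression: y = b0 + b1 * x, which is unbounded."""
    return b0 + b1 * hours

def logistic_output(hours):
    """Logistic regression: the same linear score passed through the sigmoid."""
    return 1.0 / (1.0 + math.exp(-linear_output(hours)))

for hours in [0, 4, 10]:
    print(hours, linear_output(hours), round(logistic_output(hours), 3))
```

The linear output drops below 0 and climbs above 1 as the hours change, so it cannot be read as a probability; the sigmoid version stays between 0 and 1.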

Logistic Regression vs Decision Trees:
Logistic Regression is simpler and more interpretable.
Decision Trees handle non-linear relationships better.

Logistic Regression vs KNN (K-Nearest Neighbours):
Logistic Regression is faster with large datasets, because a prediction needs only the learned coefficients rather than a comparison with every training point.
KNN performs better with small datasets.
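The speed contrast comes from how k-NN predicts: it has no training phase, but each prediction measures the distance to every stored training point. Below is a minimal k-NN classifier, assuming Euclidean distance, majority voting, and k = 3 on made-up 2-D points; `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify point x by majority vote among its k nearest training points."""
    # No training step: every call measures the distance to all stored points.
    neighbours = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Made-up 2-D points: class 0 near (1, 1), class 1 near (6, 6).
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (2, 2)))   # 0
print(knn_predict(train_X, train_y, (6, 5)))   # 1
```

A trained logistic regression model, by contrast, predicts with one weighted sum and a sigmoid, whatever the size of the training set.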
Classification Using Logistic Regression:
Applications:
Medical Diagnosis
Email Spam Detection
Customer Churn Prediction
Credit Card Fraud Detection

Steps in Classification:
Import Libraries
Load Dataset
Data Cleaning
Feature Scaling
Model Training
Predictions
Model Evaluation
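Feature scaling appears in the list above without further detail. One common choice is z-score standardization, sketched below as an assumption rather than the method these notes prescribe: the mean and standard deviation are computed on the training set only and then reused for the test set, so no information leaks from the test data. The income values are hypothetical.

```python
import statistics

def standardize(train_col, test_col):
    """Z-score scaling: fit mean and std on the training column, apply to both columns."""
    mu = statistics.mean(train_col)
    sigma = statistics.pstdev(train_col) or 1.0   # guard against a constant column

    def scale(values):
        return [(v - mu) / sigma for v in values]

    return scale(train_col), scale(test_col)

# Hypothetical income feature (training rows and one test row).
train_income = [30000, 45000, 60000, 75000]
test_income = [50000]
train_scaled, test_scaled = standardize(train_income, test_income)
print([round(v, 2) for v in train_scaled])
print([round(v, 2) for v in test_scaled])
```

Putting features on a similar scale also tends to help gradient-based training converge, since no single large-valued feature such as income dominates the updates.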
