SMDS Unit 5
Logistic Regression
Syllabus:
The classification problem
Logistic Regression Setup
Interpreting the results
Comparing models
Classification using Logistic Regression
The Classification Problem:
Classification is a supervised machine learning technique
used to predict categorical outcomes.
In simple terms, it is a way of sorting things into different
groups.
It involves identifying which category (class) an
observation belongs to.
Example: Sorting emails as "Spam" or "Not Spam."
Helps in decision-making, like predicting diseases or
fraud detection.
Used in AI, machine learning, and daily applications.
Types of Classification:
Binary Classification:
Only two groups (e.g., Pass/Fail, Yes/No).
Multi-Class Classification:
More than two groups (e.g., Cat/Dog/Rabbit).
Multi-Label Classification:
One item can belong to multiple groups
(e.g., A movie can be both Action &
Comedy).
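A rough sketch of how these three types differ in the shape of the target labels; the arrays below are made-up toy examples:

```python
import numpy as np

# Binary classification: one label per item, only two possible values
# (e.g., Fail = 0, Pass = 1)
y_binary = np.array([0, 1, 1, 0, 1])

# Multi-class classification: one label per item, more than two possible values
# (e.g., Cat = 0, Dog = 1, Rabbit = 2)
y_multiclass = np.array([0, 2, 1, 1, 0])

# Multi-label classification: several labels can apply at once
# (columns could stand for Action and Comedy; a movie can be both)
y_multilabel = np.array([[1, 0],
                         [1, 1],
                         [0, 1]])
```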
Working:
A computer looks at past data.
It learns patterns and applies them to new data.
Example: A bank can predict whether a loan will be repaid.
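A minimal sketch of this idea with scikit-learn; the customer numbers and repayment labels are made up for illustration:

```python
from sklearn.linear_model import LogisticRegression

# Past data: [income in thousands, credit score] for previous customers
X_past = [[25, 580], [48, 700], [60, 720], [30, 610], [75, 760], [22, 540]]
y_past = [0, 1, 1, 0, 1, 0]          # 1 = loan repaid, 0 = not repaid

# The model learns patterns from the past data...
model = LogisticRegression()
model.fit(X_past, y_past)

# ...and applies them to a new customer it has never seen
new_customer = [[50, 690]]
print(model.predict(new_customer))   # e.g., [1] -> likely to repay
```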
Real-Life Examples:
Face Recognition
(e.g., Unlocking your phone).
Medical Diagnosis
(e.g., Checking if a person has a disease).
Online Shopping
(e.g., Recommending products based on your interests).
General Steps in Classification:
Step 1:
Collect Data (e.g., Student’s study hours & exam results).
Step 2:
Clean and prepare the data (remove errors, missing
values).
Step 3:
Split data into two parts
– Training Set & Testing Set.
Step 4:
Train the model (let the computer learn from training
data).
Step 5:
Test the model (check how well it predicts on new data).
Step 6:
Evaluate performance using accuracy, precision, recall,
etc.
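A minimal sketch of these six steps with scikit-learn, assuming a made-up study-hours dataset:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Steps 1-2: collected and cleaned data (study hours -> pass/fail)
X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# Step 3: split into a training set and a testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 4: train the model on the training data
model = LogisticRegression().fit(X_train, y_train)

# Step 5: test the model on data it has not seen before
y_pred = model.predict(X_test)

# Step 6: evaluate performance
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, zero_division=0))
print("Recall   :", recall_score(y_test, y_pred, zero_division=0))
```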
Example Interpretation:
Suppose a model predicts if a student will pass an exam:
Probability = 0.92 → Predict "Pass"
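A small sketch of reading off such a probability with scikit-learn's predict_proba; the study-hours data are invented:

```python
from sklearn.linear_model import LogisticRegression

# Tiny made-up dataset: hours studied -> pass (1) / fail (0)
X = [[1], [2], [3], [6], [7], [8]]
y = [0, 0, 0, 1, 1, 1]
model = LogisticRegression().fit(X, y)

# Predicted probability of passing for a student who studied 7.5 hours
p_pass = model.predict_proba([[7.5]])[0, 1]
print(f"P(pass) = {p_pass:.2f}")
print("Predict 'Pass'" if p_pass > 0.5 else "Predict 'Fail'")
```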
Extensions:
Multinomial logistic regression for more than two classes
Regularized logistic regression (L1, L2) for feature selection or
to avoid overfitting
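In scikit-learn, these extensions correspond roughly to constructor options; the settings below are illustrative defaults, not tuned values:

```python
from sklearn.linear_model import LogisticRegression

# Multinomial logistic regression for more than two classes
# (the classes are inferred from y when fit() is called)
multinomial_model = LogisticRegression(solver="lbfgs", max_iter=1000)

# L2-regularized logistic regression: shrinks coefficients to avoid overfitting
l2_model = LogisticRegression(penalty="l2", C=1.0)

# L1-regularized logistic regression: can set coefficients exactly to zero,
# which acts as a form of feature selection
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
```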
Logistic regression predicts the probability of an event occurring (e.g., "Spam"
or "Not Spam").
If the predicted probability > 0.5, classify as Class 1; otherwise classify as Class 0.
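The probability comes from applying the sigmoid (logistic) function to a linear combination of the features; a minimal sketch with made-up coefficients:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number to a probability between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted coefficients: intercept b0 and one feature weight b1
b0, b1 = -4.0, 1.2
x = 5.0                       # e.g., hours studied

p = sigmoid(b0 + b1 * x)      # predicted probability of Class 1
label = 1 if p > 0.5 else 0   # apply the 0.5 decision threshold
print(p, label)               # ~0.88 -> Class 1
```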
Steps in Classification Using Logistic Regression:
Step 1:
Collect Data
Example: A bank wants to classify whether a customer will repay a
loan (Yes/No).
Data includes income, credit score, loan amount, etc.
Step 2:
Preprocess Data
Handle missing values and remove unnecessary features.
Convert categorical data (e.g., "Male/Female") into numerical
format.
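A small sketch of this preprocessing with pandas; the column names and values are hypothetical:

```python
import pandas as pd

# Hypothetical raw loan data with one categorical column
df = pd.DataFrame({
    "income":       [25000, 48000, 60000],
    "gender":       ["Male", "Female", "Male"],
    "credit_score": [580, 700, 720],
})

# Drop rows with missing values, then one-hot encode the categorical column
df = df.dropna()
df = pd.get_dummies(df, columns=["gender"], drop_first=True)
print(df)   # 'gender' becomes an indicator column such as 'gender_Male'
```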
Step 3:
Split Data
Divide data into Training Set (80%) and Testing Set (20%).
The model learns patterns from the training data.
Step 4:
Train the Model
Use the Sigmoid function to predict probabilities.
Adjust model parameters to improve accuracy.
Step 5:
Make Predictions
Apply the trained model to new data.
If probability > 0.5, classify as "Yes";
otherwise, classify as "No."
Step 6:
Evaluate Performance
Check accuracy, precision, recall, and F1-score.
Use a Confusion Matrix to see correct vs. incorrect
predictions.
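A sketch of this evaluation step with scikit-learn's metrics; y_test and y_pred below are short made-up label lists standing in for the real test labels and model predictions:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Hypothetical true labels from the test set and the model's predictions
y_test = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))

# Confusion matrix: rows = actual class, columns = predicted class
print(confusion_matrix(y_test, y_pred))
```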
Applications of Logistic Regression in Classification:
Logistic regression is widely used for classification tasks,
particularly when the target variable is binary (e.g., yes/no,
spam/not spam, disease/no disease).
1. Medical Diagnosis
Application: Predicting whether a patient has a disease (e.g.,
cancer, diabetes) based on symptoms, lab results, or other
medical parameters.
Example: Predicting the presence of heart disease using
features like age, cholesterol, blood pressure, etc.
2. Fraud Detection
Application: Classifying financial transactions as fraudulent or
legitimate.
Example: Features could include transaction amount,
location, time, and user behavior patterns.
3. Churn Prediction
Application: Predicting whether a customer will stop using a
service or product.
Example: Telecom companies use logistic regression to
identify customers likely to cancel their plans.
4. Image Recognition (Binary Classification)
Application: Classifying simple images into two categories
(e.g., cat vs. not-cat).
Example: Flattened pixel values serve as features in a logistic
regression model.
5. Text Classification
Application: Classifying short texts, such as tweets, into
categories like positive/negative sentiment.
Example: Logistic regression can handle bag-of-words or TF-
IDF features for this purpose.
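A small sentiment-classification sketch along these lines; the example texts and labels are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up short texts with sentiment labels (1 = positive, 0 = negative)
texts = ["loved this movie", "great product, works well",
         "terrible service", "worst purchase ever",
         "really happy with it", "very disappointing"]
labels = [1, 1, 0, 0, 1, 0]

# TF-IDF features feeding a logistic regression classifier
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["happy with the service", "this was terrible"]))
```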
Classification Workflow:
Data Preprocessing
Splitting Data into Training and Testing Sets
Model Training
Model Evaluation
Decision Threshold:
If probability > 0.5 → Class 1
If probability ≤ 0.5 → Class 0
Important Metrics:
Accuracy
Precision
Recall
F1 Score
Confusion Matrix
Comparing Models:
Candidate models are trained on the same training data and compared on the same test data using the metrics above (accuracy, precision, recall, F1 score, confusion matrix), as in the sketch below.
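An illustrative comparison of two logistic regression variants, trained on the same training set and scored on the same test set; scikit-learn's built-in breast cancer dataset is used purely as an example:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Two candidate models trained on the same training data...
candidates = {
    "L2-regularized": LogisticRegression(solver="liblinear", penalty="l2", C=1.0),
    "L1-regularized": LogisticRegression(solver="liblinear", penalty="l1", C=1.0),
}

# ...and compared on the same test data with the same metrics
for name, model in candidates.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"{name}: accuracy = {accuracy_score(y_test, y_pred):.3f}, "
          f"F1 = {f1_score(y_test, y_pred):.3f}")
```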
Steps in Classification:
Import Libraries
Load Dataset
Data Cleaning
Feature Scaling
Model Training
Predictions
Model Evaluation
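A sketch that follows these steps in order, again on scikit-learn's built-in breast cancer dataset (any binary dataset would do):

```python
# Import libraries
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load dataset
X, y = load_breast_cancer(return_X_y=True)

# Data cleaning (this dataset has no missing values; the check is shown for completeness)
assert not np.isnan(X).any()

# Feature scaling, fitted on the training data only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Model training
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Model evaluation
print(classification_report(y_test, y_pred))
```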