
Machine Learning Using Optimization & Logistic Regression & the Sigmoid Function

Presented by:
Jyotiska Bharadwaj (24/SWE/05)
Tushar (24/SWE/24)
Suraj Prasad Kalauni (24/SWE/03)
Shreyansh Gupta (24/SWE/04)
Sahil Verma (24/SWE/23)

Date: 16/10/2024
What is Machine Learning?
Have you ever wondered what machine learning is, and why we need it?

"We want the machine to learn from experience/data." — Alan Turing (1947)


What is the difference between Linear Algebra and Machine Learning?
Linear algebra (LA) is one of the tools for understanding data: it uses matrices to represent it. Have you ever wondered why so many LA concepts appear in ML, to the point that ML would be incomplete without LA? It is because LA works with matrices. As the image below shows, before an image is given as input to a system, it is first converted to a matrix; in fact, not just images but all types of data are converted to matrices before being fed to a computer system. Hence LA plays an important role in ML.
Why LA alone is incomplete
LA alone is incomplete because of its limitations with certain loss functions, such as the logarithm and some complex polynomial functions, and because of the problem of inconsistent systems, although inconsistent systems can be handled using the concept of projection. This is explained in great depth by Dr. Gilbert Strang, who taught linear algebra at MIT for around 60 years before retiring recently. You can find his LA lectures in his YouTube playlist.

Link: Gilbert Strang lectures on Linear Algebra (MIT) - YouTube


Types of Machine Learning
Where to apply Regression and where to apply Classification

● Regression applies to problems where Y (the label/output) is continuous.
● Classification applies to problems where Y (the label/output) is discrete.
● Can we apply regression to a classification problem, or vice versa?
● Answer: You can, but it is not recommended: each method has its own significance, and swapping methods like this is not advised.
● Question: 1. Use a learning algorithm to predict tomorrow's temperature.
  2. Given the historical wins/losses of two football teams, examine the statistics of the two teams and predict which team will win tomorrow's match.
  (Answer: 1 is regression, since temperature is continuous; 2 is classification, since the output is a discrete winner.)
Optimization in Machine Learning
Optimization in machine learning is the process of adjusting model parameters to
minimize or maximize an objective function, often called the cost function or loss
function. The goal is to make the model perform better by finding the best possible
set of parameters, often weights in the case of neural networks, that minimize the
prediction error on a given dataset.

Some methods for performing optimization in machine learning are the objective (loss) function, gradient descent, regularization, etc. We will look at the loss function in detail to understand it.
Loss Functions / Objective Function
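As a concrete illustration (not from the original deck), here is a minimal NumPy sketch of the binary cross-entropy, the log-loss that logistic regression minimizes:

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Binary cross-entropy: the loss logistic regression minimizes."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Confident correct predictions give a small loss; confident wrong ones a large loss
print(log_loss(np.array([1, 0]), np.array([0.9, 0.1])))  # ~0.105
print(log_loss(np.array([1, 0]), np.array([0.1, 0.9])))  # ~2.303
```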
Introduction to Logistic Regression
❖ Regression is a supervised learning method where we forecast a number instead of a category. Examples: a car's price from its mileage, traffic by time of day, demand volume from a company's growth, etc. Regression is a natural fit when something depends on time.
❖ Logistic regression is a statistical model used to predict the probability of a binary outcome (e.g., yes/no, true/false, 1/0) based on one or more predictor variables. If the probability of an outcome is greater than 50%, the output is 1; if it is smaller than 50%, the output is 0.
❖ To apply the logistic regression approach, one must ensure that the data is linearly separable.
The Sigmoid Function
Definition: The sigmoid function converts the independent variables (via their linear combination) into a probability between 0 and 1 for the dependent variable.

Role: The core of logistic regression is the logistic function, also known as the sigmoid function, which maps any real number to a value between 0 and 1.

Formula: σ(z) = 1 / (1 + e^(-z)), where

σ(z) - sigmoid output / predicted probability
z - input value; a linear combination of the input features and their corresponding weights
e - Euler's number (approximately 2.718)

If the output of the sigmoid function (the estimated probability) is greater than a predefined threshold, the model predicts that the instance belongs to the class. If the estimated probability is less than the predefined threshold, the model predicts that the instance does not belong to the class.
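A minimal NumPy sketch of the formula above (illustrative, not part of the original slides):

```python
import numpy as np

def sigmoid(z):
    """Map any real number z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5  (the usual decision boundary)
print(sigmoid(4))    # ~0.982: large positive z -> near 1
print(sigmoid(-4))   # ~0.018: large negative z -> near 0
```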
Sigmoid in Logistic Regression
The sigmoid function is applied to the linear output of a logistic regression model to transform it into a
probability between 0 and 1. This transformation is essential because probabilities must fall within this range.
Following is the breakdown of the process:

1. Linear Combination: The linear output of the model is calculated as a weighted sum of the input features
and a bias term:
z = w^T * x + b, where:
○ z is the linear output
○ w is the weight vector
○ x is the input feature vector
○ b is the bias term
2. Sigmoid Transformation: The sigmoid function is then applied to the linear output:
p(y = 1 | x) = sigmoid(z) = 1 / (1 + e^(-z))

This transforms the linear output into a probability between 0 and 1. As z approaches positive infinity, the sigmoid
function approaches 1, indicating a high probability of the positive class. As z approaches negative infinity, the
sigmoid function approaches 0, indicating a low probability of the positive class.

In essence, the sigmoid function acts as a squashing function, mapping the entire real number line to the interval [0, 1]. This makes it suitable for modeling probabilities in logistic regression.
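Putting both steps together, a minimal sketch (the weights w and b below are made up purely for illustration, not fitted values):

```python
import numpy as np

def predict_proba(X, w, b):
    """p(y=1|x) = sigmoid(w^T x + b) for each row of X."""
    z = X @ w + b                      # step 1: linear combination
    return 1.0 / (1.0 + np.exp(-z))    # step 2: sigmoid transformation

def predict(X, w, b, threshold=0.5):
    """Class labels from probabilities at a chosen threshold."""
    return (predict_proba(X, w, b) >= threshold).astype(int)

X = np.array([[1.0, 2.0], [-1.5, 0.5]])
w = np.array([0.8, -0.4])
b = 0.1
print(predict_proba(X, w, b))  # probabilities in (0, 1)
print(predict(X, w, b))        # 0/1 labels
```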
Model Optimization in Logistic Regression.
1. Feature Selection and Engineering:-

Feature Selection: Identify and select the most relevant features.

Feature Engineering: Create new features or transform existing ones to improve model performance.

2. Regularization

Regularization helps prevent overfitting by adding a penalty term to the loss function.

● L1 Regularization (Lasso): Encourages sparsity by driving some coefficients to zero.


● L2 Regularization (Ridge): Penalizes large coefficients but doesn’t drive them to zero.

3. Hyperparameter Tuning

Regularization Strength: Adjust the regularization parameter to balance between bias and variance.
Model Optimization in Logistic Regression (Contd.)
● Solver Choice: The choice of optimization algorithm can affect convergence speed and accuracy.
● Class Weight: Adjust class weights to handle class imbalance. For instance, setting class_weight='balanced' in scikit-learn automatically adjusts weights based on class frequencies (see the sketch at the end of this list).

4. Model Evaluation and Validation

● Cross-Validation: Use k-fold cross-validation to evaluate the model’s performance and ensure it
generalizes well to unseen data.

5. Algorithm-Specific Insights

● Threshold Adjustment: The default threshold for classification is 0.5, but adjusting it can improve
performance for imbalanced classes.
● Probability Calibration: Use methods like Platt scaling or isotonic regression to calibrate predicted
probabilities (if the predicted probabilities are not well-calibrated).
Model Optimization in Logistic Regression (Contd.)
6. Data Handling

● Handling Missing Data: Impute missing values appropriately or use techniques like mean
imputation or more advanced methods.
● Outlier Detection: Identify and handle outliers that might skew the model.

7. Advanced Techniques

● Ensemble Methods: Combine logistic regression with other models (e.g., using stacking or
bagging) to improve performance.
● Model Interpretation: Use tools like SHAP (SHapley Additive exPlanations) to interpret and
understand the contributions of different features to the predictions.
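As an illustration of the class-weight and threshold points above, a minimal scikit-learn sketch on synthetic imbalanced data (the dataset and parameter values are assumptions for demonstration only):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Imbalanced toy data (roughly 90/10 split between classes)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# class_weight='balanced' reweights classes by inverse frequency;
# C is the inverse of the regularization strength (smaller C = stronger penalty)
model = LogisticRegression(C=1.0, solver='liblinear', class_weight='balanced')
model.fit(X_train, y_train)

# Threshold adjustment: lower the 0.5 default to favor recall on the rare class
proba = model.predict_proba(X_test)[:, 1]
custom_preds = (proba >= 0.3).astype(int)
```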
Regularization in Logistic Regression
Regularization is a crucial technique in logistic regression to prevent overfitting and improve model generalization. It achieves this by adding a penalty to the loss function, which discourages the model from fitting the noise in the training data. Two main types of regularization are used in logistic regression, L1 and L2, plus their combination, Elastic Net.

1. L1 Regularization (Lasso): The cost function with L1 regularization is:

J(θ) = -(1/m) * Σ_i [ y^(i) * log(h_θ(x^(i))) + (1 - y^(i)) * log(1 - h_θ(x^(i))) ] + (λ/m) * Σ_j |θ_j|

where:

λ is the regularization parameter

θ_j are the model parameters

m is the number of training examples.


Regularization in Logistic Regression (Contd.)
Effects:

● Feature Selection: L1 regularization can drive some feature coefficients exactly to zero, effectively
performing feature selection and reducing the number of features.
● Sparsity: Encourages sparsity in the parameter vector θ, which can make the model simpler and more interpretable.

2. L2 Regularization (Ridge): The cost function with L2 regularization is:

J(θ) = -(1/m) * Σ_i [ y^(i) * log(h_θ(x^(i))) + (1 - y^(i)) * log(1 - h_θ(x^(i))) ] + (λ/2m) * Σ_j θ_j²

where λ is the regularization parameter.


Regularization in Logistic Regression (Contd.)
Effects:

● Shrinkage: L2 regularization tends to shrink the coefficients but does not make them exactly zero, thus including all
features in the model but reducing their impact.
● Stability: Helps in stabilizing the regression coefficients and improving the generalization by avoiding large coefficient
values.

3. Elastic Net Regularization: The cost function with Elastic Net regularization is:

J(θ) = -(1/m) * Σ_i [ y^(i) * log(h_θ(x^(i))) + (1 - y^(i)) * log(1 - h_θ(x^(i))) ] + (λ1/m) * Σ_j |θ_j| + (λ2/2m) * Σ_j θ_j²

where λ1 and λ2 are the regularization parameters for L1 and L2, respectively.

Effects:

● Combination of L1 and L2: Offers the benefits of both L1 and L2 regularization, providing feature selection and shrinking coefficients.
● Flexibility: You can control the balance between L1 and L2 regularization using the parameters.
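In scikit-learn, these three penalties map onto the penalty and l1_ratio parameters of LogisticRegression; a minimal sketch (note that sklearn's C is the inverse of λ, and the values shown are illustrative):

```python
from sklearn.linear_model import LogisticRegression

# L1 (lasso): can drive some coefficients exactly to zero
l1 = LogisticRegression(penalty='l1', solver='saga', C=1.0, max_iter=5000)

# L2 (ridge, the default): shrinks coefficients without zeroing them
l2 = LogisticRegression(penalty='l2', solver='saga', C=1.0, max_iter=5000)

# Elastic Net: l1_ratio blends the two penalties (0 = pure L2, 1 = pure L1)
enet = LogisticRegression(penalty='elasticnet', solver='saga',
                          l1_ratio=0.5, C=1.0, max_iter=5000)
```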
Optimization Algorithms for Logistic Regression.
1. Gradient Descent - An iterative optimization algorithm that updates parameters in the direction of the negative gradient of the cost function.

2. Newton's Method - An iterative optimization method that uses the second-order derivative (Hessian matrix) to find the minimum. It can converge faster than gradient descent if the problem is well-behaved.

3. Quasi-Newton Methods - Approximate Newton's method to reduce computational complexity. They update an approximation of the Hessian matrix at each iteration.

4. Conjugate Gradient Method - An iterative method that is particularly useful for large systems of linear equations. It can be used for optimization problems where the cost function is quadratic.

5. Coordinate Descent - Optimizes one parameter at a time while keeping the others fixed. This can be effective in high-dimensional spaces where the problem is sparse.

6. Accelerated Gradient Descent (Nesterov Accelerated Gradient Descent) - A variant of gradient descent that incorporates momentum to speed up convergence.

7. Stochastic Variational Inference - Used in probabilistic models and can be applied to logistic regression models with a probabilistic approach. It involves approximating the posterior distribution of the parameters using stochastic optimization.

8. Optimization in Libraries - Most machine learning libraries provide built-in optimization algorithms for logistic regression.
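A from-scratch sketch of option 1, batch gradient descent on the log-loss (the toy data is assumed for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the log-loss.
    The gradient w.r.t. w is (1/m) * X^T (sigmoid(Xw + b) - y)."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)       # current predicted probabilities
        error = p - y                # derivative of log-loss w.r.t. z
        w -= lr * (X.T @ error) / m  # step against the gradient
        b -= lr * error.mean()
    return w, b

# Tiny linearly separable toy problem
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w, b = fit_logistic_gd(X, y)
print(sigmoid(X @ w + b).round(2))  # probabilities increase with x
```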
Multiclass Logistic Regression.
Multiclass logistic regression, also known as multinomial logistic regression, extends binary logistic regression to handle
problems with more than two classes. It is used when the target variable is categorical with more than two levels.

Concepts of Multiclass Logistic Regression

1. Softmax Function

2. Cost Function

Methods for Multiclass Logistic Regression:

There are two main approaches to handling multiclass classification with logistic regression:

1. One-vs-Rest (OvR) or One-vs-All (OvA): In this approach, a separate binary classifier is trained for each class. Each classifier distinguishes between the class of interest and all other classes.

2. Softmax Regression (Multinomial Logistic Regression): This approach generalizes logistic regression to multiple classes directly using the softmax function. It trains a single model that outputs probabilities for each class.
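A minimal NumPy sketch of the softmax function, which generalizes the sigmoid to K classes (illustrative only):

```python
import numpy as np

def softmax(z):
    """Turn K real-valued scores into K probabilities that sum to 1."""
    z = z - z.max()  # shift for numerical stability (doesn't change the result)
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.66, 0.24, 0.10]
```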
Evaluation Metrics for Logistic Regression
● Accuracy: Measures the percentage of
correctly predicted observations. Not ideal
for imbalanced datasets.
● Precision: Measures the proportion of
true positives out of all predicted
positives. Important when false positives
are costly.
● Recall (Sensitivity): Measures the
proportion of true positives out of all
actual positives. Useful when false
negatives are costly.
● F1 Score: Harmonic mean of Precision
and Recall, useful when the balance
between the two is important.
Evaluation Metrics for Logistic Regression (Contd.)

● AUC-ROC Curve: Measures model performance by showing the tradeoff between true positive and false positive rates. The Area Under the Curve (AUC) indicates the model's ability to distinguish between classes.
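A quick sketch computing these metrics with scikit-learn (the toy labels and probabilities below are made up for illustration):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]                    # hard labels at the 0.5 threshold
y_proba = [0.2, 0.6, 0.9, 0.8, 0.4, 0.1, 0.7, 0.3]   # predicted p(y=1)

print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_proba))   # AUC uses probabilities, not labels
```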
Hyperparameter Tuning for Logistic Regression
Key hyperparameters:
● Regularization Strength (λ): Controls the penalty for model
complexity. Lower values allow more complexity (potential
overfitting), higher values simplify the model (reduce overfitting).
● Solver: The choice of algorithm used to optimize logistic regression (e.g., liblinear, lbfgs, saga).
● Maximum Iterations: Sets the number of iterations for the
optimization algorithm to converge.

Grid search vs randomized search:

● Grid Search: Exhaustively tests all possible hyperparameter combinations.
● Randomized Search: Randomly tests combinations for faster tuning but may not find the absolute best set.
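A minimal sketch of both searches in scikit-learn (the parameter grids shown are illustrative choices, not prescribed values):

```python
from scipy.stats import loguniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Grid search: exhaustively tries every combination in the grid
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={'C': [0.01, 0.1, 1, 10], 'solver': ['liblinear', 'lbfgs']},
    cv=5)

# Randomized search: samples a fixed number of combinations from distributions
rand = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={'C': loguniform(1e-3, 1e2)},
    n_iter=20, cv=5, random_state=0)

# grid.fit(X, y); grid.best_params_  # run after loading your own data
```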
K-Fold Cross-Validation in Logistic Regression
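A minimal cross_val_score sketch on synthetic data (illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# cv=5 splits the data into 5 folds; each fold serves once as validation
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())  # per-fold accuracy and the average
```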
Practical Example of Logistic Regression
The dataset includes 1757 lines of data about the market for pumpkins, sorted into groupings by city. This is raw data extracted from the Specialty Crops Terminal Markets Standard Reports distributed by the United States Department of Agriculture.

Let's look at the relationship between color and other variables using logistic regression.

Notebook: LogisticRegression06.ipynb
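A hedged sketch of the kind of workflow such a notebook follows; the file name, column names, and target encoding here are assumptions, so refer to LogisticRegression06.ipynb for the actual preprocessing:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical file/column names -- see the notebook for the real ones
pumpkins = pd.read_csv('US-pumpkins.csv').dropna(subset=['Color'])
X = pd.get_dummies(pumpkins[['City Name', 'Variety']])  # encode categoricals
y = (pumpkins['Color'] == 'ORANGE').astype(int)         # binary target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on held-out data
```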
Logistic Regression vs. Other Algorithms (SV)
Logistic Regression:
1. Predicts binary/multiclass labels
2. Sigmoid function for probabilities
3. Loss function is Log-Loss (Cross-Entropy)
4. Use case: spam detection, disease diagnosis, etc.

Linear Regression:
1. Predicts continuous values
2. Linear relationship (y = mx + b)
3. Loss function is Mean Squared Error (MSE)
4. Use case: house price prediction, temperature forecasting, etc.

Support Vector Machine:
1. Classification/regression
2. Hyperplane separating classes
3. Loss function is Hinge loss (classification)
4. Use case: image classification, handwriting recognition, etc.
Challenges in Logistic Regression

1. Challenge: Assumes a linear relationship between features and log-odds, which fails on non-linearly separable data.
   Solution: Use polynomial features (see the sketch below) or switch to models like SVM for non-linear data.

2. Challenge: Highly correlated features lead to unstable and hard-to-interpret coefficients.
   Solution: Remove or combine correlated features, or apply L2 regularization (Ridge Regression).

3. Challenge: Overfitting on small datasets.
   Solution: Apply regularization (L1 or L2) to prevent overfitting by penalizing large coefficients.

4. Challenge: Logistic regression often predicts the majority class in imbalanced datasets.
   Solution: Use oversampling/undersampling, or adjust class weights during training.
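For challenge 1, a minimal sketch of a polynomial-features pipeline (degree 2 is an illustrative choice):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# A degree-2 feature expansion lets a linear decision boundary in the
# expanded space model a non-linear boundary in the original space.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    LogisticRegression(C=1.0, max_iter=1000))  # L2 penalty tames the extra features
# model.fit(X_train, y_train) on your own data
```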
Summary
● Machine learning relies on optimization to train models and find the best parameters.
● Logistic regression is a powerful classification algorithm that predicts probabilities for binary outcomes.
● The sigmoid function plays a crucial role in converting linear outputs to probabilities in logistic regression.
● Logistic regression is widely used in areas like healthcare, finance, and marketing for classification tasks.
● It is simple to implement, interpretable, and effective for binary classification problems.
● It is not suitable for complex non-linear problems without feature engineering or advanced techniques.
● Optimizers like Adam and RMSprop can be used for faster convergence.
● Combining logistic regression with techniques like kernel methods or neural networks can improve performance on more complex datasets.
Questions
Q. What is the role of the sigmoid function in logistic regression?

A. It maps any real number to a probability between 0 and 1.
B. It calculates the error rate of the model.
C. It adjusts the weights of the model parameters.
D. It normalizes input data.

Correct Answer: A
Thank You !
