Machine Learning Using Optimization and Logistic Regression and Sigmoid Function_grp 06
Role: The core of logistic regression is the logistic function, also known as the sigmoid function, which maps any real number to a value between 0 and 1.
If the output of the sigmoid function (the estimated probability) is greater than a predefined threshold, the model predicts that the instance belongs to the positive class. If the estimated probability is less than the threshold, the model predicts that the instance does not belong to the class.
Sigmoid in Logistic Regression
The sigmoid function is applied to the linear output of a logistic regression model to transform it into a
probability between 0 and 1. This transformation is essential because probabilities must fall within this range.
The process breaks down as follows:
1. Linear Combination: The linear output of the model is calculated as a weighted sum of the input features
and a bias term:
z = w^T x + b, where:
○ z is the linear output
○ w is the weight vector
○ x is the input feature vector
○ b is the bias term
2. Sigmoid Transformation: The sigmoid function is then applied to the linear output:
p(y = 1 | x) = sigmoid(z) = 1 / (1 + e^(-z))
This transforms the linear output into a probability between 0 and 1. As z approaches positive infinity, the sigmoid
function approaches 1, indicating a high probability of the positive class. As z approaches negative infinity, the
sigmoid function approaches 0, indicating a low probability of the positive class.
In essence, the sigmoid acts as a squashing function, mapping the entire real number line to the interval [0, 1]. This makes it suitable for modeling probabilities in logistic regression.
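For illustration, here is a minimal NumPy sketch of these two steps; the weight vector, bias, and input below are made-up example values, not a fitted model.

import numpy as np

def sigmoid(z):
    """Map any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -1.2])      # hypothetical weight vector
b = 0.5                        # hypothetical bias term
x = np.array([2.0, 1.0])       # one input feature vector

z = w @ x + b                  # linear combination: z = w^T x + b
p = sigmoid(z)                 # estimated probability p(y = 1 | x)

print(f"z = {z:.2f}, p(y=1|x) = {p:.3f}")
print("predicted class:", int(p >= 0.5))   # default 0.5 threshold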
Model Optimization in Logistic Regression
1. Feature Selection and Engineering
Feature Selection: Keep the most informative features and drop irrelevant or redundant ones.
Feature Engineering: Create new features or transform existing ones to improve model performance.
2. Regularization
Regularization helps prevent overfitting by adding a penalty term to the loss function.
3. Hyperparameter Tuning
● Regularization Strength: Adjust the regularization parameter (C in scikit-learn, the inverse of regularization strength) to balance bias and variance.
Model Optimization in Logistic Regression (Contd.)
● Solver Choice: The choice of optimization algorithm can affect convergence speed and accuracy.
● Class Weight: Adjust class weights to handle class imbalance. For instance, setting class_weight='balanced' in scikit-learn automatically adjusts weights based on class frequencies.
● Cross-Validation: Use k-fold cross-validation to evaluate the model’s performance and ensure it generalizes well to unseen data (a short scikit-learn sketch follows this list).
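As a rough illustration of the class-weight and cross-validation points above, the following sketch uses a synthetic imbalanced dataset; the hyperparameter values are arbitrary examples, not recommendations.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic imbalanced dataset, for illustration only.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

model = LogisticRegression(class_weight='balanced',  # reweight by inverse class frequency
                           solver='lbfgs',           # solver choice affects convergence
                           C=1.0,                     # inverse regularization strength
                           max_iter=1000)

# 5-fold cross-validation to estimate how well the model generalizes.
scores = cross_val_score(model, X, y, cv=5, scoring='f1')
print("mean F1 over 5 folds:", scores.mean())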
5. Algorithm-Specific Insights
● Threshold Adjustment: The default threshold for classification is 0.5, but adjusting it can improve
performance for imbalanced classes.
● Probability Calibration: If the predicted probabilities are not well calibrated, use methods like Platt scaling or isotonic regression to calibrate them (see the sketch below, which also shows threshold adjustment).
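A hedged sketch of both ideas in scikit-learn follows; the 0.3 cutoff and the synthetic data are arbitrary choices for illustration only.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Threshold adjustment: classify as positive when p(y=1|x) exceeds a custom cutoff.
probs = base.predict_proba(X_test)[:, 1]
preds_low_threshold = (probs >= 0.3).astype(int)

# Probability calibration via Platt scaling (method='sigmoid') or isotonic regression.
calibrated = CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                                    method='isotonic', cv=5).fit(X_train, y_train)
calibrated_probs = calibrated.predict_proba(X_test)[:, 1]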
Model Optimization in Logistic Regression (Contd.)
6. Data Handling
● Handling Missing Data: Impute missing values appropriately, using simple techniques like mean imputation or more advanced methods (a sketch follows this list).
● Outlier Detection: Identify and handle outliers that might skew the model.
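One possible way to handle missing values before fitting, sketched with scikit-learn's SimpleImputer in a pipeline; the tiny toy matrix below is illustrative only, and scaling is included because it also damps the influence of mild outliers.

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])

pipeline = make_pipeline(
    SimpleImputer(strategy='mean'),   # fill missing values with the column mean
    StandardScaler(),                 # standardize features before fitting
    LogisticRegression(max_iter=1000),
)
pipeline.fit(X, y)
print(pipeline.predict([[2.0, 2.5]]))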
7. Advanced Techniques
● Ensemble Methods: Combine logistic regression with other models (e.g., using stacking or bagging) to improve performance; a stacking sketch follows this list.
● Model Interpretation: Use tools like SHAP (SHapley Additive exPlanations) to interpret and
understand the contributions of different features to the predictions.
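As one possible ensemble setup, the sketch below stacks logistic regression with a random forest using scikit-learn's StackingClassifier; the choice of base models here is illustrative, not prescriptive.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[
        ('logreg', LogisticRegression(max_iter=1000)),
        ('forest', RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-model combines base predictions
    cv=5,
)
stack.fit(X, y)
print("training accuracy:", stack.score(X, y))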
Regularization in Logistic Regression
Regularization is a crucial technique in logistic regression to prevent overfitting and improve model generalization. It
achieves this by adding a penalty to the loss function, which discourages the model from fitting the noise in the training
data. There are two main types of regularization used in logistic regression, L1 and L2, and they can also be combined (elastic net).
L1 Regularization (Lasso):
● Feature Selection: L1 regularization can drive some feature coefficients exactly to zero, effectively performing feature selection and reducing the number of features.
● Sparsity: Encourages sparsity in the parameter vector θ, which can make the model simpler and more interpretable.
L2 Regularization (Ridge):
● Shrinkage: L2 regularization tends to shrink the coefficients but does not make them exactly zero, thus keeping all features in the model while reducing their impact.
● Stability: Helps stabilize the regression coefficients and improves generalization by avoiding large coefficient values.
Elastic Net (combination of L1 and L2):
● Combined Effects: Offers the benefits of both L1 and L2 regularization, providing feature selection and coefficient shrinkage.
● Flexibility: The balance between the L1 and L2 penalties can be controlled via a mixing parameter (a scikit-learn sketch of all three penalties follows).
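The sketch below shows how the three penalty types can be selected in scikit-learn's LogisticRegression; the 'saga' solver supports all of them, and the C and l1_ratio values are arbitrary examples.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

l1 = LogisticRegression(penalty='l1', solver='saga', C=0.5, max_iter=5000).fit(X, y)
l2 = LogisticRegression(penalty='l2', solver='saga', C=0.5, max_iter=5000).fit(X, y)
enet = LogisticRegression(penalty='elasticnet', solver='saga', C=0.5,
                          l1_ratio=0.5, max_iter=5000).fit(X, y)   # mix of L1 and L2

# L1 tends to zero out coefficients (feature selection); L2 only shrinks them.
print("non-zero coefficients (L1):", (l1.coef_ != 0).sum())
print("non-zero coefficients (L2):", (l2.coef_ != 0).sum())
print("non-zero coefficients (elastic net):", (enet.coef_ != 0).sum())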
Optimization Algorithms for Logistic Regression
1. Gradient Descent - An iterative optimization algorithm that updates parameters in the direction of the negative gradient of the cost function.
2. Newton’s Method - An iterative optimization method that uses the second-order derivative (Hessian matrix) to find the minimum. It can converge faster than gradient descent if the problem is well-behaved.
3. Quasi-Newton Methods - Approximate Newton’s method to reduce computational complexity. They update an approximation of the Hessian matrix at each iteration.
4. Conjugate Gradient Method - An iterative method that is particularly useful for large systems of linear equations. It can be used for optimization problems where the cost function is quadratic.
5. Coordinate Descent - Optimizes one parameter at a time while keeping the others fixed. This can be effective in high-dimensional spaces where the problem is sparse.
6. Accelerated Gradient Descent (Nesterov Accelerated Gradient Descent) - A variant of gradient descent that incorporates momentum to speed up convergence.
7. Stochastic Variational Inference - Used in probabilistic models and can be applied to logistic regression with a probabilistic approach. It involves approximating the posterior distribution of the parameters using stochastic optimization.
8. Optimization in Libraries - Most machine learning libraries provide built-in optimization algorithms for logistic regression.
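To make the gradient descent idea concrete, here is a toy NumPy sketch of batch gradient descent for logistic regression; the learning rate, iteration count, and synthetic data are illustrative choices, and production code would normally rely on a library solver instead.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iters=1000):
    """Minimize the log-loss by repeatedly stepping against its gradient."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)               # current predicted probabilities
        grad_w = X.T @ (p - y) / n_samples   # gradient of log-loss w.r.t. weights
        grad_b = np.mean(p - y)              # gradient w.r.t. bias
        w -= lr * grad_w                     # step in the negative gradient direction
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # synthetic linearly separable labels
w, b = fit_logistic_gd(X, y)
print("learned weights:", w, "bias:", b)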
Multiclass Logistic Regression
Multiclass logistic regression, also known as multinomial logistic regression, extends binary logistic regression to handle
problems with more than two classes. It is used when the target variable is categorical with more than two levels.
1. Softmax Function: converts the model’s raw class scores into probabilities that sum to 1 across the classes.
2. Cost Function: the cross-entropy (log-loss), generalized to multiple classes.
There are two main approaches to handling multiclass classification with logistic regression:
1. One-vs-Rest (OvR) or One-vs-All (OvA)-In this approach, a separate binary classifier is trained for each class. Each classifier
distinguishes between the class of interest and all other classes.
2. Softmax Regression (Multinomial Logistic Regression) - This approach generalizes logistic regression to multiple classes directly using the softmax function. It trains a single model that outputs probabilities for each class (a sketch comparing both approaches follows).
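A brief sketch of both approaches follows: a NumPy softmax for illustration, then scikit-learn's OneVsRestClassifier versus a plain LogisticRegression, which fits a single multinomial (softmax) model for multiclass targets with its default solver in recent scikit-learn versions. The iris dataset is just a convenient three-class example.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

def softmax(z):
    """Turn a vector of class scores into probabilities that sum to 1."""
    z = z - np.max(z)          # shift for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # roughly [0.66, 0.24, 0.10]

X, y = load_iris(return_X_y=True)

# One-vs-Rest: one binary logistic regression per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Plain LogisticRegression fits a single multinomial (softmax) model here.
softmax_model = LogisticRegression(max_iter=1000).fit(X, y)

print("OvR training accuracy        :", ovr.score(X, y))
print("multinomial training accuracy:", softmax_model.score(X, y))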
Evaluation Metrics for Logistic Regression
● Accuracy: Measures the percentage of correctly predicted observations. Not ideal for imbalanced datasets.
● Precision: Measures the proportion of true positives out of all predicted positives. Important when false positives are costly.
● Recall (Sensitivity): Measures the proportion of true positives out of all actual positives. Useful when false negatives are costly.
● F1 Score: Harmonic mean of Precision and Recall, useful when the balance between the two is important (computed in the sketch below).
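As a quick illustration, the snippet below computes these four metrics with scikit-learn on made-up true and predicted labels.

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]   # hypothetical ground-truth labels
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]   # hypothetical model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("f1 score :", f1_score(y_true, y_pred))          # harmonic mean of the two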
Notebook: LogisticRegression06.ipynb
Logistic Regression vs. Other Algorithms
● Logistic Regression - Loss function: Log-Loss (Cross-Entropy). Use case: spam detection, disease diagnosis, etc.
● Linear Regression - Loss function: Mean Squared Error (MSE). Use case: house price prediction, temperature forecasting, etc.
● Support Vector Machine - Loss function: Hinge loss. Use case: image classification, handwriting recognition, etc.
Challenges in Logistic Regression
Challenge 1: Logistic regression assumes a linear relationship between features and log-odds, which fails on non-linearly separable data.
Solution 1: Use polynomial features or switch to models like SVM for non-linear data.
Challenge 2: Highly correlated features lead to unstable and hard-to-interpret coefficients.
Solution 2: Remove or combine correlated features, or apply L2 regularization (Ridge Regression).
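To illustrate the first workaround, the sketch below adds degree-2 polynomial features so logistic regression (linear in the expanded feature space) can fit a circular decision boundary; the dataset and degree are illustrative choices.

from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Concentric circles: not linearly separable in the original features.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2),
                     LogisticRegression(max_iter=1000)).fit(X, y)

print("plain logistic regression accuracy:", plain.score(X, y))
print("with polynomial features          :", poly.score(X, y))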
Thank You !