
Logistic Regression: Class Notes

Introduction
Logistic regression is a statistical method used for binary classification problems. It predicts
the probability that a given input belongs to one of two classes.

Key Concepts
1. Linear Model Foundation:

- Logistic regression models the relationship between the input variables (features) and the
output (label) by applying a logistic function to a linear combination of the features.

- It assumes a linear relationship between the independent variables and the log-odds of the
dependent variable.

2. Sigmoid Function:

- The sigmoid function is used to map predicted values to probabilities:

σ(z) = 1 / (1 + e^(-z))

- z = w^T x + b, where w represents weights, x input features, and b the bias term.

3. Probability and Decision Boundary:

- Outputs probabilities between 0 and 1.

- A threshold (commonly 0.5) is used to classify inputs into two categories.
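The sigmoid mapping and the 0.5 threshold above can be sketched in a few lines of NumPy. The weights, bias, and feature vector here are hypothetical values chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and a single feature vector.
w = np.array([0.8, -0.4])
b = 0.1
x = np.array([1.5, 2.0])

z = w @ x + b            # linear score z = w^T x + b
p = sigmoid(z)           # predicted probability P(y=1 | x)
label = int(p >= 0.5)    # apply the 0.5 decision threshold

print(p, label)          # p ≈ 0.622, so the predicted class is 1
```

Note that z can be any real number, but sigmoid(z) always lands strictly between 0 and 1, which is what lets the output be read as a probability.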

Mathematical Formulation
1. Log-Odds:

- The model expresses the log-odds of the positive class as a linear function of the inputs:

log-odds = log(P(y=1)/P(y=0)) = w^T x + b

2. Cost Function:

- Logistic regression uses the log-loss (cross-entropy) function:

J(θ) = -(1/m) Σ [y log(ŷ) + (1-y) log(1-ŷ)]

- ŷ: predicted probability

- y: actual label

- m: number of training examples

3. Gradient Descent:

- Optimizes the weights by minimizing the cost function:

θ = θ - α ∇J(θ)

- α: learning rate

- ∇J(θ): gradient of the cost function
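The three pieces above (log-loss, gradient, update rule) fit together in a short from-scratch training loop. This is a minimal sketch on a tiny synthetic dataset, not a production implementation; the data, learning rate, and iteration count are all assumptions chosen for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, y_hat, eps=1e-12):
    """Cross-entropy cost J(theta), clipped for numerical stability."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Tiny synthetic dataset: one feature, linearly separable.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

w = np.zeros(X.shape[1])
b = 0.0
alpha = 0.1                      # learning rate

for _ in range(2000):
    y_hat = sigmoid(X @ w + b)   # forward pass: predicted probabilities
    error = y_hat - y            # for log-loss, dJ/dz is simply (y_hat - y)
    grad_w = X.T @ error / len(y)
    grad_b = error.mean()
    w -= alpha * grad_w          # theta = theta - alpha * grad J(theta)
    b -= alpha * grad_b

final_loss = log_loss(y, sigmoid(X @ w + b))
```

A convenient property of the log-loss with a sigmoid output is that the gradient reduces to the simple residual (ŷ - y) times the inputs, which is what the loop exploits.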

Assumptions of Logistic Regression


1. The dependent variable is binary.

2. Observations are independent.

3. Minimal multicollinearity among predictors.

4. The relationship between independent variables and the log-odds is linear.

Evaluation Metrics
1. Accuracy: Fraction of correct predictions.

2. Precision and Recall: Useful for imbalanced datasets.

3. F1 Score: Harmonic mean of precision and recall.

4. ROC Curve and AUC: Measures the model's performance across all classification
thresholds.
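Accuracy, precision, recall, and F1 can all be computed directly from the confusion-matrix counts. A minimal sketch with hypothetical labels and predictions:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical labels and predictions for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
```

On a heavily imbalanced dataset, accuracy can look high even for a useless model, which is why precision and recall are reported alongside it.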

Applications
1. Medical diagnosis (e.g., disease prediction).

2. Fraud detection.

3. Customer churn prediction.

4. Binary sentiment analysis.

Advantages
1. Simple to implement and interpret.

2. Computationally efficient.

3. Outputs probabilities, allowing for uncertainty quantification.


Disadvantages
1. Assumes linearity between features and log-odds.

2. Prone to overfitting with too many features.

3. Not suitable for complex relationships without feature engineering.

Extensions
1. Multinomial Logistic Regression: For multi-class classification.

2. Regularized Logistic Regression: L1 (Lasso) and L2 (Ridge) penalties help prevent
overfitting.
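L2 regularization changes the gradient step only slightly: a shrinkage term λw is added to the weight gradient (the bias is conventionally left unpenalized). A minimal sketch, with hypothetical data and a hypothetical regularization strength lam:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l2_gradient_step(w, b, X, y, alpha=0.1, lam=0.01):
    """One gradient step on log-loss plus (lam/2)*||w||^2 (bias unpenalized)."""
    m = len(y)
    error = sigmoid(X @ w + b) - y
    grad_w = X.T @ error / m + lam * w   # extra lam * w term shrinks the weights
    grad_b = error.mean()
    return w - alpha * grad_w, b - alpha * grad_b

# Hypothetical one-feature dataset for illustration.
X = np.array([[0.5], [1.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
w, b = np.zeros(1), 0.0
for _ in range(1000):
    w, b = l2_gradient_step(w, b, X, y)
```

In practice one would typically use a library implementation such as scikit-learn's `LogisticRegression`, whose `C` parameter is the inverse of the regularization strength.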

Conclusion
Logistic regression remains a foundational and widely-used tool for binary classification
tasks due to its simplicity and interpretability.
