Implementation of Logistic Regression from Scratch using Python
Logistic regression is a statistical method used for binary classification tasks, where data must be categorized into one of two classes. Unlike linear regression, it passes its output through an S-shaped curve, the sigmoid function, which maps any real-valued input to a value between 0 and 1 that can be interpreted as a probability.
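For reference, the sigmoid function and the resulting probability model are the standard ones:

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad P(y = 1 \mid x) = \sigma(w^\top x + b)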
To understand it better, we will implement logistic regression from scratch in this article.
1. Import Required Libraries
We will import the required Python libraries:
- NumPy: Handles mathematical operations.
- Scikit-learn: Splits data into training and testing sets and standardizes features for faster convergence.
- Matplotlib: Plots the cost function.
Python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
2. Logistic Regression Class
We define a class LogisticRegressionScratch that implements logistic regression using gradient descent.
- sigmoid(): Converts raw outputs into probabilities.
- cost(): Calculates the logistic loss (cross-entropy).
- fit(): Updates weights and bias using gradient descent (update rules shown below).
- predict(): Returns binary predictions (0 or 1).
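For reference, fit() applies the standard gradient-descent updates for the cross-entropy loss (notation matches the code: m samples, h = \sigma(Xw + b)):

J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log h_i + (1 - y_i) \log(1 - h_i) \right]

dw = \frac{1}{m} X^\top (h - y), \qquad db = \frac{1}{m} \sum_{i=1}^{m} (h_i - y_i), \qquad w \leftarrow w - \text{lr} \cdot dw, \quad b \leftarrow b - \text{lr} \cdot db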
Python
class LogisticRegressionScratch:
    def __init__(self, learning_rate=0.01, iterations=1000):
        self.lr = learning_rate
        self.iterations = iterations
        self.weights = None
        self.bias = 0
        self.cost_history = []

    def sigmoid(self, z):
        """Sigmoid activation function"""
        return 1 / (1 + np.exp(-z))

    def cost(self, h, y):
        """Cross-entropy loss"""
        m = len(y)
        return -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

    def fit(self, X, y):
        """Train model using gradient descent"""
        m, n = X.shape
        self.weights = np.zeros(n)
        for _ in range(self.iterations):
            # Forward pass: linear combination followed by the sigmoid
            z = np.dot(X, self.weights) + self.bias
            h = self.sigmoid(z)
            # Gradients of the cross-entropy loss w.r.t. weights and bias
            dw = (1 / m) * np.dot(X.T, (h - y))
            db = (1 / m) * np.sum(h - y)
            # Gradient descent update
            self.weights -= self.lr * dw
            self.bias -= self.lr * db
            # Track the loss so convergence can be plotted later
            self.cost_history.append(self.cost(h, y))

    def predict(self, X):
        """Make binary predictions using a 0.5 probability threshold"""
        return (self.sigmoid(np.dot(X, self.weights) + self.bias) >= 0.5).astype(int)
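One practical caveat: if the sigmoid output saturates to exactly 0 or 1, np.log(0) in cost() produces a warning and an infinite loss. A minimal sketch of a clipped cost, assuming we simply bound h away from 0 and 1 with a small epsilon, could look like this:
Python
import numpy as np

def stable_cost(h, y, eps=1e-15):
    """Cross-entropy loss with h clipped to avoid log(0)."""
    m = len(y)
    h = np.clip(h, eps, 1 - eps)  # keep probabilities strictly inside (0, 1)
    return -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
With standardized features and a moderate learning rate, the original cost() rarely hits this edge case, so this is an optional safeguard rather than a required change.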
3. Load and Prepare Data
We’ll generate a random dataset and standardize it:
- Synthetic data: Points where x1 + x2 > 10 are labeled 1, otherwise 0.
- Scaling: Standardizes features so gradient descent converges smoothly.
- Train-test split: 80% of the data is used for training and the rest for testing.
Python
# Generate a synthetic dataset
np.random.seed(42)
X = np.random.rand(200, 2) * 10
y = (X[:, 0] + X[:, 1] > 10).astype(int)
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Scale the data for faster convergence
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
4. Train and Evaluate the Model
Python
# Train our logistic regression model
model = LogisticRegressionScratch(learning_rate=0.1, iterations=1000)
model.fit(X_train, y_train)
# Evaluate accuracy
predictions = model.predict(X_test)
accuracy = np.mean(predictions == y_test)
print(f"Model Accuracy: {accuracy:.2f}")
Output:
Model Accuracy: 0.93
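As an optional sanity check (not part of the original walkthrough), we can compare the from-scratch model against scikit-learn's LogisticRegression on the same scaled split; the two accuracies should be close:
Python
from sklearn.linear_model import LogisticRegression

# Reference implementation trained on the same standardized data
sk_model = LogisticRegression()
sk_model.fit(X_train, y_train)
print(f"scikit-learn Accuracy: {sk_model.score(X_test, y_test):.2f}")
Small differences are possible because scikit-learn applies L2 regularization by default and uses a different optimizer.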
5. Visualize Cost Function Convergence
The cost function decreases with each iteration, showing that the model is learning.
Python
plt.plot(model.cost_history)
plt.title("Cost Function Convergence")
plt.xlabel("Iterations")
plt.ylabel("Cost")
plt.grid(True)
plt.show()
Output:
Cross-entropy cost function convergence (plot)
- A high accuracy score means the model is correctly classifying most test examples.
- If accuracy is low, it might indicate the need for more data, more iterations, or feature engineering.
- The decreasing cost function plot also confirms that the model is learning effectively during training.
Practical Considerations and Limitations
- Feature Scaling: Logistic regression is sensitive to feature scales. Features with larger magnitudes can dominate the learning process, leading to slow convergence or poor performance, which is why the StandardScaler normalization above is important.
- Learning Rate Selection: A learning rate that is too high causes divergence, while one that is too low results in slow convergence. A good starting point is 0.01-0.1, but this may need adjustment based on the dataset.
- Linear Decision Boundary: Logistic regression assumes a linear relationship between the features and the log-odds. For non-linear relationships, feature engineering or polynomial features may be necessary.
- Multicollinearity: Highly correlated features can cause numerical instability in weight updates. Consider feature selection or regularization in such cases (a regularized gradient is sketched below).
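For illustration, here is a minimal sketch of L2-regularized gradients that could replace the dw/db computation inside fit(). The helper name l2_regularized_gradients and the strength lambda_ are assumptions for this example, not part of the original class:
Python
import numpy as np

def l2_regularized_gradients(X, y, h, weights, lambda_=0.1):
    """Gradients of the cross-entropy loss with an added L2 penalty on the weights."""
    m = len(y)
    # Penalize large weights; the bias gradient is conventionally left unregularized
    dw = (1 / m) * (np.dot(X.T, (h - y)) + lambda_ * weights)
    db = (1 / m) * np.sum(h - y)
    return dw, db
Plugging these gradients into the same update rule shrinks the weights toward zero, which helps when features are strongly correlated.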