
Building a Logistic Regression Algorithm from Scratch in Python

Without relying on high-level libraries

ANSHUMAN JHA

Table of Contents
1. Introduction to Logistic Regression
2. Understanding Logistic Regression
3. Mathematical Formulation
4. The Structure of a Logistic Regression Algorithm
5. Implementing Logistic Regression from Scratch in Python
   a. Data Preparation
   b. Sigmoid Function
   c. Cost Function and Gradient
   d. Gradient Descent Algorithm
   e. Prediction
   f. Putting It All Together
   g. Visualizing the Results
   h. Diagram: Logistic Regression Decision Boundary
6. Conclusion


1. Introduction to Logistic Regression


Logistic regression is a fundamental machine learning algorithm used for binary classification problems. Unlike linear
regression, which predicts a continuous output, logistic regression predicts a probability that lies between 0 and 1. In
this article, we will explore the fundamental concepts and steps involved in building a logistic regression algorithm from
scratch in Python without relying on high-level libraries.

2. Understanding Logistic Regression


Logistic regression is used when the dependent variable is categorical. For binary classification, the output is either 0 or 1. The logistic function (or sigmoid function) is used to model the probability of the default class.

The sigmoid function is defined as:

   σ(z) = 1 / (1 + e^(−z))

where z is a linear combination of the input features.
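
As a quick numerical illustration (a minimal sketch, not part of the original implementation), note how the sigmoid squashes any real input into the open interval (0, 1):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# sigmoid(z) -> 0 as z -> -inf, equals 0.5 at z = 0, and -> 1 as z -> +inf
print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# approx: [0.0000454, 0.2689, 0.5, 0.7311, 0.99995]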

3. Mathematical Formulation
For a given set of features X and labels y:

1. Hypothesis Function: The hypothesis of logistic regression is defined using the sigmoid function:

   h_θ(x) = σ(θᵀx) = 1 / (1 + e^(−θᵀx))

where θ is the vector of parameters (weights), and x is the feature vector.

2. Cost Function: The cost function for logistic regression is derived from maximum likelihood estimation:

   J(θ) = −(1/m) · Σᵢ₌₁..ₘ [ y⁽ⁱ⁾ · log(h_θ(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾) · log(1 − h_θ(x⁽ⁱ⁾)) ]

where m is the number of training examples.

3. Gradient Descent: To minimize the cost function, we use gradient descent, repeatedly applying the update

   θⱼ := θⱼ − α · (1/m) · Σᵢ₌₁..ₘ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾

where α is the learning rate. Using σ′(z) = σ(z)(1 − σ(z)), the gradient can be written in vectorized form as ∇J(θ) = (1/m) · Xᵀ(h_θ(X) − y), which is exactly what the implementation below computes.
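
A quick way to trust these formulas (a verification sketch, not part of the original article) is to compare the analytic gradient with a finite-difference approximation of the cost on a tiny random problem:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(X, y, theta):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

rng = np.random.default_rng(0)
X = np.c_[np.ones(20), rng.standard_normal((20, 2))]   # intercept + 2 features
y = (X @ np.array([0.5, 1.5, -2.0]) > 0).astype(float)  # same generating rule as below
theta = rng.standard_normal(3)

# Analytic gradient: (1/m) * X^T (h - y)
analytic = X.T @ (sigmoid(X @ theta) - y) / len(y)

# Central finite differences on the cost, one parameter at a time
eps = 1e-6
numeric = np.array([
    (cost(X, y, theta + eps * np.eye(3)[j]) - cost(X, y, theta - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])
print(np.max(np.abs(analytic - numeric)))  # should be tiny, around 1e-9 or smaller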


4. The Structure of a Logistic Regression Algorithm

The implementation follows a simple pipeline, with each step corresponding to a function or a key part of the process in the implementation below:

   Data Preparation → Sigmoid Function → Cost Function and Gradient → Gradient Descent → Prediction → Putting It All Together → Visualizing the Results → Decision Boundary Diagram

[Diagram: flowchart of the steps and sub-steps above; original figure omitted]


5. Implementing Logistic Regression from Scratch in Python
Let's implement a simple logistic regression algorithm in Python, step by step.

Step 1: Data Preparation

First, we need to prepare our dataset. For simplicity, we will use a synthetic dataset whose labels come from a known linear rule, so the two classes are linearly separable.

import numpy as np

# Generate a synthetic dataset
np.random.seed(0)
num_features = 2
num_samples = 100

# Generate random features
X = np.random.randn(num_samples, num_features)

# Generate labels (0 or 1) by thresholding a known linear function of the features
true_weights = np.array([1.5, -2.0])
bias = 0.5
linear_combination = np.dot(X, true_weights) + bias
y = (linear_combination > 0).astype(int)

# Add an intercept column to X
X = np.hstack((np.ones((num_samples, 1)), X))
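
A quick sanity check (optional, not in the original code) confirms the shapes and class balance before training:

print(X.shape)   # (100, 3): intercept column plus two features
print(y.shape)   # (100,)
print(y.mean())  # fraction of positive examples; should be well away from 0 and 1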

Step 2: Sigmoid Function

The sigmoid function maps the linear combination of inputs and weights to a probability between 0 and 1.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
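
For very negative z, np.exp(-z) can overflow a 64-bit float and trigger a runtime warning. A common remedy (an optional sketch, not part of the original article) is to clip z before exponentiating:

def sigmoid_stable(z):
    # Clipping keeps np.exp within a safe range; for |z| > 50 the sigmoid is
    # already within ~2e-22 of 0 or 1, so the result is effectively unchanged
    z = np.clip(z, -50, 50)
    return 1 / (1 + np.exp(-z))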

Step 3: Cost Function and Gradient

The cost function measures how well the logistic regression model fits the data. The gradient of the cost function is used to update the weights.

def compute_cost_and_gradient(X, y, weights):
    m = X.shape[0]
    predictions = sigmoid(np.dot(X, weights))
    cost = -1/m * np.sum(y * np.log(predictions) + (1 - y) * np.log(1 - predictions))
    gradient = 1/m * np.dot(X.T, (predictions - y))
    return cost, gradient
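
If the model becomes very confident, predictions can reach exactly 0 or 1 in floating point, and np.log then returns -inf. A small epsilon guard (an optional variant, not in the original article) keeps the cost finite:

def compute_cost_and_gradient_safe(X, y, weights, eps=1e-12):
    m = X.shape[0]
    predictions = sigmoid(np.dot(X, weights))
    # Clip probabilities away from exactly 0 and 1 so the logs stay finite
    p = np.clip(predictions, eps, 1 - eps)
    cost = -1/m * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    gradient = 1/m * np.dot(X.T, (predictions - y))
    return cost, gradient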

Step 4: Gradient Descent
The gradient descent algorithm iteratively updates the weights to minimize the cost function.

def gradient_descent(X, y, weights, learning_rate, num_iterations):
    cost_history = []

    for i in range(num_iterations):
        cost, gradient = compute_cost_and_gradient(X, y, weights)
        weights -= learning_rate * gradient
        cost_history.append(cost)

        if i % 100 == 0:  # Print cost every 100 iterations
            print(f'Iteration {i}: Cost {cost}')

    return weights, cost_history

Step 5: Prediction
The prediction function uses the trained weights to predict class labels for new data points.

def predict(X, weights, threshold=0.5):
    probabilities = sigmoid(np.dot(X, weights))
    return (probabilities >= threshold).astype(int)
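
For example (a hypothetical new observation, not from the original article), a new point must include the intercept term before being passed to predict; here trained_weights refers to the weights produced in Step 6 below:

x_new = np.array([[1.0, 0.3, -1.2]])  # [intercept, feature 1, feature 2]
print(predict(x_new, trained_weights))  # array containing a single 0 or 1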

Step 6: Putting It All Together

Now, let's combine everything and train our logistic regression model.

# Initialize weights
weights = np.zeros(X.shape[1])

# Set hyperparameters
learning_rate = 0.1
num_iterations = 1000

# Train the model using gradient descent
trained_weights, cost_history = gradient_descent(X, y, weights, learning_rate, num_iterations)

# Make predictions on the training data
predictions = predict(X, trained_weights)

# Calculate training accuracy
accuracy = np.mean(predictions == y)
print(f'Accuracy: {accuracy * 100}%')
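
As an extra check (not in the original article), we can compare the learned weights with the weights that generated the labels. Because this synthetic dataset is perfectly separable, gradient descent keeps growing the weights rather than converging to the generating values, so only the direction of the weight vector should roughly agree:

generating = np.array([bias, *true_weights])  # [0.5, 1.5, -2.0], including the intercept
# Normalize both vectors to compare directions rather than magnitudes
print(generating / np.linalg.norm(generating))
print(trained_weights / np.linalg.norm(trained_weights))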

Step 7: Visualizing the Results

Let's visualize the cost function over the iterations to check that gradient descent is working correctly; with a suitable learning rate, the cost should decrease steadily.

import matplotlib.pyplot as plt

plt.plot(cost_history)
plt.xlabel('Iteration')
plt.ylabel('Cost')
plt.title('Cost Function over Iterations')
plt.show()

Step 8: Diagram: Logistic Regression Decision Boundary
To understand the model's decision boundary, let's plot it along with the data points.

def plot_decision_boundary(X, y, weights):
    x1_min, x1_max = X[:, 1].min(), X[:, 1].max()
    x2_min, x2_max = X[:, 2].min(), X[:, 2].max()

    xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))

    # Build grid points with an intercept column, matching the training design matrix
    grid = np.c_[np.ones(xx1.ravel().shape[0]), xx1.ravel(), xx2.ravel()]
    probabilities = sigmoid(np.dot(grid, weights)).reshape(xx1.shape)

    plt.contourf(xx1, xx2, probabilities, levels=[0, 0.5, 1], cmap='coolwarm', alpha=0.6)
    plt.scatter(X[:, 1], X[:, 2], c=y, cmap='coolwarm', edgecolors='k')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('Logistic Regression Decision Boundary')
    plt.show()

plot_decision_boundary(X, y, trained_weights)
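
Because the model is linear in the features, the 0.5 probability contour is simply the line θ₀ + θ₁·x₁ + θ₂·x₂ = 0. As an optional cross-check (a sketch, not in the original article), we can plot that line directly:

# Solve theta0 + theta1*x1 + theta2*x2 = 0 for x2 and overlay the line on the data
x1_line = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
x2_line = -(trained_weights[0] + trained_weights[1] * x1_line) / trained_weights[2]
plt.scatter(X[:, 1], X[:, 2], c=y, cmap='coolwarm', edgecolors='k')
plt.plot(x1_line, x2_line, 'k--')
plt.title('Analytic Decision Boundary')
plt.show()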

6. Conclusion
In this article, we built a logistic regression algorithm from scratch in Python. We started with the fundamental concepts, implemented the sigmoid function, cost function, gradient descent, and prediction function, and finally visualized the results. This step-by-step approach helps in understanding the inner workings of logistic regression and provides a solid foundation for more advanced machine learning algorithms.

Constructive comments and feedback are welcome.

