
Logistic Regression With Polynomial Features

Last Updated : 27 May, 2024

Logistic regression with polynomial features is a technique used to model complex, non-linear relationships between input variables and the target variable. This approach involves transforming the original input features into higher-degree polynomial features, which can help capture intricate patterns in the data and improve the model's predictive performance.

In this article, we will look at the significance of logistic regression with polynomial features and its implementation in scikit-learn.

Understanding Polynomial Features with Logistic Regression

Polynomial features are created by transforming the original input features into a new set of features that include not only the original features but also their polynomial combinations up to a specified degree. This transformation allows logistic regression, which is inherently a linear model, to capture non-linear relationships between the input variables and the target variable.

The degree of the polynomial determines the highest power to which the features are raised. Typically, degrees of 2 or 3 are used, as higher degrees can lead to overfitting.

This approach is particularly useful when the decision boundary between classes is non-linear: a model that is linear in the polynomial features traces a curved boundary in the original feature space, letting it fit such data more accurately.

The Stone-Weierstrass theorem guarantees that any continuous function on a closed interval can be approximated arbitrarily well by polynomials. In theory, then, polynomial logistic regression can approximate any continuous decision boundary; in practice, the choice of polynomial degree is crucial to avoid overfitting or underfitting.
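Concretely, for two inputs and degree 2, the model remains linear in the transformed features. In standard notation (a sketch of the general form, not code from this article):

$$P(y = 1 \mid x) = \sigma\left(w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1^2 + w_4 x_1 x_2 + w_5 x_2^2\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

The decision boundary, where the predicted probability equals 0.5, is then a conic section in the original (x1, x2) plane rather than a straight line.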

Utilizing Logistic Regression with Polynomial Features

To implement polynomial logistic regression in scikit-learn, first transform your data with the PolynomialFeatures class, then fit a logistic regression model on the transformed features.

The degree of the polynomial is specified when creating the PolynomialFeatures object.

  • Increasing the polynomial order lets the decision boundary bend to follow non-linear class structure, so a higher-order polynomial can classify such data more accurately.
  • However, too high a polynomial order can lead to overfitting, while too low an order can result in underfitting. Finding a suitable polynomial order is therefore important for good model performance; a cross-validation sketch for this appears after the pipeline section below.

Steps to Implement Polynomial Logistic Regression:

  1. Transforming Features: Use PolynomialFeatures to transform the original features into polynomial features. For example, with two input variables and a degree of 2, the transformed features include the original features, their squares, and their pairwise product (see the short sketch after this list).
  2. Fitting the Model: Fit a logistic regression model to the transformed features. This can be done using scikit-learn's LogisticRegression class.
  3. Pipeline (Optional): To streamline the process, a pipeline can be used to combine the transformation and fitting steps into a single object. This helps avoid creating intermediate objects and simplifies the workflow.

Import necessary libraries and generate a synthetic dataset

Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Generate a two-feature synthetic classification dataset
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)
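Optionally, a quick scatter plot (a sketch reusing the matplotlib import above) shows the two classes in the raw feature space:

Python
# Plot the raw two-feature data, colored by class label
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Synthetic classification data')
plt.show()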

Split the dataset into training and testing sets

Python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
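A quick check of the resulting shapes (the 80/20 split follows from test_size=0.2):

Python
print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)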

Generate polynomial features

Python
# Generate polynomial features (degree 2); change the degree as needed
poly = PolynomialFeatures(degree=2)
# Fit the transform on the training set only, then apply the same
# mapping to the test set
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
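As a sanity check, the two original columns expand to six polynomial columns. Note that PolynomialFeatures includes a constant bias column even though LogisticRegression fits its own intercept; passing include_bias=False (a standard PolynomialFeatures parameter) is a common alternative:

Python
# Two original columns expand to six: 1, x1, x2, x1^2, x1*x2, x2^2
print(X_train.shape, "->", X_train_poly.shape)  # (80, 2) -> (80, 6)

# PolynomialFeatures(degree=2, include_bias=False) would drop the constant
# column, since LogisticRegression fits its own intercept by default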

Train the logistic regression model

Python
model = LogisticRegression()
model.fit(X_train_poly, y_train)
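With higher polynomial degrees the solver may need more iterations to converge, and regularization becomes more important. One possible variant (a sketch using the standard max_iter and C parameters of LogisticRegression):

Python
# Allow more solver iterations; a smaller C would regularize the
# polynomial coefficients more strongly and reduce overfitting
model = LogisticRegression(max_iter=1000, C=1.0)
model.fit(X_train_poly, y_train)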

Make predictions and evaluate the model

Python
y_pred = model.predict(X_test_poly)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

Output:

Accuracy: 1.0
Confusion Matrix:
[[10  0]
 [ 0 10]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00        10

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20
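The perfect score here reflects how easily separable this small synthetic dataset is; real data will typically give more modest results. Since logistic regression is probabilistic, class probabilities are also available via the standard predict_proba method (a short sketch):

Python
# Probability of class 1 for the first five test samples
print(model.predict_proba(X_test_poly)[:5, 1])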

Visualize the decision boundary

Python
# Optional: Visualize the decision boundary (for 2D data only)
def plot_decision_boundary(X, y, model, poly):
    # Build a dense grid covering the feature space, with a 1-unit margin
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                         np.arange(y_min, y_max, 0.1))
    # Apply the same polynomial transform to the grid before predicting
    Z = model.predict(poly.transform(np.c_[xx.ravel(), yy.ravel()]))
    Z = Z.reshape(xx.shape)
    # Shade the predicted class regions and overlay the true labels
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
    plt.show()

# Visualize the decision boundary on the test set
plot_decision_boundary(X_test, y_test, model, poly)

Output:

[Figure: Decision boundary of the degree-2 polynomial logistic regression model on the test set]

Creating a Pipeline for Generating Polynomial Features

A pipeline streamlines the workflow by combining the polynomial transformation and the logistic regression model into a single estimator, so the transform is applied automatically during both fitting and prediction.

Python
# Create a pipeline that generates polynomial features and trains a logistic regression model
pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),  # Generate polynomial features
    ('logistic', LogisticRegression())      # Train logistic regression model
])

# Train the pipeline
pipeline.fit(X_train, y_train)
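Because the degree is now a pipeline parameter, it can be tuned by cross-validation, as mentioned earlier. A minimal sketch with GridSearchCV (the parameter name poly__degree follows from the 'poly' step name above):

Python
from sklearn.model_selection import GridSearchCV

# Search over candidate polynomial degrees; 'poly__degree' addresses the
# 'degree' parameter of the pipeline step named 'poly'
param_grid = {'poly__degree': [1, 2, 3, 4]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
search.fit(X_train, y_train)

print("Best degree:", search.best_params_['poly__degree'])
print("Cross-validated accuracy:", search.best_score_)
print("Test accuracy:", search.score(X_test, y_test))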

Advantages and Disadvantages of Logistic Regression With Polynomial Features

Advantages of Logistic Regression With Polynomial Features

  • Polynomial logistic regression can model non-linear relationships, making it more flexible than linear logistic regression.
  • It can improve the classification performance for complex datasets.

Disadvantages of Logistic Regression With Polynomial Features

  • Higher-degree polynomials can lead to overfitting, where the model performs well on training data but poorly on unseen data.
  • A balance must be struck between the complexity of the model (degree of polynomial) and the risk of overfitting.

Conclusion

In summary, polynomial logistic regression is a powerful technique for handling non-linear decision boundaries in classification tasks. By transforming input features into polynomial features, it allows logistic regression models to capture more complex patterns in the data.

