The document outlines the process of using Polynomial Regression to fit a synthetic non-linear dataset defined by the equation y = x^2 + 2x + 3. It includes steps for generating the dataset, visualizing it, implementing Polynomial Regression, and comparing its performance against Simple Linear Regression. The results indicate that Polynomial Regression provides a significantly better fit for the non-linear data compared to the linear model.
Polynomial Regression for Non-Linear Data
Objective: Use Polynomial Regression to fit a non-linear dataset.
Dataset: Create a synthetic dataset with a non-linear relationship (e.g., \(y = x^2 + 2x + 3\)).
Tasks:
1. Generate and explore the dataset.
2. Visualize the data to confirm its non-linear nature.
3. Implement Polynomial Regression (e.g., degree=2) to fit the data.
4. Compare the performance with a Simple Linear Regression model.
import numpy as np
import pandas as pd
# Set random seed for reproducibility
np.random.seed(42)
# Generate synthetic dataset
X = np.linspace(-10, 10, 100).reshape(-1, 1)  # Generate 100 values from -10 to 10
y = X**2 + 2*X + 3 + np.random.normal(0, 5, X.shape)  # Quadratic relationship with Gaussian noise
# Convert to DataFrame
df = pd.DataFrame({'X': X.flatten(), 'y': y.flatten()})
# Display first few rows
print(df.head())
import matplotlib.pyplot as plt
# Scatter plot to visualize the dataset
plt.scatter(df['X'], df['y'], color='blue', label='Data Points')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Scatter Plot of Non-Linear Dataset')
plt.legend()
plt.show()
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
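The polynomial fitting step itself does not appear in the text; a minimal reconstruction is given below, assuming the make_pipeline approach implied by the imports above. The name poly_model is illustrative; poly_degree and y_pred_poly are the variables referenced by the plotting code that follows.
# Fit Polynomial Regression (degree=2) via a PolynomialFeatures + LinearRegression pipeline
poly_degree = 2
poly_model = make_pipeline(PolynomialFeatures(degree=poly_degree), LinearRegression())
poly_model.fit(X, y)
# Predictions using Polynomial Regression
y_pred_poly = poly_model.predict(X)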
# Plot the polynomial fit against the data
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X, y_pred_poly, color='red', label=f'Polynomial Regression (degree={poly_degree})')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Polynomial Regression Fit')
plt.legend()
plt.show()
# Train a Simple Linear Regression model for comparison
linear_model = LinearRegression()
linear_model.fit(X, y)
# Predictions using Linear Regression
y_pred_linear = linear_model.predict(X)
# Compare Polynomial vs. Linear Regression
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X, y_pred_linear, color='green', linestyle='dashed', label='Linear Regression')
plt.plot(X, y_pred_poly, color='red', label=f'Polynomial Regression (degree={poly_degree})')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Comparison: Polynomial vs. Linear Regression')
plt.legend()
plt.show()
The comparison between Polynomial Regression (degree=2) and Simple Linear Regression shows that Polynomial Regression provides a much better fit for this dataset. Linear Regression (the green dashed line) assumes a straight-line relationship and cannot capture the quadratic pattern in the data, leading to large errors and poor predictive performance. Polynomial Regression (the red line) models the curvature directly, reducing the error and improving accuracy. Linear Regression underfits the data because it is too simple, whereas a degree-2 polynomial matches the underlying quadratic relationship, balancing flexibility and generalization and making it the better choice for this dataset.
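To put numbers behind this comparison, a short sketch using scikit-learn's mean_squared_error and r2_score on the training data (the mse_* and r2_* variable names are illustrative, and exact values depend on the noise draw):
from sklearn.metrics import mean_squared_error, r2_score
# Quantify how well each model fits the training data
mse_linear = mean_squared_error(y, y_pred_linear)
r2_linear = r2_score(y, y_pred_linear)
mse_poly = mean_squared_error(y, y_pred_poly)
r2_poly = r2_score(y, y_pred_poly)
print(f"Linear Regression:     MSE = {mse_linear:.2f}, R^2 = {r2_linear:.3f}")
print(f"Polynomial Regression: MSE = {mse_poly:.2f}, R^2 = {r2_poly:.3f}")
The polynomial model's MSE should land near the injected noise variance (5^2 = 25), while the linear model's should be far larger, confirming the visual comparison.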