Polynomial Regression in R Programming
Last Updated: 04 Jul, 2025
Polynomial Regression is an extension of linear regression in which the relationship between the dependent variable (y) and the independent variable (x) is modeled as an nth-degree polynomial.
Equation:
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \ldots + \beta_n x^n + \varepsilon
- y: Dependent variable (the response being modeled)
- \beta_0: Intercept (expected value of y when x = 0)
- \beta_1, \beta_2, \ldots, \beta_n: Coefficients for each power of x
- x, x^2, \ldots, x^n: Input variable and its powers
- \varepsilon: Error term (random noise not captured by the model)
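Although the equation is non-linear in x, it is linear in the coefficients, which is why ordinary least squares can estimate them. The following is a minimal sketch on simulated data (the true coefficients 2, 3 and -0.5, the noise level and the variable names are assumptions chosen for illustration, not values from this article):

```r
# Simulated data: true curve is y = 2 + 3x - 0.5x^2 plus Gaussian noise
set.seed(1)
x <- seq(-3, 3, length.out = 200)
y <- 2 + 3 * x - 0.5 * x^2 + rnorm(length(x), sd = 0.2)

# The model is linear in beta_0, beta_1, beta_2, so lm() fits it by least squares
fit <- lm(y ~ x + I(x^2))

# Estimated coefficients, in the order intercept, x, x^2
beta_hat <- unname(coef(fit))
print(round(beta_hat, 2))
```

The estimates should land close to the coefficients used to generate the data, with small differences caused by the noise term.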
Why Polynomial Regression is Needed
Linear regression assumes a straight-line relationship, so it fails to capture the underlying trend when the data follows a non-linear pattern. Fitting a straight line to such data leads to:
- Low prediction accuracy: The model makes poor estimates of the target values.
- High error rates: The difference between predicted and actual values is large.
- Underfitting: The model is too simple to capture the underlying pattern in the data.
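The effect of underfitting can be sketched on simulated data with a curved trend. The data-generating curve, the sample size and the helper rmse() below are assumptions made for illustration; the errors are computed on the training data only, so they show fit quality rather than predictive accuracy:

```r
# Simulated U-shaped data (hypothetical, for illustration only)
set.seed(42)
x <- runif(300, 0, 10)
y <- 5 + 0.8 * (x - 5)^2 + rnorm(300, sd = 1)

# Helper: root mean squared error between actual and predicted values
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

linear_fit <- lm(y ~ x)           # straight line
quad_fit   <- lm(y ~ poly(x, 2))  # degree-2 polynomial

rmse_linear <- rmse(y, fitted(linear_fit))
rmse_quad   <- rmse(y, fitted(quad_fit))
cat("Linear RMSE:   ", rmse_linear, "\n")
cat("Quadratic RMSE:", rmse_quad, "\n")
```

The straight line cannot follow the curvature, so its error stays well above the noise level, while the quadratic fit approaches it.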
Implementing Polynomial Regression in R
We can implement Polynomial Regression in R by following a series of steps to prepare the data, build the model and evaluate its performance.
1. Installing Required Packages
We install the tidyverse and caret packages for data manipulation, visualization and machine learning tasks.
- tidyverse: Used for data wrangling and plotting.
- caret: Used for simplifying training, tuning and evaluating models.
R
install.packages("tidyverse")
install.packages("caret")
library(tidyverse)
library(caret)
2. Loading the Dataset
We load the Boston housing dataset from the MASS package.
- Boston: Contains housing data for regression modeling.
R
library(MASS)
data("Boston")
3. Splitting the Data
We split the data into a training set (80%) and a test set (20%) using createDataPartition() from the caret package.
- createDataPartition(): Used to randomly split the data while roughly preserving the distribution of the outcome variable (here medv).
R
set.seed(123)
trainIndex <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)
train.data <- Boston[trainIndex, ]
test.data <- Boston[-trainIndex, ]
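For comparison, a simple random split can be sketched in base R with sample(). This is a different technique from createDataPartition(): it does not stratify on medv, so the two sets may differ slightly more in their outcome distribution. The names train_base and test_base are hypothetical and chosen so they do not overwrite the objects above:

```r
library(MASS)  # provides the Boston data frame

set.seed(123)
n <- nrow(Boston)

# Draw 80% of the row indices at random, without stratification
train_rows <- sample(seq_len(n), size = floor(0.8 * n))
train_base <- Boston[train_rows, ]
test_base  <- Boston[-train_rows, ]

# Number of rows in each set
c(train = nrow(train_base), test = nrow(test_base))
```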
4. Building the Polynomial Regression Model
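We build two polynomial regression models, one of degree 2 and one of degree 5, using lm().
- lm(): Fits linear models. A polynomial regression is still linear in its coefficients, so lm() can fit it.
- I(): Protects arithmetic such as lstat^2 inside a formula, where ^ would otherwise be read as a formula operator.
- poly(): Generates orthogonal polynomials by default (raw = FALSE) and raw powers when raw = TRUE.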
R
model2 <- lm(medv ~ lstat + I(lstat^2), data = train.data)
model5 <- lm(medv ~ poly(lstat, 5, raw = TRUE), data = train.data)
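The three ways of writing a polynomial term describe the same fitted curve. As a minimal sketch (fitted on the full Boston data rather than train.data so that it is self-contained; the object names are hypothetical), the raw forms give identical coefficients, while the orthogonal form gives different coefficients but the same fitted values:

```r
library(MASS)  # provides the Boston data frame

fit_I    <- lm(medv ~ lstat + I(lstat^2), data = Boston)           # explicit powers
fit_raw  <- lm(medv ~ poly(lstat, 2, raw = TRUE), data = Boston)   # raw powers via poly()
fit_orth <- lm(medv ~ poly(lstat, 2), data = Boston)               # orthogonal basis

# I() and poly(raw = TRUE) are the same parameterisation
same_coefs <- all.equal(unname(coef(fit_I)), unname(coef(fit_raw)))

# The orthogonal basis changes the coefficients but not the fitted curve
same_fitted <- all.equal(unname(fitted(fit_I)), unname(fitted(fit_orth)))

print(c(same_coefs = isTRUE(same_coefs), same_fitted = isTRUE(same_fitted)))
```

Orthogonal polynomials are usually preferred for higher degrees because the raw powers of a predictor are strongly correlated with one another, which can make individual coefficient estimates unstable.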
5. Making Predictions
We make predictions on the test data using the predict() function.
- predict(): Generates predicted values based on the model and new data.
R
pred2 <- predict(model2, test.data)
pred5 <- predict(model5, test.data)
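predict() matches the columns of the new data to the predictor names used in the formula, so any data frame passed to it needs a column called lstat. A minimal sketch, assuming three hypothetical lstat values and a model fitted on the full Boston data, also requests 95% prediction intervals through the interval argument of predict():

```r
library(MASS)  # provides the Boston data frame

fit <- lm(medv ~ lstat + I(lstat^2), data = Boston)

# Hypothetical new observations; the column name must match the predictor
new_obs <- data.frame(lstat = c(5, 10, 20))

# Point predictions with lower and upper 95% prediction limits
pred_int <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)
print(pred_int)
```

The result is a matrix with columns fit, lwr and upr, one row per new observation.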
6. Evaluating the Model
We evaluate model accuracy on the test set using RMSE, R² and MAE with the postResample() function.
- postResample(): Calculates RMSE, R-squared and MAE from predicted and actual values.
R
postResample(pred2, test.data$medv)
postResample(pred5, test.data$medv)
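The same metrics can be computed by hand, which makes their definitions explicit. The sketch below assumes a simple random split in base R (not the createDataPartition() split above, so the numbers will differ) and computes R-squared as the squared correlation between predictions and observations, which is how caret documents its default:

```r
library(MASS)  # provides the Boston data frame

# Simple random 80/20 split (an assumption for this sketch)
set.seed(123)
idx <- sample(nrow(Boston), floor(0.8 * nrow(Boston)))

fit  <- lm(medv ~ lstat + I(lstat^2), data = Boston[idx, ])
obs  <- Boston$medv[-idx]
pred <- predict(fit, newdata = Boston[-idx, ])

metrics <- c(
  RMSE     = sqrt(mean((obs - pred)^2)),  # root mean squared error
  Rsquared = cor(pred, obs)^2,            # squared correlation
  MAE      = mean(abs(obs - pred))        # mean absolute error
)
print(metrics)
```

Lower RMSE and MAE and higher R-squared indicate a better fit on the held-out data.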
7. Visualizing the Polynomial Fit
We use ggplot2 to plot the data and overlay the polynomial regression curve.
- stat_smooth(): Adds a smoothed conditional mean (like a regression curve) to the plot.
R
ggplot(train.data, aes(lstat, medv)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 5, raw = TRUE))
The resulting graph shows a scatterplot of medv vs. lstat with a degree-5 polynomial regression curve overlaid using stat_smooth(). It shows visually how well the model captures the non-linear relationship in the data.
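A similar curve can be drawn without ggplot2 by predicting over an evenly spaced grid of lstat values and adding the result to a base R scatterplot. This is a sketch fitted on the full Boston data; the grid size of 100 points and the object names are assumptions:

```r
library(MASS)  # provides the Boston data frame

fit5 <- lm(medv ~ poly(lstat, 5, raw = TRUE), data = Boston)

# Evenly spaced lstat values covering the observed range
grid <- data.frame(lstat = seq(min(Boston$lstat), max(Boston$lstat), length.out = 100))
grid$medv_hat <- predict(fit5, newdata = grid)

# Scatterplot of the data with the fitted degree-5 curve on top
plot(Boston$lstat, Boston$medv, pch = 16, col = "grey50",
     xlab = "lstat", ylab = "medv")
lines(grid$lstat, grid$medv_hat, col = "blue", lwd = 2)
```

Unlike stat_smooth(), this sketch does not draw a confidence band around the curve.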
Applications of Polynomial Regression
Polynomial regression is commonly applied in fields where relationships between variables are inherently non-linear, such as:
- Sales forecasting: Models non-linear trends in revenue or product demand over time.
- House price prediction: Captures complex relationships between property features and price.
- Weather modeling: Fits curved patterns in temperature, rainfall, or pollution data.
- Engineering analysis: Models physical phenomena like stress-strain or motion trajectories.
- Medical growth tracking: Analyzes non-linear growth patterns in biological data.