Solving Linear Regression in Python
Linear regression is a widely used statistical method for modeling the relationship between a dependent variable and one or more independent variables. It makes predictions by finding the line that best fits the data. The most common way to fit a linear regression model is the least-squares method, which minimizes the squared error between the predicted and actual values. The equation of a straight line is:
y = mx + b
Where:
- m is the slope of the line.
- b is the intercept, i.e., the value of y when x = 0.

To build a simple linear regression model we need to calculate the slope (m) and the intercept (b) that best fit the data points. These parameters can be calculated using formulas derived from the least-squares criterion. Consider a dataset where the independent attribute is represented by x and the dependent attribute by y.

Slope (m): m = \frac{S_{xy}}{S_{xx}}
Where:
- S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) is the sum of cross-deviations of x and y (proportional to the sample covariance).
- S_{xx} = \sum (x_i - \bar{x})^2 is the sum of squared deviations of x (proportional to the sample variance).
Intercept (b): b = \bar{y} - m \cdot \bar{x}
- Where \bar{x} and \bar{y} are the means of x and y, respectively.

For the dataset used below, x = (1, 2, 3, 4, 5) and y = (7, 14, 15, 18, 19), the means are \bar{x} = 3 and \bar{y} = 14.6, which gives S_{xy} = 28 and S_{xx} = 10. Applying the formulas above:
Slope: m = 28/10 = 2.8
Intercept: b = 14.6 - 2.8 \cdot 3 = 6.2
Therefore the equation of the regression model is y = 2.8x + 6.2, and we can use it to predict y for any given value of x.
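As a quick sanity check, the same coefficients can be obtained with NumPy's np.polyfit, which performs a least-squares fit of a polynomial of the given degree (degree 1 for a straight line). This is a minimal side sketch, separate from the step-by-step implementation below.
Python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([7, 14, 15, 18, 19])

# Fit a degree-1 polynomial (a straight line) by least squares.
# polyfit returns coefficients from highest degree to lowest: [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # approximately 2.8 and 6.2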
Python Implementation
Below is the Python code to confirm the calculations and visualize the results.
Step 1: Import Libraries
First we import all the necessary libraries: numpy and matplotlib for computation and plotting, and sklearn and statsmodels for ready-made regression utilities.
Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
Step 2: Define Dataset and Compute Slope and Intercept
Next we calculate the slope (b1) and intercept (b0) of the regression line using the least-squares method. We also create a scatter plot of the original data points to visualize the relationship between x and y.
Python
x = np.array([1, 2, 3, 4, 5])
y = np.array([7, 14, 15, 18, 19])
n = np.size(x)

# Means of x and y
x_mean = np.mean(x)
y_mean = np.mean(y)

# Computational form of the deviation sums:
# Sxy = sum(x*y) - n*x_mean*y_mean, Sxx = sum(x*x) - n*x_mean^2
Sxy = np.sum(x*y) - n*x_mean*y_mean
Sxx = np.sum(x*x) - n*x_mean*x_mean

# Least-squares estimates of slope and intercept
b1 = Sxy/Sxx
b0 = y_mean - b1*x_mean
print('slope b1 is', b1)
print('intercept b0 is', b0)

plt.scatter(x, y)
plt.xlabel('Independent variable X')
plt.ylabel('Dependent variable y')
plt.show()
Output:
The printed slope is 2.8 and the intercept is 6.2, matching the hand calculation, and the scatter plot of the data points is displayed.
Step 3: Plot Data Points and Regression Line
Now that we have the regression equation, we use it to predict the y values for each x. Then we plot the original points in red and the regression line in green to show the fit.
Python
# Predicted values from the fitted line
y_pred = b1 * x + b0

plt.scatter(x, y, color='red')      # original data points
plt.plot(x, y_pred, color='green')  # fitted regression line
plt.xlabel('X')
plt.ylabel('y')
plt.show()
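We can also use the fitted coefficients to predict y for a value of x outside the training data. The value x = 6 below is an assumed example for illustration, not part of the original dataset.
Python
# b1 and b0 come from Step 2; x = 6 is a hypothetical new input.
x_new = 6
y_new = b1 * x_new + b0
print('predicted y for x =', x_new, 'is', y_new)  # 2.8*6 + 6.2 = 23.0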

Step 4: Evaluate the Model
To evaluate how well our model fits the data we calculate the sum of squared errors (SSE), the mean squared error (MSE = SSE/n), the root mean square error (RMSE = \sqrt{MSE}) and the coefficient of determination R^2 = 1 - SSE/SS_t, where SS_t = \sum (y_i - \bar{y})^2 is the total sum of squares. These metrics tell us how far the predictions are from the actual values and how much of the variation in y the model explains.
Python
# Residuals and sum of squared errors
error = y - y_pred
se = np.sum(error**2)
print('squared error is', se)

# Mean squared error and root mean square error
mse = se/n
print('mean squared error is', mse)
rmse = np.sqrt(mse)
print('root mean square error is', rmse)

# Coefficient of determination (R^2)
SSt = np.sum((y - y_mean)**2)
R2 = 1 - (se/SSt)
print('R square is', R2)
Output:
The squared error is 10.8, the MSE is 2.16, the RMSE is about 1.47 and R^2 is about 0.88. An R^2 of roughly 0.88 means the fitted line explains most of the variation in y, indicating a good fit and reasonably accurate predictions.
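The sklearn and statsmodels imports from Step 1 can be used to cross-check these results. Below is a minimal sketch, assuming the same x and y arrays defined earlier, that refits the line with sklearn's LinearRegression and with statsmodels' OLS and recomputes MSE and R^2.
Python
# scikit-learn expects a 2D feature array, so reshape x to a single column.
X = x.reshape(-1, 1)
model = LinearRegression()
model.fit(X, y)
print('sklearn slope:', model.coef_[0])        # ~2.8
print('sklearn intercept:', model.intercept_)  # ~6.2

y_pred_sk = model.predict(X)
print('sklearn MSE:', mean_squared_error(y, y_pred_sk))  # ~2.16
print('sklearn R^2:', r2_score(y, y_pred_sk))            # ~0.88

# statsmodels: add_constant appends an intercept column to x.
X_sm = sm.add_constant(x)
ols_result = sm.OLS(y, X_sm).fit()
print(ols_result.params)  # [intercept, slope], roughly [6.2, 2.8]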