Solving Linear Regression in Python
Linear regression is a widely used statistical method for modeling the relationship between a dependent variable and one or more independent variables. It makes predictions by finding the line that best fits the data. The most common way to fit a linear regression model is the least-squares method, which minimizes the squared error between the predicted and actual values. The equation of a straight line is:
y = mx + b
Where:
- m is the slope of the line.
- b is the intercept, i.e., the value of y when x = 0.

To build a simple linear regression model we need to calculate the slope (m) and the intercept (b) that best fit the data points. These parameters can be calculated using formulas derived from the least-squares criterion. Consider a dataset where the independent attribute is represented by x and the dependent attribute by y.

Slope (m): m = \frac{S_{xy}}{S_{xx}}
Where:
- S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) is the sum of cross-deviations of x and y (proportional to the sample covariance).
- S_{xx} = \sum (x_i - \bar{x})^2 is the sum of squared deviations of x (proportional to the sample variance).
Intercept (b): b = \bar{y} - m \cdot \bar{x}
- Where \bar{x} and \bar{y} are the means of x and y, respectively.

For the dataset used below, x = (1, 2, 3, 4, 5) and y = (7, 14, 15, 18, 19), the means are \bar{x} = 3 and \bar{y} = 14.6, which gives S_{xy} = 28 and S_{xx} = 10. Applying the formulas above:
Slope: m = 28/10 = 2.8
Intercept: b = 14.6 - 2.8 \cdot 3 = 6.2
Therefore the equation of the regression model is y = 2.8x + 6.2, and we can use it to predict y for any given value of x.
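As a quick sanity check, the same coefficients can be obtained with NumPy's np.polyfit, which performs a least-squares fit of a polynomial of the given degree (degree 1 for a straight line). This is a minimal side sketch, separate from the step-by-step implementation below.
Python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([7, 14, 15, 18, 19])

# Fit a degree-1 polynomial (a straight line) by least squares.
# polyfit returns coefficients from highest degree to lowest: [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # approximately 2.8 and 6.2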
Python Implementation
Below is the Python code to confirm the calculations and visualize the results.
Step 1: Import Libraries
First we import all the necessary libraries: numpy and matplotlib for computation and plotting, and sklearn and statsmodels for ready-made regression utilities.
Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
Step 2: Define Dataset and Compute Slope and Intercept
Next we calculate the slope (b1) and intercept (b0) of the regression line using the least-squares method. We also create a scatter plot of the original data points to visualize the relationship between x and y.
Python
x = np.array([1, 2, 3, 4, 5])
y = np.array([7, 14, 15, 18, 19])
n = np.size(x)

# Means of x and y
x_mean = np.mean(x)
y_mean = np.mean(y)

# Computational form of the deviation sums:
# Sxy = sum(x*y) - n*x_mean*y_mean, Sxx = sum(x*x) - n*x_mean^2
Sxy = np.sum(x*y) - n*x_mean*y_mean
Sxx = np.sum(x*x) - n*x_mean*x_mean

# Least-squares estimates of slope and intercept
b1 = Sxy/Sxx
b0 = y_mean - b1*x_mean
print('slope b1 is', b1)
print('intercept b0 is', b0)

plt.scatter(x, y)
plt.xlabel('Independent variable X')
plt.ylabel('Dependent variable y')
plt.show()
Output:
The printed slope is 2.8 and the intercept is 6.2, matching the hand calculation, and the scatter plot of the data points is displayed.
Step 3: Plot Data Points and Regression Line
Now that we have the regression equation, we use it to predict the y values for each x. Then we plot the original points in red and the regression line in green to show the fit.
Python
# Predicted values from the fitted line
y_pred = b1 * x + b0

plt.scatter(x, y, color='red')      # original data points
plt.plot(x, y_pred, color='green')  # fitted regression line
plt.xlabel('X')
plt.ylabel('y')
plt.show()
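We can also use the fitted coefficients to predict y for a value of x outside the training data. The value x = 6 below is an assumed example for illustration, not part of the original dataset.
Python
# b1 and b0 come from Step 2; x = 6 is a hypothetical new input.
x_new = 6
y_new = b1 * x_new + b0
print('predicted y for x =', x_new, 'is', y_new)  # 2.8*6 + 6.2 = 23.0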

Step 4: Evaluate the Model
To evaluate how well our model fits the data we calculate the sum of squared errors (SSE), the mean squared error (MSE = SSE/n), the root mean square error (RMSE = \sqrt{MSE}) and the coefficient of determination R^2 = 1 - SSE/SS_t, where SS_t = \sum (y_i - \bar{y})^2 is the total sum of squares. These metrics tell us how far the predictions are from the actual values and how much of the variation in y the model explains.
Python
# Residuals and sum of squared errors
error = y - y_pred
se = np.sum(error**2)
print('squared error is', se)

# Mean squared error and root mean square error
mse = se/n
print('mean squared error is', mse)
rmse = np.sqrt(mse)
print('root mean square error is', rmse)

# Coefficient of determination (R^2)
SSt = np.sum((y - y_mean)**2)
R2 = 1 - (se/SSt)
print('R square is', R2)
Output:
The squared error is 10.8, the MSE is 2.16, the RMSE is about 1.47 and R^2 is about 0.88. An R^2 of roughly 0.88 means the fitted line explains most of the variation in y, indicating a good fit and reasonably accurate predictions.
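The sklearn and statsmodels imports from Step 1 can be used to cross-check these results. Below is a minimal sketch, assuming the same x and y arrays defined earlier, that refits the line with sklearn's LinearRegression and with statsmodels' OLS and recomputes MSE and R^2.
Python
# scikit-learn expects a 2D feature array, so reshape x to a single column.
X = x.reshape(-1, 1)
model = LinearRegression()
model.fit(X, y)
print('sklearn slope:', model.coef_[0])        # ~2.8
print('sklearn intercept:', model.intercept_)  # ~6.2

y_pred_sk = model.predict(X)
print('sklearn MSE:', mean_squared_error(y, y_pred_sk))  # ~2.16
print('sklearn R^2:', r2_score(y, y_pred_sk))            # ~0.88

# statsmodels: add_constant appends an intercept column to x.
X_sm = sm.add_constant(x)
ols_result = sm.OLS(y, X_sm).fit()
print(ols_result.params)  # [intercept, slope], roughly [6.2, 2.8]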