1. Simple Linear Regression
● Linear regression is the simplest machine learning algorithm you'll encounter
○ Especially simple linear regression
● It is a simple algorithm, initially developed in the field of statistics, that was studied as a model for understanding the relationship between input and output variables
● It is a linear model - assumes a linear relationship between input variables (X) and
the output variable (y)
● Used to predict continuous values (e.g., weight, price...)
Simple vs. Multiple linear regression
● Simple linear regression solves problems with only one input feature
● Multiple linear regression solves problems with multiple input features
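● To make the distinction concrete, the two model forms can be written as follows (the coefficient notation here is assumed for illustration and matches the B0/B1 naming used later):
Simple: y = B0 + B1 * x
Multiple: y = B0 + B1 * x1 + B2 * x2 + ... + Bn * xn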
Assumptions
1. Linear Assumption — model assumes the relationship between variables is linear
2. No Noise — model assumes that the input and output variables are not noisy — so
remove outliers if possible
3. No Collinearity — model will overfit when you have highly correlated input
variables
4. Normal Distribution — the model will make more reliable predictions if your input
and output variables are normally distributed. If that’s not the case, try using some
transforms on your variables to make them more normal-looking
5. Rescaled Inputs — rescale the input variables with standardization or normalization to get more reliable predictions (see the sketch after this list)
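A minimal sketch of how some of these assumptions might be checked or addressed - the dummy data and the specific choices below (correlation matrix, log transform, StandardScaler) are illustrative assumptions, not part of the original notebook:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Dummy two-feature input, assumed for illustration only
X_demo = np.random.normal(loc=50, scale=10, size=(100, 2))

# No Collinearity - inspect pairwise correlations between input features
print(np.corrcoef(X_demo, rowvar=False))

# Normal Distribution - a log transform can make skewed variables look more normal
X_demo_log = np.log1p(X_demo)

# Rescaled Inputs - standardize features to zero mean and unit variance
X_demo_scaled = StandardScaler().fit_transform(X_demo)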
Take-home point
● Training a simple linear regression model is as simple as solving a couple of
equations
Math behind
● In a nutshell, simple linear regression boils down to two coefficients that you need to find in order to solve the line equation:
Line equation:
● y = B0 + B1 * x
B1 coefficient:
● This coefficient has to be calculated first
● It tells you the slope of the line
● B1 = sum((x_i - mean(x)) * (y_i - mean(y))) / sum((x_i - mean(x))^2)
B0 coefficient:
● This coefficient relies on the slope
● It represents the Y-intercept - the location at which the line intercepts the Y-axis
● B0 = mean(y) - B1 * mean(x)
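Worked example:
● As a quick worked example (the numbers are made up for illustration): take x = [1, 2, 3] and y = [2, 4, 6], so mean(x) = 2 and mean(y) = 4
● B1 = ((1 - 2)(2 - 4) + (2 - 2)(4 - 4) + (3 - 2)(6 - 4)) / ((1 - 2)^2 + (2 - 2)^2 + (3 - 2)^2) = (2 + 0 + 2) / 2 = 2
● B0 = 4 - 2 * 2 = 0, so the fitted line is y = 2x, which passes exactly through all three points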
● Let's implement simple linear regression with pure Numpy next
Implementation
● You'll need only Numpy to implement the logic
● Matplotlib is used for optional visualizations
In [1]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams
rcParams['figure.figsize'] = (14, 7)
rcParams['axes.spines.top'] = False
rcParams['axes.spines.right'] = False
● The SimpleLinearRegression class is written to follow the familiar Scikit-Learn
syntax
● The coefficients are set to None at the start - in the __init__() method
● The fit() method calculates the coefficients
● The predict() method essentially implements the line equation
○ Before it does so, it makes sure the coefficients have been
calculated
In [2]:
class SimpleLinearRegression:
    '''
    A class which implements the simple linear regression model.
    '''
    def __init__(self):
        # Coefficients are unknown until fit() is called
        self.b0 = None
        self.b1 = None

    def fit(self, X, y):
        '''
        Calculates the slope and intercept coefficients.

        :param X: array, single feature
        :param y: array, true values
        :return: None
        '''
        # B1 = sum((x_i - mean(x)) * (y_i - mean(y))) / sum((x_i - mean(x))^2)
        numerator = np.sum((X - np.mean(X)) * (y - np.mean(y)))
        denominator = np.sum((X - np.mean(X)) ** 2)
        self.b1 = numerator / denominator
        # B0 = mean(y) - B1 * mean(x)
        self.b0 = np.mean(y) - self.b1 * np.mean(X)

    def predict(self, X):
        '''
        Makes predictions using the simple line equation.

        :param X: array, single feature
        :return: array, predicted values
        '''
        # Check against None explicitly, so legitimate zero-valued coefficients don't raise
        if self.b0 is None or self.b1 is None:
            raise Exception('Please call `SimpleLinearRegression.fit(X, y)` before making predictions.')
        return self.b0 + self.b1 * X
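● As a quick sanity check (the toy values below are made up and not part of the original notebook), the class should recover the exact coefficients on a perfectly linear dataset:

toy_X = np.array([1, 2, 3, 4])
toy_y = 2 * toy_X + 1  # exact line with B0 = 1, B1 = 2

toy_model = SimpleLinearRegression()
toy_model.fit(toy_X, toy_y)
toy_model.b0, toy_model.b1  # expected: (1.0, 2.0)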
Testing
● Let's create some dummy data
○ X contains a list of numbers between 1 and 300 (1, 2, 3, ..., 299,
300)
○ y contains normally distributed values centered around X with
standard deviation of 20
● The source data is then visualized:
In [13]:
X = np.arange(start=1, stop=301)
y = np.random.normal(loc=X, scale=20)
plt.scatter(X, y, s=200, c='#087E8B', alpha=0.65)
plt.title('Source dataset', size=20)
plt.xlabel('X', size=14)
plt.ylabel('Y', size=14)
plt.show()
● For validation's sake, we'll split the dataset into training and testing parts:
In [4]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
● You can now initialize and train the model, and afterwards make predictions:
In [5]:
model = SimpleLinearRegression()
model.fit(X_train, y_train)
preds = model.predict(X_test)
● Here's how you can get the coefficients:
In [6]:
model.b0, model.b1
● These are the predictions:
In [7]:
preds
● And these are the original values
● The original and predicted values differ, but not by much
In [8]:
y_test
● You can now evaluate the model by calculating RMSE
○ Root Mean Squared Error
● On average, the model is wrong by roughly 21.35 units
● This makes sense, as the standard deviation of the dataset is 20
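● For reference, RMSE is the square root of the mean squared difference between the true and predicted values (standard definition, not spelled out in the original):
RMSE = sqrt((1 / n) * sum((y_i - y_pred_i)^2))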
In [9]:
from sklearn.metrics import mean_squared_error
rmse = lambda y, y_pred: np.sqrt(mean_squared_error(y, y_pred))
rmse(y_test, preds)
Visualize the Best-Fit line
● If you re-train the model on the entire dataset and then make predictions for the entire dataset, you'll get the best-fit line
● You can then visualize this line with Matplotlib:
In [14]:
model_all = SimpleLinearRegression()
model_all.fit(X, y)
preds_all = model_all.predict(X)
plt.scatter(X, y, s=200, c='#087E8B', alpha=0.65, label='Source data')
plt.plot(X, preds_all, color='#000000', lw=3, label=f'Best fit line > B0 = {model_all.b0:.2f}, B1 = {model_all.b1:.2f}')
plt.title('Best fit line', size=20)
plt.xlabel('X', size=14)
plt.ylabel('Y', size=14)
plt.legend()
plt.show()
Comparison with Scikit-Learn
● We want to know if our model is good, so let's compare it with the LinearRegression model from Scikit-Learn
● The input data must be reshaped beforehand:
In [11]:
from sklearn.linear_model import LinearRegression
sk_model = LinearRegression()
sk_model.fit(np.array(X_train).reshape(-1, 1), y_train)
sk_preds = sk_model.predict(np.array(X_test).reshape(-1, 1))
sk_model.intercept_, sk_model.coef_
● Our coefficients were (-1.357484948041531, 1.0026529556316826)
● Not identical, but within a margin of error
● Let's check the RMSE:
In [12]:
rmse(y_test, sk_preds)
21.351850699502783
● Ours was 21.351850699502787, so nearly identical.