Chapter 3: First Application - Linear Regression
I. Introduction
Linear regression is one of the most important regression models used in machine
learning. In a regression model, the output variable to be predicted must be a
continuous variable, such as the weight of a person in a class.
The regression model follows the supervised learning method: to build the model,
we use past, labeled data, which helps us predict the output variable in the
future.
Using the linear regression model, we predict the relationship between two factors/
variables. The variable we are trying to predict is called the dependent variable,
and the variable used to make the prediction is called the independent variable.
• Simple linear regression: contains only one independent variable, which we use to
predict the dependent variable using one straight line.
• Multiple linear regression: includes more than one independent variable.
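As a sketch of the difference between the two model types, both can be fit by ordinary least squares. The tiny dataset below is entirely made up for illustration; the budget, salesperson, and sales figures are not from the chapter's data:

```python
import numpy as np

# Hypothetical data: marketing budget (X1), salespeople (X2), sales (Y).
# These numbers are invented purely to illustrate the two model types.
X1 = np.array([100.0, 200.0, 300.0, 400.0])
X2 = np.array([5.0, 8.0, 6.0, 10.0])
Y = np.array([8.0, 14.0, 18.0, 25.0])

# Simple linear regression: one independent variable (X1).
A_simple = np.column_stack([np.ones_like(X1), X1])
beta_simple, *_ = np.linalg.lstsq(A_simple, Y, rcond=None)
print("simple:  intercept=%.4f slope=%.4f" % tuple(beta_simple))

# Multiple linear regression: two independent variables (X1 and X2).
A_multi = np.column_stack([np.ones_like(X1), X1, X2])
beta_multi, *_ = np.linalg.lstsq(A_multi, Y, rcond=None)
print("multiple:", np.round(beta_multi, 4))
```

With one predictor we recover a single slope and intercept; with two predictors we get one coefficient per independent variable plus the intercept.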
We have data from a company containing the amount spent on marketing and the sales
corresponding to that marketing budget (a sample of the data appears in Figure 6 below).
Using Microsoft Excel charts, we can make a scatter plot of the above data, which
looks like the following.
[Figure: scatter plot of sales (0 to 20, in millions) against marketing budget (0 to 400, in thousands)]
The plot above shows all the data points from our dataset.
Now, we have to fit a straight line through the data points that helps us predict future sales.
In slope-intercept form, a straight line is written as
y = mx + c
In regression notation, the same line is written as
Y = β0 + β1X
where β0 is the intercept and β1 is the slope.
Many straight lines can be drawn through the data points. We must find the
best-fit line that can serve as a model for future predictions. To find the best-fit
line among all the candidates, we introduce a quantity called the residual (e).
The residual is the difference between the actual Y value and the Y value predicted
by the straight-line equation for that particular X.
Let’s say we have a scatter plot and a straight line as shown in the following figure.
Now, using the above figure, the residual value for x = 2 is:
e = 3 − 4 = −1
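The residual computation above can be written as a one-line helper (a trivial sketch, using the values from the figure):

```python
def residual(y_actual, y_pred):
    """Residual e: actual Y minus the Y predicted by the line."""
    return y_actual - y_pred

print(residual(3, 4))  # -> -1, matching the worked example above
```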
Similarly, we have a residual value for every data point, which is the difference between the
actual Y value and predicted Y value.
eᵢ = yᵢ − ŷᵢ
So, to find the best-fit line, we use a method called the Ordinary Least Squares
method, also known as the Residual Sum of Squares (RSS) method.
RSS = e₁² + e₂² + e₃² + ... + eₘ²
The RSS value is smallest for the best-fit line.
Typically, machine learning models define a cost function for a particular problem,
which we then minimize or maximize based on our requirement. In the above
regression model, RSS is the cost function; we want to minimize it and find
the β0 and β1 of the straight-line equation.
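For simple linear regression, minimizing RSS has a well-known closed-form solution for β0 and β1. The chapter does not derive it, so the sketch below simply applies the standard formulas, assuming NumPy is available:

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS: the beta0, beta1 that minimize RSS."""
    x_bar, y_bar = x.mean(), y.mean()
    # beta1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Sanity check: points lying exactly on Y = 2 + 0.5X should be recovered.
x = np.array([0.0, 10.0, 20.0, 30.0])
y = 2.0 + 0.5 * x
b0, b1 = ols_fit(x, y)
print(f"beta0={b0:.4f}, beta1={b1:.4f}")  # beta0=2.0000, beta1=0.5000
```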
Now, let’s come back to our marketing dataset in the Excel sheet. Using the Linear
Forecast option under Trendline for the above scatter plot, we get the best-fit line
directly, without manually calculating the residual values.
[Figure: the same scatter plot with the Excel trendline (best-fit line) overlaid; X-axis 0 to 400]
As we can see,
Slope(β1) = 0.0528
Intercept(β0) = 3.3525
Let us calculate the predicted sales (Ŷ) for all the data points (X) using the above
straight-line equation.
After that, let’s also calculate the residual square value for each data point.
The Excel sheet after applying this formula looks as follows.
Now, RSS is the sum of all the Residual square values from the above sheet.
RSS = 28.77190461
Since this is the best-fit line, the RSS value we got here is the minimum.
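As a cross-check, we can recompute the residual squares for the eight rows shown in Figure 6 using the trendline coefficients. Note that the full dataset has more rows than Figure 6 shows, so this partial sum is smaller than the total RSS reported below:

```python
import numpy as np

# The eight (X, Y) rows shown in Figure 6; the full dataset has more rows,
# so the partial RSS here is smaller than the chapter's total of 28.77.
X = np.array([127.4, 364.4, 150.0, 128.7, 285.9, 200.0, 303.3, 315.7])
Y = np.array([10.5, 21.4, 10.0, 9.6, 17.4, 12.5, 20.0, 21.0])

beta0, beta1 = 3.3525, 0.0528    # trendline coefficients from Excel
Y_pred = beta0 + beta1 * X       # predicted sales for each budget
residual_sq = (Y - Y_pred) ** 2  # e_i squared for each data point

print(np.round(residual_sq, 6))
print("partial RSS over these rows:", residual_sq.sum())
```

The first value matches Figure 6's 0.177055808 for X = 127.4, and the X = 200 row gives 1.99515625, also as shown in the table.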
If we observe RSS value here, it is an absolute quantity. In the future, if we change the
problem setting where we measure sales in terms of billions instead of millions, the RSS
quantity is going to change.
So, we need to define an alternate measure that is relative rather than absolute. That
alternate measure is called the Total Sum of Squares (TSS). Using TSS, we’ll calculate
the R² value, which determines whether the model is viable:
TSS = (Y₁ − Ȳ)² + (Y₂ − Ȳ)² + ... + (Yₘ − Ȳ)²
R² = 1 − (RSS/TSS)
Where,
Y₁, Y₂, Y₃, ..., Yₘ are the values from the data points, and
Ȳ is the average value of the Y-axis column.
If R² is close to 1, the model is excellent, and we can use it for predictive
analysis. If the value is close to 0, the model is not suitable for predictive analysis.
First, we’ll find the (Yₙ − Ȳ)² value for every data point; the average Y value (Ȳ) is
15.56470588.
Marketing Budget (X)  Actual Sales (Y)  Predicted Sales (Ŷ)  Residual Square  Sum of Squares
in Thousands          in Millions       in Millions          (y − ŷ)²         (y − ȳ)²
127.4                 10.5              10.07922             0.177055808      25.65124567
364.4                 21.4              22.59282             1.422819552      34.05065744
150                   10                11.2725              1.61925625       30.96595156
128.7                 9.6               10.14786             0.30015058       35.57771626
285.9                 17.4              18.44802             1.09834592       3.368304498
200                   12.5              13.9125              1.99515625       9.392422145
303.3                 20                19.36674             0.401018228      19.67183391
315.7                 21                20.02146             0.957540532      29.54242215
Figure 6: Computing the Sum of Squares using the Y-value and average of all Y-values
TSS = 297.5188235
We have already calculated the RSS above, so let’s find the value of R²:
R² = 1 − (RSS/TSS) = 0.903293834
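The same R² calculation in code, using the RSS and TSS values computed above:

```python
# R^2 from the chapter's RSS and TSS values.
RSS = 28.77190461
TSS = 297.5188235

r_squared = 1 - RSS / TSS
print(f"R^2 = {r_squared:.4f}")  # R^2 = 0.9033
```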
If we look at the scatter plot with the best-fit line above, Excel has already
displayed the R² value of 0.9033 below the straight-line equation, which matches
the result of our calculations.
Since the R² value is above 90%, this model is highly recommended for predicting
future sales.
IV. Conclusion
The regression model is one of the essential models in machine learning. Using this
model, we can predict the outcome of a continuous variable. If the output variable is
categorical, we use another type of model called a classification model.