
Linear Regression

Some of the most common machine learning algorithms are regressions. A regression predicts a number from infinitely many possible outputs. Linear regressions fit a straight line to a dataset, while non-linear regressions fit a curved line. Using only one input variable is called univariate linear regression, but models can use many more than one. A practical example of linear regression is predicting home prices based on their square footage.

Defining the Model


The function defining a linear regression model is represented as,

$$f_{w,b}(x) = wx + b$$

where $w$ represents the slope and $b$ the y-intercept of the line.


Once the line is fitted to the data, the same function definition is applied. The only difference is the output variable, which is now estimated based on the regression line. This predicted output and function model are defined as,

$$\hat{y}^{(i)} = f_{w,b}(x^{(i)}) = wx^{(i)} + b$$
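As a minimal sketch of applying this model in Python (the NumPy usage and the example values of w, b, and x are illustrative assumptions, not taken from the text):

import numpy as np

# Hypothetical parameters: slope w (price per square foot) and y-intercept b (base price)
w = 200.0
b = 100.0

# Square footage of a few example homes (also assumed)
x = np.array([1.0, 1.5, 2.0])

# Predicted prices: y_hat^(i) = f_{w,b}(x^(i)) = w * x^(i) + b
y_hat = w * x + b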

The Cost Function


Finding the Cost Function
Since the slope ($w$) and y-intercept ($b$) dramatically influence the position and direction of the regression line, making sure it fits the data as accurately and precisely as possible is very important. Measuring how well a line fits the data is the job of the cost function, which is defined below. The error itself is the difference between the predicted output and the actual output, $\hat{y}^{(i)} - y^{(i)}$.

$$\sum_{i=1}^{m}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$$

This error term is computed for each training example $i$ in the training set and summed up to the total number of training examples $m$. As the number of training examples increases, it is more useful to work with the average squared error rather than the total squared error. The cost function $J$, also called the squared error cost function, is defined as,

$$J(w, b) = \frac{1}{2m}\sum_{i=1}^{m}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$$

The cost function most engineers use actually divides by $2m$ rather than $m$, because the extra factor of two cancels out when taking derivatives later and keeps the calculations neater. Omitting this extra two is perfectly acceptable. While different applications use different types of cost functions, the squared error cost function is easily the most common choice for regression problems.

To get a better sense of the parameters that are adjusted in the cost function, it
can be written as,

$$J(w, b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$$

The only difference here is that the predicted output $\hat{y}^{(i)}$ is represented as the function model $f_{w,b}(x^{(i)})$. Because the function model contains the only parameters that can be adjusted (the slope and the y-intercept), this is the complete cost function definition.
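As a minimal sketch (assuming NumPy arrays x and y hold the training inputs and targets; the function name compute_cost is illustrative, not from the original), the squared error cost can be computed as:

import numpy as np

def compute_cost(x, y, w, b):
    # Squared error cost J(w, b): average of the squared errors, divided by 2
    m = x.shape[0]
    y_hat = w * x + b                      # predictions f_{w,b}(x^(i)) for every example
    return np.sum((y_hat - y) ** 2) / (2 * m)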

Optimizing the Cost Function


The goal of a linear regression algorithm is to minimize the cost function as much as possible. This is done by searching for the values of the two parameters that affect its accuracy, $w$ and $b$.

$$\min_{w,b} J(w, b)$$

$J$ can be plotted against both parameters individually, which results in 2D graphs for each. However, to get the full picture, they can also be plotted together in both 2D (contour) and 3D (surface) graphs.

[Figure: Contour graph of the cost function, plotted against $w$ and $b$.]
[Figure: Surface graph of the cost function, plotted against $w$ and $b$.]
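As an illustrative sketch of how such plots could be generated (assuming Matplotlib and a small toy dataset; all values and variable names here are made up for the example):

import numpy as np
import matplotlib.pyplot as plt

# Toy dataset, assumed purely for illustration
x = np.array([1.0, 2.0, 3.0])
y = np.array([300.0, 500.0, 700.0])

# Evaluate J(w, b) over a grid of candidate parameter values
w_vals = np.linspace(0, 400, 100)
b_vals = np.linspace(-200, 400, 100)
W, B = np.meshgrid(w_vals, b_vals)
J = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        J[i, j] = np.sum((W[i, j] * x + B[i, j] - y) ** 2) / (2 * len(x))

fig = plt.figure(figsize=(10, 4))
ax1 = fig.add_subplot(1, 2, 1)
ax1.contour(W, B, J, levels=30)             # 2D contour view of the cost
ax2 = fig.add_subplot(1, 2, 2, projection="3d")
ax2.plot_surface(W, B, J, cmap="viridis")   # 3D surface view of the cost
plt.show()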

Gradient Descent for Linear Regression


Given all the equations and definitions above, almost everything is in place to programmatically compute gradient descent for a linear regression. The one missing piece is the derivative of the cost function, $\frac{\partial}{\partial w}J(w, b)$ and $\frac{\partial}{\partial b}J(w, b)$.
Since the cost function and the linear regression model have already been expressed in terms of the input variables $x$ and $y$, they are substituted into the gradient descent algorithm and the derivatives are then taken. Consider the following equations, where $m$ is the number of training examples in the dataset.

Linear regression model:

$$f_{w,b}(x) = wx + b$$

Cost function:

$$J(w, b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$$

Pre-derived gradient descent algorithm


$$w = w - \alpha \frac{\partial}{\partial w} J(w, b) \;\Rightarrow\; w = w - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}$$

$$b = b - \alpha \frac{\partial}{\partial b} J(w, b) \;\Rightarrow\; b = b - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$$

Final gradient descent algorithm

$$w = w - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}$$

$$b = b - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$$
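Putting these update rules together, a minimal sketch of the gradient descent loop for one input variable might look like this (the learning rate alpha, iteration count, and function name are assumptions for illustration, not values from the original):

import numpy as np

def gradient_descent(x, y, alpha=0.01, num_iters=1000):
    # Batch gradient descent for single-variable linear regression
    m = x.shape[0]
    w, b = 0.0, 0.0
    for _ in range(num_iters):
        err = (w * x + b) - y              # f_{w,b}(x^(i)) - y^(i) for all i
        dj_dw = np.sum(err * x) / m        # derivative of J with respect to w
        dj_db = np.sum(err) / m            # derivative of J with respect to b
        w = w - alpha * dj_dw              # simultaneous update of w and b
        b = b - alpha * dj_db
    return w, b

Both parameters are updated from the same error term in each pass, which matches the simultaneous update described by the equations above.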

Multiple Feature Linear Regression


Adding more input features is a critical component of achieving accurate machine
learning models. For example, consider the previous example of determining
home prices by square footage. While this data can provide insight, it is probably
not enough for practical use. More variables can be added, such as number of
bedrooms and bathrooms, location, etc.

Notation
When adding more input features, slight modifications to the notation are
necessary,

$x_j$ = $j^{\text{th}}$ feature
$n$ = number of features
$\vec{x}^{(i)}$ = features of the $i^{\text{th}}$ training example
$x_j^{(i)}$ = value of feature $j$ in the $i^{\text{th}}$ training example

Model
With the addition of multiple variables comes a necessary update to the model function. Consider the following changes, where $n$ refers to the number of input features.

Previous single-variable model:

$$f_{w,b}(x) = wx + b$$

Updated multivariable model:

$$f_{w,b}(x) = w_1x_1 + \dots + w_nx_n + b$$

To simplify the notation, the multivariable model can be written with vectors. Notice how $b$ is not included in the vectors, as it is a single number rather than a vector of parameters.

$$\vec{w} = [w_1 \; w_2 \; w_3 \; \dots \; w_n]$$

$$\vec{x} = [x_1 \; x_2 \; x_3 \; \dots \; x_n]$$

b, however, is included in the complete model, which is described below.

$$f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b$$

Vectorization
Writing vectorized code is essential for turning these linear algebra equations into something a computer can read and process efficiently. It also happens that GPUs are very efficient at running vectorized code. Consider the following example, where the parameters and features are translated into code.
Notice how the linear algebra notation is 1-indexed, while the Python code is 0-indexed. The Python code uses the widely adopted NumPy package, which implements linear algebra tools, among many others. Without NumPy, the dot product between $\vec{w}$ and $\vec{x}$ would have to be hardcoded, which becomes especially problematic for large values of $n$.

Vector notation alongside the equivalent Python code:

Defining parameters and features,

$$\vec{w} = [w_1 \; w_2 \; w_3] \qquad b \text{ is a number} \qquad \vec{x} = [x_1 \; x_2 \; x_3]$$

w = np.array([1.0, 2.5, -3.3])
b = 4
x = np.array([10, 20, 30])
n = 3

Without vectorization,

$$f_{\vec{w},b}(\vec{x}) = \left(\sum_{j=1}^{n} w_j x_j\right) + b$$

f = 0
for j in range(n):  # 0 to n-1
    f += w[j] * x[j]
f += b

With vectorization (NumPy),

$$f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b$$

f = np.dot(w, x) + b

The benefits of vectorization are not only cleaner and simpler code; the performance is also much faster, because NumPy takes advantage of parallel hardware in the computer.
Consider the example above. Without vectorization, the for loop runs sequentially, adding the product of w and x for each parameter $j$, step after step: f += w[j] * x[j]
However, with vectorization, the products of w and x for each parameter $j$ are computed in parallel. Using specialized hardware, these products are then added together to give the final result for f.
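A rough way to see the speedup for yourself (the dataset size and the timing approach here are assumptions for illustration, not from the original) is to time the loop version against np.dot:

import time
import numpy as np

# Larger vectors than the example above, assumed just to make the timing visible
n = 1_000_000
w = np.random.rand(n)
x = np.random.rand(n)
b = 4.0

# Without vectorization: an explicit Python loop over every element
start = time.perf_counter()
f = 0.0
for j in range(n):
    f += w[j] * x[j]
f += b
loop_time = time.perf_counter() - start

# With vectorization: a single NumPy dot product
start = time.perf_counter()
f_vec = np.dot(w, x) + b
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.6f} s")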

Gradient Descent for Multiple Linear Regression

Utilizing gradient descent with multiple features follows a process very similar to multiple-feature linear regression: the updates are written with vector notation. Consider the following pre-derived gradient descent algorithm, where the weight vector $\vec{w}$ is used across the $m$ training examples in the dataset,

Pre-derived gradient descent algorithm



$$w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(\vec{w}, b)$$

$$b = b - \alpha \frac{\partial}{\partial b} J(\vec{w}, b)$$

Final gradient descent algorithm

$$w_j = w_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

$$b = b - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)$$

simultaneously update $w_j$ (for $j = 1, \dots, n$) and $b$

Notice how the update for $b$ does not involve the $n$ features, because $b$ is a single value that does not change based on which feature is being updated. Also, the $x_j^{(i)}$ term in the equation for $w_j$ does not use vector notation, because it refers to the value of a single feature $j$ in the $i^{\text{th}}$ training example.
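A minimal sketch of one vectorized gradient descent step for multiple features (assuming X is an m-by-n NumPy matrix of training examples; the function name gradient_step is illustrative, not from the original):

import numpy as np

def gradient_step(X, y, w, b, alpha):
    # X: (m, n) matrix of training examples, y: (m,) targets, w: (n,) weights, b: scalar
    m = X.shape[0]
    err = X @ w + b - y                    # f_{w,b}(x^(i)) - y^(i) for all i at once
    dj_dw = (X.T @ err) / m                # one partial derivative per feature j
    dj_db = np.sum(err) / m
    w = w - alpha * dj_dw                  # simultaneous update of every w_j and b
    b = b - alpha * dj_db
    return w, b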

Regularized Linear Regression


Regularization adds a penalty term to the cost function, so the gradient descent algorithm needs to be modified too; specifically, the derivative of the cost function changes. Given the regularized cost function and gradient descent update rules,

$$J(\vec{w}, b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$

$$w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(\vec{w}, b) \qquad b = b - \alpha \frac{\partial}{\partial b} J(\vec{w}, b)$$

Fortunately, the necessary changes require only a small modification to the $w_j$ update in gradient descent. Note that $b$ is unaffected, because it is not a weighted parameter and does not appear in the regularization term.

$$\frac{\partial}{\partial w_j} J(\vec{w}, b) = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m} w_j$$

$$\frac{\partial}{\partial b} J(\vec{w}, b) = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)$$

Combining the equations above, the complete gradient descent is formed,

Complete gradient descent of regularized linear regression

$$w_j = w_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m} w_j\right]$$

$$b = b - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)$$
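A minimal sketch of computing these regularized gradients in NumPy (with the regularization strength written as lambda_, since lambda is a Python keyword; the function name is assumed, not from the original):

import numpy as np

def regularized_gradients(X, y, w, b, lambda_):
    # Gradients of the regularized cost; only dj_dw gains the (lambda / m) * w term
    m = X.shape[0]
    err = X @ w + b - y
    dj_dw = (X.T @ err) / m + (lambda_ / m) * w
    dj_db = np.sum(err) / m
    return dj_dw, dj_db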
