Linear Regression with Multiple Features
• Notation
– n = number of features
– m = number of training examples
– x(i) = input (features) of ith training example
– xj(i) = value of feature j in the ith training example
Hypothesis: hθ(x) = θᵀx = θ0x0 + θ1x1 + … + θnxn (with x0 = 1)
x ϵ ℝn+1, θ ϵ ℝn+1
Cost function: J(θ) = (1/(2m)) Σi=1..m (hθ(x(i)) - y(i))²
Gradient descent:
Repeat {
θj := θj - α ∂J(θ)/∂θj
} (simultaneously update θj for every j = 0, 1, … n)
Writing out the partial derivative, the updates are:
θ0 := θ0 - α (1/m) Σi=1..m (hθ(x(i)) - y(i)) x0(i)
θ1 := θ1 - α (1/m) Σi=1..m (hθ(x(i)) - y(i)) x1(i)
θ2 := θ2 - α (1/m) Σi=1..m (hθ(x(i)) - y(i)) x2(i)
…
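A minimal NumPy sketch of this cost function and update rule (not from the slides; function and variable names are illustrative, and X is assumed to be the m × (n+1) matrix with a leading column of 1s):

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = (1/(2m)) * sum((h(x) - y)^2), with X the m x (n+1) design matrix."""
    m = len(y)
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)

def gradient_descent(X, y, theta, alpha, num_iters):
    """Repeat the simultaneous update theta := theta - alpha * (1/m) * X^T (X theta - y)."""
    m = len(y)
    for _ in range(num_iters):
        gradient = (X.T @ (X @ theta - y)) / m   # (1/m) * sum over examples of error * x_j
        theta = theta - alpha * gradient          # updates every theta_j at the same time
    return theta
```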
Gradient Descent Algorithm
• We're doing this for each j (from 0 to n) as a simultaneous update (just like when n = 1)
• So, we update θj to
– θj minus the learning rate (α) times the partial derivative of the cost function J(θ) with respect to θj
– In non-calculus words, this means we subtract:
• The learning rate α
• Times 1/m (which makes the maths easier)
• Times the sum, over all training examples, of
– The hypothesis applied to that example's feature vector, minus the actual value, times the jth feature value of that example (see the loop-based sketch below)
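A loop-based sketch that mirrors this wording one step at a time (illustrative names; assumes NumPy arrays):

```python
import numpy as np

def gradient_descent_loops(X, y, theta, alpha, num_iters):
    """Literal per-parameter form: theta_j := theta_j - alpha * (1/m) * sum_i (h(x(i)) - y(i)) * x_j(i)."""
    m, n_plus_1 = X.shape
    for _ in range(num_iters):
        new_theta = theta.copy()                    # so every theta_j uses the old theta (simultaneous update)
        for j in range(n_plus_1):
            total = 0.0
            for i in range(m):
                h_i = X[i] @ theta                  # hypothesis for training example i
                total += (h_i - y[i]) * X[i, j]     # (prediction - actual) * jth feature value
            new_theta[j] = theta[j] - alpha * total / m
        theta = new_theta
    return theta
```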
Feature Scaling
• Idea: Make sure features are on a similar scale
• E.g. scale each value of x1 and x2 by dividing by the max for that feature, so that
0 ≤ x1 ≤ 1
0 ≤ x2 ≤ 1
• The contours of J(θ) (plotted over θ1 and θ2) then become more like circles (since both features are scaled between 0 and 1), so gradient descent takes a more direct path to the minimum
• Aim to get every feature into roughly the -1 ≤ xi ≤ 1 range (x0 = 1 always); for example (see the sketch after this list):
– 0 ≤ x1 ≤ 3 is fine
– -2 ≤ x2 ≤ 0.5 is fine
– -100 ≤ x3 ≤ 100 is too large; rescale
– -0.0001 ≤ x4 ≤ 0.0001 is too small; rescale
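A small sketch of scaling by the max, under the assumption that X holds one feature per column (names are illustrative):

```python
import numpy as np

def scale_by_max(X):
    """Divide every feature column by its max absolute value so each feature lies roughly in [-1, 1]."""
    max_vals = np.abs(X).max(axis=0)
    max_vals[max_vals == 0] = 1.0       # avoid dividing a constant-zero column by zero
    return X / max_vals
```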
Mean Normalization
• Take a feature xi
– Replace it by (xi - μi)/si, where μi is the mean of the feature and si is its range (max - min) or simply its max (see the sketch below)
– So the values of each feature all have an average of about 0
• E.g. -0.5 ≤ x1 ≤ 0.5
-0.5 ≤ x2 ≤ 0.5
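A sketch of mean normalization dividing by the range, as assumed above (illustrative names):

```python
import numpy as np

def mean_normalize(X):
    """Replace each feature x with (x - mean) / (max - min), so values average ~0 (roughly -0.5..0.5)."""
    mu = X.mean(axis=0)
    spread = X.max(axis=0) - X.min(axis=0)
    spread[spread == 0] = 1.0            # guard against constant features
    return (X - mu) / spread
```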
Checking Convergence
[Plots: J(θ) vs. no. of iterations]
• Plot J(θ) against the number of iterations; if gradient descent is working, J(θ) should decrease after every iteration
• Automatic convergence tests
– Declare convergence if J(θ) decreases by less than 10⁻³ in one iteration (sketched below)
• If J(θ) is increasing or bouncing up and down, gradient descent is not working
– Use a smaller α
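A sketch of the automatic convergence test with the 10⁻³ threshold from above; it reuses the compute_cost sketch given earlier (illustrative names):

```python
def gradient_descent_with_check(X, y, theta, alpha, max_iters, tol=1e-3):
    """Stop once J(theta) decreases by less than tol in one iteration."""
    m = len(y)
    prev_cost = compute_cost(X, y, theta)          # compute_cost as sketched earlier
    for _ in range(max_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
        cost = compute_cost(X, y, theta)
        if prev_cost - cost < tol:                 # automatic convergence test
            break
        prev_cost = cost
    return theta
```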
Learning Rate α
• For sufficiently small α, J(θ) should decrease on every
iteration
• But if α is too small, gradient descent can be slow to converge
• So
– If α is too small: slow convergence
– If α is too large: J(θ) may not decrease on every iteration;
may not converge
• To choose α, try
…, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
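One way to try these α values in practice, assuming a prepared X and y and the compute_cost sketch above (illustrative; 100 iterations is an arbitrary choice):

```python
import numpy as np

# Try a range of learning rates (each roughly 3x the previous) and keep the
# largest one for which J(theta) still decreases on every iteration.
alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]
for alpha in alphas:
    theta = np.zeros(X.shape[1])
    costs = []
    for _ in range(100):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / len(y)
        costs.append(compute_cost(X, y, theta))
    trend = "decreasing" if all(np.diff(costs) <= 0) else "not decreasing"
    print(f"alpha={alpha}: J(theta) {trend}, final cost {costs[-1]:.4f}")
```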
• Two features
– Frontage: width of the plot of land along the road (x1)
– Depth: depth of the plot away from the road (x2)
Normal Equation
• Example: a training set with
– n = 4 features
– m = 4 training examples
• X = the m × (n+1) design matrix: one row per training example, with x0 = 1 in the first column
• y = the m-dimensional vector of target values
• θ = (XᵀX)⁻¹Xᵀy, θ ϵ ℝn+1
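A one-line NumPy sketch of the normal equation, using pinv for numerical robustness (illustrative names):

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^(-1) X^T y, with X the m x (n+1) design matrix (leading column of 1s)."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```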
• Normal equation
– Can be used for linear regression only
– Can solve for θ directly, without iterations
• Disadvantages
– Doesn't generalize to other learning algorithms
– Slow when the number of features n is large (n > 10,000), since it requires inverting the (n+1) × (n+1) matrix XᵀX