Gradient Descent

1. Gradient descent is an optimization algorithm that can be used to find the minimum of a loss function. It works by taking iterative steps proportional to the negative gradient of the function at the current point.

2. To find the least squares solution using gradient descent, we start with a random intercept value. We then calculate the slope of the loss function to determine how to update the intercept value. Repeatedly updating the intercept based on the slope leads us to the minimum.

3. Gradient descent is more efficient than simpler methods because it concentrates iterations closer to the optimal point, where the slope is smallest. This allows it to quickly home in on the minimum compared to uniformly spaced points.


Gradient Descent

Data science is all about optimizing. What gradient descent allows you to do is optimize a ton of things.

First, let's see how to normally find least squares.

Find the Intercept

1. Pick a random value for the intercept. This gives gradient descent something to improve upon.

2. Evaluate how well the line fits the data with the sum of the squared residuals.

In ML lingo, this is known as the loss function.

ex:

[Plot: Height vs. Weight scatter with the fitted line; the vertical distance from an observed point to the line is the residual.]

3. Plug into the formula:

Predicted value = intercept + slope × value
Predicted Height = 0 + 0.64 × 1 = 0.64

Residual = Observed Height − Predicted Height
Residual = 3 − 0.64 = 2.36

4. Repeat for all other data points.

5. Get the sum of the squared residuals:

2.36² + 3.08² + 1.8² + 3.16² + 2.88² = 36.58
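As a check on the arithmetic, here is a minimal Python sketch of steps 1 through 5. The data points (x values 1, 3, 5, 6, 8 with heights 3, 5, 5, 7, 8), the slope of 0.64, and all variable names are assumptions reconstructed from the worked numbers in these notes.

    # Data reconstructed from the worked numbers in these notes (assumed).
    x_values = [1, 3, 5, 6, 8]          # e.g. weights
    heights = [3, 5, 5, 7, 8]           # observed values
    slope = 0.64
    intercept = 0                       # step 1: a random starting value

    # Steps 3-4: predicted value = intercept + slope * x, for every point.
    predicted = [intercept + slope * x for x in x_values]
    residuals = [obs - pred for obs, pred in zip(heights, predicted)]

    # Step 5: sum of the squared residuals (the loss function).
    ssr = sum(r ** 2 for r in residuals)
    print(residuals)   # approximately [2.36, 3.08, 1.8, 3.16, 2.88]
    print(ssr)         # approximately 36.58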


Now, just for fun, let's plot these on a graph:

[Plot: the sum of squared residuals for each intercept tried so far, dotted as points. x-axis: Intercept.]

The interesting thing about this is that you can keep on plotting the sum of squared residuals for other intercept values. You are then left with a curve that looks a lot like this:

[Plot: the sum of squared residuals traces a U-shaped curve over the intercept values.]

But is that the best we can do?

We know that the whole point is optimizing. Therefore, we want the lowest sum of squared residuals.

See, by trying different lines we can optimize and get a lower sum of squared residuals.

Many times the lowest sum of squared residuals is not at one of the points we have already tried. This means that you have to plug in numbers between the existing points to find the most optimal solution.

These are the plug-ins:

[Plot: evenly spaced intercept values plugged into the curve; an arrow marks where the concentration should be, near the minimum. x-axis: Intercept.]

This is ineffective since there is an equal concentration of plug-ins everywhere.
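A sketch of this "equal concentration of plug-ins" approach, assuming the same reconstructed data as above; the grid range and spacing are my own illustrative choices.

    # Evaluate the loss at evenly spaced intercepts (the naive approach).
    x_values = [1, 3, 5, 6, 8]
    heights = [3, 5, 5, 7, 8]
    slope = 0.64

    def ssr(intercept):
        # Sum of squared residuals for a given intercept.
        return sum((h - (intercept + slope * x)) ** 2
                   for h, x in zip(heights, x_values))

    # Evenly spaced plug-ins: just as many far from the minimum as near it.
    grid = [i * 0.25 for i in range(17)]   # 0.0, 0.25, ..., 4.0
    best = min(grid, key=ssr)
    print(best, ssr(best))                 # best grid point, not the true minimum

The true minimum usually falls between grid points, which is exactly the weakness these notes point out.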

Now let's see how you'd do it with gradient descent.

What makes gradient descent unique? The big difference between the two methods is that gradient descent will have very few points farther away from the most optimal point, and will concentrate more points the closer you are to the most optimal point.

[Plot: points on the curve spaced widely far from the minimum and packed tightly near it. x-axis: Intercept.]

This allows it to be much more efficient.

Find the Intercept with Gradient Descent

1. Just like before, we need to pick a random value for gradient descent to improve upon.

2. Now we can apply the principle of solving for the residual to get the intercept.

Sum of Squared Residuals = (observed value − (intercept + slope × x value))², summed over every data point.

Plug in every observed value and x value. Since we're solving for the intercept, we can then simply plug in any intercept and get a new predicted height.

ex:

SSR = (3 − (intercept + 0.64 × 1))²
    + (5 − (intercept + 0.64 × 3))²
    + (5 − (intercept + 0.64 × 5))²
    + (7 − (intercept + 0.64 × 6))²
    + (8 − (intercept + 0.64 × 8))²

Since we can easily plug in any value for the intercept, we've basically got an equation for the curve.

3. Create the curve using the equation created before.

[Plot: the SSR curve traced from the equation, over intercept values 0, 1, 2. x-axis: Intercept.]
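A sketch of the curve's equation written out term by term, mirroring the five squared-residual terms above; the intercept values traced are illustrative.

    # The "equation for the curve": SSR as a function of the intercept alone.
    def sum_of_squared_residuals(intercept):
        return ((3 - (intercept + 0.64 * 1)) ** 2
                + (5 - (intercept + 0.64 * 3)) ** 2
                + (5 - (intercept + 0.64 * 5)) ** 2
                + (7 - (intercept + 0.64 * 6)) ** 2
                + (8 - (intercept + 0.64 * 8)) ** 2)

    # Trace the curve at a few intercepts, like the plot above.
    for c in (0, 1, 2):
        print(c, sum_of_squared_residuals(c))   # roughly 36.58, 15.02, 3.46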

You can then take the derivative of the function and determine the slope at any given value for the intercept. So now you can take the derivative of the sum of the squared residuals:

d/d intercept SSR
= d/d intercept (3 − (intercept + 0.64 × 1))²
+ d/d intercept (5 − (intercept + 0.64 × 3))²
+ d/d intercept (5 − (intercept + 0.64 × 5))²
+ d/d intercept (7 − (intercept + 0.64 × 6))²
+ d/d intercept (8 − (intercept + 0.64 × 8))²

Take the first term: d/d intercept (3 − (intercept + 0.64 × 1))²

To solve this, we need to apply the chain rule:

Step 1: Move the square to the front.
Step 2: Multiply that by the derivative of the stuff inside the parentheses.
Step 3: Remove constants that are not terms of the intercept.
Step 4: Simplify: combine like terms.

ex:

d/d intercept (3 − (intercept + 0.64 × 1))²
= 2(3 − (intercept + 0.64 × 1)) × d/d intercept (3 − (intercept + 0.64 × 1))
= 2(3 − (intercept + 0.64 × 1)) × (−1)
= −2(3 − (intercept + 0.64 × 1))   ← final equation

Applying the same steps to every term, each data point contributes −2 × (observed − (intercept + 0.64 × x)) to the derivative.
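A sketch of that chain-rule result in code, using the same assumed data as earlier: each squared term contributes −2 × (observed − (intercept + 0.64 × x)) to the derivative.

    x_values = [1, 3, 5, 6, 8]
    heights = [3, 5, 5, 7, 8]
    slope = 0.64

    def d_ssr_d_intercept(intercept):
        # Derivative of the sum of squared residuals with respect to the
        # intercept: the chain-rule "final equation" summed over all points.
        return sum(-2 * (h - (intercept + slope * x))
                   for h, x in zip(heights, x_values))

    print(d_ssr_d_intercept(0))   # about -26.56: each point adds -2 * its residual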

We can now use these formulas to find the most optimal solution. What we're looking for is a slope of 0.

Note: If we were using least squares, we would just find where the slope is 0. Gradient descent, on the other hand, will find the minimal value from an initial guess.

Now we can use these equations by plugging in 0 for the intercept:

−2(3 − (0 + 0.64 × 1)) = −4.72

So when the intercept equals 0, this first point's term contributes −4.72 to the slope (the full slope sums the contributions from all five points).

Note: The closer the intercept is to the optimal value, the closer the slope will be to 0. This means that when the slope is close to 0, the steps will get smaller and smaller.
Gradient descent determines the step size by multiplying the slope by the learning rate. We can then calculate a new intercept by subtracting the step size from the old intercept.
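Putting the whole procedure together, here is a minimal gradient descent sketch under the same assumptions; the learning rate of 0.05 and the stopping threshold are my own illustrative choices, not values from these notes.

    x_values = [1, 3, 5, 6, 8]
    heights = [3, 5, 5, 7, 8]
    slope = 0.64
    learning_rate = 0.05

    intercept = 0.0                              # step 1: a random starting value
    for _ in range(1000):
        # Slope of the loss curve at the current intercept (chain-rule result).
        gradient = sum(-2 * (h - (intercept + slope * x))
                       for h, x in zip(heights, x_values))
        step_size = gradient * learning_rate     # step size = slope * learning rate
        intercept = intercept - step_size        # new intercept = old - step size
        if abs(step_size) < 0.001:               # steps shrink near the minimum
            break

    print(intercept)   # converges to the least-squares intercept, about 2.66

With these numbers the step size shrinks on every pass, which is exactly the "smaller and smaller steps" behavior the note above describes.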
