Lect 7 - Gradient Descent

Gradient descent is an optimization algorithm used to minimize cost or loss functions in machine learning models. It works by iteratively moving the model's parameters, such as weights in a neural network, in the direction of steepest descent as defined by the negative gradient of the cost function. Each pass of the entire training dataset is called an epoch. The learning rate tunes the step size in each iteration to determine how quickly the parameters are adjusted toward a minimum. The goal is to reduce the error between predicted and expected values on each iteration until convergence is reached.


Gradient Descent

Terminologies…
• Epoch: an epoch is a term used in machine learning that indicates the number
of complete passes of the entire training dataset the learning algorithm has
made. Datasets are usually grouped into batches, especially when the amount of
data is very large.
• In simple words: one full iteration over the entire training dataset, which
may consist of several batch updates (see the sketch below).
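
A minimal sketch of how epochs, batches, and iterations relate; the dataset
size (1,000 samples) and batch size (100) are illustrative assumptions, not
values from the lecture:

n_samples = 1000      # assumed dataset size, for illustration only
batch_size = 100      # assumed batch size
epochs = 5

iterations_per_epoch = n_samples // batch_size    # 10 batch updates per epoch
total_iterations = epochs * iterations_per_epoch  # 50 parameter updates overall
print(iterations_per_epoch, total_iterations)     # -> 10 50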
Loss Functions
Imagine you are at the top of a hill and need to climb down. How do you decide
which way to walk?

Steps:
• Look around to see all the possible paths.
• Reject the ones going up, since those paths would cost more energy and make
the task even more difficult.
• Finally, take the path with the steepest slope downhill.

Deciding to go up the slope costs us energy and time. Deciding to go down
benefits us; therefore, going downhill has a negative cost.
Learning rate

• The learning rate is a tuning parameter in an optimization algorithm that
determines the step size at each iteration while moving toward a minimum of a
loss function. A small illustration follows.
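
A toy sketch of how the learning rate scales each step; the loss f(w) = w² and
the starting point w = 10 are assumptions for illustration, not from the slides:

w = 10.0                      # current parameter value (assumed)
for lr in (0.01, 0.1, 0.5):   # three candidate learning rates
    grad = 2 * w              # gradient of f(w) = w**2
    print(lr, w - lr * grad)  # larger learning rate -> larger step toward 0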
Cost function
• It is a function that measures the performance of a Machine Learning model
for given data. The cost function quantifies the error between predicted
values and expected values and presents it as a single real number.
• When minimized, the returned value is usually called the cost, loss, or error.
• When maximized, the value it yields is called a reward.
Example of cost function
• ŷ - predicted value, computed as ŷ = w·x
• x - vector of data used for prediction or training
• w - weight
Here the cost is the mean absolute error: cost = (1/n) Σ |ŷᵢ - yᵢ|.

Let's assume that x = [0, 1, 2, 3] and the true weight is w = 2, so the
expected outputs are y = [0, 2, 4, 6].

• Take a random parameter w = 5. Then ŷ = [0, 5, 10, 15], the absolute errors
|ŷ - y| are [0, 3, 6, 9], and the cost is 4.5.
• Now let's take a small weight w = 0.5. Then ŷ = [0, 0.5, 1, 1.5], the
absolute errors are [0, 1.5, 3, 4.5], and the cost becomes 2.25.
• The model gives better results with w = 0.5, as its cost value is smaller.
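
The numbers above can be checked with a few lines of NumPy; this sketch simply
recomputes the mean absolute error for the two candidate weights:

import numpy as np

x = np.array([0, 1, 2, 3])
y_true = 2 * x                               # targets from the true weight w = 2
for w in (5, 0.5):
    y_pred = w * x                           # model prediction y_hat = w * x
    cost = np.mean(np.abs(y_pred - y_true))  # mean absolute error
    print(w, cost)                           # -> 5 4.5, then 0.5 2.25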
What are Optimizers?
• Optimizers are algorithms used to reduce the loss of a model, i.e., to
reduce the error rate made by deep learning models.
Definition
• Gradient descent is an optimization algorithm used to minimize some
function by iteratively moving in the direction of steepest descent as
defined by the negative of the gradient.
• In machine learning, we use gradient descent to update the parameters of
our model.
• Parameters → coefficients in Linear Regression
• Parameters → weights in neural networks
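
A minimal standalone sketch of the update rule parameter = parameter -
learning_rate × gradient; the quadratic loss f(w) = (w - 3)² and the learning
rate 0.1 are illustrative assumptions, not from the slides:

w, lr = 0.0, 0.1
for _ in range(50):
    grad = 2 * (w - 3)   # gradient of f(w) = (w - 3)**2
    w = w - lr * grad    # step opposite the gradient, scaled by the learning rate
print(w)                 # converges toward the minimizer w = 3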
Diagrammatic Representation
[Figure not preserved in this extraction]
Gradient descent for simple linear regression, fitting Y = m·X + c:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('C:/MLCourse/Linear Reg dataset.csv')
X = data.iloc[:, 0]
Y = data.iloc[:, 1]
plt.scatter(X, Y)                 # visualize the raw data
plt.show()

m = 0                             # slope, initialized to zero
c = 0                             # intercept, initialized to zero
n = len(X)
epochs = 30
L = .0001                         # learning rate
cost = [None] * epochs
meanx = np.mean(X)

for i in range(epochs):
    Y_pred = m*X + c              # the current predicted value of Y
    error = Y_pred - Y
    meanerror = np.mean(error)
    D_m = 2 * meanx * meanerror   # gradient w.r.t. m, approximated with mean values
    D_c = 2 * meanerror           # gradient w.r.t. c
    m = m - L * D_m               # update m
    c = c - L * D_c               # update c
    cost[i] = (1/(2*n)) * pow(meanerror, 2)
    print("Cost in Iteration : ", i+1, " is ", cost[i])

i = list(range(epochs))           # epoch indices for the cost plot
plt.scatter(i, cost, color="m", marker="o", s=25)
plt.xticks(range(0, epochs, 5))
plt.yticks(range(0, 340, 10))
plt.xlabel("Epochs")
plt.ylabel("Cost")
plt.show()

Sample output (first 15 of the 30 iterations shown):
Cost in Iteration : 1 is 118.82464342205141
Cost in Iteration : 2 is 41.699553633502205
Cost in Iteration : 3 is 14.633772281201987
Cost in Iteration : 4 is 5.1354816183458105
Cost in Iteration : 5 is 1.8022127818843883
Cost in Iteration : 6 is 0.632456924699045
Cost in Iteration : 7 is 0.22195035215627046
Cost in Iteration : 8 is 0.07788982442675346
Cost in Iteration : 9 is 0.02733415239169711
Cost in Iteration : 10 is 0.009592471063728715
Cost in Iteration : 11 is 0.0033663198986343475
Cost in Iteration : 12 is 0.0011813545836785449
Cost in Iteration : 13 is 0.00041457695477623344
Cost in Iteration : 14 is 0.00014548896140593372
Cost in Iteration : 15 is 5.105695733235355e-05
