Applications of Gradient Descent in TensorFlow
Last Updated: 24 Apr, 2025
Gradient descent is the optimization procedure machine learning practitioners most often use to minimize a model’s cost function. It works by incrementally adjusting the model’s parameters in the direction of the cost function’s steepest descent. TensorFlow, an open-source machine learning library, has built-in support for gradient descent optimization. In this article, we examine the uses of gradient descent in TensorFlow and show how to perform gradient descent with TensorFlow’s built-in optimizers.
Gradient Descent:
Gradient descent is an iterative optimization algorithm for finding the minimum of a function, and it is widely used to train machine learning models.
It works by incrementally adjusting a model’s parameters in the direction of the steepest descent of the cost function with respect to those parameters. The cost function is a mathematical function that measures the discrepancy between the model’s predicted outputs and the actual outputs.
Mathematically speaking, the generic update rule for gradient descent is:
θ = θ - α ∇J(θ)
where:
- θ is the parameter vector to be optimized.
- α is the learning rate, which determines the size of the step taken in each iteration.
- ∇J(θ) is the gradient of the cost function J with respect to θ; moving against it follows the direction of steepest descent.
The algorithm’s aim is to update θ iteratively until a minimum of J is reached. The learning rate is an essential hyperparameter that affects both convergence speed and stability: if it is too high, the algorithm may overshoot the minimum and fail to converge; if it is too low, convergence may take a very long time.
Several variants of gradient descent (batch, stochastic, and mini-batch gradient descent) differ in how much of the training data is used to compute the gradient for each parameter update.
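To make the update rule concrete, here is a minimal, self-contained sketch (not part of the TensorFlow example that follows) that applies plain gradient descent to an illustrative one-dimensional cost J(θ) = (θ - 3)², whose gradient is 2(θ - 3):
Python3
# Plain gradient descent on the illustrative cost J(theta) = (theta - 3)^2.
def grad_J(theta):
    # analytical gradient of J
    return 2.0 * (theta - 3.0)

theta = 0.0   # initial parameter value
alpha = 0.1   # learning rate

for _ in range(100):
    # generic update rule: theta = theta - alpha * gradient of J at theta
    theta = theta - alpha * grad_J(theta)

print(theta)  # approaches 3.0, the minimizer of J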
Implementation
We will use gradient descent to optimize a simple linear regression model. The aim of the optimization is to find the slope and intercept values that minimize the model’s mean squared error on a given set of training data. Here is how we can perform gradient descent with TensorFlow’s built-in optimizers:
Step 1: Import the necessary libraries
Python3
import tensorflow as tf
import matplotlib.pyplot as plt
Step 2: Generate a random dataset
Python3
tf.random.set_seed(23)

x = tf.random.uniform(
    shape=(100, 1),
    minval=0,
    maxval=100,
    dtype=tf.dtypes.float32,
)
y = 2 * x + tf.random.normal(
    shape=(100, 1),
    mean=50.0,
    stddev=20,
    dtype=tf.dtypes.float32
)

plt.scatter(x, y)
plt.show()
Output:
[Image: Input Data]
Step 3: Define the weight and bias for the model
Python3
W = tf.Variable(tf.random.normal([1]), name="weight")
b = tf.Variable(tf.random.normal([1]), name="bias")

print('Weight :', W)
print('Bias :', b)
Output:
Weight : <tf.Variable 'weight:0' shape=(1,) dtype=float32, numpy=array([0.26008585], dtype=float32)>
Bias : <tf.Variable 'bias:0' shape=(1,) dtype=float32, numpy=array([0.31952116], dtype=float32)>
Step 4: Define the linear regression model
Python3
def linear_regression(x):
    return W * x + b
Step 5: Define the mean squared error
Python3
def mean_squared_error(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred))
Step 6: Define the optimizer for gradient descent
Gradient descent is used in machine learning to reduce the discrepancy between a model’s predicted and actual output. The model’s weights and biases are adjusted iteratively based on the gradient of the loss function with respect to those parameters.
Gradient descent updates the weight and bias by computing the gradients of the error with respect to the parameters and moving in the direction of the negative gradients. The learning rate determines the size of each update. The objective is to find the lowest point of the model’s error surface.
Here we define the stochastic gradient descent (SGD) optimizer with a learning rate of 0.00001.
Python3
optimizer = tf.optimizers.SGD(learning_rate=0.00001)
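For intuition, a single SGD step is equivalent to the manual update sketched below, which moves each parameter against its gradient scaled by the learning rate. This sketch reuses the x, y, W, b, linear_regression, and mean_squared_error defined in the previous steps and is shown only for illustration; the optimizer above performs the same update internally.
Python3
# Manual equivalent of one SGD step (illustration only).
learning_rate = 0.00001
with tf.GradientTape() as tape:
    loss = mean_squared_error(y, linear_regression(x))
grad_W, grad_b = tape.gradient(loss, [W, b])
W.assign_sub(learning_rate * grad_W)  # W <- W - lr * dLoss/dW
b.assign_sub(learning_rate * grad_b)  # b <- b - lr * dLoss/db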
Step 7: Define the Training Loop
tf.GradientTape() records the operations needed for automatic differentiation, tape.gradient() computes the gradients from that tape, and the optimizer applies them to the weight W and the bias b. The training step is defined as a function that receives a batch of data, computes the gradients of the cost function with respect to the model parameters, and updates the parameters through the optimizer.
Python3
def train_step(x, y):
    with tf.GradientTape() as tape:
        y_pred = linear_regression(x)
        loss = mean_squared_error(y, y_pred)
    gradients = tape.gradient(loss, [W, b])
    optimizer.apply_gradients(zip(gradients, [W, b]))
    return loss
Step 8: Train the model and plot weight, bias, and loss over the iterations
Iteration vs Weight: Each gradient descent iteration updates the model’s weight. Depending on the gradient and the learning rate, the weight may increase or decrease.
Iteration vs Bias: Each iteration likewise updates the model’s bias, which may rise or fall depending on the gradient and the learning rate.
Iteration vs Loss: The loss function measures the discrepancy between the predicted output and the actual output; gradient descent minimizes this loss over the iterations.
Python3
fig1, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4), dpi=500)
fig2, (ax) = plt.subplots(1, figsize=(7, 5))

for i in range(50):
    loss = train_step(x, y)
    ax1.plot(i, W, 'b*')
    ax2.plot(i, b, 'g+')
    ax.plot(i, loss, 'ro')

ax1.set_title('Weight over iterations')
ax1.set_xlabel('iterations')
ax1.set_ylabel('Weight')

ax2.set_title('Bias over iterations')
ax2.set_xlabel('iterations')
ax2.set_ylabel('Bias')

ax.set_title('Losses over iterations')
ax.set_xlabel('iterations')
ax.set_ylabel('Losses')

plt.show()
Output:
[Image: Loss optimization]
Step 9: Plot the regression line with the input data
Python3
print('Weight :', W)
print('Bias :', b)

plt.scatter(x, y)
plt.plot(x, W * x + b, color='red')
plt.title('Regression Line')
plt.xlabel('Input')
plt.ylabel('Target')
plt.show()
Output:
Weight : <tf.Variable 'weight:0' shape=(1,) dtype=float32, numpy=array([2.6064723], dtype=float32)>
Bias : <tf.Variable 'bias:0' shape=(1,) dtype=float32, numpy=array([0.36663133], dtype=float32)>
[Image: Regression Line]
In this plot, the blue dots represent the training data and the red line shows the fitted linear regression model. The fitted slope (about 2.6) is reasonably close to the true slope of 2 used to generate the data, although with such a small learning rate and only 50 iterations the bias has not yet reached the true offset of about 50.
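If you also want to see the relationship the data was generated from in Step 2 (slope 2 with noise centred at 50), you can optionally overlay it as a green line; this extra plot is a sketch and not part of the output shown above:
Python3
# Optional: compare the fitted line with the generating relationship y = 2x + 50.
plt.scatter(x, y)
plt.plot(x, W * x + b, color='red', label='fitted model')
plt.plot(x, 2 * x + 50.0, color='green', label='true relationship')
plt.legend()
plt.show()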
Full Code:
Python3
import tensorflow as tf
import matplotlib.pyplot as plt

# Generate a synthetic linear dataset: y = 2x + noise (mean 50, stddev 20)
tf.random.set_seed(23)
x = tf.random.uniform(
    shape=(100, 1),
    minval=0,
    maxval=100,
    dtype=tf.dtypes.float32,
)
y = 2 * x + tf.random.normal(
    shape=(100, 1),
    mean=50.0,
    stddev=20,
    dtype=tf.dtypes.float32
)

# Trainable parameters of the linear model
W = tf.Variable(tf.random.normal([1]), name="weight")
b = tf.Variable(tf.random.normal([1]), name="bias")

# Linear regression model
def linear_regression(x):
    return W * x + b

# Mean squared error loss
def mean_squared_error(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred))

# Stochastic gradient descent optimizer
optimizer = tf.optimizers.SGD(learning_rate=0.00001)

# One gradient descent step: compute gradients and update W and b
def train_step(x, y):
    with tf.GradientTape() as tape:
        y_pred = linear_regression(x)
        loss = mean_squared_error(y, y_pred)
    gradients = tape.gradient(loss, [W, b])
    optimizer.apply_gradients(zip(gradients, [W, b]))
    return loss

# Track weight, bias, and loss over the iterations
fig1, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4), dpi=500)
fig2, (ax) = plt.subplots(1, figsize=(7, 5))

for i in range(50):
    loss = train_step(x, y)
    ax1.plot(i, W, 'b*')
    ax2.plot(i, b, 'g+')
    ax.plot(i, loss, 'ro')

ax1.set_title('Weight over iterations')
ax1.set_xlabel('iterations')
ax1.set_ylabel('Weight')
ax2.set_title('Bias over iterations')
ax2.set_xlabel('iterations')
ax2.set_ylabel('Bias')
ax.set_title('Losses over iterations')
ax.set_xlabel('iterations')
ax.set_ylabel('Losses')
plt.show()

# Final parameters and regression line
print('Weight :', W)
print('Bias :', b)
plt.scatter(x, y)
plt.plot(x, W * x + b, color='red')
plt.show()
Output:
[Image: Loss optimization]
Weight : <tf.Variable 'weight:0' shape=(1,) dtype=float32, numpy=array([2.6111314], dtype=float32)>
Bias : <tf.Variable 'bias:0' shape=(1,) dtype=float32, numpy=array([-0.5400178], dtype=float32)>
[Image: Regression Line]
Conclusion:
Gradient descent is an optimization approach used in machine learning to reduce the discrepancy between a model’s predicted and actual output. Plotting the weight, bias, and loss against the iteration number lets us see how gradient descent operates. By repeatedly adjusting the model’s weights and biases based on the gradient of the loss function with respect to the parameters, we can reduce the loss and improve the model’s accuracy.
TensorFlow supports a number of gradient descent optimization variants, including:
- Mini-batch gradient descent: at each iteration, the model parameters are updated using a random subset of the training data.
- Momentum: uses a moving average of past gradients to speed up convergence and help avoid getting stuck in local minima.
- Adaptive learning-rate methods (such as Adagrad): adapt the learning rate based on the gradients of the cost function.
Moreover, TensorFlow supports more sophisticated optimization methods like Adam, Adagrad, and RMSprop. These methods integrate momentum, adaptive learning rates, and other elements to enhance the optimization process’s convergence and stability.
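As a quick sketch (the learning-rate values below are illustrative defaults, not tuned recommendations), each of these optimizers is created through the same tf.optimizers interface used for SGD above and can be dropped into the training loop unchanged:
Python3
# Any one of these can replace the plain SGD optimizer in the example above.
optimizer = tf.optimizers.SGD(learning_rate=0.01, momentum=0.9)  # SGD with momentum
optimizer = tf.optimizers.Adagrad(learning_rate=0.01)            # per-parameter adaptive rates
optimizer = tf.optimizers.RMSprop(learning_rate=0.001)           # moving average of squared gradients
optimizer = tf.optimizers.Adam(learning_rate=0.001)              # momentum + adaptive rates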