Deep Learning (Part 8) - Coursesteach

Deep Learning

Uploaded by

adrian Iosif

Deep Learning (Part 8)

Coursesteach

Demystifying Gradient Descent in Logistic Regression

Sections

• Introduction
• Understanding Logistic Regression
• Introduction of Gradient Descent
• What is a gradient, and how is it used in a gradient descent algorithm?
• Why Is Gradient Descent Necessary for Machine Learning?
• How the Gradient Descent Algorithm Works
• The Steps of performing gradient Descent
• More details about gradient Descent
• Advantages of Gradient Descent in Logistic Regression
• Limitations of Gradient Descent in Logistic Regression
• Basic implementation of Logistic Regression Gradient in Python
• Conclusion

1- Introduction
Logistic regression is a popular machine learning algorithm used for binary
classification problems. It is widely used in various domains such as finance,
healthcare, and marketing. One of the key components of logistic regression is
gradient descent, which is an optimization algorithm used to find the optimal
parameters of the model. In this blog post, we will delve into the concept of
gradient descent in logistic regression, exploring its workings, advantages, and
limitations.

You’ve seen the logistic regression model, and you’ve seen the loss function that
measures how well you’re doing on a single training example. You’ve also seen
the cost function that measures how well your parameters w and b are doing on
your entire training set.

2. Understanding Logistic Regression

Before diving into gradient descent, it’s essential to understand the basics of
logistic regression. Logistic regression is a statistical model that predicts the
probability of a binary outcome based on one or more independent variables. It
uses a logistic function (also known as the sigmoid function) to map the output
between 0 and 1, representing the probability of the positive class. The goal of
logistic regression is to find the best-fitting line (or hyperplane) that separates the
two classes.
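
As a minimal sketch of the mapping described above, the logistic (sigmoid) function can be written with NumPy as follows. The function name and the sample scores are illustrative choices, not part of the original post.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array of numbers) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up scores; in logistic regression z would be w*x + b.
z = np.array([-4.0, 0.0, 4.0])
probabilities = sigmoid(z)

# Large negative scores map close to 0, zero maps to 0.5,
# and large positive scores map close to 1.
print(probabilities)
```

The output is read as the probability of the positive class, so a score of 0 corresponds to a 50/50 prediction.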

3- Introduction of Gradient Descent

Gradient descent is a powerful tool for optimizing the parameters of a model by
minimizing its error, or cost function [2]. In simple terms, it finds the minimum
point of a function by following the direction of steepest descent [2].
It is an iterative optimization algorithm that is widely used in machine learning:
at each iteration it adjusts the function’s parameters based on the gradient,
stepping in the direction of steepest descent (the negative of the gradient) [2].

Gradient descent is an optimization algorithm used to minimize the cost function
(also known as the loss function) in machine learning models. In logistic
regression, the cost function is based on the difference between the predicted
probabilities and the actual labels. The aim is to minimize this difference and find
the parameters that best fit the training data.
By using gradient descent to minimize the cost function of a machine learning
model, we can find the set of model parameters that gives accurate
predictions [2].

It is used in a wide range of machine-learning algorithms, such as [2]:

• Linear regression,
• Logistic regression,
• Neural networks.

4- What is a gradient, and how is it used in a gradient descent algorithm?
In machine learning, a gradient is the derivative of a function that has more than
one input variable: the vector of its partial derivatives. In other words, it
measures how sensitive the function’s output is to changes in each input [2].
This information is critical for the gradient descent algorithm, which updates the
parameters in the direction opposite to the gradient, towards the cost function’s
minimum point [2].

The gradient points toward the steepest ascent, so we simply negate the gradient
vector to find the direction of the steepest descent. The gradient vector’s
magnitude tells us the cost function’s slope in that direction, which is useful for
determining the step size of the gradient descent algorithm [2].

If the slope is steep, we take a larger step toward the minimum point, and if the
slope is shallow, we take a smaller step to avoid overshooting the minimum point
[2].
It is worth noting that the gradient vector from X0 to X1 is substantially longer
than the one from X3 to X4. This is due to a decrease in the steepness (slope) of
the hill, which determines the length of the vector.

This matches the hill example, since the hill becomes less steep as one climbs
higher. As a result, a smaller gradient corresponds to a gentler slope and a
smaller step size for the hill climber.
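
To make the idea of a gradient vector and its length concrete, here is a small sketch that estimates the gradient numerically with finite differences. The bowl-shaped function `f` and the two sample points are assumptions chosen for illustration; the helper `numerical_gradient` is hypothetical, not a library function.

```python
import numpy as np

def f(p):
    """A simple bowl-shaped function of two variables (assumed for illustration)."""
    x, y = p
    return x**2 + 3 * y**2

def numerical_gradient(func, p, eps=1e-6):
    """Approximate the gradient of func at p with central finite differences."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = eps
        grad[i] = (func(p + step) - func(p - step)) / (2 * eps)
    return grad

far = numerical_gradient(f, [4.0, 2.0])    # far from the minimum at (0, 0)
near = numerical_gradient(f, [0.4, 0.2])   # close to the minimum

print("far :", far, "length:", np.linalg.norm(far))
print("near:", near, "length:", np.linalg.norm(near))
print("direction of steepest descent at the far point:", -far)
```

The gradient is longer far from the minimum and shorter close to it, which is why a fixed learning rate still produces smaller steps as the slope flattens.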

5- Why Is Gradient Descent Necessary for Machine Learning?
Gradient descent is necessary for optimizing and fine-tuning the parameters of a
machine-learning model. In fact, it is the backbone of many machine-learning
algorithms. Gradient descent aims to minimize a model’s cost or loss function,
which measures how well the model performs [2].

By reducing the cost function, the model becomes better at making predictions
on the training data and, ideally, generalizes better to new data. The gradient
descent process involves iteratively adjusting the model parameters based on the
gradient of the cost function [2].

This means that the parameters move in the direction of steepest descent, toward
the minimum point of the cost function. With gradient descent, finding a good
set of parameters for a given model becomes much easier, and the model’s
performance improves as a result [2].

6- How the Gradient Descent Algorithm Works

Now let’s talk about how you can use the gradient descent algorithm to train, or
learn, the parameters w and b on your training set. To recap, Figure 1 shows the
familiar logistic regression algorithm, and on its second line the cost
function J, which is a function of your parameters w and b. It is defined as the
average of the loss function over the training set: one over m times the sum of
the loss on each of the m training examples.

Figure 1

Loss or Cost function: The loss function measures how well your algorithm’s
output ŷ(i) on each training example compares with the ground-truth label y(i)
for that example. The full formula is expanded out on the right [1].

A cost function (also known as a loss function) in machine learning is a function
that evaluates the difference between the expected and actual output. In other
terms, it is a mathematical function that measures how well your model fits your
data [2]. The learning algorithm aims to minimize this cost function, which
represents how well the model performs on the training data [2]. First, we define
an objective function, typically written f(x), that we want to minimize. In
machine learning, the loss or cost function often measures the difference between
the model’s predictions and the actual target values.

So the cost function measures how well your parameters w and b are doing on the
training set. To learn a set of parameters w and b, it seems natural to look for
the w and b that make the cost function J(w, b) as small as possible.
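
Since Figure 1 is not reproduced here, the following sketch shows one way to compute the cost described above. It assumes the standard cross-entropy (log) loss for logistic regression; the function names and the tiny data set are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, y):
    """Average cross-entropy loss J(w, b) over the m rows of X."""
    y_hat = sigmoid(X @ w + b)   # predicted probabilities, one per example
    losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return losses.mean()         # (1/m) * sum of the per-example losses

# Tiny made-up data set: 4 examples, 1 feature.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

cost_at_zero = cost(np.array([0.0]), 0.0, X, y)   # every prediction is 0.5
cost_better = cost(np.array([2.0]), 0.0, X, y)    # predictions agree with the labels

print(cost_at_zero, cost_better)
```

With w = 0 and b = 0 every prediction is 0.5, so the cost equals log(2); a w that separates the two classes gives a lower cost, which is what gradient descent searches for.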

Figure 1 also shows an illustration of gradient descent. In this diagram, the
horizontal axes represent your space of parameters w and b. In practice w can be
much higher dimensional, but for the purposes of plotting, let’s illustrate w as a
single number and b as a single number. The cost function J(w, b) is then some
surface above these horizontal axes, so the height of the surface represents the
value of J(w, b) at a certain point. What we really want to do is find the values
of w and b that correspond to the minimum of the cost function J.

Convex and Non-Convex Function: It turns out that this particular cost
function J is a convex function, so it is just a single big bowl. This is as
opposed to non-convex functions, which have lots of different local optima. The
fact that our cost function J(w, b) as defined here is convex is one of the main
reasons why we use this particular cost function J for logistic regression.

It’s important to remember that gradient descent can converge to a local minimum
instead of a global minimum, especially when the cost function is non-convex.
Various techniques, such as using different initialization values or optimizing the
learning rate, can help overcome this issue [2].
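
The effect of initialization on a non-convex function can be sketched as follows. The one-dimensional function below is an assumption chosen because it has two valleys; it is not the logistic regression cost, which is convex.

```python
def f(w):
    """A non-convex function with two valleys (assumed for illustration)."""
    return w**4 - 3 * w**2 + w

def df(w):
    """Derivative of f."""
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w, alpha=0.01, num_iterations=500):
    for _ in range(num_iterations):
        w = w - alpha * df(w)
    return w

w_left = gradient_descent(-2.0)   # starts on the left side
w_right = gradient_descent(2.0)   # starts on the right side

print("from the left :", w_left, "f =", f(w_left))
print("from the right:", w_right, "f =", f(w_right))
```

The two runs settle in different valleys, and only the left one reaches the lower (global) minimum. Trying several starting points and keeping the best result is one simple way to apply the "different initialization values" idea mentioned above.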

7- The Steps of performing gradient Descent

Initialize the parameters:

To find a good value for the parameters, we initialize w and b to some initial
value, denoted by the little red dot in the figure. For logistic regression,
almost any initialization method works. Usually you initialize the values to 0.
Random initialization also works, but people don’t usually do that for logistic
regression. Because this function is convex, no matter where you initialize, you
should get to the same point, or roughly the same point.

Gradient descent starts at that initial point and then takes a step in the
steepest downhill direction. After one step of gradient descent you might end up
there, because it is trying to take a step downhill in the direction of steepest
descent, or as quickly downhill as possible. That is one iteration of gradient
descent. After two iterations you might step there, after three iterations there,
and so on (this part is hidden by the back of the plot), until eventually,
hopefully, you converge to the global optimum or get to something close to it.
So this picture illustrates the gradient descent algorithm.

8- More details about gradient Descent

Figure 2

Let’s write out a few more of the details. For the purpose of illustration in
Figure 2, let’s say that there is some function J(w) that you want to minimize,
and maybe that function looks like this. To make this easier to draw, I’m going
to ignore b for now, so that we have a one-dimensional plot instead of a
higher-dimensional plot. Gradient descent repeatedly carries out the following
update.

We take the value of w and update it, using colon-equals (:=) to represent
updating w:

w := w - α * dJ(w)/dw

That is, set w to w minus alpha times the derivative of J(w) with respect to w.
We repeat this until the algorithm converges.

Learning rate: A couple of points on the notation. Alpha (α) here is the learning
rate, and it controls how big a step we take on each iteration of gradient
descent [1]. The learning rate is a hyperparameter that determines how big the
steps are that we take along our cost function while trying to reach its minimum
value. In other words, it controls how much the parameters are adjusted in the
direction of the negative gradient. The learning rate is a critical parameter
that can significantly impact the convergence and accuracy of the model [2]. If
the learning rate is too small, the algorithm takes small steps toward the
minimum point, which can lead to slow convergence and the possibility of getting
stuck in a local minimum. On the other hand, if the learning rate is too large,
the algorithm takes large steps, which can cause overshooting and oscillation
around the minimum point, or even instability. Therefore, choosing an appropriate
learning rate is crucial for the success of the gradient descent algorithm.
Typically, the learning rate is determined through trial and error or by a
search over candidate values, such as grid search or random search [2].
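
A quick way to see these three regimes is to run the same update with different learning rates. The sketch below assumes the simple cost J(w) = w², whose derivative is 2w and whose minimum is at w = 0; the specific rates are illustrative.

```python
def final_w(alpha, w=1.0, num_iterations=20):
    """Run gradient descent on J(w) = w**2 and return the last value of w."""
    for _ in range(num_iterations):
        dw = 2 * w            # derivative of w**2
        w = w - alpha * dw    # gradient descent update
    return w

for alpha in (0.001, 0.1, 1.1):
    print("alpha =", alpha, "-> w after 20 steps:", final_w(alpha))
```

With the smallest rate w has barely moved from its starting value, with the middle rate it is close to the minimum, and with the largest rate every step overshoots by more than the last, so w moves away from the minimum instead of towards it.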

Derivative: The term dJ(w)/dw in Figure 2 is a derivative. It is basically the
update, or the change, you want to make to the parameter w. When we start to
write code to implement gradient descent, we will use the convention that the
variable name dw represents this derivative term, so in code you write something
like w := w - alpha * dw. Now, let’s make sure that this gradient descent update
makes sense. Let’s say that w was over here, so you are at this point on the cost
function J(w). Remember that the definition of a derivative is the slope of a
function at a point, and the slope of the function is really the height divided
by the width of the little triangle.

Gradient calculation: The gradient of the function, ∇f(x), is a vector that points
in the direction of the steepest increase of the function at point x. It is
calculated by taking the partial derivatives of the function with respect to each
variable (i.e., the model’s parameters). The negative of the gradient gives us
the direction we should move in to minimize the function [2]. In Figure 2, the
slope is that of the tangent to J(w) at that point, and here the derivative is
positive. w gets updated as w minus the learning rate times the derivative, and
because the derivative is positive you end up subtracting from w. You take a step
to the left, so gradient descent makes your algorithm slowly decrease the
parameter when you start off with a large value of w.

To perform gradient descent, we iteratively update the variables (parameters)
according to the following rule:

x_new = x_old - α * ∇f(x_old)

Here,

• x_old represents the current values of the variables,
• α is the learning rate (a hyperparameter that determines the step size of
each update),
• x_new are the updated values of the variables.

As another example, if w was over here, on the other side of the minimum, then at
this point the slope dJ/dw is negative. The gradient descent update then
subtracts alpha times a negative number, and so ends up slowly increasing w. You
end up making w bigger and bigger with successive iterations of gradient descent.
So, hopefully, whether you initialize on the left or on the right, gradient
descent will move you towards this global minimum.
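
The sign argument above can be checked with a few lines of code. This sketch assumes the cost J(w) = (w - 2)², which has its minimum at w = 2; the starting points and learning rate are arbitrary choices.

```python
def dJ_dw(w):
    """Derivative of the assumed cost J(w) = (w - 2)**2."""
    return 2 * (w - 2)

def run(w, alpha=0.1, num_iterations=50):
    """Return the list of w values visited by gradient descent."""
    history = [w]
    for _ in range(num_iterations):
        dw = dJ_dw(w)        # slope at the current w
        w = w - alpha * dw   # w := w - alpha * dw
        history.append(w)
    return history

from_right = run(5.0)    # slope is positive here, so w decreases
from_left = run(-1.0)    # slope is negative here, so w increases

print("from the right:", from_right[:4], "...", from_right[-1])
print("from the left :", from_left[:4], "...", from_left[-1])
```

Both runs end up close to w = 2, one by decreasing w and the other by increasing it.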

Repeat steps: The overall intuition for now is that this term represents the
slope of the function. We want to know the slope at the current setting of the
parameters so that we know which direction to step in to go downhill on the cost
function J, taking these steps of steepest descent. So far we wrote gradient
descent for J(w), as if w were your only parameter. In logistic regression, your
cost function is a function of both w and b. In that case the inner loop of
gradient descent, the thing you have to repeat, becomes the following:

w := w - α * ∂J(w, b)/∂w
b := b - α * ∂J(w, b)/∂b

You update w as w minus the learning rate times the derivative of J(w, b) with
respect to w, and you update b as b minus the learning rate times the derivative
of the cost function with respect to b. These two equations are the actual update
you implement.

Partial derivative: The symbol ∂ (Figure 2) is actually just the lowercase d in a
fancy, stylized font. When you see the expression ∂J(w, b)/∂w, all it means is
the derivative of J(w, b) with respect to w, or really the slope of the function
J(w, b): how much that function slopes in the w direction. The rule of notation
in calculus, which I think is not totally logical and just makes things more
complicated than they need to be, is that if J is a function of two or more
variables, then instead of the lowercase d you use this funny symbol, called the
partial derivative symbol. If J is a function of only one variable, you use the
lowercase d. So the only difference between the partial derivative symbol and the
lowercase d we used above is whether J is a function of two or more variables or
of only one. If you see the partial derivative symbol, all it means is that you
are measuring the slope of the function with respect to one of the variables.
Similarly, to adhere to formally correct mathematical notation, because here J
has two inputs and not just one, the update rule at the bottom should be written
with the partial derivative symbol, but it means almost the same thing as the
lowercase d. Finally, when you implement this in code, we will use the convention
that the quantity by which you update w is denoted by the variable dw, and the
quantity by which you update b is denoted by the variable db.
So that’s how you can implement gradient descent.
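
Here is a minimal sketch of the two-parameter update using the dw and db naming convention. It assumes the standard gradients of the cross-entropy cost, dw = (1/m) Xᵀ(ŷ - y) and db = (1/m) Σ(ŷ - y), which this post does not derive; the function name `gradient_step` and the data are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(w, b, X, y, alpha):
    """One gradient descent update of w and b for logistic regression."""
    m = X.shape[0]
    y_hat = sigmoid(X @ w + b)     # current predictions
    dw = X.T @ (y_hat - y) / m     # partial derivative of J with respect to w
    db = np.mean(y_hat - y)        # partial derivative of J with respect to b
    w = w - alpha * dw
    b = b - alpha * db
    return w, b

# Tiny made-up data set: 4 examples, 1 feature.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b = np.zeros(1), 0.0            # initialize to zero, as described in section 7
for _ in range(100):
    w, b = gradient_step(w, b, X, y, alpha=0.1)

predictions = (sigmoid(X @ w + b) >= 0.5).astype(float)
print("w =", w, "b =", b, "predictions =", predictions)
```

Both parameters are updated from the same predictions ŷ, so dw and db are computed before either w or b changes.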

9. Advantages of Gradient Descent in Logistic Regression
Gradient descent offers several advantages when used in logistic regression:

Flexibility: Gradient descent can be applied to various optimization problems
beyond logistic regression.
Efficiency: It allows us to efficiently optimize complex models with a large number
of parameters.
Convergence: With careful tuning of hyperparameters, gradient descent can
converge to an optimal solution.
Scalability: By using mini-batch or stochastic gradient descent, it can handle large
datasets efficiently.
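
The scalability point can be illustrated with a mini-batch variant, which updates the parameters from a small random subset of the data at each step instead of the full training set. This is a sketch under assumptions: the batch size, number of epochs, synthetic data and function name are all illustrative choices rather than anything prescribed by the post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_gradient_descent(X, y, alpha=0.1, batch_size=16, num_epochs=50, seed=0):
    """Logistic regression trained with mini-batch gradient descent."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(num_epochs):
        order = rng.permutation(m)                 # reshuffle the examples every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]  # indices of the current mini-batch
            error = sigmoid(X[idx] @ w + b) - y[idx]
            w = w - alpha * (X[idx].T @ error) / len(idx)
            b = b - alpha * error.mean()
    return w, b

# Synthetic, linearly separable data (made up for this example).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b = minibatch_gradient_descent(X, y)
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print("training accuracy:", accuracy)
```

Each update only touches `batch_size` rows, so the cost per step stays constant as the data set grows. Setting `batch_size=1` gives stochastic gradient descent.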

10. Limitations of Gradient Descent in Logistic Regression
While gradient descent is a powerful optimization algorithm, it has certain
limitations:

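Local Minima: On non-convex cost functions, gradient descent can get stuck in
local minima instead of reaching the global minimum. The logistic regression cost
is convex, so this mainly matters when the same algorithm is applied to other
models.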
Sensitivity to Initial Parameters: The choice of initial parameters can affect
convergence and model performance.
Learning Rate Selection: Selecting an appropriate learning rate can be challenging
and may require experimentation.
Feature Scaling: Gradient descent can be sensitive to feature scaling, requiring
preprocessing techniques like normalization or standardization.
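
As a small sketch of the standardization mentioned above, each feature can be rescaled to zero mean and unit standard deviation before training. The feature values below are made up to show two columns on very different scales.

```python
import numpy as np

# Made-up features on very different scales (e.g. income and a ratio).
X = np.array([[1000.0, 0.1],
              [2000.0, 0.2],
              [3000.0, 0.3],
              [4000.0, 0.4]])

mean = X.mean(axis=0)          # per-feature mean
std = X.std(axis=0)            # per-feature standard deviation
X_scaled = (X - mean) / std    # standardized features

print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

The same `mean` and `std` computed on the training data should be reused to scale any new data before making predictions.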

11- Basic implementation of Logistic Regression Gradient in Python

1-Import the necessary libraries.

import numpy as np
import pandas as pd

2- Load the training data.

data = pd.read_csv("train.csv")

3- Split the data into features and target.

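# Convert to NumPy arrays so the matrix operations below behave predictably.
X = data.drop("target", axis=1).to_numpy(dtype=float)
y = data["target"].to_numpy(dtype=float)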

4- Initialize the model parameters.

theta = np.random.randn(X.shape[1])

5- Define the sigmoid function.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

6- Define the loss function.

def loss(theta, X, y):
    # Average cross-entropy (log loss) over all training examples.
    h = sigmoid(X.dot(theta))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

7-Define the gradient descent algorithm.

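def gradient_descent(theta, X, y, alpha, num_iterations):
    m = len(y)  # number of training examples
    for i in range(num_iterations):
        h = sigmoid(X.dot(theta))
        # Gradient of the average log loss with respect to theta.
        # Note the order (h - y): using (y - h) here would climb the loss.
        gradient = X.T.dot(h - y) / m
        theta = theta - alpha * gradient
    return theta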

8- Train the model.

theta = gradient_descent(theta, X, y, alpha=0.01, num_iterations=1000)

9- Make predictions.

def predict(theta, X):
    h = sigmoid(X.dot(theta))
    y_pred = np.where(h >= 0.5, 1, 0)
    return y_pred

y_pred = predict(theta, X)

10-Evaluate the model.

accuracy = np.mean(y_pred == y)
print("Accuracy:", accuracy)

Conclusion
Gradient descent plays a vital role in optimizing logistic regression models by
iteratively updating the parameters in the direction of steepest descent. It offers
flexibility, efficiency, and scalability while providing solutions to binary
classification problems. However, it is important to be aware of its limitations and
consider techniques like feature scaling and careful selection of hyperparameters
for optimal performance.

Please follow and 👏 clap for the story on Coursesteach to see the latest updates
on this story.

If you want to learn more about these topics: Python, Machine Learning, Data
Science, Statistics for Machine Learning, Linear Algebra for Machine Learning,
Computer Vision and Research,

then log in and enroll in Coursesteach to get fantastic content in the data field.

Stay tuned for our upcoming articles, because we teach end to end, where we will
explore specific topics related to Deep Learning in more detail!

Remember, learning is a continuous process. So keep learning, and keep creating
and sharing with others!

Note: if you are a Deep Learning expert and have some good suggestions to
improve this blog, please write comments and contribute.

For more updates about Deep Learning, please follow and enroll:

Course: Neural Networks and Deep Learning

📚GitHub Repository

📝Notebook

Do you want to get into data science and AI and need help figuring out how? I can
offer you research supervision and long-term career mentoring.
Skype: themushtaq48, email: [email protected]

References
1- Neural Networks and Deep Learning

2- How the Gradient Descent Algorithm Works
