
Logistic Regression

The document discusses binary classification using logistic regression and neural networks. It explains how logistic regression can be used to classify images into categories like cat vs non-cat by training a model on pixel intensity values as features. The model learns parameters like weights and bias by minimizing a cost function using gradient descent.

Uploaded by

SANJIDA AKTER


Neural Networks Basics

CSE 4237 - Soft Computing

Mir Tafseer Nayeem
Faculty Member, CSE AUST
[email protected]

Binary Classification

1 (Cat) vs 0 (non-Cat)

Example: Cat vs Non-Cat

● The goal is to train a classifier that takes an input image, represented by a feature vector 𝑥, and predicts whether the corresponding label 𝑦 is 1 or 0.
● In this case, whether this is a cat image (1) or a non-cat image (0).

Binary Classification

● An image is stored in the computer in three separate matrices corresponding to the Red,
Green, and Blue color channels of the image.
● The three matrices have the same size as the image. For example, if the cat image is 64 × 64 pixels, each of the three (R, G, B) matrices is 64 × 64.

Content Credit: Andrew Ng

Binary Classification

● The value in each cell is a pixel intensity, and these values are used to build a feature vector of dimension n. In pattern recognition and machine learning, a feature vector represents an object; in this case, an image that either does or does not contain a cat.

Binary Classification

[Figure: a 64 × 64 RGB image unrolled into the feature vector 𝑥, with label Y (0 or 1)]

● To create the feature vector 𝑥, the pixel intensity values of each color channel are “unrolled” (reshaped) into one long column. The dimension of the input feature vector 𝑥 is 𝑛_𝑥 = 64 × 64 × 3 = 12,288.

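The following is a minimal NumPy sketch of the unrolling step. It assumes the image is stored as a (64, 64, 3) array (height, width, channel). The random pixel values, the channel-by-channel ordering (all red values, then green, then blue) and the batch size m = 5 are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Hypothetical stand-in for one 64 x 64 RGB image (random intensities 0-255),
# stored as (height, width, channel).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Move the channel axis first so all red values come first, then green, then blue,
# and unroll everything into a single column vector x of shape (n_x, 1).
x = image.transpose(2, 0, 1).reshape(-1, 1)
n_x = x.shape[0]
print(n_x)  # 12288

# For m images stacked as (m, 64, 64, 3), build X with one example per column.
m = 5
images = rng.integers(0, 256, size=(m, 64, 64, 3), dtype=np.uint8)
X = images.transpose(0, 3, 1, 2).reshape(m, -1).T
print(X.shape)  # (12288, 5)
```

The unrolling order is a convention. What matters is that every image is unrolled the same way, so that each position in 𝑥 always refers to the same pixel and channel.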
Logistic Regression
● Logistic regression is a learning algorithm used in supervised learning problems where the output labels 𝑦 are all either zero or one.
● The goal of logistic regression is to minimize the error between its predictions and the labels in the training data.
● Given an image represented by a feature vector 𝑥, the algorithm estimates the probability that the image contains a cat.

Logistic Regression

[Figure: logistic regression model $\hat{y} = \sigma(w^T x + b)$ with parameters 𝑤 (weights) and 𝑏 (bias)]

Content Credit: Andrew Ng

Logistic Regression : Role of bias (b)
● The bias value allows the activation function to be shifted to the left or right, to better
fit the data.
● Changes to the weights alter the steepness of the sigmoid curve, whilst the bias offsets
it, shifting the entire curve so it fits better.
● The bias is not multiplied by any input: it is added to the weighted sum whatever the input data is. That is why it is called a bias.
● You can think of the bias as a measure of how easy it is to get a node to fire.
○ For a node with a large positive bias, the output tends to be intrinsically high: even small positive weights and inputs produce outputs near 1.
○ Biases can also be negative, pushing sigmoid outputs toward 0.
○ If the bias is very small (or 0), the output is determined by the weights and inputs alone.

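The following small sketch illustrates the bullets above. It uses a single hypothetical input x = 0.5 and weight w = 1.0; both are illustrative choices. It evaluates σ(wx + b) for a negative, a zero and a positive bias. The curve's midpoint, where wx + b = 0, sits at x = −b/w, and this is the left/right shift that the bias controls.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def midpoint(w, b):
    """Input at which sigmoid(w * x + b) = 0.5, i.e. where w * x + b = 0."""
    return -b / w

x = 0.5  # hypothetical single input
w = 1.0  # hypothetical small positive weight

outputs = {b: float(sigmoid(w * x + b)) for b in (-4.0, 0.0, 4.0)}
for b, a in outputs.items():
    print(f"bias {b:+.1f}: output {a:.3f}")
```

With these values the outputs are about 0.029, 0.622 and 0.989. The same small weight and input give an output near 0 or near 1 depending only on the bias.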
Logistic Regression
● $w^T x + b$ is a linear function, like $ax + b$. Because we want a probability in [0, 1], we pass it through the sigmoid function $\sigma(z) = \frac{1}{1 + e^{-z}}$.
● The sigmoid is bounded between 0 and 1, as shown in the graph below.

[Figure: graph of the sigmoid function σ(z)]

Logistic Regression

[Figure: sigmoid function σ(z)]

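The following is a minimal sketch of the prediction ŷ = σ(wᵀx + b) for one example. It assumes w and x are NumPy column vectors of shape (n_x, 1). The three-feature values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z)); the output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    """Return y_hat = sigmoid(w^T x + b) for a single example x of shape (n_x, 1)."""
    z = np.dot(w.T, x) + b  # linear part, shape (1, 1)
    return float(sigmoid(z)[0, 0])

# Hypothetical toy values with n_x = 3 features.
w = np.array([[0.2], [-0.5], [1.0]])
b = 0.1
x = np.array([[1.0], [2.0], [0.5]])

y_hat = predict_proba(w, b, x)
print(round(y_hat, 4))  # z = -0.2, so y_hat is just below 0.5
```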
Logistic Regression: Cost Function
● To train the parameters 𝑤 and 𝑏, we need to define a cost function.

Loss (error) function:

● The loss function measures the discrepancy between the prediction 𝑦̂(𝑖) and the desired output 𝑦(𝑖).
● In other words, the loss function computes the error for a single training example.

Error / Loss Function
Squared Error Function:

$\mathcal{L}(\hat{y}, y) = \frac{1}{2}(\hat{y} - y)^2$

● There is an extra factor of 1/2 on the right-hand side of the equation. Does it matter?
● It does not change where the minimum is. It is there because, when we take the derivative used to update the parameters during gradient descent, the exponent 2 comes down and cancels the 1/2, leaving simply $\hat{y} - y$.
● This and similar tricks are widely used in mathematics "to make the derivations mathematically more convenient".

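The following is a quick numerical check of that cancellation using a central finite difference. The values ŷ = 0.7, y = 1 and the step size are arbitrary illustrative choices.

```python
def squared_error(y_hat, y):
    """L(y_hat, y) = 1/2 * (y_hat - y)^2"""
    return 0.5 * (y_hat - y) ** 2

def squared_error_grad(y_hat, y):
    """dL/dy_hat: the exponent 2 comes down and cancels the 1/2."""
    return y_hat - y

y_hat, y, eps = 0.7, 1.0, 1e-6
numeric = (squared_error(y_hat + eps, y) - squared_error(y_hat - eps, y)) / (2 * eps)
analytic = squared_error_grad(y_hat, y)
print(numeric, analytic)  # both approximately -0.3
```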
Is squared error function a good choice?
● The squared error function (commonly used for linear regression) is not well suited to logistic regression.
○ In logistic regression the prediction is a non-linear (sigmoid) function of the parameters, which makes the squared error cost non-convex.
○ The logarithmic (cross-entropy) loss, on the other hand, gives a convex cost function with no local optima other than the global one, so gradient descent works well.
● In binary classification, squared error also penalizes examples that are correctly classified but still near the decision boundary, which in effect creates a "margin."
● Gradient descent can also waste a lot of time pushing predictions very close to 0 or 1.

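The following is a rough numerical illustration, not a proof. It uses one hypothetical example (x = 1, y = 1, with b fixed at 0) and compares the squared error and cross-entropy costs as functions of w on a grid. It then checks whether their discrete second differences are all non-negative, which a convex curve requires.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hypothetical training example: x = 1, y = 1; cost as a function of w (b = 0).
w = np.linspace(-6, 6, 241)
a = sigmoid(w * 1.0)
se_cost = 0.5 * (a - 1.0) ** 2   # squared error
ce_cost = -np.log(a)             # cross-entropy with y = 1

def second_diff(f):
    """Discrete second differences; all >= 0 for a convex curve on a uniform grid."""
    return f[2:] - 2 * f[1:-1] + f[:-2]

print("squared error convex on grid?", bool(np.all(second_diff(se_cost) >= -1e-12)))
print("cross-entropy convex on grid?", bool(np.all(second_diff(ce_cost) >= -1e-12)))
```

The squared error curve flattens out toward 1/2 for large negative w. It therefore has a concave region, and gradient descent can crawl across that region. The cross-entropy curve keeps bending upward everywhere.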
Logistic Regression: Cross Entropy Loss

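● Cost function
○ The cost function is the average of the loss function over the entire training set. We want to find the parameters 𝑤 and 𝑏 that minimize this overall cost:

$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})\right]$
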
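The following is a minimal sketch of J as the average cross-entropy loss. It assumes the predictions ŷ(i) are already computed and stored in a NumPy array. The four labels and predictions are made up.

```python
import numpy as np

def cross_entropy_cost(y_hat, y):
    """J = -(1/m) * sum(y * log(y_hat) + (1 - y) * log(1 - y_hat)) over m examples."""
    m = y.shape[0]
    losses = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))  # one loss per example
    return float(losses.sum() / m)

y = np.array([1, 0, 1, 0])
y_hat = np.array([0.9, 0.1, 0.8, 0.3])
print(round(cross_entropy_cost(y_hat, y), 4))

# A confident but wrong prediction is penalized heavily.
print(round(cross_entropy_cost(np.array([0.01]), np.array([1])), 4))
```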
Gradient Descent

We want to find parameters w, b that minimize J(w, b).

Content Credit: Andrew Ng
Gradient Descent
● Our cost function is convex.
● We first initialize w and b, either to 0 or to random values, and then iteratively improve them to reach the minimum.
● In logistic regression, initializing to 0 is the usual choice rather than random values.
● Because the function is convex, no matter where we initialize, we should reach the global optimum or a point close to it.

[Figure: convex cost J(w, b) with its global optimum; we want to find w, b that minimize J(w, b)]

Gradient Descent
● Gradient descent starts at the initial point and, at each iteration, takes a step in the steepest downhill direction.
● It moves toward the global optimum, reaching it or getting close to it.

[Figure: gradient descent steps down the convex cost J(w, b) toward the global optimum]

Gradient Descent

Repeat {
    $w := w - \alpha \frac{dJ(w)}{dw}$
}
Repeat this until the algorithm converges.

[Figure: J(w) plotted against w, with the slope negative (−) on one side of the minimum and positive (+) on the other]

● b is ignored for now, so that the plot is one-dimensional rather than higher-dimensional.
● 𝜶 = learning rate: how big a step we take at each iteration of gradient descent.
● $\alpha \frac{dJ(w)}{dw}$ is the update, or change, we make to the parameter w.
● Definition of a derivative:
○ The slope of a function at a point.

Gradient Descent : Actual Update Rule

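We want to find parameters w, b that minimize J(w, b). With both parameters, the update uses partial derivatives:

$w := w - \alpha \frac{\partial J(w, b)}{\partial w}$
$b := b - \alpha \frac{\partial J(w, b)}{\partial b}$
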
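The following is a minimal sketch of the update rule on a stand-in cost, J(w, b) = (w − 3)² + (b + 1)². This cost is chosen only because it is convex and its minimum (w = 3, b = −1) is known; it is not the logistic regression cost. Both partial derivatives are computed before either parameter changes, so the update is simultaneous.

```python
def grad_J(w, b):
    """Partial derivatives of the hypothetical cost J(w, b) = (w - 3)^2 + (b + 1)^2."""
    return 2 * (w - 3), 2 * (b + 1)

w, b = 0.0, 0.0   # zero initialization
alpha = 0.1       # learning rate
for _ in range(100):
    dw, db = grad_J(w, b)
    w = w - alpha * dw   # w := w - alpha * dJ/dw
    b = b - alpha * db   # b := b - alpha * dJ/db
print(round(w, 4), round(b, 4))  # approaches the minimum at (3, -1)
```

A larger α takes bigger steps. On this cost, any α above 1 makes the iterates overshoot and diverge.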
Derivatives : Intuition
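● Let f(a) = 3a.
○ a = 2: f(a) = 6. a = 2.001: f(a) = 6.003. If we shift a by 0.001, f(a) shifts by 3 times 0.001, so the slope (derivative) of f(a) at a = 2 is 3.
○ a = 5: f(a) = 15. a = 5.001: f(a) = 15.003. The slope (derivative) of f(a) at a = 5 is also 3.
● $\frac{d f(a)}{da} = 3$: a straight line has the same slope at every point.
● For a curve such as f(x) = x², the slope or "rate of change" differs from point to point: at any point it is 2x.
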
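The same nudge-and-measure idea can be written as a forward finite difference, with h = 0.001 as on the slide. The function g(x) = x² is included to contrast a constant slope with one that changes from point to point.

```python
def numerical_slope(f, a, h=0.001):
    """Approximate df/da by nudging a by h, as in the a -> a + 0.001 example."""
    return (f(a + h) - f(a)) / h

def f(a):
    return 3 * a   # straight line: slope 3 everywhere

def g(x):
    return x ** 2  # curve: slope 2x, different at each point

for a in (2, 5):
    print(a, round(numerical_slope(f, a), 6), round(numerical_slope(g, a), 3))
```

For g, the estimates are 4.001 and 10.001 rather than exactly 4 and 10. This is because a forward difference with a finite h only approximates the slope of a curve.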
Do we actually need Gradient Descent?
● Suppose we have only 1 weight. To find the value of that weight that minimizes our cost, we could try a bunch of values for W, say 1000 of them. That doesn't seem so bad; after all, computers are pretty fast.
● It takes about 0.04 seconds to check 1000 different weight values for our neural network.
● Since we've computed the cost for a wide range of values of W, we can just pick the one with the smallest cost.

[Figure: cost against W after trying all 1000 values; the "winner" is the value with the smallest cost]
Do we actually need Gradient Descent?
● Next, consider 2 weights. To maintain the same precision we now need to check 1000 × 1000, or one million, values. This is a lot of work, even for a fast computer.
● After our 1 million evaluations we've found our solution, but it took an agonizing 40 seconds! Searching through three weights would take a billion evaluations, or 11 hours!
● Searching through all 9 weights we need for our simple network would take 1,268,391,679,350,583.5 years (over a quadrillion years). For that reason, the "just try everything" (brute-force) optimization method is clearly not going to work.

[Figure: grid search over W1 and W2, trying all 1000 values of each]
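These figures follow from simple arithmetic. The following sketch assumes, as the slides do, 1000 candidate values per weight and about 0.04 seconds per 1000 cost evaluations. It uses a 365-day year, which reproduces the slide's year count. The function name is illustrative.

```python
SECONDS_PER_EVAL = 0.04 / 1000          # slide: 1000 evaluations take about 0.04 s
VALUES_PER_WEIGHT = 1000                # grid of 1000 candidate values per weight
SECONDS_PER_YEAR = 365 * 24 * 3600

def brute_force_seconds(num_weights):
    """Time to evaluate the cost at every point of the grid over num_weights weights."""
    return VALUES_PER_WEIGHT ** num_weights * SECONDS_PER_EVAL

for k in (1, 2, 3, 9):
    s = brute_force_seconds(k)
    print(f"{k} weight(s): {s:.4g} s = {s / 3600:.4g} h = {s / SECONDS_PER_YEAR:.4g} years")
```

The cost grows as 1000^k: each extra weight multiplies the search time by 1000. Gradient descent avoids this by following the slope instead of enumerating the grid.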
A Famous Quote

Computation Graph
● Neural network computations are organized in terms of a forward pass and a backward pass.
● Forward Pass / Propagation
○ In which we compute the output of the neural network.
● Backward Pass / Propagation
○ In which we compute the gradients / derivatives.
● Computation Graph
○ Explains why the computation is organized this way.

Computation Graph
$J(a, b, c) = 3(a + bc)$

3 steps of computation:

1. u = bc
2. v = a + u
3. J = 3v

[Figure: computation graph in which b and c feed u = bc, a and u feed v = a + u, and v feeds J = 3v]

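The following sketch shows the forward and backward passes through this graph. The input values a = 5, b = 3, c = 2 are chosen only for illustration. The backward pass applies the chain rule from J back to each input, one node at a time.

```python
def forward(a, b, c):
    """Forward pass through the graph: u = bc, v = a + u, J = 3v."""
    u = b * c
    v = a + u
    J = 3 * v
    return u, v, J

def backward(b, c):
    """Backward pass: chain rule from J back to each input (a does not appear in the result)."""
    dJ_dv = 3.0              # J = 3v
    dJ_da = dJ_dv * 1.0      # v = a + u  ->  dv/da = 1
    dJ_du = dJ_dv * 1.0      # dv/du = 1
    dJ_db = dJ_du * c        # u = bc  ->  du/db = c
    dJ_dc = dJ_du * b        # du/dc = b
    return dJ_da, dJ_db, dJ_dc

a, b, c = 5.0, 3.0, 2.0      # hypothetical input values
print(forward(a, b, c))      # (6.0, 11.0, 33.0)
print(backward(b, c))        # (3.0, 6.0, 9.0)
```

Each local derivative (dv/da, du/db, ...) is cheap, and the backward pass reuses dJ/dv and dJ/du instead of recomputing them. This reuse is why the computation is organized as a forward pass followed by a backward pass.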
Logistic regression : Forward Propagation

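Computing the loss of a single training example:

$z = w_1 x_1 + w_2 x_2 + b \;\rightarrow\; a = \sigma(z) \;\rightarrow\; \mathcal{L}(a, y) = -\bigl(y \log a + (1 - y)\log(1 - a)\bigr)$

We modify the parameters w1, w2 and b in order to minimize the loss.
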
Logistic regression : Backward Propagation

Rules for derivatives of logarithmic expressions

If you are unsure about your derivative, check this link to generate the derivation steps.

Logistic regression : Backward Propagation

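Ignoring the (-) sign for now, the derivative of the bracketed term is

$\frac{d}{da}\bigl[y \log a + (1 - y)\log(1 - a)\bigr] = \frac{y}{a} - \frac{1 - y}{1 - a}$

Here log(x) means the base-e (natural) logarithm ln(x), as is conventional in mathematical analysis, physics, chemistry, statistics, economics, and some engineering fields.
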
Logistic regression : Backward Propagation

Finally, adding the (-) sign:

$da = \frac{d\mathcal{L}}{da} = -\frac{y}{a} + \frac{1 - y}{1 - a}$

Logistic regression : Backward Propagation

Applying Chain Rule

Logistic regression : Backward Propagation

Logistic regression : Backward Propagation

Applying Chain Rule

Logistic regression : Backward Propagation

Applying Chain Rule

Logistic regression : Backward Propagation

Applying Chain Rule

Logistic regression : Backward Propagation

Applying Chain Rule

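The equations on these chain-rule slides did not survive extraction. The derivation below is a standard reconstruction for one example with two features. It is written to agree with the derivative summary (da, dz, dw1, dw2, db) on the "gradient descent on m examples" slides, but it may not match the slides' exact steps.

```latex
\begin{aligned}
z &= w_1 x_1 + w_2 x_2 + b, \qquad a = \sigma(z), \qquad
\mathcal{L}(a, y) = -\bigl(y \log a + (1 - y)\log(1 - a)\bigr) \\[4pt]
da &= \frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a} \\[4pt]
\frac{\partial a}{\partial z} &= \sigma(z)\bigl(1 - \sigma(z)\bigr) = a(1 - a) \\[4pt]
dz &= \frac{\partial \mathcal{L}}{\partial a}\,\frac{\partial a}{\partial z}
    = \left(-\frac{y}{a} + \frac{1 - y}{1 - a}\right) a(1 - a)
    = -y(1 - a) + (1 - y)a = a - y \\[4pt]
dw_1 &= \frac{\partial \mathcal{L}}{\partial z}\,\frac{\partial z}{\partial w_1} = x_1\, dz, \qquad
dw_2 = x_2\, dz, \qquad
db = dz
\end{aligned}
```

The a(1 − a) factor from the sigmoid cancels the denominators of da exactly. This cancellation is why dz collapses to the simple expression a − y.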
Updating the Parameters: w1, w2 and b

$w_1 := w_1 - \alpha\, dw_1$
$w_2 := w_2 - \alpha\, dw_2$
$b := b - \alpha\, db$

This is one step of gradient descent on a single example, where α is the learning rate.

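The following plain-Python sketch runs one forward pass, backward pass and update on a single example. It repeats this three times to show the loss falling. The example (x1 = 1, x2 = 2, y = 1), the zero initialization and α = 0.1 are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def single_example_step(w1, w2, b, x1, x2, y, alpha):
    """One gradient-descent step on one example with two features."""
    # Forward pass
    z = w1 * x1 + w2 * x2 + b
    a = sigmoid(z)
    loss = -(y * math.log(a) + (1 - y) * math.log(1 - a))
    # Backward pass (the chain-rule results)
    dz = a - y
    dw1, dw2, db = x1 * dz, x2 * dz, dz
    # Parameter update with learning rate alpha; loss is the value before this update
    return w1 - alpha * dw1, w2 - alpha * dw2, b - alpha * db, loss

w1, w2, b = 0.0, 0.0, 0.0
x1, x2, y = 1.0, 2.0, 1  # hypothetical training example
losses = []
for step in range(3):
    w1, w2, b, loss = single_example_step(w1, w2, b, x1, x2, y, alpha=0.1)
    losses.append(loss)
    print(step, round(loss, 4))
```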
Logistic regression Gradient descent on m examples
Basic Parameters

● x1: the first feature
● x2: the second feature
● w1: weight of the first feature
● w2: weight of the second feature
● b: logistic regression parameter (bias)
● m: number of training examples
● y(i): expected output of the i-th training example

Logistic regression Gradient descent on m examples

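For the i-th example, the loss is $\mathcal{L}(a^{(i)}, y^{(i)})$ with $a^{(i)} = \sigma(z^{(i)}) = \sigma(w^T x^{(i)} + b)$. Since the cost is the average loss, its gradient is the average of the per-example gradients:

$J(w, b) = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(a^{(i)}, y^{(i)}), \qquad \frac{\partial J}{\partial w_1} = \frac{1}{m}\sum_{i=1}^{m}\frac{\partial \mathcal{L}(a^{(i)}, y^{(i)})}{\partial w_1}$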

Logistic regression Gradient descent on m examples
Derivatives: they all turn out to be simple arithmetic operations.

● da = -(y/a) + ((1-y) / (1-a))
● dz = a - y
● dw1 = x1 * dz
● dw2 = x2 * dz
● db = dz

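The following sketch checks the formulas above against central finite differences of the loss. It uses one hypothetical example with two features; the parameter and input values are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w1, w2, b, x1, x2, y):
    """Cross-entropy loss of one example with two features."""
    a = sigmoid(w1 * x1 + w2 * x2 + b)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def analytic_grads(w1, w2, b, x1, x2, y):
    """dw1, dw2, db from the table: dz = a - y, dw1 = x1 * dz, dw2 = x2 * dz, db = dz."""
    a = sigmoid(w1 * x1 + w2 * x2 + b)
    dz = a - y
    return x1 * dz, x2 * dz, dz

def numeric_grads(w1, w2, b, x1, x2, y, eps=1e-6):
    """Central finite differences of the loss with respect to w1, w2 and b."""
    def diff(f_plus, f_minus):
        return (f_plus - f_minus) / (2 * eps)
    return (
        diff(loss(w1 + eps, w2, b, x1, x2, y), loss(w1 - eps, w2, b, x1, x2, y)),
        diff(loss(w1, w2 + eps, b, x1, x2, y), loss(w1, w2 - eps, b, x1, x2, y)),
        diff(loss(w1, w2, b + eps, x1, x2, y), loss(w1, w2, b - eps, x1, x2, y)),
    )

params = (0.3, -0.2, 0.1, 1.5, 0.5, 1)   # hypothetical w1, w2, b, x1, x2, y
print(analytic_grads(*params))
print(numeric_grads(*params))
```

A gradient check like this is a common way to catch sign or chain-rule mistakes before trusting a hand-derived backward pass.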
Logistic regression Gradient descent on m examples
J = 0; dw1 = 0; dw2 = 0; db = 0
w1 = 0; w2 = 0; b = 0

for i = 1 to m
    # Forward pass
    z(i) = w1*x1(i) + w2*x2(i) + b
    a(i) = sigmoid(z(i))
    J += -(y(i)*log(a(i)) + (1-y(i))*log(1-a(i)))

    # Backward pass (n = 2 features)
    dz(i) = a(i) - y(i)
    dw1 += dz(i) * x1(i)
    dw2 += dz(i) * x2(i)
    db += dz(i)

J /= m
dw1 /= m
dw2 /= m
db /= m

# Gradient descent
w1 = w1 - alpha * dw1
w2 = w2 - alpha * dw2
b = b - alpha * db

dw1, dw2 and db are accumulators: they sum over all m training examples, so they carry no (i) superscript. w1, w2 and b are single parameters shared by all m examples.

One iteration of gradient descent
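The following is a runnable plain-Python version of this pseudocode. It assumes the m examples are given as Python lists x1, x2 and y (hypothetical toy data). The loop over i is the explicit loop over examples. The loop over the n = 2 features is written out by hand as the dw1 and dw2 lines. Calling the function repeatedly takes multiple gradient-descent steps.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_iteration(w1, w2, b, x1, x2, y, alpha):
    """One iteration of gradient descent over all m examples, using explicit loops."""
    m = len(y)
    J = 0.0
    dw1 = dw2 = db = 0.0                # accumulators over the m examples
    for i in range(m):
        # Forward pass
        z = w1 * x1[i] + w2 * x2[i] + b
        a = sigmoid(z)
        J += -(y[i] * math.log(a) + (1 - y[i]) * math.log(1 - a))
        # Backward pass
        dz = a - y[i]
        dw1 += dz * x1[i]               # one line per feature (n = 2)
        dw2 += dz * x2[i]
        db += dz
    J, dw1, dw2, db = J / m, dw1 / m, dw2 / m, db / m
    # Gradient descent update
    return w1 - alpha * dw1, w2 - alpha * dw2, b - alpha * db, J

# Hypothetical toy data with m = 4 examples and n = 2 features.
x1 = [0.5, 1.5, -1.0, -2.0]
x2 = [1.0, 0.5, -0.5, -1.5]
y = [1, 1, 0, 0]

w1 = w2 = b = 0.0
costs = []
for _ in range(100):                    # repeat to take many steps
    w1, w2, b, J = one_iteration(w1, w2, b, x1, x2, y, alpha=0.5)
    costs.append(J)
print(round(costs[0], 4), round(costs[-1], 4))
```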
Logistic regression Gradient descent on m examples
● The previous slide is just one step of gradient descent. We need to repeat it many times to take multiple steps of gradient descent.
● The previous implementation has a weakness: it needs two for loops, one over the m training examples and one over the n features (written out by hand here as the dw1 and dw2 lines).
● Explicit for loops make your code less efficient.
● Solution: vectorization techniques.
● To train on larger datasets, we need vectorization techniques that avoid explicit for loops.

LR Gradient descent on m examples (modified)
J = 0; db = 0
dw = np.zeros((n_x, 1))    # replaces dw1 = 0; dw2 = 0
w1 = 0; w2 = 0; b = 0

for i = 1 to m
    # Forward pass
    z(i) = w1*x1(i) + w2*x2(i) + b
    a(i) = sigmoid(z(i))
    J += -(y(i)*log(a(i)) + (1-y(i))*log(1-a(i)))

    # Backward pass
    dz(i) = a(i) - y(i)
    dw += x(i) * dz(i)     # replaces dw1 += dz(i) * x1(i) and dw2 += dz(i) * x2(i) (n = 2)
    db += dz(i)

Logistic regression Gradient descent on m examples
J = J / m
dw = dw / m            # replaces dw1 = dw1 / m and dw2 = dw2 / m
db = db / m

# Gradient descent (dw[0] and dw[1] hold what were dw1 and dw2)
w1 = w1 - alpha * dw[0]
w2 = w2 - alpha * dw[1]
b = b - alpha * db

We have gone from 2 for loops to 1 for loop; we still have one for loop that loops over the individual training examples.

dw and db are accumulators summed over all m training examples; w1, w2 and b are single parameters shared by all m examples.

Vectorizing Logistic Regression (Forward)

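1st training example: $z^{(1)} = w^T x^{(1)} + b$, $a^{(1)} = \sigma(z^{(1)})$
2nd training example: $z^{(2)} = w^T x^{(2)} + b$, $a^{(2)} = \sigma(z^{(2)})$
3rd training example: $z^{(3)} = w^T x^{(3)} + b$, $a^{(3)} = \sigma(z^{(3)})$

We would need to do this m times if we have m training examples. Stacking the examples as the columns of a matrix $X = [x^{(1)}\ x^{(2)}\ \cdots\ x^{(m)}]$ of dimension (n_x × m) does it in one step:

$Z = [z^{(1)}\ z^{(2)}\ \cdots\ z^{(m)}] = w^T X + [b\ b\ \cdots\ b]$, of (1 × m) dimension
$A = [a^{(1)}\ a^{(2)}\ \cdots\ a^{(m)}] = \sigma(Z)$, also (1 × m)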
Vectorizing Logistic Regression (Forward)

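Broadcasting

In NumPy, Z = np.dot(w.T, X) + b adds b, of (1, 1) dimension, to every entry of the (1 × m) result, so the row [b b ⋯ b] never has to be built explicitly.
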
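The following small NumPy check covers the broadcasting step. It assumes X stores one example per column (n_x × m) and that b is a plain Python float. The sizes n_x = 3, m = 4 and the random values are illustrative. The check compares the vectorized Z with an explicit per-example loop.

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, m = 3, 4                         # hypothetical sizes
X = rng.normal(size=(n_x, m))         # one training example per column
w = rng.normal(size=(n_x, 1))
b = 0.5                               # scalar bias

# Vectorized: (1, n_x) @ (n_x, m) gives (1, m); the scalar b is broadcast to every column.
Z = np.dot(w.T, X) + b
A = 1.0 / (1.0 + np.exp(-Z))          # sigmoid, applied element-wise

# Explicit loop over examples, for comparison.
Z_loop = np.array([[float(np.dot(w[:, 0], X[:, i])) + b for i in range(m)]])

print(Z.shape, A.shape)               # (1, 4) (1, 4)
print(np.allclose(Z, Z_loop))         # True
```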
Gradient Computation

47
Gradient Computation

.
.

.
..

48
Implementing Logistic Regression
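Single Iteration of Gradient Descent

Z = np.dot(w.T, X) + b
A = sigmoid(Z)
dZ = A - Y
dw = np.dot(X, dZ.T) / m
db = np.sum(dZ) / m

# Gradient update
w = w - alpha * dw
b = b - alpha * db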
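The following is a minimal NumPy sketch of the fully vectorized algorithm. It keeps one explicit loop: the loop over gradient-descent iterations. It assumes X is n_x × m with one example per column and Y is 1 × m. The toy data (label 1 when x1 + x2 > 0), α = 0.1 and 1000 iterations are illustrative choices. The clipping inside the cost is an added safeguard against log(0) and is not part of the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, Y, alpha=0.1, num_iterations=1000):
    """Vectorized batch gradient descent for logistic regression.
    X: (n_x, m), one example per column. Y: (1, m), labels in {0, 1}."""
    n_x, m = X.shape
    w = np.zeros((n_x, 1))
    b = 0.0
    costs = []
    for _ in range(num_iterations):            # the only loop: over iterations
        # Forward pass
        Z = np.dot(w.T, X) + b                 # (1, m)
        A = sigmoid(Z)
        A_safe = np.clip(A, 1e-12, 1 - 1e-12)  # guard against log(0)
        costs.append(float(-np.mean(Y * np.log(A_safe) + (1 - Y) * np.log(1 - A_safe))))
        # Gradient computation
        dZ = A - Y                             # (1, m)
        dw = np.dot(X, dZ.T) / m               # (n_x, 1)
        db = float(np.sum(dZ)) / m
        # Gradient update
        w = w - alpha * dw
        b = b - alpha * db
    return w, b, costs

# Hypothetical toy data: label is 1 when x1 + x2 > 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 200))
Y = (X[0] + X[1] > 0).astype(float).reshape(1, -1)

w, b, costs = train_logistic_regression(X, Y)
predictions = (sigmoid(np.dot(w.T, X) + b) > 0.5).astype(float)
accuracy = float(np.mean(predictions == Y))
print(f"cost {costs[0]:.3f} -> {costs[-1]:.3f}, training accuracy {accuracy:.2f}")
```

The first recorded cost is ln 2 ≈ 0.693, because zero initialization makes every prediction 0.5.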
What does this have to do with the brain?

END