Feedforward Propagation
We first implement feedforward propagation for the neural network using the weights we have already been given.
Then we will implement the backpropagation algorithm to learn the parameters ourselves.
Here we use the terms weights and parameters interchangeably.
Our neural network has 3 layers: an input layer, a hidden layer and an output layer. Recall
that the inputs are 20 x 20 grayscale images "unrolled" to form 400 input features which we
feed into the neural network, so our input layer has 400 neurons. The hidden layer has 25
neurons and the output layer has 10 neurons, corresponding to the 10 digits (or classes) our
model predicts. The +1 in the figure above represents the bias term.
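For reference, a minimal sketch of these sizes as variables, using the names that are passed around in the code below:

import numpy as np

input_layer_size = 400   # 20 x 20 pixel images, unrolled into a feature vector
hidden_layer_size = 25   # units in the hidden layer
num_labels = 10          # output classes, one per digit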
We have been provided with a set of already trained network parameters. These are stored in
ex4weights.mat; they will be loaded into theta1 and theta2 and then unrolled into a vector
nn_params. The parameters have dimensions that are sized for a neural network with 25
units in the second layer and 10 output units (corresponding to the 10 digit classes).
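A sketch of this loading step, assuming the .mat file stores the two matrices under the keys 'Theta1' and 'Theta2':

from scipy.io import loadmat

weights = loadmat('ex4weights.mat')
theta1 = weights['Theta1']   # 25 x 401
theta2 = weights['Theta2']   # 10 x 26
# unroll both matrices into a single parameter vector (column-major, to match the later code)
nn_params = np.hstack((theta1.ravel(order='F'), theta2.ravel(order='F')))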
1.3 Feedforward and cost function
First we will implement the cost function, followed by the gradient for the neural network (for which
we use the backpropagation algorithm). Recall that the cost function for the neural network with
regularization is:
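Written out for our network with K = 10 output classes (the bias weights are excluded from the regularization term):

J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[-y_k^{(i)}\log\big((h_\Theta(x^{(i)}))_k\big) - (1-y_k^{(i)})\log\big(1-(h_\Theta(x^{(i)}))_k\big)\Big] + \frac{\lambda}{2m}\Big[\sum_{j=1}^{25}\sum_{k=1}^{400}\big(\Theta^{(1)}_{j,k}\big)^2 + \sum_{j=1}^{10}\sum_{k=1}^{25}\big(\Theta^{(2)}_{j,k}\big)^2\Big]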
sigmoid function
def sigmoid(z):
    # logistic function, applied element-wise
    return 1/(1+np.exp(-z))
cost function
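A minimal sketch of the cost computation; the name nnCostFunc and its signature follow the gradient-checking code later in this post, the parameters are assumed to be unrolled column-major (order='F'), and the labels in y are assumed to run from 1 to 10:

def nnCostFunc(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lmbda):
    m = X.shape[0]
    # roll the flat parameter vector back into the two weight matrices
    theta1 = nn_params[:hidden_layer_size * (input_layer_size + 1)].reshape(
        (hidden_layer_size, input_layer_size + 1), order='F')
    theta2 = nn_params[hidden_layer_size * (input_layer_size + 1):].reshape(
        (num_labels, hidden_layer_size + 1), order='F')

    # feedforward pass, adding the bias column at each layer
    a1 = np.hstack((np.ones((m, 1)), X))
    a2 = np.hstack((np.ones((m, 1)), sigmoid(a1 @ theta1.T)))
    a3 = sigmoid(a2 @ theta2.T)                      # m x 10 output activations

    # one-hot encode the labels (assumed to be 1..10)
    y_matrix = np.eye(num_labels)[y.flatten().astype(int) - 1]

    # cross-entropy cost plus regularization (bias weights excluded)
    J = (-1 / m) * np.sum(y_matrix * np.log(a3) + (1 - y_matrix) * np.log(1 - a3))
    J += (lmbda / (2 * m)) * (np.sum(theta1[:, 1:] ** 2) + np.sum(theta2[:, 1:] ** 2))
    return J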
2 Backpropagation
In this part of the exercise, you will implement the backpropagation algorithm to compute the
gradients for the neural network. Once you have computed the gradient, you will be able to
train the neural network by minimizing the cost function using an advanced optimizer such as
fmincg.
def randInitializeWeights(L_in, L_out):
    # random values in [-epsilon, epsilon] to break symmetry between units
    epsilon = 0.12
    return np.random.rand(L_out, L_in+1) * 2 * epsilon - epsilon

initial_theta1 = randInitializeWeights(input_layer_size, hidden_layer_size)
initial_theta2 = randInitializeWeights(hidden_layer_size, num_labels)

# unrolling parameters into a single column vector
nn_initial_params = np.hstack((initial_theta1.ravel(order='F'), initial_theta2.ravel(order='F')))
2.3 Backpropagation
Backpropagation is not such a complicated algorithm once you get the hang of it.
I strongly urge you to watch Andrew Ng's videos on backprop multiple times.
In summary, we do the following by looping through every training example:
1. Run forward propagation to get the output activation a3.
2. Calculate the error term d3, obtained by subtracting the actual output from our calculated output a3.
3. For the hidden layer, the error term d2 can be calculated as shown in the sketch below.
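In symbols, with g' the sigmoid gradient and the bias weight dropped from Theta2:

\delta^{(2)} = \big(\Theta^{(2)}\big)^T \delta^{(3)} \odot g'(z^{(2)})

A sketch of the three steps for a single training example, assuming theta1 and theta2 hold the current weight matrices, X and y are the training images and labels (labels 1 to 10), and regularization of the gradients is omitted:

def sigmoidGradient(z):
    # derivative of the sigmoid: g'(z) = g(z) * (1 - g(z))
    return sigmoid(z) * (1 - sigmoid(z))

m = X.shape[0]
Delta1 = np.zeros_like(theta1)   # gradient accumulators
Delta2 = np.zeros_like(theta2)

for t in range(m):
    # 1. forward propagate example t
    a1 = np.hstack((1, X[t]))                # 401-vector with bias
    z2 = theta1 @ a1
    a2 = np.hstack((1, sigmoid(z2)))         # 26-vector with bias
    a3 = sigmoid(theta2 @ a2)                # 10-vector of output activations

    # 2. output error term: calculated output minus one-hot actual output
    y_vec = np.zeros(num_labels)
    y_vec[int(y.flat[t]) - 1] = 1
    d3 = a3 - y_vec

    # 3. hidden-layer error term (bias weight column dropped from theta2)
    d2 = (theta2[:, 1:].T @ d3) * sigmoidGradient(z2)

    # accumulate the gradients
    Delta1 += np.outer(d2, a1)
    Delta2 += np.outer(d3, a2)

theta1_grad = Delta1 / m
theta2_grad = Delta2 / m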
2.4 Gradient checking
Why do we need gradient checking? To make sure that our backprop algorithm has no bugs
in it and works as intended. We can approximate the partial derivative of the cost function with
respect to each parameter using the central-difference formula:
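\frac{\partial}{\partial\theta_i} J(\theta) \approx \frac{J(\theta + \epsilon\, e_i) - J(\theta - \epsilon\, e_i)}{2\epsilon}

Here e_i is the i-th unit vector and \epsilon = 10^{-4} (the myeps used in the code below).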
The gradients computed using backprop and the numerical approximation should agree to at
least 4 significant digits to make sure that our backprop implementation is bug-free.
def checkGradient(nn_initial_params, nn_backprop_Params, input_layer_size,
                  hidden_layer_size, num_labels, myX, myy, mylambda=0.):
    myeps = 0.0001
    flattened = nn_initial_params
    flattenedDs = nn_backprop_Params
    n_elems = len(flattened)
    # Pick ten random elements, compute numerical gradient, compare to respective D's
    for i in range(10):
        x = int(np.random.rand() * n_elems)
        epsvec = np.zeros((n_elems, 1))
        epsvec[x] = myeps
        cost_high = nnCostFunc(flattened + epsvec.flatten(), input_layer_size, hidden_layer_size,
                               num_labels, myX, myy, mylambda)
        cost_low = nnCostFunc(flattened - epsvec.flatten(), input_layer_size, hidden_layer_size,
                              num_labels, myX, myy, mylambda)
        mygrad = (cost_high - cost_low) / float(2 * myeps)
        print("Element: {0}. Numerical Gradient = {1:.9f}. BackProp Gradient = {2:.9f}.".format(
            x, mygrad, flattenedDs[x]))
2.5 Learning parameters using fmincg
After you have successfully implemented the neural network cost function and gradient
computation, the next step is to use fmincg to learn a good set of parameters for the neural
network. theta_opt contains the unrolled parameters we have just learnt, which we roll back
into theta1_opt and theta2_opt.
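A sketch of this step, with scipy's fmin_cg standing in for Octave's fmincg. Here nnGradFunc is a hypothetical name for a function that wraps the backprop loop above and returns the unrolled gradient vector, sharing nnCostFunc's signature; X and y are the training images and labels:

from scipy import optimize

# nnGradFunc (hypothetical) returns the unrolled gradient for the given parameter vector
theta_opt = optimize.fmin_cg(f=nnCostFunc, x0=nn_initial_params, fprime=nnGradFunc,
                             args=(input_layer_size, hidden_layer_size, num_labels, X, y, 1.),
                             maxiter=50)   # regularization strength of 1 assumed

# roll the learnt vector back into the two weight matrices
theta1_opt = theta_opt[:hidden_layer_size * (input_layer_size + 1)].reshape(
    (hidden_layer_size, input_layer_size + 1), order='F')
theta2_opt = theta_opt[hidden_layer_size * (input_layer_size + 1):].reshape(
    (num_labels, hidden_layer_size + 1), order='F')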