Neural Network
CS-715
MS (Computer Science)
How ANN Works
• So all 784 of these neurons from our image make up the first layer of our
network. The last layer in this case has 10 neurons onto which the output
is mapped. These 10 neurons (0-9) also have activation values, which
indicate how sure the network is that the input image is the
corresponding digit.
How ANN Works
• The way the network operates, the activations in one layer
determine the activations of the next layer.
• That is, depending on the brightness of the pixels in the input image,
that pattern of activation causes some very specific pattern in the next
layer, which passes the final values to the output layer. The network
then chooses the brightest neuron of that output layer.
• This is helpful for detecting edges and patterns in the image.
ANN Layers
• Artificial Neural Networks are made up of layers and layers of
connected input units and output units called neurons.
• A single-layer neural network is called a perceptron. Multiple
hidden layers may also be present in an artificial neural network.
The input units (receptors), connection weights, summing
function, computation, and output units (effectors) are what
make up an artificial neuron.
1. Input layer
• The number of neurons is equal to the number of pixels in the input
image; these pixel values are used for activation only.
2. Convolutional / Hidden layer
• Researchers don't actually know what these layers do inside. But
we assume that in a digit-recognition NN the hidden layers do edge
detection, using pixel values to narrow down our problem.
ANN Layers
• And the next layers find patterns from these edges for identification.
3. Output layer
• Used to show our final results. The number of neurons in this
layer depends on the number of classes in our problem.
Network parameters
• A neural network can be considered a weighted directed graph where the
neurons correspond to the nodes and the connections between two neurons
are weighted edges. A weight can be positive or negative.
w1*a1 + w2*a2 + w3*a3 + … + wn*an
• The activation should lie between 0 and 1; a common function that
does this is the sigmoid function, also called the logistic curve:
sigmoid(w1*a1 + w2*a2 + w3*a3 + … + wn*an)
• The processing element of a neuron receives many signals (both
from other neurons and as input signals from the external world).
• Signals are sometimes modified at the receiving synapse, and the
weighted inputs are summed at the processing element. If the sum
crosses the threshold, it goes as input to other neurons (or as
output to the external world) and the process repeats.
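• A minimal Python sketch of this weighted sum and sigmoid squashing for a single neuron; the activations and weights below are made-up values for illustration only.

import numpy as np

def sigmoid(z):
    # Logistic curve: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical activations a1..a3 from the previous layer and their weights.
a = np.array([0.9, 0.2, 0.7])    # incoming activations
w = np.array([0.5, -1.2, 0.8])   # weights can be positive or negative

z = np.dot(w, a)                 # w1*a1 + w2*a2 + w3*a3
print(z, sigmoid(z))             # 0.77 -> about 0.68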
Example 1
[Figure: the example network, with architecture [4, 5, 3, 2]]
Basics of ANN we need
• The equation for a neural network is a linear combination
of the independent variables and their respective weights,
plus a bias (or intercept) term, for each neuron. The neural
network equation looks like this:
Z = W1*X1 + W2*X2 + … + Wn*Xn + b
No. of layers and nodes in Example 1
• Example 1 is a neural network with an architecture of
[4, 5, 3, 2], depicted below:
1. 4 independent variables, or the Xs, in the input layer, L1
2. 5 neurons in the first hidden layer, L2
3. 3 neurons in the second hidden layer, L3, and
4. 2 neurons in the output layer, L4: the two nodes Q1 and Q2
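• As a sanity check on the [4, 5, 3, 2] architecture, this Python sketch (with placeholder random weights) prints the weight-matrix and bias shapes each layer transition needs.

import numpy as np

layers = [4, 5, 3, 2]   # [input L1, hidden L2, hidden L3, output L4]
rng = np.random.default_rng(0)

for i in range(len(layers) - 1):
    W = rng.random((layers[i + 1], layers[i]))  # weights between consecutive layers
    b = rng.random((layers[i + 1], 1))          # one bias per neuron
    print(f"L{i + 1} -> L{i + 2}: W {W.shape}, b {b.shape}")

# Prints:
# L1 -> L2: W (5, 4), b (5, 1)
# L2 -> L3: W (3, 5), b (3, 1)
# L3 -> L4: W (2, 3), b (2, 1)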
Back Propagation
• To train the network through supervised learning, the
model's predicted output is compared to the actual output
(which is known to be correct), and the difference between
these two results is measured; this difference is known as the
cost or cost value.
• The purpose of training is to reduce the cost value until the
model's prediction closely matches the correct output. This
is achieved by incrementally tweaking the network's
weights until the lowest possible cost value is obtained.
• This process of training the neural network is called
backpropagation. Rather than navigating left to right like how
data is fed into a neural network, backpropagation is done
in reverse and runs from the output layer on the right
towards the input layer on the left.
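• A toy sketch of this "tweak the weights until the cost is lowest" idea, using a single weight and a squared-error cost; the training pair and learning rate are invented for illustration (real backpropagation applies the same update to every weight via the chain rule).

# Gradient descent on one weight w for the model y_hat = w * x,
# with cost C = (y_hat - y)^2; this made-up data implies the ideal w is 4.
x, y = 2.0, 8.0   # one hypothetical training pair
w = 0.5           # initial random-ish weight
lr = 0.05         # learning rate

for step in range(20):
    y_hat = w * x
    grad = 2 * (y_hat - y) * x   # dC/dw via the chain rule
    w -= lr * grad               # tweak the weight against the gradient

print(w)   # close to 4.0: the cost has been driven near zero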
Squashing the Neural Net
• For a binary classification problem, we know that a Sigmoid
is needed to transform the linear equation into a nonlinear
one. For a particular node, the linear combination is computed
first:
N1 = W11*X1 + W12*X2 + W13*X3 + W14*X4 + W10
and then squashed through the sigmoid: sigmoid(N1) = 1 / (1 + e^(-N1))
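• For instance, with made-up weights W11..W14, bias W10, and one input record X1..X4, the node's linear output and its squashed value can be computed as follows (Python, values purely illustrative).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([0.2, -0.5, 0.1, 0.4])   # W11, W12, W13, W14
W10 = 0.3                             # bias term
X = np.array([1.0, 2.0, 0.5, 1.5])    # X1, X2, X3, X4

N1 = np.dot(W, X) + W10   # linear combination for node N1
h1 = sigmoid(N1)          # nonlinear, squashed into (0, 1)
print(N1, h1)             # 0.15 -> about 0.54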
Forward propagation
• The process of going from left to right, i.e. from the Input layer
to the Output layer, is Forward Propagation; correcting or adjusting
the weights happens afterwards, during backpropagation.
• Our binary classification dataset has the input X as a 4 * 8 matrix,
with 4 input variables and 8 records, and the Y variable as a 2 * 8
matrix, with two rows for class 1 and class 0 respectively and 8
records. It had some categorical variables, which were converted to
dummy variables.
• The idea here is that we start with the input layer, the 4 * 8
matrix, and want to get an output of dimensions 2 * 8. The hidden layers
and the neurons in each of the hidden layers are hyperparameters
and so are defined by the user. We achieve the
output via matrix multiplication between the input
variables and the weights of each layer.
Forward propagation
• We have seen above that there is a weight matrix for
each of the respective layers. We perform matrix multiplication
starting from the input matrix of 4 * 8 with the weight matrix
between the L1 and L2 layers to get the matrix of the next
layer, L2. Similarly, we do this for each layer and repeat
the process until we reach the output layer of dimensions 2
* 8.
Forward propagation
• Now, let’s break down the steps to understand how the matrix
multiplication in Forward propagation works:
Forward propagation
• For matrix multiplication, the number of columns of the first
matrix must be equal to the number of rows of the second
matrix. Our input matrix has 8 columns and the weight matrix has
4 rows; hence, we can't multiply input by weights directly. Instead,
we multiply the 5 * 4 weight matrix by the 4 * 8 input matrix,
giving the 5 * 8 first hidden layer:
Z1 = Wh1 * X + bh1, h1 = sigmoid(Z1)
• Note that for the next layer, between L2 and L3, the input this time
will not be X but h1, which results from L1 and L2:
Z2 = Wh2 * h1 + bh2
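• Putting the shapes together, here is a Python sketch of the full forward pass for the [4, 5, 3, 2] network on a 4 * 8 input; the weights are random placeholders, and sigmoid is applied at every layer as described above.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.random((4, 8))   # 4 input variables, 8 records

Wh1, bh1 = rng.random((5, 4)), rng.random((5, 1))    # L1 -> L2
Wh2, bh2 = rng.random((3, 5)), rng.random((3, 1))    # L2 -> L3
Wout, bout = rng.random((2, 3)), rng.random((2, 1))  # L3 -> L4

h1 = sigmoid(Wh1 @ X + bh1)       # (5, 4) @ (4, 8) -> (5, 8)
h2 = sigmoid(Wh2 @ h1 + bh2)      # (3, 5) @ (5, 8) -> (3, 8)
out = sigmoid(Wout @ h2 + bout)   # (2, 3) @ (3, 8) -> (2, 8)
print(out.shape)                  # (2, 8): two class scores per record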
Conclusion
1. First, we initialize the weights with some random values,
which are mostly between 0 and 1.
2. Calculate the output, i.e. predict the values, and estimate
the loss.
3. Then, adjust the weights so that the loss is minimized.
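• The three steps above can be sketched as a toy training loop in Python; this stand-in trains a single sigmoid neuron on a made-up AND dataset rather than the full network.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # toy inputs
y = np.array([0., 0., 0., 1.])                          # AND labels

# 1. Initialize the weights with random values between 0 and 1.
rng = np.random.default_rng(0)
w, b = rng.random(2), rng.random()

for epoch in range(5000):
    # 2. Calculate the output (the predictions) and estimate the loss.
    y_hat = sigmoid(X @ w + b)
    loss = np.mean((y_hat - y) ** 2)

    # 3. Adjust the weights so that the loss is minimized (gradient descent).
    grad = (y_hat - y) * y_hat * (1 - y_hat)  # dLoss/dz per record
    w -= 0.5 * (X.T @ grad)
    b -= 0.5 * grad.sum()

print(round(loss, 4), np.round(y_hat, 2))   # loss near 0; predictions approach [0, 0, 0, 1]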